[nylug-talk] LOWMEM vs. HIGHMEM performance advantage?

Bryan J Smith b.j.smith at ieee.org
Fri Apr 18 21:16:40 EDT 2008


From: Alex Pilosov <alex at pilosoft.com>
> will strongly disagree. 
> x86-64 (i will call it x64 and spare me) - is *bad* idea *unless* you
> *have* to. because pointers are 64-bit,

Are they now?  Wow! I guess I shouldn't have read those x86-64/IA-32e programmer manuals.

I fully expect an application-level developer to disagree. But anyone who has worked at the system level knows this fact ...

i686: 6 bytes (48-bit segmented)
x86-64: 6 bytes (48-bit flat)
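
To put numbers on the two 48-bit figures above, here is a rough sketch. It assumes 48 implemented virtual-address bits (4-level paging) on x86-64, and a far pointer made of a 16-bit selector plus a 32-bit offset on i686. The function names are made up for illustration.

```python
# Sketch only: assumes 48 implemented virtual-address bits (4-level paging).
VA_BITS = 48


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 of a 64-bit address are all equal (sign-extended).

    Expects 0 <= addr < 2**64.
    """
    upper = addr >> (VA_BITS - 1)                 # bits 63..47, 17 bits wide
    all_ones = (1 << (64 - VA_BITS + 1)) - 1      # 0x1FFFF
    return upper == 0 or upper == all_ones


def far_pointer_bits(selector_bits: int = 16, offset_bits: int = 32) -> int:
    """Width of an i686 far pointer: segment selector plus 32-bit offset."""
    return selector_bits + offset_bits


if __name__ == "__main__":
    print(far_pointer_bits())                      # i686 far pointer width
    print(is_canonical(0x00007FFFFFFFFFFF))        # top of the lower half
    print(is_canonical(0x0000800000000000))        # inside the hole
```

Either way, only 48 bits take part in address translation, no matter how wide the register holding the address is.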

When paging is the biggest factor in performance, what you're saying is laughable.

> lots of structures are much larger in size,
> so your binaries and data are much larger in size, so you blow
> l2/l3 cache much faster, and use up more memory for same work.
> in empirical tests, bloat is anywhere between 25% to 50% depending on application.

You're trying to apply a "from afar" assumption about ALU realities to a discussion on paging.
This is about paging, not what user-space application developers think is going on.

> also, the border is not 1GB. Border is 3GB. 3GB apps work fine with 3/1 split.

Paging is still in operation beyond 1GB. Your user-space app may not have additional overhead itself, but the context switches very much do.
If a lot of IPC is going on, and not just system-calls, forget it.
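
A back-of-the-envelope sketch of where the roughly-1GB border comes from on a 32-bit kernel with a 3/1 split. It assumes the kernel directly maps RAM into its 1 GiB of address space, minus a reserve for vmalloc and fixed mappings. The 128 MiB reserve is the usual x86-32 default and is only illustrative here; split_ram is a made-up helper, not a kernel interface.

```python
MIB = 1 << 20
GIB = 1 << 30


def split_ram(ram_bytes: int,
              kernel_va_bytes: int = 1 * GIB,
              reserve_bytes: int = 128 * MIB) -> tuple:
    """Return (lowmem, highmem) in bytes for a 32-bit kernel.

    Assumption: lowmem is whatever RAM fits in the kernel's direct map,
    i.e. kernel address space minus the vmalloc/fixmap reserve.
    Everything beyond that is highmem and needs temporary mappings.
    """
    direct_map = kernel_va_bytes - reserve_bytes
    lowmem = min(ram_bytes, direct_map)
    return lowmem, ram_bytes - lowmem


if __name__ == "__main__":
    for ram_gib in (0.5, 1, 4, 16):
        low, high = split_ram(int(ram_gib * GIB))
        print(f"{ram_gib:>4} GiB RAM: lowmem {low // MIB} MiB, highmem {high // MIB} MiB")
```

Under these assumptions, a 3/1 split leaves 896 MiB of lowmem regardless of how much RAM is installed; the rest is highmem.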

Context switching tends to be a major factor in performance, especially when you are running a VM like the ones for Java and Python. It really adds up more than this side-assumption you have. Honestly.

Application assumption != system reality.

> Kinda apples and oranges. X4 is for large number of cores (32 cores, 8 sockets),
> large amounts of ram (1TB). However, the X4 servers are xeon 7xxx series
> which are fail compared to 5xxx

IBM's X4 is creaming the reference-based competition by over 30% in our tests with 8-16 cores (2-4 sockets) with 16-32GiB.
You don't have to go multi-board to see it; Intel's platform sux without help.
Luckily IBM provides it, leveraging Intel's superior ALU/SSE for a great majority of applications, while mitigating (or eliminating) the platform limitations.

> The above is excellent description "why this is complicated" :)

Yep. You only addressed user-space applications.

> But yes, there are issues with addressing >32-bit in PCI on _some_
> hardware platforms.
> I'd like to make a point that it is device and driver specific.

It's protection-generic.

> In fact, your higher-performance hardware is likely to already work fine
> without bounce buffers. (most scsi controllers and e1000 cards understand
> this).

And you're entrusting them to work responsibly.
They don't always. ;)

Enterprise Linux distros side with that concern for a reason.
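
For reference, the bounce-buffer question boils down to a check like the one below. This is a simplified sketch, not the kernel's actual code: needs_bounce and its parameters are made-up names, and it assumes the device's reach is a plain N-bit DMA address mask.

```python
def needs_bounce(phys_addr: int, length: int, dma_bits: int = 32) -> bool:
    """True if any byte of the buffer lies above what the device can address.

    Assumption: the device can reach physical addresses 0 .. 2**dma_bits - 1.
    A buffer that crosses that limit has to be copied through a low
    ("bounce") buffer unless something remaps it for the device.
    """
    dma_limit = (1 << dma_bits) - 1        # highest address the device can reach
    last_byte = phys_addr + length - 1
    return last_byte > dma_limit


if __name__ == "__main__":
    print(needs_bounce(0xFFFFF000, 4096))                  # ends exactly at the 4 GiB line
    print(needs_bounce(0x100000000, 512))                  # starts above 4 GiB
    print(needs_bounce(0x100000000, 512, dma_bits=64))     # 64-bit capable device
```

The check is only as good as the mask the device and driver advertise, which is exactly the trust issue above.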

> Yeah, AMD has hw bounce buffers. Which intel said
> are not important. Intel was right. Suck it, amd boy.

Depends on how much I/O you're doing. For home consumers, it's not. For web servers, it's not.

When your graphics card has 2GiB of frame buffer, it's AMD-only. Same deal when you have serious caching going on in an I/O capability.

If it's "not important," then why is Intel working on it? Why does IBM address some things in their chipset? 

> Suck it, amd boy.

My new servers are IBM X4 Intel designs, not AMD.

--  
Bryan J Smith - mailto:b.j.smith at ieee.org  
http://thebs413.blogspot.com  