[nylug-talk] x86-64 for 1GiB+ programs (especially in VMs) -- WAS: LOWMEM vs. HIGHMEM performance advantage?
Alex Pilosov
alex at pilosoft.com
Tue Apr 22 02:34:51 EDT 2008
On Mon, 21 Apr 2008, Bryan J. Smith wrote:
> Top-post portion ...
>
> No matter how many times I revisit it, some people keep forgetting the
> clear, simple context I made my statement in at the very beginning of my
> post:
> [ From: http://nylug.org/pipermail/nylug-talk/2008-April/037689.html ]
>
> "If you have a program that eats more than 1GiB,
> you should be running an OS built for x86-64/IA-32e
> on AMD64/EM64T."
And I've debunked this.
a) If you have a program that uses >3GB you have no choice but to do that.
b) If you have a program that uses <3GB, you shouldn't.
In other words, "unless you absolutely *have* to, don't do it".
> I wasn't making a blanket statement that any x86-64 (aka IA-32e) program
> is faster, smaller or more optimized than x86 (aka IA-32), only programs
> that start eating more than 1GiB. Furthermore, I even later re-pointed
> out that it is a major consideration to programs running under VMs like
> Java and Python ...
What the heck does that have to do with Java and Python?
fwiw, it is actually *possible* for Java/Python or other bytecoded
languages to do user-space >4GB addressing via segmentation. (I don't know
if there are VMs that *do* that, but it's theoretically possible. It is also
very possible that such a VM might outperform a "native" 64-bit VM due to
the lack of 64-bitness bloat.)
> The original poster's issue. ;)
>
> I wasn't making a statement on the application considerations, but
> clearly system/OS considerations. I wasn't trying to make it a
> meta-discussion on general developer ideal/theoretical concepts (even
> though many presented were made upon assumptions that were dead wrong --
> especially lacking on the reality that 48-bit "far" pointers result,
> especially in VMs ;), but about how the processor page tables start to
> operate when a program is constantly eating up more than 1GiB.
And I've repeatedly pointed out it's a red herring. There's nothing special
about the 1GB barrier. 1GB is 30 bits. What's so special?
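
For anyone who wants to check the arithmetic, here is a quick sketch in
Python. The helper name `address_bits` is made up for illustration; it just
counts how many bits are needed to address every byte of a region of a given
size.

```python
def address_bits(size_bytes: int) -> int:
    """Smallest number of bits that can address every byte in [0, size_bytes)."""
    if size_bytes < 1:
        raise ValueError("size must be positive")
    # The highest address is size_bytes - 1; its bit length is the answer.
    return (size_bytes - 1).bit_length()


GIB = 1 << 30

for label, size in [("1 GiB", GIB), ("3 GiB", 3 * GIB), ("4 GiB", 4 * GIB)]:
    print(f"{label}: {address_bits(size)} bits")
```

1 GiB fits comfortably inside a 32-bit pointer, and so does everything up to
4 GiB. Nothing changes in the pointer format when a process crosses 1 GiB.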
> The original poster's issue. ;)
>
> This thread has led some people to utterly forget that context,
> making statements and taking views that vary quite differently from
> issues where far less memory is in usage, etc... Several people have
> varied from not merely what I not only address to start, but I have kept
> my focus on the entire time.
The original poster's issue is one of:
a) a motherboard that doesn't set up the MTRRs right
b) some userland application stupidity (an O(N^2) algorithm or somesuch)
c) some swap-related stupidity
d) I guarantee you it has nothing to do with page tables. But the original
poster can't figure out how to diagnose and troubleshoot performance (no
vmstat data, much less oprofile output), so we end up conjecturing
about page tables and other crap.
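
As a starting point for ruling swap in or out (case c above), here is a
minimal sketch in Python. It assumes a Linux box where /proc/vmstat exposes
the cumulative `pswpin` and `pswpout` counters; the function names are made
up for illustration, and this is no substitute for watching `vmstat 1` or
running a real profiler.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two snapshots (counters are cumulative)."""
    return {key: after.get(key, 0) - before.get(key, 0)
            for key in ("pswpin", "pswpout")}


def sample_swap_delta(interval=1.0, path="/proc/vmstat"):
    """Take two snapshots `interval` seconds apart (Linux only), return the delta."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_delta(before, after)
```

Calling `sample_swap_delta()` while the slow application is running gives the
number of pages swapped in and out during the interval. If both stay at zero,
swap is probably not the culprit and it is time to look at the other cases.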
> It is based on my experience, just as I'm sure yours has been. This is
> about system-level/OS, not theoretical assumptions that developers have.
> When you start hammering 1GiB+ of memory, your system is using "far"
> pointers, which results in little to no difference (if not an
> improvement, depending on the application/memory models in use) by going
> to x86-64. Why? Because attention to the processor page tables now
> becomes far more of an issue.
there is nothing. nothing. nothing. "far" about 1GB. Nothing. Nothing.
Argh. There is no need to attend to page tables.
> The original poster's issue. ;)
>
> Only benchmarks with the original poster's application will tell the
> tale. I have been involved with 1G/3G, 3G/1G, 4G/4G and late kernel
> 2.6.x PAE changes for i686, as well as x86-64, when it comes to
> benchmarking Java and Python applications in the financial industry
> since 2004. Time and time again, when I'm hitting over 1GiB of memory
> usage, there is no "silver bullet" in i686 memory models, and x86-64
> typically bests them all -- at least for the applications I have been
> executing.
If you've really done this, post the *(#@$ benchmarks.
I am not perfect. But the logic above suggests that there's nothing special
about the 1GB barrier, nor is there anything magically worse about
PSE/PAE/etc in the real world. If there's something seriously unexpected,
you should say so.
*I admit there may be a corner case: if you benchmark specifically
*something* about page-table-related behavior, accessing one more level of
page tables MIGHT be noticeable, but that's HIGHLY unlikely (and probably
easily worked around by using 2MB pages instead of 4kB pages, resulting in
a net performance increase).
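
To put rough numbers on that footnote, here is a back-of-the-envelope sketch
in Python. The names `WALK_LEVELS` and `leaf_entries` are made up for
illustration. The level counts are the usual ones for classic 32-bit paging,
PAE, and 4-level x86-64 paging; they count levels in the paging hierarchy,
not measured memory accesses, and say nothing about actual timing.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

# (paging mode, page size) -> number of levels in the paging hierarchy
WALK_LEVELS = {
    ("32-bit non-PAE", 4 * KIB): 2,  # page directory -> page table
    ("32-bit non-PAE", 4 * MIB): 1,  # PSE: the page directory entry maps the page
    ("32-bit PAE", 4 * KIB): 3,      # PDPT -> page directory -> page table
    ("32-bit PAE", 2 * MIB): 2,      # PDPT -> page directory
    ("x86-64", 4 * KIB): 4,          # PML4 -> PDPT -> page directory -> page table
    ("x86-64", 2 * MIB): 3,          # PML4 -> PDPT -> page directory
}


def leaf_entries(mapped_bytes, page_size):
    """Number of page-mapping entries needed to cover mapped_bytes."""
    return -(-mapped_bytes // page_size)  # ceiling division


for (mode, page_size), levels in WALK_LEVELS.items():
    entries = leaf_entries(GIB, page_size)
    print(f"{mode:15s} {page_size // KIB:5d} KiB pages: "
          f"{levels} level(s), {entries} entries to map 1 GiB")
```

Going from 4kB to 2MB pages removes one level and cuts the entries needed to
map 1 GiB from 262144 to 512, which also means far less pressure on the TLB.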
> Heck (now I'm going to sidetrack), the biggest "performance issue" with
> x86-64 seems to be good amount of legacy C codebase out there using
> "long" instead of the GNU GCC and LibC standard [u]int types.
> Furthermore, when it comes to actual register storage, especially in
> instruction cache, pointers are not 64-bit for x86-64 (let alone not
> when encoded into the opcodes), and there are "far" calls of 48-bit for
> both i686 at large heap sizes (especially in VMs). These are things you
> may not see when stepping in GDB (although some modes or alternatives,
> let alone when using a target board where you can see the "actual"
> processor registers and memory "raw"), but they are very much the case.
....
pointers...are...not...64...bit? WHAT?
registers...stored...in...instruction...cache? WHAT?
pointers...encoded...into...opcodes? WHAT?
not...seeing...actual...registers...or...raw...memory...in...gdb? WHAT?
far...calls...48...bit? WHAT?
I mean, seriously. Are you using a Markov chain generator?
<snip>
> Please, look at that context, consider what I said, and understand this
> is about running a program, atop of a VM, eating up more than 1GiB. ;)
a) there's nothing special about the 1GB barrier
b) there's nothing special about a VM
c) see above about what the poster's issue really is.