[nylug-talk] LOWMEM vs. HIGHMEM performance advantage?

Bryan J. Smith b.j.smith at ieee.org
Fri Apr 18 19:31:28 EDT 2008


My simple answer (top post) ...

If you have a program that eats more than 1GiB, you should be running an
OS built for x86-64/IA-32e on AMD64/EM64T.

If you have a program that is constantly sucking up over 2GiB and is
very I/O bound, you should really consider an x86-64 implementation like
AMD64.  This is less of a consideration from 2007 on though, as Intel's
raw performance and MCH in newer EM64T processors mitigate the great
majority of this (despite "bounce buffers" and the like).  I.e., it's
not like it was pre-2007.  IBM's X4 architecture for Intel is even more
impressive in latency and throughput than Intel's reference as well
(30%+ faster).

For computationally intensive applications, it's hard not to go Intel
these days, and AMD cannot compete -- with one exception.  If you need
the absolute best in 64/128-bit precision, it's best to go AMD for its
3-issue (2-complex + 1-ADD/MULT) FPU.  Intel may be faster if you're
throwing a lot of SSE instructions (the majority of applications, and
100% of home consumer ones), but you must be willing to put up with the
precision issues of its dedicated SSE pipes, as there is only a 2-issue
(1-complex _or_ 2-ADD) FPU.  Fine for a great number of apps, but not
for engineering, scientific or high precision operations (e.g., some
encryption operations).


The complex answer (several bottom posts) ...


jh wrote:
> I have an application that appears to slow down once it starts using 
> about 900 megs of memory.
> Interestingly, from dmesg:
> [42949372.960000] 1150MB HIGHMEM available.
> [42949372.960000] 896MB LOWMEM available.
> Is HIGHMEM slower, or are there circumstances or kernel configs such 
> that I could be seeing such a slowdown once memory usage goes above a 
> certain level?

"Paging" causes performance hits.  You should consider changing your
kernel memory model, although that really suggests x86-64 when you're
already at 1GiB.
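
A rough sketch of where the 896MB LOWMEM figure in that dmesg output
comes from.  It assumes the common 3G/1G user/kernel split and a 128MiB
slice of kernel address space held back for vmalloc and fixed mappings
(typical for 32-bit x86 Linux, but both are configurable).  The function
name and its parameters are made up for illustration; this is not kernel
code.

```python
def split_lowmem_highmem(ram_mib, kernel_space_mib=1024, vmalloc_reserve_mib=128):
    """Return (lowmem_mib, highmem_mib) for a 32-bit kernel.

    Assumption: the kernel can permanently map only
    kernel_space_mib - vmalloc_reserve_mib of physical RAM ("lowmem").
    Anything beyond that is "highmem" and must be mapped on demand.
    """
    direct_map_mib = kernel_space_mib - vmalloc_reserve_mib
    lowmem_mib = min(ram_mib, direct_map_mib)
    highmem_mib = ram_mib - lowmem_mib
    return lowmem_mib, highmem_mib


if __name__ == "__main__":
    # 896MB LOWMEM + 1150MB HIGHMEM from the dmesg output is 2046MB total.
    print(split_lowmem_highmem(2046))
```

With 2046MB of usable RAM this gives the same 896/1150 split reported
above, which is why the slowdown shows up right around 900 megs.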

There are many issues with various i686 36-bit Physical Address
Extension (PAE) paging memory models.  Some really cause issues above
4-8GiB, and definitely havoc at 16GiB+.  Others, like the 4G/4G
kernel/user(s) page model (which Red Hat used for 2.4 and earlier 2.6
releases) solved a lot of those, but caused an even bigger performance
hit.

For an introduction to kernel memory models and related paging
considerations, see my post here -- although it won't go very deep into
the 1G/3G, 2G/2G, 3G/1G or 4G/4G models for 32-bit or 36-bit PAE:
"What is x86-64 'Long Mode' Memory Model" (from 2005 October):  
http://thebs413.blogspot.com/2005/10/what-is-x86-64-long-mode-memory-model.html  


Ruben Safir wrote:
> Why do we still have this ancient DOS ritual?

It's not specific to DOS.  It's just memory models.  Again, read my blog
post above.

A kernel must be compatible with multiple memory models.  That's why
there is a "hard" limitation at 48-bit/256TiB for x86-64.  The current
x86-64 (Intel calls the model IA-32"e" -- IA-32 Enhanced) uses its "Long
Mode" memory model.  The 16-bit segment + 32-bit offset register is used
"flat" for 48-bit addressing, not a segmented 32-bit register.  The
processor itself uses a 4-level variation of Physical Address
Extension (PAE), 52-bit, to be compatible with the prior, 3-level PAE
used for prior 36-bit (i686) paging.
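
To make the 4-level part concrete, here is a minimal sketch of how a
48-bit "Long Mode" virtual address breaks down with 4KiB pages: four
9-bit table indexes plus a 12-bit page offset (4 x 9 + 12 = 48).  The
function name and dictionary keys are mine, and it ignores large pages
and the sign-extension of the upper 16 bits.

```python
def split_long_mode_address(vaddr):
    """Split a 48-bit virtual address into 4-level page-table indexes.

    Assumes 4KiB pages: bits 47-39, 38-30, 29-21 and 20-12 each select
    one of 512 entries in a table level; bits 11-0 are the byte offset.
    """
    if not 0 <= vaddr < (1 << 48):
        raise ValueError("expected an address that fits in 48 bits")
    return {
        "level4": (vaddr >> 39) & 0x1FF,  # top-level table (PML4)
        "level3": (vaddr >> 30) & 0x1FF,  # page directory pointer table
        "level2": (vaddr >> 21) & 0x1FF,  # page directory
        "level1": (vaddr >> 12) & 0x1FF,  # page table
        "offset": vaddr & 0xFFF,          # byte within the 4KiB page
    }


if __name__ == "__main__":
    print(split_long_mode_address(0x7FFF_DEAD_B000))
```

The 3-level PAE layout used for 36-bit i686 paging follows the same
idea with one level fewer, which is the compatibility point above.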

Physically, AMD x86-64 processors support 40-bit/1TiB, except the new
x3xx (Family 10h) models which now do a full 48-bit/256TiB.  Intel
was still stuck at old 36-bit/64GiB, which we are now breaking.  So they
"hacked on" another 2-bits for 38-bit/256GiB, on the latest "G0"
stepping of their Xeon processors.  It's been a real issue for them
too,** although AMD has had their share with their NUMA complexity.
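
The bit-width/size pairs quoted in this post are just powers of two.  A
small sketch to check them (the helper name is made up):

```python
def addressable(bits):
    """Return the size reachable with `bits` address bits, e.g. '64GiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = 1 << bits
    index = 0
    # size is a power of two, so integer division by 1024 stays exact.
    while size >= 1024 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size}{units[index]}"


if __name__ == "__main__":
    for bits in (32, 36, 38, 40, 48, 52):
        print(f"{bits}-bit -> {addressable(bits)}")
```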


Ruben Safir wrote:
> Thanks Eric.  I almost know how that relates, but can you explain that
> in more detail?

It's actually a _number_ of factors, many not simple to explain at all.

Taking from the prior thread on "memory density," 9 times out of 10
the terms/phrases you hear are _dead_wrong_ engineering-wise.
E.g., I've personally designed memory controllers myself (at the
transistor level).  There are at least 3 "simple" factors when it comes
to DIMMs -- technology of the individual ICs (in Mb/Gb), datapath width
of individual ICs and the total width that results from using a
combination of ICs.  The last is usually what causes the incompatibility
issues.

E.g., 128/144-bit (non-ECC/ECC), instead of 64/72-bit (non-ECC/ECC),
that requires a "buffer/register," but it can also be the use of 4-bit
v. 8-bit v. 16-bit v. 32-bit v. 64-bit datapath ICs too.  A number of
32/36-IC x 4-bit DIMMs were produced in the PC133 days that work with
VIA, but not Intel (except registered for i440BX/GX, but not
i810/815/845).  Likewise, there were some other 8-bit or 16-bit
combinations that wouldn't work with various memory controllers.
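
As a sketch of the third factor, the total width is just IC count times
per-IC datapath width.  This only does that multiplication as the
paragraphs above count it; it says nothing about ranks, buffering or
whether a given memory controller will accept the result.  The function
name and the label table are assumptions for illustration.

```python
# Total widths named in the post, mapped to the labels it uses.
KNOWN_WIDTHS = {
    64: "64-bit non-ECC",
    72: "72-bit ECC",
    128: "128-bit non-ECC",
    144: "144-bit ECC",
}


def describe_module(ic_count, ic_width_bits):
    """Label the combined datapath width of ic_count ICs of a given width."""
    total_bits = ic_count * ic_width_bits
    return KNOWN_WIDTHS.get(total_bits, f"{total_bits}-bit (non-standard)")


if __name__ == "__main__":
    for count, width in ((8, 8), (9, 8), (32, 4), (36, 4)):
        print(f"{count} ICs x {width}-bit -> {describe_module(count, width)}")
```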

People have a tendency to oversimplify and get things _dead_wrong_.

Physical platform/interconnect (always a big limitation) v.
  processor addressing/paging registers (with legacy issues) v.
    OS/kernel memory models (what it supports) v.
      user-space program memory model (what it uses)

Those are 4 factors that are not easy to explain on their own, and even
more complex when you start talking about X + Y but not Z
combinations.  ;)


Ruben Safir wrote:
> Can you explain this in more detail, because my understand was
> that GAS assembler essential addresses all ram linearly.
> I'm not sure how these things affect the memory addressing.

You're _only_ looking at that last factor ...
  "user-space program memory model"

From the standpoint of the majority of programs, it looks "flat."  The
reality is that this has _nothing_ to do with how the kernel is paging
at all, much less how the processor is maintaining its page tables, its
more limiting TLB considerations, etc... and that's before we get to the
physical platform/interconnect.

So if you only understand the "programmer" level, "oh yeah, I have a
flat 32-bit/4GiB" (x86/IA-32) or "oh yeah, I have a flat
48-bit/256TiB" (x86-64/IA-32e), then you don't see it.  E.g., there is a
_huge_ reason why Intel calls its programmer model "IA-32e" and its
hardware implementation "EM64T".  ;)


Peter Norton wrote:  
> There is a 2^32 * 2*4 memory address locations that the CPU can
> address that regular PCI devices can't?

No and yes.

No, there is >32-bit addressing in PCI.

But yes, there are issues with addressing >32-bit in PCI on _some_
hardware platforms.

E.g., due to lack of hardware enforced I/O memory mapping, OSes on most
EM64T hardware platforms (Intel's IA-32e processors) _always_ "bounce
buffer" memory mapped I/O under 32-bit.  Yes, this includes when running
a processor in 52-bit PAE / 48-bit flat "Long Mode" (aka "x86-64"
64-bit).  That is a significant performance hit when something is I/O
bound.
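
A simplified model of the bounce-buffer decision, to show why it hurts
I/O-bound loads.  This is a sketch under my own assumptions, not what
any kernel actually runs: a device that can only address 32 bits, no
I/O MMU to remap for it, and made-up function and parameter names.

```python
def needs_bounce(phys_addr, length, device_dma_bits=32, has_iommu=False):
    """Return True if a DMA buffer must first be copied to low memory.

    Assumption: without an I/O MMU, the device can only reach physical
    addresses below 2**device_dma_bits, so any buffer extending past
    that limit has to be "bounced" through a low-memory copy.
    """
    if has_iommu:
        # An I/O MMU can remap the buffer into the device's reach.
        return False
    limit = 1 << device_dma_bits
    return phys_addr + length > limit


if __name__ == "__main__":
    page = 4096
    print(needs_bounce(0x0010_0000, page))                    # below 4GiB
    print(needs_bounce(0x1_0000_0000, page))                  # above 4GiB
    print(needs_bounce(0x1_0000_0000, page, has_iommu=True))  # remapped
```

Every bounced transfer costs an extra memory copy on top of the I/O
itself, which is where the performance hit comes from.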

AMD processors with their full x86-64 instruction set (which is a
superset of IA-32"e" for EM64T) have an I/O MMU.  Unfortunately AMD's
hardware approach is far more complicated, and there are more errata as
a result.  It's been a headache for AMD, much like it was for most
RISC/UNIX vendors years ago, but they've made it through most of it.  At
the same time, Intel is just beginning its journey.**  ;)

-- Bryan

**NOTE:  Intel's new designs are adopting some and running into some of
the same issues all over again, things AMD solved long ago.  I.e.,
TLB-related errata plagued not only the AMD Family 10h B2 stepping (new,
true 48-bit/256TiB physical addressing processors -- prior only being
40-bit/1TiB); there is also a very under-reported and major set of
errata (and a seemingly rolling set ;) on Intel Xeon 5000 series G0
steppings (new 38-bit/256GiB physical addressing processors -- prior
only being 36-bit/64GiB capable).  I can't comment any more, but there
are now (finally) a number of public documents on the Internet related
to these issues (although they don't go too deep).



-- 
Bryan J  Smith              Professional, Technical Annoyance
mailto:b.j.smith at ieee.org  http://www.linkedin.com/in/bjsmith
-------------------------------------------------------------
           Fission Power:  An Inconvenient Solution


