[nylug-talk] RedHat AS4 and wacky out-of-memory behavior

Bryan J. Smith b.j.smith at ieee.org
Thu Apr 3 17:12:41 EDT 2008


On Wed, 2008-04-02 at 21:42 -0400, Frank D. Greco wrote:
> I had dual-proc/dual-core X86 rack go bonkers on me the other 
> day.  For some reason, this box ran out of memory (it had about 8GB 
> of RAM and 8GB of swap).

Define "ran out of memory"?  There are different types.

You have not only ...
- Physical
- Virtual

But you also have ...
- Over-commit (lets processes allocate more than total virtual memory)
- Other VM tweaks (which can keep the kernel from killing processes)
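To make the over-commit distinction concrete, here is a minimal sketch. It assumes the 2.6-kernel accounting rule for strict mode (vm.overcommit_memory=2), where the commit limit is swap plus overcommit_ratio percent of RAM. The function names are hypothetical, the RHEL 3 (2.4.21) rules differ, and newer kernels also subtract huge page reservations. With the 8GB RAM/8GB swap box from the original question and the default ratio of 50, it gives a 12GiB ceiling.

```python
"""Sketch: estimate the strict-mode commit limit from /proc/meminfo-style text.

Assumes the Linux 2.6 rule for vm.overcommit_memory=2:
    CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
(ignoring huge page reservations). Older 2.4-era kernels differ.
"""


def parse_meminfo(text):
    """Return a dict of field name -> value (kB) from /proc/meminfo text."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def strict_commit_limit_kb(meminfo, overcommit_ratio=50):
    """Commit limit (kB) the kernel would enforce with overcommit_memory=2."""
    return meminfo["SwapTotal"] + meminfo["MemTotal"] * overcommit_ratio // 100


# The box from the original question: about 8 GiB RAM and 8 GiB swap.
sample = """MemTotal:  8388608 kB
SwapTotal: 8388608 kB
"""
info = parse_meminfo(sample)
limit = strict_commit_limit_kb(info)
print(limit // (1024 * 1024), "GiB")
```

On a live system you would feed it the contents of /proc/meminfo instead of the sample string. In the default heuristic mode (0) the kernel does not enforce this limit, which is how a box can hand out more memory than it can back and end up invoking the OOM killer.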

An older article on over-commit (Red Hat Enterprise Linux 3 / kernel
2.4.21 era) can be found in a 2004 issue of Red Hat Magazine (RHM):
  http://www.redhat.com/magazine/001nov04/features/vm/

There are also whitepapers on the RHEL 3 and 4 VM:
  http://people.redhat.com/nhorman/papers/rhel3_vm.pdf
  http://people.redhat.com/nhorman/papers/rhel4_vm.pdf

There are also two (2) more things to be aware of.

1.  "Hugemem" kernels on x86/IA-32 (32-bit flat or 36-bit PAE)

To deal with the great number of issues the 3G/1G user/kernel mapping
causes when Physical Address Extension (PAE) is used to address memory
beyond 4GiB, Red Hat introduced a 4G/4G user/kernel memory model.  It
carries a significant performance hit, anywhere from 5-30%.  But it
solves a crapload of issues beyond 4GiB.  In my experience, at 8GiB,
it's virtually required.
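A rough illustration of one reason the 3G/1G split gets tight beyond 4GiB: the kernel's per-page descriptor array (mem_map) has to live in directly mapped "lowmem", which is only about 896MiB with a 1GiB kernel window. The 4KiB page size and the 32-byte struct page size below are assumptions for illustration (the real size varies by kernel and config), and mem_map is only one of many lowmem consumers.

```python
"""Sketch: mem_map overhead vs. lowmem under a 3G/1G split on 32-bit x86.

Assumptions (illustrative, not measured on any particular kernel):
- 4 KiB pages
- 32 bytes of 'struct page' per physical page (varies by kernel/config)
- ~896 MiB of directly mapped lowmem with the 3G/1G split
"""

PAGE_SIZE = 4 * 1024
STRUCT_PAGE_BYTES = 32      # assumed size of one page descriptor
LOWMEM_3G1G_MIB = 896       # typical i386 lowmem with a 1 GiB kernel window


def mem_map_mib(ram_gib):
    """MiB of lowmem consumed by page descriptors for ram_gib of RAM."""
    pages = ram_gib * 1024**3 // PAGE_SIZE
    return pages * STRUCT_PAGE_BYTES // 1024**2


for ram in (4, 8, 16, 32, 64):
    used = mem_map_mib(ram)
    share = 100 * used / LOWMEM_3G1G_MIB
    print(f"{ram:>2} GiB RAM: {used:>3} MiB of mem_map "
          f"({share:.0f}% of {LOWMEM_3G1G_MIB} MiB lowmem)")
```

The 4G/4G model gives the kernel its own 4GiB address space, so the same array takes a much smaller share of the kernel's directly mapped memory.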

As of Red Hat Enterprise Linux (RHEL) 5, there is no separate "hugemem"
kernel, just a very "patched" PAE kernel with the 3G/1G model, plus a
very strong recommendation to use the x86-64/IA-32e releases (aka
AMD64/EM64T, respectively, or commonly "64-bit", which is really a
52-bit PAE memory model) instead if you have more than 4GiB.  They are
virtually mandatory for 16GiB.

I've personally dealt with a client that sent many patches to Red Hat
with 3G/1G fixes, some of which did more harm than good.  The fact that
much of the 3G/1G memory model wasn't addressed until later in 2.6 was
why Red Hat pushed hugemem with its 4G/4G model pretty hard.  Now it's
really just pushing x86-64/IA-32e and leaving the "best effort" 3G/1G
PAE kernel for stragglers who want legacy x86/IA-32.

2.  "Errata" on newer AMD and Intel processors

Intel's new microarchitecture has had a lot of issues.  Through early
2007, Intel was pretty open about them.  After getting beat up by a lot
of enthusiast sites, they are no longer publicly releasing their errata
information (unlike AMD, who is now far more transparent than Intel).
Unfortunately, we started getting a lot of feedback on the new G0
stepping Xeons (back when I was under NDA on this), especially in
multi-socket systems, including a great number of TLB errors.  Although
some newer kernel 2.6 releases offer hacks, BIOS and/or microcode
updates are still required.  Intel started releasing its microcode files
(although with no errata info) in 2007 as a result of OEM feedback,
while formal recognition and BIOS updates finally started hitting in
early 2008.

This is especially true for Intel's new 38-bit/256GiB physical
addressing processors (their legacy processors offer 36-bit/64GiB), and
given the massive shifts in how Intel maintains coherence in its
interconnect.  I.e., Intel is now "discovering" all of the issues AMD
started tackling back in 1999.  Keep your BIOS updated, and consider
loading the latest Intel microcode at boot under Linux.  Go to
http://developer.intel.com and search downloads for "Linux Processor
Microcode Data File" to find the latest (yes, Intel stopped making this
easy to find, and the urbanmyth.org site does _not_ have the latest,
which come out almost weekly now ;).
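The address-width figures in this thread are just powers of two. A small sketch (the function name is hypothetical) converting a physical address width to its maximum addressable memory with binary prefixes:

```python
"""Sketch: physical address width (bits) -> maximum addressable memory."""

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def capacity(bits):
    """Return 2**bits bytes as a (value, unit) pair using binary prefixes."""
    size = 2 ** bits
    unit = 0
    # Step up a prefix while the value divides evenly by 1024.
    while size >= 1024 and size % 1024 == 0 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return size, UNITS[unit]


# Widths mentioned in the thread: plain x86, PAE, Intel's newer parts,
# legacy AMD64, AMD Family 10h, and the "52-bit" x86-64 ceiling.
for bits in (32, 36, 38, 40, 48, 52):
    value, unit = capacity(bits)
    print(f"{bits}-bit: {value} {unit}")
```

For example, 256TiB works out to exactly 262,144GiB, since 2^48 / 2^30 = 2^18.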

AMD's new Family 10h processors, with 48-bit/256TiB physical addressing
(legacy AMD processors offer 40-bit/1TiB), have similar TLB issues,
especially in multi-socket systems with the B2 stepping.  Why AMD moved
to full 48-bit (256TiB, that's 262,144GiB!) physical addressing, I have
no idea, because 40-bit (1TiB = 1,024GiB) was enough.  It's not like
Intel, where 36-bit was only 64GiB and they had to create a "stop gap"
increase to 38-bit and 256GiB.  There is a BIOS workaround for Family
10h that carries a performance hit, so I wouldn't use it unless you
really run into the problem, especially on uni-socket systems.  In fact,
AMD purposely halted all LGA-1207 (multi-socket) B2 stepping shipments,
as that platform was susceptible, and kept the Socket AM2+ B2 going, as
the problem was rare on it.  If you're running an x86/IA-32 OS and not
an x86_64/IA-32e one (52-bit PAE addressing, also called AMD64/EM64T,
respectively), you shouldn't run into this with AMD Family 10h, as only
the legacy 32/36-bit TLB is used and it shouldn't have any issues.
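If you want to check which stepping you actually have before chasing BIOS updates, /proc/cpuinfo reports vendor_id, cpu family, model and stepping per processor. A sketch that parses it and flags entries from a small watch list follows. The watch-list mapping (AMD family 16/model 2/stepping 2 as B2, Intel family 6/model 15/stepping 11 as G0) and the helper names are assumptions, to be verified against the vendor's revision guide or specification update before acting on them.

```python
"""Sketch: flag CPU steppings of interest from /proc/cpuinfo-style text.

The WATCHLIST mapping is an ASSUMPTION for illustration; confirm it
against the vendor's revision guide / specification update.
"""

# Hypothetical lookup: (vendor_id, family, model, stepping) -> label
WATCHLIST = {
    ("AuthenticAMD", 16, 2, 2): "AMD Family 10h B2 (check BIOS workaround)",
    ("GenuineIntel", 6, 15, 11): "Intel G0 stepping (check BIOS/microcode)",
}


def parse_cpuinfo(text):
    """Yield one dict per processor block (blocks are blank-line separated)."""
    block = {}
    for line in text.splitlines():
        if not line.strip():
            if block:
                yield block
                block = {}
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            block[key.strip()] = value.strip()
    if block:
        yield block


def flag_steppings(text):
    """Return (processor, label) for each CPU that matches WATCHLIST."""
    hits = []
    for cpu in parse_cpuinfo(text):
        try:
            key = (cpu["vendor_id"], int(cpu["cpu family"]),
                   int(cpu["model"]), int(cpu["stepping"]))
        except (KeyError, ValueError):
            continue  # field missing or non-numeric on this architecture
        if key in WATCHLIST:
            hits.append((cpu.get("processor", "?"), WATCHLIST[key]))
    return hits


sample = """processor\t: 0
vendor_id\t: AuthenticAMD
cpu family\t: 16
model\t\t: 2
stepping\t: 2
"""
print(flag_steppings(sample))
```

On a real box, pass open("/proc/cpuinfo").read() instead of the sample string.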

NOTE:  Every processor has _hundreds_ of errata.  Unlike a 6-second
software recompile, it takes 6 weeks to get silicon back from the fab
when you find something wrong.  It's impossible to simulate everything.
Intel makes it easier to update with its "soft microcode" though.  AMD
is just as susceptible to bugs, and even more so with its NUMA, I/O MMU
and other hardware-based coherency approaches (although Intel is doing
more of that too now).

-- 
Bryan J  Smith              Professional, Technical Annoyance
mailto:b.j.smith at ieee.org  http://www.linkedin.com/in/bjsmith
-------------------------------------------------------------
           Fission Power:  An Inconvenient Solution


