[nylug-talk] Paper IT certs and disk drive fabrication differences -- WAS: Slim home server

Bryan J. Smith b.j.smith at ieee.org
Sat May 17 17:36:17 EDT 2008


Chris Knadle wrote:  
> I understand and I don't see anything wrong with that.

I wasn't asking for anyone's "approval," although I'm not saying you
were giving it either.  ;)

I was merely stating that it helps me "shut down" the most
trouble-making Microsoft advocates who try to label me as a "Linux
only" guy, even though I was an original NT 3.1 Beta Tester (at the
largest installed app of the very first native NT app).  God knows I
tire of the paper certs.  But as a consultant, I finally "gave in"
back in 2002, taking, and passing, 40 exams over two (2) three-month
periods.

> I don't have an MCSE personally.  It just doesn't interest me.  Some
> say it may be worth money if I had it, but money alone isn't enough
> of a motivator for me to go through the effort.

When times are tough, HR managers use all sorts of "filters."

So far, the most they can ever come up with is, "oh, we need a CS
major," which is pathetic because I have an EE with computer option
and CS minor.  I don't hold it over anyone, as there are always
others with more credentials.  Same deal on the certs.  Just because
I went on a massive "exam passing barrage" (with 0 training, 100%
costs of exams paid by myself) doesn't mean I know any more.

But it does tend to cause some of the biggest Microsoft bigots to
STFU.  Especially when I know more about Microsoft's own internal
operations, including some of Microsoft's own products they don't use
themselves.  Remember, I am a consultant selling myself against other
vendors and products, and have been doing so for a long time.  I've
delivered a crapload of Linux and open source solutions in doing so
(overwhelmingly Red Hat-centric).

> I'm not saying you're wrong,

I'm merely relaying information from both fabrication product
managers and their QA departments.  I'm just an engineer, and I read
engineering documentation.  It's not about "wrong," it's about
understanding actual fabrication and QA operations.

> but what I've read on the subject seems to disagree at least some.

Disagree with what?

I can't "make up" how the drives are designed or tested.  There are
literally two (2) types of fabrication techniques, and the more
commodity of those two (2) is now available in two classes.  I'm not
making this up.

> As far as I can tell in terms of longevity it doesn't matter (much)
> what kind of drive is used,

Please define "type"?  You haven't yet.  I will, again, though in
bullet form so everyone knows what the fabrication plants and vendor
QA processes use ...  ;)

1. Enterprise fabrication (more costly materials)
2. Commodity fabrication (least costly materials)
  A. Enterprise QA/rating (samples testing to highest tolerances)
  B. Consumer release (everything else)

Virtually no one makes a 7200rpm drive in #1 class of materials
anymore.  It's all 10-15Krpm now.  Using commodity materials and
fabrication techniques at those speeds would result in way too many
bad samples.  Heck, virtually all 15Krpm drives are 2.5" platters
now, because 3" has too high of a failure rate.

Drives of 10-15,000rpm spindle have greatly reduced density.  When
commodity was hitting 500-750GB disks, enterprise fab was just hitting
292GB disks.  Enterprise fabrication has been 9, 18, 36 and
73GB/platter, finally hitting 146GB/platter more recently.  Commodity
fabrication density has been 40, 80, 160GB/platter, now
320GB/platter, in the same timespan.
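
As a rough sanity check on the per-platter figures above, here is a
minimal Python sketch.  The names are illustrative only, and it assumes
drive capacity is simply platter count times per-platter density, which
ignores any formatting overhead.

```python
import math

# Per-platter densities (GB) as listed in the post, oldest to newest.
ENTERPRISE_GB_PER_PLATTER = [9, 18, 36, 73, 146]
COMMODITY_GB_PER_PLATTER = [40, 80, 160, 320]

def platters_needed(capacity_gb, gb_per_platter):
    """Smallest platter count that reaches capacity_gb.

    Assumption: capacity = platters * per-platter density.
    """
    return math.ceil(capacity_gb / gb_per_platter)

# Newest enterprise platter vs. the 292GB enterprise disk mentioned above.
enterprise_platters = platters_needed(292, ENTERPRISE_GB_PER_PLATTER[-1])

# How much denser the newest commodity platter is than the newest
# enterprise platter.
density_ratio = COMMODITY_GB_PER_PLATTER[-1] / ENTERPRISE_GB_PER_PLATTER[-1]

print(enterprise_platters)
print(round(density_ratio, 2))
```

With the latest figures quoted, a 292GB enterprise disk works out to
two 146GB platters, and the newest commodity platter holds a bit more
than twice what the newest enterprise platter does.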

Today, there is virtually no reason to fab enterprise unless you
absolutely need 10-15,000rpm.  It may not be long before 10,000rpm
becomes commodity in 3.5".  Of course, 7200rpm is starting to become
commodity in 2.5" as well.  And since 15,000rpm drives are 2.5"
platters, the new "datacenter standard" has become a 15Krpm 2.5"
drive.  The days of 3.5" drives are numbered.  ;)

> or what interface,

Interface means jack, I would never suggest otherwise.  In fact,
proliferation of that nonsense is part of the problem.

But because of the costs involved with "enterprise fabrication" at
10-15,000rpm, they typically have SAS or FC interfaces, and virtually
never ATA.  The sole exception I know of is Western Digital's
Raptor line, which is Hitachi's Deskstar 10,000rpm fabrication with
simple ATA logic, instead of standard SAS.  At least Hitachi Global
Storage Technologies were the ones fab'ing for WD last time I
checked; it could have changed.

Everything that is 7200rpm or slower today is commodity fabrication. 
So it's offered in every interface standard.

[ SIDE NOTE:  eSATA mechanical and electrical specifications,
however, are varying and non-standard, and I don't trust them one
bit.  I hope that changes in the future.  SAS, on the other hand,
received proper QA by major adopters.  It's really more of a matter
of the vendors involved, and the costs.  Every fly-by-night Chinese
fab is doing eSATA, whereas far fewer are doing SAS. ]

> and I would tend to doubt that drives marked "enterprise" or
> "near-line" would make a huge impact in longevity.

It just means the samples tested from a batch were ones that tested to
higher tolerances, and the drives may have different firmware loaded,
designed to work with managing storage controllers (for various
reasons).  It doesn't mean any given "near-line" or "RAID" or
"Enterprise" labeled 7200rpm commodity disk won't fail.  It just
means, statistically, there is less chance of it doing so.

Statistics is what this is all about, especially when it comes to
product management.  Hitachi, Seagate, etc... won't let HP, IBM,
etc... sell servers or RAID subsystems with consumer-rated drives, at
least they won't while still offering a warranty.  They will only let
HP, IBM, etc... and other resellers sell them with the commodity
fabbed units rated for 24x7 operation.

It was IBM, before they sold their unit to Hitachi, that came up with
their internal study on the 7200rpm Deskstar.  It's where the 50,000
restart, 8 hour/day operation = 400,000 hour MTBF came from on
commodity storage.  IBM further forbade sale of the Deskstar in units
that would have more than 14 hours of operation, and started
enforcing it (they can read SMART data ;).  IBM also showed that
commodity fabbed drives that test to higher vibration and thermal
tolerances (although this was before the commodity thermal operation
temp changed from 40 to 55C) could last as long as 1Mh MTBF.
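
To make the arithmetic behind those ratings concrete, here is a small
Python sketch.  It reads the 400,000 hour figure as 50,000 restarts
times 8 hours each, which is how the numbers above line up.  The
conversion from MTBF to a yearly failure probability assumes a constant
failure rate (an exponential model); that is a common simplification
and my assumption here, not something taken from the IBM study.  The
function names are illustrative.

```python
import math

def rated_hours(power_cycles, hours_per_cycle):
    """Total rated powered-on hours: cycles times hours per cycle."""
    return power_cycles * hours_per_cycle

def annual_failure_probability(mtbf_hours, powered_hours_per_year):
    """Chance of failure within one year of use.

    Assumption: constant failure rate (exponential model), so
    P(fail) = 1 - exp(-hours / MTBF).
    """
    return 1.0 - math.exp(-powered_hours_per_year / mtbf_hours)

# 50,000 restarts at 8 hours/day, as quoted for the commodity Deskstar.
desktop_mtbf = rated_hours(50_000, 8)

# Same drive used as rated (8 h/day) versus run around the clock.
afr_desktop = annual_failure_probability(desktop_mtbf, 8 * 365)
afr_desktop_24x7 = annual_failure_probability(desktop_mtbf, 24 * 365)

# Higher-tolerance commodity drive at 1Mh MTBF, run 24x7.
afr_server = annual_failure_probability(1_000_000, 24 * 365)

print(desktop_mtbf)
print(f"{afr_desktop:.2%} {afr_desktop_24x7:.2%} {afr_server:.2%}")
```

Under those assumptions, a drive rated for 8 hours/day but run 24x7
sees three times the powered-on hours per year, which is the kind of
gap the duty-cycle restriction is meant to close.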

> This is just one example of studies I've gleaned through:

There is no guarantee of reduced failure or not.  But, again, there
are clear differences in materials between the two fabrication
techniques, and clear differences in the QA done between the two
commodity classes.

> http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

Everyone has read the Google study.  It's largely not surprising.  It
only surprises those who don't know the first thing about today's
commodity fabrication.

Google is using many whitebox and consumer-grade drives, so the
results are no surprise.  Of the more limited tier-1 and tier-2
products they were using, though, it was still a bit surprising to
see they aren't reaching the 1Mh MTBF.  So the more recent attitude
these days is that everyone should be running with RAID-1 mirroring,
so much so that Dell and other Tier-1 OEMs offer to ship Fake
(software) RAID-1 on many models now.

In the 21st century, even commodity drives are designed to take up to
55C operating.  Many materials have become more commodity, enough for
reliable 7200rpm drives.  As such, ambient temperature is no longer
an issue, as even the Google study showed.  Of course, if you're
mashing 4+ drives with no airflow, they will exceed 55C -- something
even the Google study did not account for well.  They added that
vibration is a major issue as well.

The other thing from the Google study is the fact that virtually
_all_ of their drives under study were 7200rpm commodity.  Google is
not a shop where they care about longevity, because they are
concerned about per unit cost.  Everything they buy is commodity. 
That includes even their SCSI drives; they were almost all commodity
7200rpm.

> But again I have never used "Enterprise" nor "Near-line" drives as far
> as I know, so I only have second-hand knowledge of this from reading
> various studies.

Have you bought a tier-1 PC OEM server from HP, IBM, etc...?  If so,
and the spindle is 10,000rpm or higher, then you have an enterprise
fabbed drive with lower densities (smaller platters in the case of
15,000rpm).  If they are 7200rpm, then you're running with the
higher-rated, commodity fabbed drives.  Statistically the failure
rates will be lower over their lifespan.

Has nothing to do with interface, 100% agreed.  Again, not only do I
never debate that, but the problem is that anyone does.  But there
are clear fabrication differences (and costs) between 10Krpm and
higher and 7200rpm or lower.  And then there is a sampling difference
for commodity as well, including firmware changes to accommodate.

In reality, you must _always_ study the specifications _and_ research
the post-release reliability statistics of a _specific_ model line. 
E.g., Seagate Barracuda 7200.7 drives were quite reliable, but the
Maxtor fabbed Seagate Barracuda 7200.8/9 drives dipped, and quite
badly.  Seagate bought Maxtor outright, and improved fabrication
techniques on the Seagate Barracuda 7200.10 line.  The NL35 and
Barracuda ES QA'd versions of these drives will follow the
reliability of their base fabrication (i.e., the 8/9-based NL35's
suck, and the 10-based Barracuda ES' are improved).  Seagate's more
recent issues with the Barracuda 7200.11 match other issues in the
industry -- varying ATA implementations of NCQ and other interfaces.

Frankly, I'm tired of too many vendors implementing ATA specs
sub-standardly -- from the drive (IDE) to controller (bus arbitrator)
to OS driver (often implemented that way due to lack of disclosure).
But that's another issue that has nothing to do with drive fabrication
or its related costs.


-- 
Bryan J. Smith       Professional, Technical Annoyance
b.j.smith at ieee.org  http://www.linkedin.com/in/bjsmith
------------------------------------------------------
       Fission Power:  An Inconvenient Solution

