Reliability study tracks 25,000 hard drives

Way back in 2007, Google published a study on hard drive failure trends. The data revealed that failures typically occur very early in life or after several years of use. The study is a little dated, though, and so is similar research (PDF) conducted by Carnegie Mellon University. Fortunately, we have fresh data from online backup provider Backblaze, which has published failure statistics for 25,000 hard drives bought in the last five years.

According to this data, infant mortality is still a problem. The failure rate for the first three months of operation is higher than for any other quarter until after the three-year mark. Backblaze reports that 5.1% of its drives failed within the first 18 months, followed by only 1.4% for the following 18 months. After three years of use, the failure rate jumps to 11.8%.

Nearly 80% of the drives are still operational after four years. Backblaze doesn’t have data points beyond that, but the current trend suggests a median drive life of six years.
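
For a rough sense of how those numbers compound, the sketch below treats each of Backblaze's annualized rates as a constant hazard within its window; since the exact model behind the six-year median projection isn't given, this is an illustration rather than a reproduction of Backblaze's math.

[code]
# Cumulative survival from the annualized failure rates quoted above:
# 5.1%/yr for the first 18 months, 1.4%/yr for the next 18 months,
# and 11.8%/yr beyond the three-year mark.

def survival(years):
    """Fraction of drives still running after `years` of service."""
    windows = [(1.5, 0.051), (1.5, 0.014), (float("inf"), 0.118)]
    alive, remaining = 1.0, years
    for duration, afr in windows:
        span = min(remaining, duration)
        alive *= (1.0 - afr) ** span
        remaining -= span
        if remaining <= 0:
            break
    return alive

print(f"{survival(4.0):.1%}")  # ~79.8%, consistent with "nearly 80%" above
[/code]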

Interestingly, the bulk of Backblaze’s drives are consumer-grade models rather than enterprise variants with server-specific features and longer warranties. In fact, 8% of the firm’s 75PB storage capacity comes from “shucked” drives that began their lives in external enclosures. Backblaze doesn’t break down failure rates by drive type, but it promises to detail the differences between consumer- and enterprise-grade models in a future post. Since the company has “standardized” on consumer drives, it seems to be happy with their longevity versus the server-specific alternatives.

Even if enterprise-grade drives fail less frequently, the difference may not be large enough to justify the price premium. Pricing also appears to motivate Backblaze’s harvesting of external drives. The firm started shucking portable drives in response to the high prices and limited availability of internal drives that immediately followed 2011’s Thailand flooding. A recent blog post suggests the practice continues to this day, perhaps because portable drives are often cheaper to buy than equivalent internal products.

Although Backblaze has pledged to update its reliability statistics every quarter, it doesn’t look like we’ll get a manufacturer breakdown. I’d be very curious to see whether any makes or models are failing more often than others.

Comments closed
    • TwoEars
    • 7 years ago

    A single hard drive, no matter how reliable, will still never be reliable enough for critical data.

    I’ve been very happy with Synology Cloud Station. It’s pretty much plug and play and keeps four devices up to date with the same files, which means I essentially have three backup copies in case one should fail. It even keeps deleted files on the NAS server.

    • ronch
    • 7 years ago

    Well, if you’ve been tinkering with computers long enough, you should know this. With RAM, bad sticks usually die within 24 hours. After that, they’ll probably (yes, PROBABLY) last for the computer’s entire lifetime. With hard drives, it’s as Backblaze says it is. That’s why backup and redundancy schemes were invented — no data storage medium you can affordably buy off the shelf is really failsafe, as with most other things in life.

    • Krogoth
    • 7 years ago

    It is practically a miracle that failures don’t happen more often, given the fine tolerances that modern HDDs are built with.

    • GENiEBEN
    • 7 years ago

    Comes as no surprise; probably anyone who has worked with computers long enough has noticed this. It either dies quickly, or you wish it did so you could upgrade to newer stuff without feeling guilty.

      • PrincipalSkinner
      • 7 years ago

      I did not ever wish for my HDD to die nor will I ever!

      • indeego
      • 7 years ago

      You do realize guilt is an emotion one [b]chooses[/b] to have and it's not mandatory at all?

        • Krogoth
        • 7 years ago

        You can argue the same thing about every emotion.

        It is a learned behavior.

    • Rza79
    • 7 years ago

    I can tell from my line of work (hundreds of computers a year) that the quality of HDDs has gone up. The failure rate has decreased dramatically in the last two years (meaning drives built 3-4 years ago), especially for laptop drives. Up until 2-3 years ago, many laptops came in for repair with SMART errors, but lately that’s barely happening.

      • Flatland_Spider
      • 7 years ago

      Accelerometers probably have something to do with more durable laptop hard drives.

    • odizzido
    • 7 years ago

    I don’t understand why they never seem to release specifics on manufacturers. All this study tells me is that hard drives die, which I already knew. What I don’t know is which ones are the most reliable.

      • llisandro
      • 7 years ago

      FYI, way back in 2011 their blog said the 3TB Hitachi Deskstar 5K3000 HDS5C3030ALA630 was their pick for most reliable drive (it has a 3/5 rating on Newegg). I don’t think I’ve seen any newer info.

      http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

    • Star Brood
    • 7 years ago

    This doesn’t surprise me. You always see a rating of 4/5 stars in product reviews, even for the best HDDs.

    • derFunkenstein
    • 7 years ago

    What’s interesting to me is that if Backblaze is “shucking” drives, they’re “shucking” the warranty along with them. Most of the commercial external drives I’ve taken apart have warranty stickers inside that void the warranty when broken. So they’re losing out on the 10-ish percent of warranty RMAs that they could have had by buying internal drives in the first place.

      • Ravynmagi
      • 7 years ago

      I can’t remember if it was Seagate or Western Digital, but they had no problem with me doing a warranty claim on a shucked hard drive. I just registered the claim using the serial number on the hard drive, not the external enclosure. When it failed, I sent in the bare drive and they sent me a bare drive replacement.

        • Deanjo
        • 7 years ago

        I’ve done it on both WD and Hitachi without issues as well.

      • GENiEBEN
      • 7 years ago

      If someone buys 75PB of storage from you, will you void their warranty claims on such a silly basis?

        • Scrotos
        • 7 years ago

        Sure. Why not? An OEM like Dell or HP is most likely buying the stuff direct. Backblaze is going through distributors, and the money isn’t going directly to the drive manufacturer.

        If I were WD or Seagate, I’d want to keep the direct customers happier than someone piecemealing drives from Costco.

        • derFunkenstein
        • 7 years ago

        If they’re buying external drives and stripping the cases, yes, they’re probably not buying from the manufacturer.

      • Flatland_Spider
      • 7 years ago

      They might also be destroying the dead drives to keep data leaks from happening.

        • derFunkenstein
        • 7 years ago

        That’s probably a really good point.

        • PainIs4ThaWeak1
        • 7 years ago

        That wouldn’t necessarily indicate that they’re not utilizing any warranty benefits.

        The Department of Defense, and its subordinates, are only required to turn over the face plate of a failed drive in order to obtain an RMA replacement, and they have been doing so for at least a decade. (Ask me how I know.)

        The failed drive’s platters/PCB/casing are then degaussed or destroyed.

      • dextrous
      • 7 years ago

      They don’t RMA the dead drives anyway. It would probably cost them more in labor and shipping than just buying another pallet of them from Newegg.

      Here is a previous blog post about the lifecycle of a typical drive: http://blog.backblaze.com/2013/10/28/alas-poor-stephen-is-dead/

        • Chrispy_
        • 7 years ago

        Interesting tidbit from that blog post:

        [quote]One of the observations we’ve seen over the years is that if a Hitachi drive is going to fail during the load test, it will usually fail early and hard – it just dies. On the other hand if a Seagate drive is going to fail during a load test, it will usually fail later on in the test and often it fails soft, meaning it continues to operate but one or more of its SMART attributes are out of compliance.[/quote]

        It mirrors my own experience over the last 8 years of enterprise storage (we use consumer SATA for tier-3 storage), but it was only a hunch. Turns out that the hunch also applies to companies using 25,000 drives, not just 200 of them.

    • Wirko
    • 7 years ago

    Ah, those pesky facts rising up again. Both WD and Seagate are assuring me that the MTBF is ~100 years, and that’s enough for my peace of mind.

    It’s funny, however, that MTBF is given, which stands for mean time between failures – between two consecutive failures! – but isn’t the first failure of a hard drive usually fatal?
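
    Wirko's quip can be made concrete with a little arithmetic: a spec-sheet MTBF describes the average failure rate across a large fleet, not the life expectancy of a single drive. A sketch, using a hypothetical 1,000,000-hour MTBF (the ~114-year sort of figure being poked at here):

    [code]
    # Converting a spec-sheet MTBF to an implied annualized failure rate (AFR).
    # The 1,000,000-hour figure is illustrative, not from any vendor's datasheet.
    HOURS_PER_YEAR = 8766          # 365.25 days
    mtbf_hours = 1_000_000         # ~114 years "between failures"

    afr = HOURS_PER_YEAR / mtbf_hours
    print(f"Implied AFR: {afr:.2%}")   # ~0.88% per year

    # Compare with Backblaze's observed 5.1%/1.4%/11.8% figures: fleets in the
    # field fail far more often than the MTBF spec alone would suggest.
    [/code]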

    • sschaem
    • 7 years ago

    When I built my RAID 10 array (4 drives), I bought 5 disks, knowing the drives would fail sooner or later.

    And sure enough, after a year I got an automated email that drive 3 had failed.
    I put in the spare drive on the spot and sent the bad one in for a free replacement.

    I think this won’t be the last time… but I think next time it will happen out of warranty.

    I also expect that in 5 years SSDs will become affordable, and I can then build a brand new array on 2TB SSDs… until then I’m stuck using mechanical drives, waiting for my next email…

      • Airmantharp
      • 7 years ago

      Affordable in two years, I hope…

      I want a PCIe card to RAID PCIe SSDs… 4GB/s in both directions? Please :).

        • BIF
        • 7 years ago

        Oh sure, as long as they give me enough free space to make it worth my while. PCIe slots are limited, so I’ll need about… oh, 20 TB per slot. 😀

          • Airmantharp
          • 7 years ago

          If they can easily make 1TB PCIe SSDs today, 4TB per stick shouldn’t be too hard soon enough. Put eight of those on a PCIe card, and you have 28TB in RAID5 :).

    • FuturePastNow
    • 7 years ago

    If I recall correctly, that Google study also showed that one manufacturer had a significantly higher failure rate than others, although Google declined to name names.

      • ClickClick5
      • 7 years ago

      *cough* Seagate *cough*

      We almost have to order ten just to get one good drive: 3 will be DOA, 5 will fail within a month or so, 1 will fail in about a year, and the last one will go until the warranty expires.

        • MadManOriginal
        • 7 years ago

        I thought the 7200.11 drives’ reputation was behind Seagate now?

          • Bensam123
          • 7 years ago

          Much like many things, once you gain a reputation for something, it follows you around no matter what, and people apply it to everything. Pretty sure it’s just the 7200.11s, although Newegg reviews for a lot of newer drives (from all manufacturers) are pretty poor. I made a forum post about that.

      • Deanjo
      • 7 years ago

      IIRC, Fujitsu had the worst failure rate during that period. It wasn’t a case of if their hard drives would fail but how long before they failed. It got so bad that on an RMA, Fujitsu would send you a check instead of a replacement drive so that you could purchase another manufacturer’s drive.

    • albundy
    • 7 years ago

    Not sure why the failure rate would jump after 3 years of use. Were the drives constantly filled with data? Once a drive survives past a certain amount of time, I can’t see it failing out of the blue.

    • egon
    • 7 years ago

    I’m ever curious about that external vs. internal price difference. Here in Australia, it could be largely explained by the fact that the biggest retailers only sell externals, and their bargaining power at wholesale would keep prices low relative to internals, which are sold by comparatively small, specialist computer hardware stores.

    However, that theory doesn’t seem to hold for the US, where major retailers sell internal drives too, so if the same disparity exists there, it leaves me wondering why.

      • Kougar
      • 7 years ago

      No idea why, but major retailers have been offering the steepest discounts on externals for years. The control boards in the enclosures are so cheap that they often die before the drive itself does, so adding an enclosure to a drive probably costs them almost nothing.

      If I had to guess, it’s because externals, when they first became popular, only offered a 1-year warranty when 3-5 years was the standard for a basic off-the-shelf internal mechanical drive. So manufacturers saved there overall.

        • yevp
        • 7 years ago

        Yev from Backblaze here -> We think it might have to do with price elasticity. Consumers (who generally don’t build computers anymore and just order them from Dell or other manufacturers) typically buy externals to house extra data and don’t bother with internals, because it means they’d have to open their machine and possibly void the warranty. So external drives tend to cost less (at least in the States) because companies tend to buy internal storage for computers and servers, and they can “afford” to pay more. Though it’s not always the case. It was linked in the article, but take a look at http://blog.backblaze.com/2013/10/28/alas-poor-stephen-is-dead/ for some more info!

      • Aliasundercover
      • 7 years ago

      Market Segmentation

      Customers of external hard drives have more price sensitivity, or at least the vendors see them that way.

      • Wirko
      • 7 years ago

      But you don’t go to a small specialist store if you’re looking to buy a Volvoful (http://www.dansdata.com/gz105.htm) - or two - of hard disks at once.

    • NeelyCam
    • 7 years ago

    Puts the SSD reliability in perspective, doesn’t it…

      • Deanjo
      • 7 years ago

      You mean how SSDs have not proven to be any more reliable than mechanical drives?

        • yevp
        • 7 years ago

        Yev from Backblaze here -> We’re looking into those statistics as well 🙂

        • Bensam123
        • 7 years ago

        But it takes forever for them to reach the write maximum! They couldn’t possibly fail before then… XD

    • hiro_pro
    • 7 years ago

    [quote]Backblaze reports that 5.1% of its drives failed within the first 18 months[/quote]

    I was surprised to see it took up to 18 months for the early failures to show up. I would have thought there would be a steeper drop-off in failures after the first three months.

      • Chrispy_
      • 7 years ago

      Yeah, me too – but I guess infant mortality rates there are skewed by DOAs that never even make it into service.

      My guess is that the sharp levelling-off of the failure graph at 18 months is because accelerated wear caused by out-of-tolerance mechanical parts falls within the design threshold of things like the damping and bearing-balance systems, making it a closed, stable negative-feedback loop. If the flaw isn't too bad, the design copes with it completely.

      However, if a drive is even a tiny bit over that threshold of tolerance, it is pushed into a positive feedback cycle that starts it down a path to self-destruction, and the threshold between negative and positive feedback is such that the cutoff is 18 months of constant use.

      I dunno, I'm just guessing, but I'm an engineer so I love to hypothesize…

    • tipoo
    • 7 years ago

    [quote]Nearly 80% of the drives are still operational after four years.[/quote]

    That doesn't inspire much confidence, despite the positive framing: 20% of drives dead within what should be about an average PC's lifespan (perhaps less on average for laptops, more for desktops). Back up ALL THE THINGS, folks.

    I'd be interested in seeing this for SSDs (and we have The Tech Report working on that one 😛). People are so scared of the new technology, but aside from bad controllers early on, they seem like a better bet than HDDs nowadays.

      • nico1982
      • 7 years ago

      [quote]That doesn't inspire much confidence, despite the positive framing: 20% of drives dead within what should be about an average PC's lifespan (perhaps less on average for laptops, more for desktops).[/quote]

      I guess you should consider that laptop and desktop HDDs don't run 24/7.

        • Scrotos
        • 7 years ago

        Yeah, this is a business after all. I keep my computer on at home 24-7 but I know it’s not hitting the drive nearly as much as a cloud storage provider is gonna see.

        • tipoo
        • 7 years ago

        I’ve read that powering them on and off more often decreases life compared to 24/7 operation, rather than increasing it.

      • superjawes
      • 7 years ago

      All depends on usage. A drive being used 24/7 isn’t going to last as long as others, and laptop hard drives probably fail at much higher rates than desktop ones because users end up moving the machine while it’s operating (bad for mechanical drives).

      But yes, backing up ALL THE THINGS is a good idea no matter what the reliability is.

        • tipoo
        • 7 years ago

        I’m not sure that’s true, at least about the 24/7 being on thing. Higher activity may kill the actuator motor on the read head faster, but turning drives on and off is actually more stressful on the platter motor than leaving it on 24/7, for the same reasons accelerating to 60 in your car is more stressful on the engine than just staying at 60.

          • Stickmansam
          • 7 years ago

          I guess it depends on how often you're turning it on and off compared to the 24/7 activity. I would expect turning them on and off once a day to cause less stress than leaving them on 24/7.

          If you're turning them off multiple times a day, I can see that being worse than just leaving them on 24/7.

            • Entroper
            • 7 years ago

            [quote]If you're turning them off multiple times a day, I can see that being worse than just leaving them on 24/7[/quote]

            In other words, check your power saving options?

      • Deanjo
      • 7 years ago

      [quote]People are so scared of the new technology, but aside from bad controllers early on, they seem like a better bet than HDDs nowadays.[/quote]

      So far there is no data to say A is more reliable than B. The Tech Report's test is fine but hardly conclusive, given such a small sample size on a component that already has a relatively low rate of infant mortality. Because of the sample size of TR's test, you cannot really use it to assess failure rates.

        • Airmantharp
        • 7 years ago

        Other than the old mechanic’s advice ‘the fewer moving parts, the better,’ you mean?

          • Deanjo
          • 7 years ago

          So far that adage hasn’t proven true. There is a reason why those mechanics are retired now. Vehicles have more moving parts now than they ever did, and they are also more reliable than the simpler vehicles of years past.

          Simpler to repair != more reliable.

            • superjawes
            • 7 years ago

            [quote]Vehicles have more moving parts now than they ever did...[/quote]

            Is that true? Yes, some mechanical components have become more complex, but so too has the electrical system, which has inherited some of the responsibility of older mechanical components.

            And that "old mechanic's" advice does live on in perception. Even if the modern vehicle is more reliable, being able to repair it on the spot versus getting a tow to a dealership/certified mechanic sticks in the driver's mind...

            • Deanjo
            • 7 years ago

            [quote]Is that true?[/quote]

            Many, many more parts. The vehicles of yesteryear did not have items like 4 valves per cylinder, multiple cams, antilock brakes, 6-speed automatics, etc.

            [quote]And that "old mechanic's" advice does live on in perception. Even if the modern vehicle is more reliable, being able to repair it on the spot versus getting a tow to a dealership/certified mechanic sticks in the driver's mind...[/quote]

            You can only repair it on the spot if you have the facilities and equipment to do so. In the case of comparing it to a hard drive, 99.9999999% don't have such experience or facilities. You also had a lot more maintenance to do on the simpler vehicles. When a modern vehicle does have problems, more often than not they are electrical, not mechanical. Ask any mechanic what they do when a sensor goes off: the first thing they check is whether the sensor is actually working properly. 100,000 miles on an old vehicle used to mean it was soon time for the crusher; nowadays the more complex modern vehicles can go multiple times over that and still run fine.

      • Bauxite
      • 7 years ago

      Hard drives are consumables: have spares on hand, because they will die. It's been that way for ages to anyone who runs enterprise data storage, no surprise there.

      Backups are nice, but you really need something that recognizes that all hardware sucks (ZFS) if you care about integrity of the [b]live[/b] data; otherwise you're just archiving junk data. That backup from 6 months ago? Yeah, it's corrupt too...

      When is the last time you hashed your data? For most people reading this, probably never. At least once a week for me.
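
      For readers wondering what "hashing your data" looks like in practice, here is a minimal sketch of the kind of periodic integrity check Bauxite describes: record a digest of every file once, then re-verify on a schedule. The paths and manifest approach are illustrative, and this checks whole files rather than ZFS-style block-level checksums.

      [code]
      # Minimal periodic integrity check: build a SHA-256 manifest, re-verify later.
      import hashlib
      from pathlib import Path

      def digest(path, chunk_size=1 << 20):
          """SHA-256 of a file, read in 1MB chunks."""
          h = hashlib.sha256()
          with open(path, "rb") as f:
              for block in iter(lambda: f.read(chunk_size), b""):
                  h.update(block)
          return h.hexdigest()

      def build_manifest(root):
          """Map every file under `root` to its digest."""
          return {str(p): digest(p) for p in Path(root).rglob("*") if p.is_file()}

      def verify(manifest):
          """Return files that are missing or whose contents changed."""
          return [name for name, old in manifest.items()
                  if not Path(name).is_file() or digest(name) != old]

      # Usage: save build_manifest("/data") once (e.g. as JSON), then run verify()
      # weekly; hits mean silent corruption (or legitimate edits) since last pass.
      [/code]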

    • superjawes
    • 7 years ago

    [quote]Even if enterprise-grade drives fail less frequently, the difference may not be large enough to justify the price premium.[/quote]

    But said drives will continue to be popular in those circles because downtime is expensive. There will obviously be redundancy, but businesses would prefer to have peace of mind in the first place.

      • tipoo
      • 7 years ago

      Isn’t there also a difference in how enterprise vs. consumer drives handle failure? IIRC, enterprise drives will try to continue operating even while throwing errors, while consumer drives will just tell the system they’re dead? Something like that, I forget.

        • Grigory
        • 7 years ago

        Interesting if true!

        • Scrotos
        • 7 years ago

        You are correct. More information:

        http://www.smallnetbuilder.com/nas/nas-features/31202-should-you-use-tler-drives-in-your-raid-nas

        The SAS standard versus the SATA standard, basically. SAS pretty much assumes that drives will be used as part of a RAID and has some extra mojo to help the RAID controller manage the drives. If you put a regular SATA drive in there, it could either jack up the entire RAID waiting for an error timeout, or the controller will think the drive is bad and just drop it from the array.

        If memory serves, Backblaze does a JBOD thing with software RAID, so they probably don't care about this issue, since they can configure the software RAID however they need to. People running standard HP or IBM or Dell servers with dedicated RAID controllers care more about this kind of thing and thus pretty much need to stick to SAS drives.

        In theory, if you got a "cheap" drive that adhered to the SAS standard, go ahead and use that, but any "cheap" drive is always SATA, and SAS is always paired with the most expensive drives. I don't know what SAS gets you besides better error handling and, in some cases, dual path/dual port for redundancy.

          • Whispre
          • 7 years ago

          We can, and do, run SATA drives in a Dell hardware array… on a PERC H710P we run 12-disk RAID 5 arrays of 4TB SATA drives. These drives are hot-swappable and offer pre-failure notifications about 95% of the time.

            • Scrotos
            • 7 years ago

            Some of those NAS-specific drives, like a WD Red?

        • nanoflower
        • 7 years ago

        It’s more of a NAS thing, where the firmware is altered to control how long a drive will spend retrying a weak sector before giving up and marking it bad. On a consumer system it’s usually okay if the computer spends a long time waiting for the drive to come back, but on a RAID system you don’t want that happening, because the drive could be taken offline just for being busy retrying a weak spot and taking too long to mark it bad. So the NAS drive may only retry a weak spot a few times before marking it bad.
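
        On many drives that retry window is exposed as SCT Error Recovery Control (the setting behind TLER/ERC), and it can be queried with smartmontools. A sketch of checking it from Python; the device path is illustrative, timeouts are reported in tenths of a second, and plenty of consumer drives simply don't support the command:

        [code]
        # Query a drive's SCT Error Recovery Control (TLER/ERC) timeouts.
        import subprocess

        def read_erc(device="/dev/sda"):       # device path is an example
            result = subprocess.run(
                ["smartctl", "-l", "scterc", device],  # needs root + smartmontools
                capture_output=True, text=True,
            )
            print(result.stdout)

        # RAID-oriented firmware typically reports short timeouts (e.g. 70 = 7.0s);
        # desktop firmware often reports that SCT ERC is not supported.
        [/code]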

      • MadManOriginal
      • 7 years ago

      I suspect that this particular company has some redundancy… 75PB!

      • derFunkenstein
      • 7 years ago

      Eventually, when you’re at a large enough scale, you can afford to keep extra drives as hot-swappable spares so your arrays can just rebuild when drives die.

      • Flatland_Spider
      • 7 years ago

      To add to your comment, scale needs to be taken into account.

      If you have 25,000 disks, some serious redundancy can be built up, and dropping a drive, or ten, doesn’t matter.

      On the other hand, if you have five disks, dropping one does matter, and the more durable drives are the better option.
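
      To put rough numbers on the scale argument, here is a quick sketch using an illustrative blended 4% annual failure rate (not a figure from the article):

      [code]
      # Expected drive failures per year at fleet scale vs. home scale.
      AFR = 0.04                       # illustrative blended annual failure rate
      for fleet in (25_000, 5):
          per_year = fleet * AFR
          print(f"{fleet:>6} disks: ~{per_year:.1f} failures/year")

      # 25,000 disks: ~1000 failures/year, roughly 3 a day -> routine, plan for it.
      #      5 disks: ~0.2 failures/year -> rare, but each one hurts.
      [/code]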
