HDD study finds little correlation between temperatures and failure rates

Online backup provider Backblaze has over 100 petabytes of storage spread across more than 34,000 hard drives. The company has shared some interesting reliability data based on those drives, including manufacturer-specific failure rates. Today, it published new information examining the correlation between failure rates and drive temperatures. In short, there isn’t much of one—at least not with most of the drives in Backblaze’s environment.

Backblaze’s data center is loaded with storage pods that house 45 drives each. These pods are generally populated with the same kinds of drives, and the cooling appears to be sufficient. Average drive temperatures range from 22-31°C, which is well within the acceptable range for mechanical storage. As one might expect, 7,200-RPM models typically have higher temperatures than 5,400-RPM ones.

For the most part, drive temperatures are unrelated to failure rates. Backblaze only found statistically significant correlations with four of 19 drive models. Among those, the worst offender was easily Seagate’s Barracuda LP 1.5TB:

Source: Backblaze

Barracuda LP drives with temperatures below the average for that model failed at a rate of 15.6% annually, while those above the average had a 34.6% failure rate. Even this correlation was deemed to be weak, though. "There’s a lot of overlap between the temperatures of the failed drives and the temperatures of the working drives, so you can’t predict for sure which drives will fail," Backblaze says.
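
To see how warmer drives can fail roughly twice as often while the overall correlation stays weak, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the temperatures are synthetic (a bell curve around 25°C), the article's 15.6% and 34.6% annual rates are reused as simple per-drive failure probabilities, and the helper names `point_biserial` and `simulate` are hypothetical. The article does not say which statistic Backblaze used; the sketch uses the Pearson correlation between temperature and a 0/1 "failed" flag, which is one common choice.

```python
import random
import statistics


def point_biserial(temps, failed):
    """Pearson correlation between a numeric list and a 0/1 list."""
    mean_t = statistics.fmean(temps)
    mean_f = statistics.fmean(failed)
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(temps, failed)) / len(temps)
    return cov / (statistics.pstdev(temps) * statistics.pstdev(failed))


def simulate(n_drives=20000, seed=1):
    """Generate synthetic (temperature, failed) pairs; NOT Backblaze data."""
    rng = random.Random(seed)
    temps, failed = [], []
    for _ in range(n_drives):
        t = rng.gauss(25.0, 2.0)  # assumed temperature spread in °C
        # Assumed failure probabilities, borrowed from the article's two rates.
        p_fail = 0.346 if t > 25.0 else 0.156
        temps.append(t)
        failed.append(1 if rng.random() < p_fail else 0)
    return temps, failed


temps, failed = simulate()
mean_temp = statistics.fmean(temps)
cool = [f for t, f in zip(temps, failed) if t <= mean_temp]
warm = [f for t, f in zip(temps, failed) if t > mean_temp]
r = point_biserial(temps, failed)

print(f"failed fraction, below-average temperature: {statistics.fmean(cool):.3f}")
print(f"failed fraction, above-average temperature: {statistics.fmean(warm):.3f}")
print(f"correlation between temperature and failure: {r:.3f}")
```

Under these assumptions the warmer half fails far more often, yet the correlation coefficient stays small, because most warm drives still survive and plenty of cool drives still fail. That is the overlap Backblaze describes.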

Backblaze also found a weak correlation between temperature and failure rates for the Barracuda LP 1.5TB’s 7,200-RPM sibling. The remaining two drives, the Barracuda 3TB and Hitachi Deskstar 7K2000, exhibited "very weak" correlations. In fact, the cooler Deskstar drives failed slightly more frequently than the warmer ones. Statistics nerds can consult the Backblaze blog post for a model-by-model breakdown of the data.

Backblaze’s drives run an atypical workload in well-cooled servers, so the results aren’t necessarily indicative of what folks might see in desktop systems. The WD Black hard drives in my main PC usually run around 42°C, which is hotter than the warmest drives in this study. I’m still encouraged by Backblaze’s results, though, and I suspect server admins will be particularly interested.

Comments closed
    • entropy13
    • 5 years ago

    [quote]The WD Black hard drives in my main PC usually run around 42°C[/quote] My Seagate 7200.12 500GB and Seagate SV35.5 1TB currently run at 38°C. If it's in the afternoon, it could reach 48°C.

    • sli
    • 5 years ago

    Oh another Backblaze report? Let me hit that back button.

    • cynan
    • 5 years ago

    Those 7200 RPM Hitachi 7k2000s really are amazing. The hotter they’re run, the more reliable they become!

    Thanks for continuing to rub it in that we can no longer buy them.

    • Waco
    • 5 years ago

    Anyone else cringe at this?

    Model                      Correlation  Significant?  p-value  # dead  # alive  Avg. Age
    Seagate Barracuda 7200.14  0.03         YES           0.02     638     4031     1.4

    That’s 638 drives dead out of 4669…nearly 14%. Damn.

    • albundy
    • 5 years ago

    must be seagate’s poor build quality. i’m running out of options here.

    • just brew it!
    • 5 years ago

    [quote]Backblaze's drives run an atypical workload in well-cooled servers, so the results aren't necessarily indicative of what folks might see in desktop systems.[/quote] This. All of my drives run significantly warmer than 28C. Unless you're running in a climate-controlled server room yours probably are too. If there's a knee in the temperature vs. failure rate curve somewhere in the 28C to 60C range this study would not have found it.

      • continuum
      • 5 years ago

      Agreed. Most drives are spec’ed to 50C or 55C or somewhere up there, generally they’re okay– it’s when you exceed the maximum designed limit that things tend to drop like flies…

      At 28C or even 40C I would not expect to see a significant difference, and that’s exactly what this confirms. Confirms within the specific use case and setups that Backblaze has, of course– much like the earlier Google report this data is best taken with an asterisk noting the very specific conditions and use case.

    • HisDivineOrder
    • 5 years ago

    You know what we need?

    Every company everywhere that runs tons of drives like this putting their data together into an overall study of all drives in corporate environments.

    This would be the single biggest force compelling these hard drive companies to improve the quality of their product. Right now, they can “fly under the radar” with lots of junk drives and most won’t notice it very much. They’ll think of those failed drives as just “acceptable losses” but if even the “acceptable losses” were seen for the obvious trends they are, suddenly companies that usually let those losses go by without worrying will start worrying and start improving their drives to fix it. They won’t want to be last in the overall metric even if it’s well within what they previously considered “acceptable.”

    Even if mostly about drives built for specialized environments, said improvements would then trickle down into even consumer variants, too.

    That said, inter-corporate cooperation would have to be higher plus they’d be risking alienating drive suppliers for a while. Too bad Backblaze, Crashplan, Carbonite, Google, Amazon, Facebook, Microsoft, and Apple couldn’t get together to build a database of all the drives they have and how they fare.

    That’d be some impressive amounts of data on hard drive reliability in a corporate environment.

      • willmore
      • 5 years ago

      Make them like the RR engines used in airliners, which have a dedicated telemetry feed back to RR, where they monitor it all and can make meaningful inferences and design things better for the next version.

    • TwoEars
    • 5 years ago

    Testing at 28 degrees max is pointless.

    Try 65 degrees max.

      • SomeOtherGeek
      • 5 years ago

      Yea, especially when

      [quote]Average drive temperatures range from 22-31°C...[/quote] and the graph stops at 28. It obviously showed that drives were failing at higher temps, they should have just kept going and there might have been higher failure rates. Whatever, just sounds like poor research to me.

        • Farting Bob
        • 5 years ago

        They weren’t researching it; it’s just what they found in their data centres during normal operation, and they decided to publish the results. They didn’t buy 34,000 HDDs just to test when they fail.

          • SomeOtherGeek
          • 5 years ago

          Yea, I understand. Maybe we all should post results and share it all over the web. It is almost like a selfie.

          • TwoEars
          • 5 years ago

          I would have snuck in there and unplugged some fans…

          FOR GREAT SCIENCE!! 😀

      • oldDummy
      • 5 years ago

      This isn’t a test.
      It’s empirical data which was recorded.

    • Krogoth
    • 5 years ago

    Interesting data, but it would be more meaningful if the data were about the difference in ambient temperature.

    I also suspect that there’s a stronger correlation between temperature and failure rate when HDDs go beyond 50C (the data doesn’t include this), but vibrations/torque make a far larger impact on failure rate than temperature.

    • Chrispy_
    • 5 years ago

    If there are 45 drives of the same brand/model all doing the same job, it stands to reason that the hottest ones are the lowest-quality ones with imbalances or out-of-tolerance bearings.

    I also find it amusing that there are a load of ads for Seagate drives underneath an article showing that Seagates have a statistically undesirable failure rate.

      • Saribro
      • 5 years ago

      [quote]If there are 45 drives of the same brand/model all doing the same job, it stands to reason that the hottest ones are the lowest-quality ones with imbalances or out-of-tolerance bearings.[/quote] Or a fan in that part of the rack moves a little less air for whatever reason, or the rack is farther away from an airco vent, or the drive is installed in a different area of the enclosure that receives less airflow, or the drive was installed a little dodgy and makes worse thermal contact with the casing, or.....

        • Deanjo
        • 5 years ago

        Bingo, or the drive is under heavier usage.

          • Chrispy_
          • 5 years ago

          They’ll be in a striped RAID set all with identical workloads.
          The airflow differences in the drive shelf could have an impact though….

            • Deanjo
            • 5 years ago

            Each of their “storage pods” consisting of 45 drives has 3 arrays (15 in each array running RAID 6). They also spread the physical location of the drives across backplanes. Loads can vary from array to array, so not all drives see identical loads.

      • superjawes
      • 5 years ago

      [quote]I also find it amusing that there are a load of ads for Seagate drives underneath an article showing that Seagates have a statistically undesirable failure rate.[/quote] Yeah, that is quite amusing. Although I would kind of like that ad section moved. I know it's supposed to flash the "Support TR!" one every so often, but sometimes it feels like the ad is part of the article.

        • NeelyCam
        • 5 years ago

        It’s a premium location for possibly a premium price.

          • Chrispy_
          • 5 years ago

          It’s a shame they’re not localised though – prices need to be in € or £ for me to pay any attention.

    • iatacs19
    • 5 years ago

    16-28C is a very low operating temperature range. Most drives will be operating at a much higher temperature in normal non-enterprise applications/use. Even the maximum 38C the report is showing is rather low… In my 5 bay actively cooled NAS my Samsung 2TB 5400rpm drives run around 34C.

      • stdRaichu
      • 5 years ago

      I think that’s probably due to the fact that their drives are probably running in a data centre that’s kept around 15-20°C – and most drives will generally run about 15°C over ambient. Outside of a data centre or the (ant)arctic circles I think it would generally be impossible to keep a drive at below 20°C, and even if you are at the poles you’ll continually have polar bears or penguins sitting on your RAID array for warmth.

      • SomeOtherGeek
      • 5 years ago

      But even in most enterprise settings there is no data-center thingy, just a box with drives to store data. And a lot of small businesses don’t have the setup to cool these things down.

      I remember working in places where they would throw the “data-center” into a small room, put in an AC unit and close the door. Even then, the room was blasting hot.

    • The Egg
    • 5 years ago

    I’d be interested to see a comparison of 5400rpm vs 7200rpm failure rates when both drives are enterprise-class and have the same build quality. Most of your lower-rpm drives are “budget” drives, but I wonder if having a lower-rpm would make them more reliable — everything else being equal.

      • stdRaichu
      • 5 years ago

      I think the problem with that is that it’s impossible to know whether everything else is equal unless you’re actually the manufacturer. Even in the consumer line, there’s significant differences between the 7200 and 5400rpm models other than just the spindle speed.

      Extend that across all the possible combinations of being enterprisey, controllers, memory chips, soldering plant and the phase of the moon when the drive was built and such a test becomes impractical – there are too many variables in play. Best to just stick to known good/bad families – for example, I forget the name of the product family, but I have a friend who works in data forensics who continually dreads seeing a range of Maxtor drives that could be guaranteed to die of sudden controller failure if they ever got to about 50°C – since the only fix they can try is swapping in a “new” (read: taken from a working drive bought off eBay) and similarly unreliable controller and hoping that it works.

      Personally, IME the failure rates between “enterprise” and “non-enterprise” versions of similar drive families are near enough as to make no odds, there are just good and bad “families” from the various manufacturers. Sure, the 10k or 15k drive might be mechanically more robust, but they’ll often share the same controller technology and a bunch of other ICs which may prove to be the weak link instead. Seagate have had a bad run of those lately but every HDD manufacturer has a few skeletons in the closet in that regard.

      • Corrado
      • 5 years ago

      Backblaze has already stated that they see zero difference in failure rates of enterprise drives vs consumer drives. The only difference is the length of warranty.

    • not@home
    • 5 years ago

    I would not think drive temperature would make that much difference as long as it stays within the manufacturer’s recommended range. I think many large heating and cooling cycles are harmful.

    • stdRaichu
    • 5 years ago

    Correlates with my own observations, and I think the Google study from 2007 said the same thing. Whilst I generally try and keep my discs below 40°C, I’ve still got a bunch of ten or so 2TB drives from various manufacturers; to test an HBA whilst between enclosures (I’d been getting lots of controller hangs and dropouts but wanted to eliminate the SATA bay because that was what eventually turned out to be the culprit), I stacked all these drives on top of one another with some corrugated cardboard in between to ease any vibration, plugged them in and powered up my testbed. 10mins of testing across all the ports and great, all done, drives still not gone over 40, time to shut down.

    Except I forgot to shut down the testbed and, several hours later, was horrified to feel all the heat coming off the piles. A quick check of SMART showed that disc temperature was between 70 and 80°C and the drives were too hot to touch without gloves.

    All of those drives are still running fine over three years later.

      • ozymandias
      • 5 years ago

      something that concerns me about this study is that the lowest failure rate of this seagate drive was still 15%. This looks quite high to me?

        • Visigoth
        • 5 years ago

        The 1.5 TB Seagates are known POS drives. Horrible reliability all-around.

    • geekl33tgamer
    • 5 years ago

    Did I miss something? Taking the sample of drives, and those temps vs. failure rates – There’s a definite failure rate of about 1 in 5 when temps exceed 25C or more. Hmmm…

      • dmjifn
      • 5 years ago

      The chart is from one of the 4 specific models that *do* have a correlation. So it’s an exception to what’s called out by the article’s title.
