
Postponing the pain of drive failure
If a single hard drive that is not part of a RAID array fails, the data on that drive is almost certainly lost. Backups can save the day by restoring your data as of the previous day or week, but recent changes will likely still be lost. For many people and organizations, the loss of any data is unacceptable.

The term Mean Time To Failure (MTTF) is used to describe the failure rate of components; hard drive manufacturers use the term Mean Time Between Failures (MTBF), but for our purposes, the terms are basically synonymous. Typical MTBF/MTTF rates for hard drives can vary from 100,000 hours all the way up to 1,200,000 hours.
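For perspective, those figures correspond to a surprisingly long stretch of continuous operation:

100,000 h ÷ 8,760 h/year ≈ 11.4 years        1,200,000 h ÷ 8,760 h/year ≈ 137 years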

Of course, MTTF is more complicated than simply saying "Wow, my new drive should last me 100,000 hours!" For a good explanation of how MTTF relates to the real world, check out the explanation at PCGuide.com. Regardless, a complete understanding of the relationship between MTTF and reliability isn't necessary at this point. We're more concerned with the profound effect (in both directions) that RAID can have on the chance of data loss due to hardware failure.

One other important note: a high MTTF is hardly a substitute for data redundancy. Just ask our own Dr. Evil, who had a nine-month-old hard drive shoot craps on him a couple of months ago, with a total loss of its data. Fortunately, the diabolical data was saved thanks to a DAT drive, which he has since affectionately named Mr. Bigglesworth.

Obviously, the MTTF of a single drive is going to be a given number for any particular drive. Determining the MTTF for a RAID array consisting of several of those drives, however, is more complicated. There are a couple of formulas, which the gracious Dr. Evil dug up from this publication, for calculating the MTTF of a RAID array from the MTTF of its member drives.

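Assuming drive failures are independent of one another, the standard forms of those formulas (which also match the numbers worked out below) are:

MTTF(single drive) = MTTF(disk)

MTTF(RAID 0) = MTTF(disk) / N

MTTF(RAID 1) = MTTF(disk)² / (N × MTTR)

where N is the number of drives in the array.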

As the formulas show, for single-drive configurations the MTTF is simply the MTTF of the disk, a specification provided by the manufacturer. MTTR refers to the Mean Time To Repair, a measure of how quickly you can replace a failed drive. Now, let's take a look at how the MTTFs stack up. For the sake of simplicity, a disk MTTF of 100,000 hours and an MTTR of 2 days (48 hours) were used in the calculations.
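Here's a minimal sketch of those calculations in Python, using the formulas and figures above and assuming independent drive failures (the function names are just for illustration):

# Sketch: MTTF of RAID 0 and RAID 1 arrays versus drive count,
# using the formulas above and assuming independent drive failures.

DISK_MTTF_HOURS = 100_000  # per-drive MTTF, as assumed above
MTTR_HOURS = 48            # assumed 2-day repair/replacement time

def mttf_raid0(n_drives: int) -> float:
    """RAID 0: the array is lost as soon as any one drive fails."""
    return DISK_MTTF_HOURS / n_drives

def mttf_raid1(n_drives: int) -> float:
    """RAID 1: data is lost only if a drive's mirror partner also
    fails before the first failed drive can be replaced."""
    return DISK_MTTF_HOURS ** 2 / (n_drives * MTTR_HOURS)

for n in (2, 4, 6, 8):
    print(f"{n} drives: RAID 0 = {mttf_raid0(n):>10,.0f} h   "
          f"RAID 1 = {mttf_raid1(n):>14,.0f} h")

With just two drives, for example, RAID 0's MTTF is already down to 50,000 hours, while RAID 1's climbs past 100 million.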


As the number of drives increases, the MTTF drops precipitously for RAID 0 arrays. Remember that drive failure on a RAID 0 array is an all-for-one deal: lose one drive, and kiss all your data goodbye. All that extra storage space is great, but you're really leaving yourself open to data loss as your drive count goes up. What about RAID 1?


The difference here is simply massive. RAID 1 offers tremendous protection against data loss. Essentially, the formula measures the chance that one drive will fail and that its mirror will then fail before the first can be replaced. Even if you assume a two-week wait to replace the failed drive, a RAID 1 array's MTTF is still over 148 times that of a single drive.
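Plugging that two-week figure into the RAID 1 formula above (two drives, MTTR of 336 hours) bears the claim out:

MTTF(RAID 1) = 100,000² / (2 × 336) ≈ 14,880,952 hours ≈ 148.8 × the single-drive MTTF of 100,000 hours.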

Of course, RAID 1 isn't complete salvation. Remember that it only protects you against data loss from a drive failure—you'll still be susceptible to inadvertent deletions, data corruption, and other factors (such as total machine loss due to lightning or viruses) that would affect both drives simultaneously.