We have built an infrastructure that collects vital information about all Google's systems every few minutes, and a repository that stores these data in timeseries format (essentially forever) for further analysis. The information collected includes environmental factors (such as temperatures), activity levels and many of the Self-Monitoring Analysis and Reporting Technology (SMART) parameters that are believed to be good indicators of disk drive health. We mine through these data and attempt to find evidence that corroborates or contradicts many of the commonly held beliefs about how various factors can affect disk drive lifetime.
The results are surprising. For instance, Google's data suggest that high drive temperatures and high utilization don't necessarily translate to higher failure rates. The data also suggest that the highest failure rates occur in drives that are three years old.
Our paper is unique in that it is based on data from a disk population size that is typically only available from vendor warranty databases, but has the depth of deployment visibility and detailed lifetime follow-up that only an end-user study can provide.
Disappointingly, Google omits to mention what might be the most important piece of information of all: which manufacturers have the most failure-prone drives. Perhaps the search giant doesn't want a lawsuit on its hands, or perhaps it doesn't want to risk compromising any juicy discounts it might receive from hard drive makers. Nevertheless, Google claims differences in failure rates between drive models or brands are not significant. "In contrast to age-related results, we note that all results shown in the rest of the paper are not affected significantly by the population mix," the paper says.