
The SSD Endurance Experiment: Testing data retention at 300TB

A new wrinkle on the long road
— 10:02 PM on November 25, 2013

Solid-state drives are everywhere, and we shouldn't be surprised. SSDs have long been much faster than mechanical hard drives, and the difference is striking enough for even casual users to perceive. The major holdup was pricing, which has become much more reasonable in recent years. Most modern SSDs slip under the arbitrary dollar-per-gigabyte threshold, and many good ones can be had for 70 cents per gig or less.

Higher bit densities are largely responsible for driving down SSD prices. As flash manufacturers transition to finer fabrication techniques, they're able to cram more gigabytes onto each silicon wafer. This lowers the per-gig cost for SSD makers, but it also has an undesirable side effect. The higher the bit density, the lower the endurance. The very process that's making SSDs more affordable is also shortening their life spans.

All flash memory is living on borrowed time. Writing data breaks down the physical structure of individual NAND cells until they're no longer viable and have to be retired. SSDs have overprovisioned "spare area" to stand in for failed flash, but that runs out eventually, and then what? More importantly, how many writes can current drives take before they fail?

Seeking answers to those questions, we started our SSD Endurance Experiment. This long-term test is in the midst of hammering six SSDs with an unrelenting stream of writes. We won't stop until all the drives are dead, but we're pausing at regular intervals to monitor health and performance. Our subjects have now reached the 300TB mark, so it's time for another check-up—and a new wrinkle. We've added an unpowered retention test to see if the drives can hold data when left unplugged for a few days.

If you're unfamiliar with our experiment, I suggest reading our introductory article on the subject. It outlines the specifics of our setup and subjects in far more detail than I'll indulge here.

The basics are pretty simple. Our subjects include five different models: Corsair's Neutron GTX 240GB, Intel's 335 Series 240GB, Kingston's HyperX 3K 240GB, and Samsung's 840 Series 250GB and 840 Pro 256GB. Anvil's Storage Utilities software provides the endurance test, which writes a series of incompressible files to each drive. We're also testing a second HyperX SSD with the software's 46% incompressible "applications" setting to gauge the impact of the write compression tech built into SandForce controllers.
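What separates the two HyperX configurations is how much redundancy a compressing controller can squeeze out of the data. The sketch below uses Python's zlib as a stand-in for SandForce's proprietary compression. It also assumes that "46% incompressible" means roughly 46% of each buffer is random data and the rest is trivially compressible. Anvil's actual data generator isn't documented here, so the sketch only illustrates the idea and won't reproduce SandForce's numbers.

```python
import os
import zlib

def make_buffer(size: int, incompressible_fraction: float) -> bytes:
    """Random bytes (incompressible) followed by zeros (highly compressible).

    Assumption: this mix only approximates Anvil's "applications" setting.
    """
    random_len = int(size * incompressible_fraction)
    return os.urandom(random_len) + bytes(size - random_len)

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib as a stand-in compressor)."""
    return len(zlib.compress(data)) / len(data)

SIZE = 1 << 20  # 1MiB test buffer
incompressible = make_buffer(SIZE, 1.0)
applications_like = make_buffer(SIZE, 0.46)

print(f"fully random:   {compressed_ratio(incompressible):.2f}")
print(f"46% random mix: {compressed_ratio(applications_like):.2f}")
```

Random data leaves zlib nothing to remove, so its ratio stays at about 1.0, while the mixed buffer shrinks to roughly its random portion.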

With the exception of the Samsung 840 Series, all of the SSDs have MLC flash with two bits per cell. The 840 Series has TLC NAND, which delivers a 50% boost in storage density by packing an extra bit into each cell. The extra bit makes verifying the contents of the cell more difficult, especially as write cycling takes its toll. That's why TLC flash typically has lower endurance than its MLC counterpart.
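Both the 50% density boost and the verification problem fall out of simple arithmetic. The sketch below assumes an idealized cell in which every charge level shares one fixed voltage window with equal spacing. Real NAND is messier, so treat the spacing figures as illustrative only. The point is that each extra bit doubles the number of levels to distinguish and shrinks the margin between neighbors.

```python
# Simplified model (assumption): all cells share one fixed threshold-voltage
# window, and the charge levels are spaced evenly across it.
def cell_stats(bits_per_cell: int) -> dict:
    states = 2 ** bits_per_cell   # distinct charge levels the controller must tell apart
    gaps = states - 1             # spaces between neighboring levels
    return {
        "bits": bits_per_cell,
        "states": states,
        "level_spacing": 1 / gaps,  # fraction of the window separating adjacent levels
    }

mlc = cell_stats(2)
tlc = cell_stats(3)

# Extra storage from the third bit, for the same number of cells
density_gain = tlc["bits"] / mlc["bits"] - 1

print(f"MLC: {mlc['states']} states, spacing {mlc['level_spacing']:.3f} of window")
print(f"TLC: {tlc['states']} states, spacing {tlc['level_spacing']:.3f} of window")
print(f"TLC density gain over MLC: {density_gain:.0%}")
```

Under this model the gap between adjacent TLC levels is less than half the MLC gap. That leaves less room for the voltage drift that comes with write cycling.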

We expected the 840 Series to be the first to show weakness, and that's exactly what happened. After 100TB of writes, we noticed the first evidence of flash failures in the drive's SMART attributes. The attribute covering reallocated sectors tallies the number of flash blocks that have been retired and replaced by reserves in the overprovisioned area. There were only a few reallocated sectors at first, but the number grew dramatically on the way to 200TB, and the pace quickened on the path to 300TB.
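The retire-and-replace bookkeeping behind that counter can be sketched as a toy model. Everything here is hypothetical: the `ToySpareArea` class, the block IDs, and the tiny three-block reserve are invented for illustration. Real controllers manage this in firmware with far more detail. What a real drive does once the reserve is gone is exactly what this experiment aims to find out.

```python
class ToySpareArea:
    """Hypothetical model: retired blocks are remapped to spares until none remain."""

    def __init__(self, spare_blocks: int):
        self.spares = list(range(spare_blocks))  # IDs of unused reserve blocks
        self.remap = {}                          # retired block -> spare replacing it
        self.reallocated = 0                     # what a SMART counter would report

    def retire(self, block: int) -> int:
        """Swap a failed block for a spare; fail once the reserve is empty."""
        if not self.spares:
            raise RuntimeError("spare area exhausted")
        replacement = self.spares.pop()
        self.remap[block] = replacement
        self.reallocated += 1
        return replacement

drive = ToySpareArea(spare_blocks=3)
for bad_block in (10, 42, 77):
    drive.retire(bad_block)
print(drive.reallocated)  # 3

try:
    drive.retire(99)      # no spares left
except RuntimeError as err:
    print(err)            # spare area exhausted
```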

At our most recent milestone, the 840 Series reports 833 reallocated sectors. Samsung remains tight-lipped about the size of each sector, but if AnandTech's 1.5MB estimate is accurate, our drive has used 1.2GB of its spare area to replace retired sectors. The 840 Series still has lots of overprovisioned flash in reserve, and it still offers the same user-accessible capacity as it did fresh out of the box. That said, its flash is clearly degrading at a much higher rate than the MLC NAND in the other SSDs—no surprise there.
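The 1.2GB figure is just the sector count multiplied by the estimated sector size. The sketch below makes that explicit. The 1.5MB value is AnandTech's estimate rather than a Samsung spec, and it isn't stated whether that means decimal or binary megabytes. Both conversions are shown, and either way the result rounds to 1.2GB.

```python
REALLOCATED_SECTORS = 833  # 840 Series SMART count at the 300TB mark
SECTOR_SIZE_MB = 1.5       # AnandTech's estimate; Samsung hasn't confirmed it

used_mb = REALLOCATED_SECTORS * SECTOR_SIZE_MB
used_gb_decimal = used_mb / 1000  # if the estimate uses 1GB = 1000MB
used_gib = used_mb / 1024         # if the estimate uses binary units (1GiB = 1024MiB)

print(f"{used_mb:.1f}MB of spare area, about {used_gb_decimal:.2f}GB or {used_gib:.2f}GiB")
```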

Only two other SSDs have registered reallocated sectors thus far. The HyperX drive we're testing with incompressible data reported four reallocated sectors after 200TB of writes, and that number hasn't changed since. It has now been joined by the Intel 335 Series, which reports one reallocated sector.

The Corsair Neutron GTX and the HyperX drive with compressible data are the only two that remain free of bad blocks after 300TB. Of course, the compressible HyperX has written only 215TB to its flash, thanks to its compression mojo.
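Comparing host writes with flash writes shows how much the SandForce compression is saving. Both figures below come from the text. Their ratio is commonly called write amplification. A value below 1.0 is only possible because the controller compresses data before committing it to NAND.

```python
host_writes_tb = 300   # data sent to the compressible HyperX by the endurance test
flash_writes_tb = 215  # data that actually landed on its NAND

ratio = flash_writes_tb / host_writes_tb  # below 1.0 means compression is saving writes
saved = 1 - ratio                         # fraction of host writes the flash never saw

print(f"flash/host ratio: {ratio:.3f}")
print(f"writes avoided:   {saved:.1%}")
```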

In addition to tracking reallocated sectors, we're monitoring each drive's health using the included utility software. Samsung's SSD Magician app reports that the 840 Series and 840 Pro are both in "good" health despite the former's high reallocated sector count. Intel's SSD Toolbox says the 335 Series is in good health, as well. Corsair's SSD utility doesn't have a general health indicator, and Kingston's software doesn't cooperate with the Intel storage drivers on our test rigs. However, we can get health estimates for all the drives using Hard Disk Sentinel, which makes its own judgments based on SMART data.

Drive                                     100TB  200TB  300TB
Corsair Neutron GTX 240GB                  100%   100%   100%
Intel 335 Series 240GB                      88%    73%    58%
Kingston HyperX 3K 240GB                   100%    98%    98%
Kingston HyperX 3K 240GB (compressible)    100%   100%   100%
Samsung 840 Pro 256GB                       78%    51%    26%
Samsung 840 Series 250GB                    66%    19%     1%

Well, that's not very helpful. HD Sentinel seems to assess health using different SMART attributes for each SSD. The ratings for the 335 Series correspond to the "estimated life remaining" values produced by Intel's own software. There's no correlation between the Samsung software and HD Sentinel's assessment of the 840 Series and 840 Pro, though. It's unclear why HD Sentinel has so little faith in the 840 Pro, which hasn't suffered any flash failures. Even the low health ratings for the 840 Series seem a tad pessimistic given the amount of spare area in reserve.
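Working out how far each rating fell between checkpoints makes the pattern easier to see. The sketch below uses only the HD Sentinel numbers from the table. The 840 Pro loses 27 and then 25 points per 100TB without a single reallocated sector, and the 335 Series falls a steady 15. The 840 Series' drop shrinks only because it has little rating left to lose.

```python
# HD Sentinel health ratings (%) at the 100TB, 200TB, and 300TB checkpoints
health = {
    "Corsair Neutron GTX 240GB": [100, 100, 100],
    "Intel 335 Series 240GB": [88, 73, 58],
    "Kingston HyperX 3K 240GB": [100, 98, 98],
    "Kingston HyperX 3K 240GB (compressible)": [100, 100, 100],
    "Samsung 840 Pro 256GB": [78, 51, 26],
    "Samsung 840 Series 250GB": [66, 19, 1],
}

def drops_per_interval(ratings):
    """Points lost between consecutive 100TB checkpoints."""
    return [earlier - later for earlier, later in zip(ratings, ratings[1:])]

drops = {drive: drops_per_interval(r) for drive, r in health.items()}
for drive, d in drops.items():
    print(f"{drive:<40} {d}")
```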

The lack of standardization for wear- and health-related attributes seems to be part of the problem here. Each SSD maker exposes a different mix of variables, making comparisons difficult. We'd like to see SSD vendors agree to offer a common set of attributes covering reallocated sectors, accumulated errors, overall health, and both host and flash writes. Some SSDs don't even have SMART attributes to track total writes. The Crucial M500 is one example, and we left that drive out of the endurance experiment as a result.
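As an illustration, the common attribute set we'd like to see could look something like the record below. This is entirely hypothetical: `CommonSsdHealth` and its field names are our invention, not part of SMART or any vendor tool. `None` marks an attribute a drive doesn't expose. A drive with no total-writes counter fails the `can_track_endurance` check, which is the reason we left the M500 out.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class CommonSsdHealth:
    """Hypothetical vendor-neutral health report; None = attribute not exposed."""
    model: str
    reallocated_sectors: Optional[int] = None
    accumulated_errors: Optional[int] = None
    health_percent: Optional[float] = None
    host_writes_tb: Optional[float] = None
    flash_writes_tb: Optional[float] = None

    def missing(self) -> List[str]:
        """Names of attributes this drive didn't report."""
        return [f.name for f in fields(self)
                if f.name != "model" and getattr(self, f.name) is None]

    def can_track_endurance(self) -> bool:
        """An endurance run needs a total-writes counter to measure progress."""
        return self.host_writes_tb is not None

# Figures from the 300TB check-up (health is HD Sentinel's rating); error counts
# weren't reported here, so that field stays None.
hyperx_comp = CommonSsdHealth("Kingston HyperX 3K 240GB (compressible)",
                              reallocated_sectors=0, health_percent=100,
                              host_writes_tb=300, flash_writes_tb=215)
# Hypothetical entry for a drive without a total-writes attribute, like the M500.
m500 = CommonSsdHealth("Crucial M500")

for drive in (hyperx_comp, m500):
    print(drive.model, drive.can_track_endurance(), drive.missing())
```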