
The SSD Endurance Experiment: 500TB update


Halfway to a petabyte
— 9:34 PM on January 9, 2014

I am running out of ways to introduce our SSD Endurance Experiment. This long-term write endurance test began in August, and we've published numerous updates since. Now that our subjects have crossed the 500TB mark, it's time for another checkup.

The rationale for our endurance test hasn't changed, which is why these intros tend to channel the same theme. Solid-state drives use flash memory that has limited write endurance. Every time data is written, the physical structure of the NAND cells degrades. The cells eventually erode to the point where they become unusable, forcing SSDs to poach replacement blocks from their overprovisioned spare areas.
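Here is a toy model of that replacement process, offered purely for illustration. It assumes a fixed pool of spare blocks that is drawn down one block per retirement. The `ToySSD` class and its fields are hypothetical and do not describe any vendor's firmware. Real controllers also juggle wear leveling, garbage collection, and error correction, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class ToySSD:
    """Hypothetical drive model: retired blocks are swapped for spares."""
    user_blocks: int   # blocks exposed to the host (never shrinks in this model)
    spare_blocks: int  # overprovisioned blocks held in reserve
    reallocated: int = 0

    def retire_block(self) -> bool:
        """Replace one worn-out user block; return False once spares are gone."""
        if self.spare_blocks == 0:
            return False
        self.spare_blocks -= 1
        self.reallocated += 1
        return True


drive = ToySSD(user_blocks=1000, spare_blocks=3)
outcomes = [drive.retire_block() for _ in range(5)]
print(outcomes)            # [True, True, True, False, False]
print(drive.reallocated)   # 3
print(drive.spare_blocks)  # 0
```

The model simply reports failure once the pool is empty. What a real drive does at that point is exactly the question the experiment is trying to answer.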

This dynamic raises several questions. What happens when drives run out of overprovisioned area? How long does it take? And do they slow down along the way? We're seeking answers in our endurance experiment, which is subjecting a collection of drives—the Corsair Neutron GTX 240GB, Intel 335 Series 240GB, Kingston HyperX 3K 240GB, Samsung 840 Series 250GB, and Samsung 840 Pro 256GB—to a merciless onslaught of writes.

Our introductory article explains the finer details of the experiment, so I won't rehash them here. Our approach is pretty straightforward, though. We're using the endurance test built into Anvil's Storage Utilities to write a series of incompressible files to each SSD. We're also writing compressible data to a second HyperX unit to test the impact of SandForce's write compression technology. As the experiment progresses, we're monitoring the health and performance of each drive at regular intervals.
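The compressible/incompressible split matters because a compressing controller only has to write the compressed result to flash. The sketch below illustrates the gap using Python's `zlib` as a stand-in. SandForce's actual compression scheme isn't public, and Anvil's data generator isn't reproduced here. The sample payloads are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size over original size; lower means more compressible."""
    return len(zlib.compress(data, 6)) / len(data)


size = 1 << 20  # 1 MiB sample
incompressible = os.urandom(size)  # random bytes barely compress at all
compressible = (b"endurance test " * (size // 15 + 1))[:size]  # highly repetitive

print(f"random:     {compression_ratio(incompressible):.3f}")
print(f"repetitive: {compression_ratio(compressible):.3f}")
```

Random data comes out at roughly its original size, while the repetitive buffer shrinks to a small fraction of it. A drive that compresses writes on the fly would, in principle, wear its flash more slowly on the second kind of data.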

All but one of our test subjects is based on two-bit MLC NAND. The Samsung 840 Series has three-bit TLC flash, and that puts it at a distinct disadvantage versus the others. TLC NAND's higher bit density increases the storage capacity of the cells, but it also makes the flash more sensitive to wear. Flash memory stores data as distinct voltage levels, and the usable voltage window narrows as the cells degrade. TLC NAND has to differentiate between eight possible values within that narrowing window, which is more difficult than tracking the four values required by the MLC alternative. (Likewise, MLC flash has lower endurance than one-bit SLC NAND.)
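The arithmetic behind that difference is simple. A cell storing n bits must distinguish 2^n levels. The sketch below assumes an idealized window normalized to 1.0 with evenly spaced levels; real voltage distributions are neither even nor that tidy. It shows how much tighter the spacing between adjacent levels gets as bits per cell go up.

```python
def voltage_states(bits_per_cell: int) -> int:
    """Distinct charge levels a cell must tell apart: 2 ** bits."""
    return 2 ** bits_per_cell


def margin_per_gap(window: float, bits_per_cell: int) -> float:
    """Gap between adjacent levels if `window` is split evenly (idealized)."""
    return window / (voltage_states(bits_per_cell) - 1)


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    print(name, voltage_states(bits), round(margin_per_gap(1.0, bits), 3))
# SLC 2 1.0
# MLC 4 0.333
# TLC 8 0.143
```

In this simplified picture, a TLC cell has well under half the per-level margin of an MLC cell, so the same amount of window shrinkage eats into it proportionally faster.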

We're keeping an eye on flash health using several methods. So far, the best one seems to be monitoring the raw SMART data reported by each drive. One of the SMART attributes counts the number of sectors that have been reallocated from the spare area to replace retired flash in the user-accessible storage. The reallocated sector count is a death toll of sorts, and it lets us highlight TLC NAND's more limited endurance rather neatly. The following graph depicts the number of reallocated sectors for each drive over the course of the experiment thus far.
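We read these values with Hard Disk Sentinel, but the same raw counter can be pulled out of a text dump. The sketch below assumes the tabular layout printed by smartmontools' `smartctl -A` for ATA drives, where attribute 5 is conventionally `Reallocated_Sector_Ct` and the raw value is the tenth column. Vendors don't always follow that convention. The sample table is hypothetical: its raw count borrows the 840 Series figure cited below, and every other value is a placeholder.

```python
from typing import Optional

# Hypothetical smartctl -A style excerpt; values are placeholders.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       1722
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
"""


def raw_attribute(smart_table: str, attr_id: int = 5) -> Optional[int]:
    """Return the raw value of SMART attribute `attr_id`, or None if absent."""
    for line in smart_table.splitlines():
        fields = line.split()
        # Data rows begin with a numeric ID; the header and blank lines are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) == attr_id:
            return int(fields[9])  # RAW_VALUE column (first token only)
    return None


print(raw_attribute(SAMPLE))  # 1722
```

Logging this number after every test interval and plotting it against total bytes written yields the kind of graph described above.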

The 840 Series reported its first reallocated sectors after 100TB of writes, and it's been burning through flash steadily ever since. After 500TB of writes, the 840 Series is up to 1722 reallocated sectors. Meanwhile, the other SSDs have only a handful of flash failures between them. And two of the drives, the Neutron GTX and the HyperX 3K being tested with compressible data, haven't logged a single reallocated sector.

Samsung won't confirm the size of the 840 Series' sectors, but we're pretty sure it's 1.5MB. That means the drive has lost 2.5GB of its total flash capacity already. Fortunately, those flash failures haven't affected the amount of user-accessible capacity. The 840 Series has extra overprovisioned spare area specifically to offset the lower endurance of its TLC NAND. So far, at least, those flash reserves seem to be sufficient.
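The capacity estimate is just the reallocated sector count multiplied by the assumed sector size. Because that 1.5MB figure is our unconfirmed guess, the result is only as good as that assumption.

```python
SECTOR_MB = 1.5     # assumed sector size; Samsung hasn't confirmed it
reallocated = 1722  # 840 Series reallocated sectors after 500TB of writes

lost_mb = reallocated * SECTOR_MB
print(lost_mb)                   # 2583.0
print(round(lost_mb / 1024, 2))  # 2.52  (binary gigabytes)
print(round(lost_mb / 1000, 2))  # 2.58  (decimal gigabytes)
```

Either way you count, the total lands at roughly 2.5GB.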

Although the 840 Series is clearly in worse shape than the competition, these results need to be put into context. 500TB works out to roughly 140GB of writes per day for 10 years. That's an insane amount even for power users, and it far exceeds the endurance specifications of our candidates. The HyperX 3K, which has the most generous endurance rating of the bunch, is guaranteed for 192TB of writes.
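The daily figure comes from spreading the total evenly over a decade. The sketch below uses decimal units, as drive makers do, and ignores leap days. It also converts the HyperX 3K's 192TB rating into the same terms.

```python
TOTAL_TB = 500
HYPERX_RATING_TB = 192
DAYS = 10 * 365  # ten years, ignoring leap days

gb_per_day = TOTAL_TB * 1000 / DAYS
rated_gb_per_day = HYPERX_RATING_TB * 1000 / DAYS

print(round(gb_per_day, 1))                  # 137.0
print(round(rated_gb_per_day, 1))            # 52.6
print(round(TOTAL_TB / HYPERX_RATING_TB, 1))  # 2.6
```

In other words, our drives have already absorbed about 2.6 times the most generous rating in the group.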

To be fair, we should note that the 840 Series has another blemish on its record. The drive failed several hash checks during the setup process for the unpowered retention test we performed after 300TB of writes. The 840 Series ultimately passed the retention test, but its SMART attributes logged a spate of unrecoverable errors that likely caused the hash failures. In a real-world setting, unrecoverable errors could result in data corruption or even a system crash.

Worryingly, Samsung's Magician utility seems unaware of the 840 Series' degrading condition. The software can access the SMART data, but its main interface still proclaims that the drive is in "good" health. The 840 Pro, which has recorded only two reallocated sectors and no unrecoverable errors, has the same "good" health rating.

Third-party software like Hard Disk Sentinel doesn't necessarily do a better job of monitoring drive health, either. We're using the app to read the SMART data on each drive, and it has a separate health indicator of its own. The thing is, that value seems to be based on different attributes for each drive. We're getting wildly different ratings for units that otherwise appear to be in similar condition. HD Sentinel also gives the 840 Series and 840 Pro identical 1% ratings, so it's hard to take the health indicator seriously.

Now that our health checkup is complete, it's time to look at performance.