
The SSD Endurance Experiment: Data retention after 600TB


A three-week reprieve from the onslaught
— 11:35 PM on February 23, 2014

Six weeks have passed since our last SSD endurance update. When we last visited our heroes, they had just crossed the half-petabyte threshold—no small feat for a collection of consumer-grade drives that includes the Corsair Neutron GTX, Intel 335 Series, Kingston HyperX 3K, and Samsung 840 Series and 840 Pro. Those drives have now left the 600TB mark in the rear-view mirror, so it's time for another update.

If you think it's taken longer than usual to add 100TB to the total, you're right. The truth is, the SSDs have been on hiatus, and so have I. The drives hit the 600TB mark about a week before I was scheduled to escape to Thailand for a two-week vacation. Since our subjects have been working pretty much non-stop since August, I figured they could use a break, too. Their vacation would be a working one, though. Instead of spending their time on sun-soaked beaches with cold beers in hand, the drives participated in another unpowered data retention test.

The Samsung 840 Series kicked out a spate of unrecoverable errors when we conducted our first retention test after 300TB of writes, so we were curious to see how it and the others would fare in a longer unpowered test. We have more bad blocks to report for several of our candidates, including a couple of MLC-based drives, plus another set of performance results. Let's get started.

Those unfamiliar with our endurance experiment would do well to start with this introductory article, which explains our methods in greater detail than we'll indulge here. The concept is pretty simple. Flash-based storage has a limited tolerance for writes, so we're writing loads of data to a bunch of SSDs to see how much they can take. We're also monitoring drive health and performance to observe what happens as the flash degrades.

Our test subjects include five SSDs designed for consumer desktops and notebooks: the Corsair Neutron GTX 240GB, Intel 335 Series 240GB, Kingston HyperX 3K 240GB, and Samsung 840 Pro 256GB, which are all based on two-bit MLC flash, and the Samsung 840 Series 250GB, which uses three-bit TLC NAND. The drives are being tested with incompressible data, and we have a second HyperX unit that's being tested with compressible data to explore the impact of SandForce's write compression tech.

Flash cells fail when the effects of accumulated write cycling prevent data from being stored reliably. In addition to degrading the physical structure of the NAND cells, write cycling causes a negative charge to build up within them. This charge reduces the range of voltages that can be used to define the contents of individual cells, making it more difficult to write and verify data. Eventually, cells become unreliable and have to be retired. They're replaced with reserve blocks culled from the SSD's overprovisioned spare area.

TLC NAND squeezes an extra bit into each flash cell, so it's more prone to wear than the MLC alternative. That puts the Samsung 840 Series at a theoretical disadvantage versus its MLC competition, and this relative weakness has been reflected in the experiment's results already. The 840 Series has registered far more reallocated sectors—bad blocks of flash that have been replaced—than any of the MLC-based drives. It was also the only drive to stumble in our first retention test after 300TB of writes. (That's not to say that the 840 Series flunked this class; 300TB is well beyond what the average consumer is likely to write to any SSD during its lifetime.)
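To put rough numbers on why an extra bit per cell matters, here is a minimal sketch. It assumes an idealized cell whose usable voltage window is normalized to 1.0 and whose charge levels are spaced evenly across that window; real NAND doesn't behave this neatly, and the function names are ours, purely for illustration.

```python
def levels_per_cell(bits: int) -> int:
    """Number of distinct charge levels a cell must resolve to store `bits` bits."""
    return 2 ** bits


def relative_margin(bits: int, window: float = 1.0) -> float:
    """Spacing between adjacent levels, assuming they are spread evenly
    across a normalized voltage window (an idealization, not a NAND spec)."""
    return window / (levels_per_cell(bits) - 1)


if __name__ == "__main__":
    for name, bits in (("MLC", 2), ("TLC", 3)):
        print(name, levels_per_cell(bits), "levels, margin",
              round(relative_margin(bits), 3))
```

Under those assumptions, a two-bit cell has to tell four levels apart and a three-bit cell eight, so any shrinkage in the usable voltage range eats into the TLC cell's thinner margins first.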

In that initial retention test, we copied a 200GB file to each drive and performed a hash check to verify its integrity. We then unplugged the drives for a week before performing the same hash check again.
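For readers who want to try something similar at home, the hash check described above can be sketched in a few lines of Python. This is not the tool we used, and the article doesn't depend on any particular algorithm; SHA-256 and the function names below are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so a 200GB file never has to fit in RAM."""
    hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_copy(source, copy) -> bool:
    """Return True when the copy's digest matches the source's digest."""
    return file_digest(source) == file_digest(copy)
```

In a retention test, you would record the digest right after copying the file, leave the drive unplugged, and then compare a fresh digest against the recorded one once the drive is powered up again.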

The 840 Series failed repeated hash checks after we first copied our test file to the drive. Its SMART attributes logged a considerable number of unrecoverable errors around the same time, indicating serious failures that could have led to corrupt or lost data in a real-world situation. The hash errors disappeared when we copied the file over a second time, and the 840 Series ultimately passed the unpowered component of the retention test without issue.

We followed the same test procedure after 600TB of writes, except this time, the drives were left in an unpowered state for three weeks. Each drive passed the hash checks we ran before and after the unpowered period. Even the 840 Series had no issues—and no more unrecoverable errors. Whatever caused the problem in our first retention test didn't affect this latest one.

Interestingly, the 840 Series logged one reallocated sector during the retention test. Copying our 200GB file to the drive apparently pushed one of its flash blocks beyond the brink. That brings the number of bad blocks to date up to 2192, far more than for any of the MLC-based SSDs. Allow me to illustrate:

So, yeah, no contest here. As expected, the 840 Series' TLC flash is eroding at a much higher rate than the two-bit alternatives. The decay rate has been pretty linear since failures started piling up after 100TB of writes.

According to my calculations, those reallocated sectors add up to over 3GB lost to flash failures. Thanks to overprovisioned spare area, though, Windows reports the same total capacity as when the 840 Series was in its factory-fresh state. Samsung allocated additional spare area to make up for TLC NAND's more limited lifespan; the 840 Series 250GB devotes 23GB of flash to overprovisioned area, while the MLC-based 840 Pro 256GB sets aside less than 18GB. Even with nearly 2200 block failures, the 840 Series still has a long way to go before it runs out of reserves.
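The arithmetic behind that estimate can be sketched as follows. The sector count and the 23GB spare area come from the figures above; the 1.5MB size per reallocated sector is an assumption on our part, and the result mixes decimal and binary gigabytes freely, so treat it as a ballpark rather than an exact accounting.

```python
REALLOCATED_SECTORS = 2192   # 840 Series total reported in the article
SPARE_AREA_GB = 23           # 840 Series 250GB overprovisioning, per the article
SECTOR_SIZE_MB = 1.5         # ASSUMPTION: size of one reallocated sector


def lost_capacity_gb(sectors: int, sector_size_mb: float = SECTOR_SIZE_MB) -> float:
    """Flash retired so far, in decimal gigabytes."""
    return sectors * sector_size_mb / 1000


def spare_fraction_used(sectors: int, spare_gb: float,
                        sector_size_mb: float = SECTOR_SIZE_MB) -> float:
    """Rough share of the overprovisioned area consumed by retired flash."""
    return lost_capacity_gb(sectors, sector_size_mb) / spare_gb


if __name__ == "__main__":
    lost = lost_capacity_gb(REALLOCATED_SECTORS)
    used = spare_fraction_used(REALLOCATED_SECTORS, SPARE_AREA_GB)
    print(f"{lost:.2f} GB retired, about {used:.0%} of the spare area")
```

With those inputs, the retired flash works out to a bit over 3GB, which is only a modest slice of the drive's reserves.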

In a moment, we'll see if the 840 Series' flash failures have resulted in any performance slowdowns. First, we have to address the part of the graph that's squished to keep up with the mounting flash failures. Although it's hard to see, two other SSDs suffered additional flash failures over the last 100TB of writes. The number of reallocated sectors reported by the Kingston HyperX 3K jumped from four to 10, and the 840 Pro's total climbed from two to 28. Those numbers are still very low, suggesting that the drives have loads of life left in them.

The other MLC drives are faring even better. The Intel 335 Series continues to report only one bad block, and the Corsair Neutron GTX remains fully intact. So does the HyperX drive we're testing with compressible data. Thanks to write compression, that drive has written less to the flash than its twin being tested with incompressible data.

Now, let's look at the state of the drives from another angle, with a quick battery of performance tests.