
The SSD Endurance Experiment: Two freaking petabytes

The survivors soldier on to another really big number
— 9:53 AM on December 4, 2014

More than a year ago, we drafted six SSDs for a suicide mission. We were curious about how many writes they could survive before burning out. We also wanted to track how each one's performance characteristics and health statistics changed as the writes accumulated. And, somewhat morbidly, we wanted to watch what happened when the drives finally expired.

Our SSD Endurance Experiment has left four casualties in its wake so far. Representatives from the Corsair Neutron Series GTX, Intel 335 Series, Kingston HyperX 3K, and Samsung 840 Series all perished to satisfy our curiosity. Each one absorbed far more damage than its official endurance specification promised—and far more than the vast majority of users are likely to inflict.

The last victim fell at 1.2PB, which is barely a speck in the rear-view mirror for our remaining subjects. The 840 Pro and a second HyperX 3K have now reached two freaking petabytes of writes. To put that figure into perspective, the SSDs in my main desktop have logged less than two terabytes of writes over the past couple of years. At this rate, it'll take me at least two thousand years to reach that total.
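The arithmetic behind that comparison is easy to check. Here's a minimal Python sketch that assumes decimal units (1PB = 1,000TB, as drive makers count) and takes "a couple of years" to mean two. Because the desktop logged less than 2TB in that span, the result is a lower bound. The function name is just for illustration.

```python
def years_to_reach(target_tb: float, written_tb: float, elapsed_years: float) -> float:
    """Years needed to write target_tb at the average rate observed so far."""
    if written_tb <= 0 or elapsed_years <= 0:
        raise ValueError("need a positive write volume and elapsed time")
    rate_tb_per_year = written_tb / elapsed_years  # e.g. 2TB / 2 years = 1TB per year
    return target_tb / rate_tb_per_year


# 2PB = 2,000TB in decimal units
print(years_to_reach(2_000, 2, 2))  # 2000.0
```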

So, yeah. Pretty insane. It's time for another check-up.

The story so far
If this is your first encounter with our endurance experiment, I recommend reading this introductory article. It has more details about our subjects, methods, and test rigs than we'll rehash here. Here's the TL;DR version:

The experiment explores a weakness inherent to the very core of flash memory. NAND stores data by trapping electrons inside billions of individual memory cells. The cells are walled off by an insulating layer that normally prevents electrons from getting in or out. Applying voltage to a cell induces electron flow through that barrier via a process called tunneling. Electrons are drawn in when data is written and expelled when data is erased.

Tunneling is a pretty slick feat of nanoscale engineering, but it comes at a cost. The accumulated traffic slowly breaks down the physical integrity of the insulator, degrading its ability to trap electrons in the cell. Some electrons also get caught in the insulator, imparting a negative charge that narrows the cell's usable voltage range. The more that window shrinks, the more difficult it is to read and write data reliably—and quickly.

When cells become more trouble than they're worth, fresh blood is called up from the SSD's overprovisioned "spare" area. These replacement cells ensure the drive maintains the same user-accessible capacity regardless of any underlying flash failures.
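To illustrate why failed flash doesn't shrink the drive, here's a toy Python model of an overprovisioned spare area. It's a hypothetical sketch, not how any particular controller works. Real SSDs handle this inside the flash translation layer, move the retired block's data, and juggle wear leveling alongside it. The class name, method names, and block counts are all made up for illustration.

```python
from collections import deque


class FlashPool:
    """Toy model of an SSD's overprovisioned spare area (hypothetical).

    A fixed number of user-visible logical blocks map to physical blocks.
    Extra physical blocks sit in a reserve and replace any block that is retired.
    """

    def __init__(self, user_blocks: int, spare_blocks: int) -> None:
        # Logical block i starts out on physical block i.
        self.mapping = list(range(user_blocks))
        # Physical blocks past the user range form the spare pool.
        self.spares = deque(range(user_blocks, user_blocks + spare_blocks))
        self.retired: set[int] = set()

    @property
    def user_capacity(self) -> int:
        # Capacity depends only on the number of logical blocks, not on failures.
        return len(self.mapping)

    def retire(self, logical: int) -> int:
        """Retire the physical block behind `logical`, remap it to a spare,
        and return the replacement's physical block number.
        (A real drive would also copy the block's data; this toy skips that.)"""
        if not self.spares:
            raise RuntimeError("spare area exhausted")
        self.retired.add(self.mapping[logical])
        replacement = self.spares.popleft()
        self.mapping[logical] = replacement
        return replacement


pool = FlashPool(user_blocks=1000, spare_blocks=70)  # arbitrary sizes
pool.retire(42)
print(pool.user_capacity, len(pool.spares))  # 1000 69
```

User capacity stays constant until the reserve runs dry; only the spare count goes down.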

Although all SSDs are living on borrowed time, they can take different paths to the end of the road. Intel's 335 Series is designed to go out on its own terms, after a pre-determined volume of writes. Ours took its own life after 750TB—but not before its wear indicator bottomed out and multiple SMART warnings were issued.

Our first HyperX 3K only made it to 728TB. Unlike the 335 Series, which was almost entirely free of failed flash, the HyperX reallocated nearly a thousand sectors before it ultimately expired. Again, though, the wear indicator and SMART warnings provided plenty of notice that the end was nigh.
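Readers who want to watch the same counter on their own drives can pull it from SMART data. The sketch below isn't the tooling behind our results. It assumes smartmontools 7.0 or later, whose `smartctl -j -A` prints attributes as JSON under `ata_smart_attributes.table`, and it assumes attribute ID 5 holds the reallocated-sector count. That's common but not universal, since vendors name and scale attributes differently. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional

# SMART attribute 5 is the usual home of the reallocated/retired-sector count,
# but vendors name (and occasionally repurpose) attributes differently.
REALLOCATED_ID = 5


def reallocated_sectors(smart: dict) -> Optional[int]:
    """Raw reallocated-sector count from parsed `smartctl -j -A` output,
    or None if the drive doesn't expose attribute 5."""
    table = smart.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        if attr.get("id") == REALLOCATED_ID:
            return int(attr["raw"]["value"])
    return None


def read_smart(device: str) -> dict:
    """Run smartctl and parse its JSON report.

    Needs smartmontools 7.0+ (for -j) and usually root. smartctl's exit status
    is a bitmask that can be nonzero even on success, so it isn't treated as fatal.
    """
    result = subprocess.run(
        ["smartctl", "-j", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)
```

On a Linux box, something like reallocated_sectors(read_smart("/dev/sda")) would return the current count; the device path is a placeholder.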

All but a few of the HyperX's reallocated sectors hit after 600TB of writes. The Samsung 840 Series started reporting reallocated sectors after just 100TB, likely because its TLC NAND is more sensitive to voltage-window shrinkage than the MLC flash in the other SSDs. The 840 Series went on to log thousands of reallocated sectors before veering into a ditch on the last stretch before the petabyte threshold. There was no warning before it died, and the SMART attributes said ample spare flash lay in reserve. The SMART stats also showed two batches of uncorrectable errors, one of which hit after only 300TB of writes. Even though the 840 Series technically made it past 900TB, its reliability was compromised long before that.
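Observations like "started reporting reallocated sectors after just 100TB" and "all but a few hit after 600TB" boil down to two simple questions about a counter sampled at each write milestone. Here's a minimal Python sketch of both. The function names are hypothetical, and the input is a list of (TB written, cumulative count) pairs you'd log yourself. No data from our drives is included.

```python
from typing import Optional, Sequence, Tuple

# One sample per milestone: (terabytes written, cumulative reallocated-sector count).
Sample = Tuple[float, int]


def first_nonzero(samples: Sequence[Sample]) -> Optional[float]:
    """Write milestone (TB) at which the counter first rose above zero, if ever."""
    for tb_written, count in sorted(samples):
        if count > 0:
            return tb_written
    return None


def share_after(samples: Sequence[Sample], threshold_tb: float) -> float:
    """Fraction of the final count that accumulated after `threshold_tb` of writes.

    Assumes the counter is cumulative (never decreases), as SMART reallocation
    counts are, so the largest reading at or below the threshold is the "before" value.
    """
    ordered = sorted(samples)
    if not ordered or ordered[-1][1] == 0:
        return 0.0
    final = ordered[-1][1]
    before = max((count for tb, count in ordered if tb <= threshold_tb), default=0)
    return (final - before) / final
```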

Corsair's Neutron GTX was our most recent casualty. Despite being the picture of health up to 1.1PB, it suffered a rash of flash failures over the next 100TB. SMART errors also began to appear, foretelling the drive's imminent doom. The Neutron ultimately reached 1.2PB, and it completed the usual round of tests at that milestone. However, it failed to power up properly after a subsequent reboot.

After the Neutron GTX failed to answer the bell, the 840 Pro and second HyperX 3K pressed on to 2PB without issue. They also completed their fifth unpowered retention test. This time, the SSDs were left unplugged for 10 days. Both maintained the integrity of our 200GB test file.
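A retention check like this comes down to confirming the test file is bit-for-bit identical after the drive sits unpowered. The sketch below assumes a simple approach, not necessarily the method we used: hash the file with SHA-256 before unplugging the drive, then hash it again after reconnecting and compare. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1MiB chunks so a 200GB file never has to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def retention_ok(path: Path, reference_digest: str) -> bool:
    """True if the file still matches the digest recorded before the drive was unplugged."""
    return file_digest(path) == reference_digest
```

Record file_digest() before pulling the plug, then call retention_ok() with that value once the drive is back. For the comparison to mean anything, the second read has to come from the drive itself rather than from the OS file cache.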

To be fair, the official JEDEC specs require that drives accurately retain data for much longer unpowered periods: for client drives, JESD218 calls for one year at 30°C. We had to make a few concessions to accelerate the timeline for this experiment.

Our two remaining subjects have passed the same retention tests and absorbed the same volume of writes, but their individual stories are very different. On the next page, we'll take a closer look at how each one is coping with the continuous barrage of incoming data.