SSDs are pretty awesome. They're fast enough to provide a palpable improvement in overall system responsiveness and affordable enough that even budget rigs can get in on the action. Without moving parts, SSDs also tolerate rough handling much better than mechanical drives, making them particularly appealing for mobile devices. That's a pretty good all-around combination.
Despite the perks, SSDs have a dirty little secret. Their flash memory may be tough enough to shrug off bumps and drops, but it has a fundamental weakness: it wears out. Writing data erodes the nano-scale structure of the individual memory cells, imposing a ceiling on drive life that can be measured in terabytes written. Solid-state drives are living on borrowed time. The question is: how much?
Drive makers typically characterize lifespans in total bytes written. Their estimates usually range from 20 to 40GB of writes per day for the length of the three- or five-year warranty. However, user accounts from all over the web suggest those figures are fairly conservative. They don't tell us what happens to SSDs as they approach the end of the road, either.
Being inquisitive types, we've decided to seek answers ourselves. We've concocted a long-term test that will track a handful of modern SSDs—the Corsair Neutron Series GTX, Intel 335 Series, Kingston HyperX 3K, and Samsung 840 and 840 Pro Series—as they're hammered with an unrelenting torrent of data over the coming weeks and months. And we won't stop until they're all dead. Welcome to the SSD Endurance Experiment.
Why do SSDs die?
Before we dive into the specifics of our experiment, it's important to understand why SSDs wear out. The problem lies within the very nature of flash memory. NAND is made up of individual cells that store data by trapping electrons inside an insulated floating gate. Applied voltages shuffle these electrons back and forth through the otherwise insulating oxide layer separating the gate from the silicon substrate. This two-way traffic slowly weakens the physical structure of the insulator, a layer that is only getting thinner as Moore's Law drives the adoption of finer fabrication techniques.
Another side effect of this electron traffic—tunneling, as it's called—is that some of the negatively charged particles get stuck in the insulator layer. As this negative charge accumulates over time, it narrows the range of voltages that can be used to represent data within the cell. This form of flash wear is especially troublesome for three-bit TLC NAND, which must differentiate between eight discrete values within that shrinking window. Two-bit MLC NAND has only four values to consider.
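To put rough numbers on that shrinking window, the sketch below divides a voltage window evenly among the levels each cell type must distinguish. The 6V window and the 20% wear-induced narrowing are hypothetical values chosen for illustration, not measurements, and real NAND doesn't space its levels perfectly evenly.

```python
def states_per_cell(bits: int) -> int:
    """A cell that stores `bits` bits must distinguish 2**bits charge levels."""
    return 2 ** bits

def slot_width(window_volts: float, bits: int) -> float:
    """Voltage range per level if the window were split evenly (a simplification)."""
    return window_volts / states_per_cell(bits)

WINDOW_V = 6.0   # hypothetical usable voltage window of a fresh cell
WEAR = 0.8       # hypothetical: trapped charge leaves 80% of the window usable

for name, bits in (("MLC", 2), ("TLC", 3)):
    fresh = slot_width(WINDOW_V, bits)
    worn = slot_width(WINDOW_V * WEAR, bits)
    print(f"{name}: {states_per_cell(bits)} levels, "
          f"{fresh:.2f} V each when fresh, {worn:.2f} V each when worn")
```

Both cell types lose the same fraction of their window in this model, but TLC starts with half of MLC's per-level margin, so it has less room for error once wear sets in.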
Flash cells are typically arranged in 4-16KB pages grouped into 512-8,192KB blocks. SSDs can write to empty pages directly. However, flash can only be erased a block at a time, so writing to occupied pages requires a multi-step process: reading the entire block, erasing it, and then writing it back with the modified data. To offset this block-rewrite penalty, the TRIM command tells the drive which data has been deleted, and garbage collection routines move the remaining valid data around in the flash, ensuring a fresh supply of empty pages for incoming writes. Meanwhile, wear-leveling routines distribute writes and relocate static data to spread destructive cycling more evenly across the flash cells. All of these factors conspire to inflate the number of flash writes associated with each host write, a phenomenon known as write amplification.
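Write amplification is simply the ratio of bytes written to the flash to bytes written by the host. This minimal sketch computes it and shows the worst case implied by the block-rewrite penalty, where updating a single page forces the whole block to be rewritten. The 8KB page and 2,048KB block sizes are assumptions picked from within the ranges above; real controllers go to great lengths to avoid this worst case.

```python
PAGE_KB = 8       # assumed page size, within the 4-16KB range
BLOCK_KB = 2048   # assumed block size, within the 512-8,192KB range

def write_amplification(flash_written: float, host_written: float) -> float:
    """Write amplification = data written to NAND / data written by the host."""
    if host_written <= 0:
        raise ValueError("host_written must be positive")
    return flash_written / host_written

def rmw_write_amplification(pages_changed: int) -> float:
    """Worst case: updating pages in a full block rewrites the entire block."""
    pages_per_block = BLOCK_KB // PAGE_KB
    if not 1 <= pages_changed <= pages_per_block:
        raise ValueError("pages_changed must fit within one block")
    return write_amplification(BLOCK_KB, pages_changed * PAGE_KB)

print(rmw_write_amplification(1))       # one 8KB update -> 2,048KB of flash writes: 256.0
print(write_amplification(PAGE_KB, PAGE_KB))   # writing to an empty page: 1.0
```

That 256x worst case is why keeping a supply of empty pages matters so much: every host write that lands on a fresh page avoids a full block rewrite.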
SSD makers tune their algorithms to minimize write amplification and to make the most efficient use of the flash's limited endurance. They also lean on increasingly advanced signal processing and error correction to read the flash more reliably. Some SSD vendors devote more of the flash to overprovisioned spare area that's inaccessible to the OS but can be used to replace blocks that have become unreliable and must be retired. SandForce goes even further, employing on-the-fly compression to minimize the flash footprint of host writes. Hopefully, this experiment will give us a sense of whether those techniques are winning the war against flash wear.
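A deliberately simplified model ties these techniques together: flash writes equal host writes, shrunk by any compression, multiplied by write amplification. The sketch also expresses overprovisioning as spare flash relative to user capacity. The example inputs (256GiB of raw NAND behind 240GB of user capacity, a 0.6 compression ratio, 1.2x amplification) are hypothetical, and real controllers don't behave this linearly.

```python
GIB = 2**30   # binary gigabyte; NAND dies come in power-of-two capacities
GB = 10**9    # decimal gigabyte, as drive capacities are usually marketed

def overprovisioning(raw_bytes: int, user_bytes: int) -> float:
    """Spare area expressed as a fraction of user-visible capacity."""
    return (raw_bytes - user_bytes) / user_bytes

def flash_writes(host_bytes: float, compression_ratio: float = 1.0,
                 write_amp: float = 1.0) -> float:
    """Simplified model of NAND writes generated by host writes.

    compression_ratio is compressed size / original size: 1.0 means the data
    didn't compress at all; smaller values mean the controller saved flash.
    """
    return host_bytes * compression_ratio * write_amp

print(f"overprovisioning: {overprovisioning(256 * GIB, 240 * GB):.1%}")  # 14.5%
for ratio in (1.0, 0.6):  # hypothetical: incompressible vs. partly compressible data
    nand = flash_writes(100 * GB, compression_ratio=ratio, write_amp=1.2)
    print(f"ratio {ratio}: 100GB from the host -> {nand / GB:.0f}GB of flash writes")
```

In this model, compression and lower write amplification both shave flash writes multiplicatively, which is why SandForce's approach can stretch endurance when data is compressible and does nothing when it isn't.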
Clearly, many factors affect SSD endurance. Perhaps that's why drive makers are so conservative with their lifespan estimates. Intel's 335 Series 240GB is rated for 20GB of writes per day for three years, which works out to just under 22TB of total writes. If we assume modest write amplification and a 3,000-cycle write/erase tolerance for the NAND, this class of drive should handle hundreds of terabytes of flash writes. With similarly wide discrepancies between the stated and theoretical limits of most SSDs, it's no wonder users have reported much longer lifespans. Our experiment intends to find out just how long modern drives actually last.
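The arithmetic behind those figures is easy to reproduce. This sketch treats the 240GB user capacity as the amount of NAND being cycled (the raw flash is somewhat larger), and the write-amplification values are assumptions rather than measurements for the 335 Series.

```python
def rated_tbw(gb_per_day: float, years: float) -> float:
    """Warranty endurance in decimal TB, ignoring leap days."""
    return gb_per_day * 365 * years / 1000

def theoretical_host_tb(capacity_gb: float, pe_cycles: int, write_amp: float) -> float:
    """Host writes the NAND could absorb: capacity x P/E cycles / write amplification."""
    return capacity_gb * pe_cycles / write_amp / 1000

print(f"rated: {rated_tbw(20, 3):.1f} TB")                      # 21.9 TB
print(f"raw flash writes: {240 * 3000 / 1000:.0f} TB")          # 720 TB
for wa in (1.5, 3.0):  # hypothetical write-amplification values
    print(f"WA {wa}: {theoretical_host_tb(240, 3000, wa):.0f} TB of host writes")
```

Even with a pessimistic 3x amplification, the theoretical budget of 240TB of host writes is about ten times the warranty rating.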
The ideal workload for endurance testing would be a trace of real-world I/O like our DriveBench 2.0 benchmark, which comprises nearly two weeks of typical desktop activity. There's just one problem: it's too darned slow. Reaching the 335 Series' stated limit would take more than a month, and we'd have to wait substantially longer to approach the theoretical limits of the NAND.
We can push SSD endurance limits much faster with synthetic benchmarks. There are myriad options, but the best one is Anvil's imaginatively named Storage Utilities.
Developed by a frequenter of the XtremeSystems forums, this handy little app includes a dedicated endurance test that fills drives with files of varying sizes before deleting them and starting the process anew. We can tweak the payload of each loop to write the same amount of data to each drive. There's an integrated MD5 hash check that verifies data integrity, and the write speed is more than an order of magnitude faster than DriveBench 2.0's effective write rate.
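The basic shape of such a loop is easy to picture in code. What follows is a minimal sketch of the same idea, not Anvil's actual implementation: fill a directory with files of varying sizes containing random (incompressible) bytes, verify each against its MD5 digest, then delete them and start over while tallying bytes written. The function name, its parameters, and the file sizes are hypothetical, and pointing it at a real SSD will genuinely wear the drive.

```python
import hashlib
import os
import random
import tempfile

def endurance_loop(target_dir: str, loops: int, sizes: list[int],
                   files_per_loop: int = 8, seed: int = 0) -> int:
    """Fill, verify, and delete files repeatedly; return total bytes written."""
    rng = random.Random(seed)  # picks file sizes only; payloads come from os.urandom
    total = 0
    for loop in range(loops):
        expected = {}
        # Fill phase: write files of varying sizes and record each MD5 digest.
        for i in range(files_per_loop):
            size = rng.choice(sizes)
            data = os.urandom(size)  # random bytes are effectively incompressible
            path = os.path.join(target_dir, f"loop{loop}_file{i}.bin")
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to the device
            expected[path] = hashlib.md5(data).hexdigest()
            total += size
        # Verify phase: re-read every file and compare digests.
        for path, digest in expected.items():
            with open(path, "rb") as f:
                if hashlib.md5(f.read()).hexdigest() != digest:
                    raise OSError(f"MD5 mismatch in {path}")
        # Delete phase: free the space so the next loop starts over.
        for path in expected:
            os.remove(path)
    return total

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        written = endurance_loop(d, loops=3, sizes=[4096, 65536, 1 << 20])
        print(f"wrote {written / 2**20:.1f} MiB")
```

Because the verify pass may be served from the OS page cache rather than the SSD itself, a serious tool would need to bypass or drop that cache before checking digests; this sketch doesn't attempt it.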
Anvil's endurance test writes files sequentially, so it's not an ideal real-world simulation. However, it's the best tool we have, and it allows us to load drives with a portion of static data to challenge wear-leveling routines. We're using 10GB of static data, including a copy of the Windows 7 installation folder, a handful of application files, and a few movies.
The Anvil utility also has an adjustable incompressibility scale that can be set to 0, 8, 25, 46, 67, or 100%. Among our test subjects, only the SandForce-based Intel 335 Series and Kingston HyperX 3K SSDs can compress incoming data on the fly. We'll be testing all the SSDs with incompressible data to level the playing field. To assess the impact of SandForce's DuraWrite tech, we'll also be testing a second HyperX drive with Anvil's 46% "applications" compression setting.
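To see why that setting matters for SandForce drives, the sketch below builds buffers in which a chosen fraction of each 4KB chunk is random and the rest is zeros, then measures how small zlib can make them. This is only a proxy: it isn't how Anvil generates its data or how DuraWrite compresses it, so the ratios shouldn't be read as what the drives achieve.

```python
import os
import zlib

def make_buffer(size: int, incompressible: float, chunk: int = 4096) -> bytes:
    """Build `size` bytes where roughly `incompressible` of each chunk is random
    and the remainder is zeros; a crude stand-in for an incompressibility setting."""
    out = bytearray()
    while len(out) < size:
        n = min(chunk, size - len(out))
        random_part = int(n * incompressible)
        out += os.urandom(random_part) + bytes(n - random_part)
    return bytes(out)

def compressed_fraction(data: bytes) -> float:
    """zlib-compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 6)) / len(data)

for pct in (0, 8, 25, 46, 67, 100):
    buf = make_buffer(1 << 20, pct / 100)
    print(f"{pct:>3}% incompressible -> {compressed_fraction(buf):.2f} of original")
```

Random bytes don't compress at all (zlib adds a sliver of overhead), so at 100% a compressing controller has nothing to save, while the zero-filled remainder at lower settings shrinks to almost nothing.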
Since the endurance benchmark tracks the number of gigabytes written to the drive, we can easily keep tabs on how the SSDs are progressing. We can also monitor the total bytes written by reading each drive's SMART attributes. All the SSDs we're testing have attributes that tally host writes and provide general health estimates.
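On most systems those counters can be read with smartmontools' `smartctl -A`, which prints a table of attributes. The sketch below parses that table layout and converts a write counter to bytes. The sample text is made up for illustration, and the attribute ID and units are assumptions: many drives report host writes as attribute 241 in 512-byte LBAs, but vendors differ (some count in larger units), so each drive's documentation should be checked before trusting the conversion.

```python
def parse_smart_attributes(text: str) -> dict[int, tuple[str, int]]:
    """Map attribute ID -> (name, raw value) from `smartctl -A`-style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th column and may be followed by extra detail.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs

# Hypothetical sample in the smartctl -A table layout (values are invented).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4294967296
"""

LBA_BYTES = 512  # assumption: the counter is in 512-byte LBAs

attrs = parse_smart_attributes(SAMPLE)
name, lbas = attrs[241]
print(f"{name}: {lbas * LBA_BYTES / 1e12:.2f} TB written")  # 2.20 TB
```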
There's also a SMART attribute that counts bad blocks, giving us a sort of body count we can attribute to flash wear. As mounting cell failures compromise entire blocks, replacements will be pulled from overprovisioned spare area, reducing the amount of flash available to accelerate performance. To measure how this spare area shrinkage slows down our drives, we'll stop periodically to benchmark the SSDs in four areas: sequential reads, sequential writes, random reads, and random writes. The drives will be secure-erased before each test session, ensuring a full slate of available flash pages. (The static data will be copied back after each benchmarking session, before the endurance test resumes.)
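The body count and the shrinking spare area are two sides of the same bookkeeping. Here's a toy model, assuming a simple pool of spare blocks that stand in for retired ones one-for-one; the class and its names are hypothetical, and real controllers also lean on spare area for garbage collection and handle exhaustion in vendor-specific ways.

```python
class SpareArea:
    """Toy bad-block remapper: each retired block consumes one spare block."""

    def __init__(self, user_blocks: int, spare_blocks: int) -> None:
        self.remap: dict[int, int] = {}   # retired block -> replacement block
        self.free_spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.initial_spares = spare_blocks
        self.bad_blocks = 0               # the "body count"

    def retire(self, block: int) -> int:
        """Retire a worn-out block and return the spare that replaces it."""
        if not self.free_spares:
            raise RuntimeError("spare area exhausted; no blocks left to remap")
        replacement = self.free_spares.pop(0)
        self.remap[block] = replacement
        self.bad_blocks += 1
        return replacement

    def spare_remaining(self) -> float:
        """Fraction of the original spare pool still available."""
        return len(self.free_spares) / self.initial_spares

area = SpareArea(user_blocks=1000, spare_blocks=70)
for b in (12, 345, 678):
    area.retire(b)
print(area.bad_blocks, f"{area.spare_remaining():.0%}")  # 3 96%
```

Every retirement leaves fewer spare blocks for the controller to juggle, which is the mechanism behind the gradual slowdown the periodic benchmarks are meant to catch.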
We're not that interested in the performance differences between our guinea pigs; our reviews of each drive cover that subject in much greater detail. Instead, we want to observe how flash wear takes its toll on each drive. Some SSDs may age more gracefully than others.
To make testing practical, we've limited ourselves to one example of each SSD, plus the extra HyperX. Our sample size is too small to provide definitive answers about reliability, but testing six drives will give us a decent sense of the endurance of modern SSDs. Now, let's meet our subjects.