The primary goal of this experiment is to see how many writes each SSD can take before it dies. Problems may crop up before the drives stop responding completely, though. We need to know if the SSDs are still viable—not just if they're still alive.
Anvil's endurance benchmark has an integrated MD5 test that provides some help on this front. We have it configured to verify the integrity of a 700MB video file pre-loaded on each drive. The file is part of 10GB of static data that sits on the SSDs during the endurance test. Even though that data isn't disturbed as the endurance test runs, wear-leveling algorithms should move it around in the flash as writes accumulate.
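For readers who want to run a similar check on their own static files, here is a minimal sketch of an MD5 integrity test. It is not Anvil's implementation, which we haven't seen; the function names, chunk size, and throwaway sample file are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat, even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_md5):
    """Return True if the file's current MD5 matches the reference digest."""
    return md5_of_file(path) == expected_md5.lower()


# Demonstration on a throwaway file (hypothetical stand-in for the static data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "static_data.bin"
    sample.write_bytes(b"static test data" * 1000)
    reference = md5_of_file(sample)          # record the known-good hash once
    intact = verify_file(sample, reference)  # later checks compare against it

    # Change a single byte to show that the check catches silent corruption.
    sample.write_bytes(b"static test data" * 999 + b"static test dat\x00")
    corruption_detected = not verify_file(sample, reference)

print(f"intact: {intact}, corruption detected: {corruption_detected}")
```

The reference hash has to be recorded while the file is known to be good. After that, any mismatch means the bytes read back differ from the ones originally written.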
Thus far, the built-in hash check hasn't reported any errors. As several of our readers have pointed out, though, the integrated test doesn't tell us whether data is retained accurately when the system is powered off. We actually considered making unpowered retention testing a staple of our regular check-ups. However, that kind of testing involves days of inactive downtime that we'd rather spend writing to the drives.
With our 840 Series sample clearly wilting, we decided it was worth sacrificing some time on an unplugged retention test. Our 700MB movie file is relatively small, so we swapped in a 200GB TrueCrypt file nearly large enough to fill each drive. Then something odd happened. While running an initial MD5 check on the file we copied, the 840 Series produced an incorrect result. We hashed the file again, and the result was still incorrect, but this time the hash test produced an entirely different string. Third time's the charm? Nope. Strike three, and another different result.
All the other SSDs passed the initial MD5 check, so we started over with the 840 Series. We re-copied our TrueCrypt file, and the results were correct the first, second, and third time we hashed it. So we repeated the process again. Once more, the 840 Series passed three times in a row. We couldn't reproduce the initial mismatches.
Puzzled, we shut down our test systems and proceeded with the unpowered portion of the retention test. Five days later, we fired them up again and checked the files. All the drives passed, including the 840 Series.
For a moment, I thought I'd imagined those initial errors. But no, I took screenshots. The SMART attributes also provide corroborating evidence. Before the retention test, the 840 Series' unrecoverable error count was zero. The drive now says it's suffered 172 unrecoverable errors. Something went seriously wrong, and Samsung's error correction mechanism was unable to compensate.
Even though our 840 Series drive appears to have rebounded, it suffered a serious failure. In a normal desktop system, unrecoverable errors could result in permanent file corruption and data loss. I certainly wouldn't trust our test subject with my own data anymore. Since the drive appears to be operating normally again, we'll keep it in the experiment, albeit with a black mark on its record.
Disappointingly, only the Samsung and Kingston SSDs have SMART attributes that track unrecoverable errors. So far, the 840 Pro and HyperX drives are free of unrecoverable errors. The Corsair Neutron GTX only tallies "soft ECC correction" events, and it doesn't report any of those. We're in the dark with the Intel 335 Series, whose SMART attributes are devoid of error-related variables.
We benchmarked all the SSDs before we began our endurance experiment, and we've gathered more performance data at every milestone since. It's important to note that these tests are far from exhaustive. Our in-depth SSD reviews are a much better resource for comparative performance data. What we're looking for here is how each SSD's benchmark scores change as the writes add up.
Apart from a few anomalies tied to the HyperX drives in the 4KB random read test, all the SSDs are maintaining reasonably consistent performance as the endurance experiment progresses. Even the Samsung 840 Series shows no ill effects.
These tests were conducted with the SSDs connected to the same SATA port in the same system. The drives were secure-erased before testing, giving us a nice apples-to-apples comparison.
We also have performance data from the endurance test itself. These numbers track the speed of each loop, which writes about 190GB to the drives. The results are somewhat less reliable, because the endurance test is running simultaneously on six drives split between two test machines. The Corsair, Intel, and Samsung SSDs are connected to 6Gbps SATA ports, while the Kingston drives are limited to 3Gbps connectivity. Keeping those caveats in mind, we can still get a sense of how each SSD's write speed changes over the course of the experiment.
The Samsung 840 Pro's write speed spiked dramatically in the first run after our 200TB check-up. Since we secure-erase the drives after each threshold, that result isn't unexpected. Performance typically increases after a secure erase, and some of the other SSDs exhibit similar behavior. The 840 Pro spiked higher than it did previously because it only wrote 145GB during that first run. There's no indication of why the Anvil test stopped short of the prescribed 190GB, and there were no issues with subsequent runs. The 840 Pro's SMART attributes don't report any errors or programming failures, either.
Apart from that outlier, there's little change from our post-200TB results. All the SSDs are running the endurance test at about the same speed as they were at the last milestone.
Lessons learned so far
The most important thing to take away from our experiment is that modern SSDs can survive an awful lot of writes without issue. We're up to 300TB, and all the drives remain functional. The MLC-based models are holding up nicely, with only a handful of bad blocks between them. The TLC NAND in the Samsung 840 Series is degrading much faster, which we expected given the flash's higher bit density. However, the drive still has plenty of overprovisioned spare area in reserve. And, like the other SSDs, the 840 Series has maintained largely consistent performance overall.
Only a couple of the SSDs have published endurance specifications, and we've already blown past those figures. The Kingston HyperX 3K is rated for only 192TB of total writes, while the Intel 335 Series is good for 20GB of "typical client" writes per day for three years, or just 22TB overall. We've also far exceeded the volume of writes I'd expect my own SSD to endure over its useful lifetime. The solid-state system drive in my primary desktop has logged a mere 1.3TB of writes since I installed it 18 months ago.
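The arithmetic behind those ratings is easy to check. The sketch below assumes the Intel figure means 20GB of writes per day, which is the reading that produces the quoted 22TB, and uses decimal units (1TB = 1000GB). The variable names are ours.

```python
# Intel 335 Series rating, read as 20GB of writes per day for three years.
GB_PER_DAY = 20
DAYS_PER_YEAR = 365
RATED_YEARS = 3

intel_rated_tb = GB_PER_DAY * DAYS_PER_YEAR * RATED_YEARS / 1000  # decimal TB
kingston_rated_tb = 192  # Kingston HyperX 3K total-writes rating
written_tb = 300         # writes logged so far in the experiment

for name, rated_tb in [("Intel 335 Series", intel_rated_tb),
                       ("Kingston HyperX 3K", kingston_rated_tb)]:
    ratio = written_tb / rated_tb
    print(f"{name}: rated for {rated_tb:.1f}TB, exceeded by {ratio:.1f}x")
```

The Intel rating works out to 21.9TB, which rounds to the 22TB cited above.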
To be fair, our endurance experiment has lower write amplification than typical client workloads. Anvil's test is composed almost entirely of sequential writes, while real-world desktop activity involves a lot of random I/O. There isn't a whole lot of data on the typical write amplification for client workloads, but everything I've seen and heard from SSD makers suggests a multiplication factor below 10X. If we take my personal usage patterns as an example and use 10X write amplification as a worst-case scenario, it would take nearly 35 years to write 300TB to the flash.
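Here's a quick sketch of how that estimate falls out of the numbers above: 1.3TB of host writes in 18 months, multiplied by a worst-case 10X write amplification. The function name and parameters are ours, and the model assumes the write rate stays constant over time.

```python
def years_to_write(target_tb, host_tb_logged, months_elapsed, write_amplification):
    """Estimate how many years it takes to push target_tb of writes to the flash."""
    host_tb_per_year = host_tb_logged / (months_elapsed / 12)
    # Write amplification scales host writes up to actual flash writes.
    flash_tb_per_year = host_tb_per_year * write_amplification
    return target_tb / flash_tb_per_year


years = years_to_write(target_tb=300, host_tb_logged=1.3,
                       months_elapsed=18, write_amplification=10)
print(f"{years:.1f} years to reach 300TB of flash writes")
```

Dropping the amplification factor lengthens the timeline proportionally, so 10X really is the pessimistic end of the estimate.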
So, yeah, that's why we're not using real-world I/O in our endurance experiment. We wouldn't be able to get results within a reasonable timeframe.
The data we've collected suggests that modern SSDs can easily survive many years of typical desktop use. Even TLC-based offerings should have more than enough endurance to handle what the vast majority of consumers will throw at them. That said, mounting flash failures appear to be responsible for the data integrity errors we encountered on the 840 Series. I would have no qualms about using TLC-based SSDs in my own systems, but I would check the SMART attributes periodically to keep an eye out for reallocated sectors. If those start piling up, it's a good idea to replace the drive. As we saw with the 840 Series, error correction can't necessarily keep up as flash failures accelerate.
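As a rough illustration of that kind of periodic check, the sketch below compares two SMART snapshots and flags counters that have grown. It assumes the text is laid out like the attribute table printed by smartmontools' `smartctl -A`. The attribute IDs (5 for reallocated sectors, 187 for uncorrectable errors) are common choices but vary by vendor, and the sample snapshots are invented for the example apart from the 0-to-172 jump in uncorrectable errors reported above.

```python
# Attribute IDs to watch. These are common assignments, not universal ones;
# check them against what your own drive actually reports.
WATCHED_IDS = {5: "reallocated sectors", 187: "uncorrectable errors"}


def parse_raw_values(report):
    """Map attribute ID -> raw value from a smartctl-style attribute table."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows have ten columns: a numeric ID first, the raw value last.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def growing_counters(before, after):
    """Return {label: (old, new)} for watched counters that have increased."""
    old, new = parse_raw_values(before), parse_raw_values(after)
    return {
        label: (old.get(attr_id, 0), new[attr_id])
        for attr_id, label in WATCHED_IDS.items()
        if attr_id in new and new[attr_id] > old.get(attr_id, 0)
    }


HEADER = "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE\n"

# Hypothetical snapshots; the reallocated sector counts are made up.
BEFORE = HEADER + (
    "  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 40\n"
    "187 Reported_Uncorrect    0x0032 100 100 000 Old_age  Always - 0\n"
)
AFTER = HEADER + (
    "  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 55\n"
    "187 Reported_Uncorrect    0x0032 100 100 000 Old_age  Always - 172\n"
)

changes = growing_counters(BEFORE, AFTER)
for label, (old_value, new_value) in changes.items():
    print(f"{label}: {old_value} -> {new_value}")
```

Saving a snapshot every few weeks and comparing it with the previous one is enough to spot a drive whose error counts are starting to climb.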
From the beginning, we knew the 840 Series would be at a disadvantage versus its MLC-based rivals. The results bear that out, and they indicate we probably have a long way to go before the other SSDs start to falter. That's good news overall, but it means there's much more writing to do. Stay tuned.