I am running out of ways to introduce our SSD Endurance Experiment. This long-term write endurance test began in August, and we’ve published numerous updates since. Now that our subjects have crossed the 500TB mark, it’s time for another checkup.
The rationale for our endurance test hasn’t changed, which is why these intros tend to channel the same theme. Solid-state drives use flash memory that has limited write endurance. Every time data is written, the physical structure of the NAND cells degrades. The cells eventually erode to the point where they become unusable, forcing SSDs to poach replacement blocks from their overprovisioned spare areas.
This dynamic raises several questions. What happens when drives run out of overprovisioned area? How long does it take? And do they slow down along the way? We’re seeking answers in our endurance experiment, which is subjecting a collection of drives—the Corsair Neutron GTX 240GB, Intel 335 Series 240GB, Kingston HyperX 3K 240GB, Samsung 840 Series 250GB, and Samsung 840 Pro 256GB—to a merciless onslaught of writes.
Our introductory article explains the finer details of the experiment, so I won’t rehash them here. Our approach is pretty straightforward, though. We’re using the endurance test built into Anvil’s Storage Utilities to write a series of incompressible files to each SSD. We’re also writing compressible data to a second HyperX unit to test the impact of SandForce’s write compression technology. As the experiment progresses, we’re monitoring the health and performance of each drive at regular intervals.
All but one of our test subjects are based on two-bit MLC NAND. The Samsung 840 Series has three-bit TLC flash, and that puts it at a distinct disadvantage versus the others. TLC NAND’s higher bit density increases the storage capacity of the cells, but it also makes the flash more sensitive to wear. Flash memory stores data using a range of voltages that narrows as the cells degrade. TLC NAND has to differentiate between eight possible values within that narrowing window, which is more difficult than tracking the four values required by the MLC alternative. (Likewise, MLC flash has lower endurance than one-bit SLC NAND.)
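The eight-versus-four comparison comes straight from the bit counts: a cell that stores n bits has to resolve 2^n distinct values. Here is a minimal Python sketch of that relationship. The function name is ours, chosen purely for illustration.

```python
def values_per_cell(bits_per_cell: int) -> int:
    """Number of distinct values a cell must resolve to store the given bits."""
    if bits_per_cell < 1:
        raise ValueError("a cell must store at least one bit")
    return 2 ** bits_per_cell


# Flash types mentioned in the article and their bits per cell.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    print(f"{name}: {bits} bit(s) per cell, {values_per_cell(bits)} values to tell apart")
```

Each extra bit doubles the number of values that must fit inside the same shrinking voltage window.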
We’re keeping an eye on flash health using several methods. So far, the best one seems to be monitoring the raw SMART data reported by each drive. One of the SMART attributes counts the number of sectors that have been reallocated from the spare area to replace retired flash from the user-accessible storage. The reallocated sector count is a death toll of sorts, and it lets us highlight TLC NAND’s more limited endurance rather neatly. The following graph depicts the number of reallocated sectors for each drive over the course of the experiment thus far.
The 840 Series reported its first reallocated sectors after 100TB of writes, and it’s been burning through flash steadily ever since. After 500TB of writes, the 840 Series is up to 1722 reallocated sectors. Meanwhile, the other SSDs have only a handful of flash failures between them. And two of the drives, the Neutron GTX and the HyperX 3K being tested with compressible data, haven’t logged a single reallocated sector.
Samsung won’t confirm the size of the 840 Series’ sectors, but we’re pretty sure it’s 1.5MB. That means the drive has lost 2.5GB of its total flash capacity already. Fortunately, those flash failures haven’t affected the amount of user-accessible capacity. The 840 Series has extra overprovisioned spare area specifically to offset the lower endurance of its TLC NAND. So far, at least, those flash reserves seem to be sufficient.
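For anyone who wants to check the math, the estimate is simple multiplication. This sketch assumes our unconfirmed 1.5MB sector size and binary units (1GB = 1024MB). Samsung has verified neither, and the names are illustrative only.

```python
REALLOCATED_SECTORS = 1722   # 840 Series count after 500TB of writes
ASSUMED_SECTOR_MB = 1.5      # our estimate; Samsung has not confirmed it


def retired_flash_gb(sectors: int, sector_mb: float, mb_per_gb: int = 1024) -> float:
    """Flash capacity lost to reallocated sectors, in gigabytes."""
    return sectors * sector_mb / mb_per_gb


lost = retired_flash_gb(REALLOCATED_SECTORS, ASSUMED_SECTOR_MB)
print(f"Estimated retired flash: {lost:.1f}GB")
```

If the real sector size turns out to be different, the estimate scales linearly with it.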
Although the 840 Series is clearly in worse shape than the competition, these results need to be put into context. 500TB works out to roughly 140GB of writes per day for 10 years. That’s an insane amount even for power users, and it far exceeds the endurance specifications of our candidates. The HyperX 3K, which has the most generous endurance rating of the bunch, is guaranteed for 192TB of writes.
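The per-day figure is easy to reproduce. The sketch below assumes a terabyte of 1024GB and 365-day years, which is the combination that lands on 140GB. With decimal terabytes, the result is closer to 137GB. The function and parameter names are ours.

```python
def daily_writes_gb(total_tb: float, years: float, gb_per_tb: int = 1024) -> float:
    """Average writes per day, in gigabytes, needed to reach total_tb over the given years."""
    days = years * 365  # leap days ignored for simplicity
    return total_tb * gb_per_tb / days


# Writes logged so far, spread over a decade.
print(f"500TB over 10 years: {daily_writes_gb(500, 10):.0f}GB per day")

# The HyperX 3K's 192TB endurance rating on the same 10-year basis.
print(f"192TB over 10 years: {daily_writes_gb(192, 10):.0f}GB per day")
```

Either way, the daily load is well beyond what typical desktop usage generates.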
To be fair, we should note that the 840 Series has another blemish on its record. The drive failed several hash checks during the setup process for the unpowered retention test we performed after 300TB of writes. The 840 Series ultimately passed the retention test, but its SMART attributes logged a spate of unrecoverable errors that likely caused the hash failures. In a real-world setting, unrecoverable errors could result in data corruption or even a system crash.
Worryingly, Samsung’s Magician utility seems unaware of the 840 Series’ degrading condition. The software can access the SMART data, but its main interface still proclaims that the drive is in “good” health. The 840 Pro, which has recorded only two reallocated sectors and no unrecoverable errors, has the same “good” health rating.
Third-party software like Hard Disk Sentinel doesn’t necessarily do a better job of monitoring drive health, either. We’re using the app to read the SMART data on each drive, and it has a separate health indicator of its own. The thing is, that value seems to be based on different attributes for each drive. We’re getting wildly different ratings for units that otherwise appear to be in similar condition. HD Sentinel also gives the 840 Series and 840 Pro identical 1% ratings, so it’s hard to take the health indicator seriously.
Now that our health checkup is complete, it’s time to look at performance.
We benchmarked all the SSDs before we began our endurance experiment, and we’ve gathered more performance data at every milestone since. It’s important to note that these tests are far from exhaustive. Our in-depth SSD reviews are a much better resource for comparative performance data. Our goal here is to determine how each SSD’s benchmark scores change as the writes add up.
Despite a few hiccups, our subjects have maintained largely consistent performance throughout the experiment. I can’t explain the higher random read scores for the Kingston and Intel drives earlier on, but it’s worth noting that those drives are all based on SandForce controller tech. They seem to be back to normal now.
Despite obvious signs of flash wear, the 840 Series has shown no signs of weakness in our performance tests. Its 840 Pro sibling stumbled during the last round, though. The drive’s sequential write rate varied more than usual from one run to the next. We extended the test session from three to five runs, but the median speed was ultimately lower than at previous milestones.
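We report the median because a single slow or fast run shouldn't swing the result. The sketch below shows how extending a session from three runs to five can shift that median. The speeds are made-up placeholders, not our measured data, and the function name is illustrative.

```python
from statistics import median


def median_speed(runs_mb_s: list[float]) -> float:
    """Median of a set of benchmark runs, in MB/s."""
    if not runs_mb_s:
        raise ValueError("need at least one run")
    return median(runs_mb_s)


# Placeholder values for illustration only; these are NOT results from the article.
three_runs = [480.0, 495.0, 410.0]
five_runs = three_runs + [400.0, 420.0]

print(f"median of three runs: {median_speed(three_runs)} MB/s")
print(f"median of five runs:  {median_speed(five_runs)} MB/s")
```

When the extra runs cluster at the low end, the median drops with them, which is the pattern we saw with the 840 Pro.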
The 840 Pro came close to its peak sequential write speed in a couple of test runs, so the recent variability may be a temporary anomaly. However, we have additional data suggesting that the 840 Pro’s write speed may be slowing slightly.
Unlike our first batch of performance results, which were obtained on the same system after secure-erasing each drive, the next set comes from the endurance test itself. Anvil’s utility lets us calculate the write speed of each loop that loads the drives with random data. This test runs simultaneously on six drives split between two separate systems (and between 3Gbps SATA ports for the HyperX drives and 6Gbps ones for the others), so the data isn’t useful for apples-to-apples comparisons. However, it does provide a long-term look at how each drive handles this particular write workload.
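The per-loop calculation boils down to bytes written divided by elapsed time. We don't know how Anvil's utility handles this internally, so treat the following as a sketch of the arithmetic under our own assumptions (decimal megabytes, hypothetical function name and inputs) rather than a description of the tool.

```python
def loop_write_speed_mb_s(bytes_written: int, elapsed_seconds: float) -> float:
    """Average write speed for one endurance-test loop, in MB/s (1MB = 1,000,000 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_written / elapsed_seconds / 1_000_000


# Hypothetical loop: 190GB written in 1000 seconds. Not a measured value.
print(f"{loop_write_speed_mb_s(190_000_000_000, 1000):.1f} MB/s")
```

Because each loop yields one average, short-lived dips and bursts within a loop are smoothed out. The variation shows up from loop to loop instead.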
From the beginning, the 840 Pro’s average write speed in the endurance test has been the most erratic of the bunch. The other drives exhibit fluctuating speeds from one run to the next, too, but the amplitude of those oscillations has been substantially lower overall. Don’t worry about the occasional performance spikes exhibited by some of the SSDs; those outliers crop up because we secure-erase the drives at every milestone.
Now, look at what happens to the yellow line after the last spike. Note that the highs and lows are slightly lower than they were earlier in the experiment. Hmmm.
Since the 840 Pro’s write speeds in the endurance test have bounced around since we started the experiment, I’m hesitant to draw any firm conclusions about the recent reduction. The 840 Pro definitely exhibits inconsistency with this particular write workload, but we’ve seen it deliver strong all-around performance in a wide variety of benchmarks, so the variability isn’t necessarily a concern on its own. We should have a better sense of what’s going on with the 840 Pro as the experiment pushes past 600TB.
Before signing off until our next update, I need to take care of a little housekeeping. Between 300TB and 400TB of writes, the test rig hosting the Samsung drives and the compressible HyperX config crashed without warning. The event log reported an unexpected loss of connection to the system drive, and that disconnect seems to have caused the crash.
The system drive is a Corsair Force GT 60GB unit left over from an SSD performance scaling article we published nearly two years ago. It does little more than host the operating system for our test rigs, and it’s only written a few terabytes in its lifetime. The drive’s SMART attributes show neither reallocated sectors nor unrecoverable errors, so flash wear appears unrelated to the premature disconnect.
Around the time of the crash, everything about the system seemed fine. The SATA cable was attached, the PSU was pumping out the correct voltages, and the endurance test had been running without issue for days. SandForce-based SSDs of the Force GT’s vintage do have somewhat of a reputation for being finicky, though. To avoid future problems, we’ve replaced the drive with an Intel 510 Series SSD. The machine hasn’t suffered any disconnects or crashes since, and testing is proceeding smoothly.
As I type this, our subjects are already well on their way to 600TB. We have another data retention test planned for that milestone, so stay tuned.