
Two keep on truckin'
The Samsung 840 Pro and second Kingston HyperX 3K both reached 1.5PB with little drama. They also completed another unpowered retention test. After writing 1.5PB, the drives were loaded with a 200GB test file and then left unplugged for over a week. Both subsequently passed the MD5 hash check we use to verify data integrity.
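For readers who want to run a similar retention check at home, here is a minimal Python sketch of the verification step. It assumes the reference digest was recorded right after the test file was written, before the drive was unplugged. The function names are hypothetical, and this is not the exact tool used in the experiment. Hashing in chunks means a 200GB file never has to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def passes_retention_check(path: str, expected_md5: str) -> bool:
    """True if the file read back after the unpowered period matches the recorded digest."""
    return md5_of_file(path) == expected_md5.lower()
```

Record `md5_of_file()` once the test file is written, then call `passes_retention_check()` after the drive is plugged back in. MD5 is adequate for catching accidental corruption like this, though it shouldn't be relied on where deliberate tampering is a concern.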

A second hash check is integrated into Anvil's Storage Utilities, the application we use to write data to the drives. This test is configured to verify a smaller 720MB file after roughly every terabyte of writes, and there haven't been any inconsistencies yet.
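Anvil's utility handles this check internally, and we don't know exactly how it schedules it. The Python sketch below only illustrates the general idea: keep a reference digest of a small file, count the bytes written, and re-hash the file each time another interval (1TB by default here) has passed. The `PeriodicVerifier` class and its `read_back` callback are hypothetical.

```python
import hashlib
from typing import Callable

TB = 10**12  # decimal terabyte, in bytes


class PeriodicVerifier:
    """Hypothetical sketch: re-verify a reference file after every `interval` bytes of writes."""

    def __init__(self, reference: bytes, interval: int = TB) -> None:
        self.expected = hashlib.md5(reference).hexdigest()
        self.interval = interval
        self.pending = 0      # bytes written since the last check
        self.checks = 0
        self.mismatches = 0

    def record_writes(self, nbytes: int, read_back: Callable[[], bytes]) -> None:
        """Add nbytes to the running total and run one check per full interval crossed."""
        self.pending += nbytes
        while self.pending >= self.interval:
            self.pending -= self.interval
            self.checks += 1
            # Compare the freshly read copy against the digest taken at the start.
            if hashlib.md5(read_back()).hexdigest() != self.expected:
                self.mismatches += 1
```

A non-zero `mismatches` count would correspond to the kind of inconsistency the test hasn't reported so far.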

Let's examine the survivors in greater detail, starting with the 840 Pro, which continues to accumulate reallocated sectors.

The burn rate has slowed slightly since the initial uptick, but over 3400 sectors have been compromised so far. At 1.5MB each, that's about 5GB of flash lost to cell degradation.
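The capacity figure is simple multiplication. Here's a small Python helper, assuming the 1.5MB-per-sector figure above and decimal units (1GB = 1000MB); the function name is ours:

```python
def flash_lost_gb(reallocated_sectors: int, sector_size_mb: float = 1.5) -> float:
    """Flash retired by reallocation, in decimal gigabytes."""
    return reallocated_sectors * sector_size_mb / 1000


print(flash_lost_gb(3400))  # 5.1
```

Since the count is "over 3400," the 5.1GB result is a floor rather than an exact total.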

According to the SMART data, less than 40% of the flash reserves have been consumed. There's still plenty on tap to cover future failures.

The wear leveling count is supposed to reflect drive health, but it bottomed out after just 500TB, and the 840 Pro has been fine through a petabyte of writes since. The health indicator in Samsung's SSD Magician utility software has given the drive a "good" rating since the beginning of the experiment, which seems like a more accurate assessment. Then again, the same utility gave the 840 Series a clean bill of health even after the drive had suffered hundreds of uncorrectable errors.

Practical limits restrict our experiment to one example of each SSD, but we have two HyperX 3K drives. One was tested like all the others, with randomized data that can't be compressed by the DuraWrite mojo in SandForce controllers. The other has been getting a lighter diet based on the Anvil utility's 46% incompressible setting. You can probably guess which one is still alive.

We can measure the effectiveness of SandForce's compression scheme by tracking host writes, which come from the system, and compressed writes, which are committed to the NAND. The host writes are identical for both HyperX configs, but the compressed writes are not.

The HyperX 3K writes much less to the flash with the partially compressible payload: 1.5PB of host writes translates to only 1.07PB of compressed writes. On the drive fed incompressible data, compressed writes are slightly higher than host writes due to write amplification.
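Dividing the compressed (NAND) writes by the host writes gives a rough measure of how much DuraWrite saved on the partially compressible drive. Here's a minimal Python sketch using the two totals above; the function name is ours, not a SandForce or Kingston metric:

```python
def nand_to_host_ratio(host_writes_pb: float, nand_writes_pb: float) -> float:
    """Compressed (NAND) writes per unit of host writes; below 1.0 means compression won."""
    return nand_writes_pb / host_writes_pb


ratio = nand_to_host_ratio(1.5, 1.07)
print(f"{ratio:.3f}")      # 0.713
print(f"{1 - ratio:.1%}")  # 28.7%
```

On the drive fed incompressible data, the same ratio lands slightly above 1.0, which is the write amplification showing through.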

(The sequential transfers that dominate the endurance test have relatively low amplification, at least compared to the more random workloads typical of client systems. DuraWrite's effectiveness in this particular scenario isn't necessarily indicative of how the scheme will perform with other workloads.)

If compression were the only factor in the remaining HyperX's survival, the drive would have hit the wall around 1.1PB, when it reached the same volume of compressed writes that crippled its twin. The built-in health indicator even suggested the end was coming around that mark.
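A back-of-the-envelope version of this projection runs the compression ratio in reverse: divide the compressed-write volume at which the twin failed by the NAND-to-host ratio to get the host writes needed to reach it. The Python sketch below assumes the ratio stayed constant at the 1.07PB/1.5PB average for the whole run, which is a simplification. The function name is hypothetical, and the 0.8PB input is purely illustrative, not the twin's measured total.

```python
def host_writes_at_nand_limit(nand_limit_pb: float, nand_to_host_ratio: float) -> float:
    """Host writes (PB) needed before NAND writes reach nand_limit_pb, assuming a constant ratio."""
    if nand_to_host_ratio <= 0:
        raise ValueError("ratio must be positive")
    return nand_limit_pb / nand_to_host_ratio


ratio = 1.07 / 1.5  # average NAND-to-host ratio of the surviving HyperX
# 0.8PB is an illustrative input, not a measured failure point.
print(f"{host_writes_at_nand_limit(0.8, ratio):.2f}")  # 1.12
```

Feeding in the twin's actual compressed-write total at failure should land near the 1.1PB mark, give or take however much the ratio drifted over the run.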

But the flash in this particular SSD has proven surprisingly resilient. Just 12 sectors have been reallocated through 1.5PB, a far cry from the thousands accrued by the other HyperX.

Our sample size isn't large enough to confirm which result is the outlier. Chip-to-chip variance is common in semiconductor manufacturing, though. Some dies are simply better than others, whether it's the clock speeds that CPUs can attain or the write/erase cycles that NAND can survive.

The two HyperX SSDs arrived at the same time, and we used the highly scientific "eeny, meeny, miny, moe" method to determine which one got the partly compressible workload. If that drive also had a few cherry chips under the hood, it got lucky twice—and should probably buy a lottery ticket.

Digging deeper into the SMART data reveals that the surviving HyperX hasn't been entirely flawless.

We didn't notice it at the time, but the drive reported two uncorrectable errors between 900TB and 1PB of writes. Those episodes occurred during the same span as the first two reallocated sectors, though we can't know for sure whether the two are related. In any case, uncorrectable errors are very serious. They can corrupt data, crash applications, and even bring down entire systems.

The program and erase failures aren't as critical. In those cases, the drive should be able to move on to another sector without risking the user's data. Performance may suffer, but only momentarily.

Speaking of performance, the next page explores whether any of the SSDs lost a step over the last stretch.