
More casualties
There are actually two Kingston HyperX 3K SSDs in the experiment. One is being tested like all the others, with 100% incompressible data that's immune to SandForce's DuraWrite compression mojo. The second HyperX is identical to the first, but it's getting a stream of compressible data via Anvil's "applications" preset.

The HyperX's SMART attributes log host and flash writes separately, giving us a glimpse of DuraWrite in action. After 700TB of writes from the host, the incompressible HyperX config showed 738TB of flash writes, while its compressible sidekick indicated only 501TB.
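To put those numbers in perspective, here's a quick sketch of the arithmetic: dividing flash writes by host writes gives a rough write-amplification figure for each config. The variable and function names are ours, chosen purely for illustration, and the only inputs are the totals quoted above.

```python
# Totals quoted in the article for the two HyperX 3K configs, in terabytes.
HOST_WRITES_TB = 700
FLASH_WRITES_TB = {
    "incompressible": 738,
    "compressible": 501,
}


def flash_to_host_ratio(flash_tb, host_tb):
    """Return flash writes divided by host writes.

    Values above 1.0 mean the drive wrote more to the NAND than the host
    sent; values below 1.0 mean compression reduced what hit the flash.
    """
    return flash_tb / host_tb


ratios = {
    name: flash_to_host_ratio(flash_tb, HOST_WRITES_TB)
    for name, flash_tb in FLASH_WRITES_TB.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} flash TB written per host TB")
```

By this measure, the incompressible drive wrote roughly 5% more to the flash than the host requested, while the compressible one wrote nearly 30% less.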

As one might expect, the imminent-failure warning came from the incompressible drive. The warning was displayed by both HD Sentinel and the Intel storage driver used on our test systems. Then, after 725TB of writes, we got another cautionary message, this time from the OS. "Windows detected a hard disk problem," read the dialog box, "Back up your files immediately to prevent information loss." 3TB later, Anvil started reporting write errors. The drive was still accessible, and we were able to dump one last batch of SMART data, but it bricked after a reboot.

On the HyperX 3K, the SSD life left attribute tracks flash wear. Like Intel's media wearout indicator, it counts down from 100 and is tied directly to the rated lifespan of the NAND.

When this attribute reaches 10, the flash's specified endurance has been exhausted, and the SMART warning is triggered. Kingston urges users to back up their data and move to a new SSD at this point. The firm describes the SMART message as being similar to the warning light on a car's gas gauge. There's still some fuel in the system when the light comes on, but you should pull over at the next opportunity to fill up.

The HyperX is designed to keep writing for as long as the NAND is viable, regardless of its rated endurance. Flash blocks are only retired if there's a programming failure, an erase failure, or if the acceptable ECC tolerance has been exceeded.

Programming and erase failures are logged by separate SMART attributes, and they really ramped up toward the end of the drive's life. So did the number of reallocated sectors. By the end, there were 986 reallocated sectors, 111 programming failures, and 381 erase failures. Those figures suggest about half of the retired sectors were taken out of commission due to ECC issues.
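The "about half" estimate comes from simple subtraction, sketched below. It rests on an assumption we should flag: that each logged programming or erase failure retired exactly one sector, and that everything left over was retired for exceeding the ECC tolerance. The SMART data doesn't confirm that one-to-one mapping, so treat the result as a ballpark figure.

```python
# Final SMART counts quoted in the article for the incompressible HyperX 3K.
reallocated_sectors = 986
program_failures = 111
erase_failures = 381

# Assumption: one retired sector per program/erase failure, with the
# remainder attributed to ECC tolerance being exceeded.
ecc_retirements = reallocated_sectors - program_failures - erase_failures
ecc_share = ecc_retirements / reallocated_sectors

print(f"Sectors presumably retired for ECC reasons: {ecc_retirements}")
print(f"Share of all reallocated sectors: {ecc_share:.1%}")
```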

The HyperX 3K has loads of overprovisioned area, but sections of it are reserved for internal management routines and for RAISE, the RAID-like redundancy feature available in SandForce SSDs. Only a small portion is dedicated to "spare" blocks that can fill in for reallocated sectors. Once this extra NAND is consumed, the HyperX is finished. Kingston says the drive will fail to mount if the power cycles, which explains why ours wasn't detected after a reboot.

Unlike the 335 Series, which checked out on its own terms, the HyperX appears to have failed after burning through all of the NAND available for writes. We still received multiple warnings before the failure, and there was additional write headroom after each one. A normal user would have had plenty of time to prepare for the failure.

Our third casualty was the Samsung 840 Series, which we expected to fail first due to the shorter theoretical lifespan of TLC NAND. Our accumulated SMART data supported that assumption, too. The 840 Series started logging reallocated sectors after only 200TB of writes, and it's reported thousands of them since our experiment began—far more than any other SSD. However, the 840 Series also allocates more spare area to replace bad blocks, so it's tuned with the TLC's relative frailty in mind.

When we checked on the SSDs after 900TB of writes, the 840 Series was still functional, and Samsung's own SSD Magician software gave it a clean bill of health. The 840 Series didn't make it to a petabyte, though. It died suddenly in the last leg, without any preceding SMART warnings.

We're not entirely sure what caused the failure. The Anvil utility crashed, and the drive disappeared from not only the Windows device and disk managers, but also from the SSD Magician and HD Sentinel utilities. The Intel storage driver detected the 840 Series as an unnamed Samsung SATA drive, but we couldn't actually do anything with it. We weren't even able to grab a log of the last batch of writes or a final accounting of the SMART status. We can, however, analyze the SMART data collected up to 900TB.

The wear-leveling count is sort of like the MWI and life-left attributes on the Intel and Kingston SSDs. It's "directly related to [the] lifetime of the SSD," according to Samsung, and it bottomed out after 300TB of writes. HD Sentinel bases its health estimate on this attribute, so it's had a dim assessment of the 840 Series since the 300TB mark. But Samsung's own software pronounced the drive in good health after 300TB, as it did at every subsequent milestone.

The SMART attributes also track how much of the 840 Series' spare block reserve has been consumed by reallocated sectors. That attribute suggested there were plenty of spare blocks at the 900TB mark, so the flash's mortality rate would have to have spiked dramatically for insufficient reserves to cause the eventual failure. Without SMART details from the time of death, we can't be certain about what happened. We can, however, quantify the reallocated sectors along with another important attribute: uncorrectable errors.

Uncorrectable errors can compromise data integrity and potentially cause application or system crashes, so they're kind of a big deal. The first bunch appeared after 300TB of writes, apparently during preparation for our first unpowered retention test. The 200GB file we use to check data integrity failed multiple initial hash checks and had to be recopied before proceeding. Although the 840 Series ultimately passed the retention test and a similar one after 600TB of writes, the uncorrectable errors put a mark on its permanent record.

Between 800 and 900TB of writes, the 840 Series logged 119 more uncorrectable errors, bringing the total to 295. Anvil didn't report any hash failures during that period, but we have its built-in integrity test set to run relatively infrequently (after each 1TB of writes) and on a 700MB file that covers only a small portion of the flash. Regardless of whether the last spate of uncorrectable errors resulted in incorrect data, it's probably no coincidence the 840 Series died shortly after.
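A little arithmetic shows how the error tally breaks down and how sparse the integrity checks are. The sketch below assumes decimal units (1TB = 1,000,000MB), and it compares the size of the test file to the volume of data written between checks. That's a stand-in for check frequency, not a measure of how much of the flash the file occupies.

```python
# Uncorrectable-error counts quoted in the article for the 840 Series.
total_errors = 295
errors_800_to_900tb = 119

# Errors that must have been logged before the 800TB mark.
errors_before_800tb = total_errors - errors_800_to_900tb

# Integrity test settings from the article, assuming decimal units.
test_file_mb = 700
writes_between_checks_mb = 1_000_000  # 1TB

# Fraction of the data volume that gets hash-checked per interval.
checked_fraction = test_file_mb / writes_between_checks_mb

print(f"Errors logged before 800TB: {errors_before_800tb}")
print(f"Data verified per 1TB written: {checked_fraction:.2%}")
```

With only 0.07% of each terabyte's worth of writes being verified, it's easy to see how corrupted data could slip past without Anvil noticing.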

When we kicked off this experiment, Samsung told us to expect warning messages before the 840 Series' demise. Failure would resemble a compatibility error, the company said, and it could manifest in a BSOD or "other failure notice." Since we didn't get any warnings or failure messages, something may have gone awry at the end of the line. The 840 Series' lifeless body is being returned home for further analysis, which we hope will shed light on the drive's final moments.

All these casualties are bumming me out, so let's turn our attention to the survivors...