The SSD Endurance Experiment: 500TB update

I am running out of ways to introduce our SSD Endurance Experiment. This long-term write endurance test began in August, and we’ve published numerous updates since. Now that our subjects have crossed the 500TB mark, it’s time for another checkup.

The rationale for our endurance test hasn’t changed, which is why these intros tend to channel the same theme. Solid-state drives use flash memory that has limited write endurance. Every time data is written, the physical structure of the NAND cells degrades. The cells eventually erode to the point where they become unusable, forcing SSDs to poach replacement blocks from their overprovisioned spare areas.

This dynamic raises several questions. What happens when drives run out of overprovisioned area? How long does it take? And do they slow down along the way? We’re seeking answers in our endurance experiment, which is subjecting a collection of drives to a merciless onslaught of writes: the Corsair Neutron GTX 240GB, Intel 335 Series 240GB, Kingston HyperX 3K 240GB, Samsung 840 Series 250GB, and Samsung 840 Pro 256GB.

Our introductory article explains the finer details of the experiment, so I won’t rehash them here. Our approach is pretty straightforward, though. We’re using the endurance test built into Anvil’s Storage Utilities to write a series of incompressible files to each SSD. We’re also writing compressible data to a second HyperX unit to test the impact of SandForce’s write compression technology. As the experiment progresses, we’re monitoring the health and performance of each drive at regular intervals.
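
For the curious, the basic shape of such a torture loop is easy to sketch. What follows is a simplified illustration rather than the Anvil utility itself; the target path, file size, and loop count are placeholders, and a full-scale run would also want to verify the data it writes and log SMART attributes along the way.

```python
import os
import time

# Hypothetical stand-in for an endurance-test loop. Anvil's Storage Utilities
# does considerably more (varied file sizes, data verification, pauses, etc.).
TARGET_DIR = "E:/endurance"        # placeholder path on the SSD under test
FILE_SIZE = 256 * 1024 * 1024      # 256MB per file, written in 1MB chunks
FILES_PER_LOOP = 16

def write_incompressible_loop(loop_index):
    """Write a batch of random (incompressible) files, then delete them."""
    written = 0
    for i in range(FILES_PER_LOOP):
        path = os.path.join(TARGET_DIR, f"loop{loop_index}_file{i}.bin")
        with open(path, "wb") as f:
            for _ in range(FILE_SIZE // (1024 * 1024)):
                f.write(os.urandom(1024 * 1024))   # random data defeats compression
            f.flush()
            os.fsync(f.fileno())                   # make sure it actually hits the drive
        written += FILE_SIZE
        os.remove(path)                            # free space for the next pass
    return written

if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)
    total, loop = 0, 0
    while total < 500 * 10**12:                    # run until ~500TB has been written
        start = time.time()
        total += write_incompressible_loop(loop)
        rate = FILES_PER_LOOP * FILE_SIZE / (time.time() - start) / 1e6
        print(f"loop {loop}: {total / 1e12:.2f}TB total, {rate:.0f}MB/s")
        loop += 1
```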

All but one of our test subjects are based on two-bit MLC NAND. The Samsung 840 Series has three-bit TLC flash, and that puts it at a distinct disadvantage versus the others. TLC NAND’s higher bit density increases the storage capacity of the cells, but it also makes the flash more sensitive to wear. Flash memory stores data using a range of voltages that narrows as the cells degrade. TLC NAND has to differentiate between eight possible values within that narrowing window, which is more difficult than tracking the four values required by the MLC alternative. (Likewise, MLC flash has lower endurance than one-bit SLC NAND.)
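
The underlying arithmetic is simple: each extra bit per cell doubles the number of charge states the controller has to tell apart inside the same voltage window. A toy illustration, with an arbitrary window value since only the ratios matter:

```python
# Toy illustration: more bits per cell means more voltage states squeezed into
# the same (and shrinking) window, so less margin between adjacent states.
WINDOW = 1.0  # arbitrary units; real thresholds are controller- and NAND-specific

for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3)):
    states = 2 ** bits          # SLC: 2, MLC: 4, TLC: 8
    margin = WINDOW / states    # room available for each state
    print(f"{name}: {states} states, relative margin per state {margin:.3f}")
```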

We’re keeping an eye on flash health using several methods. So far, the best one seems to be monitoring the raw SMART data reported by each drive. One of the SMART attributes counts the number of sectors that have been reallocated from the spare area to replace retired flash from the user-accessible storage. The reallocated sector count is a death toll of sorts, and it lets us highlight TLC NAND’s more limited endurance rather neatly. The following graph depicts the number of reallocated sectors for each drive over the course of the experiment thus far.
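
The raw attribute behind that graph can be pulled with off-the-shelf tools. Here is a rough sketch using smartmontools’ smartctl, assuming the tool is installed and the drive exposes the attribute under the usual Reallocated_Sector_Ct name; naming and raw-value formats vary between vendors and firmware revisions.

```python
import subprocess

# Rough sketch using smartmontools (smartctl must be on the PATH). Attribute
# names and raw-value formats differ between vendors, so treat this as a template.
DEVICE = "/dev/sda"  # placeholder; point this at the SSD under test

def reallocated_sectors(device):
    """Return the raw Reallocated_Sector_Ct value reported by smartctl -A."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        if "Reallocated_Sector_Ct" in line:
            return int(line.split()[-1])   # RAW_VALUE is the last column
    return None

if __name__ == "__main__":
    count = reallocated_sectors(DEVICE)
    if count is None:
        print(f"{DEVICE}: attribute not reported")
    else:
        print(f"{DEVICE}: {count} reallocated sectors")
```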

The 840 Series reported its first reallocated sectors after 100TB of writes, and it’s been burning through flash steadily ever since. After 500TB of writes, the 840 Series is up to 1722 reallocated sectors. Meanwhile, the other SSDs have only a handful of flash failures between them. And two of the drives, the Neutron GTX and the HyperX 3K being tested with compressible data, haven’t logged a single reallocated sector.

Samsung won’t confirm the size of the 840 Series’ sectors, but we’re pretty sure it’s 1.5MB. That means the drive has lost 2.5GB of its total flash capacity already. Fortunately, those flash failures haven’t affected the amount of user-accessible capacity. The 840 Series has extra overprovisioned spare area specifically to offset the lower endurance of its TLC NAND. So far, at least, those flash reserves seem to be sufficient.
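
The capacity math behind that estimate, taking the unconfirmed 1.5MB sector size as a given:

```python
# Back-of-the-envelope capacity loss, assuming 1.5MB per reallocated sector
# (Samsung hasn't confirmed that figure).
reallocated = 1722
sector_mb = 1.5
lost_gb = reallocated * sector_mb / 1024
print(f"~{lost_gb:.1f}GB of flash retired")   # ~2.5GB
```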

Although the 840 Series is clearly in worse shape than the competition, these results need to be put into context. 500TB works out to 140GB of writes per day for 10 years. That’s an insane amount even for power users, and it far exceeds the endurance specifications of our candidates. The HyperX 3K, which has the most generous endurance rating of the bunch, is guaranteed for 192TB of writes.
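
The arithmetic behind that daily figure:

```python
# 500TB spread over a decade of daily use.
total_tb = 500
years = 10
per_day_gb = total_tb * 1000 / (years * 365)
print(f"~{per_day_gb:.0f}GB of writes per day")   # ~137GB; call it 140GB
```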

To be fair, we should note that the 840 Series has another blemish on its record. The drive failed several hash checks during the setup process for the unpowered retention test we performed after 300TB of writes. The 840 Series ultimately passed the retention test, but its SMART attributes logged a spate of unrecoverable errors that likely caused the hash failures. In a real-world setting, unrecoverable errors could result in data corruption or even a system crash.

Worryingly, Samsung’s Magician utility seems unaware of the 840 Series’ degrading condition. The software can access the SMART data, but its main interface still proclaims that the drive is in “good” health. The 840 Pro, which has recorded only two reallocated sectors and no unrecoverable errors, has the same “good” health rating.

Third-party software like Hard Disk Sentinel doesn’t necessarily do a better job of monitoring drive health, either. We’re using the app to read the SMART data on each drive, and it has a separate health indicator of its own. The thing is, that value seems to be based on different attributes for each drive. We’re getting wildly different ratings for units that otherwise appear to be in similar condition. HD Sentinel also gives the 840 Series and 840 Pro identical 1% ratings despite their very different reallocated sector counts, so it’s hard to take the health indicator seriously.

Now that our health checkup is complete, it’s time to look at performance.

Performance

We benchmarked all the SSDs before we began our endurance experiment, and we’ve gathered more performance data at every milestone since. It’s important to note that these tests are far from exhaustive. Our in-depth SSD reviews are a much better resource for comparative performance data. Our goal here is to determine how each SSD’s benchmark scores change as the writes add up.

Despite a few hiccups, our subjects have maintained largely consistent performance throughout the experiment. I can’t explain the higher random read scores for the Kingston and Intel drives earlier on, but it’s worth noting that those drives are all based on SandForce controller tech. They seem to be back to normal now.

Despite obvious flash wear, the 840 Series has shown no weakness in our performance tests. Its 840 Pro sibling stumbled during the last round, though. The drive’s sequential write rate varied more than usual from one run to the next. We extended the test session from three to five runs, but the median speed was ultimately lower than at previous milestones.

The 840 Pro came close to its peak sequential write speed in a couple of test runs, so the recent variability may be a temporary anomaly. However, we have additional data suggesting that the 840 Pro’s write speed may be slowing slightly.

Unlike our first batch of performance results, which were obtained on the same system after secure-erasing each drive, the next set comes from the endurance test itself. Anvil’s utility lets us calculate the write speed of each loop that loads the drives with random data. This test runs simultaneously on six drives split between two separate systems (and between 3Gbps SATA ports for the HyperX drives and 6Gbps ones for the others), so the data isn’t useful for apples-to-apples comparisons. However, it does provide a long-term look at how each drive handles this particular write workload.

From the beginning, the 840 Pro’s average write speed in the endurance test has been the most erratic of the bunch. The other drives exhibit fluctuating speeds from one run to the next, too, but the amplitude of those oscillations has been substantially lower overall. Don’t worry about the occasional performance spikes exhibited by some of the SSDs; those outliers crop up because we secure-erase the drives at every milestone.

Now, look at what happens to the yellow line after the last spike. Note that the highs and lows are slightly lower than they were earlier in the experiment. Hmmm.

Because the 840 Pro’s write speeds in the endurance test have bounced around since we started the experiment, I’m hesitant to draw any firm conclusions about the recent reduction. The 840 Pro definitely exhibits inconsistency with this particular write workload, but we’ve seen it deliver strong all-around performance in a wide variety of benchmarks, so the variability isn’t necessarily a concern on its own. We should have a better sense of what’s going on with the 840 Pro as the experiment pushes past 600TB.

Before signing off until our next update, I need to take care of a little housekeeping. Between 300TB and 400TB of writes, the test rig hosting the Samsung drives and the compressible HyperX config crashed without warning. The event log reported an unexpected loss of connection to the system drive, and that disconnect seems to have caused the crash.

The system drive is a Corsair Force GT 60GB unit left over from an SSD performance scaling article we published nearly two years ago. It does little more than host the operating system for our test rigs, and it’s only written a few terabytes in its lifetime. The drive’s SMART attributes show neither reallocated sectors nor unrecoverable errors, so flash wear appears unrelated to the premature disconnect.

Around the time of the crash, everything about the system seemed fine. The SATA cable was attached, the PSU was pumping out the correct voltages, and the endurance test had been running without issue for days. SandForce-based SSDs of the Force GT’s vintage do have somewhat of a reputation for being finicky, though. To avoid future problems, we’ve replaced the drive with an Intel 510 Series SSD. The machine hasn’t suffered any disconnects or crashes since, and testing is proceeding smoothly.

As I type this, our subjects are already well on their way to 600TB. We have another data retention test planned for that milestone, so stay tuned.

Comments closed
    • LoveIt
    • 6 years ago

    The experiment you guys are doing is unbelievable. Thank you very much; I just had to register on your website.
    How would an SSD perform with multiple Linux VirtualBox guests running on my home PC? Would it kill the SSD quickly? I have Windows 7 Ultimate 64, on which the Linux VirtualBox machines run, and I’m scared to switch to an SSD because of the constant writing from the virtual guests.

    • Chrispy_
    • 6 years ago

    So, although TLC is unlikely to expire in a consumer workload, it is inferior to MLC because it does wear out faster.

    I thought the whole point of TLC was to deliver double the capacity using the same amount of physical NAND, so why aren’t 250GB TLC drives priced closer to 120GB MLC drives? As consumers, we’re still being ripped off, I think.

      • TheEldest
      • 6 years ago

      It’s 50% higher capacity compared to MLC. But most companies will do more overprovisioning than on an MLC drive, so you get closer to 30-40% higher capacity at the same price.

    • Dirge
    • 6 years ago

    This is why I love TR. I am sure many of us have wondered about SSD endurance over time.

    I am interested to see what sort of warning you receive when the drives start failing. Oh and how they handle re-allocated sectors and data integrity.

    • deruberhanyok
    • 6 years ago

    This has been a fantastic series of articles, guys. Keep up the good work!

    • ronch
    • 6 years ago

    Hey guys, don’t worry about your SSDs so much. I got my Samsung 840 EVO 250GB about 15 days ago and it probably has one of the lowest write endurance figures in the world of SSDs, barring cheap, lesser drives from third-rate manufacturers. Am I worried about wearing it out? No. I’m worried more about bricking it after a power failure. And I don’t have a UPS either. Does that worry me? Not so much either. We’ll see just how reliable and durable these things are in real life usage scenarios.

      • Firestarter
      • 6 years ago

      The EVO probably has better write endurance for desktop tasks than the Samsung 840 (not EVO or Pro) that is being tested in this series of articles. The cache is not only meant to increase performance, but also to reduce write amplification by caching the small writes and writing them to the main storage in a single batch.

        • balanarahul
        • 6 years ago

        The 840 is built on 21nm flash and the EVO on 19nm flash, so its endurance might well be similar to the 840’s.

        IMHO the lower capacity SSDs should have higher binned NAND to slightly compensate for the lack of chips for wear levelling.

    • PixelArmy
    • 6 years ago

    Thank you for validating my Neutron GTX purchase! Taking a chance on the LAMD controller seems to have paid off…

    • lilbuddhaman
    • 6 years ago

    Forgive my lack of knowledge (and laziness to research it myself), but what physical/environmental factors would speed up the degradation of these cells? Would a high temperature, high humidity area cause them to fail quicker? How about just age? Do the cells have any measurable degradation when not being used at all?

    I ask as I wonder how a moderately used SSD will behave after it’s been disconnected, sat in a box for a few years, then plugged back in; like I’ve done with old mechanical drives from time to time (that I’ve found when cleaning out the attic).

      • Chrispy_
      • 6 years ago

      NAND failure is caused by oxide degradation each time a high voltage is passed through a cell. As far as I understand, it’s not susceptible to heat, magnetic fields, impact, or any of the obvious weaknesses magnetic storage has, but it’s more susceptible to damage through ESD.

      Humidity and temperature make no difference to NAND, at least not in the ranges that you should have in a computing device. The risk with flash drives is, like most other things, other components corroding or degrading over time, and not really the NAND itself.

        • meerkt
        • 6 years ago

        I don’t have concrete data, but I do think the flash cells are the weakest part in SSDs. ICs and other electronic parts can happily work for many years.

      • meerkt
      • 6 years ago

      I don’t think you should rely on SSDs working after sitting idle for a few years. Data retention, as it’s called, is not something flash is good at over the long term. It’s also rather untested and undocumented so far, at least publicly, as far as I’ve seen.

      The only official thing I know is that according to the JEDEC standard for SSDs, consumer drives, once they’ve exhausted their program/erase cycles, are supposed to retain the data for one year. I don’t know what it is for new drives. Additionally, it appears that newer generation drives are progressively getting worse at data retention.

      Temperature is a factor. The JEDEC standard mentions different data retention figures for different temperatures. If I remember correctly, the 1 year consumer drive retention once P/E cycles are exhausted is for 25C. It’s worse at higher temperatures.

    • boomshine
    • 6 years ago

    Can anyone help me here? I will be building a database server with 4 SSDs in RAID 10 on Windows Server 2012 R2 Standard. Will I have a problem with this kind of setup, since I think TRIM is not supported in RAID 10? Or will the server be fine even with TRIM disabled? Your comments will greatly help me 🙂

      • davidbowser
      • 6 years ago

      If you are using consumer-grade SSDs, then software RAID 10 is the only way you will get TRIM. Make sure SATA is set to AHCI in the BIOS and then use Windows Disk Management to create the RAID 10 set.

      Some Intel chipsets support RAID 0 in hardware, but I don’t think RAID 10 works with TRIM.

      [url<]https://techreport.com/news/23430/intel-brings-trim-support-to-ssd-raid-arrays[/url<]

    • DavidC1
    • 6 years ago

    People like to have a “standard” mindset. They want to associate one tech with one number. So “all” SSDs must be the same, and “all” HDDs must be the same.

    Truth is, like every other product, there are good products and bad ones. But SSDs can be a LOT better than HDDs, even though you might find SSDs that are worse than HDDs (and they are getting better).

    Having no moving parts doesn’t automatically make a NAND chip better than a spinning platter in reliability, or even speed, but there is big POTENTIAL for NAND flash SSDs to be a LOT better than platter HDDs, while the latter are practically at a standstill in reliability and speed.

    • mkk
    • 6 years ago

    Jolly good work. Now further testing is clearly going into the esoteric zone. Personally I’d call it done and start again with a different set of drives some time in the future. Your work is pretty much done here, and it’s been good learning for a lot of us.

    • cheesemon
    • 6 years ago

    De-lurking to say this is definitely my favourite tech experiment out there. It’s answering many long-standing reliability questions I’ve had about SSDs, so kudos to Techreport for tackling this. Looking forward to every update.

    One drive that I wish TR had tested is the SanDisk Extreme II. Since it’s an SLC hybrid, could we assume that it would be more reliable than any of these drives, or is it just a marketing gimmick?

    • indeego
    • 6 years ago

    I absolutely love that the unrelated SSD in the system crashed/hiccuped. Great series of updates!

      • UberGerbil
      • 6 years ago

      I agree. If one who commits a felony is a felon, then the universe is an iron.

    • slaimus
    • 6 years ago

    The conventional wisdom was that as the NAND cells wear out, read performance will drop as it becomes harder and harder to figure out the correct voltage as more electrons are trapped.

    It seems the Samsung TLC just plain drops the cell rather than try to read it several times, as the read performance looks pretty constant.

    • UnfriendlyFire
    • 6 years ago

    How long would it take for the fastest 7,200 RPM HDD to write 500 TB of data?

      • derFunkenstein
      • 6 years ago

      At ~150MB/sec (average of lots of current 7200RPM drives), they can write 150*60*60 = 540,000 MiB/hr, or 527GiB/hr, or .51498 TiB/hr. So like 1000 hours, or 41 2/3 days, right around 6 weeks of nonstop thrashing. That would be all that drive could do, and the PC couldn’t be so busy as to not be able to keep it rolling.
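
      For anyone who wants to rerun that math, here is the same back-of-the-envelope calculation in a few lines; it uses decimal units and takes the ~150MB/s figure above as a given, so the binary-unit version in the comment lands a few days higher, same ballpark:

      ```python
      # Rough time-to-500TB for a hard drive sustaining ~150MB/s.
      rate_mb_s = 150
      total_mb = 500 * 1000 * 1000                 # 500TB expressed in (decimal) MB
      hours = total_mb / rate_mb_s / 3600
      print(f"~{hours:.0f} hours, ~{hours / 24:.0f} days of nonstop writing")  # ~926h, ~39 days
      ```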

        • UberGerbil
        • 6 years ago

        Of course if the process isn’t careful to keep the drive from getting fragmented (i.e. you’re writing files rather than just raw low-level writes) then the actual sustained throughput is going to drop significantly.

    • Cranx
    • 6 years ago

    It seems to me that we should also question just how accurate the reporting is from these drives. It’s plausible the controllers on some of the drives just don’t report reallocated sectors. Hopefully I’m wrong though.

    • ronch
    • 6 years ago

    Given how durable SSDs seem to be as shown by these tests, I think people should actually be worried more about THEIR longevity than the longevity of their SSDs.

    • ronch
    • 6 years ago

    Ok, this series of articles proves how durable SSDs actually are, and there really isn’t much reason to worry about writing to your SSD and wearing it out quickly. But SSDs aren’t actually trouble-free, and they die occasionally, bringing all your data down with them. The most worrying question, perhaps, is what happens when there’s a sudden power failure? Will the SSD get bricked? Will it lose data? I think tests that try to answer this concern should be in the cards.

      • Norphy
      • 6 years ago

      All of that applies to standard spinning rust hard drives as well. Why do you think that SSDs are any different in that regard?

    • Modivated1
    • 6 years ago

    This is a very interesting article! It’s good to see that all the drives have surpassed their warrantied specifications and are still doing well.

    To me this means that even if I bought a cheaper drive, the odds that it would present an issue are next to nil. Only businesses can possibly use up the endurance of these SSDs, and that would be a challenge even for them.

    I wonder whether all the drives will cross the petabyte line? Guess we will have to wait and see.

    • jessterman21
    • 6 years ago

    I’ve been preaching the SSD gospel to my manager in our IT dept for a while now, and thanks to this ongoing test, I’ve got the data to back it up! He’s promised to start ordering Dell laptops with SSDs in the future.

    • meerkt
    • 6 years ago

    Wouldn’t it be more interesting to focus just on retention now?

    • meerkt
    • 6 years ago

    Why does the Corsair just keep on getting faster? 🙂 Plus, no speed change after full erases.

    • Ochadd
    • 6 years ago

    Thanks for the experiment. Really does take away the doubt that lingered about SSD longevity. I’ve got a Vertex 4 256 as my OS and gaming drive that recorded 6.5 TB from December 2012 to December 2013. Looks like I’m good for another 76ish years at least.

    • mesyn191
    • 6 years ago

    I like how you keep arranging the SSD’s in different artsy piles.

    The wear data is great too of course.

    But more creative SSD art piles is what will truly decide if the next article is great.

    IMO.

      • Dissonance
      • 6 years ago

      You may be disappointed by future endurance articles. I’m running out of ways to stack these things 😉 Perhaps I need to start incorporating props. Paging Soundwave.

        • UberGerbil
        • 6 years ago

        Fortunately you’ve got some time to think about it.

        • SomeOtherGeek
        • 6 years ago

        How about stacking them like card towers? You know, leaned one against another and built up. Could be a nice challenge for you.

        Anyway, nice write-up and keep up the good work.

    • tanker27
    • 6 years ago

    [quote<]140GB of writes per day for 10 years[/quote<] That would be some pr0n collection. 😉

      • willmore
      • 6 years ago

      On a tiny 240GB drive? Are you kidding?

        • tanker27
        • 6 years ago

        Well, if your machine uses it as the target for the downloads, recompile, & name, then moves it to a file store drive, then yes.

          • UberGerbil
          • 6 years ago

          Somebody is an expert.

        • MadManOriginal
        • 6 years ago

        It’s not the size of the drive, but the motion of the megabytes.

          • UberGerbil
          • 6 years ago

          At least, that’s what she tells you.

    • oldDummy
    • 6 years ago

    Great stuff.
    Question: Does the crash invalidate the test in any way?
    Don’t think so, just asking.
    Thanks for the effort.
    This is what it’s all about.

      • Dissonance
      • 6 years ago

      It shouldn’t. The Anvil app is hosted on each of the drives individually, so the interruption really only screwed up one of the endurance test runs. Those amount to ~200GB each–not a lot in the grand scheme of things.

    • balanarahul
    • 6 years ago

    Do you think it would be beneficial to ‘not’ have TurboWrite at 500GB and above and focus more on parallelism? I ask because a 250GB 840 EVO achieves about 260MB/s after it runs out of TurboWrite cache, so I would expect the 500GB and larger EVOs to at least reach 500MB/s.

    • puppetworx
    • 6 years ago

    Thank Jesus my 120GB 840 is still sitting at 1.52TB written after 6 months of use. I actually stopped installing single player games onto my 840 after the last SSD Endurance update, now I install them on a 1TB HDD. Only slightly paranoid.

      • Firestarter
      • 6 years ago

      [quote<]Only slightly paranoid.[/quote<] Overly paranoid. When are you going to break 20TB written? In 5 years? By that time the drive is just getting warmed up.

      • ronch
      • 6 years ago

      I just got my 840 EVO 250GB last Dec. 26 and have been using it for 15 days now (although some days I barely used the computer containing it). Right now Samsung Magician says I’ve written 0.13TB on it already, which is a whopping 130GB. That’s about 10GB a day. I can’t imagine I’ve written that much already and I don’t know how the drive counts written bytes or writes data onto itself. At first I was kinda worried but then, I realized, even if I write 50GB every single day and assume the drive can reach up to just 300TB, this thing will still last 16 years. And even if it reaches just 10 years I would already have made the most out of this investment. It’s not like I paid a million bucks for this thing. Besides, what’s the point of spending money on an SSD only to install your games on an HDD? You should’ve just kept the SSD inside the box and never use it if you’re [u<]THAT[/u<] worried about wearing it out.

        • Melvar
        • 6 years ago

        I think a lot of people are paranoid about how much data they write to their first SSD (even though they never give a thought to how many millions of miles their hard disks travel in a 3 inch circle). I certainly was at first.

        The big thing this test has done for me, aside from giving me confidence to just use the drive and not worry about it, is it made me realize that these drives aren’t just good for one system and then you throw them away because they’re worn out. My 120GB 840, which has about the lowest write durability of any SSD, will still be good for another system or two as a hand-me-down part after I’m done using it as my main boot drive.

          • UberGerbil
          • 6 years ago

          People can also freak themselves out in the early days — installing an OS and full suite of apps is hardly reflective of normal daily use (for most of us, anyway) — so looking at the wear information after that and extrapolating is generally invalid.

          I just looked at the Samsung 830 I installed as an OS drive in my daily work system almost exactly two years ago, and it has just passed the 10TB threshold — plus it has ~10% unallocated space in case the wear-levelling ever needs it. Needless to say, I’m not worried.

            • Firestarter
            • 6 years ago

            The 830 I got two years ago hasn’t even seen 4TB of writes, which goes to show that a gaming system doesn’t put nearly as much stress on the SSD as puppetworx seems to think.

          • puppetworx
          • 6 years ago

          My first SSD (80GB Intel X25-M) failed just within the warranty period (3 years); that’s probably responsible for any irrational pangs of paranoia I get. I only ever keep software on my SSDs, so I’m not too worried. The drive failed slowly at first and became progressively worse, which led to more frequent sporadic crashing - [i<]that[/i<] easily could’ve resulted in data loss, but thankfully didn’t. It took swapping and testing almost every component in my PC before I found out it was the SSD, because System File Checker and Intel’s SSD Toolbox didn’t show any problems with the drive for the first few months when the PC would randomly crash.

          The video last week about memory card hacking got me thinking: the smallest size SSDs are probably comprised of all the binned chips from the larger size drives, probably with 50% or more of the chips’ transistors unusable and relying on the algorithm of the drive controller for integrity. That might suggest that smaller SSDs (like mine) will fail earlier. If the 240GB model fails at around 200TB and has the same number of chips (presumably, I don’t [i<]know[/i<] that they do), doesn’t that suggest the 120GB will fail at around 100TB?

          I may as well also note that TR is only testing one drive of each model - not a statistically relevant sample size; it’s likely that some of these drives came from a better or worse than average batch of chips. I appreciate it enormously, however; any data is better than none, and this will be a great guideline.

          All that said, I was actually aiming to be jocular with my first post, and failed in spectacular form. Having an SSD doesn’t keep me up at night - so long as it has a nice warranty. 😉

            • Melvar
            • 6 years ago

            [quote<]the smallest size SSDs are probably comprised of all the binned chips from the larger size drives, probably with 50% or more of the chips' transistors unusable and relying on the algorithm of the drive controller for integrity. That might suggest that smaller SSDs (like mine) will fail earlier. If the 240GB model fails at around 200TB and has the same number of chips (presumably, I don't know that they do) doesn't that suggest the 120GB will fail at around 100TB?[/quote<]

            The smaller SSDs use fewer chips than the larger ones. That’s why the smallest capacities of a particular line are the slowest; they don’t have enough chips to populate every memory channel on the flash controller. I’d imagine the lower-binned chips get dumped into cheap phones & tablets.

            That said, if a 240GB SSD fails at 200TB, the 120GB model [i<]should[/i<] fail at 100TB, not because the chips are crappier, but because it has half as many flash cells to wear out. Likewise the 960GB model should have 4 times the write endurance of the 240GB drive.

        • indeego
        • 6 years ago

        As uber states below, you haven’t written 10GB a day; you likely wrote about 100GB for the OS and programs in the first few days, then almost nothing thereafter.

        Keep track of it for a week now. It won’t be anywhere close to 10GB a day, I’m thinking.

          • ronch
          • 6 years ago

          Perhaps. After installing everything onto the SSD for the first time, something like 0.04TB was written already. Fair enough, since I only installed essential apps and that figure doesn’t include games yet. Usage settled at around 0.01TB/day after that. I would’ve continued tracking, but I got tired of it and stopped worrying so much. I have far bigger problems to worry about than my SSD.

      • NeelyCam
      • 6 years ago

      Do you believe your 1TB HDD is more reliable than the SSD…?

        • ronch
        • 6 years ago

        No. He uses his HDD for fear of wearing out his SSD and possibly worsening its performance and reliability, I reckon.

          • spiritwalker2222
          • 6 years ago

          I have just over 4TB of writes to my Intel X25 from 2009. I feel like the drive’s barely been used after seeing how these other drives do after 500TB. Guess I don’t use it enough; the hour meter is just over 5,000 hours.

    • pandemonium
    • 6 years ago

    I love this on-going experiment. It puts to rest any perceptions that SSDs are inferior to HDDs for longevity.

    I’d really love to see a broad range of drives, from consumer to enterprise, across several manufacturers and even interfaces, to show how things really end up, but I can only imagine how intense this test has been.

    This is great stuff on the road to awesome. Thanks a lot, TR!

      • sbhall52
      • 6 years ago

      If anything, I’m MORE confident about SSD longevity than I am about a spinning HDD’s. In fact, I’ll never put a spinning disk inside a computer again. (My use case is probably different from yours–the general “you,” not you, pandemonium, specifically–so of course, YMMV.)

      I just wish I’d bought a 500GB SSD when I built my box a year ago. Wait . . . a year ago, it would have cost more than the rest of the components combined.

      • nanoflower
      • 6 years ago

      Keep in mind that SSDs can die just like HDs can. I believe Scott had one go south on him unexpectedly. It may not be that common, but it can happen.

        • pandemonium
        • 6 years ago

        As with anything. There’s a notion in some camps that SSDs can’t maintain a life cycle as strong as RAIDed HDDs. With this testing, that’s being proven wholly untrue.

          • Krogoth
          • 6 years ago

          The problem with SSDs in RAIDs is that SSDs don’t fail like HDDs. The cells in the SSDs start to “burn in,” and this can result in subtle data corruption. HDDs typically don’t do this unless bad sectors start to accumulate. That subtle data corruption can wreak havoc on a nested RAID if it isn’t caught in time.

          The firmware in an HDD usually detects this, and RAID controllers know how to deal with it. HDDs usually fail mechanically, and most fault-tolerance schemes in RAID were designed with this in mind. SSDs rarely suffer outright hardware failure.

    • JosiahBradley
    • 6 years ago

    Talk about ‘solid state’. This experiment has hands down given me faith in SSDs over hard drives. I’ve had old HDDs just die from old age.

    Thanks for the excellent testing that I haven’t seen from any other site.

      • meerkt
      • 6 years ago

      This test shows that write endurance is good, but it doesn’t say whether the drives will or won’t also die of old age, or how their longer-term retention holds up.

        • ronch
        • 6 years ago

        Clearly, there’s still a LOT both manufacturers AND consumers need to know about SSDs. And even then, quality can still range from A to Z after the technology becomes fully mature and commoditized many years from now.

    • Krogoth
    • 6 years ago

    Impressed with the results……

      • ronch
      • 6 years ago

      You know the end is near when

      1. Pigs fly
      2. Hell freezes over
      3. Krogoth is impressed

        • Deanjo
        • 6 years ago

          4. A positive Apple comment gets thumbs-up votes.

          • sweatshopking
          • 6 years ago

          You often get pluses from apple comments. TRY LIKING THE DEVIL THEMSELVES. NOBODY PLUSES ME.

            • atryus28
            • 6 years ago

            Yer not even close to the devil and I kinda like you so I plused you just because. 😛

        • superjawes
        • 6 years ago

        Don’t cause a panic! Krogoth [i<]is[/i<] impressed, and this week's cold snap could count as Hell freezing over! (especially if Hell, VA froze)

          • swaaye
          • 6 years ago

          Hell, MI definitely froze.

        • Voldenuit
        • 6 years ago

        #2 and #3 happened this year, did I miss #1?

      • NeoForever
      • 6 years ago

      I am not sure that I comprehend. [i<]Who's[/i<] impressed?

      • UberGerbil
      • 6 years ago

      Guys, I’m pretty sure his account was hacked. Either that, or we need some kind of medical / psychiatric intervention, stat.

        • Neutronbeam
        • 6 years ago

        Has anyone checked Krogoth’s basement for pods?

      • SomeOtherGeek
      • 6 years ago

      Seriously, I laughed. You might be serious, but it is funny!

    • Sahrin
    • 6 years ago

    This is the kind of hardcore data I like to come to TR for. Too bad your competitors won’t do research like this anymore (I’m looking at you, Anand). Thanks for doing this TR. Great work, keep it up.

    Would be very interesting to hear if Samsung has anything to say about the 840’s performance so far (I own one), and maybe even more so what drive engineers from competitors have to say about the performance of the Samsung drive (i.e., whether they can tell us anything about the choices Samsung made in designing the drive based on its failure pattern).

      • indeego
      • 6 years ago

      Anand may not be doing this type of research, but the site is excellent in its own right. Their smartphone testing went from nonexistent to best-in-class, without a doubt, from a technical standpoint.

      Both Anandtech and TR are great sites. (Anandtech’s design/site layout is atrocious, in my opinion, however.)

      • KristianAT
      • 6 years ago

      It’s Kristian here, AnandTech’s SSD guy.

      We’ve had manufacturers asking us to test endurance but we’ve decided against it due to limited test methods and variables involved. The biggest issue is data retention — the more you write to the drive, the quicker it will lose the data (writing will wear out the silicon oxide that holds the electrons in the floating gate). This is something that’s nearly impossible to test accurately with the equipment reviewers have because once the drive loses its data, it’ll be bricked since it also loses the NAND mapping table. In case the drive still works, the cells will be refreshed and you’re at square one again. In other words, there’s a chance that if you decided to wait a week to see if the drive still holds the data, it might have lost it after the first day without you knowing.

      To test data retention and endurance thoroughly, you would need dozens of samples and way too much time. IMO the minimum acceptable data retention for a consumer-grade drive would be a month — after that the possibility of data loss gets too high. To test that, the drive would have to remain unpowered for a month, which is pretty long given that you’ll still face the same issue of accuracy (i.e. when was the data lost). The only way would be through trial and error (i.e. test so many samples that you have enough data points to make conclusions), but that could easily take over a year. Furthermore, just testing one model wouldn’t be enough, so when you incorporate several models, the test becomes way too time consuming to be worth it.

      There’s also the warranty angle. Most manufacturers are now including a TBW (Total Bytes Written) limitation in their warranties, which means the warranty will be void if the TBW is exceeded. I’m not very comfortable with saying that the drive is still fully usable after the TBW has been exceeded because we can get an egg on our face if that’s not the case for someone. In the end, NAND is binned just like CPUs are (but for endurance instead of performance) and not all SSDs are equal (even though they are sold as the same). Manufacturers are known to cherry-pick review samples after all.

      I’m not trying to lowball TR’s test, not at all. I just wanted to explain why you’ve not seen similar endurance tests at AnandTech. This data is great but should be taken with a grain of salt, in the sense that even though the drive can be written to, it may not be usable in a real-world environment anymore. Each site and reviewer has their own requirements and standards for a test, and I admit that ours are pretty strict. If we can’t test every aspect of what we want to test, we’d rather not test it at all 🙂

        • meerkt
        • 6 years ago

        Hey Kristian,

        Are you saying that drives proactively rewrite static data to “refresh” retention time? Do all of them do it? Does it happen at idle time?

        Why would the NAND map necessarily die as well? I’d assume that retention failures would show up randomly, one or a few blocks at a time, at least if you don’t wait 50 years before checking. 🙂 And wouldn’t critical indexes be stored with redundancy anyway, or in higher-grade flash (SLC?).

        A resolution of one month would be great to know, and even a few months. I just want to know if I can keep archival data for 5 or 10 years, if I should be worried if I didn’t power up an old computer for a few months, if static data needs to be refreshed proactively with software, etc.

        BTW, if I recall correctly, the JEDEC spec for SSDs requires consumer drives to retain data for 1 year once their P/E cycles have been exhausted, and one month for enterprise-grade. I don’t know what it might be for half-used drives. That’s definitely something I’d like to know. I wish manufacturers would advertise retention specs as well, but without market awareness and demand they might not have a reason to.

          • KristianAT
          • 6 years ago

          meerkt,

          That’s what wear-leveling is for. Static data will be moved around in order to ensure that all blocks are consumed equally. All drives do this but the aggressiveness and timing is drive dependent (though usually it’s done while there’s very little IO activity).

          The NAND mapping table would also (eventually) get lost due to wear-leveling. You are right that before a total failure there will be reallocated blocks/sectors (as TR’s tests show), but once the number is big enough the drive will fail. I’m not sure if all NAND blocks will actually fail (wear-leveling is never perfect and there are binning differences between each die); it might be that the drive is just not programmed to adjust to a major decrease in usable NAND space, which is why it becomes unrecognisable. That would make sense, since if too many blocks are lost, the drive will have to start randomly using user-accessible blocks, which could result in data loss.

          As far as I know, most (if not all) drives store their mapping tables in NAND. That’s because the size of the mapping tables has gone up due to design changes, and storing them in the internal SRAM (or other non-volatile storage inside the controller) is no longer possible. That’s why the DRAM cache sizes have increased too — in the Intel X25-M days, 32MB was more than enough, while drives now use 128MB/256MB for the same capacity points.

          You are right about the JEDEC spec. For consumer SSDs and NAND, the drive must hold data for one year after all the P/E cycles have been used. For enterprise that’s 3 months (one of the reasons why eMLC is more durable). It would certainly be interesting to know how the data retention acts (e.g. does it decrease linearly or exponentially etc.).
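
          A toy sketch of the allocation half of that idea may help make it concrete; this is not any vendor’s actual algorithm, and it omits the mapping table and the periodic relocation of static data that real controllers also perform:

          ```python
          # Toy wear-leveling picture, not any particular controller's algorithm:
          # new writes always go to the least-worn free block, so program/erase
          # counts stay roughly even across the whole pool.
          class FlashPool:
              def __init__(self, num_blocks):
                  self.erase_counts = [0] * num_blocks
                  self.free_blocks = set(range(num_blocks))

              def allocate(self):
                  # pick the free block with the fewest program/erase cycles so far
                  block = min(self.free_blocks, key=lambda b: self.erase_counts[b])
                  self.free_blocks.remove(block)
                  return block

              def retire_and_erase(self, block):
                  self.erase_counts[block] += 1
                  self.free_blocks.add(block)

          pool = FlashPool(num_blocks=8)
          for _ in range(100):                  # simulate a stream of block rewrites
              b = pool.allocate()
              pool.retire_and_erase(b)
          print(pool.erase_counts)              # counts end up nearly identical
          ```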

            • meerkt
            • 6 years ago

            I thought wear leveling only refers to what happens during writes, and maybe reads: distributing writes across blocks, and maybe side effects of read error handling, like retiring blocks that take too long to read or have too many ECC errors.

            But does it really include automatically refreshing or moving around data in a readonly drive? This isn’t something I expected. So drives keep a record of how much time’s passed since each cell/block was written to, factor in its used P/E cycles, and refresh or reallocate as needed?

            But how could one know how long a full refresh takes? If used for archival, what would one do, turn the drive on every few months, and let the computer idle for 10 hours just to be sure? Hmm… I guess this should show in SMART as an increase in NAND writes without any host writes. (Has anyone noticed something like this?)

            I didn’t expect SRAM for indexes, but SLC or SLC-like storage (like the fast areas of the Samsung EVOs), or more ECC, or duplicate copies. This could practically guarantee that the index data survives well after all usable blocks are exhausted. If NTFS and even FAT keep dual copies of the allocation info, surely SSDs would do at least as much. Well, that’s what I would do. 🙂

            I recall reading that new cells have a retention of something like 10 years, but that was a few years ago and in reference to much larger process geometries. Maybe it was even SLC. I hope it’s not something like 2 years for 2x nm TLC. I don’t intend to use TLC for archival if I can help it, but also MLC is getting inherently less reliable.

            It’d be really nice if you, or TechReport, could get more info on this from manufacturers and write up some just on retention.

            And to manufacturers: I’d like to see retention specs advertised and standardized. With a nice little graph showing retention time over P/E cycles. 🙂

            • balanarahul
            • 6 years ago

            IIRC SanDisk SSDs store the NAND mapping tables in the ‘nCache’. And nCache uses pseudo-SLC NAND.

            Nice to see someone of Anandtech keeping tabs on this experiment.

    • albundy
    • 6 years ago

    Interesting on the 60GB SSD crash. So it disconnected and just bluescreened on you? Seems like an anomaly has unexpectedly occurred. Were you ever able to boot from that drive?

      • willmore
      • 6 years ago

      They’ve been using it as the system drive, so…

    • FireGryphon
    • 6 years ago

    Great article, and I like that it’s ongoing. I’m happy and a bit surprised that the drives seem so hardy.

    This is outside the scope of the article, but it’d be awesome to see if data is corrupted when dead sectors are replaced with spare ones.

      • meerkt
      • 6 years ago

      There’s not much point in reallocating sectors if the data gets corrupted while doing it.

      • UberGerbil
      • 6 years ago

      Bugs are always possible, but that doesn’t seem like a significant thing to worry about: this is really no different from the wear-levelling that is going on all the time anyway, other than retiring the suspect block (which you don’t care about anyway). There just happens to be a new block in the pool with a write count of zero.

        • FireGryphon
        • 6 years ago

        I don’t understand, then, how a drive determines that a sector is bad. Does it just guess, or does it try to read the sector and discover that everything’s ruined? Presumably, at least some versions of the latter scenario result in lost data, no?

          • UberGerbil
          • 6 years ago

          Blocks have to be erased before they are re-written, and it’s trivial to test whether the erase failed; obviously they also read-after-write to verify that stage. So if either of those operations fails, the block is retired and the write goes to some other block (probably one that is fresh out of the reservoir of reserved blocks, assuming some are available). I believe most implementations have ECC bits to guard against “spontaneous” failures of blocks during reads, but how much that covers is going to vary.

          But my knowledge in this area is several years old now and was based on reading papers and talking to some people, not doing actual implementations, so I can’t claim any real expertise.

          • meerkt
          • 6 years ago

          I’ll add to what Gerbil said. I don’t have any concrete inside info, but this is what I figure.

          All sectors/blocks are stored with extra error detection and correction code (EDC/ECC). While reading, the drive can know how many errors there are, and it’s also able to correct them. Once too many errors accumulate the block is deemed “bad”, the drive marks it as such and reallocates the data. It’s quite possible that “bad” areas could be used just fine for much longer, but safety margins are a good thing to have.

          Even if the drive fails to correct the data with ECC because there are too many errors, I suppose it may be possible to read it using more extreme measures. Like increasing the read voltage, or retrying the read 1000 times until one succeeds (a successful read can be detected with a checksum, like CRC). This isn’t something you’d use in normal reads because it’s slow, or because it could damage the flash cell, but in emergency reads of corrupt data, and before the block is retired anyway, the only thing that matters is being able to read the data correctly one more time.

          Another thing about EDC/ECC is that it’s not only used for blocks that are “bad”. It’s probably used in the normal course of things as well. The media itself is unreliable to a certain degree, and it’s only made reliable through EDC/ECC. This isn’t unique to flash, but happens also elsewhere, like optical media.
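
          The detection half of that is easy to sketch; real controllers pair a correcting code like BCH or LDPC (which can also repair a limited number of flipped bits) with this kind of check, so treat the following as a toy illustration rather than any drive’s actual implementation:

          ```python
          import zlib

          # Toy error-detection sketch: store a checksum alongside each block, and a
          # read "succeeds" only if the data still matches it. Real controllers use
          # correcting codes (BCH/LDPC) that can also repair some bad bits.
          def write_block(data: bytes):
              return data, zlib.crc32(data)

          def read_block(data: bytes, stored_crc: int) -> bool:
              return zlib.crc32(data) == stored_crc

          block, crc = write_block(b"important user data" * 100)
          corrupted = bytearray(block)
          corrupted[42] ^= 0x01                      # flip a single bit, as a worn cell might
          print(read_block(block, crc))              # True  -> clean read
          print(read_block(bytes(corrupted), crc))   # False -> retry, correct, or reallocate
          ```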

      • davidbowser
      • 6 years ago

      Good resource for how this works:

      [url<]http://www.storagesearch.com/sandforce-art1.html[/url<]

      They review the basics of ECC, CRC, and LBA tests.

    • Pan Skrzetuski
    • 6 years ago

    Thanks, Geoff. This really is great information and it doesn’t get old.

    I am happy to see the performance stays high well beyond the amount of time I am likely to own any SSD (since interface and capacity will likely spur me to a change well before 500TB).

    I am also interested to see how and when the drives do end up dying.

    • Dposcorp
    • 6 years ago

    I am really liking this long-term analysis and testing. Kudos to you guys. This is real-world data I can use to make buying decisions.
