Some Samsung SSDs may suffer from a buggy TRIM implementation

A new bug may be present in the firmware of some Samsung SSDs. An engineer from search provider Algolia has written up a blog post detailing his team’s saga of troubleshooting data corruption in some of their servers. Ultimately, Algolia narrowed down the problem to an apparent bug in the TRIM implementation used in Samsung drives.

This TRIM bug can cause affected drives to zero out 512-byte blocks of data in incorrect, seemingly random locations. For larger files, the net result is corrupted data, while smaller files can be overwritten entirely.
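
As a rough illustration of the symptom described above, the following Python sketch scans a byte stream for 512-byte-aligned blocks that consist entirely of zero bytes. The function name `find_zeroed_blocks` and the sample data are hypothetical and do not come from Algolia's write-up. Sparse or zero-padded files will also produce hits, so a match is only a hint worth investigating, not proof of the bug.

```python
import io
from typing import BinaryIO, List

BLOCK_SIZE = 512  # block size mentioned in the article


def find_zeroed_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return byte offsets of full, aligned blocks that contain only zero bytes."""
    zero_block = bytes(block_size)  # a block of block_size zero bytes
    offsets = []
    offset = 0
    while True:
        chunk = stream.read(block_size)
        if len(chunk) < block_size:
            break  # end of data; a trailing partial block is ignored
        if chunk == zero_block:
            offsets.append(offset)
        offset += block_size
    return offsets


if __name__ == "__main__":
    # Hypothetical sample: three blocks, the middle one zeroed out.
    sample = b"\x01" * 512 + b"\x00" * 512 + b"\x02" * 512
    print(find_zeroed_blocks(io.BytesIO(sample)))  # prints [512]
```

To check a real file, open it in buffered binary mode (for example `open(path, "rb")`) and pass the file object to the function.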

Algolia ruled out other causes for the corruption by carrying out a thorough investigation of several other layers in their service’s stack, including Linux’s RAID implementation, kernel, and filesystem. The company also found that servers running Intel drives were unaffected by this issue.

The affected drives in use at Algolia are:

  • Samsung MZ7WD480HCGM-00003 (model SM843TN)
  • Samsung MZ7GE480HMHP-00003 (model PM853T)
  • Samsung MZ7GE240HMGR-00003 (model PM853T)
  • Samsung SSD 840 PRO Series
  • Samsung SSD 850 PRO 512GB

If Algolia’s determinations are correct, this problem is potentially quite serious. It also wouldn’t be the first SSD-related black eye for Samsung, which recently issued two separate updates to remedy the read slowdowns we’ve seen with the 840 EVO SSD.

We’ve brought Algolia’s claims to Samsung’s attention, and we’ll update this post with more information as it becomes available. Thanks to our shy anonymous tipster for the heads-up.

Comments closed
    • sustainednotburst
    • 7 years ago

    Algolia posted a new update. It turns out the issue was the Linux kernel, not the SSDs. Update from today, July 17th: “Samsung had a concrete conclusion that the issue is not related to Samsung SSD or Algolia software but is related to the Linux kernel. Samsung has developed a kernel patch to resolve this issue and the official statement with details will be released tomorrow, July 18 on Linux community with the Linux patch guide.”

    • sustainednotburst
    • 7 years ago

    For those of you complaining or worried about data loss with TRIM and Samsung SSDs, please see this: [url]http://forums.macrumors.com/threads/os-x-el-capitan-opens-door-to-trim-support-on-third-party-ssds-for-improved-performance.1891936/page-10#post-21469307[/url] It explains what is actually going on with TRIM and Samsung SSDs.

    • thorz
    • 7 years ago

    Do you know of some utility like DiskFresh for the Mac?

    • brothergc
    • 7 years ago

    There was a time when I wanted a Samsung SSD; no longer. I am eyeballing an Intel drive.

    • K-L-Waster
    • 7 years ago

    Ok, your zeroes are problem free — how about your 1s?

    😀

    • maxxcool
    • 7 years ago

    You don’t replace good servers until the hardware or the warranty dies 🙂

    • TheMonkeyKing
    • 7 years ago

    Yeah, just because my model, MZ-75E500B/AM, does not appear in their list of affected drives does not mean the bug isn’t present in mine either. (And it seemed like such a good deal too…)

    • willmore
    • 7 years ago

    According to the blog post in this article, the bug they are seeing is for unqueued TRIM–i.e. classic TRIM (pre SATA-3.1). That’s pretty darn sad since that’s not exactly a *new* part of SATA.

    • dragontamer5788
    • 7 years ago

    [quote]zero problems.[/quote] I see what you did there! Algolia's TRIM issue was causing [b]zero[/b] problems across the drive. Get it? Ha!

    • lycium
    • 7 years ago

    Of course the news of this TRIM firmware bug surfaces on the very day I get a new 256GB 850 Pro delivered… whatever, installed Windows on it now, fingers crossed…

    • crystall
    • 7 years ago

    This, a million times. And to top it off: Intel drives with an Intel controller please. Just to be on the safe side (and even then there has been trouble: I’ve got a 320 which is serving me well but that one suffered from the “your SSD just turned into an 8MB drive after a power loss” bug).

    • Duct Tape Dude
    • 7 years ago

    Yes I am. I wasn’t aware of that issue on the MX100s, thanks.

    • Smeghead
    • 7 years ago

    On 14.04, /etc/cron.weekly/fstrim is set to run fstrim-all, which has a bunch of mentions that it only runs for Intel and Samsung drives.

    Looking at /sbin/fstrim-all:

    [code]
    # As long as there are bugs like https://launchpad.net/bugs/1259829 we only run
    # fstrim on Intel and Samsung drives; with --no-model-check it will run on all
    # drives instead.
    [/code]

    However, it then goes on to give the thumbs up to a whole bunch of manufacturers:

    [code]
    HDPARM="`hdparm -I $REALDEV 2>/dev/null`" || continue
    if [ -z "$NO_MODEL_CHECK" ]; then
        if ! contains "$HDPARM" "Intel" && \
           ! contains "$HDPARM" "INTEL" && \
           ! contains "$HDPARM" "Samsung" && \
           ! contains "$HDPARM" "SAMSUNG" && \
           ! contains "$HDPARM" "OCZ" && \
           ! contains "$HDPARM" "SanDisk" && \
           ! contains "$HDPARM" "Patriot"; then
            #echo "device $DEV is not a drive that is known-safe for trimming"
            continue
        fi
    fi
    [/code]

    Oops. At the very least, if you're concerned, then an edit of /etc/cron.weekly/fstrim to have it exit without doing anything (either by commenting out fstrim-all or just a plain 'exit 0' before that) might be advisable.

    • colinstu12
    • 7 years ago

    830 128GB + 256GB, 840 Pro 256GB, 850 512GB…. all going strong with zero problems.
    *knocks on wood*

    • Thresher
    • 7 years ago

    I have a 1TB 850 Pro and a 512GB 840 Pro.

    Never had a problem with them. Just hoping they don’t blow up now that I brought that up.

    • juzz86
    • 7 years ago

    It’s their constant rush to be ‘first’ with everything, product-catalog wide. Excellent performance on the surface, niggling issues over time. First phones and whitegoods, now solid-state storage. It’d need a shift out of the ‘race’ mentality though, and it’s probably not likely to happen anytime soon.

    • divide_by_zero
    • 7 years ago

    Amen. I used to recommend Intel or Samsung, but after the 840 EVO mess, no more of that.

    So, Intel it is. There’s a bit of a price premium of course, but honestly given their well deserved reputation for stability above all, it seems more than fair.

    • jaset
    • 7 years ago

    This must be the first time I’m actually glad that TRIM doesn’t work with RAID 1 RST, having just upgraded my OS drives to 850 EVOs.

    • Leader952
    • 7 years ago

    So you really don’t know if any data has been corrupted.

    • shaurz
    • 7 years ago

    I actually bought an 840, then decided to buy an 830 instead and returned the 840.

    • Khali
    • 7 years ago

    I had a choice of the 830 or the 840 when I built this system and I went with two 512GB 830s. Now I’m glad I did. The 840 line had just been released at the time, but I looked and couldn’t find any bad reviews of the 830 line, and the 840 was just too new to have any reviews at all.

    • chucko
    • 7 years ago

    This is apparently not just a Linux or Samsung related issue.

    TRIM apparently causes issues if SSDs are used in RAID-1 with Intel Rapid Storage Technology Enterprise (RSTe). The problem can be eliminated by turning off TRIM, according to thread participants.

    [url]http://www.win-raid.com/t498f23-RAID-Mirror-Corruption-on-R-Server-with-Intel-RSTe-Controller-with-Intel-SSD-Drives-5.html[/url]

    • Freon
    • 7 years ago

    Still rocking an 830, just added a BX100. Hoping nothing creeps up on the BX100, I think some people have had issues with older M4’s.

    • just brew it!
    • 7 years ago

    Same here.

    • just brew it!
    • 7 years ago

    Bad things happening when TRIM is used in combination with NCQ have been observed on the 840 EVO with the latest firmware update:
    [url]http://www.techspot.com/article/997-samsung-ssd-read-performance-degradation/[/url] (scroll down about 1/3 of the way)
    [url]http://www.overclock.net/t/1507897/samsung-840-evo-read-speed-drops-on-old-written-data-in-the-drive/2640#post_23827674[/url]
    [url]https://bugs.launchpad.net/ubuntu/+source/fstrim/+bug/1449005[/url]

    I wonder if this is related?

    • Price0331
    • 7 years ago

    Ah, fair point. I also need to keep them on my radar.

    • continuum
    • 7 years ago

    Are we forgetting all the Crucial firmware bugs? (5200 hours, as well as the MX100’s randomly dying?)

    [url]http://forum.crucial.com/t5/Crucial-SSDs/MX100-will-not-boot-sometimes/td-p/158815[/url]

    • SomeOtherGeek
    • 7 years ago

    Yea, I always say, “if it isn’t broken, don’t fix it” and then I go and stop buying Intel SSDs… Stupid me.

    • SnowboardingTobi
    • 7 years ago

    Good thing I’m still rocking spinning hard drives!! wooo! go me!

    *sad panda*

    • cygnus1
    • 7 years ago

    If this is confirmed, that’s the last nail in the coffin for me.

    • Vaughn
    • 7 years ago

    This!

    My main rig still uses two Intel G2 160 SSDs in RAID 0, and when I decide to upgrade it will be Intel again, either the 730 or 750 series for my OS drive. Reliability is king in a storage device.

    I will pay extra for Intel and have slower benchmarks than the Samsung drive to avoid stuff like that.

    • chuckula
    • 7 years ago

    If this bug was hitting me in the way that these guys describe, I wouldn’t have to do any of that. My system would simply fail to boot.

    • Convert
    • 7 years ago

    A Samsung issue you say? It’s safe to assume TR and everyone else will remain mum on the subject for months and then repost the Samsung PR on how they fixed the issue only for the 850 Pro.

    Look for more Samsung drives in the deal of the week though!

    /abnormally pissed and not afraid to whine about it

    • divide_by_zero
    • 7 years ago

    *Hugs my 830 SSD*

    Oh good, now it’s the Pro models with a potentially serious issue. Glad I’ve resisted the various Samsung SSD sales I’ve seen over the past few months.

    • WaltC
    • 7 years ago

    Interesting. Looks like here it’s a Linux kernel interaction problem with certain drives…Go here for the drive “black list” contained in the Linux kernel:

    [url]https://github.com/torvalds/linux/blob/e64f638483a21105c7ce330d543fa1f1c35b5bc7/drivers/ata/libata-core.c#L4109-L4286[/url]

    ...and take a look starting @ Line 4109...

    From the article: "Our new deployments were switched to different SSD drives and we don’t recommend anyone to use [i]any SSD that is anyhow mentioned in a bad way by the Linux kernel.[/i] Also be careful, even when you don’t enable the TRIM explicitly, at least since Ubuntu 14.04 the explicit FSTRIM runs in a cron once per week on all partitions – the freeze of your storage for a couple of seconds will be your smallest problem."

    • jihadjoe
    • 7 years ago

    I’m sticking with Intel. After that video tour of their facilities I trust them, everybody else is pretty much an unknown.

    • jessterman21
    • 7 years ago

    Yep, I was getting excited about purchasing an 850 EVO 500GB as soon as it goes on sale again at Newegg, but I saw this and was like, “gorram it, not another Samsung firmware issue!”

    Definitely going for the BX100 instead.

    • Leader952
    • 7 years ago

    Some data here:

    Certifying Flash Devices (SSDs)
    [url]http://www.aerospike.com/docs/operations/plan/ssd/ssd_certification.html[/url]

    • demol3
    • 7 years ago

    I have just faced one problem with my 830.

    The sequential write speed dropped to 50MB/s. My drive has OP enabled and still has roughly 20-30GB of free space. I have checked TRIM and run the optimize function from the Magician software, but had no luck.

    Only a Secure Erase restored the write speed.

    • geekl33tgamer
    • 7 years ago

    Maybe you have a point… 😉

    • Leader952
    • 7 years ago

    Not for me.

    Data corruption (especially silent corruption) is the absolute worst problem to have on any storage device and with Samsung’s current history with firmware problems it is time to move to more reliable vendors.

    Any review that keeps these drives on a recommended list despite these known problems is now suspect: why should you believe any future review from them?

    • ronch
    • 7 years ago

    Might try it. Thanks.

    • ronch
    • 7 years ago

    Google is your pal.

    Or is it “Google is your friend?”

    • Flying Fox
    • 7 years ago

    Intel?

    • cobalt
    • 7 years ago

    At least run DiskFresh — the access times on my 840 and 840 EVO got so long, access times had almost gone to infinity. I’m not joking — at some point the signal gets so weak, it can’t read your data, and now you’re in data loss mode. Download the free version, and set it up to run before you go to bed, all fixed up in the morning.

    • dragontamer5788
    • 7 years ago

    I thought it went…

    [quote]“There's an old saying in Tennessee -- I know it's in Texas, probably in Tennessee -- that says, fool me once, shame on -- shame on you. Fool me -- you can't get fooled again.”[/quote]

    • cmrcmk
    • 7 years ago

    My 830 has been flawless so far, but I share your unease with these sporadic bug reports.

    • ronch
    • 7 years ago

    Hey, you know this quote?

    “Fool me once, shame on you. Fool me twice, shame on me.”

    • Leader952
    • 7 years ago

    [quote]I've never experienced data corruption[/quote] How do you verify that no data on the drive has ever been corrupted? Do you run some kind of data scrubbing with hash/parity/error correction?

    • ronch
    • 7 years ago

    Me too. Kinda wish I went with Intel. Heck, even Adata doesn’t seem like it would’ve been a bad choice at this point.

    • derFunkenstein
    • 7 years ago

    Probably true, but 100% of normal users would have the previous TLC performance degradation issue, and now we’re going to see something new when the data is actually refreshed from time to time. I don’t happen to own any Samsung drives, but I certainly don’t plan to in the near future, either.

    • Firestarter
    • 7 years ago

    knock on wood I haven’t had any funny business yet

    • wiak
    • 7 years ago

    i have read this somewhere before (linux raid and samsung/trim might have been the topic), this seems to be linux and trim on samsung drives, windows isn't affected it seems

    • DPete27
    • 7 years ago

    [quote]I'll just go with Toshiba/new OCZ, Crucial, or Sandisk from now on.[/quote] Exactly

    • highlandr
    • 7 years ago

    I wonder if I should breathe easy about my Samsung 830 – it seems like it was a generation before the trouble began, but now I wonder if it has problems but no one’s found them yet…

    • willmore
    • 7 years ago

    Could be popularity breeds demand and then demand exceeds the ability to make the drives and so they cut corners–rush firmware development, use below spec memory, etc.

    • ronch
    • 7 years ago

    It’s been a month or two since Samsung’s latest firmware for my 840 EVO came out. Been really lazy to flash it since AFAIK I’m required to burn the ISO onto a CD and use THAT to boot and flash my SSD. I dunno, I’ve gone lazy about it. Maybe I just don’t care anymore. Maybe one of these days I’ll do it, but not today.

    But I’m sure staying away from Samsung on my next SSD purchase until they get a clean bill of health.

    • willmore
    • 7 years ago

    I have a Crucial as the boot drive on my desktop. *crossfingers*

    • tay
    • 7 years ago

    I regret buying a Samsung. Had an intel previously. Frustrating.

    • bittermann
    • 7 years ago

    Ah…OK that makes sense.

    • geekl33tgamer
    • 7 years ago

    Ergh, my old SSDs from Samsung had that slowdown problem. Every firmware fix (hint: never made a difference) meant all drive data was lost and you had to start over.

    I replaced my SSDs with the 850 after reviews were positive, and well what do ya know – my SSDs are on the affected list.

    Whoever said mechanical storage was dead again? It’s looking very appealing when Samsung is rapidly becoming the next OCZ in the storage sector… :-/

    Edit: I’ve not personally found any corrupted or missing files on my 850’s yet, but I’m a little uneasy about this news to say the least.

    • Ninjitsu
    • 7 years ago

    Intel’s worth the little extra IMO, especially for a system drive.

    • chuckula
    • 7 years ago

    I think they meant in the context of their specific DBMS that you shouldn’t bother with a filesystem. That’s not specific to Aerospike either since there are plenty of Oracle guys who just drop the Oracle DB straight onto a storage array without there being any underlying filesystem. The storage array is just another block device and the Oracle DB handles all the I/O directly.

    Of course, outside of these niche applications you need a filesystem.

    • Ninjitsu
    • 7 years ago

    Would be awesome if Aerospike could release that data!

    • Chrispy_
    • 7 years ago

    Oh Samsung, how far you’ve fallen.

    Samsung is the new OCZ, here to fill the void that the old OCZ left behind when they became Toshiba.

    • bittermann
    • 7 years ago

    I appreciate their data findings but they lost me at “don’t use a file system”. WTF

    • Price0331
    • 7 years ago

    Yeah, my next buy is looking like a Crucial branded SSD at this rate.

    • marraco
    • 7 years ago

    Why do the most popular and best-rated vendors seem to have the worst problems?

    The Vertex 2 from OCZ was highly recommended by review sites, then the drives started bricking themselves after recovering from hibernation.

    Then Samsung models turned out to be the most recommended, and that was followed by lots of problems like these.

    Maybe they are the most popular, so they are better scrutinized.

    Maybe other vendors also have deep flaws, but they are not reported, because they do not reach the critical mass of users sharing information.

    • ronch
    • 7 years ago

    Keep this up, Sammy, and you’ll be joining the ranks of OCZ, Kingston, and to a lesser extent, PNY. Good luck!

    • ozzuneoj
    • 7 years ago

    Argh… I bought an SM841 (OEM 840 Pro) thinking it’d be a good investment at a super cheap price. I have an 840 Evo in my laptop and I put an 840 in a friend’s computer. I wonder if anyone has lost their job at Samsung over all these terrible developments. They’ve certainly damaged their reputation with enthusiasts.

    • Duct Tape Dude
    • 7 years ago

    So we either face slow file reads on the Samsung 840 EVO/similar, or file corruption on the Samsung 840 PRO.

    And then there were NAND bait-and-switch shenanigans with Kingston and PNY.

    I’ll just go with Toshiba/new OCZ, [s]Crucial[/s], Sandisk, or Intel from now on. Is that it?

    EDIT: Added Intel. Whoops.

    EDIT2: Crucial MX100s have boot issues?

    • chuckula
    • 7 years ago

    I’ve had 2 840 Pro 512GB models running in my system for just over 2 years now. A few months after the initial build I had one drive go bad* and I did an RMA. Other than that, I’ve never experienced data corruption and I do run Linux and I have TRIM enabled.

    Having said that, I’m not running an intensive web server either, so I’m not saying these guys are wrong, just that not everybody is going to hit this bug.

    * Bad to the point of being completely unrecognized by the firmware, not corrupt in a way where I could potentially do a reformat like what is seen with this bug.

    • End User
    • 7 years ago

    Samsung’s SSD division is going through its Firestone tire phase.

    • derFunkenstein
    • 7 years ago

    Yet we’re still recommending these drives.

    • anotherengineer
    • 7 years ago

    lolz

    Oh wait told my bro to get an 840 pro for his pc, hopefully that doesn’t turn into more work for me.

    edit – [url]https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/[/url]

    Hmmmm so all on Linux. Are the windows and Linux TRIM commands the same???

    Intel drives ok

    Working SSDs:
    Intel S3500
    Intel S3700
    Intel S3710

    Interesting indeed.

    Edit 2 - in comments - even more interesting

    At Aerospike, we have a NoSQL database that's been run on a lot of flash drives, for years, and we've found a few things:

    * Don't use TRIM. There are too many bad controllers that (at best) do bizarre performance actions. TRIM should be a good thing.... but it's (apparently) too complex for most controller writers.
    * Don't use a file system. Aerospike has its own native data layout, and you can use files, but lousy things happen.
    * Some manufacturers go through bad times, so keep testing. We built a tool [url]https://github.com/aerospike/act[/url] so we can prove to manufacturers when they have bugs.

    If I tried to list all the devices that we tested and had firmware issues (either performance or "functionality") it would be a long and embarrassing list. We have over 4 years of data covering a lot of manufacturers.
