A new bug may be present in the firmware of some Samsung SSDs. An engineer from search provider Algolia has written up a blog post detailing his team’s saga of troubleshooting data corruption in some of their servers. Ultimately, Algolia narrowed down the problem to an apparent bug in the TRIM implementation used in Samsung drives.
This TRIM bug can cause affected drives to zero out 512-byte blocks of data in incorrect, seemingly random locations. For larger files, the net result is corrupted data, while smaller files can be overwritten entirely.
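As a rough illustration of what this kind of damage looks like on disk, the sketch below scans a file for 512-byte-aligned blocks that contain only zero bytes. The function name `find_zero_blocks` is hypothetical and this is not Algolia's tooling. All-zero blocks also occur legitimately (padding, preallocated regions), so a hit is a hint to investigate, not proof of corruption.

```shell
#!/bin/sh
# find_zero_blocks FILE   (hypothetical helper, for illustration only)
# Print the byte offset of every 512-byte-aligned block in FILE that
# contains only zero bytes. A trailing partial block is ignored.
# It spawns one dd per block, so it is slow on large files.
find_zero_blocks() {
    file=$1
    size=$(wc -c < "$file") || return 1
    blocks=$(( size / 512 ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Strip NUL bytes from the block; if nothing is left, it was all zeros.
        left=$(dd if="$file" bs=512 skip="$i" count=1 2>/dev/null | tr -d '\000' | wc -c)
        if [ "$left" -eq 0 ]; then
            echo $(( i * 512 ))
        fi
        i=$(( i + 1 ))
    done
}
```

Calling `find_zero_blocks somefile` prints one offset per suspicious block. Comparing the result against a known-good copy of the same file is the more reliable check.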
Algolia ruled out other causes for the corruption by carrying out a thorough investigation of several other layers in their service’s stack, including Linux’s RAID implementation, kernel, and filesystem. The company also found that servers running Intel drives were unaffected by this issue.
The affected drives in use at Algolia are:
- Samsung MZ7WD480HCGM-00003 (model SM843TN)
- Samsung MZ7GE480HMHP-00003 (model PM853T)
- Samsung MZ7GE240HMGR-00003 (model PM853T)
- Samsung SSD 840 PRO Series
- Samsung SSD 850 PRO 512GB
If Algolia’s determinations are correct, this problem is potentially quite serious. It also wouldn’t be the first SSD-related black eye for Samsung, which recently issued two separate updates to remedy the read slowdowns we’ve seen with the 840 EVO SSD.
We’ve brought Algolia’s claims to Samsung’s attention, and we’ll update this post with more information as it becomes available. Thanks to our shy anonymous tipster for the heads-up.
Algolia posted a new update. It turns out the issue was the Linux kernel, not the SSDs. Update from today, July 17th: “Samsung had a concrete conclusion that the issue is not related to Samsung SSD or Algolia software but is related to the Linux kernel. Samsung has developed a kernel patch to resolve this issue and the official statement with details will be released tomorrow, July 18 on Linux community with the Linux patch guide.”
For those of you complaining or worried about data loss with TRIM and Samsung SSDs, please see this: [url<]http://forums.macrumors.com/threads/os-x-el-capitan-opens-door-to-trim-support-on-third-party-ssds-for-improved-performance.1891936/page-10#post-21469307[/url<] It explains what is actually going on with TRIM and Samsung SSDs.
There was a time when I wanted a Samsung SSD. No longer. I am eyeballing an Intel drive.
Of course the news of this TRIM firmware bug surfaces on the very day I get a new 256GB 850 Pro delivered… whatever, installed Windows on it now, fingers crossed…
830 128GB + 256GB, 840 Pro 256GB, 850 512GB…. all going strong with zero problems.
*knocks on wood*
[quote<]zero problems.[/quote<] I see what you did there! Algolia's TRIM issue was causing [b<]zero[/b<] problems across the drive. Get it? Ha!
Yeah, just because my model, MZ-75E500B/AM, does not appear in their list of affected drives does not mean mine is free of the bug either. (And it seemed like such a good deal too…)
Ok, your zeroes are problem free — how about your 1s?
:)
I have a 1TB 850 Pro and a 512GB 840 Pro.
Never had a problem with them. Just hoping they don’t blow up now that I brought that up.
This must be the first time I’m actually glad that TRIM doesn’t work with RAID 1 RST, having just upgraded my OS drives to 850 EVOs.
This is apparently not just a Linux or Samsung related issue.
TRIM apparently causes issues if SSDs are used in RAID-1 with Intel Rapid Storage Technology Enterprise (RSTe). The problem can be eliminated by turning off TRIM, according to thread participants.
[url<]http://www.win-raid.com/t498f23-RAID-Mirror-Corruption-on-R-Server-with-Intel-RSTe-Controller-with-Intel-SSD-Drives-5.html[/url<]
Bad things happening when TRIM is used in combination with NCQ have been observed on the 840 EVO with the latest firmware update:
[url<]http://www.techspot.com/article/997-samsung-ssd-read-performance-degradation/[/url<] (scroll down about 1/3 of the way)
[url<]http://www.overclock.net/t/1507897/samsung-840-evo-read-speed-drops-on-old-written-data-in-the-drive/2640#post_23827674[/url<]
[url<]https://bugs.launchpad.net/ubuntu/+source/fstrim/+bug/1449005[/url<]
I wonder if this is related?
According to the blog post in this article, the bug they are seeing is for unqueued TRIM–i.e. classic TRIM (pre SATA-3.1). That’s pretty darn sad since that’s not exactly a *new* part of SATA.
You don’t replace good servers until the hardware or the warranty dies :)
Good thing I’m still rocking spinning hard drives!! wooo! go me!
*sad panda*
A Samsung issue you say? It’s safe to assume TR and everyone else will remain mum on the subject for months and then repost the Samsung PR on how they fixed the issue only for the 850 Pro.
Look for more Samsung drives in the deal of the week though!
/abnormally pissed and not afraid to whine about it
*Hugs my 830 SSD*
Oh good, now it’s the Pro models with a potentially serious issue. Glad I’ve resisted the various Samsung SSD sales I’ve seen over the past few months.
Still rocking an 830, just added a BX100. Hoping nothing creeps up on the BX100, I think some people have had issues with older M4’s.
I had a choice of the 830 or the 840 when I built this system and I went with two 512GB 830’s. Now I’m glad I did. The 840 line had just been released at the time, but I looked and couldn’t find any bad reviews of the 830 line, and the 840 was just too new to have any reviews at all.
I actually bought an 840, then decided to buy an 830 instead and returned the 840.
Interesting. Looks like here it’s a Linux kernel interaction problem with certain drives…Go here for the drive “black list” contained in the Linux kernel:
[url<]https://github.com/torvalds/linux/blob/e64f638483a21105c7ce330d543fa1f1c35b5bc7/drivers/ata/libata-core.c#L4109-L4286[/url<]
...and take a look starting @ Line 4109...
From the article: "Our new deployments were switched to different SSD drives and we don’t recommend anyone to use [i<]any SSD that is anyhow mentioned in a bad way by the Linux kernel.[/i<] Also be careful, even when you don’t enable the TRIM explicitly, at least since Ubuntu 14.04 the explicit FSTRIM runs in a cron once per week on all partitions – the freeze of your storage for a couple of seconds will be your smallest problem."
On 14.04, /etc/cron.weekly/fstrim is set to run fstrim-all, which has a bunch of mentions that it only runs for Intel and Samsung drives.
Looking at /sbin/fstrim-all:
[code<]
# As long as there are bugs like [url<]https://launchpad.net/bugs/1259829[/url<] we only run
# fstrim on Intel and Samsung drives; with --no-model-check it will run on all
# drives instead.
[/code<]
However, it then goes on to give the thumbs up to a whole bunch of manufacturers:
[code<]
HDPARM="`hdparm -I $REALDEV 2>/dev/null`" || continue
if [ -z "$NO_MODEL_CHECK" ]; then
    if ! contains "$HDPARM" "Intel" && \
       ! contains "$HDPARM" "INTEL" && \
       ! contains "$HDPARM" "Samsung" && \
       ! contains "$HDPARM" "SAMSUNG" && \
       ! contains "$HDPARM" "OCZ" && \
       ! contains "$HDPARM" "SanDisk" && \
       ! contains "$HDPARM" "Patriot"; then
        #echo "device $DEV is not a drive that is known-safe for trimming"
        continue
    fi
fi
[/code<]
Oops. At the very least, if you're concerned, then an edit of /etc/cron.weekly/fstrim to have it exit without doing anything (either by commenting out fstrim-all or just a plain 'exit 0' before that) might be advisable.
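For what it's worth, the 'exit 0' approach could be sketched roughly like this. `disable_cron_script` is a made-up helper name, and it assumes the first line of the script is the shebang. It keeps the original as a `.bak` copy next to the script so the change is easy to undo.

```shell
#!/bin/sh
# disable_cron_script SCRIPT   (hypothetical helper, for illustration only)
# Make SCRIPT a no-op by inserting "exit 0" right after its first line.
# Assumes line 1 is the shebang. The original is kept as SCRIPT.bak.
disable_cron_script() {
    script=$1
    cp -p "$script" "$script.bak" || return 1
    {
        head -n 1 "$script.bak"
        echo 'exit 0  # weekly fstrim disabled by hand'
        tail -n +2 "$script.bak"
    } > "$script"
}
```

Run as root, that would be `disable_cron_script /etc/cron.weekly/fstrim`. Copying the `.bak` file back over the script restores the weekly run.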
I have read this somewhere before (Linux RAID and Samsung/TRIM might have been the topic). This seems to be about Linux and TRIM on Samsung drives; Windows isn't affected, it seems.
I wonder if I should breathe easy about my Samsung 830 – it seems like it was a generation before the trouble began, but now I wonder if it has problems but no one’s found them yet…
knock on wood I haven’t had any funny business yet
My 830 has been flawless so far, but I share your unease with these sporadic bug reports.
I have faced just one problem with my 830.
The sequential write speed dropped to 50MB/s. My drive has OP enabled and still has roughly 20-30GB of free space. I checked TRIM and ran the optimization from the Magician software, but had no luck.
Only a Secure Erase restored the write speed.
It’s been a month or two since Samsung’s latest firmware for my 840 EVO came out. Been really lazy to flash it since AFAIK I’m required to burn the ISO onto a CD and use THAT to boot and flash my SSD. I dunno, I’ve gone lazy about it. Maybe I just don’t care anymore. Maybe one of these days I’ll do it, but not today.
But I’m sure staying away from Samsung on my next SSD purchase until they get a clean bill of health.
At least run DiskFresh — the access times on my 840 and 840 EVO got so long, access times had almost gone to infinity. I’m not joking — at some point the signal gets so weak, it can’t read your data, and now you’re in data loss mode. Download the free version, and set it up to run before you go to bed, all fixed up in the morning.
Might try it. Thanks.
Do you know of some utility like DiskFresh for the Mac?
Ergh, my old SSDs from Samsung had that slowdown problem. Every firmware fix (hint: never made a difference) meant all drive data was lost and you had to start over.
I replaced my SSDs with the 850 after the reviews were positive, and well what do ya know – my SSDs are on the affected list.
Whoever said mechanical storage was dead again? It’s looking very appealing when Samsung is rapidly becoming the next OCZ in the storage sector… :-/
Edit: I’ve not personally found any corrupted or missing files on my 850’s yet, but I’m a little uneasy about this news to say the least.
Hey, you know this quote?
“Fool me once, shame on you. Fool me twice, shame on me.”
I thought it went…
[quote<]“There's an old saying in Tennessee – I know it's in Texas, probably in Tennessee – that says, fool me once, shame on – shame on you. Fool me – you can't get fooled again.” [/quote<]
Google is your pal.
Or is it “Google is your friend?”
Maybe you have a point… :)
Intel’s worth the little extra IMO, especially for a system drive.
I regret buying a Samsung. Had an intel previously. Frustrating.
Me too. Kinda wish I went with Intel. Heck, even Adata doesn’t seem like it would’ve been a bad choice at this point.
Yea, I always say, “if it isn’t broken, don’t fix it” and then I go and stop buying Intel SSDs… Stupid me.
Amen. I used to recommend Intel or Samsung, but after the 840 EVO mess, no more of that.
So, Intel it is. There’s a bit of a price premium of course, but honestly given their well deserved reputation for stability above all, it seems more than fair.
This, a million times. And to top it off: Intel drives with an Intel controller please. Just to be on the safe side (and even then there has been trouble: I’ve got a 320 which is serving me well but that one suffered from the “your SSD just turned into an 8MB drive after a power loss” bug).
Oh Samsung, how far you’ve fallen.
Samsung is the new OCZ, here to fill the void that the old OCZ left behind when they became Toshiba.
It’s their constant rush to be ‘first’ with everything, product-catalog wide. Excellent performance on the surface, niggling issues over time. First phones and whitegoods, now solid-state storage. It’d need a shift out of the ‘race’ mentality though, and it’s probably not likely to happen anytime soon.
Why do the most popular and better rated vendors seem to have the worst problems?
The Vertex 2 from OCZ was highly recommended by review sites, then they started bricking themselves after recovering from hibernation.
Then Samsung models turned out to be the most recommended, and that was followed by lots of problems like these.
Maybe they are the most popular, so they are better scrutinized.
Maybe other vendors also have deep flaws, but they are not reported, because they do not reach the critical mass of users sharing information.
Could be popularity breeds demand and then demand exceeds the ability to make the drives and so they cut corners–rush firmware development, use below spec memory, etc.
Keep this up, Sammy, and you’ll be joining the ranks of OCZ, Kingston, and to a lesser extent, PNY. Good luck!
Yeah, my next buy is looking like a Crucial branded SSD at this rate.
I have a Crucial as the boot drive on my desktop. *crossfingers*
Yep, I was getting excited about purchasing an 850 EVO 500GB as soon as it goes on sale again at Newegg, but I saw this and was like, “gorram it, not another Samsung firmware issue!”
Definitely going for the BX100 instead.
I’m sticking with Intel. After that video tour of their facilities I trust them, everybody else is pretty much an unknown.
This!
My main rig still uses two Intel G2 160 SSDs in RAID 0, and when I decide to upgrade it will be Intel again, either the 730 or 750 series for my OS drive. Reliability is king in a storage device.
I will pay extra for Intel and have slower benchmarks than the Samsung drive to avoid stuff like that.
Ah fair point, also need to keep them on my radar.
Same here.
If this is confirmed, that’s the last nail in the coffin for me.
Argh… I bought an SM841 (OEM 840 Pro) thinking it’d be a good investment at a super cheap price. I have an 840 Evo in my laptop and I put an 840 in a friend’s computer. I wonder if anyone has lost their job at Samsung over all these terrible developments. They’ve certainly damaged their reputation with enthusiasts.
So we either face slow file reads on the Samsung 840 EVO/similar, or file corruption on the Samsung 840 PRO.
And then there were NAND bait-and-switch shenanigans with Kingston and PNY.
I’ll just go with Toshiba/new OCZ, [s<]Crucial[/s<], Sandisk, or Intel from now on. Is that it?
EDIT: Added Intel. Whoops.
EDIT2: Crucial MX100s have boot issues?
[quote<]I'll just go with Toshiba/new OCZ, Crucial, or Sandisk from now on.[/quote<] Exactly
Intel?
Are we forgetting all the Crucial firmware bugs? (5200 hours, as well as the MX100’s randomly dying?)
[url<]http://forum.crucial.com/t5/Crucial-SSDs/MX100-will-not-boot-sometimes/td-p/158815[/url<]
Yes I am. I wasn’t aware of that issue on the MX100s, thanks.
Samsung's SSD division is going through their Firestone tire phase.
Yet we’re still recommending these drives.
I’ve had 2 840 Pro 512GB models running in my system for just over 2 years now. A few months after the initial build I had one drive go bad* and I did an RMA. Other than that, I’ve never experienced data corruption and I do run Linux and I have TRIM enabled.
Having said that, I’m not running an intensive web server either, so I’m not saying these guys are wrong, just that not everybody is going to hit this bug.
* Bad to the point of being completely unrecognized by the firmware, not corrupt in a way where I could potentially do a reformat like what is seen with this bug.
Probably true, but 100% of normal users would have the previous TLC performance degradation issue, and now we’re going to see something new when the data is actually refreshed from time to time. I don’t happen to own any Samsung drives, but I certainly don’t plan to in the near future, either.
[quote<]I've never experienced data corruption[/quote<] How do you verify that no data on the drive has ever been corrupted? Do you run some kind of data scrubbing with hash/parity/error correction?
If this bug was hitting me in the way that these guys describe, I wouldn’t have to do any of that. My system would simply fail to boot.
So you really don’t know if any data has been corrupted.
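The hash-based check asked about above could be sketched along these lines. It is a minimal sketch using GNU `sha256sum`; the helper names `make_manifest` and `verify_manifest` are made up for illustration. It only detects changes relative to the last manifest, so files you edited on purpose will show up as failures too, and it cannot repair anything.

```shell
#!/bin/sh
# make_manifest DIR MANIFEST   (hypothetical helper)
# Record a SHA-256 checksum for every regular file under DIR.
# Keep MANIFEST outside DIR so it does not end up hashing itself.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST   (hypothetical helper)
# Re-hash the files listed in MANIFEST. Prints only the files that fail
# and returns non-zero if any file is missing or has changed.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

For example, `make_manifest /data /root/data.sha256` after a known-good state, then `verify_manifest /data /root/data.sha256` from a periodic job, with both paths being placeholders. Filesystems with built-in checksumming and scrubbing do this more thoroughly.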
Not for me.
Data corruption (especially silent corruption) is the absolute worst problem to have on any storage device and with Samsung’s current history with firmware problems it is time to move to more reliable vendors.
Any review that keeps these drives on a recommended list with these known problems is now suspect: why should you believe any future review from them?
lolz
Oh wait told my bro to get an 840 pro for his pc, hopefully that doesn’t turn into more work for me.
edit – [url<]https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/[/url<]
Hmmmm so all on Linux. Are the windows and Linux TRIM commands the same???
Intel drives ok
Working SSDs:
Intel S3500
Intel S3700
Intel S3710
Interesting indeed.
Edit 2 - in comments - even more interesting
At Aerospike, we have a NoSQL database that's been run on a lot of flash drives, for years, and we've found a few things:
* Don't use TRIM. There are too many bad controllers that (at best) do bizarre performance actions. TRIM should be a good thing.... but it's (apparently) too complex for most controller writers.
* Don't use a file system. Aerospike has its own native data layout, and you can use files, but lousy things happen.
* Some manufacturers go through bad times, so keep testing. We built a tool [url<]https://github.com/aerospike/act[/url<] so we can prove to manufactures when they have bugs.
If I tried to list all the devices that we tested and had firmware issues (either performance or "functionality") it would be a long and embarrassing list. We have over 4 years of data covering a lot of manufacturers.
I appreciate their data findings but they lost me at “don’t use a file system”. WTF
I think they meant in the context of their specific DBMS that you shouldn’t bother with a filesystem. That’s not specific to Aerospike either since there are plenty of Oracle guys who just drop the Oracle DB straight onto a storage array without there being any underlying filesystem. The storage array is just another block device and the Oracle DB handles all the I/O directly.
Of course, outside of these niche applications you need a filesystem.
Ah…OK that makes sense.
Would be awesome if Aerospike could release that data!
Some data here:
Certifying Flash Devices (SSDs)
[url<]http://www.aerospike.com/docs/operations/plan/ssd/ssd_certification.html[/url<]