Samsung docs detail Linux TRIM bug and fix

We've been covering a report from search provider Algolia pointing out a potential issue in Samsung SSDs' TRIM implementation. More recently, Samsung itself reported that the bug actually resides in the Linux kernel, and that the company had submitted a patch for the problem.

Now, we have more details of the bug. Samsung has provided us with internal documents detailing the exact cause of the issue and the subsequent solution. We're getting a bit technical here, so we'll take some liberties and simplify. When Linux's RAID implementation receives a sequence of read or write operations, it creates separate buffers in memory for each of them.

When it comes to TRIM operations, however, a single shared buffer is used. That works in theory, except there's a bug—more specifically, a form of race condition. A sequence of queued TRIM commands in a specific order all need to make use of the shared buffer, but after the first command is queued, subsequent ones may erroneously free the buffer before the previous operation completes. Boom. The wrong sector in the disk gets zeroed out, and chaos ensues.

Samsung developed a fix and reportedly ran Algolia's test scripts for a week without issue. It then submitted a workaround patch to the Linux RAID mailing list on July 19. A healthy discussion ensued until a more permanent solution was tested and agreed upon, which was then committed to the kernel source tree.

In the meantime, users with linear, RAID 0, or RAID 10 configurations using SATA SSDs are advised to disable TRIM altogether until a kernel version is released that includes the patch. RAID 1 setups are not affected. This problem cropped up with Samsung's SSDs because of the precise sequence of events needed to trigger it. Martin Petersen from Oracle notes that the bug is dependent on "timing and a very heavy discard load."

Comments closed
    • mkp
    • 4 years ago

    First of all, there is absolutely nothing Samsung-specific about the RAID vs. TRIM bug. It would happen on any SATA SSD in an MD linear/raid0/raid10 config as well as on some SCSI storage arrays. Algolia’s workload was extremely TRIM-heavy which exacerbated the issue and it was pure chance that they only saw it with one brand of SSD.

    The queued TRIM issue with Samsung drives remains open but it has nothing whatsoever to do with the Algolia bug.

    Martin Petersen

      • sustainednotburst
      • 4 years ago

      Why did the Intel drives not show the error, though? (I mean this inquisitively, not as an accusation)

        • Nevermind
        • 4 years ago

        Same question.

    • ronch
    • 4 years ago

    A bug in Linux??? [b<]IMPOSSIBLE!!!![/b<]

      • Nevermind
      • 4 years ago

      That only SSD’s made by Samsung might find, as a result of their special way of TRIM’ing stripes?

        • just brew it!
        • 4 years ago

        Do you not understand what the phrase “race condition” means, or are you just trolling at this point?

        As far as we can tell based on the released info, the drive is in fact doing what the OS is asking it to do. Queued operations can be executed in a different order by different drives, BY DEFINITION. If (as appears to be true in this case) the OS does not properly take that into account and bad things happen because the OS has asked the drive to do something potentially destructive, IT ISN’T THE DRIVE’S FAULT!

          • Nevermind
          • 4 years ago

          A race condition is sequence dependent.

          Due to the way Samsung uses TRIM in a RAID config, AFAIU, where multiple buffers are “expected” only one is used and that buffer is flushed before the data gets to where it needs to go.

          Now EVERY OTHER VENDOR does this differently, apparently. Yes, there is an issue with the design of the existing RAID stack infrastructure that Samsung did not successfully “workaround” with their “pretty gross hack” (explained above rather well and technically specific, I think) but given Samsung is the ONLY vendor having the problem of this type of data corruption, does blaming the existing linux kernel really explain away their testing methodology failure to notice/admit this?

          It’s also not a classic “race condition” example when multiple buffers are expected and only one exists. That’s something else, I don’t know what to call that. Call it a bug.

          The question is why is Samsung the only one tripping on the bug, and you jumped over that.

            • sustainednotburst
            • 4 years ago

            Nevermind,

            one thing i’ve noticed you keep mentioning is QUEUED TRIM. in Algolia’s case, they specifically stated that Queued TRIM is disabled in their systems.

            • just brew it!
            • 4 years ago

            Oh, interesting…

            • sustainednotburst
            • 4 years ago

            [url<]https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/[/url<]

            "UPDATE June 16: A lot of discussions started pointing out that the issue is related to the newly introduced queued TRIM. This is not correct. The TRIM on our drives is un-queued and the issue we have found is not related to the latest changes in the Linux Kernel to disable this feature."

            • just brew it!
            • 4 years ago

            OK, so it must be solely related to how quickly (or slowly) the drive processes the TRIM command, and not due to any differences in command reordering.

            • sustainednotburst
            • 4 years ago

            Also to add,

            The only drives they tested outside of Samsung drives were 3 Intel drives. They didn’t test any others. So we don’t really know whether or not this issue only happens on Samsungs.

            • Nevermind
            • 4 years ago

            I didn’t read that. But TRIM is queued either way despite what they’re calling ‘queued’ TRIM. That’s a proper feature name.

            I mean the underlying TRIM functions queued against other underlying functions, or in this case against itself, because it expects to have multiple buffers for RAID but only apparently ‘has’ one.
            It doesn’t do what it expects itself to do by virtue of the ‘missing’ buffers which creates a ‘queue’ problem in the internal garbage collection operation, and data is lost before it’s moved. Something like that. I’m not pontificating.

            And for that reason I differentiated between that and a more readily understood ‘race condition’. Correct me if I’m wrong, because I take no personal ego cache hit.

            Multiple buffers here.

    • Nevermind
    • 4 years ago

    Exactly as predicted, a conflict between two garbage collection subroutines.

    “A sequence of queued TRIM commands in a specific order all need to make use of the shared buffer, but after the first command is queued, subsequent ones may erroneously free the buffer before the previous operation completes. Boom. The wrong sector in the disk gets zeroed out, and chaos ensues.”

    And they blame the Linux kernel – but they’re the only one having this issue, right?

    Wouldn’t ALL SSD’s have this issue?

      • chuckula
      • 4 years ago

      [quote<]And they tried to blame the Linux kernel, that's just typical Samsung chutzpah.[/quote<]

      Actually:

      1. It most certainly *was* a bug in the Linux kernel.

      2. You need to look at who was blaming whom: Samsung was incorrectly demonized for months by a lot of people who took this Linux kernel bug and automatically blamed Samsung without even considering that Linux might have a bug. The kernel devs even wrongly blacklisted Samsung's SSDs from some TRIM operations because they just assumed there couldn't be a bug in their own code.

        • notfred
        • 4 years ago

        [quote<]The kernel devs even wrongly blacklisted Samsung's SSDs from some TRIM operations because they just assumed there couldn't be a bug in their own code.[/quote<]

        I can't blame the Linux kernel devs for that. As a developer myself, if my code runs fine on hardware A and crashes in flames on hardware B, then my first thought is that hardware B is doing something bad.

          • madseven7
          • 4 years ago

          Yes, you can. It’s called being more thorough.

          • Sabresiberian
          • 4 years ago

          The key phrase here is “first thought”. Getting locked into your first thought means you can’t find the solution if your assumption is wrong.

          The thought needed to be checked out thoroughly. I’d have looked at it and said “Windows systems don’t have the problem, so if it only happens on Linux systems, maybe we should be digging deep looking for a bug in Linux code”. I’d want Samsung working on it on their end, but I’d be trying to find out why only Linux had the issue.

        • mkp
        • 4 years ago

        The blacklist entries are there because Samsung recently started advertising support for the *queued* variant of the DSM TRIM command in their firmware (updates for older drives as well as factory preload for new ones). For some reason things are not working between our implementation and theirs. We have contacted Samsung and would like to work with them on this issue.

        In the meantime we have disabled queued TRIM support on Samsung drives to prevent errors. The original, unqueued TRIM support remains enabled for all Samsung models and works fine. Please note that Windows does not yet support queued TRIM so Linux is the only OS exercising this recent addition to ATA command spec.

        In any case: Blacklisting of queued TRIM on drives from Samsung and other vendors has nothing to do with the data corruption issue at Algolia.

          • Nevermind
          • 4 years ago

          “For some reason things are not working between our implementation and theirs”

          That’s basically what I’m getting at, blaming one side or the other without a direct cause->effect explanation is pretty ridiculous but that’s exactly what Samsung did out of reflex, and sure enough it’s a mixed issue. This was the same when people reported problems with their 830 models and they ignored the problem for 6 months, blamed “implementations” of the bios and all sorts of stuff. It’s typical for big companies these days I guess.

            • w76
            • 4 years ago

            Just big companies, and just big days? I’m pretty sure human nature has been to deflect blame from the evolutionary start. But yeah, lets deflect blame more and blame faceless companies for being human. 😉

        • BobbinThreadbare
        • 4 years ago

        That’s not a wrongful blacklist. X isn’t working right with system Y. Blacklist until we fix either X or Y.

        • Nevermind
        • 4 years ago

        WHY is Samsung the ONLY one having the problem?

      • Prototyped
      • 4 years ago

      It wasn’t garbage collection. It was a mismatch between the expectation of Linux’s Software RAID 0 implementation and the block I/O layer underlying it. A failure to adhere to a contract. According to Martin Petersen this is a very old bug, and he was surprised someone noticed it only now. Essentially everyone had been very, very lucky.

      [url<]http://marc.info/?l=linux-raid&m=143741678313051&w=2[/url<]

      The blast radius of the bug actually extends even to thinly-provisioned volumes, e.g. LUNs exported from a Storage Area Network. But as he says, it's pretty unlikely someone would be setting up [b<]software[/b<] RAID 0/10/JBOD on [i<]top[/i<] of such a volume.

      This is yet another manifestation of problems when mutable data structures are passed around with a lack of clarity as to which part of the system is allowed to modify them. (In this case the RAID implementation wrongly assumed that the block I/O layer wouldn't change a data structure out from under it, whereas the block I/O layer happily nerfed it.)

        • willmore
        • 4 years ago

        Actually, Nevermind was correct. On the drive side, TRIM is part of the garbage collection system. On the Linux block driver side, it was a premature free of a data structure, which is garbage collection of a sort.

          • Nevermind
          • 4 years ago

          That is what I meant by garbage collection.

          • just brew it!
          • 4 years ago

          Well, he was half right (first part of his post). Second half of his post is inaccurate though, as it would appear that Samsung’s firmware got the garbage collection right, while it was the Linux software RAID stack that got it (subtly) wrong.

            • Nevermind
            • 4 years ago

            It was the implementation of both that caused the problem. They didn’t test for the edge case.

            But ONLY SAMSUNG was having this issue at this point.

            And of course they say it’s “entirely” a Linux kernel issue – which it isn’t really.

            • just brew it!
            • 4 years ago

            Sorry, but I must disagree here. From everything I’ve read about this issue so far, this is a bug in the Linux RAID implementation, and Samsung implemented the SATA queued TRIM spec correctly. If you can find evidence to the contrary, please post a link. Software RAID-0/RAID-10 of consumer SSDs on Linux isn’t exactly a common use case, so I don’t find it unreasonable that they didn’t specifically test their drives against it.

            It is simply not reasonable to expect hardware vendors to proactively test for, and implement workarounds for, edge case bugs in software that is not under their direct control. The manufacturer’s responsibility is to correctly implement the specifications for the device interface, and work with software vendors to diagnose unexpected incompatibilities when they arise. AFAICT, in this case Samsung has done both.

            • Nevermind
            • 4 years ago

            “Software RAID-0/RAID-10 of consumer SSDs on Linux isn’t exactly a common use case, so I don’t find it unreasonable that they didn’t specifically test their drives against it.”

            “It is simply not reasonable to expect hardware vendors to proactively test for, and implement workarounds for, edge case bugs in software that is not under their direct control”

            But why are they the ONLY ONES tripping on it then? Wouldn’t ALL SSD’s be having the same?

            • just brew it!
            • 4 years ago

            Please read what I’ve said elsewhere in these comments about queued TRIM. Queued TRIM introduces a lot of potential variability in terms of ordering and timing of operations. Do other vendors even IMPLEMENT queued TRIM on their consumer drives? From what I’ve heard it is actually somewhat unusual in the consumer space, being considered somewhat of an enterprise-level feature. (And enterprise users probably aren’t doing software RAID, and even if they are they probably aren’t using RAID-0 or RAID-10.)

        • Nevermind
        • 4 years ago

        “According to Martin Petersen this is a very old bug, and he was surprised someone noticed it only now. ”

        Well that’s self contradictory.

        • mkp
        • 4 years ago

        The bug wasn’t really the MD driver’s fault. A block layer optimization did not take into account that the SCSI disk driver would need to change things at the bottom of the stack.

        When you are reading and writing a disk device, the command you send to the device usually contains the operation you want to perform, a start LBA and a block count and a few other things. Separately from that you’ll have a data in/out buffer that contains the data to be written or the pages in memory the data from disk needs to be read into.

        However, DSM TRIM is special in that it takes multiple ranges of input. These won’t fit in the command descriptor so the ranges are stored in the *data* buffer. This is essentially what’s causing us grief. Unlike a read or a write there is no memory buffer representing the 2GB block range you are about to discard on disk. So we allocate a page of memory to carry the block range descriptors. And that page is then retrofitted into the discard request which up until that point had no buffer associated with it. It’s a pretty gross hack but there are lots of reasons why we have to do it that way.

        In the case of the MD linear/raid0/raid10, all requests that had the same parent would end up sharing the same buffer pointer (well, bio_vec). So depending on timing and command processing speed of the drive’s host facing interface we’d occasionally end up sending down a discard range belonging to a different drive in the stripe. I.e. discarding blocks that were potentially in use.

        Martin

          • Nevermind
          • 4 years ago

          “It’s a pretty gross hack but there are lots of reasons why we have to do it that way.”

          Can you explain?

            • mkp
            • 4 years ago

            Well, fundamentally the entire I/O stack depends on the payload residing in bio_vecs that can be mapped to scatterlists that in turn can be used to set up DMA transfers. If we stuck the TRIM payload somewhere else we’d have to change a pretty significant amount of code. Also, the way we keep track of how many bytes to transfer and how many bytes have successfully completed would have to change.

            Both of these are certainly doable but would add a reasonable amount of extra complexity and add an ongoing maintenance burden. We would all like a solution that’s more aesthetically pleasing but we’re at the end of the road for SATA and our focus is on removing complexity rather than adding it. Shortening the I/O paths to reduce latency for high IOPS flash and post-flash devices is our main focus. We aim to design for the future, not for the past.

            • Nevermind
            • 4 years ago

            “We aim to design for the future, not for the past.” TRIM is a basic and required feature so yeah, I think “changing a significant amount of code” might be required.

            When they tried to blame the Linux kernel they were insisting other people needed to “change a significant amount of code” also, right?

            Low latency sounds great. But you still have to make sure IOPS go in order and fit the spec.
            And when a problem is found in Samsung’s drivers (as has been the case with 830, 840, and 850) they’ve denied anything is wrong first and “changed a significant amount of code” later.

          • BobbinThreadbare
          • 4 years ago

          Awesome info, thanks!

          • Nevermind
          • 4 years ago

          I just think there’s a better way than the ‘gross hack’ method people aren’t exploring, probably due to the amount of money required to rewrite and test everything.

          But after all, this is what customers store THEIR DATA on. So if there’s an issue with the garbage collection and RAID, that needs to be something they test for and find themselves and if consumers bring it to their attention, they DO have a responsibility to “rewrite significant amounts of code” to make things work correctly, IMO.

          People with other vendor SSD’s in RAID aren’t having these issues are they.

      • just brew it!
      • 4 years ago

      The shared buffer screwup is in the OS, not the drive. The OS is, in fact, telling the drive to TRIM the wrong sectors. The drive is doing what it is told, so on this particular issue, it would appear that Samsung is not at fault, and in fact helped diagnose a tricky bug in the Linux software RAID implementation.

      This in no way excuses their handling of the TLC cell leakage issue in the 840 series drives…

        • Nevermind
        • 4 years ago

        Why is Samsung the only one having the problem then?

          • just brew it!
          • 4 years ago

          As already noted, it is a race condition; this means it is timing related. Samsung is probably executing the commands in question a little slower or faster than other vendors, or re-ordering them differently (which NCQ explicitly allows, in fact that is the whole point of NCQ). As long as their drives implement the specs for queued TRIM correctly, this is not “their” bug.

            • Nevermind
            • 4 years ago

            A race condition is sequence dependent. Due to the way Samsung uses TRIM in a RAID config, AFAIU, where multiple buffers are “expected” only one is used and that buffer is flushed before the data gets to where it needs to go.

            Now EVERY OTHER VENDOR does this differently, apparently. Yes, there is an issue with the design of the existing RAID stack infrastructure that Samsung did not successfully “workaround” with their “pretty gross hack” (explained above rather well and technically specific, I think) but given Samsung is the ONLY vendor having the problem of this type of data corruption, does blaming the existing linux kernel really explain away their testing methodology failure to notice/admit this?

            • just brew it!
            • 4 years ago

            Samsung doesn’t have a “way they use TRIM in a RAID config”. The drive has no way of knowing it is being used in a RAID array. All of the buffer misuse/abuse is happening in the OS.

            As I’ve explained multiple times already, command queueing EXPLICITLY gives the drive flexibility to execute commands out-of-order. My guess is that the drive is doing things in a different (but still legal) sequence compared to other drive vendors, and this is what is triggering the race condition in the OS.

            As also noted previously, this is a pretty unusual use case. While it would’ve been nice of them to test for it, it is unreasonable to expect them to. New Linux kernels are released every few weeks — which one are they even supposed to test against? Should they test against FreeBSD too, since some people use that for file/network servers? What about ZFS? BTRFS? All of the different intelligent hardware RAID controllers on the market? Any of these could potentially have bugs like this that only show themselves under very specific conditions.

            • Nevermind
            • 4 years ago

            Other vendors DO queue TRIM, they don’t have this issue. That’s my point.

            “My guess is that the drive is doing things in a different (but still legal) sequence compared to other drive vendors”

            What makes you assume there is no problem with their sequence choice?

            Since NO OTHER VENDOR has this problem or attempts TRIM in this exact way?

            Doesn’t their implementation of TRIM in their firmware fall under their responsibility?
            Doesn’t testing edge-case scenarios of software RAID make sense for a global vendor?
            Doesn’t this fall into a pattern of denying problems with their SSD firmware GENERALLY?

            Or am I being absurd by pointing all this out?

            • just brew it!
            • 4 years ago

            [quote<]Other vendors DO queue TRIM, they don't have this issue. That's my point.

            "My guess is that the drive is doing things in a different (but still legal) sequence compared to other drive vendors"

            What makes you assume there is no problem with their sequence choice?[/quote<]

            The fact that fixing an ACKNOWLEDGED BUG in the Linux kernel results in correct system operation?

            • Nevermind
            • 4 years ago

            That doesn’t explain why Samsung is the only one hitting it, does it?

            You keep avoiding that part intentionally.

            Let’s have a self-driving car analogy, Samsung’s keeps crashing on a certain road.
            Well, nobody else is crashing there. Is it a flaw in the road? Perhaps.
            But nobody else is crashing there.

            So before they sell a million of these cars, that needs to be tested for.

            • just brew it!
            • 4 years ago

            No, you keep ignoring a completely plausible explanation.

            • Nevermind
            • 4 years ago

            Not really, you’re jumping over the repeated emphasized point: Samsung is having the issue.

            Nobody else.

            • just brew it!
            • 4 years ago

            And you’re overlooking the fact that the Linux RAID stack was issuing incorrect or ambiguous commands to the drive, and relying on the fact that other drives just happened to react in a way that didn’t cause problems. If anything, Samsung deserves kudos in this case for helping to improve the Linux kernel.

            • Nevermind
            • 4 years ago

            And if Samsung had changed their firmware that’s another way of tackling the issue.

            To be in-line with all the other vendors, who do not have this issue.

            • just brew it!
            • 4 years ago

            If they had done that, the Linux kernel devs would’ve remained oblivious, and the bug would have remained in the kernel code. Eventually someone else would’ve probably tripped over it. Silently modifying their own firmware to work around a bug in Linux is ABSOLUTELY THE WRONG WAY to handle something like this; they did the right thing. So I don’t get why you keep harping on this.

            • Nevermind
            • 4 years ago

            ” Eventually someone else would’ve probably tripped over it. ”

            But nobody ever had, until Samsung’s implementation of TRIM queuing did.

            Is there ” no other way” of doing it? Obviously of course there is, everyone else did it.

            Is there ” no reasonable way to test for it?” Of course there is. You pay for that.

            And yes, other vendors DO use queued TRIMing.

            • just brew it!
            • 4 years ago

            I give up. Obviously you’ve made up your mind to bash Samsung over this for some reason, even though they’ve been exemplary members of the Open Source community by diagnosing a significant bug in the Linux kernel that has the potential to cause data corruption, reporting it to the appropriate developers, and proposing a fix (even though a different fix was ultimately chosen).

            By your logic, if I spot an open manhole I should just jump over it and not bother reporting it to the authorities. “Nobody else has fallen in yet, and I know it’s here now so *I* won’t fall into it in the future… if anyone else falls in, that’s THEIR problem!”

            • Nevermind
            • 4 years ago

            ” Obviously you’ve made up your mind to bash Samsung over this for some reason”

            Not really, how did I ‘bash’ them by saying the problem was, AFAIK, AFAWK, experienced by them alone?

            I think firmware implementations have problems like this every day. If I wrote firmware, I could probably point to some examples of that, but I don’t. Some others have chimed in with some technical details and most of that I was able to parse, some of it was above my pay grade.

            Some mentioned “inelegant hack” among other justifications of why it was so.

            That doesn’t completely absolve Samsung in their handling of this, or other, firmware related issues that are not sufficiently tested for until released by millions to the public.

            Obviously I’d give OCZ much LOWER marks! As well as a few other manufacturers and vendors. I have no personal gripe against Samsung whatsoever.

            However your defense of them has been STRIDENT. I wonder what your potato is in this oven?

            • Nevermind
            • 4 years ago

            “Silently modifying their own firmware to work around a bug” Yeah that would be catastrophic!

            • just brew it!
            • 4 years ago

            It is certainly non-optimal to not disclose a bug in the Linux kernel that you know exists. It means that someone else will eventually have to waste the time to diagnose it all over again.

            • _ppi
            • 4 years ago

            Maybe other vendors did, but this reminds me a bit of the situation with MS Internet Explorer 6 – it followed standards horribly, so web developers were designing their pages around the bugs in IE6.

            Ultimately it hurt MS as well, because enterprises did not want to leave IE6 due to their own applications. And that did not help security either. My previous employer used IE6 until ~2010, when IE9 was available. There are probably some that use it still now.

            Here you have a bug, where international Linux development community acknowledged it is Linux bug. Even from descriptions in the article and in comments here, it is clear Linux was doing something wrong.

            And what is better – covering up for bad code elsewhere or fixing the root cause?

            But hey, maybe somebody could get together multiple SSD drives to check, if other vendors were really unaffected.

            • Nevermind
            • 4 years ago

            I’m NOT saying anyone should “cover up” anything, for the record.

            I don’t want to be construed along that rationale.

            • flip-mode
            • 4 years ago

            It is incredible to see that you would rather see Samsung create a workaround to sidestep a Linux bug rather than see the bug fixed, just to maintain the appearance that the problem was with Samsung and not with Linux. I am searching for what words best describe your position. Immoral, dishonest, careless, corrupt.

            • Nevermind
            • 4 years ago

            Can you quote me saying that? That’s not what I said at all.

            I specifically went out of my way to say I’m NOT saying that.

            Samsung sells a product, their product has an issue that no other vendors product had.

            Who should test and find that bug and determine what’s causing it?

            A: Everyone but Samsung
            B: Samsung, with help
            C: The world of open source software generally
            D: Unwitting consumers who bought a product with a bug and suffer data loss to notice it.
            E: Me personally for mentioning it.

    • willmore
    • 4 years ago

    Despite what Martin Petersen says, it remains strange that Samsung was the only one seeing this. And, yeah, I read his comment. It is quoted in its entirety in the article here. I clicked the link hoping that the quote here was just a summary. Nope, that’s the whole comment.

      • SuperSpy
      • 4 years ago

      Not knowing how any of it works internally, my only guess would be that maybe Samsung drives return from trim calls faster, or they defer them and return immediately.

        • willmore
        • 4 years ago

        Could be. With race conditions, the devil is very much in the details.

        • Prototyped
        • 4 years ago

        From the discussion it seemed to me that Samsung allow multiple blocks to be UNMAPped at the same time, which appears to be a relative rarity for some reason . . .

          • mkp
          • 4 years ago

          The concurrency came from the fact that it was a RAID deployment and multiple drives were being accessed in parallel. One single request from fstrim had to get turned into multiple requests by MD, one for each drive in the RAID stripe. Due to the bug, all those requests essentially ended up sharing the same buffer pointer.

            • Nevermind
            • 4 years ago

            That’s correct.

            • Nevermind
            • 4 years ago

            So, whose bug is that if it’s only affecting the way Samsung collects garbage?

      • just brew it!
      • 4 years ago

      Race conditions like this are often sensitive to seemingly innocuous changes in the environment. Different hardware, different number of CPU cores, system loading, etc. can all change the behavior of a system enough to expose (or mask) bugs that are dependent on the timing relationships between events. That’s one of the reasons bugs like this can be so difficult to diagnose.

      Throw NCQ into the mix (as in this case) and things get even more complicated, since the drives are allowed to reorder commands before executing them. I.e., the OS tells the drive “do A then B”. But if A and B are independent operations, the drive is free to do B before A instead, if that would potentially result in better overall performance.
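A tiny toy model of that reordering freedom (purely illustrative — the command strings and the completion policy are made up, not real SATA semantics):

```python
# Toy model of native command queueing: the host issues tagged commands
# in one order, but the drive may complete independent commands in
# whatever order suits it. Commands and policy here are invented.

def host_issues():
    # (tag, operation) pairs, issued in this order by the OS
    return [(0, "WRITE LBA 10"), (1, "TRIM LBA 500"), (2, "READ LBA 10")]

def drive_completes(queue):
    """One of many legal completion orders: this hypothetical drive
    services the TRIM last, judging the read/write more urgent."""
    reordered = sorted(queue, key=lambda cmd: "TRIM" in cmd[1])
    return [tag for tag, _ in reordered]

issued = [tag for tag, _ in host_issues()]
completed = drive_completes(host_issues())

print(issued)      # [0, 1, 2] -- the order the OS sent them
print(completed)   # [0, 2, 1] -- a legal, different completion order
```

A different drive (or firmware revision) could just as legally complete them in issue order, which is exactly the kind of vendor-to-vendor variability that can expose or mask a timing bug in the host's code.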

        • Nevermind
        • 4 years ago

        I don’t know if this is the classic “race” condition example though really. A case where multiple buffers are expected but only one is used — I don’t know what to call that.

          • just brew it!
          • 4 years ago

          I’d say it qualifies. But even if it doesn’t, the bug is clearly in the Linux RAID implementation regardless.

            • Nevermind
            • 4 years ago

            But you keep not explaining why Samsung is the only one tripping on the bug.

            If it’s a RAID stack bug why hasn’t every single SSD ever made been affected in that setup.

            • just brew it!
            • 4 years ago

            I’ve offered up a plausible explanation multiple times already, you just keep ignoring it.

            I repeat: Queued TRIM. (Or possibly the lack thereof, in the case of some drives.)

            Or to turn your question on its head, if this is a Samsung firmware bug, why did patching the bug they found in the Linux RAID code make the problem go away?

            • Nevermind
            • 4 years ago

            But Samsung is the ONLY one with this problem.
            Samsung is not the only one queuing TRIM commands.
            Hence the “plausibility” of the explanation is not sufficient.

            If it was a software raid – anywhere – issue with linux and TRIM,
            why would Samsung be the only vendor affected?

            Their implementation of the TRIM command, in their firmware,
            hits a snag. Now did that snag exist outside of their firmware, yes.
            But their firmware hits it squarely and nobody else’s did.

            You can say this is an edge case, software raid0/1+0.
            You can say “the problem isn’t the drive HW” that’s true.

            I would counter by saying testing SSD firmware in every possible raid config
            is probably a good idea for a vendor of worldwide sales of storage hardware,
            with profits in the billions…

            But maybe we don’t have to see eye to eye on that.

            • just brew it!
            • 4 years ago

            [quote<]But Samsung is the ONLY one with this problem. Samsung is not the only one queuing TRIM commands. Hence the "plausibility" of the explanation is not sufficient.[/quote<]

            When you consider that queueing essentially tells the drive "do these commands in whatever order you want, just tell me when each one is done" I think it is MORE THAN sufficient. It introduces a lot of potential variability into the sequencing and timing of command execution across drive vendors, which could easily cause that broken buffer allocation code in the kernel to behave very differently.

            [quote<]If it was a software raid - anywhere - issue with linux and TRIM, why would Samsung be the only vendor affected?[/quote<]

            See above. Furthermore, if it is a bug in Samsung's firmware, why is Windows NOT affected?

            [quote<]Their implementation of the TRIM command, in their firmware, hits a snag. Now did that snag exist outside of their firmware, yes. But their firmware hits it squarely and nobody else's did.[/quote<]

            Again, see above. Command queueing introduces variability. That is BY DESIGN, to give vendors of HDDs/SSDs the flexibility to execute commands in the order that results in best throughput on their hardware. Different vendors are free to implement it differently, as long as they adhere to the SATA spec for how command queueing is supposed to work. While I don't KNOW FOR A FACT that Samsung has fully adhered to the specs for queued TRIM, nothing that has been published so far suggests otherwise.

            [quote<]You can say this is an edge case, software raid0/1+0. You can say "the problem isn't the drive HW" that's true. I would counter by saying testing SSD firmware in every possible raid config is probably a good idea for a vendor of worldwide sales of storage hardware, with profits in the billions...[/quote<]

            As I noted in another post, "every possible RAID config" is going to result in dozens -- if not hundreds! -- of potential test cases. In a perfect world, yes it would be nice if every vendor tested every one of their releases against every release of every other vendor's stuff that it might need to work with. In the real world, computer hardware is a very competitive market, often with slim margins and tight deadlines. We're talking about a lot of expense to acquire all the hardware, lease lab space, and hire an additional team of test engineers; then there will be (likely) months of additional time to meticulously validate every new firmware release against all possible operating modes (and re-validate it against each new OS kernel and RAID controller firmware version as they are released).

            [quote<]But maybe we don't have to see eye to eye on that.[/quote<]

            Like I said, in a perfect world it would be nice. But it just isn't even remotely practical.

            • Nevermind
            • 4 years ago

            “As I noted in another post, “every possible RAID config” is going to result in dozens — if not hundreds! — of potential test cases. ”

            Is that an unreasonable thing to expect of a billions-in-profits global vendor of storage hw?

            Really? I don’t think so.

            • just brew it!
            • 4 years ago

            It’s reasonable to expect them to test for adherence to the SATA specification, and with commonly used hardware/software configurations. For everything else, it is perfectly reasonable to take the attitude that anything that relies on incidental behavior (i.e. queued commands going in a particular order or taking a specific amount of time) might result in bad behavior. IT’S NOT THEIR JOB TO DIAGNOSE OTHER PEOPLE’S BUGGY CODE!

            And the number of potential test configurations is probably more like thousands, once you factor in all the possible levels of RAID, the different file systems that are widely used on Linux, use (or non-use) of LVM, and so on. Any of these could potentially affect the pattern of TRIM commands issued to the drives, so by your logic we should test them all to catch incompatibilities like this, eh?

            They didn’t get to BE one of the largest vendors of SSDs by spending months or years testing each firmware release. If they did that they’d miss the market window.

            This whole thread is rather surreal. Never thought I’d end up defending a vendor I’ve vowed to avoid in the future (due to the 840 EVO fiasco)!

            • Nevermind
            • 4 years ago

            “IT’S NOT THEIR JOB TO DIAGNOSE OTHER PEOPLE’S BUGGY CODE!”

            Actually when they’re the only vendor having an issue with it, IT BECOMES THEIR PROBLEM!
            And when they sell millions of products with the problem and NOBODY ELSE DOES?
            Then it’s their problem ALONE, isn’t it? From a business standpoint at LEAST.

            Maybe they want a high call volume to their support center instead, who knows…

            • just brew it!
            • 4 years ago

            [quote<]Actually when they're the only vendor having an issue with it, IT BECOMES THEIR PROBLEM! And when they sell millions of products with the problem and NOBODY ELSE DOES? Then it's their problem ALONE, isn't it? From a business standpoint at LEAST.[/quote<]

            ...and that's exactly what happened. They figured it out, and informed the Linux developers. So I'm not sure what your issue is at this point.

            • Nevermind
            • 4 years ago

            The issue is this isn’t the first time Samsung has missed major problems with their firmware, regardless of whether or not the firmware itself causing their issue that nobody else has.

            The issue is the 840, 830, EVO and other firmware issues that they’ve blamed on BIOS this and “edge case that” but which turned out to be directly their responsibility entirely.

            And now we have a case where they do things in a strange, queued way without the ability to monitor the queuing effectively and in a way different from any other vendor, and it has data loss problems that they didn’t test for – and of course it’s someone else’s fault also.

            I guess the fact that nobody else trips over this particular bug in their millions-sold product…
            Makes it Linux’s problem to solve, for their weird implementation of TRIM to work, sure.

            That’s blaming the road stripes because your self-driving car crashed though.

            • just brew it!
            • 4 years ago

            Yes, they have screwed up royally in the past. I don’t buy Samsung SSDs any more as a result of this! But criticizing them when they actually do something RIGHT makes no sense.

            And your road stripe analogy makes absolutely no sense in this context. Your position is more like programming the self-driving car with incorrect road map data that indicates there are two viable routes from point A to point B (when in fact there is only one); then getting angry at the person who designed the car’s navigation algorithm when the car chooses the wrong route. Direct your ire where it belongs instead (at the creator of the erroneous map data). Just because differences in navigational algorithms cause another self-driving car to choose the correct route doesn’t make the map data correct, or make the first car’s navigation algorithm wrong.

            • Nevermind
            • 4 years ago

            ” But criticizing them when they actually do something RIGHT makes no sense.”

            I don’t think I’m criticizing them for doing something right here..

            • Firestarter
            • 4 years ago

            Then what are you doing? Do you suggest that they should forget about the solution to the problem and instead start figuring out how to work around it?

            • Nevermind
            • 4 years ago

            “Do you suggest that they should forget about the solution” NO.

            I’m suggesting this isn’t the only problem. I’m suggesting they could have caught it, and others.

            • just brew it!
            • 4 years ago

            How much more explicit do I need to be?

            1. You’re criticizing them for their conduct regarding this TRIM issue.

            2. As a result of Samsung’s own internal investigation, conducted on their own dime, a previously unknown bug was found in the Linux RAID code.

            3. Based on the information Samsung passed along to the kernel developers, a fix has been implemented in the Linux RAID code.

            4. The fix results in correct operation of TRIM on Linux RAID arrays, when using the affected Samsung drives.

            5. We currently do not have any evidence that Samsung’s implementation of TRIM on these drives (queued or otherwise) deviates from the SATA specification.

            Do you dispute the factual accuracy of ANY of the above? If so, then on what grounds? If not, then WTF is it that you think Samsung did wrong? And don’t repeat the ridiculous “they should’ve tested with every possible combination of hardware/software their drives could conceivably be used with” nonsense, NOBODY does this.

            You either don’t understand how industry standard interface specifications are supposed to work, or have unrealistic expectations regarding the level of validation that is customary for a manufacturer to perform, or have a vendetta against Samsung. Maybe all of the above?

            • Nevermind
            • 4 years ago

            1. I’m “criticizing” them for buggy releases generally. This is one bug that they experienced that turned out to be triggered by their software, and theirs exclusively AFAIK.

            2. Their “own dime” was funded by sales of these products in the millions of dollars.
            Not so with Linux drivers.

            3. A fix has been developed. Yes. To this one issue. How long after first reported? Do you know?

            4. Have you tested this fix to say so? I haven’t. I’ll take your word I guess.

            5. We have evidence that Samsung was the only vendor affected by this bug, due to the way they sequenced their TRIM ‘under heavy load’. Until evidence proves otherwise, if.

            Am I disputing the facts, not really no. I see them in the context of other bugs that SSDs made by this vendor have had, and obviously I’m MUCH more critical of OCZ than Samsung.

            I do believe testing TRIM under load in a real-life testing regime under the various RAID configurations was possible a long time ago, and when multiple bugs on multiple products kept popping up after they’ve been on the market for months, that might have QA take on a much more direct economic consideration in terms of potential recalls or lost future sales.

            But you’re right, it’s just a consumer product. Why should I expect them to test it rigorously and find these types of bugs before they sell them for millions worldwide?

            • just brew it!
            • 4 years ago

            Re #1 and #5: Blaming them for “triggering” this bug is silly. The bug isn’t in their code, and was effectively a ticking time bomb. Someone would’ve tripped over it eventually, and we are fortunate that the vendor it happened to affect first had the technical skills and inclination to trace it back to root cause.

            Re #3 and #4: No need to speculate about the timeline or take my word for it. The details are posted on the Algolia site.

            Re testing: As I’ve already noted, I expect vendors to test rigorously for standards compliance and expected common use cases. While Samsung has certainly shown deficiencies in this area in the past (840 EVO fiasco), it is doubtful whether such testing would have caught this TRIM issue, since the use case was something only a small fraction of users would ever see. Your implication that they should have tested with every possible hardware/software combination that their drives might be used with before releasing the product is unreasonable, and at odds with industry practice.

            • Nevermind
            • 4 years ago

            “They didn’t get to BE one of the largest vendors of SSDs by spending months or years testing each firmware release”

            Yeah spending basic money on QA testing has been responsible for countless market failures…

            I just can’t think of any…

            • just brew it!
            • 4 years ago

            Show me another SSD vendor who exhaustively certifies their drives in the way you are implying, and maybe you’ll have a case. If anyone is doing it, they’ll trumpet it in their marketing materials because it is likely costing them tens of millions of dollars.

            • Nevermind
            • 4 years ago

            You know I’m not trying to dump on Samsung in particular right? I own Samsung stuff.
            Their SSD shenanigans I’ve been reading about while I buy cheap OCZ’s that fail instantly.

            But they in particular roll their own firmware and overprovisioning magic like everyone else.
            And they are having the issue, and AFAIK, nobody else is – I don’t mind being proven wrong.

            Obviously my original synopsis was 1/2 right at best, but so was Samsung’s frankly.

            But this is a testable, discoverable problem and it wouldn’t cost 10 million dollars really.
            And 10 million to a company making billions off a product sold worldwide, at that.

            • sustainednotburst
            • 4 years ago

            Like I said in another post. The only other drives that were tested were 3 Intel SSDs. We don’t know about Crucial, OCZ, Plextor, Sandisk, etc. Algolia are the only ones who reported having the issue, so you’d have to ask them if they can try or did try any other company’s SSD, but I don’t think they’ll do that seeing as the issue is resolved by fixing the kernel.

            • Nevermind
            • 4 years ago

            Go test as many as you like. When you find one outside of Samsung that does it, let us know.

            So far we can only go by what is reported, and so far, it’s only tripped by Samsung’s implementation of the subroutines, as far as we know. So that’s what I went by.

            This also isn’t the only bug Samsung is facing in their firmware. Nor Linux in ‘our’ drivers.
            But this was enough to corrupt data and cause data loss, that’s a big deal in SSD ‘errors’.

            My point was that testing a drive’s TRIM function and load testing the drivers and firmware in real-world environments is money well spent at the scale SAMSUNG sells them worldwide.

            Because at the end of the day it’s their responsibility to make sure their drive works, where other vendors drives work, and make them work better. Regardless of whether the tripping point is actually undiscovered in a Linux driver or the subroutine ordering of the SSD’s firmware, if ONLY SAMSUNG is tripping on it, it behooves them to find out why.

            Finally now, (how long has it been on the market already?) it’s being patched.
            What was the first date of reporting of this bug to Samsung?

            What about the other bugs with the 830/840/EVO, some STILL outstanding?

            That’s what I’m getting at. If this was the only problem they had we wouldn’t have noticed.

            • sustainednotburst
            • 4 years ago

            I ain’t gonna test anything, i hate linux and i don’t got the money to buy a bunch of drives.

            What’s wrong with the 830? The 840 and 840 EVO had the read speed bug, Samsung fixed it on the 840 EVO, whether you wanna believe it was a “bandaid” fix (which it’s not) or not. But as far as i know, the 830 drives had no issue(s).

            The only ppl to ever have and report the data loss issue was Algolia, and based on their updates, Samsung responded pretty quickly. We have no idea if any other vendor, other than those Intel drives, might or might not have the issue. You can’t claim other vendors aren’t afflicted or that Samsung is the only one, as no other vendors’ drives were tested.

            It’s as simple as this, there is a problem going on, if changing something in the software fixes it, then the software was the problem, and if changing something in the hardware fixes it, then the hardware is the problem. In this case, the kernel was the issue, and the kernel was patched, and the issue is gone.

            Only 2 Samsung drives claim they support Queued TRIM, the 840 EVO (after the new FW EXT0DB6Q) and the 850 PRO (after EXM02B6Q). Algolia tried older Samsung drives and they had the same issue, so Algolia’s data loss issue is unrelated to queued TRIM, tied in with the fact that they had queued TRIM disabled in the first place.

            Lastly, i feel like you keep missing the point that Algolia were the only ones who were ever afflicted by this TRIM data loss issue.

            • Nevermind
            • 4 years ago

            “Its as simple as this, there is a problem going on, if changing something in the software fixes it, then the software was the problem, and if changing something in the hardware fixes it, then the hardware is the problem.”

            That’s a fine example.. except you can have edge case issues between technologies that really could be “solved” by changes to either side. I’m not saying “Queued TRIM” is the cause of the bug either.

    • cobalt
    • 4 years ago

    Awesome! When are they going to acknowledge, or better yet “fix” (workaround), the issues with the vanilla (non-EVO) 840’s?

      • Nevermind
      • 4 years ago

      Right? That’s EXACTLY what I’m talking about.

      • dmjifn
      • 4 years ago

      I sold mine but honestly these are still useful and sane if you need temp space. Like a FreeNAS cache, or a dedicated Windows swap and temp files drive, or media working files. I’m sure someone would still want this.

        • cobalt
        • 4 years ago

        Oh, I agree, and I’m still using them as OS drives — I’ve got an 840 and an 840 EVO, and I’m not in a hurry to replace either. I don’t know how this slipped by QC, but problems happen, and with the EVO they acknowledged the problem and released a new firmware which changes some settings to minimize the problem and periodically refresh the parts of the disk that need refreshing.

        The problem here is not that the degradation issue happened, but that for the non-EVO vanilla 840, they continue to deny the problem even exists. It wouldn’t be hard, just do the same fix as the EVO that periodically refreshes the disk. You don’t even need to adjust the voltage read threshold. But Samsung continuing to actively deny the problem and forcing me to find my own tool and schedule my own disk refreshes is [b<]unacceptable[/b<], and sends a clear signal that they don't stand behind their products. (I know you're probably aware of this -- I'm not saying this directly to you.)

          • dmjifn
          • 4 years ago

          I agree it’s pretty disappointing, and I bailed before you did. I don’t know why I felt the need to make my comment. Probably due to the residual anxiety I had about having to rebuild my wife’s machine when she lost all her itunes! Or the residual guilt I have for offloading it on some schmo. 🙂

          Fortunately, I had a same-sized Toshiba Q-Series laying around and this [url=http://www.amazon.com/dp/B0094C0DYI<]cloning drive dock[/url<]. It really was a trouble-free swap.

          • Nevermind
          • 4 years ago

          ” and with the EVO they acknowledged the problem” Not right away they didn’t!

            • just brew it!
            • 4 years ago

            Yes, the EVO issue is another matter entirely. They really screwed up there, and they are now on my “avoid” list as a result.

            • Nevermind
            • 4 years ago

            This is like the 3rd or 4th one in a row with their SSD models. Not QUITE OCZ level fail. Yet.

            • cobalt
            • 4 years ago

            That’s true, and I don’t mean to give them a pass for it, but initially it was only a few months, right? From what I can find, the overclock.net thread started in August, they said they were investigating in September, and had their first “fix” (workaround) in October. It took them longer for their slightly more comprehensive “fix” (workaround), though; I suppose you could count a significant chunk of that time as them pretending their first “fix” worked.

            • just brew it!
            • 4 years ago

            The second “fix” is really only a half-fix too. It mitigates the issue, but doesn’t truly make it go away.

    • chuckula
    • 4 years ago

    Yup, the dreaded race condition “Heisenbug” that only shows up when the precise sequence of events happens with the precise timing to make the bug manifest.

    The nice part about Linux is that while there definitely are bugs, you get to see the process for fixing the bugs unfold in a public manner where others can learn from past mistakes and hopefully not repeat them.

      • divide_by_zero
      • 4 years ago

      100% agree.

      While I understand everything here in a broad sense, most of the actual technical details (single vs. shared buffers, queued TRIM, etc) in this discussion are over my head.

      That being said, I really enjoy(? – man, such a nerd) watching the bug-fix process play out publicly rather than the closed-source world where things magically get fixed in a driver or firmware update which often doesn’t even call out that said issue was fixed!
