Linux kernel patch reveals TLB bug’s workings

In our first article about AMD’s translation lookaside buffer (TLB) erratum and its effects on quad-core Opteron supply, we mentioned that AMD was prepping a kernel patch for the 64-bit flavor of Red Hat Enterprise Linux 4, Update 4. Unlike AMD’s BIOS fix and microcode update, which we’ve heard induce a 10-20% performance hit, the Linux patch was said to reduce performance by less than 1%. However, we were also told that customers would need to sign a non-disclosure agreement in order to obtain it.

As it turns out, AMD publicly released the patch’s source code on the x86-64.org mailing list today. However, the code is provided on an as-is basis, with repeated warnings that suggest it’s not exactly fit for mainstream use:

Due to the very invasive nature of this patch and the very small number of affected customers (you know it if you have an affected part), we do not recommend the use of this patch on a regular Linux system. This patch is NOT intended for mainline acceptance or inclusion with a Linux distribution! The patch has only received minimal functional testing. Every user must evaluate it prior to production use to make sure it meets the necessary quality standards.

In a previous posting on the same mailing list, AMD Fellow Elsie Wahlig also warns that the patch is "NOT being recommended to be applied upstream." Wahlig mentions that the patch was developed by AMD’s Operating System Research Center team for Linux 2.6.23.8 and provides a detailed description of the erratum:

Erratum 298 will be described as follows: "The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 from 0b to 1b may not be atomic. A small window of time exists where other cached operations may cause the stale page translation table entry to be installed in the L3 before the modified copy is returned to the L2. In addition, if a probe for this cache line occurs during this window of time, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cached operation. The system may experience a machine check event reporting an L3 protocol error has occurred. In this case, the MC4 status register (MSR 0000_0410) will be equal to B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be equal to 26h."
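The register signatures quoted above translate directly into a check against a machine-check log entry. As a minimal illustrative sketch (the helper name is ours, not AMD’s; real code would read the values from MSRs 0000_0410 and 0000_0412 or from the kernel’s MCE log):

```c
#include <stdint.h>

/* MC4 status signatures quoted in the erratum text (MSR 0000_0410),
 * plus the MC4 address value (MSR 0000_0412). */
#define ERRATUM_298_MC4_STATUS_1 0xB2000000000B0C0FULL
#define ERRATUM_298_MC4_STATUS_2 0xBA000000000B0C0FULL
#define ERRATUM_298_MC4_ADDR     0x26ULL

/* Returns 1 if a logged machine-check event matches the erratum's
 * published signature, 0 otherwise. */
int is_erratum_298(uint64_t mc4_status, uint64_t mc4_addr)
{
    return (mc4_status == ERRATUM_298_MC4_STATUS_1 ||
            mc4_status == ERRATUM_298_MC4_STATUS_2) &&
           mc4_addr == ERRATUM_298_MC4_ADDR;
}
```

Any other status/address combination in the MC4 registers would point to a different L3 error, not this erratum.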

Wahlig describes the workings of the Linux patch, as well, which bypasses the BIOS workaround and emulates "Accessed and Dirty bits" in order to prevent the erratum from rearing its head:

The basis for the kernel patch solution depends on the root cause of the L2 eviction problem. The only exposure for the problem is when the TLB needs to set an A or D bit in a page table entry. If the TLB never needs to set an A or D bit, the bug cannot occur. By emulating the A and D bits with the help of the Present and Writable bits, the patch will ensure the real A and D bits are always preset. It works by forcing a page fault when the first access is made to a page with the emulated A bit not set, and when the first write access is made to a writable page with the emulated D bit not set. Emulated A and D bits are stored in bits generally available to the OS in the page table entry.
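The scheme Wahlig describes can be sketched in a few lines. This is an illustrative user-space model, not AMD’s actual patch: the hardware bit positions follow the x86-64 page table entry layout, but the software-defined bits and helper names are our own assumptions.

```c
#include <stdint.h>

/* Hardware-defined x86-64 page table entry bits */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
/* OS-available bits (9-11), used here to hold the emulated state */
#define PTE_SW_A     (1ULL << 9)   /* emulated Accessed bit */
#define PTE_SW_D     (1ULL << 10)  /* emulated Dirty bit */
#define PTE_SW_WRITE (1ULL << 11)  /* page is logically writable */

/* Map a page: pre-set the real A/D bits so the TLB never has to
 * write them, and leave Present/Writable clear so the first access
 * (and first write) trap into the page-fault handler instead. */
uint64_t make_pte(int writable)
{
    uint64_t pte = PTE_ACCESSED | PTE_DIRTY;
    if (writable)
        pte |= PTE_SW_WRITE;
    return pte;
}

/* Fault handler: record the emulated A or D bit, then grant access. */
uint64_t handle_fault(uint64_t pte, int is_write)
{
    if (!(pte & PTE_PRESENT))
        pte |= PTE_SW_A | PTE_PRESENT;      /* first touch: "accessed" */
    if (is_write && (pte & PTE_SW_WRITE) && !(pte & PTE_WRITABLE))
        pte |= PTE_SW_D | PTE_WRITABLE;     /* first write: "dirty" */
    return pte;
}
```

Because the real Accessed and Dirty bits are already set when the entry is installed, the TLB never performs the non-atomic read-modify-write that triggers the erratum; the cost is one extra page fault per page on first access and first write.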

AMD ended up releasing the patch under more amiable terms than initially expected, but the company isn’t giving all Linux users a "get out of jail free" card to avoid the BIOS fix’s reported performance penalty.

AMD does mitigate its warnings by saying few customers are actually affected by the bug. However, the company told us yesterday that it is telling motherboard makers to enable the BIOS fix by default and not provide an option to disable it.

Comments closed

    • stmok
    • 12 years ago

    The beauty of opensource…You can always see the funny shit going on with the hardware! :-)


    • Flying Fox
    • 12 years ago

    I don’t see virtualization mentioned here. Perhaps there exist some non-virtualization workloads that can trigger the bug?

      • d2brothe
      • 12 years ago

      I’d imagine it could be…pretty much any heavy workload could produce this bug, I think. I’m not sure if virtualization is particularly bad…I don’t know a whole lot about it; I get kinda annoyed at the buzzwordish nature of it…

    • nstuff
    • 12 years ago

    Am I the only one that saw “AMD does mitigate its warnings by saying few customers are actually affected by the bug.” and immediately thought that this is very true, especially considering the number of people that are actually buying the Phenom chips?

    Might I refresh those short term memories with the following: http://support.microsoft.com/?kbid=936357 Intel had a nasty little bug in their latest chips that had to be fixed with BIOS updates or MS patches.

    • Prototyped
    • 12 years ago

    "…and the very small number of affected customers (you know it if you have an affected part)…" This tells me that whatever few Phenoms and Barcelona Opterons have managed to get into the channel haven't been selling well.

    • muyuubyou
    • 12 years ago

    Fantastic. A great incentive to switch and hopefully great prices for us linux nerds :-)

    • derFunkenstein
    • 12 years ago

    I <3 the pic on this article.

      • evermore
      • 12 years ago

      There’s a picture?

        • ludi
        • 12 years ago

        Yes, if the Random Featured Entry Generator on the front page grabs this particular article for highlight.

    • srg86
    • 12 years ago

    Ouch! So it’s possible for the processor to access or modify a page in memory, and then for its page table entry to be moved from the L2 to the L3 cache before the processor has had a chance to update the accessed and dirty bits in the L2 version of that page table entry. So there are now two conflicting versions of this page table entry in the cache hierarchy: one says the page has been accessed or modified, and the other may say it hasn’t. I think this could be the cause of the data corruption for an unrelated cache operation, if I’m reading this correctly. (I understand how virtual memory systems work pretty well, but I understand caches less well.)

    This could also be why it’s a problem under high load, as the caches are being modified within this small time period between page access and these bits being updated.

    This is my take on it so far.

      • d2brothe
      • 12 years ago

      I don’t understand it all that well, but it seems even more complex than what you indicate. In addition to the situation you describe, a cache probe (from a third core, I would think) must occur on that page that has been moved to L3 before the update has completed. This is an extreme race condition involving three threads…something that can only occur on a quad-core CPU. I dunno how they missed it ;)…I’d imagine it’s quite rare indeed.

        • srg86
        • 12 years ago

        Indeed, that’s a good point; this causes problems with cache coherency. Other cores look at the contents of the shared L3 cache, and the page tables must be the same at all levels of the memory hierarchy, since they’re an important piece of system information that all CPUs and cores need up-to-date knowledge of. So if another core uses page table information from the shared L3 cache that’s at odds with what’s in the first core’s L2 cache, you now have one core that knows the page has been accessed or modified and the three other cores thinking the page hasn’t been accessed at all (especially if the erroneous L3 line has been brought into their own L2 caches). So first of all, the paging information is now inconsistent. On top of that, you have a race condition in which the first core may be trying to move the updated L2 line into the L3 cache while the other core(s) are trying to access that L3 line in order to bring it into their own caches.

        Now, the race condition leading to a complete lockup could stop the incorrect paging information being moved from the L3 cache to the L2 of other cores, but it seems to me that the potential for inconsistent paging information is there.

      • Prototyped
      • 12 years ago

      Even worse, if there’s a TLB snoop by another processor’s cache before the A or D state flags are set, the other processor now caches an inconsistent PTE — even harder to detect. And so on.

        • srg86
        • 12 years ago

        This is true; as I noted in my other reply, this would be even harder to detect and possibly even more dangerous.

        Still, I have a feeling that the race condition stops this, because it locks the whole machine up.

    • excession
    • 12 years ago

    Oh, poor AMD. This is just such a PR disaster for them.

    Bit of a shame that they are asking manufacturers to FORCE the microcode update on, when a) the problem is rare and b) there is a feasible software fix. By all means enable it by default, but if I had one of these early Phenoms I’d want to be able to turn the hobbling off!

    Do we know if this can be fixed easily in a new stepping?

      • shank15217
      • 12 years ago

      No, it means it’s not as rare as they’re saying. This is a pure showstopper. Imagine 10 VMs running on a host with this bug; you would effectively burn 10 servers with one failure.

        • excession
        • 12 years ago

        I hadn’t considered that it might not be quite as rare as they make out. However, I was thinking more from a desktop perspective that being able to easily turn the BIOS fix off would be good. (I know that it should be possible with Overdrive but does this run on all motherboards?).

        Obviously, in an enterprise environment, any questionable reliability is Bad News – but equally, what if a sysadmin wants to patch their Linux kernel to workaround the problem in software for less of a performance hit?

          • d2brothe
          • 12 years ago

          In a desktop environment, yes, it would be reasonable to keep it enabled (heck, my computer crashes damn often as it is). In an enterprise environment running a lot of VMs at high capacity, that’s when the bug is exposed more easily, so yes, that would be a problem…but then again, that’s why VMs are bad, you’ve gotta make sure you have backup for 10 servers…. BTW, it’s unlikely to burn them…just force a reboot…which is bad, but less bad than having to rebuild them.
