Linux kernel patch reveals TLB bug's workings


— 3:17 PM on December 5, 2007

In our first article about AMD's translation lookaside buffer (TLB) erratum and its effects on quad-core Opteron supply, we mentioned that AMD was prepping a kernel patch for the 64-bit flavor of Red Hat Enterprise Linux, Update 4. Unlike AMD's BIOS fix and microcode update, which we've heard induce a 10-20% performance hit, the Linux patch was said to reduce performance by less than 1%. However, we were also told that customers would need to sign a non-disclosure agreement in order to obtain it.

As it turns out, AMD publicly released the patch's source code on the x86-64.org mailing list today. However, the code is provided on an as-is basis, with repeated warnings that suggest it's not exactly fit for mainstream use:

Due to the very invasive nature of this patch and the very small number of affected customers (you know it if you have an affected part), we do not recommend the use of this patch on a regular Linux system. This patch is NOT intended for mainline acceptance or inclusion with a Linux distribution! The patch has only received minimal functional testing. Every user must evaluate it prior to production use to make sure it meets the necessary quality standards.

In a previous posting on the same mailing list, AMD Fellow Elsie Wahlig also warns that the patch is "NOT being recommended to be applied upstream." Wahlig mentions that the patch was developed by AMD's Operating System Research Center team for Linux 2.6.23.8 and provides a detailed description of the erratum:

Erratum 298 will be described as follows: "The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 from 0b to 1b may not be atomic. A small window of time exists where other cached operations may cause the stale page translation table entry to be installed in the L3 before the modified copy is returned to the L2. In addition, if a probe for this cache line occurs during this window of time, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cached operation. The system may experience a machine check event reporting an L3 protocol error has occurred. In this case, the MC4 status register (MSR 0000_0410) will be equal to B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be equal to 26h."
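For readers who want to compare a machine check log against that description, here is a minimal sketch in Python. The constant values come straight from the erratum text quoted above; the function and constant names are our own, and a match only means the registers look like the described signature, not that the erratum is confirmed as the cause.

```python
# Register values quoted in AMD's description of Erratum 298.
# The names below are hypothetical, chosen for this illustration.
ERRATUM_298_MC4_STATUS = {0xB2000000_000B0C0F, 0xBA000000_000B0C0F}
ERRATUM_298_MC4_ADDR = 0x26


def matches_erratum_298(mc4_status: int, mc4_addr: int) -> bool:
    """Return True if both registers match the signature in the erratum text."""
    return (mc4_status in ERRATUM_298_MC4_STATUS
            and mc4_addr == ERRATUM_298_MC4_ADDR)
```

How the two register values are read from the machine is left out on purpose, since that depends on the system and its tools.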

Wahlig also describes how the Linux patch works: it bypasses the BIOS workaround and emulates "Accessed and Dirty bits" in order to prevent the erratum from rearing its head:

The basis for the kernel patch solution depends on the root cause of the L2 eviction problem. The only exposure for the problem is when the TLB needs to set an A or D bit in a page table entry. If the TLB never needs to set an A or D bit, the bug cannot occur. By emulating the A and D bits with the help of the Present and Writable bits, the patch will ensure the real A and D bits are always preset. It works by forcing a page fault when the first access is made to a page with the emulated A bit not set, and when the first write access is made to a writable page with the emulated D bit not set. Emulated A and D bits are stored in bits generally available to the OS in the page table entry.
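The scheme Wahlig describes can be illustrated with a toy model. The Python sketch below is not AMD's patch and makes several assumptions: the `Page` class and `access` function are hypothetical, and we arbitrarily place the emulated bits at positions 9 and 10, two of the page table entry bits that x86 leaves available to the OS. The hardware bit positions (Present 0, Writable 1, Accessed 5, Dirty 6) are the standard x86 ones.

```python
from dataclasses import dataclass

# Standard x86 page table entry bits.
HW_PRESENT = 1 << 0
HW_WRITABLE = 1 << 1
HW_ACCESSED = 1 << 5
HW_DIRTY = 1 << 6
# Assumed positions for the emulated bits, inside OS-available bits 9-11.
SOFT_ACCESSED = 1 << 9
SOFT_DIRTY = 1 << 10


@dataclass
class Page:
    writable: bool               # what the OS intends, not what hardware sees
    soft_accessed: bool = False  # emulated A bit
    soft_dirty: bool = False     # emulated D bit

    def pte(self) -> int:
        """Build the entry as the hardware would see it."""
        # Real A and D are always preset, so the TLB never has to set them.
        bits = HW_ACCESSED | HW_DIRTY
        if self.soft_accessed:
            bits |= HW_PRESENT | SOFT_ACCESSED
        if self.writable and self.soft_dirty:
            bits |= HW_WRITABLE | SOFT_DIRTY
        return bits


def access(page: Page, write: bool) -> bool:
    """Simulate one access; return True if it had to take a page fault."""
    bits = page.pte()
    faulted = not bits & HW_PRESENT or (write and not bits & HW_WRITABLE)
    if not faulted:
        return False
    if write and not page.writable:
        raise PermissionError("write to a read-only page")
    # The fault handler records the access in the emulated bits.
    page.soft_accessed = True
    if write:
        page.soft_dirty = True
    return True
```

In this model, the first read of a page and the first write to a writable page each take one fault, and later accesses of the same kind take none. That lines up with the idea of paying a one-time cost per page rather than a penalty on every access.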

AMD ended up releasing the patch under more amiable terms than initially expected, but the company isn't giving all Linux users a "get out of jail free" card to avoid the BIOS fix's reported performance penalty.

AMD does mitigate its warnings by saying few customers are actually affected by the bug. However, the company told us yesterday that it is telling motherboard makers to enable the BIOS fix by default and not provide an option to disable it.
