We reported earlier today that a problem with AMD’s quad-core processors has limited supply of "Barcelona" Opterons, but that is only part of the picture. Because the hardware bug—known as an erratum—affects all revisions and clock speeds of AMD’s quad-core processors, it affects the newly introduced Phenom 9500 and 9600 processors, as well. And although AMD is no longer shipping quad-core Opterons to major server vendors and general customers, it is shipping Phenoms to large PC builders and distributors. In fact, AMD knew about the erratum before the Phenom product launch, although its original statements about the issue gave the impression it only affected virtualization, a server-class usage model uncommon for desktop processors.
To recap, the erratum is a chip-level issue involving the TLB logic for the L3 cache that can cause system hangs in specific circumstances. AMD has a fix for the problem in the works, but it degrades performance. AMD has stated publicly that the workaround can lower performance by as much as 10%, although one source told TR the performance hit is in the 10-20% range.
To better understand this problem, TR spoke with Michael Saucier, Desktop Product Marketing Manager at AMD. Saucier confirmed that the TLB erratum can cause the system to hang when the chip is experiencing high utilization. AMD has stated previously that virtualization workloads can lead to this problem, but Saucier clarified that other workloads can trigger system hangs, as well. He characterized the issue as a race condition in the TLB logic "where the other guy wins who isn’t supposed to win," and said the likelihood of the erratum causing a system hang is extremely low.
Saucier flatly denied any relationship between the TLB erratum and chip clock frequencies. He also said there’s no relationship between clock speeds and the performance degradation caused by the BIOS-based fix for the erratum. AMD previously cited the TLB erratum as the primary motivation behind its decision to delay the 2.4GHz Phenom variant.
Saucier clarified the exact nature of the workaround for the erratum that AMD has provided to motherboard makers and PC manufacturers. The fix comes in the form of a BIOS update, and this BIOS patch includes an update to the CPU microcode. This update disables the portion of the chip’s TLB logic that is problematic. Saucier noted that the L3 cache "still works" with this logic disabled, and he said AMD has no plans to implement the fix for existing chips in a different way.
Instead, AMD is preparing a hardware fix in the next revision of the chip, dubbed B3. Future revisions of the Phenom, including the planned Phenom 9700 model at 2.4GHz and the 9900 at 2.6GHz, will include the fix. AMD plans to replace the current Phenom 9500 and 9600 models with new 9550 and 9650 models, based on the B3 chip, as well. Saucier’s best estimate for the arrival of B3 chips is "mid to late Q1" next year.
In another bit of news, the company will introduce "more than two" triple-core Phenom variants by the end of Q1, too.
AMD claims it has handed off the BIOS workaround to motherboard makers for implementation, and Saucier told us the company’s guidance to partners included an enable/disable option in the BIOS. AMD also has plans for an update to its Overdrive overclocking utility for Windows that will allow users to toggle the erratum fix on and off. Saucier said AMD’s thinking here is that savvy users may choose higher performance over the relatively small risk of experiencing a system hang due to the TLB problem.
Update 12/4/07: AMD informs us that Saucier’s statement here was incorrect. AMD has asked motherboard makers not to include a toggle for the workaround in their BIOSes. Instead, the workaround should be enabled by default, and the option to disable it will be exposed solely via AMD’s Overdrive tweaking utility, unless motherboard vendors elect to add this option against AMD’s guidance.
However, as far as TR has been able to determine, BIOS updates with the workaround are not yet available from the three major motherboard vendors shipping Phenom motherboards based on the AMD 790FX chipset. We have inquired with each of them and are currently awaiting definitive answers about an ETA for a BIOS update with the workaround. We also asked about the possibility of a BIOS option to enable and disable the fix. Similarly, SuperMicro apparently doesn’t yet offer an updated BIOS for its H8DMU+ server platform for Barcelona Opterons.
According to Saucier, AMD’s PC OEM partners were informed about the erratum prior to the launch and should have fixes available.
AMD spokesman Phil Hughes told us the TLB issue has been designated erratum number 298. When questioned about when AMD would update its technical documentation to include the erratum, Saucier said the person responsible for the updates is "on vacation," although he expects an update "by the end of the year."
Incidentally, the presence of the TLB erratum may explain the odd behavior of AMD’s PR team during the lead-up to the Phenom launch, as I described in my recent blog post. The decision to use 2.6GHz parts and to require the press to test in a controlled environment makes more sense in this context. Since 2.6GHz Phenoms, when they arrive, should be based on the B3 revision of the chip with the TLB erratum fix, AMD could justifiably argue that their performance won’t be limited by the BIOS-based workaround. Saucier confirmed to us that the test systems at the Tahoe press event did not have the workaround enabled.
On a related note, AMD PR consistently denied or delayed TR’s requests for samples of the production Phenom 9500 and 9600 models in the days following the product launch, until we informed them that we’d ordered a CPU from Newegg. We received a production sample of the Phenom 9600 from AMD shortly thereafter, followed by the 9500 we’d purchased at Newegg.
We don’t yet have a BIOS with the workaround to test, but we’ve already discovered that our Phenom review overstates the performance of the 2.3GHz Phenom. We tested at a 2.3GHz core clock with a 2.0GHz north bridge clock, because AMD told us those speeds were representative of the Phenom 9600. Our production samples of the Phenom 9500 and 9600, however, have north bridge clocks of 1.8GHz. Because the L3 cache runs at the speed of the north bridge, this clock plays a noteworthy role in overall Phenom performance. We’ve already confirmed lower scores in some benchmarks.
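To put the north bridge clock gap in perspective, here is a quick back-of-the-envelope calculation. It covers only the clock difference itself; the function name is ours, and the result should not be read as an estimate of the actual benchmark impact, which depends on how heavily each workload leans on the L3 cache.

```python
def clock_shortfall(tested_ghz: float, production_ghz: float) -> float:
    """Fraction by which the production clock falls short of the tested clock."""
    return (tested_ghz - production_ghz) / tested_ghz


# North bridge (and thus L3 cache) clocks cited above:
# 2.0GHz in our review configuration, 1.8GHz on production samples.
shortfall = clock_shortfall(2.0, 1.8)
print(f"Production north bridge clock is {shortfall:.0%} lower than tested")
```

In other words, the L3 cache in shipping Phenom 9500 and 9600 chips is clocked a tenth slower than in the configuration we reviewed.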
Given everything we’ve learned in the past few days, our review clearly overstates Phenom 9600 performance, as do (more likely than not) other reviews of the product. We can’t know exactly by how much, though, until we can test a Phenom system with the TLB erratum workaround applied.