Chip problem limits supply of quad-core Opterons

AMD’s quad-core "Barcelona" Opterons have been notably difficult to find since their introduction two months ago, and The Tech Report has learned that a chip-level problem has impacted the supply of these chips to both server OEMs and distribution channel customers.

Chipmakers refer to chip-level problems as errata. Errata are fairly common in microprocessors, though they vary in nature and severity. This particular erratum first became widely known when AMD attributed the delay of the 2.4GHz version of its Phenom desktop processor to the problem. Not much is known about the specifics of the erratum, but it is related to the translation lookaside buffer (TLB) in the processor’s L3 cache. The erratum can cause a system hang with certain software workloads. The issue occurs very rarely, and thus was not caught by AMD’s usual qualification testing.

An industry source at a tier-two reseller told The Tech Report that the TLB erratum has led to a "stop ship" order on all Barcelona Opterons. When asked for comment, spokesman Phil Hughes said AMD is shipping Barcelona Opterons now, but only for "specific customer deals." Industry sources have suggested to TR that those deals are high-volume situations involving supercomputing clusters. Such customers may run workloads less likely to be affected by any workarounds for the erratum that reduce L3 cache performance, and those customers could potentially consume hundreds of thousands of CPUs. Our sources indicate, and the current availability picture would seem to confirm, that quad-core Opterons are not shipping to OEMs or the channel more generally.

News of this problem is notable because it confirms that the TLB erratum affects Barcelona server processors as well as Phenom desktop CPUs, and that the problem impacts AMD’s quad-core processors at lower clock speeds. The Opteron 2300 lineup spans clock speeds from 1.7GHz to 2.0GHz. Those CPUs’ north bridge clocks, which determine the frequency of the L3 cache, range from 1.4GHz to 1.8GHz.

The erratum is present in all AMD quad-core processors up to the current B2 revision. AMD has said a revision B3 is in the works and expected in Q1. One source told TR that large quantities of B3 chips might not be available until the end of Q1.

The potential for instability with the TLB erratum can be corrected via a BIOS-based workaround, but multiple sources have suggested the BIOS fix involves a substantial performance hit. AMD has publicly estimated the performance penalty for the BIOS fix could be around 10%, and one source pegged the penalty at 10-20%. AMD has acknowledged that the TLB erratum particularly affects virtualization, and industry sources say the performance hit from the fix may be most severe with virtualization, as well. Server administrators responsible for virtualized environments will probably want to wait for the B3-rev CPUs before upgrading.

TR has attempted to confirm the impact of the BIOS-based fix, but the BIOS for the SuperMicro H8DMU+ motherboard used in our review of the Barcelona Opterons has not been updated since mid-September and doesn’t appear to include the TLB erratum workaround.

Linux users may have another option in the form of a patch for that operating system’s kernel. Sources estimate this patch’s performance hit at less than one percent, but it comes with several caveats. At present, the patch purportedly only applies to the 64-bit version of Red Hat Enterprise Linux, Upgrade 4. Customers must sign a non-disclosure agreement in order to obtain the patch, and will be responsible for supporting it themselves. The patch doesn’t currently appear to be available via Red Hat’s regular support channels.

At present, Microsoft doesn’t offer a Windows hotfix to address the problem, and our sources were doubtful about the prospects for such a patch. CPU makers have oftentimes addressed errata via updates to the processor’s microcode, but such a fix for this problem also appears to be unlikely.

Update: It turns out the BIOS-based workaround for the erratum already involves a microcode update, AMD informs us. More soon.

Update II: We now have an update with a clarification on the nature of the BIOS workaround and information about how this issue affects Phenom processors right here.

Comments closed
    • Damage
    • 13 years ago

    I’ve made a small correction to the text of this story. For more info, please see here:

    https://techreport.com/discussions.x/13764

    • iOsiris
    • 13 years ago

    looks like there is a kernel fix for users of Linux that only has a 1% penalty. So if you use Linux, it’s fine

    • Shining Arcanine
    • 13 years ago
    • somedude743
    • 13 years ago

    I bet AMD CEO Hector Ruiz has been saying DAAMIT a lot over this TLB bug. All the AMD engineers too. AMD better get it in gear in a hurry or the 800lb gorilla, Intel, is going to maul them to death.

    I want AMD to succeed. They’ve made good products in the past and they are keeping Intel on their toes as far as innovation goes. I’d love to see AMD get a 50% market share for mobile, desktop, and server CPUs. True competition like in the auto industry. The Toyotas and Hondas have sure made Detroit’s auto companies more competitive I know that. Detroit is putting out some damn nice cars these days. Some of those new Cadillacs are winning a bunch of quality and innovation awards these days.

    C’mon AMD. Retrench and keep on battling. Make a clearly better chip at a competitive price and people will buy them in droves. People know who AMD is. They know AMD can make good products. People aren’t all that loyal to Intel. They just want the best performing CPU they can get their hands on at a reasonable price … it’s all about bang for the buck, quality, reliability … and performance of course. Everyone wants a faster PC if there aren’t too many downsides to it.

      • VILLAIN_xx
      • 13 years ago

      lol ain’t that the truth about loyalty. I’ve seen a lot of Intel fans go to the green team when the X2s came out. Then Core 2 came out, and they sold their parts and became Intel fans again.

      Easier said than done for AMD to come out with a bomb ass product to win back some bread. For the last year and a half they have been that student sleeping in your classes back in high school. Doing their projects the night before and giving an embarrassing presentation the next day.

      I’ve owned both Intel and AMD for the price/performance ratio. But with news like this erratum, and BIOS fixes that hurt even more to correct it, it’s sad to see AMD go from high-end products to products bordering on novelty.

    • Ruiner
    • 13 years ago

    Great match news-wise for the 4GHz c2d overclocks.

    • tfp
    • 13 years ago

    I can’t say I’m shocked considering that Opterons are the same things as the desktop chips plus a HT link or so….

    • j3pflynn
    • 13 years ago

    There’s already a microcode update being prepared that will be applied in the near future.
    http://www.theinquirer.net/gb/inquirer/news/2007/11/30/phenoms-feature-infamous-l3

      • Flying Fox
      • 13 years ago

      Not sure if they are doing one for the Optys too, but since these are server chips we are talking about, they’d better be.

        • sigher
        • 13 years ago

        Microcodes are not the same as a working chip, a workaround is always less effective than not needing it I’m sure.

    • pluscard
    • 13 years ago

    Of course, just to compare, every single core2duo sold up thru April ’07 had pretty serious errata, at least according to this link:
    http://www.darknet.org.uk/2007/07/intel-core-2-duo-vulnerabilities-serious-say-theo-de-raadt/ "Security guru and creator of OpenBSD Theo de Raadt recently announced he had found some fairly serious bugs in the hardware architecture of Intel Core 2 Duo processors:" "...he warns that errata contained in the Intel processor is susceptible to security exploits that put users and enterprises at serious risk of being compromised."

      • Flying Fox
      • 13 years ago

      Data corruption is still ranked as more serious because the CPU practically stops working right. The TLB erratum is that kind, which does bear similarities to the FDIV bug.

      Also, if you chase the links a little, the dude also blasted AMD as well: "(While here, I would like to say that AMD is becoming less helpful day by day towards open source operating systems too, perhaps because their serious errata lists are growing rapidly too)." Source: http://marc.info/?l=openbsd-misc&m=118296441702631 And remember, they are now practically pulling the Optys/Phenoms off the market (for the Phenoms, it's the higher speed ones).

      • indeego
      • 13 years ago

      These were patched back in March via microcode, even Microsoft has a hf patch.

    • lordT
    • 13 years ago

    Well, this sucks. Blow after blow. Good luck AMD, you guys need it.

    • DrDillyBar
    • 13 years ago

    At this rate they’re going to be playing second fiddle until they release another architecture, as Hammer’s looking pretty long in the tooth at this point. Considering how long it took them to finalize and release Hammer, it could be years before AMD can hope to reclaim the performance crown from intel.

    • herothezero
    • 13 years ago

    Damn…DAAMIT can’t catch a break.

    This sucks.

    • Sargent Duck
    • 13 years ago

    Why have this L3 cache anyways? I mean I know /[

      • Forge
      • 13 years ago

      K8 Opterons started to lose more and more performance to inter-chip communications overhead when running >2 cores. With Phenom/Barcelona running 3 and 4 cores per socket, all of that cache sync communication lowers the boom on performance far earlier. The L3 isn’t really there for data, it’s more to keep the cores’ caches coherent without flooding the HT links with cache snoops.

      All as I understand it, anyways.

        • lyc
        • 13 years ago

        echoes my understanding

        • Shining Arcanine
        • 13 years ago

        A unified L2 cache would do the same thing with better latencies. There is no reason for AMD to use a unified L3 cache, unless their engineers did not have enough time to design a unified L2 cache to replace the L2 cache scheme they already had.

    • Krogoth
    • 13 years ago

    Damm, this is AMD’s equivalent of Intel’s FDIV bug in the original batch of Pentiums. 0_0

      • indeego
      • 13 years ago

      Not at all. That wasn’t microcode addressable.

        • d2brothe
        • 13 years ago

        Neither appear to be microcode addressable.

        • Flying Fox
        • 13 years ago

        The news post did mention a microcode update will be unlikely, and who knows if there is one available it wouldn’t be the same BIOS fix which is to slow down performance? I wouldn’t even call that a fix.

    • JJCDAD
    • 13 years ago

    Why is there no Digg It! link on this article?

      • JJCDAD
      • 13 years ago

      Nevermind. I see it now.

        • JJCDAD
        • 13 years ago

        And now it’s frontpage on Digg.com

    • Gungir
    • 13 years ago

    Bloody Hell, the hits just keep on coming. How does TR get this information? This is really in-depth stuff I haven’t seen anywhere else.

      • mortifiedPenguin
      • 13 years ago

      “An industry source at a tier-two reseller” apparently, who that would be specifically, TR is probably barred from saying

    • blastdoor
    • 13 years ago

    I guess they weren’t going to ship these chips in much volume anyway, but this is still a bigger than average straw being added to the camel’s back.

    • SGT Lindy
    • 13 years ago

    AMD is like a cargo ship that has taken two torpedoes, it’s on fire, and listing badly.

    Now there is another torpedo on the way.

    • My Johnson
    • 13 years ago

    Man, the software for designing and simulating chips needs some work. This sounds like a mess.

    • MrJP
    • 13 years ago

    Oh dear.

    A BIOS-level fix to a problem related to the L3 cache which then gives a 10%-20% performance penalty. Could the fix be simply disabling the L3?

    • flip-mode
    • 13 years ago

    Whew, that sounds pretty damn bad.

    • provoko
    • 13 years ago

    Oh god. That’s just what they need, a nerf stick.
