Erratum degrades Phenom 9500, 9600 performance

We reported earlier today that a problem with AMD’s quad-core processors has limited supply of "Barcelona" Opterons, but that is only part of the picture. Because the hardware bug—known as an erratum—affects all revisions and clock speeds of AMD’s quad-core processors, it affects the newly introduced Phenom 9500 and 9600 processors, as well. And although AMD is no longer shipping quad-core Opterons to major server vendors and general customers, it is shipping Phenoms to large PC builders and distributors. In fact, AMD knew about the erratum before the Phenom product launch, although its original statements about the issue gave the impression it only affected virtualization, a server-class usage model uncommon for desktop processors.

To recap, the erratum is a chip-level issue involving the TLB logic for the L3 cache that can cause system hangs in specific circumstances. AMD has a fix for the problem in the works, but it degrades performance. AMD has stated publicly that the workaround can lower performance by as much as 10%, although one source, speaking to TR, characterized the performance hit as 10-20%.
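To put those percentages in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that performance scales linearly with core clock, which real workloads do not strictly do; the function name is ours and hypothetical.

```python
# Illustration only: ASSUMES performance scales linearly with core clock.
# Real workloads will not follow this exactly.

PHENOM_9600_GHZ = 2.3  # core clock of the Phenom 9600


def equivalent_clock_ghz(clock_ghz: float, slowdown: float) -> float:
    """Clock an unpatched chip would need to match a patched chip's speed."""
    return clock_ghz * (1.0 - slowdown)


if __name__ == "__main__":
    # 10% is AMD's public figure; 20% is the upper end cited by one source.
    for slowdown in (0.10, 0.20):
        eq = equivalent_clock_ghz(PHENOM_9600_GHZ, slowdown)
        lost_mhz = (PHENOM_9600_GHZ - eq) * 1000
        print(f"{slowdown:.0%} hit: ~{eq:.2f} GHz equivalent "
              f"(~{lost_mhz:.0f} MHz lost)")
```

Under that simplifying assumption, a 10% hit on a 2.3GHz Phenom 9600 is comparable to giving up roughly 230MHz of clock speed, and a 20% hit roughly 460MHz.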

To better understand this problem, TR spoke with Michael Saucier, Desktop Product Marketing Manager at AMD. Saucier confirmed that the TLB erratum can cause the system to hang when the chip is experiencing high utilization. AMD has stated previously that virtualization workloads can lead to this problem, but Saucier clarified that other workloads can trigger system hangs, as well. He characterized the issue as a race condition in the TLB logic "where the other guy wins who isn’t supposed to win," and said the likelihood of the erratum causing a system hang is extremely low.

Saucier flatly denied any relationship between the TLB erratum and chip clock frequencies. He also said there’s no relationship between clock speeds and the performance degradation caused by the BIOS-based fix for the erratum. AMD previously cited the TLB erratum as the primary motivation behind its decision to delay the 2.4GHz Phenom variant.

Saucier clarified the exact nature of the workaround for the erratum that AMD has provided to motherboard makers and PC manufacturers. The fix comes in the form of a BIOS update, and this BIOS patch includes an update to the CPU microcode. This update disables the portion of the chip’s TLB logic that is problematic. Saucier noted that the L3 cache "still works" with this logic disabled, and he said AMD has no plans to implement the fix for existing chips in a different way.

Instead, AMD is preparing a hardware fix in the next revision of the chip, dubbed B3. Future revisions of the Phenom, including the planned Phenom 9700 model at 2.4GHz and the 9900 at 2.6GHz, will include the fix. AMD plans to replace the current Phenom 9500 and 9600 models with new 9550 and 9650 models, based on the B3 chip, as well. Saucier’s best estimate for the arrival of B3 chips is "mid to late Q1" next year.

In another bit of news, the company will introduce "more than two" triple-core Phenom variants by the end of Q1, too.

AMD claims it has handed off the BIOS workaround to motherboard makers for implementation, and Saucier told us the company’s guidance to partners included an enable/disable option in the BIOS. AMD also has plans for an update to its Overdrive overclocking utility for Windows that will allow users to toggle the erratum fix on and off. Saucier said AMD’s thinking here is that savvy users may choose higher performance over the relatively small risk of experiencing a system hang due to the TLB problem.

Update 12/4/07: AMD informs us that Saucier’s statement here was incorrect. AMD has asked motherboard makers not to include a toggle for the workaround in their BIOSes. Instead, the workaround should be enabled by default, and the option to disable it will be exposed solely via AMD’s Overdrive tweaking utility, unless motherboard vendors elect to add this option against AMD’s guidance.

However, as far as TR has been able to determine, BIOS updates with the workaround are not yet available from the three major motherboard vendors shipping Phenom motherboards based on the AMD 790FX chipset. We have inquired with each of them and are currently awaiting definitive answers about an ETA for a BIOS update with the workaround. We also asked about the possibility of a BIOS option to enable and disable the fix. Similarly, SuperMicro apparently doesn’t yet offer an updated BIOS for its H8DMU+ server platform for Barcelona Opterons.

According to Saucier, AMD’s PC OEM partners were informed about the erratum prior to the launch and should have fixes available.

AMD spokesman Phil Hughes told us the TLB issue has been designated erratum number 298. When questioned about when AMD would update its technical documentation to include the erratum, Saucier said the person responsible for the updates is "on vacation," although he expects an update "by the end of the year."

Incidentally, the presence of the TLB erratum may explain the odd behavior of AMD’s PR team during the lead-up to the Phenom launch, as I described in my recent blog post. The decision to use 2.6GHz parts and to require the press to test in a controlled environment makes more sense in this context. Since 2.6GHz Phenoms, when they arrive, should be based on the B3 revision of the chip with the TLB erratum fix, AMD could justifiably argue that their performance won’t be limited by the BIOS-based workaround. Saucier confirmed to us that the test systems at the Tahoe press event did not have the workaround enabled.

On a related note, AMD PR consistently denied or delayed TR’s requests for samples of the production Phenom 9500 and 9600 models in the days following the product launch, until we informed them that we’d ordered a CPU from Newegg. We received a production sample of the Phenom 9600 from AMD shortly thereafter, followed by the 9500 we’d purchased at Newegg.

We don’t yet have a BIOS with the workaround to test, but we’ve already discovered that our Phenom review overstates the performance of the 2.3GHz Phenom. We tested at a 2.3GHz core clock with a 2.0GHz north bridge clock, because AMD told us those speeds were representative of the Phenom 9600. Our production samples of the Phenom 9500 and 9600, however, have north bridge clocks of 1.8GHz. Because the L3 cache runs at the speed of the north bridge, this clock plays a noteworthy role in overall Phenom performance. We’ve already confirmed lower scores in some benchmarks.
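The size of the clock gap between our review configuration and the production chips is simple to work out. The sketch below is only arithmetic on the two north bridge clocks mentioned above; the helper name is ours, and the result says nothing by itself about how much benchmark scores change.

```python
# Simple arithmetic on the two north bridge clocks cited in the text.
# The helper name is hypothetical; this is not a performance model.

REVIEW_NB_GHZ = 2.0      # north bridge clock used in our review
PRODUCTION_NB_GHZ = 1.8  # north bridge clock on production 9500/9600 samples


def percent_lower(expected_ghz: float, actual_ghz: float) -> float:
    """How much lower the actual clock is, as a percentage of the expected."""
    return (expected_ghz - actual_ghz) / expected_ghz * 100.0


if __name__ == "__main__":
    gap = percent_lower(REVIEW_NB_GHZ, PRODUCTION_NB_GHZ)
    print(f"Production north bridge clock is about {gap:.0f}% lower")
```

In other words, the L3 cache in shipping parts runs about 10% slower than it did in the systems we reviewed, before any effect from the erratum workaround is considered.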

Given everything we’ve learned in the past few days, our review clearly overstates Phenom 9600 performance, as do (more likely than not) other reviews of the product. We can’t know exactly by how much, though, until we can test a Phenom system with the TLB erratum workaround applied.

Comments closed
    • Damage
    • 12 years ago

    I’ve made a couple of corrections to the text of this story. For more info, please see here:

    https://techreport.com/discussions.x/13764

    • mortifiedPenguin
    • 12 years ago

    I was rereading the article and noticed that motherboard manufacturers are required to implement the errata fix in BIOS and have it enabled by default, with the only way to change it is to use Overdrive. What happens to the AM2 (not AM2+) users? (Note that I am currently under the impression that only RD7xx chipsets can use Overdrive).

    • kc77
    • 12 years ago

    So I guess the sky is falling. The problem with Scott’s blow by blow Super Bowl Coverage of the Phenom errata is that…. this has happened before. With the Intel Core and Xeon processors. It was a TLB errata that would cause corruption of data and system locks. Sound familiar?? There was a BIOS update or if you so chose there was a Windows XP hot fix, which also resolved the issue. Want to check it out here is the link:

    https://techreport.com/forums/viewtopic.php?t=43352&view=next&sid=a3a9ffe993e91c1453d97652f7222e65

    While I am completely pissed at AMD for being seemingly in a perpetual downward spin, I am also quite concerned about all of these doomsday articles. First it was .. the horrible pre-production spider platform, then we had the AMD sent us to Tahoe, but didn’t give us any chips, then followed up with Intel class and grace article, and now we are onto the TLB errata. All of these articles within 2 weeks, I would say there’s been about seven or more articles here all around the horrible natures of AMD and the latest actual review or benchmark happened around what November 19th I think. At least which would be awesome is to show us some benchmarks or something around the errata before and after fix to showcase what the average user can expect. I am trying to walk a very fine line as an AMD fan, but also recognizing that when AMD screws up they need to be held accountable. Thing is I don’t think this is TR’s way of holding them accountable. This seems to be more of a FUD extravaganza. We are weeks away from Christmas, the biggest holiday shopping season, and when I go to TR the biggest headline is Phenom TLB errata????

      • Flying Fox
      • 12 years ago

      Please don’t mix news reporting and blog opinion articles together. This is a news article, and you can tell the tone is different. How “doomsday” is that?

      You only /[

        • kc77
        • 12 years ago

        The problem is I’m not mixing them together TR is… now this may have everything to do with the new layout which I’m not totally fond of. As far as an editorial piece not conveying a picture of “doomsday” repercussions. Look at the other posts under mine…. I think it can be established that it’s doing just that.

        In addition you’ve said “Take it a little further, the exact nature/workload that triggers the problem is not well-known so we will have to wait and see a little.” So if it’s unknown, why isn’t the article, which encompasses almost two pages, taking the same tone??

          • Flying Fox
          • 12 years ago

          q[

            • kc77
            • 12 years ago

            Damage Report??? Where do you see that?? It’s not on the front page nor is it within the article. In fact it doesn’t even say whether it’s an opinion write up nor whether it’s just tech news. It rotating on the front page with a graphic and appears along side under tech stories which also encompasses tech news. The only thing that says Damage on this whole article is from Damage himself post #62.

            Plus as for “the article is old”, I don’t have your eye sight and thank god i don’t. The article was updated yesterday. I would say it’s pretty current. Plus if it’s so old why did you reply??? In fact people are still replying today. So I would say the article, fact, story, opinion piece, or whatever this maybe is still active and current.

            You can be dismissive all you want to be…..but anytime you have to update an article with a official stance from a company because what you wrote previously was mere conjecture it’s not news. It’s got fear uncertainty and doubt all over it.

            • Damage
            • 12 years ago

            kc77, The Damage Report is the name of my blog, and any posts from my blog show up with a logo that says “The Damage Report, A blog by Scott Wasson” at the top of the page. Fox is referring to that.

            This story on this page, however, is not a blog post and is not marked as such.

            I believe the measures we’ve taken to delineate blog posts are quite sufficient. Beyond that, contextual cues should make the basic import of the texts quite clear.

            You have had your say; you’ve taken your shots at the messenger. That is enough. I’ve seen your behavior in other threads, and that will not be duplicated here. Future posts in this thread from you will, at the very least, be modded down. If you have more to say, you may take it up with me via email. Thanks.

            • Flying Fox
            • 12 years ago

            No, *this* particular post is a news report. I was saying that, to me, you seem to be mixing this one to the other one last week which was the opinion piece where you were bashing on that too (“Lessons from…”).

            And in *this* post, I don’t see the doom and gloom tone. So what’s your problem? You want TR to stop reporting bad news as it comes so as not to present a “doomsday” picture? None of the articles have said “AMD is doomed”; all the doomsday talk is being spread around by other people, not TR itself.

            • kc77
            • 12 years ago

            Look I’m not asking TR to stop reporting. Nor have i ever said such things. Nor would I ever want them to. But apparently, since Damage’s response to my post must have hit a nerve. I guess I’m being called out. Right now I don’t understand the need for that at all. Considering others have said far worse and have used words far less unprofessional. My responses were merely a retort to yours. Had you not responded I wouldn’t have responded to nothing??

            I guess I’ll just go with the flow, in this space which allows for public comment and dialog. Nothing I have said in previous posts were meant to be taken as a personal attack. I’ve merely done what others have…. voiced my opinion and responded to replies.

            In addition, I was confused with your original reply FF didn’t know if you were trying to say it was a editorial or not which led me to reply in haste to a couple of your points.

      • provoko
      • 12 years ago

      This just in: l[

      • BeFair
      • 12 years ago

      AMD engineers won’t just sit there … strategies in stock and you never know. Hope IBM seriously thinks about buying AMD when it has perfected 45nm fab. Just one idea can turn things around. Plus the GPUs, AMD has lots of potential. Go and buy AMD shares before it gets expensive!

      Another thing is that PC isn’t the future, but some mobile devices like iPhone. AMD can create some very low power CPU/GPU combination that is powerful enough for these kind of devices, and sell for $50/piece. That can make Intel market cap to go down to 50B. Trust me, you won’t need that fast $200 CPU into a mobile device.

    • Unleashed
    • 12 years ago

    awful, and I was so close to purchasing a 9500 off newegg(now that they are $240).

      • tfp
      • 12 years ago

      Even after the reviews? Can you explain the thought process behind this? It’s not like this error is new news or anything, either.

    • xray1
    • 12 years ago

    I still don’t fully understand what part of the CPU is really affected.

    Is the L3 of the Phenom really indexed with virtual addresses? How does that work with a shared cache with 4 different cores accessing it?

    If the L3 uses physical addresses, what do you need a TLB for?

    Or are we talking about a defect of the third level TLB, independent of the cache architecture? Sort of like: if you don’t find the physical address in the first TLB, you look in the second, and then in the third TLB?

    Can anyone clear this up, or is this Erratum still in the muddy waters of PR-Talk and nobody really knows anything about it outside of AMD?

    • Damage
    • 12 years ago

    I’ve updated the article with a clarification from AMD about how control over the BIOS workaround will be exposed to end users.

    • somedude743
    • 12 years ago

    For the new triple core Phenoms coming out, I wonder if AMD is going to use the real estate on the chip used for core #4 to cram as much L3 cache as they can onto it. Make those triple cores support the latest and greatest DDR3 memory too … make that L3 cache run at the same speed as the cores. Make a triple core like this for the desktop and I think it could be winner.

    I’m thinking that a triple core with a lot of L3 cache is faster/better than a quad core with not as much cache … not so sure that 4th core will make much difference with the software out there these days … not a whole lot of software to take advantage of the additional threads.

      • Flying Fox
      • 12 years ago

      Can I have some of what you are smoking? 😉

      • mortifiedPenguin
      • 12 years ago

      Tri-core Phenoms are quad-core Phenoms with one of the cores disabled, so not much chance of that happening. Unless they’ve decided to mate a single core with a dual core (Kentsfield style).

    • ssidbroadcast
    • 12 years ago

    Hey guys I got a question:

    Just how show-stopping is this errata? Are we looking at just more BSODs on Vista? Chance of system hang? I’ve read the article and the TLB erratum doesn’t sound /[

    • AMDisDEC
    • 12 years ago

    At least the triple core processor inventory will be fat.
    I hear they will make a triple-core CPU for every other Quad core fabbed.

    • Xenolith
    • 12 years ago

    This is actually quite sad. Intel needs competition. AMD is not providing it right now.

      • eitje
      • 12 years ago

      go go via! 😀

    • Krogoth
    • 12 years ago

    Penryn-based C2Ds will be in full force by the time B3 Phenoms roll out. It is a darn shame that a significant logic bug has effectively crippled the Phenom line in the eyes of the enterprise market and enthusiasts.

      • eitje
      • 12 years ago

      complete side-note:
      it’s awesome to see someone else that groups enterprise buyers and enthusiast users into the same “family” of consumer.

        • Flying Fox
        • 12 years ago

        The Optys are affected too. So both groups will be concerned.

          • eitje
          • 12 years ago

          my point was that this grouping can be seen to occur frequently outside of the current errata situation, but a lot of people rarely notice it.

        • crazybus
        • 12 years ago

        If you group users into “aware” and “unaware”, you’ll quickly see where the enterprise users and enthusiasts fall in relation to pretty much everyone else on the planet.

        • Krogoth
        • 12 years ago

        I meant that K10’s first revision problems are only a problem for enthusiasts (who care about performance) and the enterprise market (stability is key). Phenom and Opteron are practically the same chip except that Opterons have coherent-HT links that permit them to be in multi-chip setups.

          • mortifiedPenguin
          • 12 years ago

          Aren’t the sockets different too? Although, I suppose that could be a given…

    • excetera
    • 12 years ago

    Any correlation between the scarcity of current chips with the TLB errata problem and the near-future release of (a lot of) tri cores? Are tri cores nothing but binned bad B2 quad core chips with one core disabled?

    • muyuubyou
    • 12 years ago

    DAMIT indeed. I guess I will wait for the tri-cores to make my next buying decision.

    • 0g1
    • 12 years ago

    What I think is ridiculous is that a dual core Phenom is only about 10% faster than AM2 at the same clock speed and, currently, the fastest AM2 (3.2GHz) has about 40% faster clock speed than the Phenom 9600 (2.3GHz).

      • sigher
      • 12 years ago

      It’s also odd that when they test them using a single core only; the new generation is the same or slower speed than the old generation when using a single core, at equal clocks.
      You’d think the basic cores would not just be quadrupled but also much improved.

    • Maks
    • 12 years ago

    Pure Panic on AMDATItanic!
    That’s a hell of a mess.

    • cass
    • 12 years ago

    Wonder how they feel about that “True Quadcore” decision now?

    It bothers me a whole lot more that they sought to be quiet and work behind the scenes to the bitter end rather than just be open about it. I could understand being quiet if the product hadn’t been released, but this kitty is in the wild.

    Sounds like to me AMD has a whole lot more people “on vacation” than just the ones they are willing to admit (deny) to. If they only have one person in charge of notating errata and no backup in case they quit or die, then just what the hell is their plan for finding the errata… the microsoft release and hotpatch.

    Mid to late Q1 for a fix? AMD is really lost if they think the whole pc market is waiting that long. At that point they might as well just write off 65nm quads and release the fix in a die shrink to 45nm. That would be the only hope of saving the AMD/ATItanic. If I was AMDs head of sales and could have seen this coming I would have bailed too.

      • Flying Fox
      • 12 years ago

      q[http://www.overclockers.com/tips01260<]§ There is no S in the Tech Report, dagnabit!

    • alex666
    • 12 years ago

    “And although AMD is no longer shipping quad-core Opterons to major server vendors and general customers, _[

      • Flying Fox
      • 12 years ago

      Well, everybody does it, software guys and hardware guys. Intel CPUs have a bunch of errata too. Just that either the BIOSes already have the microcode updates applied, or the errata are extremely rare.

      • sroylance
      • 12 years ago

      There are specific customers, with specific workloads, that won’t be affected by this bug. My speculation is that the erratum causes a hang during TLB flushes. Workloads like high-performance compute analysis/simulation that don’t incur a lot of TLB flushes are probably going to be fine. This is a market that AMD has done well in, and there are supercomputer/cluster customers that are clamoring for the quad core chips.

    • boing
    • 12 years ago

    I am soooooo happy I went with a P35+E6750 instead of waiting for the Phenom!

    10-20% slower = effectively shaving off more than 200 MHz. Or does the 10-20% slowdown only occur in certain conditions and not constant as I thought?

    • Bensam123
    • 12 years ago

    Guess it’s time for AMD to spit out a ‘FX’ series of products.

      • Flying Fox
      • 12 years ago

      What good will that do when they still have not finished the B3 stepping?

    • Prospero424
    • 12 years ago

    Look, I’m a big AMD fan, always have been. But this is /[

      • lyc
      • 12 years ago

      i’m in the same boat, and was always annoyed to hear “does it run everything an intel does?” now there’s really an edge to the “unreliable clone” sentiment…

      and poor ati will tank with amd 🙁

        • Entroper
        • 12 years ago

        Intel CPUs aren’t error-free, either. The FDIV and “f00f” bugs come to mind, just as some of the more well-known examples. Intel and AMD publish errata for their CPUs more often than most people realize. It’s just what happens when you try to cram hundreds of millions of transistors onto a tiny piece of metal.

      • Jive
      • 12 years ago

      To quote the previous article discussing the problem found in the Opterons:

      r[

        • eitje
        • 12 years ago

        q[

          • 1970BossMsutang
          • 12 years ago

          AMD is just contradicting themselves….I would really like to know why AMD can’t get their act together…at least release a fully functioning chip with no serious flaws.

          • Flying Fox
          • 12 years ago

          Could be a timing problem here too (yeah, race condition is really a timing problem as well). The bug only shows up easily in higher speeds, which they didn’t test with the server-based Barcelonas. When they tried to clock the thing higher in the Phenom they started to see them. As usual like anything tech (both software and hardware), it showed up late in the project. Then they went back and retest all their chips and just found the same bug, given the right stimuli. So that could explain why the bug escaped the radar during Barcelona’s launch ramp-up.

          These timing bugs usually showed up first as random so it is not easy to reproduce and may not be produced consistently early on when they spotted it. And while the Phenom launch was being prepped they finally realized the seriousness of the problem through additional testing.

          That could explain why they knew of the bug but did not do anything too drastic like pulling the entire launch. They knew the bug didn’t show up at slower speeds so it was ok for them to release them to get a “launch” out.

          • Jive
          • 12 years ago

          😮 My mistake!

      • provoko
      • 12 years ago

      This is dumb of them.

    • just brew it!
    • 12 years ago

    Egads, it seems AMD can’t catch a break these days. I sure hope they are able to get that B3 stepping out the door on schedule… IMO there’s an awful lot riding on them getting it right this time.

    Although I’m not planning any major upgrades in the next year or so, when it eventually becomes time for a new build, I want there to be stable, competitively performing AMD CPUs available. If nothing else, Intel needs the competition to keep ’em honest! 😀

      • sigher
      • 12 years ago

      There’s always phone chips, and console chips, and such,
      And oil money I guess.

      • Deveron
      • 11 years ago

      I am not going to be switching to Intel anytime soon but I’m fairly new with computers but i already have 2 custom built AMD cored PC’s. I even Switched over to Radeon video cards cause i wanted better products. I will never switch to Intel because I was told AMD is the best on the market and I stick to it. They will get it right it will maybe take some time but I’m confident AMD will pull through.

    • ssidbroadcast
    • 12 years ago

    l[

      • derFunkenstein
      • 12 years ago

      Yeah, the quotation marks really make it sound worrisome. I was thinking he got laid off, but you might be right.

        • ssidbroadcast
        • 12 years ago

        “Yeah, he’s on vacation… /[

      • eitje
      • 12 years ago

      in my experience, this is the kind of situation where – if the guy is on the freaking *[

    • afg34
    • 12 years ago

    Intel FTW………

    • JustAnEngineer
    • 12 years ago

    All the more reason to wait for the dust to settle a bit on these new processors and BIOSes.

      • wingless
      • 12 years ago

      I agree. Right now the chip is crippled so there’s no reason to spend $300 on one right now. More pain for AMD but it is a big motivation to speed up the B3 delivery-to-market. If we see a 10-20% performance gain with a Phenom then the spider platform will be worth something again.
