A correction to our AMD TLB erratum coverage

I was closing some windows on my laptop and came across an unsaved document in an unselected tab in my text editor that had some interesting info in it: the notes from my phone call with AMD, just prior to the Phenom launch, when I first learned about the TLB erratum. Looking at those notes, I learned something that came as a bit of a shock: I had flubbed a fact in our coverage of the TLB erratum story.

I wrote more than once in our coverage of the erratum that AMD had initially suggested the problem didn’t affect lower clock speeds of the Phenom. Turns out that’s not the case. Here is the text of my notes, verbatim:

TLB problem w/virtualization

2.4 will have the complete fix

Have to enable something in the BIOS for the 2.2 and 2.3

Can degrade perf a little bit

I knew, after receiving this news, that something had caused me to bracket out this problem as a minor detail rather than a major problem, and I later attributed that to the erratum being confined to higher clock frequencies. I even said mistakenly in my Phenom review that the TLB erratum only happened at higher frequencies. I think I may have read this incorrect information online somewhere, but that is no excuse. I had better info myself and didn’t keep it straight.

In truth, the reason I originally bracketed out the TLB erratum as a minor issue was that AMD said it only affected virtualization and that a BIOS-based switch would offer an optional workaround. Since virtualization is much more commonly used in servers than in desktops, the TLB erratum seemed like a minor inconvenience affecting a small handful of users, not a big problem.

Of course, as we’ve reported, the TLB erratum can cause system hangs in a broader range of contexts, including desktop workloads, and probably for this reason, AMD has directed its partners not to include a switch in the system BIOS to disable the workaround. The erratum and its workaround will affect virtually all Phenom owners every day, as a result.

I’m proud that we were first to break the news of AMD limiting Opteron shipments due to the erratum, first to provide more detail about the nature of the workaround, and first to quantify its considerable performance impact. I believe this was very important follow-up work after our review of the product, and I stand by my conclusions in the last article, in particular.

But I’m deeply sorry that we flubbed a key fact in our coverage of this story. I should have re-checked my notes before writing the Phenom review, let alone the other stories on this issue. Most importantly, perhaps, I should not have made incorrect assertions that could potentially damage AMD’s reputation during an already difficult period. I would also like to apologize personally to the individual at AMD who initially provided us with the TLB erratum information.

The information AMD first gave us about the problem did indeed mis-portray the TLB erratum as a minor issue for the Phenom, but not for the reasons we reported. As for whether AMD was trying to hide something when it first informed us about the TLB issue, I am now officially agnostic. I’ve not looked into "who knew what when?" timelines in my coverage of this story, and it’s entirely possible AMD did not yet realize the TLB erratum extended beyond virtualization to broader desktop usage patterns prior to the Phenom’s public introduction. I really don’t know about that.

I will be making minor corrections in our three TLB erratum stories to reflect the fact that AMD did not initially say the TLB erratum was limited to higher clock frequencies. Beyond this one issue, I still believe the substance of our TLB erratum coverage is essentially correct—and is very important information for potential owners of Phenom 9500 and 9600 processors.

Comments closed
    • A_Pickle
    • 12 years ago

    You know I will say something to this, though. The TLB-patched benchmarks did leave something to be desired. I didn’t see any video encoding, 3D animation, or gaming benchmarks at all — only memory throughput and synthetic benchmarks.

    Not to be mean — but those are useless. I only came to this site BECAUSE you’re about the ONLY ones that DO benchmark 3D Studio Max, which gives me a nice resource when deciding my hardware of choice on my next upgrade. This synthetic BS is exactly that — it provides no useful information to me other than “Core 2 Quad > Phenom.”

    What’s the point of re-benching Phenom if your intent was to restate the already obvious? To tell us just HOW much the TLB erratum fix hurt performance? In that case, why not give us performance measures in real life applications that people use and perform with, so that they can relate to it and understand the significance of the performance detriment more than, “Well, uh, PCMark said so…?”

    Come on. You guys are better than that. Sandra is fun. It draws a fractal. Useful? No. Bust out the games, 3D Studio Max, Adobe Premiere, Photoshop, and Paint.NET if you’re going to do benchmarks, otherwise you will lose readers.

    • RambodasCordas
    • 12 years ago

    So this is “similar” to the Intel Pentium 3 1.13Ghz problem?
    Or the problem exists at any clock speed?
    I didn’t completely understand.

    • flip-mode
    • 12 years ago

    An unfortunate mistake, a necessary apology. Thanks for the honesty. Hopefully little damage was done; the last thing AMD needs is any *more* damage.

    • DrDillyBar
    • 12 years ago

    Good show.
    Maybe this is why I was thinking Virtualization w/ TLB myself.

    • gratuitous
    • 12 years ago

    +1 to the applause for being so expeditiously forthright with published corrections of your own “errata.” :)

    What I still can’t get over, though, is how the cure seems worse than the bug.

    The patch, as you’ve quantified, causes an average 10% or more performance penalty across the board. The probability that the race condition involved will actually ever be realized, however, is not known. From your own original tests of the Phenom on launch day, it would appear that (depending on application, of course) it’s not likely to be realized at all, or if so, so rarely that the average user would only curse, reboot, and forget about it.

    To be sure, if virtualization applications are more likely to cause this problem to occur, and hence more likely to affect enterprise implementations, then greater steps to prevent it are called for. But denying individual users and especially enthusiasts even the option to disable this patch is a mistake, imo.

      • Flying Fox
      • 12 years ago

      They are only going to allow the disabling of the patch through OverDrive.

        • gratuitous
        • 12 years ago

        IINM, Overdrive only functions on AMD 770, 780, 790X and FX chipsets, aka Spider platforms. I guess users of any other chipset are just SOL?

          • Flying Fox
          • 12 years ago

          q[

      • just brew it!
      • 12 years ago

      Slow performance is a much less serious issue than crashing or corrupting data. If the chip underperforms, AMD can lower the price a bit so that the processor is competitive price-wise with other similar performing processors, and everyone is relatively happy (except for AMD’s shareholders). But if the bug results in crashes or data corruptions — even if they are relatively infrequent — that is a show-stopper. It would erase all of the gains they’ve made in the server market, since no respectable server manufacturer would use such a chip.

      The performance-robbing patch is definitely the lesser of two evils in this case.

    • Jigar
    • 12 years ago

    Every human being makes mistakes.. So it’s alright. :)

    • eitje
    • 12 years ago

    kudos!

    • Flying Fox
    • 12 years ago

    Another side effect of Web 2.0. Things are reported too quickly for anyone to properly verify. Half-truths, mis-truths, non-truths, 3/4-truths, true truths all kind of mix up together from different times and from everywhere. It is understandable that it is hard to filter them all.

    This does sound more serious than initially reported. So in a sense it is not supposed to be as hard on DAAMIT as it should have been? So what were the fanboys crying about? Oh right, all these corrections and clarifications are being classified as “piling on”. Sigh… can’t please the fanboys I guess…

    At this point, I just want to know the truth too, about the impact of the bug, and what does that mean to us potential buyers. Should I wait for the B3, or given a certain price level the chips are still a good buy, etc. The time for finger-pointing “who is lowblowing or who is taking the highroad blah blah blah” is over.

      • pixel_junkie
      • 12 years ago

      As we know,
      There are known knowns.
      There are things we know we know.
      We also know
      There are known unknowns.
      That is to say
      We know there are some things
      We do not know.
      But there are also unknown unknowns,
      The ones we don’t know
      We don’t know…

    • pikaporeon
    • 12 years ago

    This is a highly credible site, and it sustains its reputation by owning up to its mistakes when they happen, not by sweeping them under the table. While it’s easy to justify such a mistake, that’s what separates the quality of sites.

    • Snake
    • 12 years ago

    OK, bluntly…*[

      • My Johnson
      • 12 years ago

      The coverage certainly had me confused.

      • FireGryphon
      • 12 years ago

      Maybe Scott is making amends after dissing AMD’s launch of the Spider platform. That might explain some of it, but I’d expect a correction anyway, from this site. TR is credible and reliable, and continues to uphold that reputation.

        • Flying Fox
        • 12 years ago

        q[

      • alex666
      • 12 years ago

      I understand Scott’s apology on at least two levels, the first being a journalistic one. As a professional journalist, if you get your facts wrong, then it’s incumbent upon the reporter to correct those errors. I’ve always viewed Scott’s reports as first and foremost journalistic; hence, his correction and apology. In addition, given the current climate of AMD doom and gloom, no one wants to be seen as “bashing” what has been a great company when they are “down” so to speak. So on that level as well, I can understand the desire to clarify any errors.

      Bottom line: Scott is being a professional.
