AMD’s 65nm L2 cache looks to be slower

After we published our first look at AMD’s 65nm Athlon 64 X2 processors yesterday, some of you raised questions about funky performance numbers you’d seen from these 65nm CPUs at other sites. We focused on power consumption and overclocking in our initial look at the processors, assuming those would be the most interesting aspects of the die-shrunk parts, which were billed as otherwise offering performance identical to their 90nm predecessors. We should have known better, given the difference in L2 cache performance we noticed between 130nm and 90nm Athlon 64 processors during AMD’s last die shrink.

Thus I found myself on the phone yesterday afternoon asking AMD about 65nm L2 cache latencies, possible performance differences, and why the Athlon 64 X2’s die size shrank only from 183mm² at 90nm to 126mm² at 65nm, a much smaller reduction than expected, even though the transistor count estimate held steady at 153.8 million. The AMD rep to whom I was speaking didn’t have much in the way of answers for me at the time, and he said that most of the people who would have those answers were already out on vacation for the holidays. But he did say that other folks had just been asking the same set of questions. Lo and behold, Anand published an article today discussing die size questions and increased L2 cache latencies on AMD’s 65nm CPUs.
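
To put a rough number on "smaller than expected," here is a minimal sketch of the ideal-scaling arithmetic. It assumes die area scales with the square of the process feature size, which real chips never fully achieve since I/O pads, analog circuitry, and wiring don't shrink in step. The function name is ours, purely for illustration.

```python
def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Die area after a shrink, assuming every dimension scales with feature size."""
    return area_mm2 * (new_nm / old_nm) ** 2


OLD_AREA_MM2 = 183.0  # 90nm Athlon 64 X2 die size quoted above
NEW_AREA_MM2 = 126.0  # 65nm Athlon 64 X2 die size quoted above

# Assumption: perfect quadratic scaling from 90nm to 65nm.
ideal = ideal_shrink_area(OLD_AREA_MM2, 90, 65)

print(f"ideal 65nm area:  {ideal:.1f} mm^2")
print(f"actual 65nm area: {NEW_AREA_MM2:.1f} mm^2")
print(f"ideal/old ratio:  {ideal / OLD_AREA_MM2:.2f}")
print(f"actual/old ratio: {NEW_AREA_MM2 / OLD_AREA_MM2:.2f}")
```

Under that idealized assumption the 65nm die would land near 95mm², so the actual 126mm² part is roughly a third larger than a perfect shrink would suggest.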

Since I have the same 65nm AMD processors on hand, I thought I’d run a few quick tests, as well. Here’s a look at L2 cache latency numbers from CPU-Z on the 65nm and 90nm versions of the Athlon 64 X2 5000+:

Uh oh. L2 cache latencies are indeed higher on the 65nm version of the chip. That may help explain some of the slightly slower performance numbers some folks have seen out of these processors. For what it’s worth, the increased latency doesn’t appear to extend to L1 cache speed or main memory. Here’s a look at main memory access latencies:

For full disclosure, here’s a 3D graph of the CPU-Z latency tool result for both CPUs. As ever, the light orange bars represent the block sizes that should fit into L2 cache. Yellow is for L1 cache, and dark orange for main memory.
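
For anyone wondering where those color boundaries fall, here is a small sketch that sorts block sizes into the three tiers. The cache sizes are our assumptions for the X2 5000+ (64KB of L1 data cache and 512KB of L2 per core), and the sweep of block sizes is hypothetical rather than the exact set CPU-Z tests.

```python
L1D_KB = 64   # assumed: L1 data cache per core
L2_KB = 512   # assumed: L2 cache per core on the X2 5000+


def tier(block_kb: int) -> str:
    """Name the level of the memory hierarchy a block of this size should fit in."""
    if block_kb <= L1D_KB:
        return "L1 cache (yellow)"
    if block_kb <= L2_KB:
        return "L2 cache (light orange)"
    return "main memory (dark orange)"


# Hypothetical sweep of power-of-two block sizes, 1KB through 32MB.
for block_kb in (2 ** i for i in range(16)):
    print(f"{block_kb:>6} KB -> {tier(block_kb)}")
```

This is a simplification: the K8's L2 is exclusive of its L1, so the real boundary between the L2 and main memory tiers is a bit fuzzier than a single cutoff implies.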

The L2 cache latency on the 65nm 5000+ is generally higher, no doubt about it. We can also look at L2 cache bandwidth with our simple version of Linpack that uses various matrix sizes. Let’s see how that looks.

The 65nm chip’s L2 cache is markedly slower, and its disadvantage in our Linpack test persists even with matrix sizes that spill over into main memory. This continuing disparity may stem from the fact that the CPU’s speculative data prefetch algorithm relies on L2 cache, as well. Sandra’s memory bandwidth test shows a similar performance gap:

We’re probably seeing a worst-case scenario for the 65nm chip in these synthetic memory tests; real-world applications likely won’t be affected as much. Still, taking a step backward in performance is never good, especially when you’re already well behind the competition.

We don’t yet have a full set of performance results for the 65nm chips, but here’s a quick look at results from our MyriMatch benchmark:

The 65nm part isn’t horribly slower, but it is slower.

We’re a little perplexed by these developments. Why would AMD increase the latency of its L2 cache, especially without increasing its size? Why isn’t the die area of the 65nm Athlon 64 X2 even smaller compared to the 90nm version with the same transistor count? There are a number of possibilities, but I’ll refrain from speculating for now, and we’ll await some better answers from AMD.

Update: We now have some answers from AMD.

    • Wintermane
    • 13 years ago

    AMD has a fair number of weak spots; cache is one of them.

    • eRacer
    • 13 years ago

    Anandtech received a reply from AMD as to why the cache latency is higher. Supposedly it is to give AMD an option to use much larger L2 caches in the future.

    http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2893&p=3

      • Vrock
      • 13 years ago

      But…didn’t they just scrap all the 1mb parts because there was negligible performance diff from the 512mb models? Oh wait, then they brought them back with higher ratings. Um, make up your mind, AMD. Does more cache make significantly faster Athlon 64s, or doesn’t it? The benchmarks say no. You say yes, then no, then yes again. Sorry, but the whole thing stinks like yesterday’s underwear.

        • lethal
        • 13 years ago

        I guess the difference between 1mb and 512mb was noticeable 😉

    • Krogoth
    • 13 years ago

    Here is a little reality check for the raving fanboys.

    Who actually cares about this stuff? Performance freaks and die-hard fanboys. How much of the market do they make up? Like <2%.

    What are the 65nm A64s being marketed towards? OEMs, a.k.a. average Joe and Jane rigs. Do they care about slightly higher latency? Not a single damn.

    I do think it is a lot more interesting to figure out what is exactly causing the difference from a technical standpoint.

      • Shintai
      • 13 years ago

      Yum..we already know..L2 latency and the fucked up memory dividers. Anandtech also got a statement from AMD if you turn over there for a bit.

      And even though you are John Doe OEM buyer, it would still suck when it runs better on an older and “slower” lower-rated CPU at a friend’s place.

      Maybe its you who needs to come back to reality. And last time I checked, this wasn

        • Krogoth
        • 13 years ago

        Geez, talk about double standards.

        Prescott sucked at the time just as much as the Brisbane chips, if not worse. It still sold damn well, even though it was the weakest of the Netburst dynasty.

        I expect the same goes for 65nm K8 CPUs, until Intel decides to price its ICM-based products aggressively.

    • Pax-UX
    • 13 years ago

    Well, so long AMD and thanks for all the fish’n’chips!

    • dragmor
    • 13 years ago

    Thanks for the 3d cache latency graphs Scott.

    • sluggo
    • 13 years ago

    Somewhat OT, but my Opteron 165’s corresponding latencies (via CPU-Z 1.38) are:
    512/512: 13ns (versus 5ns on 90nm 5000+)
    8192/512: 109ns (versus 50ns on 90nm 5000+)

    This is with the Opteron (also a 90nm part) running at 2.8GHz, while the 5000+ runs at 2.6GHz. I realize the Opteron’s L2 cache is twice the size, but this wouldn’t explain why the L1 cache access times are so much slower. Am I missing something here?

      • AOEU
      • 13 years ago

      CPU-Z reports latency in cycles, not nanoseconds. To calculate the latency in nanoseconds, do:

      cycle latency / CPU frequency in GHz.

        • sluggo
        • 13 years ago

        If I’m not mistaken, the tables above pretty clearly show CPU-Z reporting latencies in nanoseconds.

    • Proesterchen
    • 13 years ago

    Thanks for the update, Scott!

    I’m definitely looking forward to seeing what AMD has to say about this, and where exactly this lowers performance and by how much.

      • flip-mode
      • 13 years ago

      Eh, it lowers performance in Linpack and Q4, so enterprise customers will want to avoid this chip.

    • JoshMST
    • 13 years ago

    I’m actually finishing up an article that will probably shed a lot of light on this issue. There are actually many, many factors that are not considered by either TR or Anand (this article has actually been a work in progress for the past week for me, and the results that both TR and AT have spotlighted have filled in some gaps for me). Hopefully my analysis of the situation will be accurate, and Shin and Proest will be kind in their criticisms ;P

      • Shintai
      • 13 years ago

      Europeans are never kind in their criticism 😛

    • alex666
    • 13 years ago

    These results, here and elsewhere like anandtech, beg the question of why this chip was released in the first place. Ya gotta wonder if there wasn’t some failure in QA + PR + administrative oversight vs. manufacturing per se. As others have suggested, a first run of 65nm might not yield the best performance. That’s fine, but then why release them days before Christmas? AMD as scrooge? BTW, in the interests of full disclosure, all my systems are AMD, and I still have a lot of faith in the company.

      • Buub
      • 13 years ago

      Ugh… the entire world isn’t all about performance at all cost.

      Once again, the reason these chips were released in the first place was to get volume up and costs down.

      Yes, the fact they’re not even smaller begs the question of whether they’re fully achieving that. But the fact that they are smaller means that even if they’re not small enough, they can still make more chips for less money.

      • green
      • 13 years ago

      i think it was shintai that pointed out AMD can now say they hit 65nm for desktops in 2006 as opposed to 2007

    • flip-mode
    • 13 years ago

    Heh, just thought of a funny up-side to all this: now 939 chips look even better. Eh, so do C2Ds.

    • just brew it!
    • 13 years ago

    …and if I may put my speculation hat on for a moment:

    Perhaps the L2 cache was the stumbling block when it came to clock speed scaling. If so, then intentionally adding some latency to the L2 cache may permit the 65nm core to scale to higher clock speeds, at the price of a slight decrease in IPC.

      • Shinare
      • 13 years ago

      My guess is they have doubled the cache but have it disabled in the press samples, or the press samples aren’t able to use it for some reason. This would account for both the larger than expected die and the increase in latency.

      My second guess is that they are fibbing about transistor count and have hidden in there some kind of xPU.

      My third guess is that AMD doesn’t really want to sell that many CPUs.

    • Beomagi
    • 13 years ago

    Is that my foot or a rifle target?

    • just brew it!
    • 13 years ago

    Whoa… funky.

    I wonder if this will be fixed in the next stepping?

    • eitje
    • 13 years ago

    do you think you could put that post in as an addendum for the original article? ie – page 6?

    yknow, like some kind of Christmas present for Proesterchen. 😉

    • flip-mode
    • 13 years ago

    Nice follow up TR.

    AMD’s playing it pretty shady lately with their launches. I guess if they had a horn to toot they’d be tooting it, but this wouldn’t look like such a conspiracy if they’d just go ahead and give full disclosure. What could have been a mole hill will now be a mountain; it’s going to be a feeding frenzy.

    Eh,

    FWIW, maybe you could do some sort of negative overlay of those two graphs, cause I can’t tell a damn bit of difference just looking at them.

      • shank15217
      • 13 years ago

      Slower caches aren’t a bug, so why do they have to disclose anything? Prescott was slower than Northwood, and there was no conspiracy.

        • flip-mode
        • 13 years ago

        They don’t have to, but from a PR perspective, and knowing that hardware sites pick things over with a nanometer comb, it’s better to head things off at the pass. This could have been a one-liner in a press release instead of a front pager on TR and elsewhere.

        I do, however, agree with you at the same time. And I agree with Charlie D over at theInq, that AMD hasn’t made a whole lot of promises with this chip, and so didn’t really owe anyone anything. This chip is merely a cost saver for AMD and a chance to tweak the manufacturing process.

        Really the performance difference is so small as to be inconsequential. But the way that it came to light is potentially damaging from a PR standpoint.

        • Shintai
        • 13 years ago

        You can say it makes the PR rating more questionable in regards to games and other cache latency dependent applications (Good thing they drop it).

        5200+ -5% is 4940+
        5000+ -5% is 4750+

        But again, so was Northwood vs Prescott. Just in Ghz.

        Also the title should be “Is slower”, not “looks to be”.

    • Stranger
    • 13 years ago

    I bet this has to do with getting yields up and possibly die sizes down.
    In less than six months this product is going to be low-end stuff for the likes of Dell and HP. Altair isn’t that far around the corner.
