Thus I found myself on the phone yesterday afternoon asking AMD about 65nm L2 cache latencies, possible performance differences, and why the Athlon 64 X2’s die size was reduced only from 183mm² at 90nm to 126mm² at 65nm, a much smaller reduction than expected, despite the transistor count estimate remaining steady at 153.8 million. The AMD rep to whom I was speaking didn’t have much in the way of answers for me at the time, and he said that most of the people who would have those answers were already out on vacation for the holidays. But he did say that other folks had just been asking the same set of questions. Lo and behold, Anand published an article today discussing die size questions and increased L2 cache latencies on AMD’s 65nm CPUs.
Since I have the same 65nm AMD processors on hand, I thought I’d run a few quick tests, as well. Here’s a look at L2 cache latency numbers from CPU-Z on the 65nm and 90nm versions of the Athlon 64 X2 5000+:
Uh oh. L2 cache latencies are indeed higher on the 65nm version of the chip. That may help explain some of the slightly slower performance numbers some folks have seen out of these processors. For what it’s worth, the increased latency doesn’t appear to extend to L1 cache speed or main memory. Here’s a look at main memory access latencies:
For full disclosure, here’s a 3D graph of the CPU-Z latency tool result for both CPUs. As ever, the light orange bars represent the block sizes that should fit into L2 cache. Yellow is for L1 cache, and dark orange for main memory.
The L2 cache latency on the 65nm 5000+ is generally higher, no doubt about it. We can also look at L2 cache bandwidth with our simple version of Linpack that uses various matrix sizes. Let’s see how that looks.
The 65nm chip’s L2 cache is markedly slower, and its disadvantage in our Linpack test persists even with matrix sizes that spill over into main memory. This continuing disparity may arise because the CPU’s speculative data prefetch algorithm also relies on the L2 cache. Sandra’s memory bandwidth test shows a similar performance gap:
We’re probably seeing a worst-case scenario for the 65nm chip in these synthetic memory tests. Real-world applications likely won’t be affected as much. Still, taking a step backward in performance is never good, especially when you’re already well behind the competition.
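For a rough feel of where a Linpack-style matrix stops fitting in L2 and starts spilling into main memory, here is a minimal sketch. It assumes the working set is a single n x n matrix of 8-byte doubles and that each core has 512KB of L2; neither figure comes from the tests above, and the function names are purely illustrative. In practice other data competes for the cache, so the real crossover comes somewhat earlier.

```python
import math

BYTES_PER_DOUBLE = 8  # double-precision floating point

def matrix_bytes(n):
    """Storage needed for an n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE

def largest_fitting_dim(cache_bytes):
    """Largest n for which an n x n double matrix fits in cache_bytes."""
    return math.isqrt(cache_bytes // BYTES_PER_DOUBLE)

# Assumption: 512KB of L2 per core. Adjust for the chip at hand.
L2_BYTES = 512 * 1024

n_max = largest_fitting_dim(L2_BYTES)
print(f"Largest matrix that fits: {n_max} x {n_max} "
      f"({matrix_bytes(n_max)} bytes)")
print(f"One size up needs {matrix_bytes(n_max + 1)} bytes "
      f"against {L2_BYTES} bytes of L2")
```

Anything much past that dimension is mostly exercising main memory rather than L2, at least under these simplified assumptions.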
We don’t yet have a full set of performance results for the 65nm chips, but here’s a quick look at results from our MyriMatch benchmark:
The 65nm part isn’t horribly slower, but it is slower.
We’re a little perplexed by these developments. Why would AMD increase the latency of its L2 cache, especially without increasing its size? Why isn’t the die area of the 65nm Athlon 64 X2 even smaller compared to the 90nm version with the same transistor count? There are a number of possibilities, but I’ll refrain from speculating for now, and we’ll await some better answers from AMD.
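To put a number on why 126mm² looks large, here is a back-of-the-envelope sketch. It assumes an idealized shrink in which die area scales with the square of the process feature size. Real chips never scale that cleanly, since I/O pads, analog circuitry, and some wiring don’t shrink in proportion, so treat the result as a lower bound rather than a prediction. The function name is just for illustration.

```python
def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Die area after an idealized shrink: scales with (new/old) squared."""
    return area_mm2 * (new_nm / old_nm) ** 2

old_area = 183.0  # 90nm Athlon 64 X2 die size in mm^2
new_area = 126.0  # 65nm Athlon 64 X2 die size in mm^2

ideal = ideal_shrink_area(old_area, 90, 65)

print(f"Ideal 65nm area:   {ideal:.1f} mm^2")
print(f"Actual 65nm area:  {new_area:.1f} mm^2")
print(f"Ideal / original:  {ideal / old_area:.0%}")
print(f"Actual / original: {new_area / old_area:.0%}")
```

Under that idealized assumption, the 65nm die would come in at roughly 95mm², or about 52% of the 90nm die’s area. The actual 126mm² is about 69%.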
Update: We now have some answers from AMD.