Memory subsystem performance
We'll start, as ever, with some quick synthetic tests of the memory subsystem, which will help give us the lay of the land before we dive into our real-world benchmarks.

The Phenom does indeed have quite a bit more L1 and L2 cache bandwidth than its predecessors, as you can see. This test is multithreaded, and it shows higher bandwidth scores when more cores and cache are available. Even so, the Phenom 9600 at 2.3GHz achieves higher throughput than the Athlon 64 FX-74, a two-socket "quad core" solution using dual Athlon 64 FX processors. Intel's caches are faster still.

Here's a closer look at how these systems perform when accessing main memory. The Phenom's revised memory controller gets to show off a bit, with much higher throughput than anything else we tested.

AMD's processors with integrated memory controllers have traditionally had the lowest memory access latencies around, but that's no longer the case with Phenom. Intel managed to close the gap somewhat with its memory disambiguation logic in the Core microarchitecture, and AMD has now given up the rest of its lead with Phenom. The fancy-pants graphs below will show us why.

In these graphs, yellow represents L1 cache, light orange is L2 cache, red is L3 cache where present, and dark orange is main memory. What you're seeing here are memory access latencies at various block and step sizes, laid out in a way that exposes the latency of each stage in the memory hierarchy.
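To make "block and step sizes" a bit more concrete, here is a minimal sketch of the kind of access pattern a latency test like this typically walks: a chain of dependent loads that touches one slot every `step_bytes` within a buffer of `block_bytes`. The function names and the 8-byte slot size are assumptions for illustration, not the benchmark behind these graphs. In Python, interpreter overhead swamps real cache latency, so treat this as a picture of the pattern rather than a measurement tool.

```python
import random

SLOT_BYTES = 8  # assumed size of one pointer-sized slot


def build_chain(block_bytes, step_bytes, seed=0):
    """Build a closed loop of dependent accesses inside one block.

    chain[i] holds the index of the next slot to visit. Only slots spaced
    step_bytes apart are linked, and they are linked in shuffled order so
    that each access depends on the previous one and the next address is
    not a simple fixed stride.
    """
    slots = block_bytes // SLOT_BYTES
    stride = max(1, step_bytes // SLOT_BYTES)
    visited = list(range(0, slots, stride))
    random.Random(seed).shuffle(visited)

    chain = [0] * slots
    # Link each shuffled slot to the next, wrapping the last back to the first.
    for here, there in zip(visited, visited[1:] + visited[:1]):
        chain[here] = there
    return chain, visited[0], len(visited)


def walk(chain, start, hops):
    """Follow the chain for `hops` dependent accesses; return the final index."""
    i = start
    for _ in range(hops):
        i = chain[i]
    return i
```

A compiled version of the `walk` loop, timed over many hops and divided by the hop count, gives the per-access latency for one block and step combination. Growing the block past the capacity of each cache level is what produces the distinct plateaus in graphs like these.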

Have a look at the red section representing the Phenom 9600's L3 cache. This cache's latencies are about 22 nanoseconds, and the additional task of checking the L3 cache adds latency to main memory accesses, as well. The Phenom includes sharing logic that buffers requests from all four cores—which may all be running at different clock speeds at any given time—coming into the L3 cache. This logic itself undoubtedly adds some delay. Also, as we've mentioned, the L3 cache doesn't run at the full speed of the CPU cores—it runs at the north bridge speed. That means L3 cache performance doesn't scale linearly with core clock speeds. Both the Phenom 9600 and 9900 models have 2GHz north bridges, for example, and both have the exact same 22ns L3 cache latency penalty.
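A quick back-of-the-envelope conversion shows why a fixed 22ns penalty looks worse as core clocks rise. The sketch below converts that latency into clock cycles. The 22ns figure, the 2GHz north bridge, and the 9600's 2.3GHz core clock come from the text; the 2.6GHz core clock for the 9900 and the helper name are assumptions for illustration.

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles at a given frequency."""
    # 1GHz means one cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz


L3_LATENCY_NS = 22  # L3 latency figure quoted in the text

clocks = [
    ("north bridge at 2.0GHz", 2.0),
    ("Phenom 9600 core at 2.3GHz", 2.3),
    ("Phenom 9900 core at 2.6GHz (assumed clock)", 2.6),
]

for label, ghz in clocks:
    cycles = latency_in_cycles(L3_LATENCY_NS, ghz)
    print(f"{label}: about {cycles:.0f} cycles")
```

Under those assumptions, the same 22ns works out to roughly 44 north bridge cycles, but about 51 core cycles at 2.3GHz and 57 at 2.6GHz. The faster the cores run relative to the north bridge, the more cycles each one spends waiting on an L3 hit.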

This additional memory latency isn't the end of the world by any means, but the fact that the Phenom trades this many nanoseconds of latency for the addition of a relatively small 2MB cache is, well, unusual, to say the least. AMD will almost certainly have to raise north bridge speeds along with core clocks in order to keep performance scaling well.