
Memory subsystem performance
With all of the talk about Barcelona's increased throughput, I figured we should put that to the test. Here's a quick synthetic benchmark of cache and memory bandwidth.

Barcelona delivers as advertised on this front, doubling the L1 and more than doubling the L2 cache bandwidth of the older Opteron 2200s, despite having lower clock speeds. Let's take a closer look at the tail end of these results, where we're primarily accessing main memory. I believe these results show memory bandwidth available to a single CPU core, not total system bandwidth, but they're still enlightening.

The improvements to Barcelona's memory controller appear to pay off nicely here. I'm a little dubious about the relatively low results for the Xeons, though. I expect we could see higher results with a different test.

Anyhow, that's bandwidth, but its close cousin is memory access latency. Opterons have traditionally had very low latencies thanks to their integrated memory controllers. How does Barcelona look here?

Well, that's not so good. Let's look a little closer at the results with the aid of some fancy 3D graphs, and I think we can pinpoint a reason for the Opteron 2300s' higher memory access latencies. In the graphs below, by the way, yellow represents L1 cache, light orange is L2 cache, red is L3 cache, and dark orange is main memory. Just because we can.

OK, stop right there and have a look. The Opteron 2350's L3 cache has a latency of about 23ns, and the 2360 SE's L3 latency is about 19ns. Since latency in the memory hierarchy is a cumulative thing, that's very likely the cause of our higher memory access latencies. I would give you the L3 cache latency in CPU clock cycles, but that's kind of beside the point, because Barcelona's L3 cache runs at the speed of the north bridge rather than the cores: 1.8GHz in the 2350 and 2.0GHz in the 2360 SE. The L3 cache may pick up some additional latency for two other reasons: access to the cache is doled out among the four cores in round-robin fashion, and FIFO buffers sit in front of the cache to deal with cores running at what may be vastly different clock speeds.
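For the curious, here's a rough sketch of what those L3 latencies look like when counted in north bridge cycles, since that's the clock the L3 cache runs on. The function name is my own invention, and the inputs are the rounded latencies and clock speeds quoted above, so treat the results as ballpark figures rather than measurements.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to cycles of a clock given in GHz.

    One GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# Approximate L3 latencies and north bridge clocks quoted in the text.
l3_results = {
    "Opteron 2350": (23.0, 1.8),
    "Opteron 2360 SE": (19.0, 2.0),
}

for name, (latency_ns, nb_clock_ghz) in l3_results.items():
    cycles = latency_in_cycles(latency_ns, nb_clock_ghz)
    print(f"{name}: ~{cycles:.0f} north bridge cycles")
```

By that math, both chips land at around 40 north bridge cycles for an L3 access.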

Adding the L3 cache in this way was undoubtedly a tradeoff for AMD, and it certainly carries a hefty latency penalty. This penalty may become less pronounced when Barcelona reaches higher clock speeds, since AMD says the memory controller's speed can increase as clock frequencies do.
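To get a feel for how much a faster north bridge might help, here's a back-of-the-envelope sketch. It assumes, and this is purely my assumption, that the L3 latency stays constant when counted in north bridge cycles, so the latency in nanoseconds shrinks in proportion to the clock. The 2.4GHz and 2.8GHz clocks are hypothetical values picked for illustration, not announced products, and the round-robin arbitration and FIFO buffers in front of the L3 may not scale this neatly.

```python
def projected_latency_ns(latency_ns: float, clock_ghz: float, new_clock_ghz: float) -> float:
    """Scale a measured latency to a new clock, assuming a constant cycle count."""
    cycles = latency_ns * clock_ghz  # GHz is cycles per nanosecond
    return cycles / new_clock_ghz


# Starting point from the text: Opteron 2360 SE, ~19ns at a 2.0GHz north bridge.
measured_ns, measured_clock_ghz = 19.0, 2.0

# Hypothetical future north bridge clocks, chosen only for illustration.
for new_clock_ghz in (2.4, 2.8):
    ns = projected_latency_ns(measured_ns, measured_clock_ghz, new_clock_ghz)
    print(f"{new_clock_ghz:.1f}GHz north bridge: ~{ns:.1f}ns L3 latency")
```

Under that assumption, the L3 penalty shrinks only as fast as the north bridge clock rises, so modest clock bumps would trim it rather than erase it.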