Memory subsystem performance
With all of the talk about Barcelona's increased throughput, I figured we should put that to the test. Here's a quick synthetic benchmark of cache and memory bandwidth.
Barcelona delivers as advertised on this front, doubling the L1 and more than doubling the L2 cache bandwidth of the older Opteron 2200s, despite having lower clock speeds. Let's take a closer look at the tail end of these results, where we're primarily accessing main memory. I believe these results show memory bandwidth available to a single CPU core, not total system bandwidth, but they're still enlightening.
The improvements to Barcelona's memory controller appear to pay off nicely here. I'm a little dubious about the relatively low results for the Xeons, though. I expect we could see higher results with a different test.
Anyhow, that's bandwidth, but its close cousin is memory access latency. Opterons have traditionally had very low latencies thanks to their integrated memory controllers. How does Barcelona look here?
Well, that's not so good. Let's look a little closer at the results with the aid of some fancy 3D graphs, and I think we can pinpoint a reason for the Opteron 2300s' higher memory access latencies. In the graphs below, by the way, yellow represents L1 cache, light orange is L2 cache, red is L3 cache, and dark orange is main memory. Just because we can.
OK, stop right there and have a look. The Opteron 2350's L3 cache has a latency of about 23ns, and the 2360 SE's L3 latency is about 19ns. Since latency in the memory hierarchy is cumulative, with a miss in each cache level adding to the time it takes to reach main memory, that's very likely the cause of our higher memory access latencies. I would give you the L3 cache latency in CPU clock cycles, but that's kind of beside the point. Barcelona's L3 cache runs at the speed of the north bridge: 1.8GHz in the 2350 and 2.0GHz in the 2360 SE. The L3 cache may also incur additional latency for two other reasons: cache access among the four cores is doled out in round-robin fashion, and FIFO buffers sit in front of the cache to deal with cores that may be running at vastly different clock speeds.
Adding the L3 cache in this way was undoubtedly a tradeoff for AMD, and it carries a hefty latency penalty. This penalty may become less pronounced when Barcelona reaches higher clock speeds. AMD says the memory controller's speed can increase as clock frequencies do.
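To put the two L3 latency figures on the same footing, here's a minimal sketch that converts them from nanoseconds into north bridge clock cycles (at 1GHz, one cycle equals 1ns). It assumes the entire L3 access time is spent in the north bridge clock domain, which ignores the FIFO and round-robin overheads mentioned above, so treat the cycle counts as rough. The north bridge clocks in the projection loop are hypothetical values chosen only to illustrate how a fixed cycle count would translate into lower latency at higher clocks.

```python
# Measured L3 latencies (ns) and north bridge clocks (GHz) quoted in the text above.
CHIPS = {
    "Opteron 2350": {"l3_ns": 23.0, "nb_ghz": 1.8},
    "Opteron 2360 SE": {"l3_ns": 19.0, "nb_ghz": 2.0},
}


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles at clock_ghz (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count back to nanoseconds at clock_ghz."""
    return cycles / clock_ghz


if __name__ == "__main__":
    # Assumption: all of the L3 latency is counted in north bridge cycles.
    for name, chip in CHIPS.items():
        cycles = ns_to_cycles(chip["l3_ns"], chip["nb_ghz"])
        print(f"{name}: roughly {cycles:.1f} north bridge cycles")

    # Hypothetical projection: keep the 2350's cycle count fixed and raise the
    # north bridge clock. These clock values are illustrative, not AMD specs.
    base_cycles = ns_to_cycles(CHIPS["Opteron 2350"]["l3_ns"], CHIPS["Opteron 2350"]["nb_ghz"])
    for nb_ghz in (1.8, 2.0, 2.2):
        print(f"{nb_ghz:.1f}GHz north bridge: about {cycles_to_ns(base_cycles, nb_ghz):.1f}ns")
```

Under these assumptions, both chips land at roughly 38 to 41 north bridge cycles, which is consistent with the L3 latency difference being driven mostly by north bridge clock speed rather than by a change in the cache's design.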