Memory subsystem performance
With all of the talk about Barcelona's increased throughput, I figured we should put that to the test. Here's a quick synthetic benchmark of cache and memory bandwidth.
Barcelona delivers as advertised on this front, doubling the L1 and more than doubling the L2 cache bandwidth of the older Opteron 2200s, despite having lower clock speeds. Let's take a closer look at the tail end of these results, where we're primarily accessing main memory. I believe these results show memory bandwidth available to a single CPU core, not total system bandwidth, but they're still enlightening.
The improvements to Barcelona's memory controller appear to pay off nicely here. I'm a little dubious about the relatively low results for the Xeons, though. I expect we could see higher results with a different test.
Anyhow, that's bandwidth, but its close cousin is memory access latency. Opterons have traditionally had very low latencies thanks to their integrated memory controllers. How does Barcelona look here?
Well, that's not so good. Let's look a little closer at the results with the aid of some fancy 3D graphs, and I think we can pinpoint a reason for the Opteron 2300s' higher memory access latencies. In the graphs below, by the way, yellow represents L1 cache, light orange is L2 cache, red is L3 cache, and dark orange is main memory. Just because we can.
Ok, stop right there and have a look. The Opteron 2350's L3 cache has a latency of about 23ns, and the 2360 SE's L3 latency is about 19ns. Since latency in the memory hierarchy is a cumulative thing, that's very likely the cause of our higher memory access latencies. I would give you the L3 cache latency in CPU clock cycles, but that's kind of beside the point. Barcelona's L3 cache runs at the speed of the north bridge, so 1.8GHz in the 2350 and 2.0GHz in the 2360 SE. The L3 cache may have some additional latency for other reasons: because cache access between the four cores is doled out in a round-robin fashion and because of the FIFO buffers that sit in front of this cache in order to deal with cores running at what may be vastly different clock speeds.
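Since the L3 cache is clocked with the north bridge, the more telling conversion is into north bridge cycles rather than CPU cycles. Here's a quick sketch of that arithmetic in Python, using only the approximate latencies and north bridge clocks quoted above. The names `ns_to_cycles`, `L3_MEASUREMENTS`, and `l3_latency_in_nb_cycles` are made up for illustration, and the inputs are rounded figures, so treat the output as a ballpark.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles.

    One GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# (approximate L3 latency in ns, north bridge clock in GHz), as quoted in the text.
L3_MEASUREMENTS = {
    "Opteron 2350": (23.0, 1.8),
    "Opteron 2360 SE": (19.0, 2.0),
}


def l3_latency_in_nb_cycles() -> dict:
    """Return the approximate L3 latency of each chip in north bridge cycles."""
    return {
        chip: round(ns_to_cycles(ns, ghz), 1)
        for chip, (ns, ghz) in L3_MEASUREMENTS.items()
    }


if __name__ == "__main__":
    for chip, cycles in l3_latency_in_nb_cycles().items():
        print(f"{chip}: roughly {cycles} north bridge cycles")
```

By this rough math, both chips land at around 40 north bridge clocks, which hints that the gap in nanoseconds between them mostly tracks north bridge speed. The inputs are approximate, though, so I wouldn't read much into the remaining difference.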
Adding the L3 cache in this way was undoubtedly a tradeoff for AMD, but it certainly carries a hefty latency penalty. This penalty may become less pronounced when Barcelona reaches higher clock speeds. AMD says the memory controller's speed can increase as clock frequencies do.
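For the curious, latency tests of this sort generally rely on pointer chasing: each load's address depends on the result of the previous load, so the accesses can't overlap and the average time per step approximates access latency for a given working-set size. Below is a minimal sketch of that access pattern in Python. It is not the benchmark used for the results above, and the helper names `build_chain` and `chase` are hypothetical. Interpreter overhead will swamp the actual hardware latency, so the timings it prints only illustrate the method; a real test would be written in C or assembly.

```python
import random
import time


def build_chain(n: int, seed: int = 0) -> list:
    """Build a random permutation forming one cycle of length n (Sattolo's algorithm).

    Following nxt[i] repeatedly from any index visits every slot exactly once
    before returning to the start, which defeats simple hardware prefetching.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list, steps: int):
    """Follow the chain for `steps` dependent loads.

    Returns the final index and the average time per step in nanoseconds.
    """
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]  # the next address depends on this load's result
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps * 1e9


if __name__ == "__main__":
    # Sweep the working-set size, as the 3D graphs do across block sizes.
    for n in (1 << 10, 1 << 16, 1 << 20):
        chain = build_chain(n)
        _, ns_per_step = chase(chain, n)
        print(f"{n:>8} elements: {ns_per_step:.1f} ns per step (includes interpreter overhead)")
```

The single-cycle permutation is the important design choice here. A plain random shuffle can break into several short loops, which would let a "large" test quietly run out of a small slice of the array and sit in cache.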