Memory subsystem performance

We start with some synthetic tests of the cache and memory subsystem, and the first one shows us that the 45nm Xeon E5472 pretty much matches the Xeon X5365 in L1 and L2 cache bandwidth. The only big difference is at the 16MB block size, where the E5472's larger 6MB L2 cache helps out some. Both of these chips run at 3GHz, so they're a clock-for-clock match. We'll want to watch these two to see how much, if any, the Harpertown Xeon E5472s improve per-clock performance.

Let's take a closer look at the tail end of these results, where we're primarily accessing main memory. I believe these results show memory bandwidth available to a single CPU core, not total system bandwidth, but they're still enlightening.

The Stoakley platform's faster bus and higher memory frequencies add up to a nice boost in bandwidth over the older Xeons on the Bensley platform. Again, I don't think we're seeing absolute peak bandwidth, especially from the Xeons, but we can see a relative boost in throughput.
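For a rough sense of the ceiling involved, a front-side bus's theoretical peak is simply its transfer rate times its width. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes a 64-bit (8-byte) bus running at 1333 MT/s on Bensley and 1600 MT/s on Stoakley, figures that are my assumptions rather than numbers from these tests, and the function name is made up for illustration.

```python
def fsb_peak_gb_per_s(mega_transfers_per_s, bus_width_bytes=8):
    """Theoretical peak bus bandwidth in GB/s (1 GB = 10**9 bytes).

    Both arguments are assumptions supplied by the caller; nothing
    here is measured.
    """
    return mega_transfers_per_s * 1e6 * bus_width_bytes / 1e9


# Assumed bus speeds for the two platforms (not taken from the test results).
bensley = fsb_peak_gb_per_s(1333)
stoakley = fsb_peak_gb_per_s(1600)

print(f"Bensley:  {bensley:.1f} GB/s per bus")
print(f"Stoakley: {stoakley:.1f} GB/s per bus")
print(f"Ratio:    {stoakley / bensley:.2f}x")
```

A single core won't get near these paper numbers in a synthetic test, which fits the point above: the relative gap between the platforms is the thing to watch, not the absolute figures.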

Memory access latencies are essentially unchanged from the older Xeons to the newer. Let's look at this issue in a little more detail. In the graphs below, yellow represents L1 cache, light orange is L2 cache, red is L3 cache, and dark orange is main memory.

As one might expect, the Xeon E5472's memory access latencies are lower at larger block sizes, like 16MB and 32MB, than the X5365's. The faster bus and memory clocks likely deserve credit for that. More impressively, we measured the E5472's 6MB L2 cache at 15 cycles of latency, just one cycle more than the 4MB L2 cache on the Xeon X5365 at the same clock frequency. That's quite the contrast to the high latencies we found in the quad-core Opterons' new L3 cache.
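To put those cycle counts in wall-clock terms, divide by the clock speed. Here's a minimal sketch of that conversion using the figures from the text: 15 cycles for the E5472's L2, one cycle fewer for the X5365's, and 3GHz for both. The helper name is hypothetical.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds.

    A clock of N GHz completes N cycles per nanosecond, so the
    latency in ns is cycles / N.
    """
    return cycles / clock_ghz


CLOCK_GHZ = 3.0  # both Xeons run at 3GHz, per the text

e5472_l2_ns = cycles_to_ns(15, CLOCK_GHZ)      # 6MB L2, measured at 15 cycles
x5365_l2_ns = cycles_to_ns(15 - 1, CLOCK_GHZ)  # 4MB L2, one cycle fewer

print(f"E5472 L2: {e5472_l2_ns:.2f} ns")
print(f"X5365 L2: {x5365_l2_ns:.2f} ns")
print(f"Penalty for the larger cache: {e5472_l2_ns - x5365_l2_ns:.2f} ns")
```

Because the two chips share a clock speed, the one-cycle gap converts directly into about a third of a nanosecond, a small price for 50% more L2 capacity.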

Copyright ©1999-2009 The Tech Report. All rights reserved.