Memory subsystem performance

This bandwidth test gives a nice visual for the different levels of the cache and memory hierarchy. Because AMD's lower-level caches don't replicate all of the contents of the higher-level caches, Istanbul's two additional 512KB L2 caches (associated with its two added cores) increase its total effective cache size—and bandwidth—compared to Shanghai.
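As a rough back-of-the-envelope sketch of that point, the snippet below totals L2 and L3 capacity for both chips, treating the hierarchy as fully exclusive. The 512KB-per-core L2 figure and the core counts come from the text; the 6MB shared L3 is an assumption supplied for illustration, and L1 is left out for simplicity. Since the lower-level caches only avoid replicating most of the upper-level contents, the totals are best read as upper bounds.

```python
L2_PER_CORE_KB = 512        # from the text: one 512KB L2 per core
L3_SHARED_KB = 6 * 1024     # assumption: 6MB shared L3 on both chips

def effective_cache_kb(cores, l2_kb=L2_PER_CORE_KB, l3_kb=L3_SHARED_KB):
    """Upper bound on unique cached data (L2 + L3) if no line is duplicated."""
    return cores * l2_kb + l3_kb

shanghai_kb = effective_cache_kb(cores=4)   # quad-core
istanbul_kb = effective_cache_kb(cores=6)   # two added cores

print(f"Shanghai: {shanghai_kb} KB, Istanbul: {istanbul_kb} KB, "
      f"difference: {istanbul_kb - shanghai_kb} KB")
```

Under these assumptions the two extra L2 caches add 1MB of effective capacity, which is the step one would expect to see in the bandwidth plot.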

One new addition we've made for this review is a proper Stream bandwidth test. This version of Stream is multithreaded and can be told how many threads to create. We've chosen the optimal number for each system. As you can see, the Nehalem Xeons have a clear lead in available bandwidth thanks primarily to their three channels of DDR3 1333MHz memory. With no real changes to the memory subsystem, Istanbul achieves no more throughput than Shanghai.
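To put the Stream results in context, here is a minimal sketch of the theoretical peak bandwidth per socket: channels times transfer rate times 8 bytes per transfer on a 64-bit channel. The three channels at 1333MHz come from the text; the two channels of DDR2-800 used for the Opterons are an assumption, since this section doesn't state their memory configuration. Measured Stream numbers will always land below these ceilings.

```python
BYTES_PER_TRANSFER = 8  # a 64-bit DDR channel moves 8 bytes per transfer

def peak_bandwidth_gbs(channels, mega_transfers_per_sec):
    """Theoretical peak per socket in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers_per_sec * 1e6 * BYTES_PER_TRANSFER / 1e9

# From the text: three channels of 1333MHz DDR3 on the Nehalem Xeons.
xeon_gbs = peak_bandwidth_gbs(channels=3, mega_transfers_per_sec=1333)

# Assumption (not stated in this section): two channels of DDR2-800.
opteron_gbs = peak_bandwidth_gbs(channels=2, mega_transfers_per_sec=800)

print(f"Xeon peak: {xeon_gbs:.1f} GB/s, Opteron peak: {opteron_gbs:.1f} GB/s")
```

If the DDR2-800 assumption holds, the Xeons start with roughly two and a half times the raw bandwidth per socket, which fits the lead described above.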

Memory access latencies haven't really changed with Istanbul, either, even though six cores are now sharing the same two memory controllers.

We can get a closer look at access latencies throughout the memory hierarchy with the 3D graphs below. I've colored the block sizes that correspond to different cache levels, with yellow being L1 data cache and brown representing main memory.

Istanbul again looks much like Shanghai here. The Xeon X5550 looks pretty similar, too, but it has smaller L1 and L2 caches, a larger, quicker L3 cache (8MB), and much shorter access times to main memory.
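Latency-by-block-size measurements like these are commonly built on pointer chasing: each load depends on the result of the previous one, so the hardware can't overlap them. The sketch below shows only the access pattern and is not the tool used for these graphs. The function names are hypothetical, and in Python the interpreter overhead swamps the hardware latency, so the printed timings should not be compared with the results above.

```python
import random
import time

def make_chain(n, seed=0):
    """Build a list where chain[i] is the next index to visit.

    The indices form a single random cycle over all n slots, so a walk
    touches every slot in an order the prefetcher can't easily predict.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chain = [0] * n
    for here, there in zip(order, order[1:] + order[:1]):
        chain[here] = there
    return chain

def chase(chain, steps):
    """Follow the chain for `steps` dependent loads.

    Returns (final index, average seconds per step).
    """
    i = 0
    start = time.perf_counter()
    for _ in range(steps):
        i = chain[i]          # next address depends on this load
    elapsed = time.perf_counter() - start
    return i, elapsed / steps

# Sweep a few block sizes (in list slots, chosen arbitrarily).
for slots in (1 << 10, 1 << 14, 1 << 18):
    _, per_step = chase(make_chain(slots), steps=100_000)
    print(f"{slots:>7} slots: {per_step * 1e9:.1f} ns per step")
```

A native implementation would size each block in bytes so that it fits in L1, L2, L3 or only main memory, which is what produces the distinct plateaus in graphs of this kind.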