Memory subsystem performance
Now that we've considered power efficiency, we'll move on to our performance results, beginning with some synthetic tests of the CPUs' memory subsystems. These results don't track directly with real-world performance, but they do give us some insights into the CPU and system architectures involved. For this first test, the graph is pretty crowded. I've tried to be selective, generally only choosing one representative from each architecture. This test is multithreaded, so more cores—with associated L1 and L2 caches—can lead to higher throughput.

With six L1 data caches, six L2 caches, and a massive 12MB L3 cache, the Core i7-980X is the fastest solution at nearly every data point.

This graph becomes almost impossible to read once we get to the larger block sizes, where we're really measuring main memory bandwidth. Stream is a better test of that particular attribute.

Gulftown essentially matches Bloomfield here, with near-identical bandwidth scores.
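For anyone curious how a Stream-style test turns a timed loop into a bandwidth figure, here's a rough sketch of the accounting for the "triad" kernel, a[i] = b[i] + q*c[i]. It assumes 8-byte doubles and counts two reads and one write per element. The function names are mine, not Stream's, and a pure-Python loop is far too slow to say anything about real hardware; the point is only the bytes-over-seconds arithmetic.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8

def triad_bytes(n):
    """Bytes moved by one triad pass: read b, read c, write a."""
    return 3 * BYTES_PER_DOUBLE * n

def bandwidth_gb_per_s(n, seconds):
    """Convert one timed triad pass into GB/s (taking 1 GB = 1e9 bytes)."""
    return triad_bytes(n) / seconds / 1e9

def run_triad(n, q=3.0):
    """Time a single triad pass over arrays of n doubles (illustrative only)."""
    a = array("d", [0.0] * n)
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + q * c[i]
    elapsed = time.perf_counter() - start
    return a, elapsed

if __name__ == "__main__":
    n = 1_000_000
    _, elapsed = run_triad(n)
    # Interpreter overhead dominates here; this is not a hardware measurement.
    print(f"{bandwidth_gb_per_s(n, elapsed):.3f} GB/s")
```

A real run uses arrays much larger than the last-level cache, so that the loop is fed from main memory.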

The 980X's very low memory access latencies are even more impressive given that its L3 cache is 50% bigger than Bloomfield's. (Larger caches typically have longer latencies.) Intel informs us that Gulftown's L3 cache runs at the same speed as Bloomfield's, so there's no improvement due to higher frequencies. I do think, however, that we may have to move our sample point to the 32MB block size soon. Latencies for Gulftown at the 16MB size may be getting partially cushioned by the 12MB L3 cache. At the 32MB sample size, latencies for the i7-975 and i7-980X are almost identical and work out to about 51 ns.
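To put numbers on that concern, here's a quick sketch comparing each test block size with the two chips' L3 capacities. The 8MB figure for Bloomfield follows from Gulftown's 12MB cache being 50% bigger; the dictionary and function names are just for illustration. The ratio says nothing about actual hit rates, which depend on the access pattern. It only shows how far the block overshoots the cache.

```python
# L3 capacities in MB; Bloomfield's 8MB is derived from "12MB is 50% bigger".
L3_MB = {"Bloomfield (i7-975)": 8, "Gulftown (i7-980X)": 12}

def overshoot(block_mb, cache_mb):
    """How many times larger the test block is than the L3 cache."""
    return block_mb / cache_mb

for block_mb in (16, 32):
    for chip, cache_mb in L3_MB.items():
        ratio = overshoot(block_mb, cache_mb)
        print(f"{block_mb}MB block vs {chip}: {ratio:.2f}x the L3")
```

At 16MB, the block is only about a third larger than Gulftown's L3 but double Bloomfield's. At 32MB, it is well beyond both.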

For what it's worth, this benchmark reports that the latency for the Core i7-975's L3 cache is 36 cycles (of the CPU core), while the i7-980X's is 43 cycles.
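To translate those cycle counts into time, divide by the core clock. The sketch below assumes both chips run at a 3.33GHz stock speed, which isn't stated in this section, and it ignores Turbo Boost, so treat the clock value as an assumption.

```python
ASSUMED_CLOCK_GHZ = 3.33  # assumption: stock core clock for both chips, no Turbo

def cycles_to_ns(cycles, clock_ghz=ASSUMED_CLOCK_GHZ):
    """Latency in nanoseconds: cycles divided by cycles-per-nanosecond (GHz)."""
    return cycles / clock_ghz

for chip, cycles in (("Core i7-975", 36), ("Core i7-980X", 43)):
    print(f"{chip}: {cycles} cycles is about {cycles_to_ns(cycles):.1f} ns")
```

Under that assumption, the seven-cycle gap works out to roughly 2 ns.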