Memory subsystem performance
So how does the Core i7's overhauled cache and memory subsystem perform? We can measure it in various ways to find out. Here are a few synthetic benchmarks designed to do just that.
Whoa. The Core i7, she is fast, no? The Core i7-965 Extreme achieves nearly three times the throughput of the fastest single-socket Core 2 processor, the QX9770. With slower 1066 MHz memory, the Core i7-920 and 940 don't quite reach the same heights, but they're still much, much faster than anything else.
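For a rough sense of the ceiling these chips are working under, theoretical peak memory bandwidth is just transfer rate times bus width times channel count. Here's a minimal Python sketch of that arithmetic. The function name is ours, and the channel counts in the example (three for the Core i7, two for a dual-channel platform) are assumptions not stated above, so treat the output as back-of-the-envelope figures rather than anything we measured.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts: memory transfer rate in millions of transfers/s
                       (the "1066 MHz" figure quoted for DDR3 memory)
    channels:          number of memory channels (an assumption here)
    bus_width_bits:    width of one channel, 64 bits for standard DIMMs
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # Assumed configurations, for illustration only.
    print(f"1066 MT/s, 3 channels: {peak_bandwidth_gbs(1066, 3):.1f} GB/s")
    print(f"1066 MT/s, 2 channels: {peak_bandwidth_gbs(1066, 2):.1f} GB/s")
```

Measured throughput in a synthetic test will land below these peaks, so the numbers are best read as an upper bound for comparison.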
The Phenoms aren't performing quite as well here as one might hope, and part of the reason may be that we ran the Phenom's memory controller in dual 64-bit "unganged" mode rather than 128-bit "ganged" mode. The 128-bit mode may produce somewhat higher scores in synthetic tests, but we chose to test with unganged mode because its all-around performance could be superior.
The results from this test visually illustrate the throughput of the various levels of the memory hierarchy, and we find that the Core i7's caches are all quite fast. Even at the 512 kB and 1 MB test blocks, where presumably we're well into the L3 cache, the Core i7s achieve considerably more throughput than the Penryn-based QX9770.
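The reasoning behind "presumably we're well into the L3 cache" can be sketched in a few lines of Python: compare the test block size against each cache level's capacity and report the first level big enough to hold it. The capacities below (32 KB L1 data, 256 KB L2, 8 MB L3) are our assumptions for a Core i7 core, and the helper name is hypothetical. The sketch also ignores effects like cache sharing between threads, so it's a first approximation only.

```python
KB = 1024
MB = 1024 * KB

# Assumed capacities for illustration; swap in the figures for the chip at hand.
ASSUMED_CORE_I7_CACHES = [("L1", 32 * KB), ("L2", 256 * KB), ("L3", 8 * MB)]


def level_for_block(block_bytes, caches=ASSUMED_CORE_I7_CACHES):
    """Return the smallest cache level that can hold a block of this size.

    Levels are checked from smallest to largest; a block that fits in none
    of them is attributed to main memory.
    """
    for name, capacity in caches:
        if block_bytes <= capacity:
            return name
    return "main memory"


if __name__ == "__main__":
    for size in (16 * KB, 128 * KB, 512 * KB, 1 * MB, 256 * MB):
        print(f"{size // KB:>8} KB -> {level_for_block(size)}")
```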
The results without Hyper-Threading are curious: higher performance in the L1/L2 cache ranges, but lower performance in the L3 range.
Since it's difficult to see the results once we get into main memory, let's take a closer look at the 256 MB block size:
Among the Intel processors, these results are relatively similar to what we saw in Sandra's first memory bandwidth test at the top of the page, though the numbers are lower. The AMD processors, however, not only perform relatively better here, but their measured throughput is actually higher than it was in Sandra. Still, the Phenom X4 9950 is not even close to the Core i7-920, let alone the faster options.
These results come from a little cachemem-like latency test program included with earlier versions of CPU-Z, and they give us a sense of what the Core i7's integrated memory controller and revamped cache hierarchy bring to the table. (I've assumed "one tick up" Turbo clock speeds for the Core i7 processors in calculating access times.) Despite having a third cache level and a much larger total cache size, the 965 Extreme gets out to main memory as quickly as an Athlon X2 6400+, our previous champ. Remarkable. The Core i7-920, with its slower "uncore" clocks and 1066 MHz memory, is still quicker than most Core 2 chips.
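For the curious, here's roughly how a cycle count turns into an access time once you pick a clock speed, written as a small Python sketch. The helper names are ours, and the specifics are assumptions: we treat "one tick up" as a single 133 MHz step above the base clock, and the 3.2 GHz base clock and 200-cycle latency in the example are placeholders rather than measured values.

```python
def turbo_clock_ghz(base_ghz, bins=1, bin_mhz=133.33):
    """Clock speed after stepping up `bins` Turbo increments.

    bin_mhz is the assumed size of one increment in MHz.
    """
    return base_ghz + bins * bin_mhz / 1000.0


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds.

    One cycle at f GHz lasts 1/f nanoseconds, so the time is cycles / f.
    """
    return cycles / clock_ghz


if __name__ == "__main__":
    clock = turbo_clock_ghz(3.2)   # assumed base clock, one step up
    latency_cycles = 200           # placeholder value, not a measured result
    print(f"{latency_cycles} cycles at {clock:.2f} GHz "
          f"= {cycles_to_ns(latency_cycles, clock):.1f} ns")
```

The choice of clock matters: the same cycle count works out to a shorter access time at the higher Turbo speed than at the base clock.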
If you think we've already geeked out beyond all reasonable hope, don't scroll down any further. What you'll see below are 3D graphs of memory access latencies at various block and step sizes for some of the most interesting processors we tested. We've color-coded them just as a guide, although it doesn't mean much. Yellow roughly corresponds to the chip's L1 cache size, light orange to the L2 cache, red to the L3 cache, and dark orange to main memory.
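If you're wondering what "block and step size" means in practice, a latency test of this kind typically walks a chain of dependent loads through a buffer of a given size, jumping a fixed stride each time, and repeats that for every block and stride combination. The Python sketch below shows the shape of such a sweep. It is not the CPU-Z tool's actual method, all the names are hypothetical, and in Python the timings are dominated by interpreter overhead, so it illustrates the access pattern rather than producing meaningful latencies. A real tool would be written in C or assembly and might randomize the chain to defeat hardware prefetchers.

```python
import time
from array import array


def build_chain(block_bytes, stride_bytes, elem_bytes=8):
    """Build a buffer where each slot holds the index of the next slot to read.

    Following the chain from slot 0 visits every `step`-th element and wraps
    around at the end of the block.
    """
    n = block_bytes // elem_bytes
    step = max(1, stride_bytes // elem_bytes)
    return array("q", ((i + step) % n for i in range(n)))


def walk(chain, accesses):
    """Chase the chain for `accesses` dependent loads.

    Returns (average nanoseconds per access, final index).
    """
    idx = 0
    start = time.perf_counter()
    for _ in range(accesses):
        idx = chain[idx]  # each load depends on the previous one
    elapsed = time.perf_counter() - start
    return elapsed / accesses * 1e9, idx


def sweep(block_sizes, strides, accesses=100_000):
    """Time every (block, stride) pair where the stride fits inside the block."""
    results = {}
    for block in block_sizes:
        for stride in strides:
            if stride >= block:
                continue
            ns, _ = walk(build_chain(block, stride), accesses)
            results[(block, stride)] = ns
    return results


if __name__ == "__main__":
    blocks = [4 * 1024, 64 * 1024, 1024 * 1024]
    for (block, stride), ns in sweep(blocks, [64, 256, 1024]).items():
        print(f"block {block // 1024:>5} KB, stride {stride:>5} B: {ns:7.1f} ns/access")
```

Plotting the results with block size on one axis and stride on the other gives a surface of the same general form as the graphs described above.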
Intel seems to have better managed the problem of L3 cache latency than AMD did with the Phenom, especially in the 965 Extreme, which runs its L3 cache at a full 3.2 GHz.