We'll begin our tests with a customary look at memory subsystem performance. These results won't track with performance in most real-world applications, but they can teach us a thing or two about these processors and how they compare.
Although our Intel motherboard has dual channels of 667MHz DDR2 memory, the Core 2 Duo's path to main memory is limited by its 1066MHz front-side bus. With their on-chip memory controllers, the Athlon 64 processors can take better advantage of the peak bandwidth offered by two channels of DDR2 memory. That said, the Core 2 Duo doesn't achieve the same throughput as the Pentium Extreme Edition 965, which also rides on a 1066MHz bus. The gap between these two Intel CPU architectures may stem from the algorithms each uses to govern pre-fetching of data from main memory into the L2 cache; the Netburst processor may pre-fetch more aggressively, in a way that benefits it in this synthetic test.
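The bandwidth mismatch is easy to check with back-of-the-envelope arithmetic. The sketch below computes theoretical peaks only, assuming 64-bit (8-byte) wide memory channels and a 64-bit front-side bus, and using the nominal transfer rates behind the "667MHz" and "1066MHz" labels. Real sustained throughput is always lower.

```python
# Theoretical peak bandwidth = transfers per second x bytes per transfer x channels.
BYTES_PER_TRANSFER = 8  # assumption: 64-bit DDR2 channels and a 64-bit front-side bus


def peak_gb_per_s(mega_transfers_per_s, channels=1):
    """Return theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# DDR2-667 runs a 333.33MHz clock, double-pumped: ~666.67 MT/s per channel.
ddr2_dual = peak_gb_per_s(2000 / 3, channels=2)
# The "1066MHz" bus is a 266.67MHz clock, quad-pumped: ~1066.67 MT/s.
fsb = peak_gb_per_s(3200 / 3)

print(f"Dual-channel DDR2-667: {ddr2_dual:.2f} GB/s")  # 10.67 GB/s
print(f"1066MHz front-side bus: {fsb:.2f} GB/s")       # 8.53 GB/s
```

On those assumptions, the two DDR2-667 channels offer roughly 10.7GB/s while the bus tops out near 8.5GB/s, so the Core 2 Duo simply can't pull the memory's full peak through its front-side bus.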
Next up is our ancient version of Linpack. This classic benchmark is traditionally used to measure floating-point math performance, but we use this unoptimized version simply to get a look at the "shape" of the memory subsystem. Unfortunately, this rendition of Linpack has a fixed maximum matrix size of 2MB, so we can't really see how the Core 2's entire L2 cache or main memory performs. I would have cut these results out of the review entirely, were they not so dramatic.
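To put the 2MB ceiling in perspective, here is a quick sketch of how a matrix footprint maps to its dimension. It assumes a square matrix of 8-byte double-precision elements and reads "2MB" as 2 × 2^20 bytes; neither detail is specified for this old build, so treat the numbers as illustrative.

```python
import math

BYTES_PER_ELEMENT = 8  # assumption: double-precision (8-byte) matrix elements


def footprint_bytes(n):
    """Memory occupied by an n x n matrix of BYTES_PER_ELEMENT-sized elements."""
    return n * n * BYTES_PER_ELEMENT


def max_dimension(limit_bytes):
    """Largest n whose n x n matrix still fits within limit_bytes."""
    return math.isqrt(limit_bytes // BYTES_PER_ELEMENT)


limit = 2 * 1024 * 1024  # assumption: "2MB" means 2 * 2**20 bytes
n = max_dimension(limit)
print(n, footprint_bytes(n))  # 512 2097152
```

Under those assumptions, the cap corresponds to a 512 × 512 matrix, so any cache larger than that footprint is never filled by the test.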
The Core 2 processors look to have one heck of a fast cache subsystem, at least in the first 2MB. Neither the Pentiums nor the Athlons come close.
Memory bandwidth matters, but memory access latency is arguably more important, though the two are interrelated. This result is intriguing because the Core 2 processors achieve much lower access latencies than the Netburst-based Pentiums despite using the same memory timings on the same type of motherboard. These numbers, however, are just one sample point in a range of possibilities, so let's look at representatives of the three microarchitectures in more detail.
The graphs below show results from multiple step and block sizes. I've color-coded the graphs to make them easier to read. For each processor, the yellow areas represent block sizes that fit into the L1 data cache, the light orange areas represent L2 cache, and the dark orange areas represent main memory.
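For readers curious how a test like this separates cache levels, a common approach is a pointer chase: fill a block of memory with "next" indices spaced one step apart, then follow them so each load depends on the one before it. The Python sketch below only illustrates that access pattern under that assumption; it is not necessarily how our latency test app works, and the names `build_chain` and `walk` are hypothetical. Interpreter overhead will swamp the nanosecond-scale differences in the graphs, which is why real tools do this in compiled code. It also walks the block with a fixed stride, which hardware pre-fetchers can partly hide; many tools randomize the visiting order for that reason.

```python
import time
from array import array


def build_chain(block_bytes, step_bytes, slot_bytes=8):
    """Lay out a cyclic chain of 'next' indices inside one contiguous block.

    The block is an array of 8-byte slots. Every (step_bytes // slot_bytes)-th
    slot stores the index of the next slot to visit, wrapping back to 0 at the end.
    Returns the array and the number of slots visited per lap.
    """
    n_slots = block_bytes // slot_bytes
    stride = max(1, step_bytes // slot_bytes)
    chain = array("q", [0]) * n_slots
    order = list(range(0, n_slots, stride))
    for here, nxt in zip(order, order[1:] + order[:1]):
        chain[here] = nxt
    return chain, len(order)


def walk(chain, hops):
    """Follow the chain for `hops` loads; each load's address depends on the last.

    Returns (average nanoseconds per hop, index reached after the final hop).
    """
    i = 0
    start = time.perf_counter_ns()
    for _ in range(hops):
        i = chain[i]
    elapsed = time.perf_counter_ns() - start
    return elapsed / hops, i


if __name__ == "__main__":
    # Block sizes chosen to land, very roughly, in L1, L2, and main memory.
    for block_kb in (16, 512, 8192):
        chain, per_lap = build_chain(block_kb * 1024, step_bytes=64)
        ns, _ = walk(chain, hops=200_000)
        print(f"{block_kb:>5} KB block: {ns:.1f} ns/hop ({per_lap} slots per lap)")
```

Varying `block_bytes` moves the working set from L1 to L2 to main memory, and varying `step_bytes` changes how many loads share a cache line, which is what the step and block axes in graphs like these represent.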
The Athlon 64's built-in memory controller gives it a pronounced and consistent advantage in getting out to main memory quickly, but the Core 2 really does shave 15 to 20 nanoseconds off main memory access times versus the Pentium Extreme Edition. I hate to speculate too much about the reasons, but they may include the Core 2's lower-latency caches (which we see illustrated here), potentially less aggressive pre-fetching (and thus a less saturated bus), and possibly even its ability to move loads ahead of stores via memory disambiguation.
Oh, and CPU geeks may be interested to note that our latency test app reports an L1 cache latency of three cycles for the Core 2, for what it's worth.
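Because the app reports L1 latency in clock cycles while the main memory differences are quoted in nanoseconds, it helps to convert between the two: nanoseconds equal cycles divided by the clock speed in GHz. The sketch below assumes, purely for illustration, a 2.93GHz clock (the rated speed of the Core 2 Extreme X6800); substitute the actual clock of whichever chip you're looking at.

```python
CLOCK_GHZ = 2.93  # assumption: illustrative clock speed only


def cycles_to_ns(cycles, clock_ghz=CLOCK_GHZ):
    """One core cycle lasts 1 / clock_ghz nanoseconds."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz=CLOCK_GHZ):
    """Number of core cycles that elapse in `ns` nanoseconds."""
    return ns * clock_ghz


print(f"L1: 3 cycles = {cycles_to_ns(3):.2f} ns")  # 1.02 ns
print(f"15-20 ns saved = {ns_to_cycles(15):.0f}-{ns_to_cycles(20):.0f} cycles")  # 44-59
```

At that assumed clock, three cycles works out to about 1ns, and a 15-to-20ns savings on a trip to main memory is worth roughly 44 to 59 core cycles.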