Memory performance
We're going to start off with memory tests because, well, that's how we generally start things off. Also, these tests are a little more theoretical than the rest, so we'll get them out of the way before we move on to the real contest.

First up is the modified version of the Stream memory benchmark that's included in SiSoft's Sandra. This test measures memory bandwidth, which is one component of memory performance.
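For readers who haven't seen what a Stream-style test does, here's a minimal sketch of the idea in Python. It is not Sandra's code, and we're not claiming Sandra's modified version works exactly this way; the function name, buffer size, and repeat count are our own assumptions. It times a large buffer copy and counts bytes read plus bytes written, which is how the original Stream benchmark tallies its Copy kernel.

```python
import time

def copy_bandwidth_mb_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Rough Stream-style copy bandwidth in MB/s (hypothetical helper)."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one read stream plus one write stream
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    # Count bytes read plus bytes written, as Stream does for Copy.
    return 2 * size_bytes / best / 1e6

if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_mb_s():.0f} MB/s")
```

The buffers need to be much larger than the CPU's caches, or the test ends up measuring cache bandwidth rather than memory bandwidth.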

The results break down nicely into three separate groups. As expected, the RDRAM systems are fastest here. The Pentium 4 DDR systems are next, and you can see there's virtually no difference between Willamette and Northwood here; they're all talking to the same memory over the same bus, so the memory bandwidth is nearly identical. Finally, the Athlon XP systems can't move quite as much data to and from memory as the Pentium 4 systems can. As a result, the Pentium 4 will have the advantage on some memory-intensive tasks.

However, that's only half the story. As you can see on this page of our recent chipset review, RDRAM's extra bandwidth comes at the price of higher memory latencies. DDR-based systems are much quicker at accessing memory in smaller chunks, which helps them compare well against RDRAM-based systems despite the bandwidth disparity.
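To see why latency matters so much for small accesses, consider a simple model where the time to fetch a chunk is a fixed latency plus the chunk size divided by the bandwidth. The sketch below uses that model. The latency figures are made-up placeholders for illustration, not measurements from any of our test systems, and the function name is our own.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """Simple model: fixed latency plus streaming time.

    1 GB/s is 1 byte per nanosecond, so size / bandwidth is already in ns.
    """
    return latency_ns + size_bytes / bandwidth_gb_s

# Illustrative numbers only (assumptions, not benchmark results).
high_bandwidth = {"latency_ns": 150, "bandwidth_gb_s": 3.2}
low_latency = {"latency_ns": 100, "bandwidth_gb_s": 2.1}

for size in (64, 4096):
    t_bw = transfer_time_ns(size, **high_bandwidth)
    t_lat = transfer_time_ns(size, **low_latency)
    print(f"{size:5d} bytes: high-bandwidth {t_bw:7.1f} ns, low-latency {t_lat:7.1f} ns")
```

With these placeholder values, the low-latency memory finishes a 64-byte fetch first, while the high-bandwidth memory wins on the 4096-byte transfer. Real systems are more complicated, but the basic tradeoff is the same.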

The more interesting test here is Linpack, which can give us a nice visual look at Northwood's L2 cache in action. Here's how the results look:

If you're not familiar with a Linpack graph, watch closely. The X axis is the size of the data matrix Linpack is processing, and the Y axis is the calculation speed measured in megaflops. If the data fits into a processor's cache, the CPU can process that data much faster. Once the data matrix outgrows a cache, the CPU has to reach out to the next, slower level of the memory hierarchy, so the calculations get progressively slower as the matrix grows.
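If you'd like to relate the two axes to an actual problem size, here's a rough sketch of the arithmetic. It assumes a square matrix of 8-byte double-precision values and uses the standard Linpack operation count of 2/3 n^3 + 2 n^2 for solving an n-by-n system. Whether the Linpack build we used counts operations in exactly this way is an assumption on our part, and the function names are ours.

```python
def matrix_kb(n, bytes_per_element=8):
    """Size in KB of an n-by-n matrix (8-byte doubles assumed)."""
    return n * n * bytes_per_element / 1024

def linpack_mflops(n, seconds):
    """Megaflops using the standard Linpack operation count for an n-by-n solve."""
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return flops / seconds / 1e6

if __name__ == "__main__":
    n = 200
    print(f"n = {n}: matrix is {matrix_kb(n):.1f} KB")
    # The 0.01 second run time is a placeholder, not a measured result.
    print(f"at 0.01 s per solve: {linpack_mflops(n, 0.01):.0f} megaflops")
```

A 200-by-200 matrix of doubles works out to 312.5K, which is right in the range where these processors' caches start to run out.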

This graph shows us several things. First, you can see that Northwood's L2 cache is quite a bit larger than Willamette's. Willamette's performance begins to drop off once we get into matrices of about 192K in size, while Northwood peaks at about 384K. Not only that, but the extra cache helps Northwood's peak performance climb much higher than Willamette's can.

Next, notice that the Athlon XP's effective cache size is greater than 256K. Although the Athlon XP has a 256K L2 cache, that cache is exclusive: it doesn't replicate the contents of the L1 data cache like the Pentium 4's L2 does. Add the 64K L1 data cache to the 256K L2, and the Athlon XP has an effective cache size of 320K. You can even see that the Athlon XP's 64K L1 data cache is much faster than its L2 cache. However, the Athlon XP's L2 cache is measurably slower than Northwood's.
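The effective cache size math is simple enough to sketch in a few lines. The helper below is our own illustration: with an exclusive L2, the L1 data cache and the L2 hold different data, so their sizes add; with an L2 that replicates the L1's contents, the L2 size alone sets the limit. The 8K L1 data cache figure used for the Pentium 4 comes from Intel's published specs rather than from the tests above.

```python
def effective_cache_kb(l1_data_kb, l2_kb, exclusive):
    """Effective data cache capacity in KB (hypothetical helper).

    exclusive=True:  L2 does not duplicate L1 contents, so the sizes add.
    exclusive=False: L2 replicates L1 contents, so L2 size is the ceiling.
    """
    if exclusive:
        return l1_data_kb + l2_kb
    return l2_kb

# Athlon XP: 64K L1 data cache plus exclusive 256K L2.
print(effective_cache_kb(64, 256, exclusive=True))   # 320
# Northwood: 512K L2 that replicates the L1 data cache (8K assumed from specs).
print(effective_cache_kb(8, 512, exclusive=False))   # 512
```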

Now the intriguing bit: the Athlon XP shows us all of its 256K L2 and 64K L1 data cache in Linpack. Performance doesn't drop off sharply until the matrix size hits 320K. The Northwood, however, peaks at about 384K, well below its 512K L2 cache size. I expect the difference here has something to do with the way these two chips manage their respective caches.