Memory performance
As always, these memory tests are for exhibition only; they don't denote application-level performance, just the real-world ability to move memory efficiently. On that front, we're expecting the Athlon XP to be quite a bit more capable than the Athlon Thunderbird, thanks to the XP's hardware pre-fetch. Let's see if our expectations are met.

Both of the Palomino processors—the Athlon XP and Athlon MP—are faster than the T-birds, particularly in the integer tests. That fact bodes well for the Athlon XP; the processor can likely put its DDR memory to good use, as predicted.

Nevertheless, the Pentium 4 with RDRAM rules this test, as usual. Let's use Linpack to dig under the surface a little bit.

Linpack graphs aren't the easiest to read. Take a second to look at the axis labels, though, and you'll get it. We're measuring processing throughput here, in megaflops, for data matrices of different sizes. The small matrices, on the left half of the graph, fit into the processors' L1 and L2 caches, so processing throughput is high. On the right half of the graph, where the matrices are too large to fit into the caches, performance drops. It's there, at the larger data sizes, where we get a better sense of main system memory bandwidth. Overall, the shape of the graph gives us a nice visual picture of how a system's tiered memory architecture performs.
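For readers who want to see where numbers like these come from, here is a minimal sketch of a Linpack-style measurement: solve a dense system, divide the conventional operation count (2n³/3 + 2n²) by the elapsed time, and report megaflops against the matrix's size in memory. The function names are made up for illustration, and this is not the benchmark used for the graph. Pure Python is far too slow to expose cache tiers the way a compiled Linpack does, so treat it as a picture of the method, not a tool for reproducing the curve.

```python
import time

def linpack_flops(n):
    """Conventional operation count for solving an n x n dense system."""
    return (2.0 * n**3) / 3.0 + 2.0 * n**2

def matrix_kbytes(n, bytes_per_element=8):
    """Size of an n x n matrix of double-precision values, in KB."""
    return n * n * bytes_per_element / 1024.0

def solve(a, b):
    """Gaussian elimination with partial pivoting; modifies a and b in place."""
    n = len(b)
    for k in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def measure_mflops(n):
    """Time one solve of a diagonally dominant test system; return MFLOPS."""
    a = [[1.0 / (i + j + 1) + (n if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - start
    return linpack_flops(n) / elapsed / 1e6

if __name__ == "__main__":
    # Sweep a few matrix sizes, as the graph's horizontal axis does.
    for n in (40, 80, 120):
        print(f"{matrix_kbytes(n):8.1f} KB  {measure_mflops(n):8.2f} MFLOPS")
```

Each printed row pairs a data size with a throughput figure, which is the same pairing the graph plots.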

The Pentium 4 reaches its peak at about 192K, where everything still fits into its L1 and L2 caches and the numbers are crunching. Its peak performance is just under the 1.4GHz Athlon's. Once the matrices get bigger than about 256K (in the right half of the graph) and it has to go to main memory, the P4 doesn't drop off nearly as much as the others. That's because the P4 is very fast at accessing main memory, and its dual RDRAM channels deliver gobs of bandwidth. All in all, it's an impressive performance.

Unlike the Pentium 4's, the Athlon XP's L2 cache doesn't duplicate data stored in its L1 data cache. As a result, the Athlon's effective total cache size is larger. The Athlon XP doesn't really drop off going to main memory until we hit about 320K. However, the Athlon XP's L2 cache is a tad slower than the P4's; you can see, at about 64K, where the Athlon XP has to start moving beyond its L1 data cache into its slower L2 cache, and the megaflops drop off.
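The arithmetic behind those drop-off points can be sketched in a few lines. The cache sizes below are assumptions taken from the chips' published specs, not from the graph: 64K of L1 data cache and 256K of L2 for the Athlon XP, 8K and 256K for the Pentium 4. The function names are hypothetical, and the model is deliberately simple: a non-duplicating L2 adds its capacity to the L1's, while a duplicating L2 caps the total at its own size.

```python
def effective_cache_kb(l1_data_kb, l2_kb, exclusive):
    """Largest working set (in KB) that can stay on chip across L1 data and L2.

    exclusive=True models an L2 that does not duplicate L1 contents;
    exclusive=False models an L2 that also holds a copy of what is in L1.
    """
    return l1_data_kb + l2_kb if exclusive else l2_kb

def tier(working_set_kb, l1_data_kb, l2_kb, exclusive):
    """Name the memory tier a working set of the given size spills into."""
    if working_set_kb <= l1_data_kb:
        return "L1"
    if working_set_kb <= effective_cache_kb(l1_data_kb, l2_kb, exclusive):
        return "L2"
    return "main memory"

# Assumed cache sizes (KB); see the note above.
ATHLON_XP = dict(l1_data_kb=64, l2_kb=256, exclusive=True)
PENTIUM_4 = dict(l1_data_kb=8, l2_kb=256, exclusive=False)

if __name__ == "__main__":
    print(effective_cache_kb(**ATHLON_XP))  # 320
    print(effective_cache_kb(**PENTIUM_4))  # 256
    print(tier(300, **ATHLON_XP))           # L2
    print(tier(300, **PENTIUM_4))           # main memory
```

Under these assumptions, a 300K matrix still fits on chip for the Athlon XP but has already spilled to main memory on the Pentium 4, which lines up with the roughly 320K and 256K drop-off points described above.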

When it comes time to go out to main memory, the Athlon XP easily outruns the T-bird. That's hardware pre-fetch in action.