As always, these memory tests are for exhibition only; they don't denote application-level performance, just the real-world ability to move memory efficiently. On that front, we're expecting the Athlon XP to be quite a bit more capable than the Athlon Thunderbird, thanks to the XP's hardware pre-fetch. Let's see if our expectations are met.

Nevertheless, the Pentium 4 with RDRAM rules this test, as usual. Let's use Linpack to dig under the surface a little bit.

The Pentium 4 reaches its peak at about 192K, when everything fits into its L1 and L2 caches, and the numbers are crunching. Its peak performance is just under the 1.4GHz Athlon's. When it has to go to main memory, once the matrices get bigger than about 256K (in the right half of the graph), the P4 doesn't drop off nearly as much as the others. That's because the P4 is very fast at accessing main memory, and its dual RDRAM channels deliver gobs of bandwidth. All in all, it's an impressive performance.
Unlike the Pentium 4's, the Athlon XP's L2 cache doesn't duplicate data stored in its L1 data cache. As a result, the Athlon's effective total cache size is larger. The Athlon XP doesn't really drop off going to main memory until we hit about 320K. However, the Athlon XP's L2 cache is a tad slower than the P4's; you can see, at about 64K, where the Athlon XP has to start moving beyond its L1 data cache into its slower L2 cache, and the MFLOPS drop off.
When it comes time to go out to main memory, the Athlon XP easily outruns the T-bird. That's hardware pre-fetch in action.
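
The cache arithmetic behind those drop-off points can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not part of the benchmark: the function name `effective_cache_bytes` is made up for this example, and the cache sizes plugged in (64K L1 data plus 256K L2 for the Athlon XP, 8K L1 data plus 256K L2 for the Pentium 4) are assumptions taken from the chips' published specs rather than from the results above.

```python
KB = 1024

def effective_cache_bytes(l1_data, l2, exclusive):
    """Rough on-chip capacity available for data (hypothetical helper).

    exclusive=True:  L2 holds no copies of L1 lines, so the sizes add up.
    exclusive=False: L2 duplicates whatever is in L1, so L2 alone sets the limit.
    """
    return l1_data + l2 if exclusive else l2

# Assumed cache sizes (from published specs, not from this article's data).
athlon_xp = effective_cache_bytes(64 * KB, 256 * KB, exclusive=True)
pentium_4 = effective_cache_bytes(8 * KB, 256 * KB, exclusive=False)

print(f"Athlon XP: {athlon_xp // KB}K")  # Athlon XP: 320K
print(f"Pentium 4: {pentium_4 // KB}K")  # Pentium 4: 256K
```

Under those assumptions the totals line up with the graph: the P4 heads out to main memory past roughly 256K, while the Athlon XP holds on until about 320K.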