We'll kick it off, as usual, with a look at memory performance. The Pentium 4 has traditionally excelled at memory bandwidth performance, and the 2GHz model is no exception.
The P4's impressive memory bandwidth has often been attributed to its dual channels of Rambus DRAM, but recent previews of the Pentium 4 using DDR SDRAM, as the 1.2 and 1.4GHz Athlons do in the test above, have shown impressive memory performance as well. Our preliminary internal testing with a VIA P4X266 reference motherboard and PC2100 DDR SDRAM has shown Sandra memory scores in excess of 1000MB/s, well above Athlons using the same RAM. The long and short of it is that the Pentium 4 makes great use of a fast memory subsystem.
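
For a sense of scale, here is a rough sketch of the theoretical peak bandwidth behind the memory types mentioned here. The clock rates and bus widths are the commonly published nominal figures, not numbers from our testing, and the RDRAM line assumes PC800 modules. The function name and variables are made up for illustration.

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_bytes, channels=1):
    """Theoretical peak bandwidth in MB/s (an upper bound, never reached in practice)."""
    return clock_mhz * transfers_per_clock * bus_bytes * channels

BASE_CLOCK_MHZ = 400 / 3  # the nominal "133MHz" memory clock

# Assumed nominal specs: 64-bit (8-byte) SDRAM bus, 16-bit (2-byte) RDRAM channel.
pc133 = peak_bandwidth_mb_s(BASE_CLOCK_MHZ, 1, 8)       # single data rate SDRAM
pc2100 = peak_bandwidth_mb_s(BASE_CLOCK_MHZ, 2, 8)      # DDR: two transfers per clock
dual_rdram = peak_bandwidth_mb_s(400, 2, 2, channels=2)  # assumes dual-channel PC800

# The text reports Sandra scores "in excess of 1000MB/s" on PC2100,
# so this fraction is a lower bound on how much of the peak was measured.
sandra_fraction = 1000 / pc2100

print(f"PC133 SDRAM:      {pc133:7.0f} MB/s")
print(f"PC2100 DDR SDRAM: {pc2100:7.0f} MB/s")
print(f"Dual PC800 RDRAM: {dual_rdram:7.0f} MB/s")
print(f"Sandra score as share of PC2100 peak: at least {sandra_fraction:.0%}")
```

The gap between the theoretical numbers and a measured Sandra score is normal; the point is only that PC2100 has roughly twice the headroom of PC133 on paper, and dual RDRAM channels have more still.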
To get a little more detailed picture of how these processors access memory, we'll look at Linpack results. To keep things simple, I've only included one result per CPU type. Otherwise, the graph's darn near impossible to read.
Of course, this graph isn't terribly easy to read even now without some explanation. What you're seeing is the performance, in MFLOPS, of a floating-point calculation being performed on data matrices of varying sizes. A smaller data matrix will fit into the CPU's L1 and/or L2 caches, allowing for some very fast calculations. As the matrix size grows, the data must be accessed in main memory, slowing things down. Let's break it down:
- The orange line is the Pentium 4. It reaches its peak at about 192K, when everything fits into its L1 and L2 caches, and the numbers are crunching. Its peak performance is just under the 1.4GHz Athlon's. When it has to go to main memory, once the matrices get bigger than about 256K (in the right half of the graph), the P4 doesn't drop off nearly as much as the others. That's because the P4 is very fast at accessing main memory, and its dual RDRAM channels deliver gobs of bandwidth. All in all, it's an impressive performance.
- The green line represents the Athlon. Unlike the Pentium 4, the Athlon's L2 cache doesn't duplicate data stored in its L1 data cache. As a result, the Athlon's effective total cache size is larger. The Athlon doesn't really drop off going to main memory until we hit about 320K. However, the Athlon's L2 cache is a tad slower than the P4's; you can see, at about 64K, where the Athlon has to start moving beyond its L1 data cache into its slower L2 cache, and the MFLOPS drop off. Once the Athlon gets to main memory, it's quite a bit slower than the P4, but faster than all the PC133 SDRAM-based systems.
- The two value processors here, the Duron and Celeron, each have an effective cache size of 128K. The Duron whups the Celeron, though, delivering a much higher peak and outrunning all the other PC133-based systems when accessing main memory thanks to its hardware prefetch logic. The new Athlon Palomino, due to replace the current desktop Athlon chips before long, will include this same prefetch mechanism, and it should take better advantage of its available memory bandwidth as a result.
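
The drop-off points described in the list above follow from simple cache arithmetic, which can be sketched like so. This is a toy model, not anything Linpack itself does: the `CacheConfig` class and its methods are made up for illustration, and the L1 and L2 sizes are the commonly published specs for these chips (only the 64K, 128K, 256K, and 320K figures appear in the text). It also ignores instruction caches, associativity, and everything else that makes real caches messy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CacheConfig:
    name: str
    l1_data_kb: int   # assumed published L1 data cache size
    l2_kb: int        # assumed published L2 cache size
    exclusive: bool   # True if the L2 does not duplicate L1 data

    @property
    def effective_kb(self) -> int:
        # An exclusive L2 adds to the L1 data cache; an inclusive L2
        # already holds a copy of whatever is in L1.
        if self.exclusive:
            return self.l1_data_kb + self.l2_kb
        return self.l2_kb

    def level_for(self, matrix_kb: int) -> str:
        """Slowest level a matrix of this size has to touch, in this toy model."""
        if matrix_kb <= self.l1_data_kb:
            return "L1"
        if matrix_kb <= self.effective_kb:
            return "L2"
        return "main memory"

CHIPS = [
    CacheConfig("Pentium 4", l1_data_kb=8, l2_kb=256, exclusive=False),
    CacheConfig("Athlon", l1_data_kb=64, l2_kb=256, exclusive=True),
    CacheConfig("Duron", l1_data_kb=64, l2_kb=64, exclusive=True),
    CacheConfig("Celeron", l1_data_kb=16, l2_kb=128, exclusive=False),
]

for chip in CHIPS:
    print(f"{chip.name:10s} effective cache {chip.effective_kb:3d}K, "
          f"a 300K matrix lands in {chip.level_for(300)}")
```

Under these assumptions the model puts the Athlon's fall to main memory at 320K and the P4's at 256K, and gives both value chips 128K, which lines up with where the curves bend in the graph.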
Keep in mind that memory performance alone doesn't determine overall performance, as we'll see below. However, it does give us some important clues about why these different chips perform like they do.