We begin with synthetic memory tests, because they're just so intriguing. As always, these results don't straightforwardly predict performance in real-world applications, but they do have some bearing on the performance of memory-bound tasks.
As expected, the new Athlon 64 chips are able to take good advantage of their dual channels of DDR400 memory. They do especially well in the Sandra test, which uses extensive buffering, tuning, and black magic to squeeze as much bandwidth out of a system as possible. The Pentium 4s have more memory bandwidth available to them, in theory, thanks to the 925X chipset's dual channels of DDR2-533 memory, but the measured results don't work out that way.
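To put the "in theory" part in numbers, here's a quick back-of-the-envelope sketch of peak bandwidth for the two memory setups. It assumes the standard 64-bit channel width and nominal transfer rates of 400 and 533 million transfers per second; the function name and parameters are my own, and these are paper figures rather than anything a benchmark will actually reach.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, channels=2, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes every channel moves bus_width_bits on every transfer,
    which real systems never sustain.
    """
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


# Dual-channel DDR400 (Athlon 64) vs. dual-channel DDR2-533 (Pentium 4 on 925X).
ddr400_dual = peak_bandwidth_gbs(400)
ddr2_533_dual = peak_bandwidth_gbs(533)

print(f"Dual-channel DDR400:   {ddr400_dual:.1f} GB/s")
print(f"Dual-channel DDR2-533: {ddr2_533_dual:.1f} GB/s")
```

On paper, then, the Pentium 4 platform has roughly a third more bandwidth to work with, which is what makes the measured results notable.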
Cachemem is a little more relaxed, and probably more representative of many real-world apps. Here, the Prescott-based Pentium 4s do relatively better, most likely because of Prescott's very aggressive speculative prefetching of data from memory into the L2 cache.
Linpack lays bare the first few stages of the memory hierarchy for all to see, starting with the L1 cache, moving to the L2 cache, and then into L3 cache or main memory. The higher the number of MFLOPS, the faster the CPU is able to work through a data set of a given size. You can see how the scores for all of the processors tend to drop as the data sets get larger and more of the work spills over into a slower part of the memory hierarchy. Also, note the difference between the Athlon 64 chips with 512K and 1MB of L2 cache, which is quite obvious here.
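As a rough illustration of why the curves step down where they do, the sketch below reports the fastest level of the hierarchy that can hold a given data set. It uses the Athlon 64's 64K L1 data cache and the two L2 sizes mentioned above; the data set sizes are made-up examples rather than Linpack's actual matrix sizes, and the model deliberately ignores associativity, exclusive versus inclusive cache designs, and prefetching.

```python
KIB = 1024


def fastest_level_that_fits(working_set_bytes, l1_bytes, l2_bytes):
    """Return the fastest level that can hold the whole working set.

    Simplified capacity-only model: a data set "fits" if it is no
    larger than the cache, and anything bigger than L2 goes to RAM.
    """
    if working_set_bytes <= l1_bytes:
        return "L1"
    if working_set_bytes <= l2_bytes:
        return "L2"
    return "main memory"


# Hypothetical data set sizes, chosen to land on either side of each boundary.
for size_kib in (32, 256, 768, 2048):
    with_512k = fastest_level_that_fits(size_kib * KIB, 64 * KIB, 512 * KIB)
    with_1mb = fastest_level_that_fits(size_kib * KIB, 64 * KIB, 1024 * KIB)
    print(f"{size_kib:>5} KiB  512K L2: {with_512k:<12} 1MB L2: {with_1mb}")
```

The 768 KiB case is the interesting one: it still fits in a 1MB L2 but has already spilled to main memory on a 512K part, which is the region where the two Athlon 64 variants pull apart.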
I want to take a quick detour to point out one really notable difference. Have a look at this:
These results were consistent across multiple benchmark runs, and I'm confident they are for real. The 90nm version of the Athlon 64 appears to have a slightly faster L2 cache than the 130nm version. Since performance is the same inside the Athlon 64's 64K L1 data cache, I don't believe this is a difference in the CPU core.
AMD has stated that the 90nm and 130nm versions of the Athlon 64 are essentially the same, so I asked them about these results. All they would say is that for the 90nm parts, "some small optimizations were made in the memory controller and also in the way instructions execute." To me, this looks more like a change in the way the L2 cache is organized. AMD and Intel both pack their cache transistors ever more tightly over time, and such a change could bring higher performance as well. Whatever the case, the difference in L2 cache performance appears to result in ever-so-slightly higher performance all around for the 90nm 3500+, as you'll see.
Finally, we have memory access latency numbers, which are pretty clearly defined by CPU type and memory subsystem. The 80ns times for the Pentium 4 DDR2 systems are fairly decent in the grand scheme of things, but the Athlon 64s with integrated memory controllers and ultra-low-latency DDR400 are extremely quick at fetching data from RAM.
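To get a feel for what a latency figure like 80ns costs the processor, it helps to convert it into core clock cycles. The sketch below does that conversion; the 3.0GHz clock is a hypothetical round number picked for illustration, not the speed of any chip tested here, and the helper name is my own.

```python
def latency_in_cycles(latency_ns, core_clock_ghz):
    """Convert a memory latency in nanoseconds to core clock cycles.

    One GHz is one cycle per nanosecond, so the conversion is a
    simple multiplication.
    """
    return latency_ns * core_clock_ghz


# 80 ns is the Pentium 4 DDR2 figure from the text; 3.0 GHz is an assumed clock.
stall_cycles = latency_in_cycles(80, 3.0)
print(f"An 80 ns trip to RAM costs about {stall_cycles:.0f} cycles at 3.0 GHz")
```

Every trip to main memory that misses the caches can leave the core waiting for a couple of hundred cycles, so even modest reductions in latency add up quickly.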