Memory performance
We like to kick off our benchmark suite with some tests that measure memory performance. These tests are interesting and informative, but they don't speak directly to overall system performance. Still, memory bandwidth and access latency help determine a system's performance in more general tests, and we enjoy the chance to isolate the memory subsystem and poke it a little bit to see what happens.

We'll start with SiSoft's Sandra, the most popular choice on the web for evaluating memory bandwidth. Sandra uses a range of techniques, from buffering to specialized SIMD extensions like SSE and 3DNow!, to achieve its amazingly high bandwidth numbers. These scores are useful because we can see the hardware approach its theoretical peak for memory performance.

As we've come to expect, RDRAM is fastest here, especially in the PC1066 incarnation we're using. The bigger story, though, is that the Athlon XP can't take advantage of the memory bandwidth that PC2700 DDR memory affords it. The XP's 266MHz bus is the primary limiting factor in this worst-case scenario for memory bandwidth use.
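For a rough sense of why the bus is the bottleneck, here's a back-of-the-envelope sketch of theoretical peak bandwidth: transfers per second times bytes per transfer. The 64-bit widths for both the front-side bus and the DDR module are our assumptions for illustration; they aren't stated above, and real-world throughput will land below these peaks.

```python
# Back-of-the-envelope peak bandwidth: transfers per second times bytes per transfer.
# ASSUMPTION: both the front-side bus and the DDR module are 64 bits wide.


def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Return the theoretical peak in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


athlon_xp_bus = peak_bandwidth_gb_s(266, 64)  # 266MHz effective front-side bus
pc2700_memory = peak_bandwidth_gb_s(333, 64)  # PC2700 (DDR333) memory

# The processor can only see the smaller of the two figures.
usable_peak = min(athlon_xp_bus, pc2700_memory)

print(f"Athlon XP bus: {athlon_xp_bus:.2f} GB/s")
print(f"PC2700 memory: {pc2700_memory:.2f} GB/s")
print(f"Usable peak:   {usable_peak:.2f} GB/s")
```

Under those assumptions the memory can supply more data than the bus can carry, so the extra headroom of PC2700 never reaches the processor.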

Now we'll look at Cachemem, which will show us both read and write speeds. Cachemem is also a little less gonzo about cramming as much data as possible across the memory interface using every trick in the book. As a result, it might be a somewhat better indication of real-world memory access scenarios.

Even here, the Athlon XP isn't really any faster than its predecessors at moving data back and forth to memory.

Now let's look at memory access latency, which is another, related piece of the memory performance puzzle. We've converted our results into nanoseconds so it's possible to compare across different clock speeds and platforms.
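The conversion itself is simple arithmetic. Here's a minimal sketch, assuming the raw latency figures are reported in CPU clock cycles (the text above doesn't say what the raw units were). The cycle counts and clock speeds below are made up for illustration and aren't results from our test systems.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency in CPU clock cycles to nanoseconds."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles * 1000.0 / clock_mhz


# Hypothetical numbers, chosen only to show why raw cycle counts mislead.
slower_chip_ns = cycles_to_ns(cycles=200, clock_mhz=1000)
faster_chip_ns = cycles_to_ns(cycles=300, clock_mhz=2000)

print(f"200 cycles at 1000MHz: {slower_chip_ns:.0f} ns")
print(f"300 cycles at 2000MHz: {faster_chip_ns:.0f} ns")
```

The faster chip waits more cycles but less wall-clock time, which is why nanoseconds make the fairer yardstick across clock speeds.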

Here the Athlon XP fares better. Were its 266MHz bus a serious problem, we might see it causing significant memory access latency penalties, but that just isn't the case. Also note that PC1066 RDRAM has slightly higher access latency than DDR memory on the Pentium 4, but the difference is negligible.

Finally, we'll whip out a funky Linpack graph and really geek out. This one shows visually how the systems perform in memory-intensive floating-point math calculations as the data set grows: first when it fits into the L1 cache, then when it fits into the L2 cache, and finally when it fits only into main memory.

The Pentium 4's faster bus and larger L2 cache give it the lead with data sets of about 128K and up, but the XP 2600+ boasts the highest peak performance with smaller data sets.
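To see why performance in a test like this steps down as the data set grows, here's a small sketch that works out where a square matrix of a given size would live. It assumes 8-byte double-precision elements, and the 64K and 256K cache sizes are placeholders we picked for illustration, not specs quoted in this review.

```python
KB = 1024


def matrix_bytes(n: int, element_bytes: int = 8) -> int:
    """Footprint of an n-by-n matrix (ASSUMPTION: 8-byte doubles)."""
    return n * n * element_bytes


def fits_in(footprint: int, l1_bytes: int, l2_bytes: int) -> str:
    """Name the smallest level of the hierarchy that holds the whole data set."""
    if footprint <= l1_bytes:
        return "L1 cache"
    if footprint <= l2_bytes:
        return "L2 cache"
    return "main memory"


# Hypothetical cache sizes, for illustration only.
L1_SIZE = 64 * KB
L2_SIZE = 256 * KB

for n in (50, 100, 150, 200, 400):
    size = matrix_bytes(n)
    level = fits_in(size, L1_SIZE, L2_SIZE)
    print(f"{n:>3} x {n:<3} matrix: {size / KB:7.1f}K -> {level}")
```

Each time the footprint spills past a cache boundary, the processor has to fetch from the next, slower level, and that shows up as a drop in the curve.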

Now, let's dive into the real-world stuff.