Memory subsystem performance
We'll begin with a look at the cache and memory hierarchies of these processors. This test measures throughput at increasing block sizes in order to probe the performance at each level of the memory hierarchy. I've plotted the results like so:
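To make the method concrete, here's a minimal sketch of a block-size sweep. It is not the benchmark used for these results. The function names (`copy_throughput`, `sweep`) and the default sizes are made up for illustration, and Python's interpreter overhead will blur the small-block numbers, so treat the output as a rough shape rather than a measurement.

```python
import time

def copy_throughput(block_bytes, total_bytes=64 * 1024 * 1024):
    """Approximate MB/s when repeatedly copying one block of block_bytes."""
    src = bytearray(block_bytes)
    dst = bytearray(block_bytes)
    # Move roughly the same total volume at every block size so timings compare.
    reps = max(1, total_bytes // block_bytes)
    start = time.perf_counter()
    for _ in range(reps):
        dst[:] = src  # same-length slice assignment is a bulk memory copy
    elapsed = max(time.perf_counter() - start, 1e-9)
    return reps * block_bytes / elapsed / 1e6

def sweep(min_kb=1, max_kb=4096, total_bytes=64 * 1024 * 1024):
    """Map block size in KB to MB/s, doubling the block size each step."""
    results = {}
    kb = min_kb
    while kb <= max_kb:
        results[kb] = copy_throughput(kb * 1024, total_bytes)
        kb *= 2
    return results

if __name__ == "__main__":
    for kb, mbps in sweep().items():
        print(f"{kb:>6} KB  {mbps:10.0f} MB/s")
```

On most machines, throughput should step down as the block size grows past each cache level, which is the pattern to look for in results like the ones plotted here.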

The Nano's 64KB L1 data cache appears to be not just larger, but also quite a bit faster than the Atom's 24KB counterpart, with well over 2X the bandwidth. Once we move into the L2 cache ranges for both chips, though, things nearly even out, with the Atom outperforming the Nano at block sizes from 128KB to 512KB, at least until it runs out of cache. The Nano's faster at 1MB due to its larger L2 cache, and its real-world performance should benefit from that larger cache in programs whose working data sets exceed 512KB.

The story on main memory bandwidth is kind of tough to see in the graph above, so here's a closer look at the 256MB block size along with a separate main-memory bandwidth test from Sandra.

Despite disparities in front-side bus speeds (800MHz vs. 533MHz) and memory clocks (667MHz vs. 533MHz), the Nano is only slightly faster than the Atom in two of the three memory bandwidth tests represented here. The Pentium M, with a 533MHz bus and single 400MHz DIMM, achieves higher bandwidth than either of the low-cost processors.
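For a sense of how large those disparities are on paper, here's a quick back-of-the-envelope calculation of theoretical peak bandwidth. It assumes the quoted MHz figures are effective transfer rates, and that each bus and memory channel is 64 bits wide and running in single-channel mode. Neither assumption is stated above. The helper name `peak_bandwidth_gbs` is made up for this example.

```python
def peak_bandwidth_gbs(transfers_mhz, bus_width_bits=64):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    # bus_width_bits=64 is an assumption, not a figure from the article.
    return transfers_mhz * 1e6 * (bus_width_bits / 8) / 1e9

if __name__ == "__main__":
    rates = [
        ("800MHz bus", 800),
        ("533MHz bus", 533),
        ("667MHz memory", 667),
        ("533MHz memory", 533),
        ("400MHz DIMM", 400),
    ]
    for label, mhz in rates:
        print(f"{label:>14}: {peak_bandwidth_gbs(mhz):.2f} GB/s")
```

Under those assumptions, the 800MHz bus has about 50% more theoretical headroom than a 533MHz one (6.4 versus roughly 4.3 GB/s), which is why a small measured gap stands out.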

The Nano looks to be quite a bit quicker in accessing memory, at least at this block and stride size. Let's have a look at a fuller picture of cache and memory access latencies. In the graphs below, I've colored the data points that correspond to L1 data caches yellow, while L2 cache is light orange and main memory is dark orange, just as a guide.

The initial latency sample we chose kind of put the Atom in a bad light, since it seems to have trouble at the 8MB block size and 512-byte stride. The adjacent points average out to around 104 ns of latency going to main memory. Still, the Nano's quite a bit quicker to main memory overall, no doubt thanks to its faster bus and RAM clocks.
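If "block size" and "stride" are unfamiliar terms, a latency test of this kind can be sketched as a dependent chain of loads: each slot visited holds the index of the next slot, one stride further into the block. The code below is an illustration only, not the tool used here. `build_chain` and `chase` are hypothetical names, and in Python the interpreter's own overhead swamps the actual memory latency, so the timing it reports mainly shows how such a test is structured.

```python
import time
from array import array

def build_chain(block_bytes, stride_bytes):
    """Build an index chain over a block, stepping stride_bytes between hops."""
    chain = array("q")
    n = block_bytes // chain.itemsize
    step = max(1, stride_bytes // chain.itemsize)
    chain.extend([0] * n)
    visited = list(range(0, n, step))
    # Each visited slot stores the index of the next one; the last wraps to 0.
    for k, i in enumerate(visited):
        chain[i] = visited[(k + 1) % len(visited)]
    return chain, len(visited)

def chase(chain, hops):
    """Follow the chain for `hops` dependent loads; return (final index, ns per hop)."""
    idx = 0
    start = time.perf_counter()
    for _ in range(hops):
        idx = chain[idx]  # the next address depends on the value just loaded
    elapsed = time.perf_counter() - start
    return idx, elapsed / hops * 1e9

if __name__ == "__main__":
    chain, count = build_chain(8 * 1024 * 1024, 512)  # 8MB block, 512-byte stride
    _, ns = chase(chain, count * 4)
    print(f"{count} slots per pass, about {ns:.1f} ns per hop (includes interpreter overhead)")
```

Because every load depends on the one before it, the processor can't overlap them, so the time per hop approximates access latency for whichever level of the hierarchy the block fits in.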

For the record, CPU geeks, the Atom's L1 data cache latency appears to be three cycles, and its L2 latency averages out to about 16 cycles. The Nano's L1 latency is four cycles, and its L2 latency is about 24.
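Cycle counts only compare directly when clock speeds match, so it can help to convert them to time. The sketch below does that conversion. The clock speeds in it (1.6GHz for the Atom, 1.8GHz for the Nano) are assumptions for illustration and are not stated in this section, so substitute the clocks of the chips as tested.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at the given clock."""
    return cycles / clock_ghz

if __name__ == "__main__":
    # Clock speeds are assumed for illustration; cycle counts come from the text.
    chips = {
        "Atom": {"clock_ghz": 1.6, "L1": 3, "L2": 16},
        "Nano": {"clock_ghz": 1.8, "L1": 4, "L2": 24},
    }
    for name, chip in chips.items():
        for level in ("L1", "L2"):
            ns = cycles_to_ns(chip[level], chip["clock_ghz"])
            print(f"{name} {level}: {chip[level]} cycles = {ns:.2f} ns")
```

With those assumed clocks, the Atom's caches would come out slightly quicker in absolute time as well as in cycles, though the gap depends entirely on the clock speeds plugged in.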

All told, the Nano seems to have the edge here, with larger caches, a much higher-bandwidth L1 data cache, and quicker access to main memory than the Atom. These factors will influence overall performance in real applications, but on their own they don't tell us how the chips will actually perform. Let's move on to some true performance tests.