
Memory subsystem performance
Our first few tests are synthetic benchmarks that let us inspect the performance of the cache and memory subsystems.

This test is nicely multithreaded, so the caches from all available cores contribute to the throughput measured. You may be surprised to see that the Phenom II X6 1100T achieves higher bandwidth than the FX-8150 at the smaller block sizes, but remember it has six L1 caches where the FX has four. The more apt comparison may be the Phenom II X4 980, with four cores and a 3.7GHz clock frequency. The FX's L1 caches will cover block sizes up to 64KB, and the FX-8150 is faster than the Phenom II X4 980 at each step from 2KB to 64KB. Then again, with only four cores, Sandy Bridge's L1 caches are faster still.

The 256KB to 1MB block sizes are L2 cache territory, and the FX's L2 caches don't look to be especially fast, either, though they do largely outperform the Phenom II X4 980's. Bulldozer's L2 caches may lack for speed, but they're large. At the 4MB data point, the rest of the CPUs are into their L3 caches. The FX is still in its L2 coverage area. The next step up in block size is 16MB, which is right at the outer edge of the FX's effective total cache capacity, since its 8MB of L3 cache doesn't replicate the contents of its 8MB of L2 cache. The FX-8150 again delivers the highest throughput at the 16MB block size, but not by much.
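To make the block-size reasoning above concrete, here is a minimal sketch that maps a test block size to the cache level expected to serve it on the FX-8150. The capacities come from the figures quoted in the text (64KB of L1 coverage, 8MB of L2, 8MB of L3 that doesn't replicate the L2). The simple threshold model and the names `fx_level`, `FX_L1`, `FX_L2`, and `FX_L3` are our own assumptions for illustration, and the model ignores real-world effects like associativity and sharing between cores.

```python
KB = 1024
MB = 1024 * KB

# Aggregate capacities as described in the text (hypothetical constant names).
FX_L1 = 64 * KB  # L1 caches cover block sizes up to 64KB
FX_L2 = 8 * MB   # total L2 capacity
FX_L3 = 8 * MB   # L3 does not replicate L2 contents, so the two add up


def fx_level(block_size):
    """Return the cache level a block of this size should fit in.

    Assumption: a block fits in a level if it is no larger than the
    cumulative capacity available at that level.
    """
    if block_size <= FX_L1:
        return "L1"
    if block_size <= FX_L2:
        return "L2"
    if block_size <= FX_L2 + FX_L3:
        return "L2+L3"
    return "memory"


if __name__ == "__main__":
    for size in (64 * KB, 1 * MB, 4 * MB, 16 * MB, 32 * MB):
        print(f"{size // KB:>6} KB -> {fx_level(size)}")
```

Under this model, the 16MB block lands exactly on the 8MB + 8MB boundary, which is why that data point sits at the outer edge of the FX's effective cache capacity.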

Some of the credit for the FX-8150's strong showing here no doubt goes to its use of 1866MHz DIMMs. However, we've tried 1866MHz memory on the older CPU cores in Llano, and our Stream results topped out at around 15GB/s. Bulldozer's smart data prefetchers and large L2 caches deserve credit for taking good advantage of the available memory bandwidth.
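For a sense of scale, the theoretical peak bandwidth of a memory configuration can be worked out from its transfer rate. The sketch below assumes a dual-channel setup with 64-bit (8-byte) channels and treats "1866MHz" as a nominal 1866 million transfers per second; those parameters and the function name `peak_bandwidth_gbs` are our assumptions, not figures from our test results.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel moves bytes_per_transfer bytes on every transfer,
    which real workloads never fully achieve.
    """
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


if __name__ == "__main__":
    peak = peak_bandwidth_gbs(1866)
    print(f"Assumed dual-channel peak at 1866: {peak:.1f} GB/s")
    # The roughly 15GB/s Stream ceiling mentioned above, as a share of that peak
    print(f"15 GB/s is {15 / peak:.0%} of the assumed peak")
```

Measured Stream throughput always lands well below this ceiling, so the interesting question is how large a share of it a given CPU can extract.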

Measuring memory access latencies has gotten to be tricky with the advent of Turbo-style clock speed ramping, because our tool reports latencies as a number of CPU cycles, and converting those cycles into time requires knowing the clock speed the chip was actually running at during the test. Nevertheless, we've chosen to report access latencies with the caveat that our guesses about likely frequencies for these CPUs may be incorrect.

If we're right, the FX comes out looking pretty good, with access latencies comparable to competing Sandy Bridge parts, despite its larger caches. Again, the use of 1866MHz memory may be helping the FX here.

For what it's worth, our tool reports Bulldozer's L1 data cache latency at 3 cycles, L2 at 18 cycles, and L3 at 65 cycles.
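The sketch below shows how much the frequency guess matters when turning those cycle counts into time. The cycle figures are the ones our tool reported above; the two clock speeds (3.6GHz and 4.2GHz) are assumed stand-ins for the FX-8150's base and peak Turbo frequencies, and the function name `cycles_to_ns` is ours.

```python
def cycles_to_ns(cycles, freq_ghz):
    """Convert a latency in CPU cycles to nanoseconds.

    At f GHz the core completes f cycles per nanosecond, so time = cycles / f.
    """
    return cycles / freq_ghz


# Cycle counts reported by the tool for Bulldozer's caches
REPORTED_CYCLES = {"L1": 3, "L2": 18, "L3": 65}

# Assumed clock speeds; the real frequency during the test is unknown
ASSUMED_FREQS_GHZ = (3.6, 4.2)

if __name__ == "__main__":
    for level, cycles in REPORTED_CYCLES.items():
        times = ", ".join(
            f"{cycles_to_ns(cycles, f):.2f} ns at {f} GHz"
            for f in ASSUMED_FREQS_GHZ
        )
        print(f"{level}: {cycles} cycles = {times}")
```

The same cycle count shrinks by roughly 14% in nanosecond terms when moving from the lower assumed clock to the higher one, which is why our frequency guesses carry a caveat.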