Memory subsystem performance
We typically start with some synthetic tests of the CPUs' memory subsystems, just to weed out any new readers who might be intimidated by such things. Can't become too successful, you know. These results don't track directly with real-world performance, but they do give us some insights into the CPU and system architectures involved. For this first test, the graph is pretty crowded. We've tried to be selective, only choosing a subset of the CPUs tested. This test is multithreaded, so more cores—with associated L1 and L2 caches—can lead to higher throughput.
Peer closely at the 4MB block size, and you'll see that the A8-3850 achieves much higher throughput than the Phenom II X4 840. That's the result of the A8's larger 1MB L2 caches ganging up: with one per core, the four of them together can hold the entire 4MB block. In fact, the A8-3850's throughput at 4MB is higher than the Phenom II X4 980's, even though the X4 980 has a 6MB L3 cache. Otherwise, no major surprises here.
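For the curious, the "ganging up" arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not a description of how the benchmark itself works: it assumes the multithreaded test splits each block evenly across cores, and the X4 840's 512KB-per-core L2 figure comes from AMD's published specs rather than from the text above. The function names are hypothetical.

```python
# Back-of-the-envelope model: does a test block fit in the combined L2 caches?
# Assumption: the multithreaded test divides each block evenly across cores.

KB = 1024
MB = 1024 * KB


def aggregate_l2_bytes(cores: int, l2_per_core_bytes: int) -> int:
    """Total L2 capacity across all cores (the L2s are private, one per core)."""
    return cores * l2_per_core_bytes


def block_fits_in_l2(block_bytes: int, cores: int, l2_per_core_bytes: int) -> bool:
    """True if each core's slice of the block fits in that core's own L2."""
    return block_bytes <= aggregate_l2_bytes(cores, l2_per_core_bytes)


# A8-3850: four cores, 1MB of L2 each (per the article).
a8_fits = block_fits_in_l2(4 * MB, cores=4, l2_per_core_bytes=1 * MB)

# Phenom II X4 840: four cores, 512KB of L2 each (assumed from published specs).
x4_840_fits = block_fits_in_l2(4 * MB, cores=4, l2_per_core_bytes=512 * KB)

print("A8-3850 holds the 4MB block in L2:", a8_fits)
print("Phenom II X4 840 holds the 4MB block in L2:", x4_840_fits)
```

Under those assumptions, the A8's L2s cover the 4MB block exactly, while the X4 840's 2MB of combined L2 leaves half of it spilling out to main memory.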
The A8-3850 is a little slower in the Stream bandwidth test than its relatives in the AMD family tree. I'd blame the need to share bandwidth with the IGP, but we're using a discrete graphics card for the strictly CPU-oriented portion of our tests, so that explanation doesn't hold.
The A8-3850's memory access latencies are a little higher than those of the rest of the Phenom II/Athlon II lineup, as well. One contributor to this result is the A8's larger L2 caches, which have 20 cycles of latency, versus 15 cycles for the Phenom II X4 840's L2s. That's a relatively minor factor, though, so most of the additional latency must be coming from other places.
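To put those cycle counts in perspective, here's a rough conversion to nanoseconds. The clock speeds are assumptions taken from the chips' published specs (2.9GHz for the A8-3850, 3.2GHz for the Phenom II X4 840), not from the figures above, and the helper name is hypothetical.

```python
# Rough conversion of L2 latency from clock cycles to nanoseconds.
# Assumption: both chips run at their stock clocks during the test.


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """One cycle at N GHz lasts 1/N nanoseconds, so latency = cycles / GHz."""
    return cycles / clock_ghz


a8_l2_ns = cycles_to_ns(20, clock_ghz=2.9)      # A8-3850: 20-cycle L2
x4_840_l2_ns = cycles_to_ns(15, clock_ghz=3.2)  # Phenom II X4 840: 15-cycle L2
l2_gap_ns = a8_l2_ns - x4_840_l2_ns

print(f"A8-3850 L2 latency:          {a8_l2_ns:.1f} ns")
print(f"Phenom II X4 840 L2 latency: {x4_840_l2_ns:.1f} ns")
print(f"Difference:                  {l2_gap_ns:.1f} ns")
```

With those clocks, the L2 gap works out to a little over 2 ns, which is consistent with calling it a relatively minor factor.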