
Memory subsystem performance
Per our tradition, we're going to start off by comparing the memory subsystems of our CPUs in a few synthetic tests.

Please note that the A10-4600M and Core i7-3720QM have higher-clocked memory than the other two offerings. Because of that discrepancy, the results below won't paint a clean picture of memory controller efficiency. They will show us something else, though. The A10-4600M and Core i7-3720QM both support faster RAM than their predecessors: both can accommodate DDR3-1600 memory, while the A8-3500M and i7-2670QM are limited to DDR3-1333. So we'll be able to see what dividends the faster memory support pays from one generation to the next.
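For a rough sense of how much headroom the faster memory could offer, here is a minimal Python sketch of theoretical peak bandwidth. It assumes a dual-channel configuration with 64-bit (8-byte) channels, which is our assumption for illustration rather than a detail of the test systems, and the function name `peak_bandwidth_gbs` is made up for this example.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    channels=2 and bytes_per_transfer=8 are assumptions: a dual-channel
    setup with 64-bit channels. Real-world throughput will be lower.
    """
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


ddr3_1333 = peak_bandwidth_gbs(1333)
ddr3_1600 = peak_bandwidth_gbs(1600)

# Relative uplift in the theoretical ceiling from DDR3-1333 to DDR3-1600.
uplift = ddr3_1600 / ddr3_1333 - 1

print(f"DDR3-1333: {ddr3_1333:.1f} GB/s")
print(f"DDR3-1600: {ddr3_1600:.1f} GB/s")
print(f"Theoretical uplift: {uplift:.0%}")
```

Under those assumptions, the ceiling rises by roughly 20% from one memory speed to the next, so measured gains would be expected to land somewhere below that figure.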

In this basic measure of memory bandwidth, the A10-4600M edges out the A8-3500M by about 13%. Our Ivy Bridge CPU enjoys a similar gain over its forebear. The A10 can't come close to matching the Intel chips, though.

Next up: SiSoft Sandra's more elaborate memory and cache bandwidth test. This test is multithreaded, so it captures the bandwidth of all caches on all cores concurrently. The different test block sizes step us down from the L1 and L2 caches into L3 and main memory.

The A10-4600M's two L1 and L2 caches manage to match the A8's four L1/L2 caches nearly step by step in terms of bandwidth. Neither can keep pace with the Bridge sisters' cache hierarchies, however.

Sandra also includes a new latency testing tool. SiSoft has a nice write-up on it, for those who are interested. We used the "in-page random" access pattern to reduce the impact of prefetchers on our measurements. We've also taken to reporting the results in terms of CPU cycles, which is how this tool returns them. The problem with translating these results into nanoseconds, as we've done in the past with latency measurements, is that we don't always know the CPU's clock speed during the test, since it can vary with Turbo behavior.
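To illustrate why the conversion is shaky, here is a small Python sketch of the cycles-to-nanoseconds arithmetic. The latency figure and both clock speeds are hypothetical values picked for round numbers, not measurements from our test systems, and `cycles_to_ns` is a helper name invented for this example.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds.

    At 1 GHz, one cycle takes 1 ns, so ns = cycles / GHz.
    """
    return cycles / clock_ghz


# Hypothetical values for illustration only.
latency_cycles = 24
base_clock_ghz = 2.0   # assumed clock without Turbo
turbo_clock_ghz = 3.0  # assumed clock with Turbo engaged

ns_at_base = cycles_to_ns(latency_cycles, base_clock_ghz)
ns_at_turbo = cycles_to_ns(latency_cycles, turbo_clock_ghz)

print(f"{latency_cycles} cycles at {base_clock_ghz} GHz = {ns_at_base:.1f} ns")
print(f"{latency_cycles} cycles at {turbo_clock_ghz} GHz = {ns_at_turbo:.1f} ns")
```

The same cycle count works out to 12 ns at one clock and 8 ns at the other, so without knowing which speed the chip was actually running, a nanosecond figure could be off by a wide margin.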

Because it shares 2MB of L2 cache across each dual-core module, the A10 manages a lower latency than the A8 at the 2MB block size.

However, the A10 falls behind the A8 at every other block size, including those small enough to fit into the L1 and L2 caches. The culprit may simply be slower caches on Piledriver. In our desktop tests, Bulldozer, Piledriver's predecessor, fared even worse against the Phenom II X6 1100T, which is based on the same architecture as Llano.