
Memory subsystem performance
We typically kick off our CPU test results with a look at the performance of the memory subsystems, and I figure we might as well continue that tradition. These synthetic tests are intended to measure specific properties of the system and may not end up tracking all that closely with real-world application performance. Still, they can be enlightening.

No real surprises here. We've clocked the memory for all of these systems at 1600MHz with an aggressive 1T command rate, and the 3770K does as much with its dual memory channels as any of the other two-channel solutions—though not much more than its predecessor does.

This would be a good time to introduce the various contenders, I think. The 2600K is the Sandy Bridge incumbent, and it's very similar to the Ivy-based Core i7-3770K in most regards, with the slight exception that its base and Turbo clock speeds are 3.4GHz and 3.8GHz, 100MHz slower than the 3770K's respective speeds. The 2600K was the fastest Sandy Bridge derivative when that chip was first introduced, and it should be a nice foil to the 3770K throughout our tests. Yes, had I used a Core i7-2700K instead, we'd have had a true clock-for-clock comparison. What we have here is more of a price-parity comparison, since these two CPUs are priced only $5 apart.

Another interesting contender from Intel is the Core i7-3820, which we have not properly reviewed up to this point. The 3820 is a quad-core Sandy Bridge-E part; it shares the same 3.8GHz Turbo peak with the 2600K, but its base clock is 3.6GHz, 100MHz higher than the 3770K's. The 3820 also has quad memory channels and a 10MB last-level cache. We're curious to see how often its additional platform bandwidth can grant it an advantage over the regular Sandy and Ivy parts. At $294, the 3820 is moderately priced, probably to offset its higher platform costs.

The Core i7-3960X is the 3820's big brother, a six-core Sandy Bridge-E monstrosity with a 3.3GHz base clock, a 3.9GHz Turbo peak, and a 15MB LLC. At $999, it is Intel's fastest desktop processor to date—unless Ivy takes that crown in a bit of an upset. Obviously, the four memory channels on these processors give them substantially more bandwidth, as our test results indicate.
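For a rough sense of the ceiling each platform is working against, here is a back-of-the-envelope sketch of theoretical peak memory bandwidth. It assumes that "1600MHz" means DDR3-1600 (1600 million transfers per second) and that each channel is 64 bits wide; the function name is my own and not part of any benchmark tool. Measured throughput will always land below these figures.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: each channel moves bus_bytes (8 bytes = 64 bits) per transfer.
    """
    return channels * bus_bytes * mega_transfers * 1e6 / 1e9


# Assumption: "1600MHz" memory is DDR3-1600, i.e. 1600 MT/s per channel.
dual = peak_bandwidth_gbs(channels=2, mega_transfers=1600)
quad = peak_bandwidth_gbs(channels=4, mega_transfers=1600)

print(f"dual-channel peak: {dual:.1f} GB/s")  # 25.6 GB/s
print(f"quad-channel peak: {quad:.1f} GB/s")  # 51.2 GB/s
```

Doubling the channel count doubles the theoretical peak, which is why the Sandy Bridge-E parts have so much more headroom than the dual-channel chips.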

Finally, we have the three contenders from AMD. The FX-8150 is AMD's fastest desktop processor, based on the new "Bulldozer" microarchitecture. Although it's a large, eight-core chip, the FX-8150 lists at $245, substantially cheaper than the 3770K. The FX also has a much higher TDP, or thermal envelope, of 125W, close to the 130W of the Sandy-E parts. With dual channels of 1600MHz memory, it extracts nearly as much throughput as Ivy. The A8-3850 is more like Sandy and Ivy's spiritual competitor, a smaller chip with quad cores and integrated graphics. However, the A8-3850 is based on an older CPU core with less aggressive prefetchers and no L3 cache, so it doesn't do as much with its dual memory channels. Below that is the Phenom II X6, AMD's desktop leader before Bulldozer arrived. The X6 takes even less advantage of this relatively fast RAM, but you may be surprised by how well it keeps pace with the FX-8150 overall.

This test is multithreaded, so it captures the bandwidth of all caches on all cores concurrently. The different test block sizes step us down from the L1 and L2 caches into L3 and main memory. I think the short answer here is that Ivy's internal caches are no faster or slower than Sandy's. Most likely, the 100MHz difference in clock speeds explains the differences between the 3770K and the 2600K here.

This is a new latency testing tool. SiSoft has a nice write-up on it, for those who are interested. We used the "in-page random" access pattern to reduce the impact of prefetchers on our measurements. We've also taken to reporting the results in terms of CPU cycles, which is how this tool returns them. The problem with translating these results into nanoseconds, as we've done in the past with latency measurements, is that we don't always know the clock speed of the CPU, which can vary depending on Turbo responses.
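To illustrate why the cycles-to-nanoseconds conversion is slippery, here is a minimal sketch that brackets a latency reading between a chip's base and Turbo clocks. The 30-cycle figure is a made-up example, not one of our measurements, and the function names are my own. The 3.5GHz and 3.9GHz clocks are the 3770K's base and Turbo speeds.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in CPU cycles to nanoseconds at a given clock.

    At 1 GHz one cycle takes 1 ns, so ns = cycles / GHz.
    """
    return cycles / clock_ghz


def latency_range_ns(cycles: float, base_ghz: float, turbo_ghz: float) -> tuple:
    """Return (best case, worst case) in ns, since the actual clock is unknown."""
    return cycles_to_ns(cycles, turbo_ghz), cycles_to_ns(cycles, base_ghz)


# Hypothetical 30-cycle reading on a chip running between 3.5 and 3.9 GHz.
fast, slow = latency_range_ns(30, base_ghz=3.5, turbo_ghz=3.9)
print(f"30 cycles is somewhere between {fast:.2f} and {slow:.2f} ns")
# 30 cycles is somewhere between 7.69 and 8.57 ns
```

The same cycle count spans a range of more than 10% in wall-clock time depending on where Turbo happens to settle, so reporting cycles avoids guessing.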

The only real divergence between Sandy and Ivy is at the 8MB data point, when we're right at the edge of the last-level cache. I'd wager the difference there is due to Ivy's improved cache prefetchers, which can cross page boundaries. Perhaps they're not fooled by the in-page randomization in Sandra's access pattern.

I've omitted a lot of the other CPUs for the sake of readability. The Intel chips all deliver very similar results. The FX-8150's latencies don't look so bad here, especially when you consider that its peak clock speed is 4.2GHz and that the entire architecture was apparently intended to run at even higher frequencies.

Some quick synthetic math tests
We don't have a proper SPEC rate test in our suite (yet!), but I wanted to take a quick look at some synthetic computational benchmarks, to see how the different architectures compare, before we move on to more varied and robust application-based workloads. These simple tests in AIDA64 are nicely multithreaded and make use of the latest instructions, including Bulldozer's XOP in the CPU Hash test and FMA4 in the FPU Julia and Mandel tests. The latter two tests also use Intel's AVX instructions.

Looks to me like those estimates of 4-6% IPC gains from Sandy to Ivy are probably about right, although the 3770K also has a 100MHz advantage on the 2600K. I warn you, the question of IPC gains versus clock speed differences is going to haunt you in the following pages. My apologies in advance, folks.
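For those who want to untangle the two effects themselves, here is a rough sketch of the arithmetic. It assumes performance scales as IPC times clock speed, which only holds for compute-bound work, and the 8% speedup is a hypothetical input rather than one of our results. The function names are my own.

```python
def clock_advantage(new_ghz: float, old_ghz: float) -> float:
    """Fractional speedup attributable to clock speed alone."""
    return new_ghz / old_ghz - 1.0


def implied_ipc_gain(observed_speedup: float, new_ghz: float, old_ghz: float) -> float:
    """IPC gain left over once the clock ratio is divided out.

    Assumption: performance = IPC * clock, so
    (1 + speedup) = (1 + ipc_gain) * (new_ghz / old_ghz).
    """
    return (1.0 + observed_speedup) / (new_ghz / old_ghz) - 1.0


# 3770K vs. 2600K: 3.5 vs. 3.4 GHz base, 3.9 vs. 3.8 GHz Turbo.
print(f"base clock advantage:  {clock_advantage(3.5, 3.4):.1%}")  # 2.9%
print(f"Turbo clock advantage: {clock_advantage(3.9, 3.8):.1%}")  # 2.6%

# Hypothetical: if a benchmark showed the 3770K ahead by 8% at base clocks...
print(f"implied IPC gain: {implied_ipc_gain(0.08, 3.5, 3.4):.1%}")  # 4.9%
```

The 100MHz gap is worth a little under 3% by itself, so any lead much beyond that has to come from somewhere other than clock speed.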

The FX-8150 is competitive only in the CPU Hash test, where its eight integer cores and XOP instructions give it the advantage. In the two FPU-focused tests, the FX's four AVX-capable floating-point units are distinctly disappointing. In theory, we'd expect them to match Sandy and Ivy clock for clock, but nothing of the sort happens.
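Here is a hedged sketch of where that clock-for-clock expectation comes from, counting peak single-precision FLOPS per cycle. The unit layouts are my assumptions based on public architecture descriptions, not anything we measured: one 256-bit add port plus one 256-bit multiply port per Sandy or Ivy core, and two 128-bit FMAC pipes per Bulldozer module, with a fused multiply-add counted as two operations.

```python
def intel_core_flops_per_cycle(lanes: int = 8) -> int:
    """Assumed Sandy/Ivy core: one 256-bit add plus one 256-bit multiply per cycle.

    256 bits hold 8 single-precision lanes; add and multiply issue together.
    """
    return lanes * 2


def bulldozer_module_flops_per_cycle(pipes: int = 2, lanes: int = 4, fma: bool = True) -> int:
    """Assumed Bulldozer module: two 128-bit FMAC pipes, 4 SP lanes each.

    A fused multiply-add counts as 2 FLOPs per lane; without FMA, 1 per lane.
    """
    return pipes * lanes * (2 if fma else 1)


intel_quad = 4 * intel_core_flops_per_cycle()
fx_with_fma = 4 * bulldozer_module_flops_per_cycle(fma=True)
fx_without_fma = 4 * bulldozer_module_flops_per_cycle(fma=False)

print(intel_quad, fx_with_fma, fx_without_fma)  # 64 64 32
```

Under these assumptions the paper peaks match only when the code is built around fused multiply-adds; otherwise the FX's theoretical ceiling is half as high.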