
Memory subsystem performance
These synthetic tests are intended to measure specific properties of the system and may not end up tracking all that closely with real-world application performance. Still, they can be enlightening.

One of Piledriver's purported tweaks is an improved hardware prefetcher, which populates the L2 cache by examining access patterns and predicting what data will be needed next. Whatever changes AMD has made on that front don't show up in our Stream results, where the FX-8350 matches the FX-8150 almost exactly. Many of the Intel chips extract more bandwidth from the same dual-channel DDR3 memory config. With four channels, the Core i7-3820 and 3960X achieve nearly double the transfer rates.
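For readers unfamiliar with Stream, its best-known kernel is the "triad," which reads two arrays and writes a third. Here's a rough Python sketch of that kernel and of the bandwidth accounting behind numbers like the ones above. The function names are ours, not Stream's, and pure Python will mostly measure interpreter overhead rather than DRAM throughput, so treat this as an illustration of the arithmetic, not a usable benchmark.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8


def triad(a, b, c, scalar):
    """Stream-style triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def bandwidth_gb_s(n_elements, seconds):
    """GB/s moved, counting two reads and one write of a double per element."""
    bytes_moved = 3 * BYTES_PER_DOUBLE * n_elements
    return bytes_moved / seconds / 1e9


def run_triad(n_elements=100_000, scalar=3.0):
    """Time one triad pass (hypothetical helper; sizes are arbitrary)."""
    a = array("d", [0.0]) * n_elements
    b = array("d", [1.0]) * n_elements
    c = array("d", [2.0]) * n_elements
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    return a, bandwidth_gb_s(n_elements, elapsed)
```

A real run uses arrays far larger than the last-level cache and compiled, multithreaded code, which is why the results reflect memory channels rather than the cores themselves.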

This test is multithreaded, so it captures the bandwidth of all caches on all cores concurrently. The different test block sizes step us down from the L1 and L2 caches into L3 and main memory.

Although the FX-8350 achieves somewhat higher cache throughput than the FX-8150, we can probably chalk up the differences to the 8350's higher base clock frequency. We might be seeing the effects of Piledriver's larger L1 TLB at the 32KB block size, but it's tough to say for sure.

SiSoft has a nice write-up of this latency testing tool, for those who are interested. We used the "in-page random" access pattern to reduce the impact of prefetchers on our measurements. We've reported the results in terms of CPU cycles, which is how this tool returns them. The problem with translating these results into nanoseconds, as we've done in the past with latency measurements, is that we don't always know the clock speed of the CPU, which can vary depending on Turbo responses. At any rate, knowing latency in clock cycles is helpful for understanding, say, the differences between Bulldozer and Piledriver. Imagine that.
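To illustrate the conversion problem, here's a quick sketch of the arithmetic. The 20-cycle latency is a made-up figure for illustration, not one of our results. The two clock speeds are the FX-8350's 4GHz base and 4.2GHz peak Turbo frequencies.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds at a given clock.

    One cycle at 1 GHz lasts 1 ns, so ns = cycles / GHz.
    """
    return cycles / clock_ghz


# Hypothetical 20-cycle cache latency, for illustration only.
hypothetical_cycles = 20
for label, ghz in (("base", 4.0), ("Turbo", 4.2)):
    ns = cycles_to_ns(hypothetical_cycles, ghz)
    print(f"{hypothetical_cycles} cycles at {ghz} GHz ({label}): {ns:.2f} ns")
```

The same cycle count maps to different nanosecond figures depending on which clock the chip happened to be running during the test, which is why we've stuck with cycles.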

Piledriver's memory subsystem doesn't appear to be any quicker, on a per-cycle basis, than Bulldozer's. In fact, the FX-8350's caches are a bit slower at each step up the ladder.

Some quick synthetic math tests
We don't have a proper SPEC rate test in our suite (yet!), but I wanted to take a quick look at some synthetic computational benchmarks, to see how the different architectures compare, before we move on to more varied and robust application-based workloads. These simple tests in AIDA64 are nicely multithreaded and make use of the latest instructions, including Bulldozer's XOP in the CPU Hash test and FMA4 in the FPU Julia and Mandel tests.

The FX-8350 takes the top spot in the CPU Hash test, which isn't surprising given the relatively strong performance of the AMD processors in this integer-focused benchmark. The more FPU-intensive fractal tests are a very different story, with the Sandy and Ivy Bridge-based chips topping the charts. Although Vishera's four FPUs should, in theory, be capable of the same number of peak FLOPS per clock as any Sandy or Ivy quad-core, the FX-8350's throughput here is substantially lower, even with the advantage of a higher clock speed. With the aid of the FMA instruction and a 4GHz base clock, the FX-8350's four FPUs are at least able to outperform the six older FPUs on the Phenom II X6 1100T, a feat the FX-8150 can't duplicate.
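For the curious, the theoretical parity works out roughly as follows. The per-unit figures in this sketch are our assumptions, based on commonly cited single-precision numbers: 16 FLOPS per clock for each Piledriver module's shared FPU (two 128-bit FMA units), 16 for each Sandy or Ivy Bridge core (one 256-bit AVX add plus one multiply), and 8 for each Phenom II core. These are paper peaks, not measurements.

```python
def peak_flops_per_clock(fpu_count, flops_per_fpu_per_clock):
    """Theoretical peak FLOPS per clock across all FPUs on the chip."""
    return fpu_count * flops_per_fpu_per_clock


# Assumed single-precision peaks per FPU per clock (not measured values).
chips = {
    "FX-8350 (4 shared FPUs)": (4, 16),
    "Sandy/Ivy Bridge quad-core": (4, 16),
    "Phenom II X6 (6 FPUs)": (6, 8),
}

for name, (fpus, per_fpu) in chips.items():
    print(f"{name}: {peak_flops_per_clock(fpus, per_fpu)} FLOPS/clock")
```

Under those assumptions, the FX-8350 and an Intel quad-core tie on paper, and both lead the Phenom II X6. The measured gap in the fractal tests comes from how much of that peak each architecture can sustain.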