Because LuxMark uses OpenCL, we can use it to test both GPU and CPU performance, and even to compare performance across different processor types. OpenCL code is parallel by design and is compiled at run time by a just-in-time compiler, so it should adapt well to new instructions. For instance, Intel and AMD both offer OpenCL installable client drivers (ICDs) for x86 processors, and both support AVX. The AMD APP driver even supports Bulldozer's and Piledriver's distinctive instructions, FMA4 and XOP. We've used the Intel ICD on the Intel processors and the AMD ICD on the AMD chips, since that was the fastest configuration in each case.
We'll start with CPU-only results.
So, two things. First, the results for the 4950HQ are not a fluke. I worried that the Iris Pro graphics drivers might have installed a different version of the Intel OpenCL ICD, which would account for the 4950HQ's victory, but that's not the case: the 4950HQ is also faster than the 4770K when both use AMD's APP driver. Looks like that L4 cache is finally showing us some potential.
Second, it appears Intel's OpenCL ICD doesn't yet support FMA on Haswell. I'd expect that instruction to come in very handy here. Perhaps a future update will correct that oversight.
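To put rough numbers on why FMA matters for FLOPS-heavy work like this, here's a back-of-the-envelope sketch of theoretical single-precision peak per core per clock. It assumes Haswell's two 256-bit FMA-capable ports, versus one 256-bit multiply plus one 256-bit add per clock when FMA isn't used (the Ivy Bridge arrangement). These are paper peaks, not anything LuxMark will actually reach, and the helper name is purely illustrative.

```python
# Back-of-the-envelope peak single-precision FLOPs per clock, per core.
# Assumptions: 256-bit AVX vectors (8 x 32-bit floats) and the port layouts
# described above; real code never sustains these peaks.

SP_LANES_256 = 256 // 32  # 8 single-precision floats per AVX register


def peak_sp_flops_per_clock(fma_units: int = 0, mul_units: int = 0, add_units: int = 0) -> int:
    """Each FMA lane counts as 2 FLOPs (multiply + add); plain mul/add lanes count as 1."""
    return SP_LANES_256 * (2 * fma_units + mul_units + add_units)


without_fma = peak_sp_flops_per_clock(mul_units=1, add_units=1)  # separate mul and add
with_fma = peak_sp_flops_per_clock(fma_units=2)                   # two FMA ports

print(f"without FMA: {without_fma} SP FLOPs/clock/core")
print(f"with FMA:    {with_fma} SP FLOPs/clock/core")
print(f"ratio:       {with_fma / without_fma:.1f}x")
```

In other words, an OpenCL runtime that can't emit FMA leaves up to half of Haswell's theoretical floating-point throughput unused; how much of that a real renderer would recover is an open question.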
Now we'll see how a Radeon HD 7950 performs when driven by each of these CPUs.
It's hard to beat a modern GPU for this sort of FLOPS-intensive work. I'd sure like to see a Haswell with proper FMA support make a run at it, though.
We can try combining the CPU and GPU computing power by asking both processor types to work on the same problem at once.
Now, let's pull the discrete GPU out of the test systems and see how their IGPs perform in OpenCL.
AMD has based its sales pitch for APUs on converged computing and OpenCL acceleration. Looks to me like Intel isn't willing to cede any ground to its competitor here. Using its eDRAM cache, the Iris Pro 5200 IGP nearly triples the performance of the A10's Radeon IGP. Even the scaled-back HD Graphics configurations in the 3770K and 4770K outperform the A10's integrated graphics.
The Cinebench benchmark is based on Maxon's Cinema 4D rendering engine. It's multithreaded and comes with a 64-bit executable. This test runs first with a single thread and then with as many threads as there are CPU cores (or hardware threads, on CPUs with multiple hardware threads per core).
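The two Cinebench runs boil down to a thread-count choice and a ratio between scores. Here's a minimal sketch, assuming Python's `os.cpu_count()` as the source of the logical-processor count (it includes SMT/Hyper-Threading threads). The `scaling` helper is hypothetical and simply divides the multithreaded score by the single-threaded one; none of this is Maxon's code.

```python
import os


def cinebench_style_thread_counts() -> tuple[int, int]:
    """Return (single-thread run, multi-thread run) thread counts.

    os.cpu_count() reports logical processors, so a 4-core CPU with
    Hyper-Threading yields 8. It can return None, hence the fallback to 1.
    """
    logical = os.cpu_count() or 1
    return 1, logical


def scaling(single_score: float, multi_score: float) -> float:
    """Hypothetical helper: multithreaded speedup over the single-threaded run."""
    if single_score <= 0:
        raise ValueError("single-threaded score must be positive")
    return multi_score / single_score


if __name__ == "__main__":
    one, many = cinebench_style_thread_counts()
    print(f"runs: {one} thread, then {many} threads")
```

On an SMT chip, expect the ratio to land below the logical-thread count, since each extra hardware thread shares its core's execution resources rather than adding a full core.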
STARS Euler3d computational fluid dynamics
Euler3D tackles the difficult problem of simulating fluid dynamics. Like MyriMatch, it tends to be very memory-bandwidth intensive.
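For a sense of what "memory-bandwidth intensive" means in practice, here's a rough roofline-style calculation for a STREAM-like triad loop (`a[i] = b[i] + s * c[i]`). The triad is an illustrative stand-in, not Euler3D's actual kernel. The sketch assumes dual-channel DDR3-1600 with 64-bit channels and double-precision data, and it ignores caches and write-allocate traffic.

```python
# Rough roofline arithmetic for a STREAM-style triad: a[i] = b[i] + s * c[i].
# Assumptions (illustrative only): double-precision arrays, dual-channel
# DDR3-1600 with 64-bit channels, no cache reuse, write-allocate ignored.

BYTES_PER_DOUBLE = 8


def dram_bandwidth_bytes_per_s(mega_transfers: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth = transfers/s x bytes per transfer x channels."""
    return mega_transfers * 1e6 * bus_bytes * channels


def triad_intensity() -> float:
    """FLOPs per byte: one multiply + one add, over two loads and one store."""
    flops = 2
    bytes_moved = 3 * BYTES_PER_DOUBLE
    return flops / bytes_moved


bw = dram_bandwidth_bytes_per_s(1600, channels=2)
bound = bw * triad_intensity()
print(f"peak bandwidth: {bw / 1e9:.1f} GB/s")
print(f"bandwidth-bound triad rate: {bound / 1e9:.2f} GFLOPS")
```

At roughly 1/12 FLOP per byte, a quad-core CPU's arithmetic units would spend most of their time waiting on DRAM, which is why a large on-package cache such as the 4950HQ's eDRAM can help when a working set fits inside it.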
At the very end of our regular suite of benchmarks, we have a bright and shining example of how the 4950HQ's eDRAM cache can make a difference. True to expectations, it's good for computational fluid dynamics. OK, so maybe that's not the best justification for bringing a GT3e configuration to a socketed CPU, but I'd still like to see it happen.