Shader performance

                          Peak shader arithmetic   Memory bandwidth
                          (TFLOPS)                 (GB/s)
GeForce GTX 560 Ti        1.4                      134
GeForce GTX 560 Ti 448    1.3                      152
GeForce GTX 580           1.6                      192
GeForce GTX 680           3.1                      192
Radeon HD 5870            2.7                      154
Radeon HD 6970            2.7                      176
Radeon HD 7870            2.6                      154
Radeon HD 7970            3.8                      264

Our first look at the performance of Kepler's re-architected SMX yields some mixed, and intriguing, results. The trouble with many of these tests is that they split so cleanly along architectural or even brand lines. For instance, the 3DMark particles test runs faster on any GeForce than on any Radeon. We're left a little flummoxed by the fact that the 7970 wins three tests outright, and the GTX 680 wins the other three. What do we make of that, other than to call it even?

Nonetheless, there are clear positives here, such as the GTX 680 taking the top spot in the ShaderToyMark and GPU cloth tests. The GTX 680 improves on the Fermi-based GTX 580's performance in five of the six tests, sometimes by wide margins. Still, for a card with the same memory bandwidth and ostensibly twice the shader FLOPS, the GTX 680 doesn't appear to outperform the GTX 580 as comprehensively as one might expect.
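
To put numbers on the "same memory bandwidth, twice the shader FLOPS" comparison, here is a minimal Python sketch that works only from the peak figures in the table above. The names `SPECS`, `flops_ratio`, and `flops_per_byte` are ours, invented for illustration, and the FLOPS-per-byte figure is simply one way to read the table, not something measured in any of the tests.

```python
# Peak figures copied from the table above: (TFLOPS, GB/s).
# These are theoretical peaks, not measured results.
SPECS = {
    "GeForce GTX 560 Ti": (1.4, 134),
    "GeForce GTX 560 Ti 448": (1.3, 152),
    "GeForce GTX 580": (1.6, 192),
    "GeForce GTX 680": (3.1, 192),
    "Radeon HD 5870": (2.7, 154),
    "Radeon HD 6970": (2.7, 176),
    "Radeon HD 7870": (2.6, 154),
    "Radeon HD 7970": (3.8, 264),
}


def flops_ratio(card_a, card_b):
    """Ratio of peak shader arithmetic between two cards (hypothetical helper)."""
    return SPECS[card_a][0] / SPECS[card_b][0]


def flops_per_byte(card):
    """Peak FLOPS available per byte of peak memory bandwidth (hypothetical helper)."""
    tflops, gb_per_s = SPECS[card]
    return (tflops * 1e12) / (gb_per_s * 1e9)


if __name__ == "__main__":
    ratio = flops_ratio("GeForce GTX 680", "GeForce GTX 580")
    print(f"GTX 680 vs. GTX 580 peak FLOPS: {ratio:.2f}x")
    for card in ("GeForce GTX 580", "GeForce GTX 680", "Radeon HD 7970"):
        print(f"{card}: {flops_per_byte(card):.1f} peak FLOPS per byte")
```

By these peak numbers, the GTX 680 has a bit under twice the GTX 580's shader arithmetic (about 1.9x) behind an identical 192 GB/s of bandwidth, so its peak FLOPS per byte roughly doubles, from about 8 to about 16.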

GPU computing performance

This benchmark, built into Civilization V, uses DirectCompute to compress a series of textures. Again, this is a nice result from the new GeForce, though the 7970 is a smidge faster in the end.

Here's where we start to worry. In spite of doing well in our graphics-related shader benchmarks and in the DirectCompute test above, the GTX 680 tanks in LuxMark's OpenCL-driven ray-tracing test. Even a quad-core CPU is faster! The shame! More notably, the GTX 680 trails the GTX 580 by a mile—and the Radeon HD 7970 by several. Nvidia tells us LuxMark isn't a target for driver optimization and may never be. We suppose that's fine, but we're left wondering just how much Kepler's compiler-controlled shaders will rely on software tuning in order to achieve good throughput in GPU computing applications. Yes, this is only one test, and no, there aren't many good OpenCL benchmarks yet. Still, we're left to wonder.

Then again, we are in the early days for OpenCL support generally, and AMD seems to be very committed to supporting this API. Notice how the Core i7-3820 runs this test faster when using AMD's APP driver than when using Intel's own OpenCL ICD. If a brainiac monster like Sandy Bridge-E can benefit that much from AMD's software tuning over Intel's own, well, we can't lay much fault at Kepler's feet just yet.