
Shader and geometry processing performance

| Card | Peak shader arithmetic (GFLOPS) | Peak rasterization rate (Mtris/s) | Peak memory bandwidth (GB/s) |
|---|---|---|---|
| GeForce GTX 460 768MB | 941 | 1400 | 88.3 |
| GeForce GTX 460 1GB 810MHz | 1089 | 1620 | 124.8 |
| GeForce GTX 470 GC | 1120 | 2500 | 133.9 |
| GeForce GTX 480 | 1345 | 2800 | 177.4 |
| GeForce GTX 570 | 1405 | 2928 | 152.0 |
| GeForce GTX 580 | 1581 | 3088 | 192.0 |
| Radeon HD 6850 | 1517 | 790 | 128.0 |
| Radeon HD 6870 | 2016 | 900 | 134.4 |
| Radeon HD 5870 | 2720 | 850 | 153.6 |
| Radeon HD 6950 | 2253 | 1600 | 160.0 |
| Radeon HD 6970 | 2703 | 1760 | 176.0 |
| Radeon HD 5970 | 4640 | 1450 | 256.0 |

Theoretical shader performance is an even trickier subject than the graphics rates we covered on the last page, for reasons we discussed when considering Cayman's VLIW4 SPU design. Scheduling efficiency and utilization will count for a lot, as will other quirks of the individual architectures. In theory, the 6970's peak FLOPS rates are nearly double the GeForce GTX 570's, but Nvidia has a very different approach to shader design involving fewer units, doubled clock frequencies (versus the GPU core clock), and very efficient sequential, scalar scheduling. Also, Cayman's dual vertex engines give it a nice boost in peak rasterization rate over the 5870, but the 6970's theoretical peak rate is still less than two-thirds of the GTX 570's.
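The peak figures in the table above come from simple multiplication: ALU count times two flops per clock (a multiply-add) times shader clock, triangles per clock times core clock, and bus width times memory transfer rate. Here's a rough Python sketch for three of the cards. The unit counts, clocks, and memory rates are published specs we're assuming rather than numbers stated in this review; we picked them because they reproduce the table's figures.

```python
from dataclasses import dataclass

@dataclass
class GpuSpec:
    name: str
    alus: int              # shader ALUs (stream processors / CUDA cores)
    shader_mhz: float      # clock the ALUs run at (Fermi: 2x the core clock)
    core_mhz: float        # GPU core clock, which also drives rasterization
    tris_per_clock: int    # triangles rasterized per core clock
    bus_bits: int          # memory interface width in bits
    mem_gtps: float        # effective memory transfer rate per pin (GT/s)

    def peak_gflops(self) -> float:
        # one multiply-add (2 flops) per ALU per shader clock
        return self.alus * 2 * self.shader_mhz / 1000

    def peak_mtris(self) -> float:
        return self.tris_per_clock * self.core_mhz

    def peak_gbps(self) -> float:
        # bytes moved per transfer across the bus x billions of transfers/s
        return self.bus_bits / 8 * self.mem_gtps

# ASSUMED published specs (not from this review); they reproduce the table.
cards = [
    GpuSpec("Radeon HD 5870", 1600, 850, 850, 1, 256, 4.8),
    GpuSpec("Radeon HD 6970", 1536, 880, 880, 2, 256, 5.5),
    GpuSpec("GeForce GTX 570", 480, 1464, 732, 4, 320, 3.8),
]

for c in cards:
    print(f"{c.name}: {c.peak_gflops():.0f} GFLOPS, "
          f"{c.peak_mtris():.0f} Mtris/s, {c.peak_gbps():.1f} GB/s")

hd6970, gtx570 = cards[1], cards[2]
print(f"6970/570 FLOPS ratio: {hd6970.peak_gflops() / gtx570.peak_gflops():.2f}")  # 1.92
print(f"6970/570 raster ratio: {hd6970.peak_mtris() / gtx570.peak_mtris():.2f}")   # 0.60
```

Nothing in this arithmetic captures scheduling efficiency or utilization, which is why delivered results can stray so far from these peaks.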

The first tool we can use to measure delivered pixel shader performance is ShaderToyMark, a pixel shader test based on six different effects taken from the nifty ShaderToy utility. The pixel shaders used are fascinating abstract effects created by demoscene participants, all of whom are credited on the ShaderToyMark homepage. Running all six of these pixel shaders simultaneously easily stresses today's fastest GPUs, even at the benchmark's relatively low 960x540 default resolution.

Yep, Nvidia's GPUs are faster here, despite their much lower theoretical peak FLOPS counts. Look past that and focus for a second on the question of Cypress' VLIW5 shaders versus Cayman's VLIW4 design, though. In theory, the Radeon HD 5870 can deliver 2.72 TFLOPS to the 6970's 2.70 TFLOPS. In practice, the 6970 is over 10% faster, even in this all-graphics workload. That's progress, even if it's not revolutionary.
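To put a rough number on that progress, assume (as a toy model, not anything AMD has published) that delivered throughput equals peak FLOPS times the fraction of ALU slots actually kept busy. A 10% speedup on a slightly lower peak then implies Cayman keeps its ALUs roughly 11% busier than Cypress in this test.

```python
def relative_utilization(speedup: float, peak_a: float, peak_b: float) -> float:
    """Utilization of card A relative to card B under the ASSUMED model
    delivered = peak x utilization, given A's measured speedup over B."""
    return speedup * peak_b / peak_a

# 6970 (A) vs. 5870 (B): "over 10% faster" on a slightly lower peak (GFLOPS).
ratio = relative_utilization(1.10, peak_a=2703, peak_b=2720)
print(f"{ratio:.3f}")  # 1.107
```

Since the real speedup is "over" 10%, this is a lower bound under the model's assumptions.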

Up next is a compute shader benchmark built into Civilization V. This test measures the GPU's ability to decompress textures used for the graphically detailed leader characters depicted in the game. The decompression routine is based on a DirectX 11 compute shader. The benchmark reports individual results for a long list of leaders; we've averaged those scores to give you the results you see below.

It's not awful, but Cayman performs relatively poorly in this test, all things considered. The 6950 falls behind the Barts-based Radeon HD 6870, which has no advantage on paper that would predict this outcome. One possible reason for this result is that AMD's driver-based real-time compiler for Cayman may still be fairly immature. There's another possibility, too, which we'll explore in a sec.

Finally, we have the shader tests from 3DMark Vantage.

Clockwise from top left: Parallax occlusion mapping, Perlin noise, GPU cloth, and GPU particles

The 6900-series cards generally perform as expected in three of these tests, offering minor incremental improvements over the Radeon HD 5870. In a fourth, the Perlin noise test, the 5870 is markedly faster. Why? I'm pretty sure we're seeing Cayman's PowerTune power cap taking effect. AMD specifically cited 3DMark's Perlin noise test as an application that bumps up against PowerTune's limits, and the performance would seem to indicate that clock speeds are being lowered.
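PowerTune's actual control logic is AMD's own and isn't spelled out here, but its effect on a shader-bound test can be sketched with a deliberately simple model. We assume dynamic power scales linearly with clock at a fixed voltage, so a card whose estimated draw exceeds its cap by some fraction has its clock scaled down to fit. The power inputs below are hypothetical relative units, not measurements, and the 1536-ALU, 880 MHz figures are assumed published specs for the 6970.

```python
def capped_clock_mhz(base_mhz: float, est_power: float, cap: float) -> float:
    """Clock after a power cap, under the simplifying ASSUMPTION that
    power scales linearly with clock at fixed voltage."""
    if est_power <= cap:
        return base_mhz
    return base_mhz * cap / est_power

def peak_gflops(alus: int, mhz: float) -> float:
    # one multiply-add (2 flops) per ALU per clock
    return alus * 2 * mhz / 1000

# Hypothetical case: the workload would draw 10% more than the cap allows.
clk = capped_clock_mhz(880, est_power=1.1, cap=1.0)
print(round(clk), round(peak_gflops(1536, clk)))  # 800 2458
```

In this model, demand 10% over the cap trims the clock, and with it peak arithmetic throughput, by about 9%.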

Even so, notice that the 6970 remains quite a bit faster than the GTX 570 in this benchmark, just as it is in the parallax occlusion mapping test. Both of those are pixel shader-intensive tests, and as we've mentioned, Perlin noise is very arithmetic-heavy. The final two 3DMark tests, however, emphasize vertex shader performance, and the Fermi architecture's distributed geometry processing capabilities give it a clear win. Note that Nvidia's pre-Fermi G80 and GT200 chips (in the 8800 GTX and GTX 280, respectively) don't fare nearly as well, relatively speaking, against the Radeon HD 4870.
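Why is Perlin noise so arithmetic-heavy? Each sample hashes the surrounding lattice corners, takes gradient dot products, and blends them with a quintic fade curve, and fractal variants repeat all of that for every octave. Below is a minimal 2D adaptation of Ken Perlin's improved noise in Python, for illustration only. It is not 3DMark's shader, and the eight-entry gradient table, the seed, and the `fbm` octave count are our own hypothetical choices.

```python
import math
import random

# Eight 2D gradient directions (our choice for this 2D adaptation).
GRADS = [(1, 1), (-1, 1), (1, -1), (-1, -1), (1, 0), (-1, 0), (0, 1), (0, -1)]

def make_perm(seed: int = 0) -> list[int]:
    """Shuffled 0..255 table, doubled so lookups up to index 511 never wrap."""
    p = list(range(256))
    random.Random(seed).shuffle(p)
    return p + p

def fade(t: float) -> float:
    """Quintic ease curve 6t^5 - 15t^4 + 10t^3 from improved Perlin noise."""
    return t * t * t * (t * (t * 6 - 15) + 10)

def lerp(t: float, a: float, b: float) -> float:
    return a + t * (b - a)

def grad(h: int, x: float, y: float) -> float:
    """Dot product of a hashed gradient direction with the offset (x, y)."""
    gx, gy = GRADS[h & 7]
    return gx * x + gy * y

def noise2(x: float, y: float, perm: list[int]) -> float:
    """One sample: hash 4 corners, then 4 dot products, 2 fades, 3 lerps."""
    x0, y0 = math.floor(x), math.floor(y)
    xi, yi = x0 & 255, y0 & 255          # lattice cell, wrapped to the table
    xf, yf = x - x0, y - y0              # position inside the cell
    u, v = fade(xf), fade(yf)
    aa = perm[perm[xi] + yi]
    ab = perm[perm[xi] + yi + 1]
    ba = perm[perm[xi + 1] + yi]
    bb = perm[perm[xi + 1] + yi + 1]
    bottom = lerp(u, grad(aa, xf, yf), grad(ba, xf - 1, yf))
    top = lerp(u, grad(ab, xf, yf - 1), grad(bb, xf - 1, yf - 1))
    return lerp(v, bottom, top)

def fbm(x: float, y: float, perm: list[int], octaves: int = 4) -> float:
    """Fractal sum: the full noise2 cost is paid once per octave."""
    total, amp, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amp * noise2(x * freq, y * freq, perm)
        amp *= 0.5
        freq *= 2.0
    return total
```

A pixel shader doing something like `fbm` at every pixel pays that whole cost per octave, per pixel, which is why a test like this leans so hard on raw ALU throughput.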

Geometry processing throughput
We can measure geometry processing speeds pretty straightforwardly with a couple of tools. The first is the Unigine Heaven demo. This demo doesn't really make good use of additional polygons to increase image quality at its highest tessellation levels, but it does push enough polys to serve as a decent synthetic benchmark.

The Radeon HD 6970 performs as well here as two Cypress chips aboard the Radeon HD 5970, so that's progress. Still, Cayman is no match for the GF110's quad rasterizers and 16 vertex engines.

We can push into even higher degrees of tessellation using TessMark's multiple detail levels.

Hmm. TessMark uses OpenGL rather than Direct3D to access the GPU, and apparently AMD's OpenGL drivers aren't yet fully aware of Cayman's expanded geometry processing capabilities. Frustrating.