
Shader performance

                             Peak shader   Peak rasterization   Peak memory
                             arithmetic    rate                 bandwidth
                             (GFLOPS)      (Mtris/s)            (GB/s)
GeForce GTX 460 1GB 810MHz   1089          1620                 124.8
GeForce GTX 470 GC           1120          2500                 133.9
GeForce GTX 480              1345          2800                 177.4
GeForce GTX 580              1581          3088                 192.0
Radeon HD 5870               2720          850                  153.6
Radeon HD 6870               2016          900                  134.4
Radeon HD 5970               4640          1450                 256.0

In recent months, our GPU reviews have been missing a rather important element: tests of GPU shader performance or processing power outside of actual games. Although some of today's games use a fairly rich mix of shader effects, they also stress other parts of the GPU at the same time. We can better understand the strengths and weaknesses of current GPU architectures by using some targeted shader benchmarks. The trouble is: what tests are worth using?

Fortunately, we have several answers today thanks to some new entrants. The first of those is ShaderToyMark, a pixel shader test based on six different effects taken from the nifty ShaderToy utility. The pixel shaders used are fascinating abstract effects created by demoscene participants, all of whom are credited on the ShaderToyMark homepage. Running all six of these pixel shaders simultaneously easily stresses today's fastest GPUs, even at the benchmark's relatively low 960x540 default resolution.

You may be looking between the peak arithmetic rates in the table at the top of the page and the results above and scratching your head, but the outcome will be no surprise to those familiar with these GPU architectures. The vast SIMD arrays on AMD's chips do indeed have higher peak theoretical rates, but their execution units can't always be scheduled as efficiently as Nvidia's. In this case, the GTX 580 easily outperforms the single-GPU competition. Unfortunately, this test isn't multi-GPU compatible, so we had to leave out those configs.
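To put a number on that gap between theory and practice, here is a minimal Python sketch that expresses each single-GPU card's peak shader arithmetic as a multiple of the GTX 580's. The GFLOPS figures come straight from the table at the top of the page; the helper name `peak_ratio` is ours, purely for illustration. It says nothing about measured results, only about what the paper specs would predict.

```python
# Peak shader arithmetic (GFLOPS), copied from the table at the top of the page.
# The dual-GPU Radeon HD 5970 is omitted because this test is not multi-GPU compatible.
PEAK_GFLOPS = {
    "GeForce GTX 460 1GB 810MHz": 1089,
    "GeForce GTX 470 GC": 1120,
    "GeForce GTX 480": 1345,
    "GeForce GTX 580": 1581,
    "Radeon HD 5870": 2720,
    "Radeon HD 6870": 2016,
}


def peak_ratio(card, baseline="GeForce GTX 580"):
    """Return the card's theoretical peak as a multiple of the baseline's."""
    return PEAK_GFLOPS[card] / PEAK_GFLOPS[baseline]


if __name__ == "__main__":
    for card in PEAK_GFLOPS:
        print(f"{card:28s} {peak_ratio(card):.2f}x")
```

On paper, the Radeon HD 5870 has roughly 1.7 times the GTX 580's peak rate, which is exactly why a GeForce win in a shader test can look odd at first glance.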

Incidentally, this gives us our first good look at the shader performance differences between the Radeon HD 5870 and 6870. The 6870 is based on the smaller Barts GPU and performs nearly as well as the 5870 in many games, but it is measurably slower in directed tests, as one might expect.

Up next is a compute shader benchmark built into Civilization V. This test measures the GPU's ability to decompress textures used for the graphically detailed leader characters depicted in the game. The decompression routine is based on a DirectX 11 compute shader. The benchmark reports individual results for a long list of leaders; we've averaged those scores to give you the results you see below.
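For the curious, collapsing the per-leader scores into one figure can be sketched as below. This assumes a plain arithmetic mean, and the leader labels and scores are placeholders rather than measured results; `overall_score` is a hypothetical helper name, not anything provided by the game.

```python
from statistics import mean


def overall_score(per_leader_scores):
    """Collapse per-leader results into one number with an arithmetic mean."""
    if not per_leader_scores:
        raise ValueError("need at least one leader result")
    return mean(list(per_leader_scores.values()))


# Placeholder labels and scores for illustration only, NOT benchmark output.
example = {"leader_a": 100.0, "leader_b": 120.0, "leader_c": 110.0}

if __name__ == "__main__":
    print(overall_score(example))
```

An arithmetic mean lets a single unusually high or low leader result pull the overall number, so it is worth keeping the individual scores around when something looks off.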

These results largely mirror what we saw above in terms of relative performance, with the added spice of multi-GPU outcomes. Strangely, the Radeon HD 5970 stumbles a bit here.

Finally, we have the shader tests from 3DMark Vantage.


Clockwise from top left: Parallax occlusion mapping, Perlin noise, GPU cloth, and GPU particles

These first two tests use pixel shaders to do their work, and the Radeons fare relatively well in both. The Perlin noise test, in particular, is very math-intensive, and this looks to be a case in which the Radeons' stratospheric peak arithmetic rates actually pay off.

These two tests involve simulations of physical phenomena using vertex shaders and the DirectX 10-style stream output capabilities of the GPUs. In both cases, the GeForces are substantially faster, with the GTX 580 again at the top of the heap.

Geometry processing throughput

The most obvious area of divergence between the current GPU architectures from AMD and Nvidia is geometry processing, which has become a point of emphasis with the advent of DirectX 11's tessellation feature. Both GPU brands support tessellation, which allows for much higher geometric detail than usual to be generated and processed on the GPU. The extent of that support is the hot-button issue. With Fermi, Nvidia built the first truly parallel architecture for geometry processing, taking one of the last portions of the graphics pipeline that was processed serially and distributing it across multiple hardware units. AMD took a more traditional, serial approach with less peak throughput.
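One rough way to see that divergence in the peak rasterization rates from the table at the top of the page is to divide each rate by the GPU's core clock, which yields triangles per clock. The sketch below does that in Python. Note the assumption: apart from the 810MHz in the GTX 460's label, the core clocks are not stated in this article; they are the commonly published reference clocks for these cards, so treat them as our input rather than as part of the table.

```python
# Peak rasterization rate (Mtris/s) from the table at the top of the page.
PEAK_MTRIS = {
    "GeForce GTX 460 1GB 810MHz": 1620,
    "GeForce GTX 480": 2800,
    "GeForce GTX 580": 3088,
    "Radeon HD 5870": 850,
    "Radeon HD 6870": 900,
}

# ASSUMPTION: core clocks in MHz. Only the GTX 460's 810MHz appears in the
# article; the rest are commonly published reference clocks.
CORE_MHZ = {
    "GeForce GTX 460 1GB 810MHz": 810,
    "GeForce GTX 480": 700,
    "GeForce GTX 580": 772,
    "Radeon HD 5870": 850,
    "Radeon HD 6870": 900,
}


def tris_per_clock(card):
    """Peak triangles rasterized per core clock cycle."""
    return PEAK_MTRIS[card] / CORE_MHZ[card]


if __name__ == "__main__":
    for card in PEAK_MTRIS:
        print(f"{card:28s} {tris_per_clock(card):.1f} tris/clock")
```

Under those assumed clocks, the big Fermi chips work out to four triangles per clock and the single-GPU Radeons to one, which lines up with the parallel-versus-serial split described above.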

We can measure geometry processing speeds pretty straightforwardly with a couple of tools. The first is the Unigine Heaven demo. This demo doesn't really make good use of additional polygons to increase image quality at its highest tessellation levels, but it does push enough polys to serve as a decent synthetic benchmark.

Notice that the multi-GPU solutions scale nicely in terms of geometry processing power; the alternate-frame rendering method most commonly used for load balancing between GPUs offers nearly perfect scaling on this front. Even so, the GTX 580 is still roughly a third faster than the Radeon HD 5970. Among the AMD solutions, only the dual Radeon HD 6870s can challenge the GTX 580 here, in part because of some tessellation optimizations AMD put into the Barts GPU.
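If you want to check scaling claims like these against your own numbers, a minimal Python sketch follows. The frame rates in it are placeholders, not our measurements, and the function names `scaling_efficiency` and `percent_faster` are made up for this example.

```python
def scaling_efficiency(single_gpu_fps, multi_gpu_fps, gpu_count=2):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    if single_gpu_fps <= 0 or gpu_count < 1:
        raise ValueError("need a positive frame rate and at least one GPU")
    return multi_gpu_fps / (single_gpu_fps * gpu_count)


def percent_faster(a_fps, b_fps):
    """How much faster a is than b, in percent."""
    if b_fps <= 0:
        raise ValueError("need a positive baseline frame rate")
    return (a_fps / b_fps - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder frame rates for illustration only.
    print(f"{scaling_efficiency(20.0, 39.0):.3f}")
    print(f"{percent_faster(40.0, 30.0):.1f}%")
```

With those placeholder inputs, a second GPU that lifts 20 FPS to 39 FPS is scaling at 97.5% of ideal, and a card at 40 FPS is about a third faster than one at 30 FPS.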

TessMark's multiple tessellation levels give us the chance to push the envelope even further, down to, well, insanely small polygons, and past the Radeons' breaking point. This vast difference in performance once polygon counts get to a certain level will help inform our understanding of some important issues ahead. We can already see how Nvidia's architectural choices have given the GTX 580 a distinct advantage on this front.