
Sizing 'em up
Do the math involving the clock speeds and per-clock potency of the latest high-end graphics cards, and you'll end up with a comparative table that looks something like this:

Card                Peak pixel   Peak bilinear        Peak                Peak shader      Memory
                    fill rate    filtering int8/fp16  rasterization rate  arithmetic rate  bandwidth
                    (Gpixels/s)  (Gtexels/s)          (Gtris/s)           (tflops)         (GB/s)
Asus R9 290X        67           185/92               4.2                 5.9              346
Radeon R9 295 X2    130          358/179              8.1                 11.3             640
GeForce GTX 780 Ti  45           223/223              4.6                 5.3              336
Gigabyte GTX 980    85           170/170              5.3                 5.4              224
GeForce Titan X     103          206/206              6.5                 6.6              336
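Each entry in the table is a per-clock capability multiplied by a clock speed. The Python sketch below reproduces two rows that way. The unit counts, clocks, and memory specs it plugs in (96 ROPs, 192 texture units, 3,072 shader ALUs, six rasterizers, and a 1,075 MHz boost clock for the Titan X; 64 ROPs, 176 texture units, 2,816 ALUs, four rasterizers, a 1,050 MHz clock, and 5,400 MT/s memory for the Asus R9 290X) are assumed specs, not figures given in this article, and the GpuSpec and peak_rates names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    # Hypothetical structure; all values passed in below are assumed specs.
    name: str
    rops: int            # pixels written per clock
    texture_units: int   # int8 bilinear texels filtered per clock
    fp16_divisor: int    # 1 = full-rate fp16 filtering, 2 = half rate
    rasterizers: int     # triangles rasterized per clock
    shader_alus: int     # FMA lanes; one fused multiply-add counts as 2 flops
    clock_mhz: int       # clock used for the peak figures
    mem_mtps: int        # effective memory transfer rate (MT/s)
    bus_bits: int        # memory interface width in bits


def peak_rates(g: GpuSpec) -> dict[str, float]:
    """Theoretical peaks: per-clock capability multiplied by clock speed."""
    texels = g.texture_units * g.clock_mhz / 1000
    return {
        "pixel_fill_gpix_s": g.rops * g.clock_mhz / 1000,
        "bilinear_int8_gtex_s": texels,
        "bilinear_fp16_gtex_s": texels / g.fp16_divisor,
        "raster_gtris_s": g.rasterizers * g.clock_mhz / 1000,
        "shader_tflops": g.shader_alus * 2 * g.clock_mhz / 1_000_000,
        "mem_bw_gb_s": g.mem_mtps * g.bus_bits / 8 / 1000,
    }


# Assumed specs (not stated in the article).
titan_x = GpuSpec("GeForce Titan X", 96, 192, 1, 6, 3072, 1075, 7000, 384)
r9_290x = GpuSpec("Asus R9 290X", 64, 176, 2, 4, 2816, 1050, 5400, 512)

for gpu in (titan_x, r9_290x):
    r = peak_rates(gpu)
    print(
        f"{gpu.name}: {r['pixel_fill_gpix_s']:.0f} Gpix/s, "
        f"{r['bilinear_int8_gtex_s']:.0f}/{r['bilinear_fp16_gtex_s']:.0f} Gtex/s, "
        f"{r['raster_gtris_s']:.1f} Gtris/s, {r['shader_tflops']:.1f} tflops, "
        f"{r['mem_bw_gb_s']:.0f} GB/s"
    )
```

Keeping clocks as integer MHz is a deliberate choice: multiplying by 1.075 as a float can land a hair below 6.45 and print as 6.4, while 6 * 1075 / 1000 rounds to the table's 6.5.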

We've shown you tables like that many times in the past, but frankly, the tools we've had for measuring delivered performance in these key areas haven't been all that spectacular. That changes today: we have a new revision of the Beyond3D graphics architecture test suite that measures these rates much more accurately. These are directed tests aimed at particular GPU features, so their results won't translate exactly into in-game performance. They can, however, shed some light on the respective strengths and weaknesses of the GPU silicon.

Nvidia has committed a ton of resources to pixel fill and blending in its Maxwell chips, as you can see. Even the GTX 980, built on the mid-sized GM204 chip, surpasses AMD's high-end Radeon R9 290X on this front, and the Titan X adds even more pixel throughput. This huge capacity for pixel-pushing should make the Titan X ready to thrive in this era of high-PPI displays.

This test cleverly lets us measure the impact of the frame-buffer compression built into modern GPUs. The random texture it uses isn't compressible, while the black texture should compress easily. The results back up Nvidia's claims that its GPUs have had some form of compression for several generations and that the compression built into Maxwell is substantially more effective. With the compressible black texture, the Titan X achieves an effective transfer rate higher than its theoretical peak memory bandwidth.

We don't see any compression benefit on AMD's R9 290X because this test runs into the limits of the Hawaii chip's ROPs. We may have to tweak the test in the future to get a sense of how much compression recent Radeons actually perform.
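One way to see why compression shows up on the Titan X but not on the R9 290X is a two-limit model: the effective rate is capped either by memory bandwidth (stretched by the compression ratio) or by how many pixels the ROPs can write. This is a simplified sketch, not how the Beyond3D test is actually built. The 4 bytes per pixel and the 2:1 ratio for the black texture are hypothetical placeholders; only the bandwidth and fill-rate inputs come from the table, and the function name is hypothetical.

```python
def effective_bw_gb_s(raw_bw_gb_s: float, rop_gpix_s: float,
                      bytes_per_pixel: int, compression_ratio: float) -> float:
    """Simplified two-limit bound on effective (uncompressed-equivalent) bandwidth."""
    memory_limit = raw_bw_gb_s * compression_ratio  # compression stretches raw bandwidth
    rop_limit = rop_gpix_s * bytes_per_pixel         # ROPs cap how fast pixels get written
    return min(memory_limit, rop_limit)


BYTES_PER_PIXEL = 4  # hypothetical: one 32-bit color value per pixel
BLACK_RATIO = 2.0    # hypothetical compression ratio for the all-black texture

# Raw bandwidth (GB/s) and peak fill rate (Gpixels/s) from the table above.
for name, bw, fill in (("GeForce Titan X", 336, 103), ("Asus R9 290X", 346, 67)):
    random_tex = effective_bw_gb_s(bw, fill, BYTES_PER_PIXEL, 1.0)  # incompressible
    black_tex = effective_bw_gb_s(bw, fill, BYTES_PER_PIXEL, BLACK_RATIO)
    print(f"{name}: random {random_tex:.0f} GB/s, black {black_tex:.0f} GB/s")
```

With these made-up inputs, the Titan X's ROPs leave headroom above its 336 GB/s memory bandwidth, so compression can show through, while the R9 290X becomes ROP-bound before memory bandwidth matters, so its random and black results come out identical.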

'Tis a little jarring to see how closely the measured texture filtering rates from these GPUs match their theoretical peaks. These tools are way better than anything we've seen before.

Since the Fermi generation, Nvidia's GPU architectures have held a consistent and often pronounced lead in geometry throughput. That trend continues with Maxwell, although the GM200's capabilities on this front haven't grown as much as in other areas.

I'm not quite sure what's up with the polygon throughput test. The delivered results are higher than the theoretical peak rasterization rates for the GeForce cards. I have a few speculative thoughts, though. One, the fact that the GTX 980 outperforms the Titan X suggests this test is somehow gated by GPU clock speeds. Two, the fact that we're exceeding the theoretical rate suggests perhaps the GPU clocks are ranging higher via GPU Boost. The "boost" clock on these GeForce cards, on which we've based the numbers in the table above, is more of a typical operating speed, not an absolute peak. Three, I really need to tile my bathroom floor, but we'll defer that for later.
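The second speculation can be made concrete: dividing a measured triangle rate by the number of triangles the chip rasterizes per clock gives the clock speed needed to produce it. This sketch assumes the Titan X's six rasterizers each handle one triangle per clock, which is consistent with the table's 6.5 Gtris/s at a 1,075 MHz boost clock; both figures are assumptions, and the function names are hypothetical.

```python
TRIS_PER_CLOCK = 6      # assumed: six rasterizers, one triangle per clock each
BOOST_CLOCK_MHZ = 1075  # assumed Titan X boost clock behind the table's 6.5 Gtris/s


def implied_clock_mhz(measured_gtris_s: float, tris_per_clock: int = TRIS_PER_CLOCK) -> float:
    """Clock speed (MHz) needed to deliver a given triangle rate."""
    return measured_gtris_s * 1000 / tris_per_clock


def ran_above_boost(measured_gtris_s: float) -> bool:
    """True if a measured rate is only reachable above the rated boost clock."""
    return implied_clock_mhz(measured_gtris_s) > BOOST_CLOCK_MHZ


print(implied_clock_mhz(6.45))  # the unrounded theoretical peak maps back to the boost clock
```

Because the GeForce figures in the table are based on a boost clock that behaves more like a typical operating speed than a ceiling, any delivered rate above 6.45 Gtris/s on the Titan X would imply the card ran faster than 1,075 MHz during the test.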

The first of the three ALU tests above is something we've wanted for a while: a solid test of peak arithmetic throughput. As you can see, the GM200 pretty much sticks the landing here, turning in just as many teraflops as one would expect based on its specs. The GTX 980 and R9 290X do the same, while the Kepler-based GTX 780 Ti is somewhat less efficient, even in this straightforward test case.

I have to say that, although the GM200 is putting on a nice show, that big shader array on AMD's Hawaii chip continues to impress. Seems like no matter how you measure the performance of a GCN shader array, you'll find strengths rather than weaknesses.

Now, let's see how all of this voodoo translates into actual gaming performance.