Sizing 'em up
Do the math involving the clock speeds and per-clock potency of the latest high-end graphics cards, and you'll end up with a comparative table that looks something like this:

| Card | Peak pixel fill rate (Gpixels/s) | Peak bilinear filtering int8/fp16 (Gtexels/s) | Peak rasterization rate (Gtris/s) | Peak shader arithmetic rate (tflops) | Memory bandwidth (GB/s) |
| --- | --- | --- | --- | --- | --- |
| Asus R9 290X | 67 | 185/92 | 4.2 | 5.9 | 346 |
| Radeon R9 295 X2 | 130 | 358/179 | 8.1 | 11.3 | 640 |
| GeForce GTX 780 Ti | 37 | 223/223 | 4.6 | 5.3 | 336 |
| Gigabyte GTX 980 | 85 | 170/170 | 5.3 | 5.4 | 224 |
| GeForce GTX 980 Ti | 95 | 189/189 | 6.5 | 6.1 | 336 |
| GeForce Titan X | 103 | 206/206 | 6.5 | 6.6 | 336 |

Those are the peak capabilities of each of these cards, in theory. Our shiny new Beyond3D GPU architecture suite measures true delivered performance using a series of directed tests.

The GTX 980 Ti lands squarely in the middle between the GTX 980 and the Titan X in terms of pixel fill rate, which is what we'd expect given the theoretical rates in the table above. Notice that the 980 Ti's peak rate is lower than the Titan X's even though it has the same ROP count (96 pixels per clock) and clock speed. That's because, on recent Nvidia GPUs, fill rate can be limited by the number of shader multiprocessors and rasterizers. Each SM can send four pixels per clock to the ROPs, so the GTX 980 Ti's 22 SMs can only transfer 88 pixels per clock, and its peak throughput is a bit lower than the Titan X's.
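The SM limit described above is easy to check with a bit of arithmetic. Here's a rough sketch, not anything from Nvidia: it assumes a 1075MHz boost clock for both cards, four pixels per SM per clock (88 divided by 22), and 24 SMs for the Titan X. None of those three numbers appears in the table, and the function name is made up for illustration.

```python
def peak_fill_rate_gpixels(rops, sms, clock_ghz, pixels_per_sm=4):
    """Peak pixel fill rate in Gpixels/s.

    The rate is capped by whichever is narrower: the ROPs, or the
    number of pixels the SMs can hand to the ROPs each clock.
    pixels_per_sm=4 is an assumption inferred from 88 pixels / 22 SMs.
    """
    pixels_per_clock = min(rops, sms * pixels_per_sm)
    return pixels_per_clock * clock_ghz


# Assumed specs: 1.075 GHz boost clock for both, 24 SMs on the Titan X.
gtx_980_ti = peak_fill_rate_gpixels(rops=96, sms=22, clock_ghz=1.075)
titan_x = peak_fill_rate_gpixels(rops=96, sms=24, clock_ghz=1.075)

print(f"GTX 980 Ti: {gtx_980_ti:.1f} Gpixels/s")
print(f"Titan X:    {titan_x:.1f} Gpixels/s")
```

Under those assumptions, the results round to the 95 and 103 Gpixels/s figures listed in the table.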

This test nicely illustrates the impact of color compression on memory bandwidth. Newer GeForces based on the Maxwell architecture are able to extract substantially more throughput from the easily compressible black texture than the Kepler-based GTX 780 Ti does.

Meanwhile, as Andrew Lauritzen pointed out to us, the Radeon R9 290X doesn't show any compression benefits in this test because it's primarily limited by its ROP throughput. We may have to rejigger this test to sidestep that ROP limitation. I suspect, if we did so, we'd see some benefits from color compression on the Radeon, as well.

Most of these GPUs come incredibly close to matching their peak theoretical filtering rates in this test, the GTX 980 Ti included.

The GeForce cards all somewhat exceed their theoretical peaks in the polygon throughput test. My best guess is that they're able to operate at higher-than-usual clock speeds during this directed test—either that or they're warping the fabric of space and time, but I don't think that feature has been implemented yet.

Looks to me like the GTX 980 Ti is also exceeding its "GPU Boost" clock in our ALU math test, where it scores slightly higher than its 6.1 teraflops theoretical max. Nvidia's Boost clock is just a typical operating speed, not a maximum frequency, so this isn't a huge surprise. Notice, though, that the Titan X tops out at exactly 6.6 teraflops, no higher than expected. The 980 Ti's lower GPU voltage probably gives it an edge over our early-model Titan X.
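To see how an ALU score maps back to a clock speed, here's a minimal sketch. It assumes each shader ALU retires one fused multiply-add (two flops) per clock, and it uses shader counts of 2816 for the GTX 980 Ti and 3072 for the Titan X along with a 1075MHz boost clock. Those specs aren't stated in this section, and both helper names are hypothetical.

```python
def peak_tflops(alus, clock_ghz):
    """Theoretical peak in teraflops, assuming 2 flops (one FMA) per ALU per clock."""
    return 2 * alus * clock_ghz / 1000.0


def implied_clock_mhz(tflops, alus):
    """Clock speed in MHz that a given teraflops figure would imply."""
    return tflops * 1e12 / (2 * alus) / 1e6


# Assumed shader counts and boost clock (not given in the table).
print(f"GTX 980 Ti peak: {peak_tflops(2816, 1.075):.2f} tflops")
print(f"Titan X peak:    {peak_tflops(3072, 1.075):.2f} tflops")

# Working backward from the table's 6.6 tflops figure for the Titan X.
print(f"Titan X implied clock: {implied_clock_mhz(6.6, 3072):.0f} MHz")
```

Any measured score above the theoretical peak implies a clock above the Boost figure, which is the inference being made about the 980 Ti here.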