I was really under the gun when I was trying to finish up my GeForce GTX 970 and 980 review. As a result, I wasn't able to track down the cause of an interesting anomaly in my test results. Have a look at the theoretical peak pixel fill rate of the GTX 970 and 980 reference cards (along with the Asus Strix 970 card we tested) based on the GPU's active ROP count and clock speed:
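| | Peak pixel fill rate (Gpixels/s) | Peak bilinear filtering int8/fp16 (Gtexels/s) | Peak shader arithmetic rate (tflops) | Peak rasterization rate (Gtris/s) | Memory bandwidth (GB/s) |
|---|---|---|---|---|---|
| GeForce GTX 970 | 75 | 123/123 | 3.9 | 4.7 | 224 |
| Asus Strix GTX 970 | 80 | 130/130 | 4.2 | 5.0 | 224 |
| GeForce GTX 980 | 78 | 156/156 | 5.0 | 4.9 | 224 |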
On paper, the GTX 970 ought to be nearly as fast on this front as the 980—and the Asus Strix card ought to be a smidgen faster. The 3DMark color fill test we use has evidently been limited by memory bandwidth at times in the past, but that shouldn't be an issue since all three cards in question have the exact same memory config.
Look at what happened, however, when I ran that synthetic fill rate test:
Despite having superior or equal numbers on paper, the Asus Strix 970 couldn't come close to matching the GTX 980's delivered pixel throughput. I promptly raised an eyebrow upon seeing these results, but I didn't have time to investigate the issue any further.
Then, last week, an email hit my inbox from Damien Triolet at Hardware.fr, one of the best GPU reviewers in the business. He offered a clear and concise explanation for these results—and in the process, he politely pointed out why our numbers for GPU fill rates have essentially been wrong for a while. Damien graciously agreed to let me publish his explanation:
For a while, I've thought I should drop you an email about some pixel fillrate numbers you use in the peak rates tables for GPUs. Actually, most people got those numbers wrong, as Nvidia is not crystal clear about those kinds of details unless you ask very specifically.
The pixel fillrate can be linked to the number of ROPs for some GPUs, but it’s been limited elsewhere for years for many Nvidia GPUs. Basically there are three levels that might have a say in what the peak fillrate is:
- The number of rasterizers
- The number of SMs
- The number of ROPs
On both Kepler and Maxwell, each SM appears to use a 128-bit datapath to transfer pixel color data to the ROPs. That data appears to be converted from FP32 to the actual pixel format before being transferred to the ROPs. With classic INT8 rendering (32 bits per pixel), it means each SM has a throughput of 4 pixels/clock. With HDR FP16 (64 bits per pixel), each SM has a throughput of 2 pixels/clock.
On Kepler each rasterizer can output up to 8 pixels/clock. With Maxwell, the rate goes up to 16 pixels/clock (at least with the currently released Maxwell GPUs).
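The per-unit arithmetic above is easy to check. Here is a minimal Python sketch of it, assuming the 128-bit per-SM datapath and the per-rasterizer rates quoted above; the function and constant names are made up for illustration and don't come from Nvidia.

```python
# Width of each SM's path to the ROPs, per the explanation above.
SM_DATAPATH_BITS = 128

# Pixels/clock each rasterizer can output, per the explanation above.
RASTERIZER_PIXELS_PER_CLOCK = {"Kepler": 8, "Maxwell": 16}


def sm_pixels_per_clock(bits_per_pixel: int) -> int:
    """Pixels one SM can hand to the ROPs per clock for a given pixel format."""
    return SM_DATAPATH_BITS // bits_per_pixel


if __name__ == "__main__":
    print("INT8 (32 bpp):", sm_pixels_per_clock(32), "pixels/clock per SM")
    print("FP16 (64 bpp):", sm_pixels_per_clock(64), "pixels/clock per SM")
```

The wider the pixel format, the fewer pixels fit through the same 128-bit path each clock, which is why FP16 rendering halves the per-SM rate.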
So the actual pixels/cycle peak rate when you look at all the limits (rasterizers/SMs/ROPs) would be:
GTX 750: 16/16/16
GTX 750 Ti: 16/20/16
GTX 760: 32/24/32 or 24/24/32 (as there are 2 die configuration options)
GTX 770: 32/32/32
GTX 780: 40/48/48 or 32/48/48 (as there are 2 die configuration options)
GTX 780 Ti: 40/60/48
GTX 970: 64/52/64
GTX 980: 64/64/64
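Each triple in that list can be rebuilt from unit counts, and the smallest of the three numbers is the one that caps peak fill rate. The sketch below is only an illustration: the unit counts (rasterizers, SMs, ROPs) are my assumptions, chosen to match the triples above, and the one-pixel-per-clock-per-ROP figure and 32-bit pixels are assumptions too.

```python
def peak_limits(rasterizers, sms, rops, raster_rate, sm_rate=4, rop_rate=1):
    """Return (rasterizer, SM, ROP) limits in pixels/clock for 32-bit pixels."""
    return (rasterizers * raster_rate, sms * sm_rate, rops * rop_rate)


def peak_pixels_per_clock(limits):
    """The tightest of the three limits sets the peak fill rate."""
    return min(limits)


# Assumed unit counts; raster_rate is 8 for Kepler and 16 for Maxwell.
GPUS = {
    "GTX 780 Ti": dict(rasterizers=5, sms=15, rops=48, raster_rate=8),
    "GTX 970": dict(rasterizers=4, sms=13, rops=64, raster_rate=16),
    "GTX 980": dict(rasterizers=4, sms=16, rops=64, raster_rate=16),
}

if __name__ == "__main__":
    for name, units in GPUS.items():
        limits = peak_limits(**units)
        print(name, "/".join(map(str, limits)), "->", peak_pixels_per_clock(limits))
```

Under those assumptions, the GTX 970 is capped by its SMs, the GTX 780 Ti by its rasterizers, and the GTX 980 hits all three limits at once.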
Extra ROPs are still useful to get better efficiency with MSAA and so on. But they don’t participate in the peak pixel fillrate.
That’s in part what explains the significant fillrate delta between the GTX 980 and the GTX 970 (as you measured it in 3DMark Vantage). There is another reason, which seems to be that unevenly configured GPCs are less efficient at splitting huge triangles (as is usually the case with fillrate tests).
So the GTX 970's peak potential pixel fill rate isn't as high as the GTX 980's, in spite of the fact that they share the same ROP count, because the key limitation resides elsewhere. When Nvidia hobbles the GTX 970 by disabling SMs, the effective pixel fill rate suffers.
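For a rough sense of scale, the ROP-based numbers from the table at the top can be rescaled by the tightest limit. This is a back-of-the-envelope sketch, not a measurement: it assumes the clock speeds implied by the table and accounts only for the SM limit, not for the GPC inefficiency Damien mentions, so real-world results may differ. The helper name is hypothetical.

```python
# ROP-based peak fill rates (Gpixels/s) from the table at the top.
ROP_BASED_GPIXELS = {
    "GeForce GTX 970": 75,
    "Asus Strix GTX 970": 80,
    "GeForce GTX 980": 78,
}

# (rasterizers, SMs, ROPs) limits in pixels/clock from Damien's list.
LIMITS = {
    "GeForce GTX 970": (64, 52, 64),
    "Asus Strix GTX 970": (64, 52, 64),
    "GeForce GTX 980": (64, 64, 64),
}


def adjusted_fill_rate(rop_based_gpixels, limits):
    """Scale a ROP-based rate down to the tightest of the three limits."""
    rop_limit = limits[2]
    return rop_based_gpixels * min(limits) / rop_limit


if __name__ == "__main__":
    for card, rate in ROP_BASED_GPIXELS.items():
        print(f"{card}: {adjusted_fill_rate(rate, LIMITS[card]):.1f} Gpixels/s")
```

By this estimate, both GTX 970 cards fall well below the GTX 980 despite their similar ROP-based figures.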
That means, among other things, that I need to build a much more complicated spreadsheet for figuring these things out. It also means paying extra for a GTX 980 could be the smart move if you plan to use that graphics card to drive a 4K display—or to use DSR at a 4X factor like we recently explored. That said, the GTX 970 is still exceptionally capable, especially given the clock speed leeway the GM204 GPU appears to offer.
Thanks to Damien for enlightening us—and for solving a puzzle in our results that I hadn't yet had time to investigate.