Here’s another reason the GeForce GTX 970 is slower than the GTX 980

I was really under the gun when I was trying to finish up my GeForce GTX 970 and 980 review. As a result, I wasn't able to track down the cause of an interesting anomaly in my test results. Have a look at the theoretical peak pixel fill rate of the GTX 970 and 980 reference cards (along with the Asus Strix 970 card we tested) based on the GPU's active ROP count and clock speed:

                       Peak pixel    Peak bilinear   Peak shader   Peak            Memory
                       fill rate     filtering       arithmetic    rasterization   bandwidth
                       (Gpixels/s)   int8/fp16       rate          rate            (GB/s)
                                     (Gtexels/s)     (tflops)      (Gtris/s)
GeForce GTX 970        75            123/123         3.9           4.7             224
Asus Strix GTX 970     80            130/130         4.2           5.0             224
GeForce GTX 980        78            156/156         5.0           4.9             224

On paper, the GTX 970 ought to be nearly as fast on this front as the 980—and the Asus Strix card ought to be a smidgen faster. The 3DMark color fill test we use has evidently been limited by memory bandwidth at times in the past, but that shouldn't be an issue since all three cards in question have the exact same memory config.
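For reference, the "on paper" pixel fill figures above come from multiplying each card's active ROP count by its clock speed. Here's a minimal sketch of that arithmetic; the boost clock values are the cards' rated specs, filled in here for illustration since the table lists only the results.

```python
# A minimal sketch of where the "on paper" numbers come from: active ROPs
# multiplied by the rated boost clock. The boost clocks (in MHz) are the
# cards' published specs, included here as assumptions for illustration.
cards = {
    "GeForce GTX 970":    {"rops": 64, "boost_mhz": 1178},
    "Asus Strix GTX 970": {"rops": 64, "boost_mhz": 1253},
    "GeForce GTX 980":    {"rops": 64, "boost_mhz": 1216},
}

for name, spec in cards.items():
    gpixels_per_s = spec["rops"] * spec["boost_mhz"] / 1000  # Gpixels/s
    print(f"{name}: {gpixels_per_s:.0f} Gpixels/s")
# -> roughly 75, 80, and 78 Gpixels/s, matching the table above
```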

Look at what happened, however, when I ran that synthetic fill rate test:

Despite having superior or equal numbers on paper, the Asus Strix 970 couldn't come close to matching the GTX 980's delivered pixel throughput. I promptly raised an eyebrow upon seeing these results, but I didn't have time to investigate the issue any further.

Then, last week, an email hit my inbox from Damien Triolet at Hardware.fr, one of the best GPU reviewers in the business. He offered a clear and concise explanation for these results—and in the process, he politely pointed out why our numbers for GPU fill rates have essentially been wrong for a while. Damien graciously agreed to let me publish his explanation:

For a while, I've been thinking I should drop you an email about the pixel fillrate numbers you use in the peak rates tables for GPUs. Actually, most people got those numbers wrong, as Nvidia is not crystal clear about those kinds of details unless you ask very specifically.

The pixel fillrate can be linked to the number of ROPs for some GPUs, but for many Nvidia GPUs it has been limited elsewhere for years. Basically, there are three levels that might have a say in what the peak fillrate is:

  • The number of rasterizers
  • The number of SMs
  • The number of ROPs

On both Kepler and Maxwell, each SM appears to use a 128-bit datapath to transfer pixel color data to the ROPs. Those appear to be converted from FP32 to the actual pixel format before being transferred to the ROPs. With classic INT8 rendering (32 bits per pixel), that means each SM has a throughput of 4 pixels/clock. With HDR FP16 (64 bits per pixel), each SM has a throughput of 2 pixels/clock.

On Kepler each rasterizer can output up to 8 pixels/clock. With Maxwell, the rate goes up to 16 pixels/clock (at least with the currently released Maxwell GPUs).

So the actual pixels/cycle peak rate, when you look at all the limits (rasterizers/SMs/ROPs), would be:

GTX 750: 16/16/16
GTX 750 Ti: 16/20/16
GTX 760: 32/24/32 or 24/24/32 (as there are 2 die configuration options)
GTX 770: 32/32/32
GTX 780: 40/48/48 or 32/48/48 (as there are 2 die configuration options)
GTX 780 Ti: 40/60/48
GTX 970: 64/52/64
GTX 980: 64/64/64

Extra ROPs are still useful to get better efficiency with MSAA and so on. But they don't participate in the peak pixel fillrate.

That's in part what explains the significant fillrate delta between the GTX 980 and the GTX 970 (as you measured it in 3DMark Vantage). There is another reason, which seems to be that unevenly configured GPCs are less efficient at splitting huge triangles (as is usually the case with fillrate tests).

So the GTX 970's peak potential pixel fill rate isn't as high as the GTX 980's, in spite of the fact that they share the same ROP count, because the key limitation resides elsewhere. When Nvidia hobbles the GTX 970 by disabling SMs, the effective pixel fill rate suffers.
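To put rough numbers on that, here's a minimal sketch of the arithmetic Damien describes: take the smallest of the three per-clock limits (rasterizers, SMs, ROPs) and multiply by the clock. The four-INT8-pixels-per-clock-per-SM figure comes from his email; the boost clocks are the cards' rated specs, assumed here for illustration.

```python
# A rough sketch of the limits Damien describes, assuming four INT8 pixels per
# clock per Maxwell SM and the cards' rated boost clocks (assumed values for
# illustration). Peak throughput is set by the slowest of the three stages.
def effective_fill_rate(rasterizer_px_per_clk, sm_count, rop_count, boost_mhz):
    px_per_clock = min(rasterizer_px_per_clk, sm_count * 4, rop_count)
    return px_per_clock * boost_mhz / 1000  # Gpixels/s

print(effective_fill_rate(64, 13, 64, 1178))  # GTX 970: ~61 Gpixels/s, not 75
print(effective_fill_rate(64, 16, 64, 1216))  # GTX 980: ~78 Gpixels/s
```

By this reckoning, the GTX 970's 13 SMs, not its 64 ROPs, set its pixel ceiling, which roughly lines up with the gap the 3DMark test exposed.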

That means, among other things, that I need to build a much more complicated spreadsheet for figuring these things out. It also means paying extra for a GTX 980 could be the smart move if you plan to use that graphics card to drive a 4K display—or to use DSR at a 4X factor like we recently explored. That said, the GTX 970 is still exceptionally capable, especially given the clock speed leeway the GM204 GPU appears to offer.

Thanks to Damien for enlightening us—and for solving a puzzle in our results that I hadn't yet had time to investigate.

Comments closed
    • buzukh
    • 5 years ago

    nice information

    • Moses148
    • 5 years ago

    Damn this article, I was so set on the 970 being the better thing to buy (given the price difference), but now the doubts are back. So which card would you guys recommend?

      • 6GTX9
      • 5 years ago

      Just a little over a month ago, the 780 Ti was sitting at the top of the heap. In comes the 970, at a cheaper price point, beats out the 780 Ti in some if not the majority of benchmarks. And you seriously think you’re making a mistake in buying a 970?

      1st world problems….

    • ewitte
    • 5 years ago

    Upgraded from a 770 to a 970 and I'm usually seeing at least a 30% fps increase at 4K. With my settings in Tomb Raider, for instance, the in-game bench went from 44 to 63. Tried on my 1440p screen and it was like 127 😀

    • spugm1r3
    • 5 years ago

    You know, working in engineering, I’m always impressed when someone understands the technical aspect of their job to such a degree, they can roll off a pretty hefty subject as if they were talking about last night’s outing.

    I tend to speak a more colloquial style of explanation, usually including phrases like “bad news bears” and “pretty rad”. Then again, I explain engineering concepts to sales and marketing guys, so it fits.

    • itachi
    • 5 years ago

    Nice! I speak French (I'm from Switzerland), and every time I've looked at Hardware.fr reviews I've been impressed by their depth. Damien really knows his stuff. Good job, bien joué Damien :).

    • TwoEars
    • 5 years ago

    Hmm, I wonder how this ties into SLI and high-fps gaming, say 100-144Hz. Would there be a notable difference in favor of the 980?

    But I agree – the 970 is a price/performance monster no matter what.

      • JustAnEngineer
      • 5 years ago

      Newegg is already raising prices as the cards stay nearly-perpetually out of stock.

    • Anovoca
    • 5 years ago

    Thanks for the update. I am curious, though, since I know little about circuitry: when you mention they disable SMs, is that something that a more savvy techie than myself could crack open and re-enable? From what I got out of what I read, the components are all there, they just aren't being utilized.

      • Damage
      • 5 years ago

      In some cases, the disabled bits are disabled because of yield issues. They don’t all work well. Some do, I’m sure, but then there’s the problem of figuring out how to re-enable those units. I haven’t heard too many stories of success on this front lately.

        • Prestige Worldwide
        • 5 years ago

        Many 6950s could be flashed to be a 6970 rather easily if I remember correctly.

        It depends on whether the chips are binned, with defective chips being relegated to a lower SKU with the defective bits disabled to create a functioning chip, or disabling parts of a good chip for market segmentation purposes. Both have been known to happen.

      • mczak
      • 5 years ago

      The method of choice for disabling such things permanently nowadays is usually on-chip fuses rather than just card bios or driver. There’s no way to reverse this.
      (Note that sometimes, for various reasons, disabling units might be done only by the card BIOS, as in the case of the early HD 6950, in which case flashing a different BIOS can "fix" this. That's not really the norm, though early batches of new hardware probably have a higher chance of it, since the decision of how many units to switch off may not have been made early enough to fuse them off properly.)

    • XTF
    • 5 years ago

    [quote<]That means, among other things, that I need to build a much more complicated spreadsheet for figuring these things out.[/quote<] Or you could forget about such theoretical peaks and run (more) real-world benchmarks instead. 😉

      • UberGerbil
      • 5 years ago

      It’s always interesting (to me, at least) to see how efficient the design is — ie, how close it gets in practice to reaching its theoretical ceiling.

    • USAFTW
    • 5 years ago

    Damien Triolet is a genius. I have huge respect for him. He's the one who originally investigated the Crysis 2 uber-tessellation issue.

    • jihadjoe
    • 5 years ago

    Explanation makes sense. ROPs would be useless unless there's something to feed them, which in this case turned out to be the SMs. Either way, having 64 ROPs still gives mid-size Maxwell a huge advantage even over big Kepler.

    Kinda makes you wonder, though, why Nvidia didn't just disable 12 of the ROPs in the GTX 970, since they would be effectively useless anyway. I guess yields are high enough that they don't have to worry about it.

      • mczak
      • 5 years ago

      Disabling ROPs like that is impossible with Nvidia's design, since they are attached to the memory controllers (Kepler/Fermi: 8 ROPs per 64-bit MC; Maxwell: either 8 (GM107) or 16 (GM204) ROPs per 64-bit MC). Thus, disabling ROPs means disabling memory channels (which Nvidia does on lots of Keplers).
      FWIW, having 12 (out of 64) "excess" ROPs isn't all that bad; some Keplers have way more excess ROPs. For instance, GK107 (GTX 650) has 16 ROPs, but shader export is limited to 8 pixels max (though, as Damien said, such ROPs may still be useful).
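
      To illustrate that coupling with rough numbers, here's a hedged sketch assuming GM204's 16 ROPs per 64-bit memory controller and a 7 GT/s GDDR5 data rate (the data rate and function name are illustrative assumptions, not figures from the comment):

      ```python
      # A hedged sketch of the GM204 coupling described above: 16 ROPs ride on
      # each 64-bit memory controller, so dropping a ROP partition also drops a
      # memory channel. The 7 GT/s GDDR5 data rate is assumed for illustration.
      def gm204_config(active_partitions, rops_per_mc=16, mc_width_bits=64, gddr5_gtps=7.0):
          rops = active_partitions * rops_per_mc
          bus_width = active_partitions * mc_width_bits
          bandwidth = bus_width / 8 * gddr5_gtps  # GB/s
          return rops, bus_width, bandwidth

      print(gm204_config(4))  # (64, 256, 224.0): the shipping GTX 970/980 layout
      print(gm204_config(3))  # (48, 192, 168.0): the cost of cutting one ROP partition
      ```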

    • limitedaccess
    • 5 years ago

    How does this translate into performance differences for non-synthetic fill rate tests, however?

    In past reviews of GPUs using two different die configurations (such as the GTX 780), reviewers, not specific to this site but in general, all noted that the different rasterizer count should not affect actual game performance even if it does affect fill rate tests.

    Does this mean those claims need to be re-examined, since this article now theorizes that there could be performance impacts? I was actually surprised that, as far as I know anyway, no hardware reviewers examined that issue in depth. And are there added complications (such as frame latency issues) if two GPUs using different configurations are used in SLI?

    • trinibwoy
    • 5 years ago

    Nice, thanks for highlighting this. Damien brought this to light a few years ago but the word has taken a long time to spread.

    The number of SMs also influences geometry performance, as they're responsible for certain primitive setup tasks.

    • auxy
    • 5 years ago

    This is fascinating! I have thought for some time now that the pixel fill of the Geforce parts was overstated, but it was just a feeling; I couldn’t justify saying so. Now it feels great to be vindicated! ( `ー´)ノ

      • I.S.T.
      • 5 years ago

      I don't think you really understand what they were saying… The fillrate was mismeasured by non-Nvidia folks because there are three components to it rather than just one, which is what everybody else assumed. The GTX 980, on the other hand? Because it's not partially disabled like the GTX 970, its measured fillrate is closer to the practical peak.

    • ImSpartacus
    • 5 years ago

    Always good to see some humility and some additional education for the audience. I love it.

    • Meadows
    • 5 years ago

    So, would this mean (in simple terms) that both cards are equally good at AA modes and effects and the works, but the 980 should do better at high megapixel counts in general, because that’s a separate issue to the GPU?

    Would this then also mean that, for example, DSR performance could suffer on the 970?

    • hansmuff
    • 5 years ago

    Always very nice to see sites play together. Props to the hardware.fr guys!

    • weaktoss
    • 5 years ago

    I followed the link to Hardware.fr–it’s interesting to read tech terms in other languages. The non-idiomatic literal translations end up sounding strange, even though they’re really not much weirder than the English terms to begin with.

    processeurs à 2 cœurs – “processors with two hearts” (dual core processors)

    Makes total sense, but still sounds so alien. Maybe if I had more French than just forgotten high school courses, it wouldn’t feel so weird.

      • UberGerbil
      • 5 years ago

      My favorite is Icelandic, where they try to use existing Icelandic (or Old Norse) words that have fallen out of use or can be adapted. For example: computer is “tölva,” a portmanteau composed from the words “tala” (number) and “völva” (prophetess).

      (If you want to have some fun, try using [url=https://translate.google.com/?ie=UTF-8&hl=en&client=tw-ob#auto/is/computer%20processor<]Google Translate[/url<] to speak some of these Icelandic terms for you).

        • odizzido
        • 5 years ago

        Number prophet? I like it.

          • UberGerbil
          • 5 years ago

          I know, right? Sometimes I like to pause with my finger above the power button, and intone
          “Oh prophet of numbers, oh seer of digits, conjure for me, something prodigious!”
          {boot screen}

          And when something crashes, you can yell “You blind prophet [i<]bitch[/i<]!" I really wish I could learn some Icelandic swear words, but I could never pronounce them.

            • spugm1r3
            • 5 years ago

            I laughed pretty hard at this.

            • ronch
            • 5 years ago

            I once asked my Prophet of Numbers to give me some lottery numbers, and in return I promised to enshrine it if the numbers came up. Not sure it made the right prophecy though, or at least the prophecy hasn't been fulfilled yet.

        • Wirko
        • 5 years ago

        If you happen to speak Swedish, [url=https://www.acc.umu.se/~widmark/java/chock/idiom.html<]here's[/url<] a short dictionary of idioms you may find interesting.

        • Alexko
        • 5 years ago

        So Java is developed by Oracle using number prophetesses, eh?

      • tipoo
      • 5 years ago

      I must admit, Exynos 5 Eight-Hearts sounds pretty epic. Final monster in a game.

      • Meadows
      • 5 years ago

      I finally understand the ASUS slogan.

        • derFunkenstein
        • 5 years ago

        Heart-touching, indeed.

          • l33t-g4m3r
          • 5 years ago

          Surgery Simulator FTW!

      • Wirko
      • 5 years ago

      In Spanish it’s procesador de doble núcleo. Sounds nuclear but still makes total sense.

    • AssBall
    • 5 years ago

    Good deal, and thanks for the update. The results on fill rate were confusingly high. Glad there is TR (and friends in France) to help clear the muddy waters of GPU performance.

      • JustAnEngineer
      • 5 years ago

      This is a good explanation to help understand the differences in these GPUs that NVidia’s evil marketing geniuses work so hard to obfuscate.

        • Buzzard44
        • 5 years ago

        Hmm…based strictly on observation, I don’t think many marketing geniuses get to that level of technical detail.

        Usually they’d come up with something like:

        “Super high-def 4K blazing fast support! More than 50x faster than the competition!*”

        <super small print> Compared against an AMD Radeon 5450 </super small print>

          • UnfriendlyFire
          • 5 years ago

          Or sometimes they outright lie:

          78% color accuracy!*

          *No mentions about yellow being rendered as mustard-green due to the Pentile display’s inherent drawback
