Testing Turing content-adaptive shading with Wolfenstein II: The New Colossus

Nvidia’s RTX ray-tracing acceleration stack might be getting the lion’s share of press now that Battlefield V has helped debut the technology, but the Turing architecture is packed full of other fun tricks to help developers optimize the rendering performance of their applications. One of these is called variable-rate shading, or VRS. This Turing capability allows graphics cards built with that architecture to put their shading power where it’s needed most in a frame, rather than shading every pixel naively. That naive approach might result in a graphics card “wasting” compute resources to arrive at the same result for each of a broad swath of uniformly colored pixels, for just one example, or working to shade a finely-detailed region of the screen that’s then blurred by motion.

An illustration of the pixel groupings possible with Turing VRS, and how they might be applied to a scene. Source: Nvidia

To let shader power go where it’s needed most, VRS allows developers to subdivide a 3D application’s screen space into a grid of 16×16 pixel regions (among other possible division techniques, ranging from as fine as a per-triangle division or, more logically, per-object). According to Nvidia’s SIGGRAPH presentation on the topic, each of those 16×16-pixel regions can, in turn, have one of several shading rates applied to it, from the naive 1×1 grid that would typically be used to shade every pixel in a scene today, to 2×2 groups of pixels for 1/4-rate shading, all the way out to a quite-coarse set of 4×4 grids for 1/16-rate shading. According to that presentation, then, the parts of a scene that need the most detail can get it via 1×1 shading, while those that are the most uniform or the most likely to be in motion can be shaded at the lowest rates to save shader horsepower for the parts of the scene that need it.
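To make that granularity concrete, here's a rough sketch of the bookkeeping involved. It isn't Nvidia's API, just an illustration of how a 3840×2160 frame breaks down into 16×16-pixel tiles, each tagged with one of the shading rates described above:

```cpp
#include <cstdio>
#include <vector>

// Illustrative only: the square shading rates discussed above, expressed as
// the pixel footprint covered by a single shading result.
enum class ShadingRate { Full1x1, Quarter2x2, Sixteenth4x4 };

int main() {
    const int width = 3840, height = 2160, tile = 16;
    const int tilesX = (width + tile - 1) / tile;   // 240 tiles across
    const int tilesY = (height + tile - 1) / tile;  // 135 tiles down

    // One shading-rate entry per 16x16 tile; default to full per-pixel shading.
    std::vector<ShadingRate> rateMap(tilesX * tilesY, ShadingRate::Full1x1);

    // A renderer (or the content-adaptive analysis described below) would
    // coarsen tiles it deems uniform or fast-moving; here we pretend the top
    // quarter of the frame is featureless sky.
    for (int ty = 0; ty < tilesY / 4; ++ty)
        for (int tx = 0; tx < tilesX; ++tx)
            rateMap[ty * tilesX + tx] = ShadingRate::Sixteenth4x4;

    std::printf("%d x %d tiles, %zu rate-map entries\n",
                tilesX, tilesY, rateMap.size());
}
```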

This approach could prove most useful for VR, where foveated rendering demands the ability to subdivide the scene into different regions of rendering resolution, but it has other applications in traditional 3D graphics, as well. VRS and the variety of subdivisions of a scene it allows for are the foundation for a technique called content-adaptive shading, or CAS. Here’s how it works, according to the Nvidia Turing architecture white paper:

In Content Adaptive Shading, shading rate is simply lowered by considering factors like spatial and temporal (across frames) color coherence. The desired shading rate for different parts of the next frame to be rendered are computed in a post-processing step at the end of the current frame. If the amount of detail in a particular region was relatively low (sky or a flat wall etc.), then the shading rate can be locally lowered in the next frame. The output of the post-process analysis is a texture specifying a shading rate per 16 x 16 tile, and this texture is used to drive shading rate in the next frame. A developer can implement content-based shading rate reduction without modifying their existing pipeline, and with only small changes to their shaders.
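For a rough sense of what that analysis pass might look like, here's a CPU-side sketch. The real pass runs on the GPU as a post-process and also weighs temporal coherence; the luminance-variance test and thresholds below are our own stand-ins, not Nvidia's actual algorithm:

```cpp
#include <vector>

enum class ShadingRate { Full1x1, Quarter2x2, Sixteenth4x4 };

// Hypothetical CAS-style analysis: measure how much luminance varies inside
// each 16x16 tile of the finished frame and pick a coarser rate for the next
// frame wherever the content is flat. Thresholds are arbitrary placeholders.
std::vector<ShadingRate> analyzeFrame(const std::vector<float>& luma,
                                      int width, int height) {
    const int tile = 16;
    const int tilesX = width / tile, tilesY = height / tile;
    std::vector<ShadingRate> rates(tilesX * tilesY, ShadingRate::Full1x1);

    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            float mean = 0.0f, var = 0.0f;
            for (int y = 0; y < tile; ++y)
                for (int x = 0; x < tile; ++x)
                    mean += luma[(ty * tile + y) * width + tx * tile + x];
            mean /= tile * tile;
            for (int y = 0; y < tile; ++y)
                for (int x = 0; x < tile; ++x) {
                    const float d = luma[(ty * tile + y) * width + tx * tile + x] - mean;
                    var += d * d;
                }
            var /= tile * tile;

            // Flat tiles (sky, bare walls) get coarse shading next frame;
            // detailed tiles keep per-pixel shading.
            if (var < 0.0001f)
                rates[ty * tilesX + tx] = ShadingRate::Sixteenth4x4;
            else if (var < 0.001f)
                rates[ty * tilesX + tx] = ShadingRate::Quarter2x2;
        }
    }
    return rates;  // in the real pipeline, this becomes the per-tile rate texture
}

int main() {
    const int w = 3840, h = 2160;
    std::vector<float> luma(static_cast<size_t>(w) * h, 0.18f);  // a flat gray frame
    auto rates = analyzeFrame(luma, w, h);  // every tile should come back coarse
    (void)rates;
}
```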

Wolfenstein II: The New Colossus is the first game to implement CAS, and it is in fact the poster child for how the technique's analysis step works in the aforementioned white paper. Nvidia provides a visualized example of the result of its post-processing analysis on a frame taken aboard the good guys' submarine from the game, Eva's Hammer.

Source: Nvidia

In the figure above, the red squares represent the parts of the scene that the algorithm determined suitable for shading with the coarsest rates, while those with no color overlay get the finest 1×1 (or per-pixel) shading rate needed to reproduce fine detail or detail in motion. The radar and computer displays aboard Eva's Hammer are, in our experience, usually in motion, so it's no surprise that they get the finest shading rates in the example above.

Source: Nvidia

Despite the differences in shading rates applied to the various parts of a given scene by this technique, Nvidia believes it produces results that are practically equivalent to naively shading every pixel on screen. Wolfenstein II: The New Colossus just got a patch that enables CAS on Turing cards, so we can both test the performance of the technique for ourselves and see how it looks in practice.  Let’s dive in.

 

Our testing methods

If you’re new to The Tech Report, we don’t benchmark games like most other sites on the web. Instead of throwing out a simple FPS average (or even average and minimum FPS figures)—numbers that tell us only the broadest strokes of what it’s like to play a game on a particular graphics card—we can go much deeper. We capture the amount of time it takes the graphics card to render each and every frame of animation before slicing and dicing those numbers with our own custom-built tools. We call this method Inside the Second, and we think it’s the industry standard for quantifying graphics performance. Accept no substitutes.

What’s more, we don’t rely on canned in-game benchmarks—routines that may not be representative of performance in actual gameplay—to gather our test data. Instead of clicking a button and getting a potentially misleading result from those pre-baked benches, we go through the laborious work of seeking out test scenarios that are typical of what one might actually encounter in a game. Thanks to our use of manual data-collection tools, we can go pretty much anywhere and test pretty much anything we want in a given title.

Most of the frame-time data you’ll see on the following pages were captured with OCAT, a software utility that uses data from the Event Tracing for Windows API to tell us when critical events happen in the graphics pipeline. We perform each test run at least three times and take the median of those runs where applicable to arrive at a final result. Where OCAT didn’t suit our needs, we relied on the PresentMon utility.
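For the curious, the reduction from those captures to chart-ready numbers is conceptually simple. Here's a rough sketch, not our actual tooling, that assumes per-frame times in milliseconds have already been pulled out of the OCAT or PresentMon logs:

```cpp
#include <algorithm>
#include <vector>

// Per-run 99th-percentile frame time: sort the run's frame times and take the
// value 99% of the way up the distribution.
double percentile99(std::vector<double> frameTimesMs) {
    std::sort(frameTimesMs.begin(), frameTimesMs.end());
    const size_t idx = static_cast<size_t>(0.99 * (frameTimesMs.size() - 1));
    return frameTimesMs[idx];
}

// Median across (at least) three runs of the same test, as described above.
double medianOfRuns(std::vector<double> runResults) {
    std::sort(runResults.begin(), runResults.end());
    return runResults[runResults.size() / 2];
}

int main() {
    // Made-up example values: one 99th-percentile result per test run.
    const std::vector<double> runResults{13.9, 14.2, 14.0};
    const double reported = medianOfRuns(runResults);  // 14.0 ms would be charted
    (void)reported;
}
```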

We tested Wolfenstein II: The New Colossus at 3840×2160 using its “Mein Leben!” preset. The game provides fine-grained control over what it calls “Nvidia Adaptive Shading,” although we imagine most people will simply want to choose among the three presets on offer: “Balanced,” “Performance,” and “Quality.” In fact, that’s exactly what we did for our testing.

As ever, we did our best to deliver clean benchmark numbers. Each test was run at least three times, and we took the median of each result. Our test system was configured like so:

Processor: Intel Core i9-9980XE
Motherboard: Asus Prime X299-Deluxe II
Chipset: Intel X299
Memory size: 32 GB (4x 8 GB)
Memory type: G.Skill Trident Z DDR4-3200
Memory timings: 14-14-14-34 2T
Storage: Intel 750 Series 400 GB NVMe SSD (OS), Corsair Force LE 960 GB SATA SSD (games)
Power supply: Seasonic Prime Platinum 1000 W
OS: Windows 10 Pro with October 2018 Update (ver. 1809)

We used the following graphics cards for our testing, as well:

Graphics card | Graphics driver | Boost clock speed (nominal) | Memory data rate (per pin)
Nvidia GeForce RTX 2080 Ti Founders Edition | GeForce Game Ready 416.94 | 1635 MHz | 14 Gbps
Gigabyte GeForce RTX 2080 Gaming OC 8G | GeForce Game Ready 416.94 | 1815 MHz | 14 Gbps
Asus ROG Strix GeForce RTX 2070 O8G | GeForce Game Ready 416.94 | 1815 MHz | 14 Gbps

Thanks to Intel, Corsair, Gigabyte, G.Skill, and Asus for helping to outfit our test rigs with some of the finest hardware available. Nvidia, Gigabyte, and Asus supplied the graphics cards we used for testing, as well. Have a gander at our fine Asus motherboard before it got buried beneath a pile of graphics cards and a CPU cooler:

And a look at our spiffy Gigabyte GeForce RTX 2080, seen in the background here:

And our Asus ROG Strix GeForce RTX 2070, which just landed in the TR labs:

With those formalities out of the way, let’s get to testing.

 

Wolfenstein II: The New Colossus (3840×2160)


At least in the case of Wolfenstein II—already an incredibly well-optimized and swift-running game—content-adaptive shading provides anywhere from two to five more FPS on average for the RTX 2070, three to six more FPS on average for the RTX 2080, and one to six more FPS on average for the RTX 2080 Ti, depending on whether one chooses the quality, balanced, or performance preset in the game’s advanced options. 99th-percentile frame times come down correspondingly at every preset, but the technique isn’t going to turn an RTX 2070 into an RTX 2080, for example, or an RTX 2080 into an RTX 2080 Ti.

What’s really impressive, though, is that even to my jaded graphics-reviewer eye, I saw practically no difference in image quality at 4K when moving between each preset. I would happily run the balanced or performance presets for this feature anywhere it was available. If there’s a catch to having CAS on at this resolution, I didn’t see one, and the minor increases to average frame rates and decreases in 99th-percentile frame times are appreciated when trying to get the most out of a 4K display.

For those who can see a difference in content-adaptive shading settings and want to tune the experience, Nvidia offered some details on what each of the available parameters in Wolfenstein II controls and its potential effect on image quality:

The Custom preset offers fine-tuning settings. Motion influence controls how strongly motion lowers the shading rate; the right value depends on motion blur and TAA usage, the response time of the screen, and personal preference. A higher influence can be used if TAA is turned off.

Color difference sensitivity controls how sensitive the algorithm is to color differences between neighboring pixels.

Brightness sensitivity controls the sensitivity to screen brightness. Lower values unlock more performance but reduce image quality. In practice, the right values depend on room lighting, screen contrast and brightness, screen DPI, distance to the screen, and personal preference.
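For those wondering how such knobs tend to be represented under the hood, here's a purely hypothetical sketch. The names and values are ours, for illustration only, and don't come from the game or Nvidia's driver:

```cpp
// Hypothetical grouping of the CAS tuning parameters exposed in the game's
// Custom preset. Nothing here reflects Wolfenstein II's actual internals.
struct AdaptiveShadingSettings {
    float motionInfluence;            // how strongly motion relaxes the shading rate
    float colorDifferenceSensitivity; // threshold for neighboring-pixel color changes
    float brightnessSensitivity;      // lower values coarsen more, at some quality cost
};

// A "Balanced"-style preset would sit between Quality and Performance extremes;
// the in-game presets simply pick these values for you. Numbers are made up.
constexpr AdaptiveShadingSettings kBalancedExample{0.5f, 0.5f, 0.5f};
```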


To gain further insight into the improvements content-adaptive shading offers, we can turn to some more fine-grained charts. These “time spent beyond X” graphs are meant to show “badness,” those instances where animation may be less than fluid—or at least less than perfect. The formulas behind these graphs add up the amount of time our graphics card spends beyond certain frame-time thresholds, each with an important implication for gaming smoothness. Recall that our graphics-card tests all consist of one-minute test runs and that 1000 ms equals one second to fully appreciate this data.

The 50-ms threshold is the most notable one, since it corresponds to a 20-FPS average. We figure if you’re not rendering any faster than 20 FPS, even for a moment, then the user is likely to perceive a slowdown. 33 ms correlates to 30 FPS, or a 30-Hz refresh rate. Go lower than that with vsync on, and you’re into the bad voodoo of quantization slowdowns. 16.7 ms correlates to 60 FPS, that golden mark that we’d like to achieve (or surpass) for each and every frame.

To best demonstrate the performance of these powerful graphics cards, it’s useful to look at our three strictest graphs. 11.1 ms corresponds to 90 FPS. 8.3 ms corresponds to 120 FPS, the lower end of what we’d consider a high-refresh-rate monitor. We’ve recently begun including an even more demanding 6.94-ms mark that corresponds to the 144-Hz maximum rate typical of today’s high-refresh-rate gaming displays.
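Mechanically, these figures are just an accumulation over the frame-time log. Here's a rough sketch, not our actual tooling, using the thresholds above and made-up frame times:

```cpp
#include <cstdio>
#include <vector>

// Sum the time spent past a threshold: a 20-ms frame judged against a 16.7-ms
// threshold contributes 3.3 ms of "time spent beyond 16.7 ms."
double timeSpentBeyondMs(const std::vector<double>& frameTimesMs,
                         double thresholdMs) {
    double total = 0.0;
    for (double ft : frameTimesMs)
        if (ft > thresholdMs)
            total += ft - thresholdMs;
    return total;  // milliseconds accumulated over the one-minute run
}

int main() {
    // Made-up frame times (ms) standing in for a real one-minute capture.
    const std::vector<double> frameTimes{8.2, 9.1, 18.4, 7.9, 12.3};
    const double thresholds[] = {50.0, 33.3, 16.7, 11.1, 8.3, 6.94};
    for (double t : thresholds)
        std::printf("time beyond %.2f ms: %.2f ms\n",
                    t, timeSpentBeyondMs(frameTimes, t));
}
```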

Does CAS smooth out the worst frames these cards have to cope with, as less time spent beyond each of our frame-time thresholds would suggest? Perhaps, but as we noted before, the improvements are minor. At the 16.7-ms mark, the RTX 2070 goes from about a third of a second on frames that would drop its instantaneous rate below 60 FPS to about one-tenth of a second with the performance preset. That’s an improvement, to be sure, but it’s going from a minor concern to even less of a concern.

Flip to the 11.1-ms mark, and the improvements that CAS provides become somewhat more evident. First off, it’s always worth having CAS on, even at the quality preset, since its performance is simply better than the baseline of no CAS at all. From there, though, the deltas in the time our graphics cards spend past 11.1 ms on tough frames are hardly large. From CAS off to CAS’ performance preset, the RTX 2070 saves a little over two seconds of our one-minute test run. The RTX 2080 shaves off just under three seconds, and the RTX 2080 Ti goes from barely any time spent past this point to, well, even less time. We see similar small improvements for all cards at the 8.3-ms threshold, though the improvements are more meaningful for the RTX 2080 Ti. Welcome improvements, to be sure, but nothing game-changing.

Overall, content-adaptive shading is another intriguing Turing technology that seems to be in its infancy. None of the three Turing cards we have on the bench so far is lacking for shader power, and all of them are plenty capable of running Wolfenstein II at impressive frame rates even at 4K with maximum settings to start with. This testing leaves us curious what CAS might do for potential lower-end Turing cards, but for now, the tech is simply making great performance a little bit better. If you haven’t played Wolfenstein II through yet, or at all, on a Turing card, you can leave CAS enabled without any worries and enjoy its minor performance-improving benefits.

Comments closed
    • Mr Bill
    • 11 months ago

    Turing Power!, shaders in a half shell!

    • ara
    • 11 months ago

    I found this article interesting. Thanks.

    • cegras
    • 11 months ago

    Jeff, do you have data for CPU overhead? i.e. CPU utilization for CAS off, balanced, performance, and quality?

      • synthtel2
      • 11 months ago

      I’ll eat my hat if it isn’t effectively free.

      • Jeff Kampman
      • 11 months ago

      To my understanding, the CPU should have minimal involvement with this technique. As I noted in the article, the data used to drive CAS comes from a post-processing step on the GPU that mines information from the previous frame to determine the optimal shading rate for areas of pixels in the next frame. The CPU most likely isn’t involved in any of that bookkeeping.

        • cegras
        • 11 months ago

        I see that paragraph now, thanks for pointing that out. Curious that it is done on the GPU, but needs to have some awareness of a particular game. Can you speculate why?

          • Jeff Kampman
          • 11 months ago

          Why is it curious? That’s where all the frame data is…

            • cegras
            • 11 months ago

            Edit: From rereading the article, I guess that the game engine needs to be updated to make the correct calls to the driver, which call a baked in hardware functionality in Turing?

            • Srsly_Bro
            • 11 months ago

            Good to see you in the comments, Jeff. Great tone in your reply.

    • techguy
    • 11 months ago

    I’m as much of a fan of eye candy as the next guy, but if we’re being honest what we need most right now in graphics is more performance, not higher quality pixels. Therefore, I am wholeheartedly in favor of techniques such as this one, and DLSS. RTX is nice, but it’s not the reason I bought a 2080 Ti – it’s the raw performance.

    • Jeff Kampman
    • 11 months ago

    I’ve added some detail to the third page from Nvidia regarding what the various sliders exposed for CAS do in Wolfenstein II for the curious.

    • sweatshopking
    • 11 months ago

    Great job

    • jensend
    • 11 months ago

    This kind of thing keeps popping up in recent reviews (e.g. DLSS: https://techreport.com/discussion/34105/nvidia-geforce-rtx-2080-ti-graphics-card-reviewed?post=1090921):

    “What’s really impressive, though, is that even to my jaded graphics-reviewer eye, I saw practically no difference in image quality at 4K when moving between each preset.”

    That’s not really impressive anymore, sorry. At 4K and normal viewing angles, we are at the limits of human angular resolving power. Fine distinctions and diminishing returns dominate at the limits of human visual acuity, and so questions of image quality become more subtle.

    If a graphics card manufacturer started faking 4K by rendering at 1440p and using cheap intelligent upscaling, you probably wouldn’t easily notice the difference in FPS gameplay. That doesn’t by itself mean we should be willing to have performance comparisons where one competitor is rendering less than half the pixel count. Answering questions about whether something is an optimization which can be fairly compared to others’ methods or a “Quack” will take careful thought about objective procedures for image quality comparison. “Does it pass the laugh test for non-blind in-game knee-jerk responses” is not going to be enough any more.

      • jensend
      • 11 months ago

      Plenty of downvotes, but no one with any actual reasoning about why one would expect to be able to tell image quality impacts without careful testing.

      The Rayleigh criterion tells us that even if your eyes were somehow optically perfect you can’t resolve better than half an arcminute, just as a matter of the size of your pupil. Real world biology means more like .7 arcminutes at best. 4K at a 40 degree view (IMAX immersive recommendation) is already higher resolution than that.
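      Working that out: a 40-degree field is 40 × 60 = 2400 arcminutes, so 3840 horizontal pixels come to roughly 0.63 arcminutes per pixel, already finer than that ~0.7-arcminute figure.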

      When you’re at your perceptual limits, the scene is in motion, and you’re putting effort into a gaming task, there’s no way you’re going to be able to discern many quality changes with any kind of reliability just by going by your gut and no testing protocol. It’s not surprising in the least that at 4K manufacturers can do all kinds of tweaks and not get noticed that easily, nor is it really impressive.

      It takes good objective measures AND careful (double-blind, etc.) subjective testing to figure out the impact of differences. And should the facts from such testing all be available, it’d still leave more discussion about what comparisons are fair for performance testing.

      • jessterman21
      • 11 months ago

      I think you’re absolutely right – and unfortunately as I’m getting into my mid-30s my eyesight is already declining… (or could be I’m waiting until 10pm to play anything anymore). This sort of thing is fantastic and I think long overdue in PC games.

      My only concern with cutting corners like this is any noticeable blockiness with textures or shadows.

      • Jeff Kampman
      • 11 months ago

      I’m not really even sure what your complaint is, and if anything, it seems to reinforce the notion that we can put shader power to better use than shading every single pixel at native resolution on high-density displays. Computer graphics are rooted in hard math but they are put to use in service of art and human perception, and both of those problem spaces are squishy because every human is different and the needs for every work of art are different.

      If the results of an optimization are under the threshold for just noticeable difference or whatever, then I am perfectly fine with leaving it on in exchange for higher performance. This isn’t just me blowing smoke, either; I have better than 20/20 eyesight and my work at TR basically involves staring at screens all day and worrying about the inherent minutiae thereof.

      You seem to be annoyed that I’m not using some objective measurement of image quality like PSNR but there’s good reason for that; there was a similar discussion about the image quality offered by DLSS when the feature first became available and my own research suggested that objective measurements like PSNR correlate poorly with perceived image quality. If there’s an objective measurement that does correlate well with perceived IQ that we can use, then I’m open to it, of course, but if such a thing exists I haven’t found it yet.

      It is a repeated theme of discussions with the industry that scaling a graphics processor to the point where we can drive sufficiently high-density displays for VR headsets, as just one example, is not going to be practical if that shader horsepower has to shade every pixel equally. Power demands and silicon scaling are just not going to tolerate it.

      If companies can come up with ways of “cheating” that can account for the limitations of the human visual system rather than throwing more hardware at the problem, then I am all for it. Barring a massive leap in silicon or materials scaling that lets developers throw more hardware at a problem, we are just going to have to get smarter about using what we have and VRS/CAS is fascinating for that reason.

        • Klimax
        • 11 months ago

        SSIM.

        • jensend
        • 11 months ago

        Briefly: saying “I couldn’t tell a difference at first glance during gameplay” is not enough for dealing with today’s high end. In choosing 4K over 1440p, high-end gamers are showing that they’re willing to spend twice as much money for an improvement in quality that one may not readily notice at first glance during gameplay. Me personally, I’ll let people with deeper pockets play around in this realm of greatly diminished returns; I’ll stick with lower resolutions, quality sliders that aren’t set to “MAXIMUM PLACEBO,” and a much smaller hardware budget.

        And I certainly agree that there should be ways to render exceptionally high quality scenes without naive brute force. But it’s just not “really impressive” that companies can find ways to do less work at 4K without visually obvious issues. If people are dumping huge budgets on small increases in visual fidelity, it’d be important to get clearer verification that they’re not bargaining away that precious little visual fidelity increase in exchange for single-digit performance gains.

        Perceptual testing can go a lot further than a non-blind basic reaction. In the realm of audio, where people hit limitations of human perception much earlier, people have been very careful with double-blind ABX testing, well-founded rating methods like ABC/HR, etc. Objective metrics aren’t a panacea, but people like video codec developers end up relying on them heavily. If you find ten things to change about your codec, each of which has effects below the noticeable difference threshold but improves the metric, putting them all together may well make a subjectively clear improvement. The reverse would hold for work-saving “optimizations.”

        Yes, everyone’s known PSNR isn’t a good visual metric for quite some time now. Klimax mentioned SSIM; it’s better but not tons better. You could try PSNR-HVS-M (despite the name similarity it’s a very different metric) or VMAF or VQM.

        • ptsant
        • 11 months ago

        Nice answer, Jeff. I think a nice complement to your judgment, since every human is different, would be to provide paired on/off images so that all readers can make up their own minds. Just like you did with the RTX videos.

        It’s also worth finding and highlighting the actual differences so that we can see to what extent this changes things. Like, zoom into different areas at high magnification and compare pixel by pixel. We know there are changes, but we’d like to see them.

        There is no “gold standard” way of doing this, but giving more info and images to the readers is certainly a good idea.

          • psuedonymous
          • 11 months ago

          I think moving images will be necessary to show the difference between physically accurate rendering and screen-space effects (particularly reflections) where there will be noticeable warping as viewpoint changes.

        • wuzelwazel
        • 11 months ago

        A good alternate to PSNR would be SSIM:

        “The difference with respect to other techniques mentioned previously such as MSE or PSNR is that these approaches estimate absolute errors; on the other hand, SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms.”

        https://en.wikipedia.org/wiki/Structural_similarity

      • cegras
      • 11 months ago

      Ultimately, what devs are concerned about is human perception, though. If a human cannot reliably differentiate between two settings in a blind trial, then all the better.

    • DPete27
    • 11 months ago

    It’s a good idea. I’ll give it that. But one has to question whether it’s worth it to the devs and Nvidia for only a ~6% improvement.

    This, like many/all of Nvidia’s and AMD’s proprietary pet projects, has to be shouldered almost entirely by the GPU manufacturer. I have to imagine devs aren’t breaking down the doors to sink tons of man-hours into learning and properly utilizing these new features, only for them to be realized by half (even less than that here, since we’re only talking Turing cards) of the PC GAMING market (not consoles for Nvidia stuff).

    I certainly agree that stuff like VR rendering efficiency, adaptive shading, and ray tracing, to name a few, are meh at this point. But the industry has to start somewhere to make progress. Poo-poo it all you want, but these are costly gambles that AMD and Nvidia are taking all the time in the effort to improve gaming for us. At least have some deeper appreciation for that.

      • Jeff Kampman
      • 11 months ago

      FWIW, it’s apparently not a major change to a game to implement support for this feature.

    • Kretschmer
    • 11 months ago

    I’m smiling at how well my founder’s edition 1080Ti has aged. Thank you early adopters for funding development of the technologies that I’ll buy into in late 2020!

    • Kretschmer
    • 11 months ago

    No frame rate imaginable would get me to finish such an unfun game. Can we use RTX cores to improve mechanics and remove content repetition?

    • RickyTick
    • 11 months ago

    FWIW…I just happen to be playing Wolfenstein II at this time. Been at it off and on for a couple of weeks. I have a GTX1070 playing on a 27″ 2560×1440 with G-Sync and it’s a beautiful experience and buttery smooth at max settings. Nothing to complain about and no reason to upgrade.

    • Chrispy_
    • 11 months ago

    Sounds like a nice “free” performance boost with no negative consequences, right?

    Actually, no: several rather significant downsides!

    • You're paying for the tensor cores with your money; these RTX cards really aren't cheap!
    • Using the tensor cores increases power consumption, and you're only getting a 6% improvement at best.
    • The tensor cores waste die area that should have been used for another couple of GPCs (an extra 1536 CUDA cores).

      • chuckula
      • 11 months ago

      Ok so assuming AMD had ever actually turned on those mythical “primitive shaders” that are supposedly lurking in Vega you would have posted an attack on AMD for the unnecessary cost of using HBM on Vega when they could have used regular GDDR memory?

        • Chrispy_
        • 11 months ago

        I believe I did, actually.
        Thanks for reminding me that I’ve been vindicated!

          • chuckula
          • 11 months ago

          This feature has nothing to do with RTX.

          Look I get it: You discovered that an easy way to get upthumbs is to demonize RTX.
          But guess what: RTX or no RTX, AMD has no answer for Turing, and if you are naive enough to think that Nvidia can’t drop the prices on these parts at some point deep in 2019 when the “miracle” of Navi finally comes out you are in for a disappointment.

            • Spunjji
            • 11 months ago

            “You discovered that an easy way to get upthumbs is to demonize RTX”

            Evidence is in absence. He’s currently at -4!

            • chuckula
            • 11 months ago

            Only because he overplayed his hand and did a copy-n-paste from other articles without even bothering to read the first paragraph of this article where Kampman expressly points out that variable rate shading has nothing to do with ray tracing.

            • Chrispy_
            • 11 months ago

            Honestly, that’s one of the lowest, laziest and worst attempts at strawmanning I’ve seen from you in a while – you’re normally much better than this (do you need a hug?) Why are you bringing raytracing into this discussion? You’re the first one to mention it and it’s irrelevant to anything we’re talking about here.

            I know it’s hard because the tensor cores are in a product named “RTX” but if you’re struggling with that you’re going to really struggle with Turing block diagrams and the rest of the reading comprehension required for discussions on this site!

            • Jeff Kampman
            • 11 months ago

            To be fair, VRS has nothing to do with tensor cores, either.

            • Chrispy_
            • 11 months ago

            Oh, that’s worth knowing.

            You quote Nvidia as saying that it’s a Turing feature, but if Maxwell and Pascal can handle multi-res foveated shading, could this be something they will get with driver updates?

            • Jeff Kampman
            • 11 months ago

            No, because this fine-grained control over shading rates (per polygon, per object, or per 16×16 grid) is a feature of the Turing microarchitecture, not a software thing that can be backported. Pascal and Maxwell could assume fixed screen regions for performing MRS, if I remember the relevant section of the Turing white paper properly, but they don’t have the capability to perform this arbitrary subdivision of screen/object/whatever space.

            • derFunkenstein
            • 11 months ago

            Even if they could, no. Never.

            • Chrispy_
            • 11 months ago

            Yeah, not sure which comments of mine he thinks are popular. I’m pretty sure my comments calling out the RTX cards aren’t popular.

            As for demonizing RTX, I’m merely reiterating existing information - all of which is either in Nvidia’s official Turing architectural presentation slides or is empirical performance data reported by TR and other sites. There’s no need for me to have an opinion or rely on anything other than cold, hard facts.

            • sweatshopking
            • 11 months ago

            You’re strawmanning here. I don’t care either way, but you’re not engaging with his point, you’re talking about amd. Again.

            • Chrispy_
            • 11 months ago

            Screw AMD! They’ve dropped the ball these last few generations – Polaris was an adequate answer to GP106 but that’s about it and with their focus on Ryzen, GPU plans have been sidelined whilst Nvidia keeps on bringing new products to market.

            Realistically, Nvidia are competing with themselves, and when you look at the block diagram (https://devblogs.nvidia.com/wp-content/uploads/2018/09/image11-601x1024.jpg), which is scaled proportionally to match the raw die shots, you can see that the tensor cores take up as much die area as the FP32 block in each SM. Or, to look at it another way, the tensor cores account for about 40% of the space that the CUDA cores use. In other words, you could get (100/60 = 1.)66% more CUDA cores on an identical product without tensor cores.

            Let me put that in simple terms here:
            The RTX 2070 would have 3840 CUDA cores instead of 2304
            The RTX 2080 would have 4928 CUDA cores instead of 2944
            The RTX 2080 Ti would have 7232 CUDA cores instead of 4352

            Those tensor cores mean that these HUGE dies have far fewer CUDA cores than they otherwise could, and I'm not sure why you brought up RTX features because I certainly didn't.

            • techguy
            • 11 months ago

            I don’t know that block diagrams are meant to be a 1:1 representation of die area 😉

            • Chrispy_
            • 11 months ago

            Historically they always have been; they’re often overlaid onto die shots by people trying to guess core counts and things before Nvidia releases anything concrete, and I’ve watched at least one Nvidia video showcase where they faded the block diagram in on top of the die, block by block, with explanations as they went along.

            • techguy
            • 11 months ago

            I think your assumption would be safe to make were this particular block diagram in fact overlaid onto the die, but it isn’t.

            • stefem
            • 11 months ago

            Since when do block diagrams reflect the actual proportions on the die?

            • rechicero
            • 11 months ago

            I’m mostly a lurker, but your obsession with AMD is becoming worrying (it was always boring and tiresome).

            It might be a good idea that you look for help, really.

      • Krogoth
      • 11 months ago

      Tensor cores are part of the Volta architecture. Turing is really just a modified Volta for graphical usage patterns.

      Removing the tensor cores would require a complete overhaul. Nvidia doesn’t want to spend the time and capital on a completely separate design.

      • Durante
      • 11 months ago

      Reading this absolutely worthless fanboy drivel as the first comment under a techreport article is disheartening.

      I’d have thought the audience here would generally be savvy enough to know that variable rate shading has absolutely nothing to do with tensor cores. It’s also a really neat feature, especially looking towards e.g. large FoV VR use cases.

    • chuckula
    • 11 months ago

    Two points Jeff:

    1. Thanks for getting these numbers out so fast.
    2. You tested with a 9980XE??? Niiice! If a little on the overkill side for running Wolfenstein.

      • morphine
      • 11 months ago

      It’s Wolfenstein/Doom. There’s no kill like overkill.

      • Srsly_Bro
      • 11 months ago

      Yes, but why not also the 7980xe? Lmao

    • SecretMaster
    • 11 months ago

    Hi Jeff,

    Did you by chance also look at power consumption with adaptive shading? My naive sense of what’s going on leads me to believe that the GPU wouldn’t be working as intensively, and this could lead to a reduction in power draw/temperature.

      • Goty
      • 11 months ago

      This would probably be true for a locked framerate, but it’s unlikely to be the case when the card will just crank out more frames in the same period of time because it’s having to do less shading work.

        • psuedonymous
        • 11 months ago

        Only if performance is shader-bound. If the limiting factor is elsewhere in the pipeline, reducing shader load will reduce overall power.

          • Firestarter
          • 11 months ago

          and/or increase performance, due to higher GPU clocks

          • Goty
          • 11 months ago

          I wonder how common those scenarios are these days.

    • chuckula
    • 11 months ago

    Looks like a similar performance boost compared to when DX12 got introduced relative to DX11.

    Since DX12 was of course a miraculous new technology invented solely by AMD, I’ll of course be even handed and say that this is ALSO a miraculous new technology. That was invented solely by AMD and stolen by Nvidia.

    • Krogoth
    • 11 months ago

      The Turing architecture is Nvidia’s first major departure from Fermi. It has plenty of potential, but it’ll take a while for developers to switch over.

      Honestly, the only reason people are griping about Turing is because of its current price points.

      • K-L-Waster
      • 11 months ago

      “Honestly, the only reason people are griping about Turing is because of its current price points.”

      For some of them that’s true (and fair play -- Turing is expensive for what it offers). Others, however, are indulging in their reflexive “It’s NGREEDIA, they’re BAAADDDDD!!!” response...

    • rutra80
    • 11 months ago

    Here, too, it would be interesting to know the power-efficiency difference between CAS on and off.

      • Spunjji
      • 11 months ago

      Unless you suddenly hit a CPU limit or a maximum framerate, none.

    • rutra80
    • 11 months ago

    So it’s a kind of lossy compression of processing power and/or memory bandwidth. It kinda proves that we’re not yet ready for full-blown 4K or VR. You can take a 10MP photo and save it as a 0%-quality JPG and it will look much worse than a 2MP lossless photo. I would resort to streaming if I couldn’t have lossless realtime rendering.
    Also, the dependency on previous frames introduces input lag, which yet again drags it down to streaming level.

      • meerkt
      • 11 months ago

      On the other hand, in a 100% quality JPEG you can’t tell the difference even switching back and forth between lossy/lossless. And still, bytes are saved.

        • Chrispy_
        • 11 months ago

        The 100% or 95% JPEG quality analogy is what this feels like.

        Jeff said he couldn’t tell the difference, and he even saw it in motion, where motion artifacts that you can’t see in a screenshot would be visible.

        • ptsant
        • 11 months ago

        You can tell the difference if you start applying filters and stuff. Processing highlights errors in the least significant bits that may not normally be detectable to the naked eye. There is a reason why pros don’t use it.

          • K-L-Waster
          • 11 months ago

          That’s tangential in this case though, since typically gamers don’t apply post-processing to their games. In this case, it’s “X% JPEG and the result is your final image,” not “X% JPEG and let’s apply n more filters to it.”

            • ptsant
            • 11 months ago

            Depends on whether there is downstream post-processing (like FXAA) applied after shading. But I agree that it’s not a major concern. Numerical stability could be an issue for very demanding applications, but not in games.

          • meerkt
          • 11 months ago

          On q=100% JPEGs with 4:4:4 the difference is something like random 1%-off noise, so invisible in all conditions.

      • synthtel2
      • 11 months ago

      This won’t increase latency. A frame needing something from the previous frame is messy for AFR multi-GPU, but typical, fast, and uninteresting in every other context.

        • meerkt
        • 11 months ago

        Thinking about it some more… If pixel reuse from previous frames can be scaled even slightly toward what video compression does, that could be a major performance or quality boost. 1% compression in hi-quality video is common.

          • stefem
          • 11 months ago

          You mean something like Texture Space Shading?

          https://devblogs.nvidia.com/texture-space-shading/

            • meerkt
            • 11 months ago

            I was thinking final screen-space, and reuse not only in terms of shaders.
            Seems like TSS is for shader reuse, and to help with shader aliasing?

    • mad_one
    • 11 months ago

    Good to see some more experimentation on the many little (and not so little) new features introduced with Turing. The main question is how many games will adopt these features.

    If they are commonly implemented, Turing may age noticeably better than Pascal (though it will definitely run into memory problems), but unless there are higher-volume models forthcoming, developers may not bother.

    • USAFTW
    • 11 months ago

    I wonder what everyone else thinks, but I can’t help but imagine just how much better Turing would perform across the board if Nvidia hadn’t spent all of that die space (let alone research and development) on RTX and Tensor cores.
    Particularly, I look at TU106. Judging by the codename, it’s supposed to be a successor to the GTX 1060 with its GP106, but it’s actually around 41% bigger than the GP104, which it performs similarly to. It’s almost the same size as the GP102, for crying out loud! If Nvidia stripped out all the Tensor and RTX cores from it, it would probably still compete well with the GTX 1080, if not better, and it would cost way less to manufacture.

    • null0byte
    • 11 months ago

    What’s up with the 8.3ms graph? It’s showing dramatically less time spent beyond than the 11.1ms graph.

      • Captain Ned
      • 11 months ago

      Looking at the graphs on either side of these measures, I think they just got swapped (and correspondingly mistitled) when Jeff was typing this up.

      • Jeff Kampman
      • 11 months ago

      Sorry, error in our Excel sheet. Thankfully the analysis was consistent…

    • ptsant
    • 11 months ago

    AMD published a whitepaper on variable-rate shading in 2016, but nothing has happened since then (http://developer.amd.com/wordpress/media/2013/12/TexelShading_EG2016_AuthorVersion.pdf).

    Seems like a very interesting technique, especially for VR and foveated rendering.

    It’s a pity that Nvidia only applies this to cards that are already very powerful. It would make much more sense to gain FPS on low/mid-end cards.

      • dragosmp
      • 11 months ago

      I was thinking the same; how cool would it be to have this on my APU.

      It’s not impossible that the guy who did the research for AMD just went to work for Nvidia, got the money to implement it, and here we are.

      • rutra80
      • 11 months ago

      It’s a few fps on those powerful cards; on lower-end cards perhaps it would be even less effective? Unless you cut much more quality, which in turn would become noticeable, so you could just as well resort to lossy streaming services and play the next Far Cry on integrated Intel graphics at 60 fps pseudo-4K.

        • ptsant
        • 11 months ago

        It won’t make a 1060 run 4k, but a 5-6% gain is welcome, even at 1080p. And, especially in cases where you are looking at the lower end of the G-sync range, a few fps more can make a world of difference.

        • chuckula
        • 11 months ago

        Why is there the unfounded assumption that this technique only works on powerful cards that were already running the game at extremely high speeds before the optimization?

        A more rational analysis would make one think that this technique will have a far greater impact on something like a future GTX 2060 that is not already over 100 fps where more efficient shading will clearly have a larger relative performance impact vs. a 2080Ti that wasn’t particularly stressed in the first place.

          • cegras
          • 11 months ago

          And how much of this performance increase would be hamstrung by the proportionately less powerful CPU? Or are you assuming that this budget GTX 2060 will be benched with a 9900K and not a 9600K?

            • synthtel2
            • 11 months ago

            Because it’s impossible to be GPU-bound with a 9600K? What?

            • cegras
            • 11 months ago

            At low enough graphics settings such that 1% FPS is 100+, it’s definitely a CPU bound problem, as Jeff notes in his recommendations on the best gaming CPUs of late 2018.

            • synthtel2
            • 11 months ago

            That’s a tautology. If you force it into being a CPU-bound problem, then yes, it’s a CPU-bound problem. Again: What?

            • cegras
            • 11 months ago

            So if the game is CPU bound, and this technique requires CPU overhead to make the load less for the GPU, will the gains be less than expected if CAS is turned on for a weak CPU?

            I’m not sure why you’re having problems understanding my question.

            • synthtel2
            • 11 months ago

            Even if it did have significant CPU overhead (which it doesn’t), your original point hinges on a 9600K + 2060 rig being very likely to be CPU-bound where a 9900K + 2080 Ti rig is very unlikely to be. That’s nonsense.

            The literal answer to your original question is simple enough – none at all. It just reads like either (charitably) it’s very tangential to the question you actually want(ed) answered, or (uncharitably) you’re just trying to bash the tech and don’t really care about the answer to any such question.

            • cegras
            • 11 months ago

            I think you’re projecting some of your insecurities onto me: ascribing weird motivations to my question and getting defensive, instead of just taking some time to understand my question?

            Put another way, BF1 has a helpful tool that graphs CPU and GPU frame times. For a mixed system where the bottleneck is clearly not in either, is the GPU frame time drop offset by a higher CPU frame time?

            Specifically, I’m interested because I’m part of a small group who seeks maximum 1% FPS on a budget. That’s why I purchased a 9600K instead of a Ryzen. In my example, I picked the 2060 because that’s the slowest card with CAS. That’s what I would have purchased if the price was right.

            • Srsly_Bro
            • 11 months ago

            I think you meant 2070. If you want maximum fps on a budget, get a used 1080Ti. RTX seems to oppose the goals of your group. A 1080Ti has much higher frames in non RTX games.

            • synthtel2
            • 11 months ago

            In order, the answers to the questions you’ve asked so far in this thread are no, mu, yes, and mu. One of the two that has a real answer was answered almost immediately by none other than Jeff Kampman, and the other is rhetorical and obvious.

            If I think someone’s actually putting some effort into a question and wants to understand, I’m happy to try to come up with a more useful answer than mu even if that’s what the literal question really deserves, but I don’t think you are and I tire of this. I’ll be back if you start asking well-formed questions we don’t all already know the answer to.

            • cegras
            • 11 months ago

            Unless numbers are provided, I don’t see where the source of your confidence lies. The white paper certainly doesn’t go into much detail; here’s its description of VRS:

            “Without VRS, every pixel in the scene in Figure 32 would be shaded individually (the 1 x 1 blue grid case). With VRS, the pixel shading rate of triangles can vary. The developer has up to seven options to choose from for each 16x16 pixel region, including having one shading result be used to color four pixels (2 x 2), or 16 pixels (4 x 4), or non-square footprints like 1 x 2 or 2 x 4. The colored overlay on the right side of Figure 32 shows a possible application to a frame—perhaps the car could be shaded at full rate (blue region) while the area near the car could be shaded once per four pixels (green), and the road to the left and right could be shaded once per eight pixels (yellow). Overall, with Turing’s VRS technology, a scene can be shaded with a mixture of rates varying between once per visibility sample (super-sampling) and once per sixteen visibility samples. The developer can specify shading rate spatially (using a texture) and using a per-primitive shading rate attribute. As a result, a single triangle can be shaded using multiple rates, providing the developer with fine-grained control.”

            That certainly implies that the game engine has to do something to specify reduced shading in some parts of a frame.

            Furthermore, while it is of academic interest to completely remove bottlenecks in just the CPU or GPU, the real-world situation is a balanced mix where tradeoffs in frame times between either are not clear. This comes on the heels of the BFV discussion about future frame rendering (https://www.reddit.com/r/BattlefieldV/comments/9vte98/future_frame_rendering_an_explanation/), where you can change what may have been a GPU-bound problem into a mixed one, with both GPU and CPU hitting 100% and a significant FPS increase. Clearly, the frame rate was not GPU bound at all. Again, a mixed situation where reducing frame times on one side may not clearly increase 1% FPS.

            I have seen nothing in your replies that answers this question besides a simple assertion without proof, or anything from Jeff that isn’t a quote from the whitepaper. I’m reasonably familiar with HPC, and I can understand a discussion that involves computational linear algebra. I was looking for an explanation of where the matrix operations are executed, where this differentiation happens, and what kind of operations/filters this technique uses. So either you have this knowledge and refuse to share it because my question was so vague it angered you, or you have nothing more to contribute.

            • Jeff Kampman
            • 11 months ago

            The white paper answers pretty much all of these questions for you:

            “In Content Adaptive Shading, shading rate is simply lowered by considering factors like spatial and temporal (across frames) color coherence. The desired shading rate for different parts of the next frame to be rendered are computed in a post-processing step at the end of the current frame. If the amount of detail in a particular region was relatively low (sky or a flat wall etc.), then the shading rate can be locally lowered in the next frame. The output of the post-process analysis is a texture specifying a shading rate per 16 x 16 tile, and this texture is used to drive shading rate in the next frame. A developer can implement content-based shading rate reduction without modifying their existing pipeline, and with only small changes to their shaders.”

            This is all stuff that’s happening on the graphics card, full stop.

            • cegras
            • 11 months ago

            From the quote, CAS sounds like an implementation with a few extra matrix operations done on the GPU, instructed by the CPU, whereas VRS implies that the game engine would have to specify these areas before the draw call is sent to the GPU. At this point I’m at the limit of my knowledge, and the whitepaper doesn’t provide the details I am curious about. I’ll look for other resources.

            Thank you for taking the time to answer my questions. I appreciate it!

            • cegras
            • 11 months ago

            From here:

            https://devblogs.nvidia.com/turing-variable-rate-shading-vrworks/

            “App creates a regular Texture2D, the shading rate resource
            App calls NvAPI to create a new custom type of shading rate resource view
            App populates this texture with shading rate pattern
            App programs shading rate lookup table and enables variable shading rate mode
            App sets this shading rate resource view”

            Which means the ‘app’, which lives on the CPU, is doing all of the above.

            “One key advantage of this programmability is that the game can keep the same shading rate pattern on the surface. It only needs to swap out the look-up table to quickly change the aggressiveness of the coarse shading. Application requirements determine order and values filled in this look-up table.”

            This information implies that the CPU probably doesn’t process or send data from RAM to VRAM, but that it simply commands the GPU to alter the Texture2D resource that lives on the GPU. I guess that in the case of CAS, even this step is automated away by the Nvidia driver. If the shading rate changes drastically from frame to frame, I would expect this technique to offer less of a performance benefit. I agree with you, but I don’t think the quotes you provided from the whitepaper fully answered my question.

            • stefem
            • 11 months ago

            The point is that this technique doesn’t eat any CPU resources at all and I’m frankly unable to figure out where you draw this assumption from.

            • cegras
            • 11 months ago

            And the proof is … where …?

      • psuedonymous
      • 11 months ago

      Variable-rate shading has been in use for years. The difference here is its granularity (not a fixed mask) and implementation in hardware.

        • ptsant
        • 11 months ago

        What part is in hardware?

      • DoomGuy64
      • 11 months ago

      If it works on Vega, wouldn’t it work on the Vega based APUs too?

    • Srsly_Bro
    • 11 months ago

    I was going to make a long post. Nope

    The 2070 is trash and the 2080ti is a poor waste of money. Next, pls.

    Edit: @Jeff: Nice showing of the different cards and their potential and what we as users can expect.

    • Aquilino
    • 11 months ago

    I’m liking this Meh Generation very much. It’s so easy to skip!
    Also with those prices, it’s like they’re forcing me not to buy anything.

      • NTMBK
      • 11 months ago

      Turing: The Meh Generation. Boldly going where ~~no card~~ Pascal has gone before.

    • Voldenuit
    • 11 months ago

    “This Turing capability allows graphics cards built with that architecture to put their shading power where it’s needed most in a frame, rather than shading every pixel naively. That naive approach might result in a graphics card ‘wasting’ compute resources”

    Should this be “natively”?

      • Jeff Kampman
      • 11 months ago

      https://wcipeg.com/wiki/Naive_algorithm

      • Chrispy_
      • 11 months ago

      Naive does seem to be an applicable word when talking about RTX cards though!
