Nvidia’s RTX ray-tracing acceleration stack might be getting the lion’s share of press now that Battlefield V has helped debut the technology, but the Turing architecture is packed full of other fun tricks to help developers optimize the rendering performance of their applications. One of these is called variable-rate shading, or VRS. This Turing capability allows graphics cards built with that architecture to put their shading power where it’s needed most in a frame, rather than shading every pixel naively. That naive approach might result in a graphics card “wasting” compute resources to arrive at the same result for each of a broad swath of uniformly colored pixels, for just one example, or working to shade a finely-detailed region of the screen that’s then blurred by motion.
An illustration of the pixel groupings possible with Turing VRS, and how they might be applied to a scene. Source: Nvidia
To let shader power go where it’s needed most, VRS allows developers to subdivide a 3D application’s screen space into a grid of 16×16-pixel regions. (Other granularities are possible, too, from per-triangle division to the arguably more logical per-object approach.) According to Nvidia’s SIGGRAPH presentation on the topic, each of those 16×16-pixel regions can, in turn, have one of several shading rates applied to it, from the naive 1×1 grid that would typically be used to shade every pixel in a scene today, to 2×2 groups of pixels for 1/4-rate shading, all the way out to a quite-coarse set of 4×4 grids for 1/16-rate shading. The parts of a scene that need the most detail can thus get it via 1×1 shading, while those that are the most uniform or the most likely to be in motion can be shaded at the lowest rates to save shader horsepower for the parts of the scene that need it.
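To put rough numbers on those shading rates, here is a minimal Python sketch that counts how many shading results a grid of 16×16 tiles would need. The function names and the example rate map are our own inventions for illustration, and the model ignores everything a real GPU does besides the per-group shading itself.

```python
TILE = 16  # tile edge in pixels, matching the 16x16 regions described above

# Shading rates named in the text, mapped to the pixels one shading result covers.
RATES = {"1x1": 1, "2x2": 4, "4x4": 16}


def shader_invocations(rate_map):
    """Count the shading results needed for a grid of tiles.

    rate_map is a list of rows, each a list of rate names (one per tile).
    """
    pixels_per_tile = TILE * TILE
    return sum(pixels_per_tile // RATES[rate] for row in rate_map for rate in row)


def shading_fraction(rate_map):
    """Fraction of full per-pixel shading work that the rate map requires."""
    tiles = sum(len(row) for row in rate_map)
    return shader_invocations(rate_map) / (tiles * TILE * TILE)


# Hypothetical 4x2-tile slice of a frame: coarse on the left, fine on the right.
example = [
    ["4x4", "4x4", "2x2", "1x1"],
    ["4x4", "2x2", "1x1", "1x1"],
]
print(shader_invocations(example), shading_fraction(example))
```

A tile shaded at 4×4 needs only 16 results instead of 256, so the savings depend entirely on how much of the frame can tolerate the coarser rates.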
This approach could prove most useful for VR, where foveated rendering demands the ability to subdivide the scene into different regions of rendering resolution, but it has other applications in traditional 3D graphics, as well. VRS and the variety of subdivisions of a scene it allows for are the foundation for a technique called content-adaptive shading, or CAS. Here’s how it works, according to the Nvidia Turing architecture white paper:
In Content Adaptive Shading, shading rate is simply lowered by considering factors like spatial and temporal (across frames) color coherence. The desired shading rate for different parts of the next frame to be rendered are computed in a post-processing step at the end of the current frame. If the amount of detail in a particular region was relatively low (sky or a flat wall etc.), then the shading rate can be locally lowered in the next frame. The output of the post-process analysis is a texture specifying a shading rate per 16 x 16 tile, and this texture is used to drive shading rate in the next frame. A developer can implement content-based shading rate reduction without modifying their existing pipeline, and with only small changes to their shaders.
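The analysis step the white paper describes can be approximated with a toy Python sketch. To be clear, this is not Nvidia's algorithm: it looks only at spatial luminance differences within each tile, it ignores the temporal and motion factors the white paper mentions, and the thresholds and function names are hypothetical values we picked for illustration. It also assumes the frame dimensions are multiples of 16.

```python
TILE = 16  # the white paper specifies one shading rate per 16x16 tile


def tile_contrast(luma, x0, y0):
    """Largest luminance step between neighboring pixels inside one tile."""
    worst = 0.0
    for y in range(y0, y0 + TILE):
        for x in range(x0, x0 + TILE):
            if x + 1 < x0 + TILE:  # horizontal neighbor within the tile
                worst = max(worst, abs(luma[y][x] - luma[y][x + 1]))
            if y + 1 < y0 + TILE:  # vertical neighbor within the tile
                worst = max(worst, abs(luma[y][x] - luma[y + 1][x]))
    return worst


def pick_rate(contrast, fine=0.10, coarse=0.02):
    """Map contrast to a shading rate. Both thresholds are made-up values."""
    if contrast >= fine:
        return "1x1"
    if contrast >= coarse:
        return "2x2"
    return "4x4"


def shading_rate_texture(luma):
    """Build the per-tile rate grid meant to drive shading in the next frame."""
    height, width = len(luma), len(luma[0])
    return [
        [pick_rate(tile_contrast(luma, x0, y0)) for x0 in range(0, width, TILE)]
        for y0 in range(0, height, TILE)
    ]


# A 32x16 test frame: a flat gray tile next to a checkerboard tile.
frame = [[0.5] * 16 + [float((x + y) % 2) for x in range(16)] for y in range(16)]
print(shading_rate_texture(frame))
```

The flat tile ends up with the coarsest rate and the checkerboard keeps full per-pixel shading, which mirrors the sky-or-flat-wall example in the quote above.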
Wolfenstein II: The New Colossus is the first game to implement CAS, and it is in fact the poster child for how the technique’s analysis step works in the aforementioned white paper. Nvidia provides a visualized example of the result of its post-processing analysis on a frame taken from the game’s good guys’ submarine, Evas Hammer.
In the figure above, the red squares represent the parts of the scene that the algorithm determined suitable for shading with the coarsest rates, while those with no color overlay get the finest 1×1 (or per-pixel) shading rate needed to reproduce fine detail or detail in motion. The radar and computer displays on the Evas Hammer are, from experience, rendered in motion, so it’s no surprise that they get the highest shading rates in the example above.
Despite the differences in shading rates applied to the various parts of a given scene by this technique, Nvidia believes it produces results that are practically equivalent to naively shading every pixel on screen. Wolfenstein II: The New Colossus just got a patch that enables CAS on Turing cards, so we can both test the performance of the technique for ourselves and see how it looks in practice. Let’s dive in.
Our testing methods
If you’re new to The Tech Report, we don’t benchmark games like most other sites on the web. Instead of throwing out a simple FPS average (or even average and minimum FPS figures)—numbers that tell us only the broadest strokes of what it’s like to play a game on a particular graphics card—we can go much deeper. We capture the amount of time it takes the graphics card to render each and every frame of animation before slicing and dicing those numbers with our own custom-built tools. We call this method Inside the Second, and we think it’s the industry standard for quantifying graphics performance. Accept no substitutes.
What’s more, we don’t rely on canned in-game benchmarks—routines that may not be representative of performance in actual gameplay—to gather our test data. Instead of clicking a button and getting a potentially misleading result from those pre-baked benches, we go through the laborious work of seeking out test scenarios that are typical of what one might actually encounter in a game. Thanks to our use of manual data-collection tools, we can go pretty much anywhere and test pretty much anything we want in a given title.
Most of the frame-time data you’ll see on the following pages were captured with OCAT, a software utility that uses data from the Event Tracing for Windows (ETW) API to tell us when critical events happen in the graphics pipeline. We perform each test run at least three times and take the median of those runs where applicable to arrive at a final result. Where OCAT didn’t suit our needs, we relied on the PresentMon utility.
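For readers who want to try this kind of analysis on their own captures, here is a minimal Python sketch of the sort of summary one can compute from a list of per-frame times in milliseconds. It is not our in-house tooling: the function names are hypothetical, and the nearest-rank percentile is just one of several common definitions.

```python
import math
import statistics


def average_fps(frame_times_ms):
    """Average frame rate over a run: frames rendered per second elapsed."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_99(frame_times_ms):
    """99th-percentile frame time using the nearest-rank method (an assumption)."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(99 * len(ordered) / 100))
    return ordered[rank - 1]


def median_of_runs(runs, metric):
    """Apply a metric to each run, then take the median across runs."""
    return statistics.median(metric(run) for run in runs)


# Three made-up runs of 100 frames each, mostly 10-ms frames with a few spikes.
runs = [
    [10.0] * 99 + [30.0],
    [10.0] * 100,
    [10.0] * 98 + [20.0, 40.0],
]
print(median_of_runs(runs, average_fps))
print(median_of_runs(runs, percentile_99))
```

Taking the median across runs keeps a single unlucky run from skewing the final figure.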
We tested Wolfenstein II: The New Colossus at 3840×2160 using its “Mein Leben!” preset. The game provides fine-grained control over what it calls “Nvidia Adaptive Shading,” although we imagine most people will simply want to choose among the three presets on offer: “Balanced,” “Performance,” and “Quality.” In fact, that’s exactly what we did for our testing.
As ever, we did our best to deliver clean benchmark numbers. Each test was run at least three times, and we took the median of each result. Our test system was configured like so:
|Processor||Intel Core i9-9980XE|
|Motherboard||Asus Prime X299-Deluxe II|
|Memory size||32 GB (4x 8 GB)|
|Memory type||G.Skill Trident Z DDR4-3200|
|Memory timings||14-14-14-34 2T|
|Storage||Intel 750 Series 400 GB NVMe SSD (OS); Corsair Force LE 960 GB SATA SSD (games)|
|Power supply||Seasonic Prime Platinum 1000 W|
|OS||Windows 10 Pro with October 2018 Update (ver. 1809)|
We used the following graphics cards for our testing, as well:
|Graphics card||Graphics driver||Boost clock speed (nominal)||Memory data rate (per pin)|
|Nvidia GeForce RTX 2080 Ti Founders Edition||GeForce||1635 MHz||14 Gbps|
|Gigabyte GeForce RTX 2080 Gaming OC 8G||GeForce||1815 MHz||14 Gbps|
|Asus ROG Strix GeForce RTX 2070 O8G||GeForce||1815 MHz||14 Gbps|
Thanks to Intel, Corsair, Gigabyte, G.Skill, and Asus for helping to outfit our test rigs with some of the finest hardware available. Nvidia, Gigabyte, and Asus supplied the graphics cards we used for testing, as well. Have a gander at our fine Asus motherboard before it got buried beneath a pile of graphics cards and a CPU cooler:
And a look at our spiffy Gigabyte GeForce RTX 2080, seen in the background here:
And our Asus ROG Strix GeForce RTX 2070, which just landed in the TR labs:
With those formalities out of the way, let’s get to testing.
Wolfenstein II: The New Colossus (3840×2160)
At least in the case of Wolfenstein II—already an incredibly well-optimized and swift-running game—content-adaptive shading provides anywhere from two to five more FPS on average for the RTX 2070, three to six more FPS on average for the RTX 2080, and one to six more FPS on average for the RTX 2080 Ti, depending on whether one chooses the quality, balanced, or performance preset in the game’s advanced options. 99th-percentile frame times correspondingly come down as a result at any preset, but the technique isn’t going to turn an RTX 2070 into an RTX 2080, for example, or an RTX 2080 into an RTX 2080 Ti.
What’s really impressive, though, is that even to my jaded graphics-reviewer eye, I saw practically no difference in image quality at 4K when moving between each preset. I would happily run the balanced or performance presets for this feature anywhere it was available. If there’s a catch to having CAS on at this resolution, I didn’t see one, and the minor increases to average frame rates and decreases in 99th-percentile frame times are appreciated when trying to get the most out of a 4K display.
For those who can see a difference in content-adaptive shading settings and want to tune the experience, Nvidia offered some details on what each of the available parameters in Wolfenstein II controls and its potential effects on image quality:
The Custom preset offers fine-tuning settings. Motion influence modifies the influence motion has on the shading rate. It depends on motion blur and TAA usage, response time of the screen, and personal preference. Higher influence can be used if TAA is turned off.
Color difference sensitivity affects the sensitivity to color differences for neighboring pixels.
The Brightness sensitivity controls the sensitivity to screen brightness. Lower values unlock more performance but reduce image quality. Technically, these parameters depend on room lightness, screen contrast and brightness, screen DPI, distance to the screen, and personal preference.
To gain further insight into the improvements content-adaptive shading offers, we can turn to some more fine-grained charts. These “time spent beyond X” graphs are meant to show “badness,” those instances where animation may be less than fluid—or at least less than perfect. The formulas behind these graphs add up the amount of time our graphics card spends beyond certain frame-time thresholds, each with an important implication for gaming smoothness. Recall that our graphics-card tests all consist of one-minute test runs and that 1000 ms equals one second to fully appreciate this data.
The 50-ms threshold is the most notable one, since it corresponds to a 20-FPS average. We figure if you’re not rendering any faster than 20 FPS, even for a moment, then the user is likely to perceive a slowdown. 33 ms correlates to 30 FPS, or a 30-Hz refresh rate. Go lower than that with vsync on, and you’re into the bad voodoo of quantization slowdowns. 16.7 ms correlates to 60 FPS, that golden mark that we’d like to achieve (or surpass) for each and every frame.
To best demonstrate the performance of these powerful graphics cards, it’s useful to look at our three strictest graphs. 8.3 ms corresponds to 120 FPS, the lower end of what we’d consider a high-refresh-rate monitor. We’ve recently begun including an even more demanding 6.94-ms mark that corresponds to the 144-Hz maximum rate typical of today’s high-refresh-rate gaming displays.
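The arithmetic behind these thresholds is simple enough to sketch in Python. This is an illustration rather than our actual tooling: it assumes that only the portion of each frame time past the threshold counts toward the total, and the function names are made up for the example.

```python
def threshold_ms(fps):
    """Frame-time budget in milliseconds for a given frame rate."""
    return 1000.0 / fps


def time_beyond(frame_times_ms, limit_ms):
    """Sum the time spent past limit_ms across all frames that exceed it.

    Assumption: a 60-ms frame measured against a 50-ms limit contributes
    10 ms, not the full 60 ms.
    """
    return sum(ft - limit_ms for ft in frame_times_ms if ft > limit_ms)


# The frame rates behind the thresholds discussed in this article.
for fps in (20, 30, 60, 90, 120, 144):
    print(fps, "FPS ->", round(threshold_ms(fps), 2), "ms")

# Four made-up frame times: only the 60-ms frame crosses the 50-ms line.
frames = [10.0, 20.0, 60.0, 16.0]
print(time_beyond(frames, 50.0))
print(time_beyond(frames, threshold_ms(60)))
```

Under that assumption, one long hitch and many slightly slow frames can produce the same total, so the metric is best read alongside the frame-time plots.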
Does CAS smooth out the worst frames these cards have to cope with, as less time spent beyond each of our frame-time thresholds would suggest? Perhaps, but as we noted before, the improvements are minor. At the 16.7-ms mark, the RTX 2070 goes from about a third of a second on frames that would drop its instantaneous rate below 60 FPS to about one-tenth of a second with the performance preset. That’s an improvement, to be sure, but it’s going from a minor concern to even less of a concern.
Flip to the 11.1-ms mark, which corresponds to 90 FPS, and the improvements that CAS provides become somewhat more evident. First off, it’s always worth having CAS on, even at the quality preset, since its performance is simply better than the baseline of no CAS at all. From there, though, the deltas in the time our graphics cards spend past 11.1 ms on tough frames are hardly large. From CAS off to CAS’ performance preset, the RTX 2070 saves a little over two seconds of our one-minute test run. The RTX 2080 shaves off just under three seconds, and the RTX 2080 Ti goes from barely any time spent past this mark to, well, even less time. We see similar small improvements for all cards at the 8.3-ms threshold, though the improvements are more meaningful for the RTX 2080 Ti. Welcome improvements, to be sure, but nothing game-changing.
Overall, content-adaptive shading is another intriguing Turing technology that seems to be in its infancy. None of the three Turing cards we have on the bench so far is lacking for shader power, and all of them are plenty capable of running Wolfenstein II at impressive frame rates even at 4K with maximum settings to start with. As a result of this testing, we’re curious what CAS might do for potential lower-end Turing cards, but for now, the tech is simply making great performance a little bit better. If you haven’t played Wolfenstein II through yet, or at all, on a Turing card, you can leave CAS enabled without any worries and enjoy its minor performance-improving benefits.