Popping the hood on Nvidia’s Turing architecture

It’s Turing Day at TR. We’ve been hearing about the innovations inside Nvidia’s Turing GPUs for weeks, and now we can tell you a bit more about what’s inside them. Turing implements a host of new technologies that promise to reshape the PC gaming experience for many years to come. While much of the discussion around Turing has concerned the company’s hardware acceleration of real-time ray-tracing, the tensor cores on board Turing GPUs could have even more wide-ranging effects on the way we game—to say nothing of the truckload of other changes under Turing’s hood that promise better performance and greater flexibility for gaming than ever before.

A die shot of the TU102 GPU. Source: Nvidia

On top of the architectural details that we can discuss this morning, Nvidia sent over both GeForce RTX 2080 and RTX 2080 Ti cards for us to play with. As of this writing, those cards are on a FedEx truck and headed for the TR labs. Nvidia has hopped on the “unboxing embargo” bandwagon, meaning we can show you the scope of delivery of those cards later today. Performance numbers will have to wait, though. First, Nvidia is pulling back the curtain on the Turing architecture and the first implementations thereof. Let’s discuss some of the magic inside.

Despite Nvidia’s description of ray-tracing as the holy grail of computer graphics during its introduction of the Turing architecture, these graphics cards do not replace rasterization—the process of mapping 3D geometry onto a 2D plane and the way real-time graphics have been produced for decades—with ray-tracing, or the process of casting rays through a 2D plane into a 3D scene to directly model the behavior of light. Real-time ray tracing for every pixel of a scene remains prohibitively expensive, computationally speaking.
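
To make the distinction concrete, here’s a minimal Python sketch of the ray-casting step described above, using a hypothetical pinhole camera: every pixel on the 2D image plane becomes a ray fired into the 3D scene, and whatever geometry each ray hits determines what that pixel shows.

```python
import numpy as np

def generate_primary_rays(width, height, fov_deg=90.0):
    """Cast one ray per pixel through an image plane sitting one unit in front of the camera."""
    aspect = width / height
    half_w = np.tan(np.radians(fov_deg) / 2.0)
    half_h = half_w / aspect

    origins = np.zeros((height, width, 3))        # the camera sits at the origin
    directions = np.empty((height, width, 3))
    for y in range(height):
        for x in range(width):
            # Map the pixel center to a point on the image plane at z = -1.
            px = (2.0 * (x + 0.5) / width - 1.0) * half_w
            py = (1.0 - 2.0 * (y + 0.5) / height) * half_h
            d = np.array([px, py, -1.0])
            directions[y, x] = d / np.linalg.norm(d)
    return origins, directions

# Each (origin, direction) pair is then tested against the scene's geometry;
# the nearest hit determines what that pixel shows.
origins, directions = generate_primary_rays(640, 360)
```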

Some potential roles for ray-tracing and rasterization in hybrid rendering. Slide: Colin Barré-Brisebois, EA SEED

Instead, the company wants to continue using rasterization for the things it’s good at and add certain ray-traced effects where those techniques would produce better visual fidelity—a technique it refers to as hybrid rendering. Nvidia says rasterization is a much faster way of determining object visibility than ray-tracing, for example, so ray-tracing only needs to enter the picture for techniques where fidelity or realism is important yet difficult to achieve via rasterization, like reflections, refractions, shadows, and ambient occlusion. Nvidia notes that the traditional rasterization pipeline and the new ray-tracing pipeline can operate “simultaneously and cooperatively” in its Turing architecture.

Logical representations of the pipelines for ray-tracing and rasterization. Source: Nvidia

The software groundwork for this technique was laid earlier this year when Microsoft revealed the DirectX Raytracing API, or DXR, for DirectX 12. DXR provides access to some of the basic building blocks for ray-tracing alongside existing graphics-programming techniques, including a method of representing the 3D scene that can be traversed by the graphics card, a way to dispatch ray-tracing work to the graphics card, a series of shaders for handling the interactions of rays with the 3D scene, and a new pipeline state object for tracking what’s going on across ray-tracing workloads.
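
For those who think better in code, here’s a rough, Python-flavored mirror of how those pieces relate to one another. This is a conceptual sketch only, not the DXR API itself, which is expressed through Direct3D 12 objects and HLSL shaders; the scene query below is just a stub standing in for the acceleration structure.

```python
# A conceptual, Python-flavored mirror of the shader roles DXR defines -- not the
# real API. The scene query here is a stub standing in for the acceleration structure.

def ray_generation_shader(pixel, scene_query):
    # One logical invocation per pixel: build a ray, then trace it.
    ray = {"origin": (0.0, 0.0, 0.0), "direction": pixel_to_direction(pixel)}
    return trace_ray(scene_query, ray)

def trace_ray(scene_query, ray):
    # Conceptually what a TraceRay call does: find the nearest hit in the
    # acceleration structure, then invoke the matching shader.
    hit = scene_query(ray)
    return closest_hit_shader(hit) if hit is not None else miss_shader(ray)

def closest_hit_shader(hit):
    return (1.0, 0.2, 0.2)            # shade the surface the ray struck

def miss_shader(ray):
    return (0.1, 0.1, 0.3)            # nothing hit: return a background color

def pixel_to_direction(pixel):
    x, y = pixel
    return (x - 0.5, y - 0.5, -1.0)   # toy camera mapping

# The "dispatch" step launches one ray-generation invocation per pixel; here we
# run a single one against an empty scene, so the miss shader fires.
color = ray_generation_shader((0.25, 0.75), scene_query=lambda ray: None)
print(color)
```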

The RTX platform. Source: Nvidia

Microsoft notes that DXR code can run on any DirectX 12-compatible graphics card in software as a fallback, since it behaves as a compute-like workload. That fallback method won’t be a practical way of achieving real-time ray-traced performance, though. To make DXR code practical for use in real-time rendering, Nvidia is implementing an entire platform it calls RTX that will let DXR code run on its hardware. In turn, GeForce RTX cards are the first hardware designed to serve as the foundation for real-time ray-traced effects with DXR and RTX.

 

RT cores take a load off

Real-time ray tracing has remained elusive because, to paraphrase a famous quote, ray-tracing is fast but computers are slow. Figuring out which triangle or triangles a ray will intersect in a scene is extremely expensive, computationally speaking, and it can be difficult to organize scene data in a way that lets a processor exploit locality of reference while ray tracing. Because a ray could interact with practically any triangle in a scene, it’s hard to keep temporally or spatially local data in cache. A ray might ultimately behave in a way that’s friendly to the cache, or it might not. This is not a problem that the Turing architecture necessarily seeks to solve, or can solve—it’s just a fact of life of ray-tracing.

One way developers can help to make ray-tracing more efficient, however, is through the use of an acceleration structure—an organization of geometry data that helps bundle stuff that’s spatially local and reduces the amount of work necessary when testing the objects a ray interacts with in a scene.

A general representation of a bounding volume hierarchy. Source: Schreiberx via Wikimedia Commons, CC-BY-SA 3.0

The typical way data is organized to accelerate ray-tracing is through a tree structure called a bounding volume hierarchy, or BVH. The top level of a BVH might contain one or more bounding shapes (usually boxes) that themselves contain further subdivisions of the scene, and the last level of the tree ultimately contains the triangle data itself. When a ray is cast, software that uses a BVH doesn’t go straight to work trying to find out which, if any, triangles that ray hits. Instead, it limits the scope of the work by first testing whether the ray intersects the bounding shapes at the top of the tree, then traversing only the branches the ray actually enters, ultimately arriving at just those triangles the ray could plausibly interact with before the GPU performs any further shading work.
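
Here’s a minimal sketch of that traversal logic in Python, assuming axis-aligned boxes for the bounding volumes and a simple recursive descent. Production traversal code, let alone Turing’s dedicated hardware, is far more heavily optimized, but the pruning principle is the same.

```python
import numpy as np

def ray_hits_box(origin, inv_dir, box_min, box_max):
    """Slab test: does the ray intersect this axis-aligned bounding box?"""
    t1 = (box_min - origin) * inv_dir
    t2 = (box_max - origin) * inv_dir
    t_near = np.minimum(t1, t2).max()
    t_far = np.maximum(t1, t2).min()
    return t_far >= max(t_near, 0.0)

class BVHNode:
    def __init__(self, box_min, box_max, children=None, triangles=None):
        self.box_min = np.asarray(box_min, float)
        self.box_max = np.asarray(box_max, float)
        self.children = children or []     # interior nodes point at smaller boxes
        self.triangles = triangles or []   # leaf nodes hold the actual geometry

def traverse(node, origin, direction, candidates):
    """Collect only the triangles whose bounding boxes the ray actually enters."""
    inv_dir = 1.0 / direction              # assumes no zero components, for brevity
    if not ray_hits_box(origin, inv_dir, node.box_min, node.box_max):
        return                              # prune this whole subtree
    if node.triangles:
        candidates.extend(node.triangles)   # leaf: hand these to intersection testing
    for child in node.children:
        traverse(child, origin, direction, candidates)

# A trivial two-leaf hierarchy; a real BVH would be built from the scene's geometry.
leaf_a = BVHNode((-2, -1, -5), (-1, 1, -4), triangles=["triA"])
leaf_b = BVHNode((1, -1, -5), (2, 1, -4), triangles=["triB"])
root = BVHNode((-2, -1, -5), (2, 1, -4), children=[leaf_a, leaf_b])

hits = []
traverse(root, np.zeros(3), np.array([-0.3, 0.05, -1.0]), hits)
print(hits)   # only the leaves the ray's path can actually reach
```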

A representation of how the Turing SM handles ray-tracing acceleration for BVH traversal and triangle intersection. Source: Nvidia

The RT core inside each Turing streaming multiprocessor (SM) makes the real-time ray-tracing portion of hybrid rendering possible by accelerating BVH traversal and ray-triangle intersection testing, freeing up the rest of the SM to do other work.

Without the RT core in each SM, determining which bounding volumes and triangles are intersected by a given ray would require immense amounts of traditional floating-point shader power—so much so that it’s prohibitive for real-time rendering applications. For reference, Nvidia says the GTX 1080 Ti can cast 1.1 gigarays per second with its 11.3-TFLOP shader array, while the RTX 2080 Ti can cast 10 gigarays per second or more thanks to its RT cores.

A representation of how BVH traversal works in practice. Source: Nvidia

One of the challenges of maintaining bounding volume hierarchies is that the bounding volumes themselves can change as objects in the scene move, requiring the refitting of those shapes and possibly the insertion or removal of nodes from within the BVH tree. Nvidia handles initial construction and refitting of the BVH in the driver, while the actual casting of rays and the resultant shading work are handled by the developer through the DXR API.
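
As a rough illustration of what refitting involves (the real work happens inside Nvidia’s driver, so treat this as a conceptual sketch only): when geometry moves, each node’s bounding box can be recomputed bottom-up from its children without rebuilding the tree’s structure.

```python
import numpy as np

# Conceptual sketch of BVH refitting: recompute each node's box from its children
# when geometry moves, rather than rebuilding the whole hierarchy.

class Node:
    def __init__(self, children=None, triangles=None):
        self.children = children or []
        self.triangles = triangles or []        # each triangle: 3 vertices as a (3, 3) array
        self.box_min = self.box_max = None

def refit(node):
    if node.triangles:                          # leaf: bound the (possibly moved) vertices
        verts = np.concatenate(node.triangles)
        node.box_min, node.box_max = verts.min(axis=0), verts.max(axis=0)
    else:                                       # interior: merge the children's refreshed boxes
        for child in node.children:
            refit(child)
        node.box_min = np.min([c.box_min for c in node.children], axis=0)
        node.box_max = np.max([c.box_max for c in node.children], axis=0)

tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
root = Node(children=[Node(triangles=[tri]), Node(triangles=[tri + 5.0])])
refit(root)                                     # root box now spans both leaves
```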


An example of how AI denoising can increase the image quality of ray-traced scenes. Source: Nvidia

Even with the acceleration of ray-tracing operations that the RT core provides, Nvidia cautions that applications will not be able to suddenly begin casting hundreds of rays per pixel in real time. Instead, the second pillar of real-time ray tracing and hybrid rendering in Turing cards comes from denoising filters. In traditional ray tracing, the number of rays cast per pixel may need to be large to achieve a high-quality result, and that cost is hard to square with integrating ray-traced effects into a real-time rendering pipeline. Casting fewer rays per pixel produces coarse-looking noise, though, and noisy reflections or shadows would prove exceedingly unpleasant to the eye in otherwise photorealistic environments.

With GeForce RTX cards, the hope is that developers can cast relatively few rays per pixel and then use denoising algorithms to clean up the resulting image. Denoising lets an effect rendered with only a handful of rays per pixel arrive at a result whose quality is similar to that of a scene traced with many more rays. Nvidia isn’t specific about the denoisers it’s using in its RTX platform, although the company says it’s using both AI and non-AI denoising algorithms, depending on which produces the best result for a given application. In any case, the ray-traced portion of the hybrid rendering pipeline wouldn’t be possible without the reduction in rays cast that denoising permits.
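
Nvidia isn’t sharing the details of those denoisers, but a toy example illustrates the underlying trade of ray count for filtering: a noisy one-ray-per-pixel estimate of a smooth quantity, cleaned up with a simple spatial average. Real denoisers are far more sophisticated, typically guided by depth, normals, and motion, and in the AI case by a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a ray-traced quantity (say, soft-shadow visibility): the true
# signal is smooth, but estimating it with one ray per pixel yields heavy noise.
height, width = 64, 64
ys, xs = np.mgrid[0:height, 0:width]
ground_truth = 0.5 + 0.5 * np.sin(xs / 10.0) * np.cos(ys / 10.0)
one_ray_estimate = (rng.random((height, width)) < ground_truth).astype(float)  # 1 spp: binary, noisy

def box_denoise(img, radius=3):
    """Crude spatial filter: average each pixel with its neighbors. Real denoisers
    are edge-aware and, in Nvidia's AI case, learned -- this only shows the idea."""
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += padded[radius + dy: radius + dy + img.shape[0],
                          radius + dx: radius + dx + img.shape[1]]
    return out / (2 * radius + 1) ** 2

denoised = box_denoise(one_ray_estimate)
print("error before:", np.abs(one_ray_estimate - ground_truth).mean())
print("error after: ", np.abs(denoised - ground_truth).mean())
```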

 

Tensor cores bring artificial intelligence to gaming PCs

To make AI models like denoising filters practical for use on gaming PCs, Turing cards include the tensor cores that Nvidia first unveiled as part of its Volta architecture. These cores provide accelerated processing for tensor operations, the matrix multiply-and-accumulate math that’s incredibly useful and versatile for performing AI inferencing. As a refresher, inferencing is the use of trained deep-learning models to perform a computational task.
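
Numerically, the heart of the matter is a small fused matrix multiply-accumulate. As Nvidia described for Volta, whose tensor-core design Turing builds on, each operation computes something like D = A*B + C on 4x4 matrices, with half-precision inputs feeding a wider accumulator; the numpy model below only sketches that single operation.

```python
import numpy as np

# A numpy model of one tensor-core-style operation: D = A x B + C on small
# matrices, with FP16 inputs and a wider (FP32) accumulator.
A = np.random.rand(4, 4).astype(np.float16)   # FP16 input
B = np.random.rand(4, 4).astype(np.float16)   # FP16 input
C = np.random.rand(4, 4).astype(np.float32)   # FP32 accumulator

D = A.astype(np.float32) @ B.astype(np.float32) + C   # multiply, then accumulate

# Larger matrix multiplies -- the bulk of neural-network inference -- are tiled
# into many of these small operations, which is where the throughput comes from.
print(D)
```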

Denoising won’t be the biggest, or even the only, application for deep learning models on Turing cards. Many game developers are hopping on the bandwagon for Nvidia’s Deep Learning Super Sampling, or DLSS, technology. Nvidia describes DLSS as a replacement for temporal anti-aliasing, a technique that combines multiple frames by determining motion vectors and using that data to sample portions of the previous frame. Nvidia notes that despite the common use of temporal AA, it remains a difficult technique for developers to effectively employ. For my part, I’ve never enjoyed the apparent blur that TAA seems to add to the edges of objects in motion.
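
For reference, here’s a bare-bones, heavily simplified model of the TAA resolve step being described: reproject the previous frame along per-pixel motion vectors, then blend it with the current frame. Real implementations also clamp or clip the history sample against the current frame’s neighborhood, which is a big part of where the blur-versus-ghosting tradeoffs come from.

```python
import numpy as np

def taa_resolve(current, history, motion, blend=0.9):
    """Simplified single-channel TAA resolve: reproject history along motion
    vectors, then blend it with the current frame."""
    h, w = current.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Motion vectors point from this frame's pixel back to where it was last frame.
    prev_x = np.clip(xs + motion[..., 0], 0, w - 1).astype(int)
    prev_y = np.clip(ys + motion[..., 1], 0, h - 1).astype(int)
    reprojected = history[prev_y, prev_x]
    return blend * reprojected + (1.0 - blend) * current

current = np.random.rand(4, 4)
history = np.random.rand(4, 4)
motion = np.zeros((4, 4, 2))          # a static scene: history lines up exactly
print(taa_resolve(current, history, motion))
```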

To attack some of the limitations of TAA, Nvidia took its extensive experience using deep learning to recognize and process images and applied it to games. DLSS depends on a trained neural network that’s exposed to a large number of “ground truths”: perfect or near-perfect representations of what in-game scenes should look like, generated via 64x supersampling. Nvidia also says it captures matching versions of those scenes, rendered normally, to serve as the network’s input.

The DLSS neural network is then trained by feeding it an input scene and asking it for the image it thinks matches the 64x supersampled output. Nvidia refines that output by performing backpropagation on the network, adjusting the weights of each neuron using the differences between the network’s result and the 64x ground truth as a guide.
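
A schematic of that training loop, written as a toy PyTorch example: the network, its size, and the data here are hypothetical stand-ins rather than anything Nvidia has disclosed, but the shape of the procedure (predict, compare against the 64x ground truth, backpropagate, adjust the weights) is the one described above.

```python
import torch
import torch.nn as nn

# Schematic only: the model and data are toy stand-ins, not Nvidia's network.
# Inputs are normally rendered frames; targets are 64x-supersampled "ground truth"
# renders of the same scenes.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()                 # difference between output and ground truth

def training_step(rendered_frame, ground_truth_64x):
    prediction = model(rendered_frame)              # network's guess at the 64x image
    loss = loss_fn(prediction, ground_truth_64x)    # how far off the guess is
    optimizer.zero_grad()
    loss.backward()                                 # backpropagation...
    optimizer.step()                                # ...adjusts the weights
    return loss.item()

# One step on a random stand-in batch shaped (N, C, H, W):
print(training_step(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```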

An example of DLSS in the Epic Games Infiltrator demo. Temporal AA on top, DLSS on bottom

Once the model is sufficiently trained on those images, Turing cards can use it to render scenes “at a lower input sample count,” according to Nvidia, and then infer what the final scene should look like at its target resolution. Nvidia says DLSS offers similar image quality to TAA with half the shading work, all while avoiding blur and other unpleasant artifacts that can occur with TAA. The example above, from Epic Games’ Infiltrator demo, shows how DLSS avoids the ugliness that can result with TAA, all while delivering as much as twice the performance in terms of average frame rate at 4K resolution on the RTX 2080 Ti.

Another and perhaps more exciting possibility with DLSS is a mode called DLSS 2x. DLSS 2x allows the graphics card to render the scene at its target resolution rather than at a lower base sampling rate. The result of that rendering is then run through the DLSS network and used to approximate the output quality of the 64x super-sampled ground truth. Nvidia says this image quality would be “impossible” to produce using any other form of real-time graphics.

A conception of how NGX services fit into the rendering path. Source: Nvidia

DLSS is just one product of what Nvidia calls the Neural Graphics Framework, or NGX. NGX provides an API to game developers that exposes several AI models, or “neural services,” to game engines for use on client PCs. Nvidia pre-trains NGX models in its own data centers and provides them to end users as “neural services” by way of GeForce Experience. On top of DLSS, Nvidia also touts the possibility of AI InPainting, a method of removing unwanted content from an image and filling the gap with computer-generated content that appears realistic. Nvidia suggests InPainting could be used to naturally remove undesirable power lines from an image, for example.

Other NGX applications include AI Slow-Mo, where a neural network inserts interpolated frames into a video to simulate a slow-motion recording of the sequence, and AI Up-Res, which can upscale an image to two, four, or eight times its original size while maintaining a more natural appearance than bicubic filtering alone.

This article remains under construction. Sorry for the mess and keep checking back for more.

Comments closed
    • BorgOvermind
    • 1 year ago

    nV can’t even do right the basic 2D graphics of their presentations.

    White text or bright green ? Epic…fail.

    Everything else is marketing, as usual.

    The only good recent thing since the 1k series is that they finally have worthy processing power in their cards.

    • maroon1
    • 1 year ago

    DLSS looks even better than native 4K (it removes jagged edges better at least), and same time it performs much better (RTX 2080 is about 2x of GTX 1080 when you use DLSS)

      • Freon
      • 1 year ago

      “DLSS looks even better than native 4K”

      DLSS at what native res looks better than native 4K? And does “native 4k” mean no AA at all?

    • Ninjitsu
    • 1 year ago

    “DLSS is just one product of what Nvidia calls the Neural Graphics Framework, or NGX. NGX provides an API to game developers that exposes several AI models, or ‘neural services,’ to game engines for use on client PCs. Nvidia pre-trains NGX models in its own data centers and provides them to end users as ‘neural services’ by way of GeForce Experience.”

    eww no thanks

      • ptsant
      • 1 year ago

      That’s a very important bit. I wonder, do game developers get access to the neural processing hardware or is this restricted to pro cards?

      • Waco
      • 1 year ago

      Yeah. No Geforce Experience for me.

    • ptsant
    • 1 year ago

    So, DLSS is trained on 64x supersampling. Basically, a way of getting something close to 64x without spending so much time. Sounds nice, but I want to see side-by-side comparisons.

    • mad_one
    • 1 year ago

    So I was right about DLSS doing some form of undersampling, though it seems not to be the checkerboard rendering I expected (some sources stated that the actual rendering resolution depends on the game).

    Still, might be a good tradeoff for 4K displays, not so much for lower resolutions.

    While RT will be very interesting for the future, the tensor cores and tricks like variable rate shading are far more interesting for this generation. I’m curious what the devs will be doing with it. Some AI filter magic to speed up ambient occlusion maybe?

    • synthtel2
    • 1 year ago

    Some thoughts on skimming the whitepaper:

    * Concurrent integer execution sounds like it should actually be worth a general boost per-TFLOP over Pascal, and those are some impressive memory compression improvements.

    * The RT marketing hype in this supposedly technically-focused document is ludicrous. It is cool stuff, but it isn’t a tenth as cool as you’d think by listening to Nvidia rave about it.

    * I’m not entirely up on my raytracing implementation details, but I know enough to say that 10 Gray/sec claim needs to be taken with a whole ocean of salt. In ideal conditions, yeah, maybe. (The thumbnails on page 33 sure look like some manner of ideal conditions.) In realistic conditions, I’ll bet the real benefit is just that it’s freeing up the shaders to do other work at the same time, and even then they’ve still got to share memory bandwidth.

    * Of course the DLSS comparison shots are indistinguishable, they’ve been demolished by JPEG compression artifacts! The kind of detail that got lost is only exactly what AA differences would show up in, nothing important. Is there some non-PDF version of this whitepaper everyone except me is reading? If not, it’s worse than CEO math.

    * Mesh shading sounds potentially pretty nifty.

    * Variable rate shading sounds very nifty, especially when it comes to foveated rendering. If there’s one feature mentioned here I really want in my graphics card, this is definitely it.

      • Laykun
      • 1 year ago

      Mesh shading make so much sense at this point simply because of the inclusion of ray tracing. They need GPU side LOD generation and culling for objects being rendered with ray tracing for it to render at any decent speed on the CPU (otherwise you’d be making multiple draw calls per object for both rasterization and ray tracing).

        • synthtel2
        • 1 year ago

        I don’t think it factors into raytracing much. BVH generation looks to be strictly CPU-side, and everything the GPU needs for raytracing should be either a part of that BVH or accessible through it. LODs and culling are things that would be handled before BVH generation.

        More draw calls aren’t automatically terrible anyway. Light pre-pass (aka deferred lighting, not deferred shading) needs all the directly visible geometry to be handled twice, and it was popular at a time when draw calls were very strictly limited to a few thousand per frame. Rendering everything for GI or similar (no frustum culling) is a lot more expensive than that, but we’ve also got orders of magnitude more draw calls to go around these days.

          • Laykun
          • 1 year ago

          The BVH tree is generated somewhere at the driver level on nvidia, but that doesn’t mean that culling and LOD picking isn’t done on the GPU. I wouldn’t be surprised if they still use the hardware on the GPU to do LOD management and culling. I don’t believe that LODs and culling would actually need to happen prior to BVH generation, as you might just be giving a scene object list to the driver and going from there (which sounds like what you’ll be doing for the traditional rasterization part with mesh shaders).

          The problem we come into these days with draw calls per mesh is that the more dynamic features we add the more times we multiply the draw calls. For example, cascade shadow maps require all objects around the player to have at least 1 draw call, maybe more, we also want to add dynamic GI that renders into a 3D volume like a shadow map, that again means another draw call per object around the player, then there’s distance field ambient occlusion, same deal, each object that contributes to the global distance field needs a “draw call” (although that’s not on a per frame basis).

          But even then, to have the ability to do the visibility culling on the GPU is a huge boon for freeing up CPU resources as that’s generally the more expensive part that seems to balloon out of control more often than it should (this is in the context of making km^2 size maps where the artists can screw up all sorts of settings causing performance problems). I think if you have the ability to do static visibility volumes for your map then you’ve got no real problems, but once you start working on km^2 size maps, generating those visibility volumes becomes impractical incredibly quickly so you have to switch to more dynamic means of culling, and that’s where I see the biggest benefit for mesh shaders.

            • synthtel2
            • 1 year ago

            It could theoretically go submit to driver (CPU) -> cull (GPU) -> update BVH (CPU) -> trace (GPU), but the extra trip to the GPU and back isn’t free, and culling doesn’t get to be much of a thing for that BVH anyway. Rays could be coming at pretty much anything from pretty much any angle; what criteria would you even use? The only ones I can think of wouldn’t be worth doing.

            The BVH leaves are triangles. LOD/culling must be done before BVH generation, because you’ve got to know which triangles are going into the BVH.

            Without mesh shaders anywhere in the picture, adding things to the BVH isn’t even a process involving draw calls as we know them (though the driver could make it look very similar). The GPU isn’t involved until the finished BVH gets sent over. This is very unlike most non-RT algos that accomplish similar things.

            • Laykun
            • 1 year ago

            I think you misunderstand, I mean submit draw list (CPU) -> BVH (CPU) -> Culling and LOD generation (GPU) –> trace (GPU). Probably the most important part would be LOD picking on the GPU rather than culling, but you could probably cull out some parts of the BVH using the culling part of the GPU which might result in more optimal BVH lookups/traversal.

            Interesting that you say triangles are the leaf nodes for BVH trees, I’m sure they could be, is that nvidia’s approach? That seems like a pretty large task to perform at runtime on the CPU in realtime. The wiki article for BVH states that objects are the leaf nodes, not triangles, which makes more sense to me in terms of granularity, it’d be an impossible task to generate an octree of the scene for culling where each leaf node is a triangle, that just doesn’t make sense, and I’m sure it makes equally no sense for BVH. If it is objects being leaf nodes then you could totally do LOD picking GPU side.

            EDIT: With further thought though it starts to make less sense to me to use Mesh Shaders with RTX as the two models of drawing are incompatible. It looks to me like mesh shaders generate draw calls for specific meshes/lists, which would then populate your gbuffer for deferred shading (for example), whereas with RTX you’re casting a ray from the pixel on your screen out into the scene and traversing the BVH tree, which doesn’t fit Mesh Shaders at all because you don’t know which mesh a ray is going to intersect so a ray trace could literally hit anything (I’m very keen to see how they’re doing this part as well due to memory lookups etc.). I really need to read up on how RTX is going to work tbh.

            • synthtel2
            • 1 year ago

            Page 31 of the whitepaper (page 37 of the PDF, https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf) looks like the closest they get to saying directly that the BVH goes all the way down to triangles. Page 2 paragraph 3 of the TR article also works. BVHs for most other purposes only go down to object-level, but trying to go from there down to triangle-level without the help of the BVH is more trouble for raytracing than most other purposes. They’re doing incremental updates rather than building the BVH from scratch every frame, so it’s messy but the speed is plausible enough. The memory constraints check out, and even if they’re transferring the whole thing over PCIe every frame it should still be able to handle well into the millions of triangles. It’s messy for sure, but Nvidia has never shied away from this kind of mess.

            • Laykun
            • 1 year ago

            It depends if they go down to the per triangle level or if they do it for batches of triangles. If you’ve got an animated mesh with 50k triangles that’s probably not going to perform well runtime if you’re doing it on a per triangle basis, just doesn’t make sense.

            • synthtel2
            • 1 year ago

            If you’re throwing 50k triangle animated meshes at it, it’s in trouble either way. As you’re saying, redoing that much of the BVH every frame is a problem (assuming that one mesh isn’t the only thing moving in the environment), but so is trying to figure out whether you actually hit a triangle out of those 50k without a BVH. If you LOD it down to 5k triangles, regenerating it in the BVH down to individual triangles every frame works, but tracing down to mesh-level with the BVH and then trying to figure out where you stand in those remaining 5k without still doesn’t.

            Let’s say we’re working with clusters of 64 triangles as leaf nodes. Brute-forcing the ray-triangle intersection test for each of those 64 every time we run out of BVH is way too slow; with the sloppier bounds, we’re going to run out of BVH a lot more often, and this slow operation applies to every single ray we trace. We need some kind of acceleration structure for each triangle cluster… like maybe a BVH?

            • Pancake
            • 1 year ago

            You’re both probably right. The killer for BVH is random reads from graphics RAM which may not be all that good on the cache depending on the geometric complexity of a scene. And then having to rearrange the BVH structure as a scene changes dynamically. The killer for brute force calculating which triangle in a bunch is intersecting is computation. But there are lots of ALUs on the graphics card.

            My guess without knowing the specifics of the architecture is probably how much geometry fits in a cache line.

      • Mat3
      • 1 year ago

      Variable rate shading seems similar to the checkerboard rendering the PS4 Pro and Xbox One X have.

        • synthtel2
        • 1 year ago

        Were it applied downward by a factor of 2 to the whole screen at once, it would be. It’s capable of so much more, though; there are legitimately a lot of places detail can be reduced without it being a problem even in normal gaming (if some scenery is heavily out of focus / DoF’ed nobody will miss three-quarters of the pixels), and it’s a big boost to foveated rendering’s practicality.

    • Fonbu
    • 1 year ago

    DLSS seems to have some value in all these shenanigans from Nvidia. But how is it in motion? This seems to have parallels with something I remember from a few years ago: MFAA (multi-frame sampled AA), released with the Maxwell GTX 900 series! MFAA also had lists of supported games.

    • Andrew Lauritzen
    • 1 year ago

    > “Nvidia says DLSS offers similar image quality to TAA with half the shading work, all while avoiding blur and other unpleasant artifacts that can occur with TAA. The example above, from Epic Games’ Infiltrator demo, shows how DLSS avoids the ugliness that can result with TAA, all while delivering as much as twice the performance in terms of average frame rate at 4K resolution on the RTX 2080 Ti.”

    Alright, I get this is an “NVIDIA marketing says…” sort of situation that you are quoting, but we need to be careful of some of these grandiose claims around DLSS. I’m sure it’s a great technique for what it is, but at its core it is simply a clever upsampling algorithm and it’s important not to lose sight of that. Comparing 1/2 res upsampled DLSS to full resolution + TAA 4k rendering performance is fairly ridiculous.

    If NVIDIA all of a sudden started claiming that you should compare 1080p 4x MSAA to 4k “native” rendering, I think most folks here would know to take that with a grain of salt. But it’s true – with 4x MSAA you’re absolutely getting *even better than* the edge AA quality that supersampling would have given you, at much better performance – it’s like magic!

    Of course it’s not magic: you’re shading less and thus you’re susceptible to other kinds of aliasing among other things. *DLSS is no different.* While it is almost certainly a more clever upsampling filter, it actually has even less information than MSAA, which at least has the benefit of super-sampled depth/coverage. It’s kind of like “super FXAA” to be honest.

    Anyways not trying to throw shade on the technique itself, but the marketing for it is getting a bit ridiculous. NVIDIA is clearly looking for ways to justify all that extra neural net hardware (and associated cost!) that they are putting on consumer chips this time around so keep that in mind whenever they push associated techniques is all.

      • derFunkenstein
      • 1 year ago

      I was reading that section just now thinking there was no way that a half-res render of the scene would look nearly as good as full res, and that using the supposed “2x” mode is the only way to not lose detail. That seems to be a (simplified) version of what you’re saying here, yes?

        • Andrew Lauritzen
        • 1 year ago

        Yeah that’s a fair summary. Whether or not it will be a visual problem depends on pixel densities and the content itself, but the comparison is not entirely fair. On the other hand, we all know that raw 4k rendering and shading @ native resolution is a hilarious waste of computation power compared to other stuff that it could be applied to to improve the final image more, so it’s likely that in a lot of cases the difference won’t be obvious on current content. No one is really designing content that aliases heavily at 4k for obvious reasons 😉

          • auxy
          • 1 year ago

          “we all know that raw 4k rendering and shading @ native resolution is a hilarious waste of computation power compared to other stuff that it could be applied to to improve the final image more, so it’s likely that in a lot of cases the difference won’t be obvious on current content. No one is really designing content that aliases heavily at 4k for obvious reasons ;)”

          ( ;一一)

          https://i.imgur.com/dDIlbon.png

            • Andrew Lauritzen
            • 1 year ago

            You saw the emoji right? 😛

            The point being yeah – anything that aliases at 4k is gonna be far worse when rendered at half res and there’s no way even the smartest upsampling algorithm in the world can correct that loss of information in all cases. Sure this one has the “content-specific” advantage, but you’re still going to see issues that would be lessened or not occur in 4k native rendering.

          • RAGEPRO
          • 1 year ago

          Yeah I’m with auxy on this one boss, sorry. Everything still aliases like crazy at 4K, heh.

      • Freon
      • 1 year ago

      My first gut feel was NV was going to try to push DLSS as equivalent to 4x or 8x SSAA. It wouldn’t be upscaling from below native, but rather just compared to rather silly AA levels no one uses and sold as equivalent.

      • techguy
      • 1 year ago

      Nvidia states in the whitepaper, on page 35, paragraph 3:

      “DLSS allows faster rendering at a lower input sample count, and then infers a result that at target resolution is similar quality to the TAA result, but with roughly half the shading work.”

      I’m not so sure that “lower input sample count” equates to the game being rendered at lower resolution, at least not the way the whitepaper describes it.

        • RAGEPRO
        • 1 year ago

        It does, yeah. The “input samples” for the DLSS post-processing shader are the otherwise-already-shaded pixels in the buffer. Since it’s a post-processing effect, fewer input samples means fewer pixels.

          • Andrew Lauritzen
          • 1 year ago

          RAGEPRO is correct – the statement very clearly says they are rendering at lower resolution.

          And indeed how else would they be making anything faster in the first place? If they were rendering at the same resolution and *adding* DLSS on top, things would get slower across the board. They’ve just renamed that mode to DLSS4x or something and aren’t showing benchmarks from it while they instead show you benchmarks of something with “super sampling” in the name that is actually sub-sampling.

            • Freon
            • 1 year ago

            I don’t think that’s at all evident.

            I think this is more likely rendering to standard 1x or perhaps with just 2x MSAA and then using DLSS to blend. They assert the quality is that of something like 4x SSAA or better. If you buy that the post processing looks subjectively or objectively as good as 4x SSAA (or whatever), you have lowered your sample count and done exactly what is being described. This has nothing to do with rendering to 1080p or 1440p or 1843p or whatever, then upscaling using AA to 4K.

            It MIGHT, but that seems very unclear at best, and my take is it likely is not.

            edit: it’s also still possible the DLSS algorithm is taking in more than a simple final color buffer. It may have access to depth buffer or subpixel samples like MSAA or even TAA samples and being performed “on the fly” per tile and before some of the information is destroyed (i.e. MSAA data) by committing a final frame buffer. We can’t be sure yet. They are being very cagey about specifics. It’s trivial to feed a NN multiple vectors. RGB is already three. [n-1], [n-2], [z] are just more inputs. These are already calculated and costs are paid for standard rendering or AA techniques.

            • synthtel2
            • 1 year ago

            “Lower input sample count” literally means “lower input resolution”. It might not be obvious if you’re not used to reading that kind of paper, but if you are, there’s not really any room for doubt about it. If there were any doubt, “but with roughly half the shading work” ought to seal the deal. Nvidia explicitly says that’s against a TAA result.

            MSAA doesn’t factor into this anywhere. TAA certainly doesn’t involve it, and DLSS doesn’t appear to either. If it *were* involved, it’d therefore be slowing down DLSS but not TAA, which harms your point rather than helping it.

            There is no possible way they could get the kind of performance they are without reducing input resolution. TAA is fast, and if DLSS were free they still wouldn’t see anything near that kind of boost from it if it were just about the cost of the AA itself.

            • Andrew Lauritzen
            • 1 year ago

            Yeah, all of this. To add another point: their examples are using the UE4 Infiltrator demo. UE4 does not even support MSAA or anything similar in their deferred rendering path; the only way to get higher performance with fewer input samples is to lower the resolution of shading.

            This was precisely the concern that prompted my original post sadly. NVIDIA are being intentionally misleading here because they have a clear agenda to justify the fact that they are asking consumers to pay (a decent amount!) for hardware that doesn’t currently have a great consumer use case (the ML stuff). While I think it’s fine that they are betting that it will in the future, I don’t approve of nearly lying to consumers by playing games with marketing presentations and naming.

            It’s sort of sad that DLSS itself is getting tied up in this nonsense (even the name!) as I’m sure the technique is neat and useful. It likely *is* doing lots of really smart stuff with which samples it is shading and how it is combining them and so on, and it plausibly could produce better pictures overall than shading more samples. I’m not disputing any of that. Just NVIDIA pulling their usual thing and putting it behind a proprietary API and misleading consumers about how it works and how fast the new hardware is relative to the old puts a cloud over it, as it has with previous NVIDIA tech.

            It’s the usual thing: NVIDIA has a lot of smart engineers but a bad company culture and business practices.

            /rant 🙂

            • Pancake
            • 1 year ago

            Good insight that it’s probably more than just RGB.

            Anyway, I think it can BOTH be lower resolution and/or reduced sub-sampling. And probably user selectable. Just like how you can control resolution/AA options in any game today.

        • Freon
        • 1 year ago

        Right, I did not take this as rendering to lower-than-final resolution. I think 4K is still 4K, but with fewer subpixel or temporal samples per final pixel. I.e. they’re just rendering at standard 1x or with 2x MSAA then post processing that, claiming it is equivalent quality to something like 4x rotated SSAA or 8x SSAA or whatever.

      • wtfbbqlol2
      • 1 year ago

      I’m a bit skeptical about DLSS’s ability to handle subpixel details, but I think the neural network reconstruction, if done right, can do much better than FXAA. If DLSS “super FXAA” at half rez gets it super close to full rez, sign me up.

      The reason I believe DLSS has a chance of positively surprising us where other non-TAA post-process anti-aliasing technique falls short (in subpixel stuff) is this:
      – Say a game is trying to render a distant fence but only manages to show some broken lines. Based on context, my brain is able to fill in the blanks and tell me those broken lines represent a fence. If a neural network can recognize that too, then it has a good chance of reproducing an image much closer to a non-aliased image. I’m told the key thing in most things neural network is having the right training data set (nVidia is using 64x supersampled images as ground truth, so that sounds like a good start) and an extra bit of neural network training “art” (choosing the best activation function and stuff like that).

      I am frustrated that nVidia hasn’t shown a high quality side-by-side comparison though, ugh. This is the best I could find that isn’t a pile of JPEG artifacts:
      https://developer.nvidia.com/sites/default/files/akamai/RTX/images/infiltratorDLAA002.png

    • DancinJack
    • 1 year ago

    That die shot is badass.

      • CScottG
      • 1 year ago

      -I was thinking that I could totally see Jensen Huang wearing a shirt that looks like that.

    • Wirko
    • 1 year ago

    As for the headline, “delidding” would resound louder here than “popping the hood”.

    • UberGerbil
    • 1 year ago

    Well, Turing Day is (or should be) June 23rd every year (this past one would’ve been #106). Von Neumann Day (#115) will be December 28th, but that just represents a Halting Problem after Boxing Day.

    • davidbowser
    • 1 year ago

    Hands raised for anyone that misread the headline as “Popping the wood”

    Jeff REALLY likes him some GPU architecture.

    • Chrispy_
    • 1 year ago

    Nice coffee-break reading with the Nvidia marketing rubbish stripped out and replaced with English!

    Thanks Jeff, keep it coming 🙂

    • tay
    • 1 year ago

    Just buy a VA panel, and you get free Temporal AA. No need for deep learning – AMD
    *thinking.gif*

    • Mat3
    • 1 year ago

    AMD and IMG should team up.

    • thedosbox
    • 1 year ago

    “Nvidia cautions that applications will not be able to suddenly begin casting hundreds of rays per pixel in real time.”

    There’s a surprise. /s More seriously, it’ll be fascinating to see how quickly this capability progresses.

    “Nvidia has hopped on the ‘unboxing embargo’ bandwagon, meaning we can show you the scope of delivery of those cards later today.”

    I’m going to be disappointed if there’s no cat. You did it for AMD, so should be doing it for nivida too. Otherwise completely rational accusations of “TR IS BIAS” will be made.

      • Krogoth
      • 1 year ago

      Otherwise, it wouldn’t be a Purfectly traced review. 😉

      • kuraegomon
      • 1 year ago

      roflcopter

    • NarwhaleAu
    • 1 year ago

    Jeff, if I’m in the market for a new graphics card, would the performance of the new cores justify waiting instead of trying to grab a cheap 1080 Ti? I can wait a few months / until next year, but somewhere around the $600 to $700 mark is about my limit. I’m curious on what your perspective is.

    Thanks for the great article! I’m really curious about rasterization performance between the 10 and 20 series.

      • djayjp
      • 1 year ago

      He’ll recommend waiting for benchmarks/a review.

        • Jeff Kampman
        • 1 year ago

        Yeah, we’re not discussing performance today. Stay tuned.

      • Pholostan
      • 1 year ago

      Bait for WenchMarks

    • Usacomp2k3
    • 1 year ago

    I’ll be honest, I couldn’t understand a lot of that. I think once we can see hardware and visual examples it’ll make a big difference in my understanding.

    • Krogoth
    • 1 year ago

    The world keeps on Turing….
    Rays are tracing everywhere……
    Navi is still a poor clap……
    Vega stands no chance……
    while Jacket Man is laughing in glee….

      • chuckula
      • 1 year ago

      Tune in next week for more As the World Turings.

        • Redocbew
        • 1 year ago

        “Raja, there’s something I need to tell you. Volta is our son.”

        Le gasp!

          • Krogoth
          • 1 year ago

          Raja: “But, now I’m with Intel!”

          The drama escalates!

    • Jeff Kampman
    • 1 year ago

    Sorry for the incomplete article, folks—there’s a lot to digest in Turing and I’ve had access to the white paper and supporting materials for less than a day. We’ll be adding to the piece as time goes on, so please keep checking back.

      • mad_one
      • 1 year ago

      “Behind the article”
      Watch as Jeff Kampman creates an article, frame by frame!

      (seriously, take your time, I have no idea how Nvidia thought it would be a good idea to release the information this late)

      • Jeff Kampman
      • 1 year ago

      OK, after 10 hours of sleep and copious amounts of coffee I’m finishing this piece while I download a terabyte or so of games to test these cards with. Sorry again for the delay.

        • psuedonymous
        • 1 year ago

        Hey Jeff, while you have Nvidia collared to answer architectural questions, can you ask if DLSS (or a DLSS-like technique) is only applicable to completed buffers, or if it can be applied to in flight textures? E.g. taking one medium-res texture and generating a pile of unique high res variants (without the normal stochastic noise techniques).

          • Freon
          • 1 year ago

          I have a sneaking suspicion that the tensor math is available in flight on textures or tiles, and that’s why tensors are part of the SM and not a separate chunk of the die like the RT cores.

        • JustAnEngineer
        • 1 year ago

        You can copy your Steam library from an existing drive to a new one more quickly than you can download it again.

          • Jeff Kampman
          • 1 year ago

          That assumes I have the games already. We’re getting plenty of new stuff to support this article.

            • Anovoca
            • 1 year ago

            I would recommend Path of Exile. Besides the fact that it is free, the new Delve expansion has added a slew of new visual goodies that can really make a GPU sweat at max settings.

      • Kougar
      • 1 year ago

      No worries. For the past year Anandtech has made a habit of releasing launch day articles with a bunch of blank pages and only filling them in as the day progresses and wrapping up on the second day. Sort of a joke how Anandtech writes a conclusion page for an article that still has a dozen blank pages in it with a third of the tests still undone.

        • derFunkenstein
        • 1 year ago

        Even the posts Anandtech publishes complete have a lot of off-the-cuff guesswork and pseudo-English. Their hands-on iPhone XS post was atrocious. The comments on that post are even worse, though. Commenters were sometimes correcting things incorrectly. Writing concise, clear (Americanized) English is apparently harder than it should be.

        edit: as evidenced by my repeated editing of this comment. 😆

      • Bonusbartus
      • 1 year ago

      Thanks for the great article anyway!

      I just read through some of the older GPU architecture articles and I started wondering.
      It seems there is a trend of first introducing specialized hardware in GPUs, and later offloading them to general purpose things (like the shaders).
      In the earlier cards, there were dedicated pieces of hardware for transform and lighting, for anisotropic filtering, and for antialiasing.
      I think it would make for an awesome article if Techreport could write something about the timeline/history of GPUs and maybe zoom in on some older hardware implementations?
      I cannot find one source explaining how the Hardware T&L blocks are implemented/function (I know they were proprietary). I am also wondering what the consequences for performance/image quality are when dedicated hardware gets removed in favor of general purpose stuff.
      For instance, the ATI R300 GPU had the “SmoothVision 2” anti-aliasing block and TruForm for tessellation.
      Do games using these features look different (worse) on newer cards?
