Nvidia announces three Quadro RTX cards powered by Turing GPUs

At its SIGGRAPH keynote this evening, Nvidia took the wraps off the first graphics cards powered by its Turing architecture for hybrid rendering with real-time ray tracing—certainly the biggest change in the production of computer graphics since the introduction of unified shaders with the GeForce 8800 GTX in 2006.

According to David Kanter, who is on the ground at SIGGRAPH, Turing includes a new functional unit called an RT core that accelerates ray-tracing-related functions like traversing bounding volume hierarchies and handling triangle intersection. It also includes a new version of Nvidia's tensor cores to perform deep learning training and inferencing operations critical to AI denoising of ray-traced scenes.
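
For the curious, here's a toy CPU-side sketch of the two operations Kanter names: the ray-vs-box slab test that dominates BVH traversal, and the classic Möller–Trumbore ray-triangle intersection. It's illustrative only; Nvidia hasn't disclosed how the RT core's fixed-function hardware actually implements these tests, and the function names here are ours.

```cpp
// A toy sketch of the two RT-core operations described above: a slab
// test against a BVH node's bounding box, and Moller-Trumbore
// ray/triangle intersection. Structure and names are assumptions;
// Nvidia has not disclosed how the fixed-function hardware works.
#include <algorithm>
#include <cstdio>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Slab test: does a ray hit the axis-aligned box bounding a BVH node?
// invDir holds 1/direction per axis. Traversal runs this test at every
// node visited, which is why it is worth hardwiring.
bool hitAabb(Vec3 orig, Vec3 invDir, Vec3 lo, Vec3 hi) {
    float tmin = 0.0f, tmax = 1e30f;
    const float o[] = {orig.x, orig.y, orig.z}, inv[] = {invDir.x, invDir.y, invDir.z};
    const float l[] = {lo.x, lo.y, lo.z}, h[] = {hi.x, hi.y, hi.z};
    for (int i = 0; i < 3; ++i) {
        float t0 = (l[i] - o[i]) * inv[i];
        float t1 = (h[i] - o[i]) * inv[i];
        tmin = std::max(tmin, std::min(t0, t1));
        tmax = std::min(tmax, std::max(t0, t1));
    }
    return tmax >= tmin;
}

// Moller-Trumbore ray/triangle intersection; on a hit, *t gets the
// distance along the ray. This is the test run on each candidate
// triangle once traversal reaches a leaf.
bool hitTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float *t) {
    const float kEps = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (det > -kEps && det < kEps) return false;  // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;       // outside barycentric range
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    *t = dot(e2, q) * inv;
    return *t > kEps;                             // hit in front of the origin
}

int main() {
    Vec3 orig{0, 0, -1}, dir{0, 0, 1};
    Vec3 invDir{1e30f, 1e30f, 1};  // 1/dir, with a large stand-in for infinity
    float t;
    bool box = hitAabb(orig, invDir, {-1, -1, 0}, {1, 1, 1});
    bool tri = hitTriangle(orig, dir, {-1, -1, 0.5f}, {1, -1, 0.5f}, {0, 1, 0.5f}, &t);
    printf("box hit: %d, triangle hit: %d at t=%.2f\n", box, tri, tri ? t : 0.0f);
}
```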

The third pillar of Turing is its traditional shader array, assembled from groups of new Turing streaming multiprocessors (SMs). Turing chips will have as many as 4608 CUDA cores capable of up to 16 TFLOPS of FP32 calculations in parallel with 16 TIPS (trillions of integer operations per second). Turing parts can also operate on reduced-precision data types at rates of 125 TFLOPS for FP16, 250 TOPS for INT8, and 500 TOPS for INT4.
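
Peak FP32 rates for GPUs are usually quoted as two FLOPs (one fused multiply-add) per CUDA core per clock. Under that assumption, a quick back-of-the-envelope calculation like the sketch below puts the implied clock around 1.74 GHz; that figure is an inference from the quoted specs, not anything Nvidia has announced.

```cpp
// Back-of-the-envelope check on the quoted figures, assuming the usual
// convention of two FLOPs (one fused multiply-add) per CUDA core per
// clock. The resulting ~1.74 GHz is an inference, not an announced spec.
#include <cstdio>

int main() {
    const double cudaCores = 4608;
    const double flopsPerCorePerClock = 2.0;  // one FMA counts as 2 FLOPs
    const double peakFp32 = 16e12;            // 16 TFLOPS, as quoted

    const double impliedClockGHz =
        peakFp32 / (cudaCores * flopsPerCorePerClock) / 1e9;
    printf("implied clock: %.2f GHz\n", impliedClockGHz);  // ~1.74 GHz
}
```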

Nvidia CEO Jensen Huang revealed that the largest Turing GPU so far will be 754 mm² in area, smaller than the 815-mm² Volta V100 but still a massive GPU by any measure. Turing chips will be paired with Samsung GDDR6 memory in 16-Gb-density packages.

                  Memory   Ray tracing   CUDA cores   Tensor cores
Quadro RTX 8000   48 GB    10 GRays/s    4608         576
Quadro RTX 6000   24 GB    10 GRays/s    4608         576
Quadro RTX 5000   16 GB    6 GRays/s     3072         384

Nvidia is announcing three Quadro RTX cards this evening. The Quadro RTX 8000 will use the largest Turing GPU, with 4608 CUDA cores and 576 tensor cores. It'll have a 48-GB pool of memory, and two of these cards can be paired through NVLink for a combined 96-GB pool. Each RTX 8000 can perform 10 GRays/s of ray-tracing work. The RTX 6000 cuts the RTX 8000's memory pool in half but maintains the same complement of CUDA cores and tensor cores. RTX 6000s can also be paired using NVLink.

The RTX 5000 most likely takes advantage of a smaller Turing GPU to do its thing. The card has 3072 CUDA cores and 384 tensor cores, and its GPU can perform 6 GRays/s of ray-tracing operations. This card has 16 GB of memory on board and can be paired up using NVLink.
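
Those capacities also hint at bus widths. A 16-Gb GDDR6 package holds 2 GB and normally occupies one 32-bit channel, so a minimal sketch of the mapping might look like the following; the bus widths and the clamshell arrangement on the RTX 8000 are our assumptions rather than confirmed specs.

```cpp
// Sketch of how 16-Gb (2-GB) GDDR6 packages could map onto the announced
// capacities, assuming one package per 32-bit channel and a two-per-channel
// clamshell layout for the RTX 8000. Bus widths here are inferences from
// the memory sizes, not Nvidia-confirmed specs.
#include <cstdio>

int main() {
    const int gbPerPackage = 2;  // one 16-Gb GDDR6 package = 2 GB

    struct Card { const char *name; int busBits; int packagesPerChannel; };
    const Card cards[] = {
        {"Quadro RTX 5000", 256, 1},  //  8 channels x 2 GB = 16 GB
        {"Quadro RTX 6000", 384, 1},  // 12 channels x 2 GB = 24 GB
        {"Quadro RTX 8000", 384, 2},  // 12 channels, clamshell = 48 GB
    };
    for (const Card &c : cards) {
        int packages = c.busBits / 32 * c.packagesPerChannel;
        printf("%s: %d-bit bus -> %2d packages -> %d GB\n",
               c.name, c.busBits, packages, packages * gbPerPackage);
    }
}
```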

Quadro RTX cards will support USB Type-C video output and the VirtualLink standard for delivering power and pixels to next-generation VR headsets over a single cable.

Nvidia estimates that the Quadro RTX 8000 with 48 GB of memory on board will have a $10,000 street price. The RTX 6000 with 24 GB of memory will run $6,300, while the Quadro RTX 5000 with 16 GB of memory will carry a $2,300 suggested price tag. Nvidia projects that the cards will become available in the fourth quarter of this year. We can't wait to find out more.

Comments closed
    • anotherengineer
    • 1 year ago

    hmmmmm

    Think I will wait for the RTX 8800 Ultra Black Leather ed.

    • Sahrin
    • 1 year ago

    Unified shaders were introduced in the R500 series (technically the Xbox 360’s GPU was first – but the X1800 XT was also launched before), not the 8800 series.

    AMD beat nVidia by like 2 years…

      • techguy
      • 1 year ago

      AMD didn’t have a USA (Unified Shader Architecture) design until R600, AKA HD 2900, which was notoriously late compared to GeForce 8800 – by about 6 months, no less. Kind of reminds me of today’s situation with Pascal and Vega. Volta/Turing/Ampere/JenHsun’sJacket3000 is whole new territory for the competitive situation with these vendors (or lack thereof), but I digress.

      The Radeon HD 2000 series Wikipedia article has this to say in the first line:

      “This article is about all products under the brand “Radeon HD 2000 Series”. They all contain a GPU which implements TeraScale 1, ATI’s first Unified shader model microarchitecture for PCs.”

        • Eversor
        • 1 year ago

        It depends on how you look at things. Xbox Xenos (rumored codename R500) was out in 2005 on consumer devices, just not the PC.

          • techguy
          • 1 year ago

          X1800 did not use R500 (Xenos); it used R520, which was a separate, non-unified design. Sure, R500 was unified, but it also was never available in a PC graphics card, so probably shouldn’t be directly compared to one?

      • Eversor
      • 1 year ago

      I don’t get the down votes, since you are right in some ways.

      The R500 was never officially the Xbox Xenos, which was the first unified chip to market. RV5xx and R520/580 were not unified designs but 16/8 and later 48/8 pixel/vertex designs.

      • tipoo
      • 1 year ago

      The 360’s Xenos had unified shaders, but the X1800 did not. ATI didn’t launch unified shaders in standalone desktop cards until the 2900 series.

      That gap between shipping the first unified-shader part in a mass-market consumer device and getting a desktop card out only after the 8800 was always interesting to me. The buyout may have messed the schedule up.

    • demolition
    • 1 year ago

    So will it run Crysis ray-tracing edition?

    • tipoo
    • 1 year ago

    So what exactly is the ray tracing block, from a functional unit point of view?

    Is it similar to the CUDA ALUs, but with reduced precision? Increased? That would also sound similar to the neural engine with a bunch of reduced-precision cores, in that case. If that’s it, the future seems to be made of cores of varying precision.

    • jihadjoe
    • 1 year ago

    Oh that poor Radeon Pro card that came out two days ago…

      • Hattig
      • 1 year ago

      Is vastly cheaper (under $1000), and is targeted at a different market segment.

      The real question is what Vega 20 will bring and at what price. I don’t think AMD has the resources to have added raytracing acceleration, and they’re way behind when it comes to AI unless you do it on the GPU cores. AMD’s decision is to do everything in a generalised manner on general purpose GPU cores, and I think that the future will show that to be a bad decision versus having application specific accelerators.

      When will we get a 1 POPS INT2 Tensor Accelerator eh Nvidia?

        • willmore
        • 1 year ago

        Bah! I need a 2 POPS INT1 processor!!! I do all of my math in GF(2)!

        • gerryg
        • 1 year ago

        I think you just contradicted yourself. Different market segments, right? It will take time to know if the market segments they did target with their new card, using generalized architecture, will pan out. But for the “different market segment” that Nvidia is targeting with Quadro RTX, yeah AMD isn’t really playing there and hasn’t said they are, so apples/oranges. There’s still a lot of money to be made in the different areas and price points each is targeting. And the RT cores are still so new that it’s going to be a while before they go mainstream and have broader software support, so AMD still has time to decide if they want to go there or not. And hey, why not let the green team blaze the trail and take some lumps first before entering the market.

    • USAFTW
    • 1 year ago

    I guess it’s only a matter of time until the new GeForces are released.
    And we know, going by what Jen Hsun considers “a very long time”, his wife must be disappointed.

    • Shobai
    • 1 year ago

    In your third paragraph, are ‘TIPS’ and ‘TOPS’ interchangeable?

      • Beahmont
      • 1 year ago

      No. I was confused at first too, but they are not the same.

      TIPS = Trillions of Int-ops Per Second.

      TOPS = Tensor Operations Per Second.

        • Shobai
        • 1 year ago

        Thanks!

    • RtFusion
    • 1 year ago

    I wonder how Turing will be positioned relative to Volta if it also gets adapted into the Tesla lineup.

    These are impressive pieces of graphics hardware, for sure.

      • the
      • 1 year ago

      Turing likely won’t make it into a Tesla based upon FP64 rates. Even *if* Turing provided half-rate FP64, it’d be behind Volta. There are other things, like the six NVLink ports on Volta, that make it suited for larger HPC nodes as well.

        • RtFusion
        • 1 year ago

        So, could it be then that the rumored Ampere is the successor to Volta? I could see nVidia going down the path of two architectures: one mainly for graphics and professional visual workloads, the other for HPC and machine learning, with the two sharing some parts with each other.

        • NTMBK
        • 1 year ago

        Not all Teslas are aimed at FP64! Remember the Tesla K10, with a pair of GK104 (aka Geforce 680) chips on one board? Certain HPC workflows just need high throughput of FP32 (or even FP16), and don’t really care about double precision.

      • Krogoth
      • 1 year ago

      Volta = general compute

      Turing = graphical compute

      They are siblings of each other, but I suspect Nvidia will start to diverge their chips designs moving forward.

    • ozzuneoj
    • 1 year ago

    Just for kicks I’d love to know what the TDP of the RTX 8000 is. 48GB of RAM and the ability to reproduce what we saw in those demos (especially that Star Wars one) in real time… yikes.

      • DancinJack
      • 1 year ago

      A guess? 300W.

        • Leader952
        • 1 year ago

        250 watts.

    • techguy
    • 1 year ago

    How much you want to bet the specs listed for Quadro RTX 5000 match GeForce RTX 2080?

      • DPete27
      • 1 year ago

      9 generations in the future is a pretty tough prediction.

        • techguy
        • 1 year ago

        Who said anything about 9 generations from now? RTX 2080 is almost certainly the name of the next GeForce “flagship” (until the real flagship 2080 Ti part is released). Or did you miss this? https://hothardware.com/news/nvidia-trademark-confirm-geforce-rtx-quadro-rtx-branding-turing-gpus And this? https://www.youtube.com/watch?v=F7ElMOiAOBI

      • techguy
      • 1 year ago

      To expound on my earlier comment: this is disappointing unless Turing has some *serious* efficiency improvements compared to Pascal, or drastically higher clocks (which the Quadro RTX FP rates don’t seem to bear out). I’ll wait until reviews are out to make my purchase decision, but I have a feeling I’ll continue waiting until the 2080 Ti or whatever it’s called.

      • Eversor
      • 1 year ago

      There is no way they could produce 754-mm² chips for consumer cards without them being limited-edition $1,000+ cards. Even at that price, it would certainly come only after yields improve further.

        • techguy
        • 1 year ago

        Look again. The RTX 5000’s specs are significantly lower than the 6000 and 8000 parts’. It is almost certainly using a smaller (albeit not yet detailed) die.

      • K-L-Waster
      • 1 year ago

      Hmmm, probably a bit high. I suspect it would be closer to whatever the RTX version of the Titan ends up being called.

        • techguy
        • 1 year ago

        Compare specs carefully. The Quadro RTX 5000 has 3072 CUDA cores and 16 GB of RAM (implying it has a 256-bit bus), compared to the high-end Turing Quadro RTX parts (6000 and 8000) with 4608 CUDA cores and 24/48 GB of RAM (implying they have a 384-bit bus). Tell me – why would RT104/TU104 (we still don’t know the codenames for these chips) be cut down for an x80 SKU? That’s not typical. Look at Pascal: GP104 as featured on the GTX 1080 was full-featured. Same for GM204 on the GTX 980, and GK104 on the GTX 680.

        I think you may be conflating the x80 SKU with the x80 *Ti*/Titan SKUs, which generally use the “big” variant of a given architectural design, e.g. GP102, GM200, etc. In this case the “big” variant of Turing ought to be RT102/TU102, upon which the Quadro RTX 6000 and 8000 are based.

    • chuckula
    • 1 year ago

    “500 TOPS for INT4”

    Pick a number! Any number! As long as it’s between 0 and F [inclusive].

      • NTMBK
      • 1 year ago

      Wouldn’t that be UINT4? 😉

      • Bobs_Your_Uncle
      • 1 year ago

      Truly we have entered the glorious new age of the AccroNumeraNym, as is evidenced by this excerpt from the article (no doubt drawn from Official Nvidia Market-Speak): “... at rates of 125 TFLOPS for FP16, 250 TOPS for INT8, and 500 TOPS for INT4.”

      No longer are we constrained by the bonds of intelligible language, and have “progressed” on to the Brave New World of AlphaNumeric-Proprietary-Comms-On-Demand Salad¹.

      ¹ (Licenses for APCODSalad Translation Software©®™ are available at reasonable annual subscription rates, should anyone imagine there might be some conceivable means to understand what that $1,000+ gizmo they just bought actually does. Good luck with that, and godspeed in your quest!)

    • drfish
    • 1 year ago

    I want to go to there.

    • brucethemoose
    • 1 year ago

    “The RTX 5000 most likely takes advantage of a smaller Turing GPU to do its thing.”

    I dunno, seems like they just disabled 1/3 of the die: 2/3 the cores, tensor cores, and memory bus, at about the same clock rate given the performance. A “small” GPU with so much die space dedicated to stuff other than raster graphics would also be a major departure from Nvidia’s previous lineups.

      • Jeff Kampman
      • 1 year ago

      Disabling 1/3 of a die this large either suggests bad yields (unlikely given that Nvidia is already happily busting the reticles at TSMC with GV100), a price so high that they can afford to absorb even the salvage dies and not take a hit in the margins, or both.

      If Nvidia is using 16 Gb GDDR6 packages from Samsung across its lineup, 16 GB of memory maps perfectly onto a chip with a 256-bit-wide memory bus using eight GDDR6 packages. That’s the strongest evidence to me that RTX 5000 uses a smaller chip.

      Bigger Turing™ uses a 384-bit bus, implying as many as 12 packages, which maps perfectly onto the 24-GB/48-GB buffers Nvidia specifies.

        • brucethemoose
        • 1 year ago

        Well, to get to 256 bits, they could also cut out 1/3 of the memory bus, which Nvidia has done in the past. So I wouldn’t call that particularly strong evidence.

        But yeah, I think you’re right now. My speculation was based on an assumption that this absolutely, positively wasn’t gaming silicon… But with the gaming-Turing hints, they just have to have a smaller 256-bit GPU in the stack.

    • smilingcrow
    • 1 year ago

    Interesting timing: Nvidia releases radical new tech that may well expand multiple industries on the same day that AMD releases a chopped-down version of its server platform for HEDT/workstation use.
    Why does that matter? Because Nvidia is looking at replacing even more CPU workloads with GPUs, so whilst AMD fights Intel for current CPU market share, Nvidia aims to shrink that market altogether.
    Not a good day for Intel overall.
