This week at SIGGRAPH, Nvidia introduced its Turing microarchitecture, the next major advance for the green team since Pascal made its debut over two years ago. If you're not already familiar, Turing includes RT cores that accelerate certain ray-tracing operations and tensor cores that accelerate the AI operations needed to make the results of those ray-tracing calculations usable for real-time rendering, among other benefits we likely don't know about yet.
A Turing GPU.
Based on information that Nvidia revealed at SIGGRAPH, some back-of-the-napkin calculations, and a waterfall of leaks today, I wanted to see how the rumored GeForce RTX 2080 and GeForce RTX 2080 Ti will stack up against today's Pascal products—at least hypothetically.
Before we begin, I want to be clear that this article is mostly speculative and something I'm doing for fun. That speculation is based on my prior knowledge of how Nvidia organizes its graphics processors and the associated resource counts at each level of the chip hierarchy. It's entirely possible that my estimates and guesstimates are wildly off. Until Nvidia releases an architectural white paper or briefs the press on Turing, we will not know just how correct any of these estimates are, if they are correct at all. I've marked figures I'm unsure of or produced using educated guesses with question marks.
The Volta SM. Source: Nvidia
My biggest leap of faith about Turing is that its basic streaming multiprocessor (or SM) design is not fundamentally that different from that of the Volta V100 GPU of June 2017. Nvidia will almost certainly drop the FP64 capabilities of Volta from Turing to save on die area, power, and cost, since those compute-focused ALUs have practically no relevance to real-time rendering. The company needs to make room for those RT cores, among other, better things it might be doing with the die area.
Past that, though, Nvidia has already said that Turing will maintain the independent parallel floating-point and integer execution paths of Volta. Furthermore, the number of tensor cores on the most powerful Turing card revealed so far, combined with some simple GPU math, suggests the Turing SM will maintain the same number of tensor cores as that of Volta. Those signs suggest we can fairly safely speculate about Turing using the organization of the Volta SM. That leap of faith is necessary here because Nvidia hasn't revealed the texturing power of Turing yet. Volta uses four texturing units per SM, so that's the fundamental assumption I'll work with for Turing, as well.
I also believe, without confirmation, that Nvidia will be releasing two Turing GPUs. One, which I'll call “bigger Turing,” should power the Quadro RTX 6000 and Quadro RTX 8000, as well as the purported RTX 2080 Ti. That 754-mm² chip has a 384-bit memory bus and as many as 4608 CUDA cores, and I'm guessing it's organized into 72 SMs and six Graphics Processing Clusters (or GPCs).
The “smaller Turing” apparently has a 256-bit memory bus, and it likely powers the Quadro RTX 5000 and the purported RTX 2080. That chip likely has 48 SMs, organized into four GPCs. Judging by today's leaks, Nvidia seems to be using slightly cut-down chips in GeForce RTX products (likely as a result of yields). Fully active Turing chips seem to be reserved for Quadro RTX cards.
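To make the SM arithmetic concrete, here is a minimal Python sketch of how the resource counts fall out of an SM count. The per-SM figures are assumptions carried over from Volta (64 FP32 CUDA cores, four texture units, and eight tensor cores per SM), and the function names are made up for illustration. None of this is confirmed for Turing.

```python
# Per-SM resource counts carried over from Volta.
# These are assumptions for Turing, not confirmed figures.
CUDA_CORES_PER_SM = 64     # Volta's FP32 lanes per SM
TEXTURE_UNITS_PER_SM = 4   # Volta's texture units per SM
TENSOR_CORES_PER_SM = 8    # Volta's tensor cores per SM


def sm_resources(sm_count: int) -> dict:
    """Scale the assumed per-SM counts up to a full or cut-down chip."""
    return {
        "sms": sm_count,
        "cuda_cores": sm_count * CUDA_CORES_PER_SM,
        "texture_units": sm_count * TEXTURE_UNITS_PER_SM,
        "tensor_cores": sm_count * TENSOR_CORES_PER_SM,
    }


def sms_from_cuda_cores(cuda_cores: int) -> int:
    """Work backward from a rumored CUDA core count to an SM count."""
    sms, remainder = divmod(cuda_cores, CUDA_CORES_PER_SM)
    if remainder:
        raise ValueError("core count is not a whole number of SMs")
    return sms


if __name__ == "__main__":
    # Fully enabled chips, using the SM counts guessed at above.
    for label, sm_count in [("bigger Turing", 72), ("smaller Turing", 48)]:
        print(label, sm_resources(sm_count))
    # Rumored GeForce parts, using the leaked CUDA core counts.
    for label, cores in [("RTX 2080 Ti?", 4352), ("RTX 2080?", 2944)]:
        print(label, sm_resources(sms_from_cuda_cores(cores)))
```

Under those assumptions, 72 SMs yield the 4608 CUDA cores quoted for the bigger chip, while the rumored 4352-core and 2944-core GeForce parts would have 68 and 46 active SMs, respectively.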
| | Boost clock (MHz) | ROP pixels/clock | INT8/FP16 textures/clock | Shader processors | Memory path (bits) | Memory bandwidth | Memory size |
|---|---|---|---|---|---|---|---|
| Radeon RX Vega 56 | 1471 | 64 | 224/112 | 3584 | 2048 | 410 GB/s | 8 GB |
| GTX 1070 | 1683 | 64 | 108/108 | 1920 | 256 | 259 GB/s | 8 GB |
| GTX 1080 | 1733 | 64 | 160/160 | 2560 | 256 | 320 GB/s | 8 GB |
| Radeon RX Vega 64 | 1546 | 64 | 256/128 | 4096 | 2048 | 484 GB/s | 8 GB |
| RTX 2080? | ~1800? | 64? | 184/184? | 2944? | 256 | 448 GB/s | 8 GB? |
| GTX 1080 Ti | 1582 | 88 | 224/224? | 3584 | 352 | 484 GB/s | 11 GB |
| RTX 2080 Ti? | ~1740? | 88? | 272/272? | 4352? | 352? | 616 GB/s? | 11 GB? |
| Titan Xp | 1582 | 96 | 240/240 | 3840 | 384 | 547 GB/s | 12 GB |
| Quadro RTX 6000 | ~1740? | 96? | 288/288? | 4608 | 384 | 672 GB/s | 24 GB |
| Titan V | 1455 | 96 | 320/320 | 5120 | 3072 | 653 GB/s | 12 GB |
This first chart primarily shows how the move from 8 GT/s GDDR5, 10 GT/s GDDR5X, and 11 GT/s GDDR5X to 14 GT/s GDDR6 will affect our contenders, as well as their basic (estimated) resource counts. We know that Nvidia claims a 16 TFLOPS FP32 math rate for the Quadro RTX 6000's GPU. With 4608 CUDA cores each completing one fused multiply-add (two floating-point operations) per clock, that works out to a boost clock of roughly 1740 MHz. The potential RTX 2080's clock speed, on the other hand, is a total guess from the gut.
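The bandwidth and clock figures above come from simple arithmetic, sketched here in Python. The function names are made up for illustration, and Nvidia's 16 TFLOPS claim is presumably rounded, so the derived clock is only a ballpark.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
    """Peak bandwidth: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gt_s


def boost_clock_mhz(fp32_tflops: float, cuda_cores: int) -> float:
    """Clock implied by an FP32 rate, counting one FMA (2 FLOPs) per core per clock."""
    flops_per_clock = 2 * cuda_cores
    return fp32_tflops * 1e12 / flops_per_clock / 1e6


if __name__ == "__main__":
    # 14 GT/s GDDR6 on the three bus widths in the table.
    for bus_width in (256, 352, 384):
        bandwidth = memory_bandwidth_gb_s(bus_width, 14)
        print(f"{bus_width}-bit @ 14 GT/s: {bandwidth:.0f} GB/s")
    # GTX 1080 Ti for comparison: 352-bit bus with 11 GT/s GDDR5X.
    print(f"352-bit @ 11 GT/s: {memory_bandwidth_gb_s(352, 11):.0f} GB/s")
    # Clock implied by Nvidia's 16 TFLOPS claim for a 4608-core chip.
    print(f"Implied boost clock: {boost_clock_mhz(16.0, 4608):.0f} MHz")
```

The implied clock lands a few megahertz under 1740, which is why that figure carries a tilde in the table.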
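| | Peak pixel fill rate (Gpixels/s) | Peak bilinear filtering INT8/FP16 (Gtexels/s) | Peak rasterization rate (Gtris/s) | Peak FP32 shader arithmetic rate (TFLOPS) |
|---|---|---|---|---|
| RX Vega 56 | 94 | 330/165 | 5.9 | 10.5 |
| RX Vega 64 | 99 | 396/198 | 6.2 | 12.7 |
| GTX 1080 Ti | 139 | 354/354 | 9.5 | 11.3 |
| RTX 2080 Ti? | 153? | 473/473? | 10.4? | 15.1? |
| Quadro RTX 6000 | 167? | 501/501? | 10.4? | 16.0? |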
This second set of theoretical measurements shows that, unlike the GTX 1080, which eclipsed the GTX 980 Ti, the RTX 2080 is unlikely to eclipse the GTX 1080 Ti in measures of traditional rasterization performance. That's likely how most users will first experience the card's power as software adapts to the hybrid-rendering future that Turing promises. The 2080 could certainly come close to the 1080 Ti in texturing power and shader-math rates, but its pixel fill rate and peak rasterization rate aren't much changed from those of its Pascal predecessor (at least, if my guesses are right).
My guesstimates about the RTX 2080 Ti, on the other hand, suggest a real leap in performance for a Ti-class “bigger” GeForce. The texturing power of the purported 2080 Ti is quite a bit higher than even that of the Titan V by my estimate, and its triangle throughput, peak pixel fill rate, and peak FLOPS are basically chart-topping for consumer Nvidia graphics processors. That should lead to some truly impressive performance figures, even before we consider the possibilities opened up by the card's ray-tracing acceleration hardware.
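For anyone who wants to check my math, the peak rates in the second table are just resource counts multiplied by boost clocks. The Python sketch below shows the idea for the Nvidia parts. It assumes each GPC rasterizes one triangle per clock and that a cut-down RTX 2080 Ti keeps all six GPCs active. Both are guesses on my part, and the `Gpu` class and its method names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    boost_mhz: float     # boost clock; estimated for the Turing parts
    rops: int            # ROP pixels per clock
    texture_units: int   # bilinear texels filtered per clock
    cuda_cores: int      # FP32 lanes, one FMA (2 FLOPs) each per clock
    gpcs: int            # assumed: one triangle rasterized per GPC per clock

    def pixel_fill_gpixels(self) -> float:
        return self.rops * self.boost_mhz / 1000

    def texel_rate_gtexels(self) -> float:
        return self.texture_units * self.boost_mhz / 1000

    def raster_rate_gtris(self) -> float:
        return self.gpcs * self.boost_mhz / 1000

    def fp32_tflops(self) -> float:
        return 2 * self.cuda_cores * self.boost_mhz / 1e6


if __name__ == "__main__":
    cards = [
        Gpu("GTX 1080 Ti", 1582, 88, 224, 3584, 6),
        Gpu("RTX 2080 Ti?", 1740, 88, 272, 4352, 6),   # all figures estimated
        Gpu("Quadro RTX 6000", 1740, 96, 288, 4608, 6),
    ]
    for card in cards:
        print(
            f"{card.name}: {card.pixel_fill_gpixels():.0f} Gpixels/s, "
            f"{card.texel_rate_gtexels():.0f} Gtexels/s, "
            f"{card.raster_rate_gtris():.1f} Gtris/s, "
            f"{card.fp32_tflops():.1f} TFLOPS"
        )
```

Swapping in a different boost clock or unit count for the rumored cards shows how sensitive each estimate is to my guesses.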
Nvidia will be holding a Gamescom event this Monday, August 20, where we expect to learn all about these purported Turing GeForces. We won't be on the ground in Cologne, Germany for the event, but we will be monitoring the live stream and will bring you all the details we can as we learn them. Stay tuned.