Nvidia's GeForce GTX 780 Ti graphics card reviewed


Now witness the firepower of this fully armed and operational battle station
— 7:10 PM on November 7, 2013

Boy, this is entertaining. AMD uncorks the Radeon R9 290X and captures the GPU performance crown. Nvidia counters by announcing price cuts and promising to introduce a mysterious new GPU, the GeForce GTX 780 Ti. Along the way, we've seen a host of new features and buzzwords injected into the high-end graphics space as part of the conversation: AMD's Mantle low-level graphics API, Nvidia's creamy smooth G-Sync display tech, and the TrueAudio DSP block built into the new Radeons. Prices have dropped pretty dramatically in the past month, too.

For a dying market, high-end PC graphics sure has a lotta vitality. Makes you wonder about the whole post-PC narrative that's so popular right now. Hmm.

Anyhow, these are fun times. Things get even more interesting today, since Nvidia is finally pulling back the curtain on the GeForce GTX 780 Ti, its answer to the Radeon R9 290X. The GTX 780 Ti's purpose in life is crystal clear: to be the best single-GPU graphics card in the world. We knew this fact even before the card arrived in Damage Labs. The intriguing question, in my view, was how Nvidia would achieve this feat. After all, the GK110 chip that powers the Titan and the GTX 780 has been around for quite some time now. How much additional goodness did Nvidia really have left in reserve?

The GeForce GTX 780 Ti
Yeah, turns out the green team was holding back quite a bit. The GK110 is the largest graphics processor ever made, over 100 mm² larger than AMD's new Hawaii chip. We've known for ages that no GK110-based product, not even in the expensive, compute-focused Tesla lineup, comes with all of the chip's units enabled. Nvidia has stated publicly that the chip has a total of 15 SMX units onboard, yet the GeForce GTX Titan has one SMX disabled and the GTX 780 is down three SMX units. So the simplest way Nvidia could raise its game was to enable all of the GK110's available units. But surely it wouldn't be that easy, or they'd have done it already, right?

| | GPU base clock (MHz) | GPU boost clock (MHz) | ROP pixels/clock | Texels filtered/clock | Shader processors | Memory transfer rate | Total memory path width (bits) | Peak power draw | Price |
|---|---|---|---|---|---|---|---|---|---|
| GeForce GTX 760 | 980 | 1033 | 32 | 96 | 1152 | 6 GT/s | 256 | 170W | $249 |
| GeForce GTX 770 | 1046 | 1085 | 32 | 128 | 1536 | 7 GT/s | 256 | 230W | $329 |
| GeForce GTX 780 | 863 | 902 | 48 | 192 | 2304 | 6 GT/s | 384 | 250W | $499 |
| GeForce GTX 780 Ti | 876 | 928 | 48 | 240 | 2880 | 7 GT/s | 384 | 250W | $699 |
| GeForce GTX Titan | 837 | 876 | 48 | 224 | 2688 | 6 GT/s | 384 | 250W | $999 |

Nvidia has evidently been keeping some juice in reserve for an occasion like this one. The GeForce GTX 780 Ti is powered by a GK110 with all 15 SMX units enabled, granting it a grand total of 2880 shader processors and 240 texels per clock of filtering power. That's . . . plentiful, a nice increase from the GTX 780 and Titan. Impressively, the GTX 780 Ti also has higher base and boost clock speeds than those other cards, while operating at the same 250W power limit.
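As a quick sanity check on those figures, the shader and texture counts scale linearly with the number of enabled SMX units. The sketch below assumes 192 shader processors and 16 texels filtered per clock per SMX, values inferred by dividing the 780 Ti's totals by 15; the function and constant names are purely illustrative.

```python
# Per-SMX resources, inferred from the GTX 780 Ti's totals (2880 / 15 and 240 / 15).
SHADERS_PER_SMX = 192
TEXELS_PER_CLOCK_PER_SMX = 16
TOTAL_SMX = 15  # full GK110, per Nvidia's public statements


def gk110_units(disabled_smx):
    """Return (shader processors, texels filtered per clock) for a GK110 variant."""
    enabled = TOTAL_SMX - disabled_smx
    return enabled * SHADERS_PER_SMX, enabled * TEXELS_PER_CLOCK_PER_SMX


for name, disabled in [("GTX 780", 3), ("GTX Titan", 1), ("GTX 780 Ti", 0)]:
    shaders, texels = gk110_units(disabled)
    print(f"{name}: {shaders} shader processors, {texels} texels/clock")
```

The results line up with the spec table for all three GK110-based cards.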

When I asked Nvidia where it found the dark magic to achieve this feat, the answer was more complex than expected. For one thing, this card is based on a new revision of the GK110, the GK110B (or is it GK110b? GK110-B?). The primary benefit of the GK110B is higher yields, or more good chips per wafer. Nvidia quietly rolled out the GK110B back in August aboard GTX 780 and Titan cards, so it's not unique to the 780 Ti. Separate from any changes made to improve yields, the newer silicon also benefits from refinements to TSMC's 28-nm process made during the course of this year. You can imagine Nvidia has been sorting its GK110B chips into different bins depending on their quality since at least August. The ones deployed on the 780 Ti are presumably the best of the best. The end result of all these measures is a GK110-based product with 15 SMX units enabled that achieves higher clock speeds at nice, tame voltages.

That's not the whole story, either. You may recall that, in my R9 290X review, I explained how AMD's Hawaii chip benefited from an engineering tradeoff. The design team chose to implement a simpler physical interface in order to allow a very wide 512-bit path to memory in less die area. The downside was that GDDR5 operational speeds would be relatively low, at 5 GT/s, but the additional width would make up the slack. The tradeoff worked. Although substantially smaller than GK110, the Hawaii chip in the R9 290X had more memory bandwidth, as much as 320 GB/s.

Well, the GTX 780 Ti is the revenge of the other approach to that tradeoff. The GK110 has a narrower 384-bit memory path, but it happily pairs up with GDDR5 memory chips running at 7 GT/s to achieve 336 GB/s of bandwidth—a bit more than the 290X.
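The two bandwidth figures fall out of simple arithmetic: path width in bytes multiplied by the transfer rate. Here is a minimal sketch of that calculation; the function name is just for illustration.

```python
def memory_bandwidth_gbps(bus_width_bits, transfer_rate_gtps):
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_gtps


# Wide-and-slow (Hawaii) versus narrow-and-fast (GK110 on the 780 Ti).
print("R9 290X:   ", memory_bandwidth_gbps(512, 5), "GB/s")
print("GTX 780 Ti:", memory_bandwidth_gbps(384, 7), "GB/s")
```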

| | Peak pixel fill rate (Gpixels/s) | Peak bilinear filtering int8/fp16 (Gtexels/s) | Peak shader arithmetic rate (tflops) | Peak rasterization rate (Gtris/s) | Memory bandwidth (GB/s) |
|---|---|---|---|---|---|
| Radeon HD 5870 | 27 | 68/34 | 2.7 | 0.9 | 154 |
| Radeon HD 6970 | 28 | 85/43 | 2.7 | 1.8 | 176 |
| Radeon HD 7970 | 30 | 118/59 | 3.8 | 1.9 | 264 |
| Radeon R9 280X | 32 | 128/64 | 4.1 | 2.0 | 288 |
| Radeon R9 290 | 61 | 152/86 | 4.8 | 3.8 | 320 |
| Radeon R9 290X | 64 | 176/88 | 5.6 | 4.0 | 320 |
| GeForce GTX 770 | 35 | 139/139 | 3.3 | 4.3 | 224 |
| GeForce GTX 780 | 43 | 173/173 | 4.2 | 3.6 or 4.5 | 288 |
| GeForce GTX Titan | 42 | 196/196 | 4.7 | 4.4 | 288 |
| GeForce GTX 780 Ti | 45 | 223/223 | 5.3 | 4.6 | 336 |

Overall, the GTX 780 Ti has the highest theoretical peak capacities for texture filtering, rasterization, and memory bandwidth of any single-GPU solution today. That's true even though the numbers in the table above are somewhat skewed. You see, we compute these theoretical peak rates based on the "boost" clocks for each graphics card. Trouble is, Nvidia's boost clocks are intended to reflect the card's typical operating frequency, not the absolute peak. Nvidia doesn't advertise the max clock speeds for its products. Meanwhile, AMD's boost clock reflects an upper limit, and as we've been learning with the R9 290 series, typical operating frequencies can sometimes be substantially lower than that.
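For the curious, here's roughly how the fill and filtering numbers for the GeForce cards are derived from the boost clocks: per-clock throughput multiplied by clock speed. This is a sketch using the per-clock figures and boost clocks from the spec table earlier; the names are illustrative, and it covers only the int8 filtering rate.

```python
# (ROP pixels/clock, texels filtered/clock, boost clock in MHz), from the spec table.
CARDS = {
    "GeForce GTX 770": (32, 128, 1085),
    "GeForce GTX 780": (48, 192, 902),
    "GeForce GTX Titan": (48, 224, 876),
    "GeForce GTX 780 Ti": (48, 240, 928),
}


def peak_rate_giga(units_per_clock, clock_mhz):
    """Per-clock throughput times clock speed, in billions of operations per second."""
    return units_per_clock * clock_mhz / 1000


for name, (rops, texels, boost_mhz) in CARDS.items():
    fill = peak_rate_giga(rops, boost_mhz)
    filtering = peak_rate_giga(texels, boost_mhz)
    print(f"{name}: {fill:.0f} Gpixels/s, {filtering:.0f} Gtexels/s")
```

Swap in a different clock and the peaks move proportionally, which is why the choice between "typical" and "maximum" boost clocks matters for these comparisons.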

To accentuate this point, the green team pointed out that the GTX 780 Ti's actual peak clock speed is 993MHz. At that frequency, the 780 Ti's theoretical maximum shader arithmetic rate is 5.72 tflops, higher than the R9 290X's best case.

For, you know, whatever that's worth. We'll be measuring delivered performance here shortly.
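The arithmetic behind the shader figures can be sketched as follows. It assumes each shader processor retires one fused multiply-add, counted as two floating-point operations, per clock; that factor and the function name are assumptions for illustration, not something Nvidia supplied for this review.

```python
FLOPS_PER_SHADER_PER_CLOCK = 2  # assumption: one fused multiply-add per clock


def peak_tflops(shader_processors, clock_mhz):
    """Theoretical peak shader arithmetic rate in teraflops."""
    return shader_processors * FLOPS_PER_SHADER_PER_CLOCK * clock_mhz / 1e6


at_boost = peak_tflops(2880, 928)  # advertised "typical" boost clock
at_peak = peak_tflops(2880, 993)   # peak clock cited by Nvidia
print(f"GTX 780 Ti at 928MHz: {at_boost:.2f} tflops")
print(f"GTX 780 Ti at 993MHz: {at_peak:.2f} tflops")
```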

Nvidia has added one other feature to the GTX 780 Ti that's not present in the GTX 780 or Titan, something called power balancing. This card takes its power from three sources: the PCIe slot, a six-pin power input, and an eight-pin power input. In normal operation, those sources ought to be more than adequate to meet its 250W power budget. When overclocking, though, it's possible one of the three inputs could become overburdened and unable to supply more power. Nvidia says the 780 Ti can pull power from the other inputs in order to get everything it needs. The result should be some extra overclocking headroom in cases where input power is the primary limitation.