Last week at its GPU Technology Conference, Nvidia unveiled the first details of its upcoming GK110 GPU, the "real" Kepler chip and bigger brother to the GK104 silicon powering the GeForce GTX 600 series. Although the GK110 won't be hitting the market for some time yet, Nvidia's increasing focus on GPU-computing applications has changed the rules, causing the GPU firm to show its cards well ahead of the product's release. As a result, we now know quite a bit about the GK110's architecture and the mix of resources it offers for GPU-computing work. With a little conjecture, we can probably paint a fairly accurate picture of its graphics capabilities, too.
Let's start with the GK110's basic specifications. Since we've known the GK104's layout for a while now, the exact makeup of its bigger brother has been the subject of some speculation. It turns out most of our guesses weren't too far from the mark, although there are a few surprises. We don't have the die size yet, but the chip itself is likely to be enormous; it packs in 7.1 billion transistors, roughly double the count of the GK104. The die shot released by Nvidia offers some clear hints about how those transistors have been allocated, as you can see below.
The GK110 is divided into five of the deep green structures above, which are almost certainly GPCs, or graphics processing clusters, nearly complete GPUs unto themselves. Each of those GPCs houses three SMX cores, and Nvidia has confirmed the chip hosts a total of 15 of those. By contrast, the GK104 has four GPCs with two SMX cores each, for a total of eight, so the GK110 nearly doubles the GK104's per-clock processing power.
Ringing three sides of the chip are its six 64-bit memory controllers, giving it an aggregate 384-bit path to memory, 50% wider than the GK104's. That's not an increase in interface width from the big Kepler's true predecessor, the Fermi-based GF110, which also has a 384-bit interface, but GDDR5 data rates are up by roughly 50% in the Kepler generation, so there's a bandwidth increase on tap, regardless. It looks like the PCI Express interface is on the upper edge of the chip; it has been upgraded to Gen3, with twice the peak data rates of Gen2 devices.
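For readers who want to check the ratios, here's a quick back-of-the-envelope sketch in Python. The unit counts come from the figures quoted above; the GK104's 256-bit bus is inferred from the 50% comparison, the data-rate factor is the rough 50% figure rather than a measured value, and the variable names are ours.

```python
# SMX totals: GPC count times SMX cores per GPC, as quoted above.
gk110_smx = 5 * 3   # 15
gk104_smx = 4 * 2   # 8
smx_ratio = gk110_smx / gk104_smx   # 1.875, i.e. "nearly double"

# Memory interface width in bits.
gk110_bus_bits = 6 * 64             # six 64-bit controllers = 384
gk104_bus_bits = 256                # inferred: 384 is 50% more than 256
bus_ratio = gk110_bus_bits / gk104_bus_bits   # 1.5

# Versus GF110: same interface width, but GDDR5 data rates are
# roughly 50% higher (approximate figure from the text).
width_ratio_vs_gf110 = 1.0
data_rate_ratio_vs_gf110 = 1.5
bandwidth_ratio_vs_gf110 = width_ratio_vs_gf110 * data_rate_ratio_vs_gf110

print(f"SMX ratio vs GK104:       {smx_ratio:.3f}x")
print(f"Bus width ratio vs GK104: {bus_ratio:.2f}x")
print(f"Approx. bandwidth ratio vs GF110: {bandwidth_ratio_vs_gf110:.2f}x")
```

The bandwidth figure is only as good as the "roughly 50%" data-rate estimate; actual memory clocks for GK110-based products haven't been announced.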
Because it has a dual mission, serving both the GPU computing and video card markets, the GK110 has a somewhat different character than the GK104. As you've likely noted, in some cases it has twice the capacity of the GK104, while other increases are closer to 50% or so. More notably, the GK110 has some compute-oriented features that the GK104 lacks, including ECC support (for both on-chip storage and off-chip memory) and the ability to process double-precision floating-point math at much higher rates. (The GK104 has token double-precision support at 1/24th the single-precision rate, purely to maintain compatibility. Single-precision datatypes tend to be entirely sufficient for real-time graphics and most consumer applications involving GPU computing.)
Nvidia said repeatedly at the show that increasing double-precision performance was a major objective for the big Kepler chip, and it appears the firm is on track to deliver. The GF110-based Tesla M2090 card is rated for a peak of 666 DP gigaflops, and Nvidia claims the GK110-based Tesla K20 will exceed one teraflops. If we assume a relatively conservative clock rate of 700MHz for the Tesla product, we'd expect the K20 to roughly double the M2090's throughput, to about 1.3 teraflops.
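The arithmetic behind that estimate can be sketched as follows. Two inputs here are our assumptions rather than figures given above: that all 15 SMX units are enabled on the Tesla part, and that each SMX contains 64 double-precision units, each retiring one fused multiply-add (two flops) per clock. The 700MHz clock is the guess from the text, and the function name is ours.

```python
def peak_dp_gflops(smx_count, dp_units_per_smx, clock_mhz,
                   flops_per_unit_per_clock=2):
    """Peak double-precision throughput in gigaflops.

    flops_per_unit_per_clock=2 assumes one fused multiply-add per
    DP unit per clock, counted as two floating-point operations.
    """
    flops_per_clock = smx_count * dp_units_per_smx * flops_per_unit_per_clock
    return flops_per_clock * clock_mhz / 1000.0

# Assumptions: 15 SMX enabled, 64 DP units per SMX, 700MHz clock.
k20_estimate = peak_dp_gflops(smx_count=15, dp_units_per_smx=64, clock_mhz=700)

m2090_rated = 666  # DP gigaflops, the rating quoted in the text
ratio = k20_estimate / m2090_rated

print(f"Estimated K20 peak: {k20_estimate:.0f} DP gigaflops")  # 1344
print(f"Ratio to M2090:     {ratio:.2f}x")                     # 2.02x
```

If Nvidia ships the K20 with fewer SMX units enabled or at a different clock, the estimate scales linearly with both.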
The ceiling may be even higher than that. Nvidia's press release about the K20 cryptically says the GK110 "delivers three times more double precision performance compared to Fermi architecture-based Tesla products," and Nvidia CEO Jen-Hsun Huang said something similar in his keynote. In other presentations, though, the 3X claims were tied to power efficiency, as in three times the DP flops per watt, which seems like a more plausible outcome, and a very good one, since power constraints are paramount in virtually any computing environment these days. To deliver a full three times the DP flops of Fermi-based Tesla cards, the K20 would have to run at roughly 1GHz. It's possible the K20 could reach that speed temporarily thanks to Nvidia's new driver-based dynamic voltage and frequency scaling mechanism (dubbed GPU Boost in the GeForce products), but it seems unlikely the K20 will achieve sustained operation at that frequency.
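To see where that clock figure comes from, we can turn the throughput math around and solve for frequency. As before, this is a rough sketch under our own assumptions, not Nvidia's numbers: all 15 SMX units active, 64 double-precision units per SMX, and two flops per unit per clock. The function name is ours.

```python
def clock_mhz_for_target(target_gflops, smx_count=15, dp_units_per_smx=64,
                         flops_per_unit_per_clock=2):
    """Clock speed (MHz) needed to hit a given peak DP gigaflops figure."""
    flops_per_clock = smx_count * dp_units_per_smx * flops_per_unit_per_clock
    return target_gflops * 1000.0 / flops_per_clock

m2090_rated = 666             # DP gigaflops, from the M2090 rating
target = 3 * m2090_rated      # a literal reading of the "three times" claim

required_mhz = clock_mhz_for_target(target)
print(f"Clock needed for {target} DP gigaflops: {required_mhz:.0f}MHz")  # 1041MHz
```

Under these assumptions the result lands right around the 1GHz mark, well above the 700MHz we guessed for sustained operation.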