If the GPU world were a wildlife special on the National Geographic channel, the G80 processor that powers GeForce 8800 GTX graphics cards would be a stunningly successful apex predator. In the nearly two years that have passed since its introduction, no other single-chip graphics solution has surpassed it. Newer GPUs have come close, shrinking similar capabilities into smaller, cooler chips, but that's about it. The G80 is still the biggest, baddest beast of its kind: a chip, as we said at the time, with "the approximate surface area of Rosie O'Donnell." After it dispatched its would-be rival, the Radeon HD 2900 XT, in an epic mismatch, AMD gave up on building high-end GPUs altogether, preferring instead to go the multi-GPU route. Meanwhile, the G80 has sired a whole range of successful offspring, from teeny little mobile chips to dual-chip monstrosities like the GeForce 9800 GX2.
Of course, even the strongest predator has a limited time as king of the pride, and the G80's reign is coming to a close. Today, its true heir arrives on the scene in the form of the GT200 graphics processor powering the GeForce GTX 200-series graphics cards. Despite being built on a smaller chip fabrication process, the GT200 is even larger than the G80, and it packs nearly twice the processing power of its progenitor.
This new contender isn't content with just ruling the same territory, either. Nvidia has ambitious plans to expand the GPU's processing domain beyond real-time graphics and gaming, and as the GPU computing picture becomes clearer, those plans seem increasingly viable. Join us as we dive in for a look at this formidable new processor.
The GT200 GPU: an overview
The first thing to be said about the GT200 is that it's not a major departure from Nvidia's current stable of G80-derived GPUs. Instead, it's very much a refinement of that architecture, with a multitude of tweaks throughout intended to improve throughput, efficiency, and the like. The GT200 adds a handful of new capabilities at the edges, but its core graphics functionality is very similar to current GeForce 8- and 9-series products.
As any graphics expert will tell you, determining what's changed involves the study of Chiclets, of course. Nvidia has laid out the Chiclets in various flavors and patterns in order to convey the internal organization of GT200. Behold:
Shiny, but with a chewy center!
Arranged in this way, the Chiclets have much to tell us. The 10 large groups across the upper portion of the diagram are what Nvidia calls thread processing clusters, or TPCs. TPCs are familiar from G80, which has eight of them onboard. The little green boxes inside of the TPCs are the chip's basic processing cores, known in Nvidia's parlance as stream processors or SPs. The SPs are arranged in groups of eight, as you can see, and these groups have earned their own name and acronym, for the trifecta: they're called SMs, or streaming multiprocessors.
Now, let's combine the power of all three terms. Ten TPCs multiplied by three SMs times eight SPs works out to a total of 240 processing cores on the GT200. That's an awful lot of green Chiclets and nearly twice the G80's 128 SPs, a substantial increase in processing potential, not to mention chewy, minty flavor.
One of the key changes in the organization of the GT200 is the increase from two to three SMs inside of each thread processing cluster. The TPCs still house the chip's texture addressing and filtering hardware (brown Chiclets), but the ratio of SPs to texturing units has increased by half, from 2:1 to 3:1. We've seen a growing bias toward shader power versus texturing over time, and this is another step in that direction. Even with the change, though, Nvidia remains more conservative on this front than AMD.
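For anyone who wants to check the Chiclet math, here is a quick sketch in Python. The TPC, SM, and SP counts come straight from the figures above. The count of eight texturing units per TPC is an assumption, inferred from the stated 2:1 and 3:1 ratios rather than read off Nvidia's diagram, and the function and variable names are ours.

```python
# Back-of-the-envelope shader math for the G80 and GT200.
SPS_PER_SM = 8          # SPs are arranged in groups of eight (one SM)
TEX_UNITS_PER_TPC = 8   # ASSUMPTION: inferred from the 2:1 and 3:1 ratios


def total_sps(tpcs, sms_per_tpc):
    """Total stream processors: TPCs x SMs per TPC x SPs per SM."""
    return tpcs * sms_per_tpc * SPS_PER_SM


def sp_to_tex_ratio(sms_per_tpc):
    """SPs per texturing unit inside a single TPC."""
    return (sms_per_tpc * SPS_PER_SM) / TEX_UNITS_PER_TPC


g80_sps = total_sps(tpcs=8, sms_per_tpc=2)      # 128
gt200_sps = total_sps(tpcs=10, sms_per_tpc=3)   # 240

print(g80_sps, gt200_sps, gt200_sps / g80_sps)  # 128 240 1.875
print(sp_to_tex_ratio(2), sp_to_tex_ratio(3))   # 2.0 3.0
```

That 1.875x jump in SP count is where "nearly twice" comes from, and the per-TPC ratio moves from 2:1 to 3:1 simply because a third SM shares the same texturing hardware.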
The lower part of the diagram reveals a corresponding rise in pixel-pushing power with the increase in ROP (raster operator) partitions from six on the G80 (GeForce 8800 GTX) and four on the G92 (GeForce 9800 GTX) to eight on the GT200. Since each ROP partition can output four pixels at a time, the GT200 can output 32 pixels per clock. And since each ROP partition also hosts a 64-bit memory controller, the GT200's path to memory is an aggregated 512 bits wide.
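The ROP arithmetic is easy to sketch the same way. The partition counts and the per-partition figures (four pixels per clock, one 64-bit memory controller) are from the paragraph above, where they are given for the GT200. Applying the same per-partition numbers to the G80 and G92 is our assumption, though the results line up with the 384-bit and 256-bit memory interfaces on those chips. The names here are ours, not Nvidia's.

```python
# ROP partition counts per chip, as cited above.
ROP_PARTITIONS = {"G80": 6, "G92": 4, "GT200": 8}

# Per-partition figures; stated for GT200, ASSUMED here for G80 and G92 too.
PIXELS_PER_PARTITION = 4
MEM_BITS_PER_PARTITION = 64


def rop_summary(chip):
    """Return (pixels per clock, memory interface width in bits) for a chip."""
    partitions = ROP_PARTITIONS[chip]
    return (partitions * PIXELS_PER_PARTITION,
            partitions * MEM_BITS_PER_PARTITION)


for chip in ROP_PARTITIONS:
    pixels, bus_bits = rop_summary(chip)
    print(f"{chip}: {pixels} pixels/clock, {bus_bits}-bit memory interface")
```

Because each partition carries its own memory controller, pixel output and bus width scale together: adding ROP partitions widens the path to memory in 64-bit steps.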
In short, the GT200 has a whole lot of pretty much everything.
One thing it lacks, however, is support for DirectX 10.1. Some folks had expected Nvidia to follow AMD down this path, since AMD introduced DX10.1 support in its Radeon HD 3000 series last fall. DX10.1 introduces extensions that expose greater control over the GPU's antialiasing capabilities, among other things. Nvidia says its GPUs can handle some DX10.1 capabilities, but not all of them. That prevents it from claiming DX10.1 support, since Microsoft considers it an all-or-nothing affair. Curiously, though, Nvidia says it is working with game developers to support a subset of DX10.1 extensions, even though Microsoft may not be entirely pleased with the prospect. I believe that work includes addressing problems with antialiasing and game engines that use deferred shading, one of the places where DX10.1 promises to have a big performance impact. Curiouser and curiouser: Nvidia is cagey about exactly which DX10.1 capabilities its GPUs can and cannot support, for whatever reason.