NVIDIA's GeForce 7800 GTX graphics processor

Power MADD

SO DOES IT HAVE new features, better performance, or what? That's generally the first question one might ask when discussing a new generation of graphics chip, or at least it used to be. That question has become steadily less relevant as graphics chips have gained the ability to process a rich set of datatypes, including floating-point numbers that can be very accurate representations of light and color. In fact, since the introduction of the Radeon 9700, graphics processors have been capable of rendering just about any sort of effect one might wish to see. The big question now is: can it pump out that sort of tasty eye candy in real time, or is it just a slideshow? There are still some important secondary questions about the arrangement of computational resources on the GPU and how they impact performance, but at the end of the day, a graphics processor is now judged by its real-world computational power.

By that standard, NVIDIA's GeForce 6 series graphics processors have had plenty of success. Since the introduction of the GeForce 6800 Ultra last April, NVIDIA has rolled out a top-to-bottom rework of its entire GPU lineup, with the exception of the integrated graphics core in its nForce chipsets—and we spotted early versions of the new C51G chipset at Computex, complete with GeForce 6-class integrated graphics. The scope of the product refresh is impressive, but the real news is the potency of those products. In nearly every segment from uber-high-end setups down to cheapo $59 graphics cards, NVIDIA has contended closely with ATI for market leadership. And in some cases, like the $199 sweet spot for gamers' graphics cards, the GeForce 6 series has been the uncontested leader. Cards based on the NV4x series of chips have packed a wallop in terms of pixel processing power, enabling new sorts of effects in real time that game developers have only just begun to employ.

Now, NVIDIA is back with a new high-end chip, known by the code-name G70, that promises significantly more power than the GeForce 6800 Ultra. To add to the intrigue, the G70 chip is closely related to the RSX graphics processor that NVIDIA is developing for the PlayStation 3. Can NVIDIA duplicate the success of the NV40 series by following it up with a worthy successor? Can the GeForce 7800 GTX really deliver on NVIDIA's claims of up to twice the shader power of the GeForce 6800 Ultra? And if you run a pair of GeForce 7800 GTX cards together in an SLI config, will the fabric of time and space warp? Keep reading for some answers.

Inside the G70 GPU
Despite NVIDIA's protestations to the contrary, the G70 architecture is clearly derived from the NV40. Not that there's anything wrong with that. The NV40 and G70 both have a full implementation of Shader Model 3.0, the programming model for Windows graphics that looks like it will be the standard until Microsoft's Longhorn OS arrives. Shader Model 3.0 packs plenty of capability and precision for real-time graphics use in the next year or two, at least, and ATI's upcoming R520 chip is rumored to support SM3.0, as well.

In order to understand how the G70 differs from and improves upon the NV40 architecture, let's take a look at a simplified block diagram of the G70 design.

Block diagram of the G70 architecture. Source: NVIDIA.

A vertex shader unit diagram. Source: NVIDIA.

Intimidating, maybe, but less so once you realize how much parallelism is involved. After all, most of those functional blocks are just carbon copies of one another. We'll refer back to this diagram as we discuss how the G70 has changed in various ways. Those ways include:
  • More vertex shaders — At the top of the diagram, you can see eight parallel functional units. These are the vertex shaders that handle geometry processing for the GPU. A scene with lots of polygon detail in it may cause the vertex shaders to become a bottleneck, although NVIDIA does do some offloading of vertex processing to the host CPU when the GPU is especially busy. The NV40 has six vertex shader units, and the G70 adds two more. Because vertex processing is, like much in graphics, easily parallelizable, the additional vertex processing units should provide a straightforward performance boost in vertex processing-limited situations.

    Not only does G70 have more vertex units, but those units can complete more operations per clock than the NV40's. NVIDIA says it has tweaked the vertex shader units so that they now can process MADD (multiply-add) operations in a single clock cycle. These tweaked vertex units are purportedly up to 30% faster in scalar math ops, as well.

  • More pixel shaders — Pixel shaders are the rock stars of the graphics pipeline these days, because they provide most of the fancy lighting and shading effects. Most discussions of "shader programs" or GPU computational capacity have to do with pixel shaders.

    Across the middle of the block diagram are the G70's pixel shader units, arranged in groups of four. This grouping of pixel shaders into "quads" is not a change from NV40; this diagram is simply more accurate than the ones NVIDIA supplied to us at the time of our GeForce 6800 Ultra review. The shader units in each quad share some on-chip resources, including an L2 texture cache. We have known about the grouping of shader pipes into quads for some time now (ATI does it, too), but the really interesting thing in the diagram above is the 2x2 arrangement of pixel shader units, so that two of them are in line with two others. This arrangement isn't really a faithful representation of the flow of data inside the chip; the G70's "quad" pipelines actually operate on four fragments concurrently and in parallel.

    A pixel shader unit diagram. Source: NVIDIA.

    What we do know is that the G70 is a very wide SIMD (single-instruction, multiple-data) machine. That is, as I understand it, the G70 typically executes the same shader in parallel on all of the would-be pixels (also known as fragments) in all of its shader quads at once. For the G70's six quads, that comes out to 24 shaders running the same program on 24 pixel fragments simultaneously. Such massive parallelism may sound impressive, but it does present some potential performance obstacles for the G70. Shader Model 3.0 includes support for shader programs with dynamic branching based on conditionals, which means it's possible that some fragments in the G70's pipelines could take one path down a branch, while the rest might take another. Would the GPU then have to take all fragments through both branches, masking out the ones that didn't participate in the active branch? That's quite possibly what happens sometimes, and it could potentially make the G70 a little less efficient than a narrower design. In fact, the GeForce 6600 GT is an NV40-derived design that's half as wide as the original NV40, and its surprisingly good performance could be the result of somewhat higher efficiency. Nevertheless, graphics is an inherently parallel task, and the G70's six pixel shader quads should serve it well.

  • Better pixel shaders — Probably the most notable change in the G70 is the rearchitecting of the pixel shader pipelines to give them more arithmetic capacity. NVIDIA says it analyzed over 1300 of the most common shader algorithms in games and applications in order to determine the likely usage model for the G70's shaders. That analysis led NVIDIA to produce a shader unit that can do twice as many MADD (multiply-add) operations per clock as those in the NV40. Where the NV40 could do a single four-component MADD operation per fragment per clock, the G70 can do two four-component MADDs per fragment per clock. This doubling of computational capacity hearkens back to the early GeForce days, when the GeForce2 came along and more or less doubled peak performance over the original GeForce by adding a second texturing unit per pipe. This time, what's doubled is the capacity to handle an operation common in many programmable shaders. Of course, performance isn't only about MADD operations, so the gains provided by this tweak alone will vary depending on the application.
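
The branching concern raised above for the pixel shader quads can be illustrated with a toy model. The Python sketch below is not a description of how the G70 actually schedules work. It assumes a hypothetical lockstep SIMD batch in which every lane steps through any side of a branch that at least one lane takes, with the non-participating lanes masked out. The function names and the `then_cost`/`else_cost` parameters are invented for illustration.

```python
def simd_branch_cost(conditions, then_cost, else_cost):
    """Instruction slots a lockstep SIMD batch spends on one if/else.

    conditions: one boolean per fragment (lane) in the batch.
    then_cost / else_cost: instruction count of each side of the branch.
    Assumption: the whole batch steps through a side if any lane needs it;
    lanes that did not take that side are masked out, so their slots are wasted.
    """
    width = len(conditions)
    slots = 0
    if any(conditions):        # at least one lane takes the "then" side
        slots += then_cost * width
    if not all(conditions):    # at least one lane takes the "else" side
        slots += else_cost * width
    return slots


def useful_work(conditions, then_cost, else_cost):
    """Instruction slots whose results are actually kept."""
    taken = sum(conditions)
    return taken * then_cost + (len(conditions) - taken) * else_cost


def efficiency(conditions, then_cost, else_cost):
    """Fraction of spent slots that did useful work (1.0 = no waste)."""
    return useful_work(conditions, then_cost, else_cost) / simd_branch_cost(
        conditions, then_cost, else_cost
    )


# 24 lanes, matching six quads of four fragments; a single fragment diverges.
wide = [True] * 23 + [False]
print(efficiency(wide, then_cost=10, else_cost=10))

# The same fragments split into two hypothetical 12-wide batches.
print(efficiency(wide[:12], 10, 10), efficiency(wide[12:], 10, 10))
```

Under these assumptions, one divergent fragment halves the efficiency of the full 24-wide batch, while a 12-wide split confines the penalty to the batch that contains it. That is the sense in which a narrower design could be somewhat more efficient on branch-heavy shaders.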
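
The MADD doubling described above comes down to simple arithmetic, sketched here in Python. The G70 figures (24 pixel pipes, two four-component MADDs per fragment per clock) come from the text. The NV40's count of 16 pixel pipes is an assumption not stated in this section, and the helper names `madd4` and `peak_madds_per_clock` are made up for this example. Clock speeds are left out on purpose, so these are per-clock peaks only.

```python
def madd4(a, b, c):
    """Four-component multiply-add: a * b + c, component by component."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))


def peak_madds_per_clock(pixel_pipes, madds_per_fragment):
    """Peak four-component MADDs issued per clock across all pixel pipes."""
    return pixel_pipes * madds_per_fragment


# Example MADD: scale an RGBA color by a light term, then add an ambient term.
color = (0.5, 0.25, 1.0, 1.0)
light = (2.0, 2.0, 2.0, 1.0)
ambient = (0.125, 0.125, 0.125, 0.0)
print(madd4(color, light, ambient))

# Per-clock peaks; the NV40 pipe count of 16 is an assumption (see lead-in).
g70 = peak_madds_per_clock(pixel_pipes=24, madds_per_fragment=2)
nv40 = peak_madds_per_clock(pixel_pipes=16, madds_per_fragment=1)
print(g70, nv40, g70 / nv40)
```

The ratio printed at the end is a theoretical per-clock ceiling for MADD-bound shaders only. As the text notes, real gains will vary with the application.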