BY NOW, graphics aficionados are very familiar with the story of NVIDIA's rise to the top in PC graphics, followed by a surprising stumble with its last generation of products, the NV30 series. NV30 was late to market and, when it arrived, couldn't keep pace with ATI's impressive Radeon 9700 chip. Worse, the GeForce FX 5800 had a Dustbuster-like cooling appendage strapped to its side that was loud enough to elicit mockery from even jaded overclocking enthusiasts. NVIDIA followed up the FX 5800 with a series of NV30-derived GPUs that got relatively better over time compared to the competition, but only because NVIDIA threw lots of money into new chip revisions, compiler technology, and significantly faster memory than ATI used.
Even so, the GeForce FX series was relatively slow at pixel shader programs and was behind the curve in some respects. ATI's R3x0-series chips had more pixel pipelines, better antialiasing, and didn't require as much tuning in order to achieve optimal performance. More importantly, NVIDIA itself had lost some of its luster as the would-be Intel of the graphics world. The company clearly didn't enjoy being in second place, and sometimes became evasive or combative about its technology and the issues surrounding it.
But as of today, that's all ancient history. NVIDIA is back with a new chip, the NV40, produced by a new, crystal-clear set of design principles. The first NV40-based product is the GeForce 6800 Ultra. I've been playing with one for the past few days here in Damage Labs, and I'm pleased to report that it's really, really good. For a better understanding of how and why, let's look at some of the basic design principles that guided NV40 development.
- Massive parallelism: Processing graphics is about drawing pixels, an inherently parallelizable task. The NV40 has sixteen real, honest-to-goodness pixel pipelines ("no funny business," as the company put it in one briefing). By contrast, NV30 and its high-end derivatives had a four-pipe design with two texture units per pipe that could, in special cases involving Z and stencil operations, process eight pixels per clock. The NV40 has sixteen pixel pipes with one texture unit per pipe, and in special cases, it can produce 32 pixels per clock. To feed these pipes, the NV40 has six vertex shader units, as well.
An overview of the NV40 architecture. Source: NVIDIA
All told, NV40 weighs in at 222 million transistors, roughly double the count of an ATI Radeon 9800 GPU and well more than even the largest desktop microprocessor. To give you some context, the most complex desktop CPU is Intel's Pentium 4 Prescott at "only" 125 million transistors. Somewhat surprisingly, the NV40 chip is fabricated by IBM on a 0.13-micron fabrication process, not by traditional NVIDIA partner TSMC.
By going with a 0.13-micron fab process and sixteen pipes, NVIDIA is obviously banking on its chip architecture, not advances in manufacturing techniques and higher clock speeds, to provide next-generation performance.
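For a rough sense of what those pipeline counts mean per clock, here is a minimal Python sketch that tallies the figures quoted above. It assumes, as a simplification, that each pixel pipe emits one color pixel per clock and each texture unit samples one texel per clock. The names `PipeConfig`, `NV30`, and `NV40` are hypothetical, made up for this illustration. No clock speeds are used, so these are per-clock numbers only, not fill rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeConfig:
    """Per-clock pipeline figures for one chip (illustrative only)."""
    name: str
    pixel_pipes: int                 # pixel pipelines
    tex_units_per_pipe: int          # texture units on each pipeline
    z_stencil_pixels_per_clock: int  # special-case Z/stencil rate

    @property
    def color_pixels_per_clock(self) -> int:
        # Assumption: one color pixel per pipe per clock.
        return self.pixel_pipes

    @property
    def texels_per_clock(self) -> int:
        # Assumption: one texel per texture unit per clock.
        return self.pixel_pipes * self.tex_units_per_pipe


# Figures taken from the text: NV30 is a 4x2 design that can process
# eight pixels per clock in Z/stencil cases; NV40 is 16x1 and can reach 32.
NV30 = PipeConfig("NV30", pixel_pipes=4, tex_units_per_pipe=2,
                  z_stencil_pixels_per_clock=8)
NV40 = PipeConfig("NV40", pixel_pipes=16, tex_units_per_pipe=1,
                  z_stencil_pixels_per_clock=32)

for metric in ("color_pixels_per_clock", "texels_per_clock",
               "z_stencil_pixels_per_clock"):
    old, new = getattr(NV30, metric), getattr(NV40, metric)
    print(f"{metric}: NV30={old}, NV40={new}, ratio={new / old:.1f}x")
```

Under those assumptions, NV40's per-clock advantage over NV30 is largest for color pixels and Z/stencil work, and smaller for texels, because NV40 trades the second texture unit per pipe for more pipes.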
- Scalability: With sixteen parallel pixel pipes comes scalability, and NVIDIA intends to exploit this characteristic of NV40 by developing a top-to-bottom lineup of products derived from this high-end GPU. They will all share the same features and differ primarily in performance. You can guess how: the lower-end products will have fewer pixel pipes and fewer vertex shader units.
Contrast that plan with the reality of NV3x, which NVIDIA admits was difficult to scale from top to bottom. The high-end GeForce FX chips had four pixel pipes with two texture units each (a 4x2 design), while the mid-range chips were a 4x1 design. Even more oddly, the low-end GeForce FX 5200 was rumored to be an amalgamation of NV3x pixel shaders and fixed-function GeForce2-class technology.
NVIDIA has disavowed the "cascading architectures" approach where older technology generations trickle down to fill the lower rungs of the product line. Developers should soon be able to write applications and games with confidence that the latest features will be supported, in a meaningful way, with decent performance, on a low-end video card.
A single, superscalar pixel shader unit. Source: NVIDIA
- More general computational power: The NV40 is a more capable general-purpose computing engine than any graphics chip that came before it. The chip supports pixel shader and vertex shader versions 3.0, as defined in Microsoft's DirectX 9 spec, with support for long instruction programs, looping, branching, and dynamic flow control. Also, NV40 can process data internally with 32 bits of floating-point precision per color channel (red, green, blue, and alpha) with no performance penalty. Combined with the other features of 3.0 shaders, this additional precision should allow developers to employ more advanced rendering techniques with fewer compromises and workarounds.
- More performance per unit of transistors: Although GPUs are gaining more general programmability, this trend stands in tension with the usual mission of graphics chips, which has been to accelerate graphics functions through custom logic. NVIDIA has attempted to strike a better balance in NV40 between general computing power and custom graphics logic, with the aim of achieving more efficiency and higher overall performance. As a result, NV40's various functional units are quite flexible, but judiciously include logic to accelerate common graphics functions.
By following these principles, NVIDIA has produced a chip with much higher performance limits than the previous generation of products. Compared to the GeForce FX 5950 Ultra, NVIDIA says the NV40 has two times the geometry processing power, four to eight times the 32-bit floating-point pixel shading power, and four times the occlusion culling performance. The company modestly says this is the biggest single performance leap between product generations in its history. For those of us who are old enough to remember the jump from the Riva 128 to the TNT, or even from the GeForce3 to the GeForce4 Ti, that's quite a claim to be making. Let's see if they can back it up.