Under the hood of NV35
The NV35 chip is based on the NV30, but NVIDIA has made a number of strategic tweaks to NV35 in order to boost performance. Like the NV30, this chip is manufactured by TSMC on its 0.13-micron fab process. The transistor count is up by roughly 10 million, from about 125 million for NV30 to about 135 million for NV35. NVIDIA says they've added some bits and removed others, but they won't talk about what's been chopped out. I can, however, tell you about what's been added and tweaked. Let's look at the list.

  • A 256-bit memory interface — The GeForce FX 5800 Ultra had a 128-bit memory interface. Even with fast DDR-II memory, the card's peak memory bandwidth was 16GB/s. That ain't puny, but the competing ATI cards had about 20GB/s with slower RAM and wider paths to memory. NVIDIA has moved to a 256-bit memory path and, at least for now, fallen back to regular DDR memory for the FX 5900 Ultra. The 5900 Ultra's memory clock is now 425MHz, or effectively 850MHz thanks to the magic of DDR, for total memory bandwidth of 27.2GB/s. That's world-beating memory bandwidth, and you've gotta respect it.

  • Twice the pixel shading power — NVIDIA claims the NV35 has double the floating-point pixel shading power of the NV30, which suggests twice the number of FP pixel shader units. I'd like to tell you how many pixel shader units the two chips have, but NVIDIA is keeping a lid on such dirty little secrets.

    Like the NV30, the NV35 can compute FP pixel shaders with either 64 or 128 bits of precision, and it can freely intermix 64-bit and 128-bit datatypes as it goes. There is a speed versus precision tradeoff involved in this process, so developers will want to choose carefully.

    ATI handles this tradeoff a little differently. Radeon R300-series chips do their pixel shader calculations with 96 bits of precision, evenly splitting the difference between 64 and 128 bits. The ATI chips generally perform well with FP pixel shader programs, but they offer less peak precision than the NVIDIA chips.

  • Improved Z and color compression — NVIDIA has added more cache and additional gates to the NV35 in order to improve the chip's compression engine for Z and color data. NVIDIA's official marketing name for this change is Intellisample HCT, where HCT stands for "high-res compression technology." The NV35's compression ratio for Z and color data is still 4:1, but NVIDIA says the chip can do compression "more often" thanks to the hardware tweaks.

    The impact of these changes should be fairly straightforward. Improved Z compression should improve the chip's effective pixel-pushing power, especially at higher resolutions. Improved color compression will also boost fill rate, mainly when antialiasing is in use. Both technologies conserve memory bandwidth, and the GeForce FX 5900 Ultra will have truckloads of memory bandwidth to start with.

  • Those funky NV30-style pixel pipes — Like its predecessor, the NV35 has a four-pipeline design with two texturing units per pipe for traditional forms of rendering. In special cases, though, like stencil or Z rendering, the NV35 can process eight pixels per clock cycle. NVIDIA says this ability should improve performance in next-gen games and applications where advanced techniques like shadowing are used more frequently.

  • Swanky new drivers — NVIDIA has dubbed its upcoming 44.03 driver release "Detonator FX," and these new drivers promise much better performance in a range of games. In a no doubt related move, NVIDIA has reworked its texture filtering algorithms for techniques like anisotropic filtering in the 44.03 drivers.

    The texture filtering routines in previous GeForce FX driver revisions have been a source of controversy because NVIDIA's engineers cut some corners, sacrificing visual quality and technical correctness for performance. As a result, I'm unsure of how to interpret NVIDIA's claims about the Detonator FX drivers. They say these new drivers take a "motion-based approach" to eliminating common texture artifacts like sparkles. Thus, these algorithms are not just about "focusing on still images." The resulting algorithms are intended to produce a "best of both worlds" result for quality and performance.

    Now, I'm open to innovations in this arena if the end result is better looking graphics at higher frame rates. Technical correctness is not the end-all, be-all in my book. However, I'm a little skeptical about what NVIDIA is doing. The previous driver generations offered a relatively straightforward tradeoff between image quality and performance, and I'd like to see something more innovative. I will have to spend more quality time with these drivers before I know what to think of them, although my initial impressions are positive. We'll have more on this topic soon.


UltraShadow allows developers to define a depth bounding box for shadows cast from a light source
  • Accelerated shadowing — This one probably belongs under the "new drivers" header, but we'll give it its own space since NVIDIA has given the feature its own marketing term: UltraShadow. UltraShadow is a means of cutting down on the amount of geometry required for stencil shadow volumes, one of the best and most common shadowing techniques in graphics today. Stencil shadow volumes produce very accurate shadows, but they require a great deal more geometry per scene. UltraShadow allows developers to define limits in 3D space, specifically along the Z axis, where a light source might cast a shadow. By providing depth bounds for shadows from each light source, developers should be able to produce effectively shadowed scenes with less geometric complexity.

    And if they do it right, it shouldn't even look funny.

    I don't believe any new hardware is required to support UltraShadow, so the benefits of this technique ought to trickle down to all cards in the GeForce FX line.

    NVIDIA has created an OpenGL extension to expose this capability to developers, and they say a patent is pending on the technology. DirectX will have to be extended to support UltraShadow, and we'll have to see whether and when that happens.
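
For the curious, here's a rough software model of what a depth bounds check like UltraShadow's might look like. This is only a sketch: the function names are made up for illustration and are not NVIDIA's API, and I'm assuming the test compares the depth value already stored at each pixel against the developer-supplied Z limits, skipping the stencil update when that value falls outside them.

```python
# Simplified software model of a depth bounds test for stencil shadows.
# All names here are hypothetical; this is not NVIDIA's extension API.

def depth_bounds_pass(stored_depth, z_min, z_max):
    """Return True if the depth already in the buffer lies inside [z_min, z_max]."""
    return z_min <= stored_depth <= z_max


def apply_shadow_volume(depth_buffer, stencil_buffer, covered_pixels, z_min, z_max):
    """Update stencil only for pixels whose stored depth is inside the bounds.

    The stencil increment stands in for the real shadow volume work.
    Returns the number of covered pixels that were skipped by the bounds test.
    """
    culled = 0
    for idx in covered_pixels:
        if depth_bounds_pass(depth_buffer[idx], z_min, z_max):
            stencil_buffer[idx] += 1
        else:
            culled += 1
    return culled


# Four pixels with made-up depth values; the light can only shadow Z in [0.3, 0.6].
depth = [0.1, 0.4, 0.5, 0.9]
stencil = [0, 0, 0, 0]
skipped = apply_shadow_volume(depth, stencil, range(len(depth)), 0.3, 0.6)
```

Pixels that fail the check never touch the stencil buffer, and that skipped work is the kind of savings UltraShadow is after.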

Those are the basics for NV35. The core clock rate of the GeForce FX 5900 Ultra will be 450MHz, down 50MHz from the NV30 chip in the GeForce FX 5800 Ultra. The GeForce FX 5900 Ultra 256MB should be available in June, if all goes as planned, for only $499 American money. A non-Ultra variant of the GeForce FX 5900 with 128MB should hit the market in June for $399, but core and memory clock speeds for that product have not been finalized. Tantalizingly, NVIDIA has plans to introduce a cheaper 128MB "value" card based on NV35, as well. This card should be available in July for the bargain-basement price of $299.
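
If you want to check the headline numbers yourself, the arithmetic is simple. The sketch below uses helper names of my own invention, and the 1000MHz effective memory clock for the 5800 Ultra is my assumption, worked backward from its 16GB/s figure and 128-bit bus. The fill rates are theoretical peaks from pixels per clock times core clock, not measured results.

```python
# Back-of-the-envelope peak figures; helper names are illustrative only.

def memory_bandwidth_gbs(bus_width_bits, effective_clock_mhz):
    """Peak memory bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz * 1e6 / 1e9


def fill_rate_mpixels(pixels_per_clock, core_clock_mhz):
    """Theoretical peak fill rate in millions of pixels per second."""
    return pixels_per_clock * core_clock_mhz


# GeForce FX 5900 Ultra: 256-bit bus, 425MHz DDR (850MHz effective), 450MHz core.
fx5900_bandwidth = memory_bandwidth_gbs(256, 850)
fx5900_color_fill = fill_rate_mpixels(4, 450)     # four pixels per clock
fx5900_stencil_fill = fill_rate_mpixels(8, 450)   # eight per clock for Z/stencil

# GeForce FX 5800 Ultra: 128-bit bus, 500MHz core.
# The 1000MHz effective memory clock is assumed, derived from the 16GB/s figure.
fx5800_bandwidth = memory_bandwidth_gbs(128, 1000)
fx5800_color_fill = fill_rate_mpixels(4, 500)

print(f"5900 Ultra: {fx5900_bandwidth:.1f} GB/s, {fx5900_color_fill} Mpixels/s")
print(f"5800 Ultra: {fx5800_bandwidth:.1f} GB/s, {fx5800_color_fill} Mpixels/s")
```

Note that the lower core clock costs the 5900 Ultra a little theoretical fill rate versus the 5800 Ultra, even as its memory bandwidth jumps by about 70 percent.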

In all, the GeForce FX 5900 Ultra looks like one sweet graphics card, and it ought to be very fast, especially in current games. The big question in my mind is whether or not the thing will offer the right combination of price and performance to distract enthusiasts' attention from ATI's Radeon 9600 and 9800 Pro cards. Speaking of which...