Inside the G70 GPU (continued)
- Fewer ROPs per shader: The ROPs (raster operators) pictured in the lower portion of the main chip block diagram handle the task of converting fragments into pixels by doing multisample antialiasing (if needed), doing color and Z compression, and writing the completed pixel value out to the frame buffer. One felicitous quirk of the NV40/G70's architecture is the separation of the ROP subsystem from the pixel shaders. In the NV40/G70, the pixel shaders apply both shader effects and textures to pixel fragments. A crossbar switch situated between the pixel shaders and the ROPs makes sure that the ROPs are fully utilized by routing shaded and textured fragment data to any available ROP.
The NV40 has 16 ROPs and 16 pixel shaders, but we learned that a one-to-one arrangement is probably overkill when we saw the performance of the GeForce 6600 GT, whose four ROPs were no apparent bottleneck for its eight pixel shaders. For the G70, NVIDIA has elected to use 16 ROPs to process fragments coming from the chip's 24 pixel shaders. In very simple cases where only one texture or shader is being applied to a fragment, the G70 may prove to be no faster at pixel pushing than an NV40 at the same clock speed. In most common cases, though, the G70 should be faster.
Incidentally, like the NV40's, the G70's ROP pipelines can write one Z (depth) value and one color value per clock. In cases where writing a color value isn't necessary, the color ROP unit can instead write a second Z value. This double-speed Z capability is useful in accelerating common shadowing techniques and, NVIDIA has often said, is one reason why its GPUs excel in Doom 3.
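To put rough numbers on the ROP discussion above, here is a back-of-the-envelope model in Python. It is our own simplification, not NVIDIA's description of the hardware: it assumes every pixel shader spends a fixed number of clocks on each fragment (the hypothetical `shader_clocks` parameter), that each ROP retires one color-plus-Z pixel per clock, and that a ROP writes two Z values per clock when no color write is needed.

```python
def pixels_per_clock(shaders, rops, shader_clocks):
    """Peak pixels per clock: the lesser of shader output and ROP capacity.

    shader_clocks is an assumed, fixed cost per fragment in each pixel shader.
    """
    shader_rate = shaders / shader_clocks
    return min(shader_rate, rops)


def z_only_per_clock(rops):
    """Z-only rate, assuming the color unit writes a second Z value per clock."""
    return rops * 2


# Unit counts taken from the article; everything else is an assumption.
NV40 = {"shaders": 16, "rops": 16}
G70 = {"shaders": 24, "rops": 16}

if __name__ == "__main__":
    for clocks in (1, 2, 4):
        nv40 = pixels_per_clock(shader_clocks=clocks, **NV40)
        g70 = pixels_per_clock(shader_clocks=clocks, **G70)
        print(f"{clocks} clock(s) per fragment: NV40 {nv40:g}, G70 {g70:g} pixels/clock")
    print(f"G70 Z-only rate: {z_only_per_clock(G70['rops'])} values/clock")
```

In this toy model the two chips tie when a fragment costs a single shader clock, because both are limited by their 16 ROPs. Once the shaders need two or more clocks per fragment, the G70's extra pixel shaders give it a 50% lead, and its ROPs are no longer the limit.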
- Faster high-dynamic-range effects: The NV40 and G70 include hardware support for texture filtering and blending of color values in a frame buffer using high-precision floating-point math. This capability is key to achieving really solid performance with those gorgeous high-dynamic-range (HDR) lighting effects that are all the rage with the Xbox 360 and PlayStation 3 kids these days. The NV40/G70 will do texture filtering and blending using 16 bits of floating-point precision per color channel, or 64 bits per pixel. (In fact, the color format is compatible with Industrial Light & Magic's OpenEXR standard.)
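The 64-bits-per-pixel figure is easy to check with a few lines of Python. This is only an illustration of the storage arithmetic, using the standard library's half-precision format code (`"e"`) as a stand-in for the GPU's FP16 channels. The sample pixel values and the 1600x1200 resolution are our own assumptions.

```python
import struct

# One RGBA pixel with an HDR red channel (values above 1.0 are allowed in FP16).
hdr_pixel = (4.5, 1.0, 0.25, 1.0)

# "e" is IEEE 754 half precision: 16 bits per channel.
fp16_bytes = struct.pack("<4e", *hdr_pixel)
# "B" is an unsigned 8-bit integer, as in ordinary 32-bit integer color.
int8_bytes = struct.pack("<4B", 255, 255, 64, 255)

fp16_bits_per_pixel = len(fp16_bytes) * 8
int8_bits_per_pixel = len(int8_bytes) * 8


def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of a single color buffer, ignoring Z, AA samples, and padding."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    print("FP16 RGBA:", fp16_bits_per_pixel, "bits per pixel")
    print("8-bit RGBA:", int8_bits_per_pixel, "bits per pixel")
    # Hypothetical 1600x1200 render target.
    print("FP16 buffer bytes:", framebuffer_bytes(1600, 1200, fp16_bits_per_pixel))
    print("8-bit buffer bytes:", framebuffer_bytes(1600, 1200, int8_bits_per_pixel))
```

An FP16 color buffer takes twice the memory, and so twice the bandwidth per pixel written, of a conventional 32-bit buffer. That is part of why HDR rendering leans so hard on the GPU.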
The G70 has been optimized to handle HDR lighting faster than the NV40 in a couple of ways. First, of course, is the additional pixel shader capacity. Beyond that, NVIDIA says the chip's FP16 texture fetch and filter capabilities are much faster, in part because texture caching has been optimized for high-precision textures. (I think that means mainly that the texture cache is larger, although NVIDIA wasn't long on specifics.) We can test this performance improvement pretty straightforwardly using the HDR rendering mode in Far Cry, and we'll do so.
Like the NV40, the G70 can do trilinear filtering and up to 16X anisotropic filtering for FP16 textures, but its ROPs still can't do multisampled antialiasing with FP16 color formats. Supersampling is possible, but likely painfully slow.
Another possibility for hardware acceleration of HDR effects is in the conversion of high-dynamic-range images into a low-dynamic-range format, like 32-bit integer color, for output to a display, a process also known as tone mapping. Tone mapping is required because today's displays don't have the dynamic range necessary to show HDR images. The NV40 and G70 have to use their pixel shaders to perform this task, although it could be accelerated with dedicated hardware. However, NVIDIA's David Kirk says he'd prefer to have more generally available shader power than separate, dedicated logic for tone mapping.
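For a sense of what tone mapping involves, here is a minimal sketch in Python. NVIDIA did not tell us which operator games or drivers use, so we have assumed the simple Reinhard curve, x / (1 + x), followed by a plain 2.2 gamma encode. A real implementation would run per pixel in a shader and would likely add exposure control.

```python
def tone_map_channel(hdr_value, gamma=2.2):
    """Map one HDR channel value in [0, inf) to an 8-bit integer in 0..255.

    Assumed operator: Reinhard compression, then a simple power-law gamma.
    """
    hdr_value = max(hdr_value, 0.0)              # clamp negative input
    compressed = hdr_value / (1.0 + hdr_value)   # [0, inf) -> [0, 1)
    encoded = compressed ** (1.0 / gamma)        # encode for the display
    return round(encoded * 255)


def tone_map_pixel(rgb):
    """Tone-map an (r, g, b) tuple of HDR floats to 8-bit integer color."""
    return tuple(tone_map_channel(c) for c in rgb)


if __name__ == "__main__":
    # Hypothetical samples: a dim wall, a bright sky, a very bright light source.
    for name, rgb in [("wall", (0.05, 0.04, 0.03)),
                      ("sky", (1.5, 2.0, 4.0)),
                      ("lamp", (60.0, 55.0, 40.0))]:
        print(name, tone_map_pixel(rgb))
```

The point is that brightness values far above 1.0 get squeezed smoothly toward the top of the displayable range rather than being clipped, so detail survives in both shadows and highlights.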
- Improved antialiasing: The G70 retains many of the NV40's limitations when it comes to antialiasing, but it does have a couple of new tricks up its sleeve. The first of those tricks is what NVIDIA calls transparency antialiasing. Transparency AA handles the all-too-common case where a texture with transparent regions shows ugly, jagged outline edges, even though multisampled antialiasing is in use. We will discuss the specifics of transparency AA, with examples, later in the review.
The second of the G70's tricks is an old-new thing: gamma-adjusted blends. ATI has been doing gamma-adjusted (or corrected, depending on who you ask) blends since the debut of the Radeon 9700, and we've repeatedly pointed out that doing so produces superior results. Now, the G70 offers this same basic capability with no performance hit. It's simply exposed as a control-panel option, although it's not enabled by default. Oddly enough, this feature is not available with the latest drivers on the NV40, so it's apparently new. We'll talk more about gamma-correct AA later, as well.
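Here is a small Python sketch of why gamma adjustment matters when AA samples are blended. The exact curve the hardware applies is not something we know, so this assumes a simple 2.2 power law, and the four-sample edge pixel is a made-up example.

```python
GAMMA = 2.2  # assumed display gamma; the hardware's actual curve may differ


def to_linear(value8):
    """Decode an 8-bit gamma-encoded value to linear light in [0, 1]."""
    return (value8 / 255.0) ** GAMMA


def to_gamma(linear):
    """Encode linear light in [0, 1] back to an 8-bit value."""
    return round((linear ** (1.0 / GAMMA)) * 255)


def resolve_naive(samples):
    """Average the stored 8-bit values directly, with no gamma adjustment."""
    return round(sum(samples) / len(samples))


def resolve_gamma_adjusted(samples):
    """Average in linear light, then re-encode for the display."""
    linear = [to_linear(s) for s in samples]
    return to_gamma(sum(linear) / len(linear))


if __name__ == "__main__":
    # A 4X AA edge pixel: two samples hit a white polygon, two hit black.
    edge_samples = [255, 255, 0, 0]
    print("naive blend:         ", resolve_naive(edge_samples))
    print("gamma-adjusted blend:", resolve_gamma_adjusted(edge_samples))
```

The naive blend stores a mid-gray value of about 128, which a gamma-2.2 display shows as far darker than half brightness. The gamma-adjusted blend stores a lighter value, about 186, so the half-covered pixel really does look halfway between black and white. That is why edge gradients come out smoother.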
- Purer video capabilities: The NV40 was supposed to have a very nice video processing engine built into it that would accelerate the encoding and decoding of advanced video formats like WMV. However, that first attempt at a video processing engine apparently wasn't up to the task, and the NV40 thus lacked some of the video acceleration capabilities NVIDIA had originally claimed it would have. Later chips in the NV4x series got a revised version of the video processing engine that worked as planned, and NVIDIA packaged these capabilities into a brand it calls PureVideo. With the G70, NVIDIA's high-end cards finally get a proper PureVideo implementation.
Also, because NVIDIA GPUs do load balancing between their video processing engine and their pixel shaders, the G70 should be able to deliver more power for video processing than any other NVIDIA GPU. This additional power could be especially helpful in accelerating the playback of high-definition video formats, including the new H.264 standard.
- Tweaks here and there: The G70 has a number of other minor tweaks throughout, including better compression in the memory controller and up to 30% more efficiency in the cull and setup portion of the geometry pipe, right below the vertex shaders in the main chip diagram.