Inside the G70 GPU (continued)
A close-up of a ROP pipeline. Source: NVIDIA.
The NV40 has 16 ROPs and 16 pixel shaders, but we learned that a one-to-one arrangement is probably overkill when we saw the performance of the GeForce 6600 GT, whose four ROPs were no apparent bottleneck for its eight pixel shaders. For the G70, NVIDIA has elected to use 16 ROPs to process fragments coming from the chip's 24 pixel shaders. In very simple cases where only one texture or shader is being applied to a fragment, the G70 may prove to be no faster at pixel pushing than an NV40 at the same clock speed. In most common cases, though, the G70 should be faster.
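The ROP-versus-shader balance can be sketched with a little arithmetic. The snippet below is a rough, idealized model rather than anything NVIDIA has published: it assumes peak output per clock is capped by whichever is slower, the ROPs or the pixel shaders, and it ignores memory bandwidth, clock speeds, and everything else. The function name and the "shader cycles per pixel" parameter are illustrative assumptions; only the unit counts come from the text above.

```python
def pixels_per_clock(rops, pixel_shaders, shader_cycles_per_pixel):
    """Idealized peak pixels per clock (assumed model, not NVIDIA's figures).

    Output is capped by the slower stage: the ROPs, which we assume write
    one pixel each per clock, or the pixel shaders, which we assume each
    need `shader_cycles_per_pixel` clocks to finish one fragment.
    """
    shading_rate = pixel_shaders / shader_cycles_per_pixel
    return min(rops, shading_rate)


# Unit counts taken from the article; everything else is hypothetical.
CHIPS = {
    "GeForce 6600 GT": {"rops": 4, "pixel_shaders": 8},
    "NV40": {"rops": 16, "pixel_shaders": 16},
    "G70": {"rops": 16, "pixel_shaders": 24},
}

for cycles in (1, 2, 4):
    for name, units in CHIPS.items():
        rate = pixels_per_clock(shader_cycles_per_pixel=cycles, **units)
        print(f"{cycles} cycle(s)/pixel, {name}: {rate:.1f} pixels/clock")
```

Under these assumptions, the G70 and NV40 tie at 16 pixels per clock when each fragment takes a single shader cycle, and the G70 pulls ahead as soon as shading takes two or more cycles per pixel.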
Incidentally, like the NV40's, the G70's ROP pipelines can write one Z (depth) value and one color value per clock. In cases where writing a color value isn't necessary, the color ROP unit can instead write a second Z value. This double-speed Z capability is useful in accelerating common shadowing techniques and, NVIDIA has often said, is one reason why its GPUs excel in Doom 3.
The G70 has been optimized to handle HDR lighting faster than the NV40 in a couple of ways. First, of course, is the additional pixel shader capacity. Beyond that, NVIDIA says the chip's FP16 texture fetch and filter capabilities are much faster, in part because texture caching has been optimized for high-precision textures. (I think that means mainly that the texture cache is larger, although NVIDIA wasn't long on specifics.) We can test this performance improvement pretty straightforwardly using the HDR rendering mode in Far Cry, and we'll do so.
Like the NV40, the G70 can do trilinear filtering and up to 16X anisotropic filtering for FP16 textures, but its ROPs still can't do multisampled antialiasing with FP16 color formats. Supersampling is possible, but likely painfully slow.
Another possibility for hardware acceleration of HDR effects is in the conversion of high-dynamic-range images into a low-dynamic-range format, like 32-bit integer color, for output to a display, a step also known as tone mapping. Tone mapping is required because today's displays don't have the dynamic range necessary to show HDR images. The NV40 and G70 have to use their pixel shaders to perform this task, although it could in principle be accelerated by dedicated hardware. However, NVIDIA's David Kirk says he'd prefer to have more generally available shader power than separate, dedicated logic for tone mapping.
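For a sense of what a tone-mapping pass computes per pixel, here is a minimal sketch. It assumes the simple Reinhard curve, x / (1 + x), purely for illustration; nothing here says which operator any particular game or driver uses, and the function name and `exposure` parameter are hypothetical. Real implementations run as pixel shader code on the GPU, not Python.

```python
def tone_map_pixel(rgb_hdr, exposure=1.0):
    """Map one HDR RGB pixel (unbounded floats) to 8-bit integer channels.

    Uses the Reinhard curve x / (1 + x) as an assumed example operator.
    It compresses any non-negative value into the range [0, 1) before
    quantizing to the 0-255 range of integer color.
    """
    out = []
    for channel in rgb_hdr:
        scaled = max(channel, 0.0) * exposure
        compressed = scaled / (1.0 + scaled)
        out.append(int(compressed * 255 + 0.5))  # round to nearest integer
    return tuple(out)


# A pixel with one dark, one mid, and one very bright channel.
print(tone_map_pixel((0.0, 1.0, 16.0)))  # (0, 128, 240)
```

Note how a channel value of 16.0, far beyond what an integer format can store directly, still lands inside the displayable range instead of clipping to white.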
The second of the G70's tricks is an old-new thing: gamma-adjusted blends. ATI has been doing gamma-adjusted (or corrected, depending on who you ask) blends since the debut of the Radeon 9700, and we've repeatedly pointed out that doing so produces superior results. Now, the G70 offers this same basic capability with no performance hit. It's simply exposed as a control-panel option, although it's not enabled by default. Oddly enough, this feature is not available with the latest drivers on the NV40, so it's apparently new. We'll talk more about gamma-correct AA later, as well.
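To see why gamma-adjusted blending changes the result, consider averaging two samples along a high-contrast edge. The sketch below assumes a plain power-law display gamma of 2.2 and a simple two-sample average; both are illustrative assumptions, as are the function names, and the actual hardware blend may differ in its details.

```python
GAMMA = 2.2  # assumed display gamma; real transfer curves vary


def blend_naive(a, b):
    """Average two gamma-encoded values directly (no gamma adjustment)."""
    return (a + b) / 2


def blend_gamma_adjusted(a, b, gamma=GAMMA):
    """Decode to linear light, average there, then re-encode for display."""
    linear = (a ** gamma + b ** gamma) / 2
    return linear ** (1 / gamma)


# An edge pixel half covered by black (0.0) and half by white (1.0).
naive = blend_naive(0.0, 1.0)
adjusted = blend_gamma_adjusted(0.0, 1.0)
print(f"naive: {naive * 255:.0f}, gamma-adjusted: {adjusted * 255:.0f}")
```

With these assumptions, the naive blend stores a value of about 128 out of 255, which such a display shows as considerably darker than half brightness, while the gamma-adjusted blend stores about 186, which the display shows as roughly half the light of full white.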
Also, because NVIDIA GPUs do load balancing between their video processing engine and their pixel shaders, the G70 should be able to deliver more power for video processing than any other NVIDIA GPU. This additional power could be especially helpful in accelerating the playback of high-definition video formats, including the new H.264 standard.