
NVIDIA's GeForce 6200 with TurboCache

Graphics memory gets hierarchical

A WHILE BACK, we reviewed the GeForce 6200, and we were a bit perplexed. The GeForce 6200 was a success on most fronts. It brought GeForce 6-class features to graphics cards in the $129 range, including DirectX 9 with Shader Model 3.0, which is no small feat. Performance was quite good, and the product made sense overall, save for one thing: the 6200 was based on the NV43 GPU, the same graphics chip used in GeForce 6600 cards, but with much of its rendering power disabled. Graphics companies sell tons of low-end GPUs. Why would NVIDIA manufacture a bunch of relatively large NV43 chips, only to cut them down to half their rendering power?

The answer, it turns out, is pretty simple: that version of the GeForce 6200 was just a stop-gap measure, not the real thing. The real GeForce 6200 is based on a new and much smaller chip, the NV44, with an intriguing new technology, dubbed TurboCache, that allows graphics cards to use system memory in combination with a smaller amount of local graphics RAM to deliver decent low-end performance. Read on for our take on NVIDIA's new low-end GPU.

The NV44 GPU
The GeForce 6200 with TurboCache is indeed a scaled-down version of the NV4x architecture that powers the entire line of GeForce 6 graphics cards, but it's not just that. NVIDIA has reworked the memory management portions of the NV4x pipeline in order to allow this new NV44 GPU to use system memory for rendering. Here's a block diagram of the NV44.

Block diagram of the NV44 GPU. Source: NVIDIA.

The NV44 is a new and substantially smaller chip than its predecessors. It packs three vertex shader engines, just like the NV43, but has only four pixel shader pipelines. Those pixel shader units handle both programmable pixel shading and texturing, and they are linked to a pair of raster operators, or ROPs, by a fragment crossbar. This crossbar acts like a load-balancer, sending pixel fragments to available ROPs as needed. The ROP then writes the pixel result to memory. Thus, the NV44 is limited to writing two pixels per clock, but it's able to keep the ROPs working by feeding them from four pixel pipes. This architecture is a little bit different from the usual arrangement, in which each pixel pipeline is directly connected to the raster operators, but it's proven very effective in the GeForce 6600 line.
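To make the load-balancing idea concrete, here is a toy cycle-level model of four pixel pipes feeding two ROPs through a shared crossbar. It is a sketch under our own assumptions, not NVIDIA's design: each pipe spends a fixed number of clocks (`shader_clocks`) shading a fragment, the crossbar is modeled as a bounded FIFO of hypothetical depth `fifo_depth`, and each ROP writes one pixel per clock. The function name `pixels_per_clock` is illustrative.

```python
from collections import deque


def pixels_per_clock(shader_clocks: int, num_pipes: int = 4, num_rops: int = 2,
                     fifo_depth: int = 8, total_clocks: int = 10_000) -> float:
    """Toy model: average pixels written per clock by pipes -> crossbar -> ROPs.

    Assumptions (illustrative only): every fragment takes `shader_clocks` clocks
    in a pipe, the crossbar is a FIFO holding at most `fifo_depth` fragments,
    and each ROP writes one pixel per clock.
    """
    fifo = deque()
    remaining = [shader_clocks] * num_pipes  # clocks left on each pipe's fragment
    written = 0
    for _ in range(total_clocks):
        # Pixel pipes: advance shading; a finished fragment enters the
        # crossbar FIFO if there is room, otherwise the pipe stalls.
        for p in range(num_pipes):
            if remaining[p] > 0:
                remaining[p] -= 1
            if remaining[p] == 0 and len(fifo) < fifo_depth:
                fifo.append(p)
                remaining[p] = shader_clocks  # start the next fragment
        # Crossbar: hand queued fragments to whichever ROPs are free.
        for _ in range(min(num_rops, len(fifo))):
            fifo.popleft()
            written += 1  # each ROP writes one pixel this clock
    return written / total_clocks


for k in (1, 2, 4):
    print(k, round(pixels_per_clock(k), 2))
print("two pipes only:", round(pixels_per_clock(2, num_pipes=2), 2))
```

Under these assumptions, the two ROPs cap output at about 2 pixels per clock no matter how many pipes feed them, and four pipes keep both ROPs saturated as long as a fragment takes no more than two clocks to shade. With only two pipes and two-clock shading, output drops to about 1 pixel per clock, which is the intuition behind feeding two ROPs from four pipes.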

The ROP subsystem is where NVIDIA has saved the most transistors on the GeForce 6200. In addition to cutting the number of ROPs down to two (from 16 in the NV40 and four in the NV43), NVIDIA's engineers have scaled back some of the ROP pipeline's capabilities in the NV44, removing support for color compression, Z compression, and OpenEXR 16-bit floating-point blending and filtering. The loss of color compression will primarily hurt performance with multisampled antialiasing, while the loss of Z compression will cost some performance across the board. OpenEXR blending and filtering is nice to have, but this thing won't be fast enough to handle 16-bit FP blending in real-time graphics, anyhow.
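Why does losing color compression hit multisampled antialiasing hardest? A deliberately naive model helps. With 4x MSAA each pixel stores four color samples, but pixels in a triangle's interior have four identical samples. The sketch below assumes a hypothetical scheme that stores a single color for such pixels and every sample otherwise; real GPU color compression works on tiles with extra metadata, so treat this only as an illustration of the bandwidth at stake. The 90/10 split of interior and edge pixels is an invented example.

```python
BYTES_PER_SAMPLE = 4  # assumed 32-bit RGBA color per sample


def color_bytes_written(pixels: list[list[int]], compress: bool) -> int:
    """Bytes of color data written for a list of pixels.

    Each pixel is a list of per-sample colors (4 entries for 4x MSAA, 1 without AA).
    With the hypothetical compression, a pixel whose samples all match is stored
    as a single color; otherwise every sample is written.
    """
    total = 0
    for samples in pixels:
        if compress and all(s == samples[0] for s in samples):
            total += BYTES_PER_SAMPLE
        else:
            total += BYTES_PER_SAMPLE * len(samples)
    return total


RED, GREEN = 0xFF0000FF, 0x00FF00FF
interior = [[RED] * 4 for _ in range(90)]              # fully covered pixels
edges = [[RED, RED, GREEN, GREEN] for _ in range(10)]  # pixels on a triangle edge
msaa_frame = interior + edges
no_aa_frame = [[RED] for _ in range(100)]

print(color_bytes_written(msaa_frame, compress=False))   # 1600
print(color_bytes_written(msaa_frame, compress=True))    # 520
print(color_bytes_written(no_aa_frame, compress=False))  # 400
print(color_bytes_written(no_aa_frame, compress=True))   # 400
```

In this toy model, compression cuts the 4x MSAA color traffic by roughly two-thirds but changes nothing without antialiasing, because a single-sample pixel has nothing to collapse.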

All told, NVIDIA was able to reduce the NV44 to a comparatively tiny 75 million (or so) transistors, far fewer than the estimated 222 million transistors on the GeForce 6800 Ultra.
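As a quick sense of scale, using the approximate counts above (both are estimates):

$$\frac{75 \times 10^{6}}{222 \times 10^{6}} \approx 0.34$$

so the NV44 carries roughly a third as many transistors as the GeForce 6800 Ultra's GPU.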

The NV44 GPU

More importantly, the NV44's die size is much smaller than that of other NV4x chips. By my measurements, the chip is 10 mm by 11 mm, or 110 mm². The NV43, for comparison, is about 156 mm². Both chips are manufactured by TSMC on its 110nm fab process, so the die size difference comes from the trimmed-down ROPs and pixel shader pipes.
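A smaller die matters because more candidate chips fit on each wafer. The sketch below uses the common first-order dies-per-wafer approximation, DPW ≈ π(d/2)²/A − πd/√(2A), which subtracts an edge-loss term for partial dies at the rim from the wafer area divided by the die area A. The 300 mm wafer diameter d is our assumption (the article does not say which wafer size TSMC used for these chips), and the estimate ignores scribe lines, edge exclusion, and defect yield.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order estimate of whole dies per wafer (no yield, no scribe lines).

    Wafer area / die area, minus an edge-loss term for partial dies at the rim.
    The 300 mm default wafer diameter is an assumption.
    """
    radius = wafer_diameter_mm / 2
    whole_wafer = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(whole_wafer - edge_loss)


nv44 = gross_dies_per_wafer(10 * 11)  # 110 mm^2, measured in the article
nv43 = gross_dies_per_wafer(156)      # approximate NV43 area from the article
print(nv44, nv43, round(nv44 / nv43, 2))  # 579 399 1.45
```

Under these assumptions the NV44 yields roughly 45% more candidate dies per wafer than the NV43. Because a smaller die is also less likely to contain a defect, the advantage after yield would tend to be larger still, though this sketch does not try to quantify it.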