What's new, what's not
For graphics freaks like us, one of the most exciting developments of the past few weeks has been NVIDIA's new willingness to divulge information about the internals of its graphics processors. The NVIDIA media briefings on GeForce 6800 were chock-full of block diagrams of the NV40 chip, its internal units, and all kinds of hairy detail about how things worked. This was a big change from the NV30 days, to say the least. We now have a pretty good idea how the internals of the NV40 look. Of course, ATI has long been fairly open about its R300 architecture, but NVIDIA's newfound openness has forced ATI's hand a bit. As a result, we now have a clearer understanding of the amount of internal parallelism and the richness of features in ATI's graphics processors, both the new X800 (code-named R420) and the R300 series from which it's derived.
In the R420, the ATI R3xx core has been reworked to allow for higher performance and better scalability. Let's have a look at some of the key differences between the R420 and its predecessors.


In addition to more pixel pipes, the Radeon X800 has two more vertex shader units than the Radeon 9800 XT, for a total of six. Combined with higher clock speeds, ATI is claiming the X800 has double the vertex shader power of the Radeon 9800 XT. Although the X800 Pro will have one of its pixel pipeline quads disabled, it will retain all six vertex shader units intact.

One more thing. You may recall that NVIDIA's recent GPUs, including the NV3x and NV4x chips, can render additional pixels per clock when doing certain types of rendering, like Z or stencil operations. This has led to the NV40 being called a "16x1 or 32x0" design. Turns out ATI's chips can render two Z/stencil ops per pipeline per clock, as well, so long as antialiasing is enabled.

ATI's engineers have also boosted the X800's peak Z compression ratio from 4:1 to 8:1, making the maximum possible peak compression (with 6X antialiasing and color compression) 48:1, but that's just a fancy number they like to throw around to impress the ladies.
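The 48:1 figure looks like straight multiplication: the peak Z compression ratio times the number of antialiasing samples that collapse into one stored value when every sample in a pixel matches. That reading is our assumption about how the number was derived, not something ATI spelled out, and the function name below is made up for illustration.

```python
def peak_compression_ratio(z_ratio, aa_samples):
    """Best-case ratio, assuming the Z ratio and the AA sample count multiply.

    This multiplicative model is an assumption used to reproduce the quoted
    figure; real-world compression depends entirely on scene content.
    """
    return z_ratio * aa_samples


# The new 8:1 peak Z ratio with 6X antialiasing.
print(peak_compression_ratio(8, 6))
# The older 4:1 peak Z ratio under the same assumption.
print(peak_compression_ratio(4, 6))
```

Under the same model, the older 4:1 ratio tops out at half the new figure, which is why doubling the Z compression ratio doubles the headline number.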

From 10,000 feet up, this new setup doesn't look dramatically different from the 9800 XT's; it's a crossbar type design with four 64-bit data channels talking over a switch to four independent memory controllers. There are some important differences, though, beyond higher clock speeds. First and foremost, this memory interface can make use of the swanky new GDDR3 memory type that ATI helped create. Also, ATI says this new memory interface is more efficient, and it offers extensive tuning options. If a given application (or application type) typically accesses memory according to certain patterns, ATI's driver team may be able to reconfigure the memory controllers to perform better in that type of app.
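To put the four 64-bit channels in perspective, here is a rough sketch of how peak memory bandwidth falls out of bus width and clock speed. The 500MHz memory clock is a hypothetical value chosen for round numbers, not a spec for any particular X800 card, and the function name is ours. The only things taken as given are the four 64-bit channels and the fact that GDDR3 transfers data twice per clock.

```python
def peak_bandwidth_gb_per_s(channels, bits_per_channel, mem_clock_mhz,
                            transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = channels * bits_per_channel // 8
    transfers_per_second = mem_clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Four 64-bit channels add up to a 256-bit bus.
# The 500 MHz clock is a hypothetical example value.
print(peak_bandwidth_gb_per_s(channels=4, bits_per_channel=64,
                              mem_clock_mhz=500))
```

This is a ceiling, of course. How close a card gets to it in practice depends on access patterns, which is where the controller tuning ATI describes would come in.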
Trouble is, like all textures, normal maps tend to chew up video memory, and they don't tolerate well the artifacts introduced by compression algorithms like DirectX Texture Compression (DXTC). If a normal map becomes blocky, the perceived elevation of a surface will become blocky and uneven, ruining the effect. ATI has tackled this problem by adapting the DXTC algorithm for alpha channel compression to work on normal maps. Specifically, the DXT5 alpha compression algorithm is used on the red and green channels, which store X and Y coordinate info, respectively. (Z values are discarded and computed later in the pixel shader.) This format is reasonably well suited for normal maps, and offers 4:1 compression ratios. Like any texture compression method, it should allow the use of higher resolution textures in a given amount of texture memory.

The X800 GPU supports this method of normal map compression, dubbed 3Dc, in hardware. Both DirectX and OpenGL support 3Dc via extensions, and game developers should be able to take advantage of it with minimal effort.
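For a feel of what this scheme involves, here is a minimal sketch of decoding one channel of a 4x4 texel block and rebuilding the discarded Z value. It assumes the block layout commonly documented for DXT5 alpha: two 8-bit endpoints followed by sixteen 3-bit palette indices packed little-endian. The function names, the truncating integer division, and the byte-to-float mapping are our own simplifications, and real hardware may round differently. Treat it as an illustration, not ATI's implementation.

```python
import math


def decode_channel_block(block):
    """Decode one 8-byte DXT5-alpha-style block into 16 values (0..255)."""
    a0, a1 = block[0], block[1]
    if a0 > a1:
        # Eight-value mode: six values interpolated between the endpoints.
        palette = [a0, a1] + [((8 - i) * a0 + (i - 1) * a1) // 7
                              for i in range(2, 8)]
    else:
        # Six-value mode: four interpolated values plus explicit 0 and 255.
        palette = [a0, a1] + [((6 - i) * a0 + (i - 1) * a1) // 5
                              for i in range(2, 6)] + [0, 255]
    # Sixteen 3-bit indices, texel 0 in the lowest bits (assumed ordering).
    bits = int.from_bytes(block[2:8], "little")
    return [palette[(bits >> (3 * t)) & 7] for t in range(16)]


def decode_normal(x_byte, y_byte):
    """Map two stored bytes to [-1, 1] and rebuild Z for a unit-length normal."""
    x = x_byte / 255.0 * 2.0 - 1.0
    y = y_byte / 255.0 * 2.0 - 1.0
    # Clamp so rounding error in X and Y can't push the root negative.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


# One block per channel: 8 bytes for X plus 8 bytes for Y per 4x4 texels.
x_values = decode_channel_block(bytes([255, 0, 0, 0, 0, 0, 0, 0]))
y_values = decode_channel_block(bytes([128, 128, 0, 0, 0, 0, 0, 0]))
print(decode_normal(x_values[0], y_values[0]))
```

With two such blocks per 4x4 tile, the compressed data comes to 16 bytes for 16 texels. Against 32-bit uncompressed texels (our assumed baseline), that works out to the 4:1 ratio mentioned above. Z can be dropped because a unit-length normal with a non-negative Z is fully determined by its X and Y components.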