What's new, what's not
For graphics freaks like us, one of the most exciting developments of the past few weeks has been NVIDIA's new willingness to divulge information about the internals of its graphics processors. The NVIDIA media briefings on the GeForce 6800 were chock full of block diagrams of the NV40 chip and its internal units, along with all kinds of hairy detail about how things work. This was a big change from the NV30 days, to say the least, and we now have a pretty good idea of how the NV40 looks inside. ATI has long been fairly open about its R300 architecture, of course, but NVIDIA's newfound openness has forced ATI's hand a bit. As a result, we now have a clearer understanding of the amount of internal parallelism and the richness of features in ATI's graphics processors. That includes both the new X800 (code-named R420) and the R300 series from which it's derived.

In the R420, the ATI R3xx core has been reworked to allow for higher performance and better scalability. Let's have a look at some of the key differences between the R420 and its predecessors.

  • More parallelism — The X800's pixel pipelines have been organized into sets of four, much like those in NVIDIA's NV40 chip. These pixel pipe quads can be enabled or disabled as needed, so the X800 chip can scale from four pipes to eight, twelve, or sixteen, depending on chip yields and market demands.


    An overview of the Radeon X800 architecture. Source: ATI.


    The X800's four pixel pipe quads. Source: ATI.

    In addition to more pixel pipes, the Radeon X800 has two more vertex shader units than the Radeon 9800 XT, for a total of six. Between the extra units and higher clock speeds, ATI claims the X800 has double the vertex shader power of the Radeon 9800 XT. Although the X800 Pro will have one of its pixel pipeline quads disabled, it will keep all six vertex shader units.


    The Radeon X800's vertex shader units. Source: ATI.

    One more thing. You may recall that NVIDIA's recent GPUs, including the NV3x and NV4x chips, can render additional pixels per clock when doing certain types of rendering, like Z or stencil operations. This has led to the NV40 being called a "16x1 or 32x0" design. Turns out ATI's chips can render two Z/stencil ops per pipeline per clock, as well, so long as antialiasing is enabled.

  • Better performance at higher resolutions — Each pixel quad in the R420 has its own Z compression and hierarchical Z capability, including its own local cache. ATI has sized these caches so the Z-handling enhancements can operate at resolutions up to 1920x1080. (The R300's global Z cache was sized for resolutions up to 1600x1200, and the RV3xx-series chips' caches for lower resolutions still.) Also, if the screen resolution is too high for the R420's Z cache to accommodate everything, the chip will use the cache it has to apply the Z enhancements to a portion of the screen, rather than simply turning them off.


    Detail of a single four-pipeline quad. Source: ATI.

    ATI's engineers have also boosted the X800's peak Z compression ratio from 4:1 to 8:1, making the maximum possible peak compression (with 6X antialiasing and color compression) 48:1, but that's just a fancy number they like to throw around to impress the ladies.

  • A new memory interface — One of the Radeon 9800 XT's few weaknesses was its relatively slow memory. The GeForce FX 5950 Ultra had 950MHz memory, while the Radeon 9800 XT's RAM topped out at 730MHz. Part of the reason for this disparity, it turns out, was the chip's memory interface, which didn't like high clock speeds. ATI has addressed this problem by giving the X800 GPU a new memory interface capable of running at clock speeds up to 50% higher than the 9800 XT's.


    The Radeon X800's crossbar memory controller. Source: ATI.

    From 10,000 feet up, this new setup doesn't look dramatically different from the 9800 XT's; it's a crossbar type design with four 64-bit data channels talking over a switch to four independent memory controllers. There are some important differences, though, beyond higher clock speeds. First and foremost, this memory interface can make use of the swanky new GDDR3 memory type that ATI helped create. Also, ATI says this new memory interface is more efficient, and it offers extensive tuning options. If a given application (or application type) typically accesses memory according to certain patterns, ATI's driver team may be able to reconfigure the memory controllers to perform better in that type of app.

  • Longer shader programs — The X800's pixel shaders still have 24 bits of floating-point precision for each color channel (red, green, blue, and alpha), and they lack the branching and looping capabilities of the NV40's pixel shaders. They can, however, execute longer shader programs. The maximum program length rises from 160 instructions in the 9800 XT to 1,536 in the X800. The X800's revised pixel shaders also have some register enhancements, including more temporary registers (up from 12 to 32). We'll look at the question of pixel shaders and shader models in more detail below.

  • 3Dc normal map compression — Normal maps are simply grown-up bump maps. Like bump maps, they encode the relief of a surface. Unlike bump maps, which typically store a single height value per texel, they use a three-component coordinate system (X, Y, and Z) to describe which way the surface faces at each point. Game developers now commonly take high-polygon models and generate two things from them: a low-poly mesh and a normal map. When mated back together inside a graphics card, these elements combine to look like a high-poly model, but they're much easier to handle and render.

    Trouble is, like all textures, normal maps tend to chew up video memory, and they don't tolerate the artifacts produced by compression algorithms like DirectX Texture Compression (DXTC) very well. If a normal map becomes blocky, the perceived elevation of the surface becomes blocky and uneven, ruining the effect. ATI has tackled this problem by adapting the DXTC algorithm for alpha-channel compression to work on normal maps. Specifically, the DXT5 alpha compression algorithm is applied separately to the red and green channels, which store the X and Y coordinates, respectively. (Z values are discarded and computed later in the pixel shader.) This format is reasonably well suited to normal maps and offers a 4:1 compression ratio. Like any texture compression method, it should allow the use of higher-resolution textures in a given amount of texture memory.


    3Dc versus DXTC color compression. Source: ATI.

    The X800 GPU supports this method of normal map compression, dubbed 3Dc, in hardware. Both DirectX and OpenGL support 3Dc via extensions, and game developers should be able to take advantage of it with minimal effort.

  • Temporal antialiasing — This new antialiasing feature is actually a driver trick that exploits the programmability of ATI's antialiasing hardware. We'll discuss it in more detail in the antialiasing section of the review.
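The 3Dc scheme described in the list above can be sketched in a few lines of Python. This is an illustrative approximation, not ATI's encoder or the hardware's exact decoder. It rests on several assumptions. Each 4x4 block stores its X and Y channels as two DXT5-style alpha blocks, each holding two 8-bit endpoints plus sixteen 3-bit palette indices, for 8 bytes per channel. The endpoints are chosen naively as each channel's maximum and minimum. Palette interpolation uses integer division where hardware may round differently. Z is rebuilt from the unit-length constraint z = sqrt(1 - x² - y²). The helper names (`encode_3dc`, `decode_3dc`, and so on) are hypothetical.

```python
import math

def palette(a0, a1):
    """Eight-entry palette of a DXT5-style alpha block, the scheme 3Dc reuses per channel."""
    if a0 > a1:
        # Eight-value mode: both endpoints plus six evenly spaced interpolants.
        return [a0, a1] + [((8 - i) * a0 + (i - 1) * a1) // 7 for i in range(2, 8)]
    # Six-value mode: endpoints, four interpolants, then the constants 0 and 255.
    return [a0, a1] + [((6 - i) * a0 + (i - 1) * a1) // 5 for i in range(2, 6)] + [0, 255]

def encode_channel(values):
    """Compress 16 byte values (one channel of a 4x4 block) into 8 bytes."""
    assert len(values) == 16
    a0, a1 = max(values), min(values)  # naive endpoint choice (assumption): a0 >= a1
    pal = palette(a0, a1)
    bits = 0
    for texel, v in enumerate(values):
        # Store the 3-bit index of the palette entry closest to the original value.
        idx = min(range(8), key=lambda i: abs(pal[i] - v))
        bits |= idx << (3 * texel)
    return bytes([a0, a1]) + bits.to_bytes(6, "little")

def decode_channel(block):
    """Expand 8 bytes back into 16 byte values."""
    pal = palette(block[0], block[1])
    bits = int.from_bytes(block[2:8], "little")
    return [pal[(bits >> (3 * t)) & 0b111] for t in range(16)]

def to_byte(c):
    """Map a normal component in [-1, 1] to an unsigned byte."""
    return max(0, min(255, round((c * 0.5 + 0.5) * 255)))

def from_byte(b):
    """Inverse of to_byte, up to quantization error."""
    return b / 255 * 2 - 1

def encode_3dc(normals):
    """Compress 16 unit normals (x, y, z) into 16 bytes. Z is simply dropped."""
    xs = [to_byte(n[0]) for n in normals]
    ys = [to_byte(n[1]) for n in normals]
    return encode_channel(xs) + encode_channel(ys)

def decode_3dc(block):
    """Rebuild 16 normals, recovering Z the way a pixel shader would."""
    out = []
    for bx, by in zip(decode_channel(block[:8]), decode_channel(block[8:16])):
        x, y = from_byte(bx), from_byte(by)
        z = math.sqrt(max(0.0, 1.0 - x * x - y * y))  # assumes unit normals with Z >= 0
        out.append((x, y, z))
    return out

if __name__ == "__main__":
    # A gently sloping 4x4 patch of tangent-space normals.
    raw = [(0.1 * (i % 4), 0.05 * (i // 4), 1.0) for i in range(16)]
    normals = [tuple(c / math.sqrt(sum(k * k for k in n)) for c in n) for n in raw]
    block = encode_3dc(normals)
    error = max(abs(a - b) for n, d in zip(normals, decode_3dc(block)) for a, b in zip(n, d))
    print(f"{len(block)} bytes, max component error {error:.4f}")
```

Suppose the uncompressed normal map uses 32 bits per texel. Then a 4x4 block shrinks from 64 bytes to 16, which matches the 4:1 ratio ATI cites. Real encoders search for better endpoints than the plain min/max used here, which trades encoding time for lower error.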
Those are some of the more notable changes and tweaks ATI has made to the X800. We'll discuss some of them in more detail below, and we'll see the impact of the performance tweaks, in particular, in our benchmark results.
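The memory-interface figures in the list above can be turned into rough peak-bandwidth numbers: bus width in bytes times effective transfer rate. The sketch below makes two assumptions. First, the quoted 730MHz is the effective (DDR) data rate. Second, the four 64-bit channels form a 256-bit bus, as on the 9800 XT. The 50% case simply applies ATI's "up to 50% higher" clock claim to that figure. It is an upper bound for illustration, not a shipping X800 memory clock. The function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(channels: int, channel_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (10**9 bytes per second).

    Assumes effective_mhz is the effective data rate in millions of transfers
    per second, i.e. the doubled figure usually quoted for DDR-type memory.
    """
    bus_bytes = channels * channel_bits / 8  # four 64-bit channels -> 32 bytes per transfer
    return bus_bytes * effective_mhz * 1e6 / 1e9

if __name__ == "__main__":
    r9800xt = peak_bandwidth_gb_per_s(4, 64, 730)
    ceiling = peak_bandwidth_gb_per_s(4, 64, 730 * 1.5)  # ATI's "up to 50% higher" clock claim
    print(f"Radeon 9800 XT: {r9800xt:.2f} GB/s")
    print(f"Same bus with 50% faster memory: {ceiling:.2f} GB/s")
```

On these assumptions the 9800 XT peaks at about 23.4GB/s. A 50% clock increase on the same 256-bit bus would raise that to about 35GB/s.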