
ATI's Radeon X1900 series graphics cards


I've got a fever, and the only prescription is... more pixel shaders
— 8:02 AM on January 24, 2006

SO YOU'RE SITTING IN your executive suite just outside of Toronto, reading your favorite website's stellar review of the GeForce 7800 GTX 512. This new GPU from your main competitor is putting the hurt on your company's own high-end offering, the Radeon X1800 XT. In fact, the contest is more lopsided than the NFC championship game. Not good. What can you do about it?

If you're a bigwig at ATI, you've got quite a few arrows in your quiver. You have a next-generation GPU design just recently introduced, loaded with the latest bells and whistles. You've already made the conversion to a 90nm chip fabrication process, so your transistor budgets are ample. And you have a pretty good idea what it might take to win back the performance crown. You'd probably order up a new chip with a whole lot more power in order to meet the competition head on. You'd want to do something gaudy, something that would be sure to raise eyebrows and also pack a heckuva wallop.

48 pixel shaders ought to do it, don't you think?

That's exactly how many pixel shader units ATI has packed into its new GPU, the Radeon X1900. Yes, you read that right: for-tee eight.

If you really wanted to make a splash, perhaps you'd hook two of them together into a CrossFire configuration for a total of 96 pixel shaders churning out eye candy by the bucketload. That oughta show 'em. And then you'd price 'em nice and high, but make sure that cards were widely available on their launch day, with thousands of those puppies lined up at online retailers, ready to sell.

Sounds like a plan to me. In fact, that is very much ATI's plan for the Radeon X1900 series, and your favorite website has benchmarked the stuffing out of the high-end lineups from ATI and NVIDIA in order to see how these new entries fit into the picture. With 96 pixel shaders tearing through F.E.A.R. like Michael Moore through a loaf of cheese bread, does NVIDIA stand a chance?

R580 pours on the pixel shaders
The chip that powers ATI's new Radeon X1900 lineup has been known through most of its life until now by its codename, R580. The R580 is the successor to the R520 GPU that powers the Radeon X1800 series of products, and it's derived from the same basic chip architecture. The only truly major change in the R580 is the expansion of the number of pixel shader units on the GPU. Since R500-class products are very modular internally, ATI can strategically add resources with relative ease. In a nutshell, that's how we've arrived at this mind-boggling situation where there are 48 pixel shader units on a single GPU.


A block diagram of R580's shader core. Source: ATI.

The R580 block diagram above is very much a statement from ATI about where they think PC graphics is going. Like the mid-range Radeon X1600, the R580 is a radically asymmetrical design, heavy on the pixel shaders and general-purpose registers but relatively easy on some of the resources that have traditionally gone with them. ATI clearly believes that game developers will be making extensive use of the computational power of pixel shaders in future games, and they have spent the R580's transistor budget accordingly. To give you a better sense of what I'm talking about, have a look at the table below, which shows how the R580 stacks up against the competition in handling various stages of what used to be the pixel pipeline.

| GPU | Vertex shaders | Pixel shaders | Texture units | Render back-ends | Z compare | Max. threads |
|---|---|---|---|---|---|---|
| Radeon X1300 (RV515) | 4 | 4 | 4 | 4 | 4 | 128 |
| Radeon X1600 (RV530) | 5 | 12 | 4 | 4 | 8 | 128 |
| Radeon X1800 (R520) | 8 | 16 | 16 | 16 | 16 | 512 |
| Radeon X1900 (R580) | 8 | 48 | 16 | 16 | 16 | 512 |
| GeForce 6200 (NV44) | 3 | 4 | 4 | 2 | 2/4 | ? |
| GeForce 6600 (NV43) | 3 | 8 | 8 | 4 | 8/16 | ? |
| GeForce 6800 (NV41) | 5 | 12 | 12 | 8 | 8/16 | ? |
| GeForce 7800 GT (G70) | 7 | 20 | 20 | 16 | 16/32 | ? |
| GeForce 7800 GTX (G70) | 8 | 24 | 24 | 16 | 16/32 | ? |


R580 is slightly smaller than G70

The R580 offers no more power than the R520 in terms of its abilities to apply textures to pixels, do depth comparisons, or write pixels to a frame buffer. Also, like the R520, the R580 has eight vertex shader units—no changes there.
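To put a number on that asymmetry, here is a minimal Python sketch that divides pixel shader units by texture units for a few of the chips in the table. The unit counts are copied from the table; the `UNITS` dictionary and the `shader_to_texture_ratio` helper are names made up for this illustration, and the ratio says nothing about how much work each unit does per clock.

```python
# Per-clock unit counts copied from the table above:
# (pixel shaders, texture units). Names are illustrative only.
UNITS = {
    "Radeon X1600 (RV530)": (12, 4),
    "Radeon X1800 (R520)": (16, 16),
    "Radeon X1900 (R580)": (48, 16),
    "GeForce 7800 GTX (G70)": (24, 24),
}


def shader_to_texture_ratio(pixel_shaders, texture_units):
    """Pixel shader units available per texture unit."""
    return pixel_shaders / texture_units


if __name__ == "__main__":
    for gpu, (ps, tex) in UNITS.items():
        print(f"{gpu}: {shader_to_texture_ratio(ps, tex):.0f}:1")
```

By this rough measure the R580 and the Radeon X1600 both sit at three shaders per texture unit, while the R520 and the G70 are balanced one to one.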

So the basic story is: more pixel shaders. At 650MHz, the Radeon X1900 XTX should have at its disposal 31.2 billion pixel shader cycles per second. Pixel shader units and their computational capabilities vary greatly from one GPU architecture to the next, of course, but the GeForce 7800 GTX 512's 24 pixel shaders running at 550MHz will only reach 13.2 billion cycles per second. The R580's rich endowment of shaders is naturally a very good thing, because pixel shaders enable developers to employ all kinds of neat tricks like parallax occlusion mapping and high-dynamic-range lighting to make real-time graphics look more realistic. This bias toward shader power makes the R580 a very forward-looking design, and as is often the case with forward-looking designs, there may be a little bit of trouble with today's applications. Now, don't get me wrong here. A chip this capable can run just about any current game very well, but it may not reach its fullest potential while running them. Meanwhile, it has to compete with the likes of the GeForce 7800 GTX 512, which crams in more texturing capability per clock cycle than the R580, something to keep in mind when we turn toward the benchmark results.
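The arithmetic behind those figures is just units times clock speed. The sketch below reproduces it and applies the same multiplication to the texture units. It assumes each unit completes exactly one operation per clock, which is a simplification, and `CARDS`, `billions_per_second`, and `peak_rates` are names invented for this example.

```python
# Unit counts and clock speeds as quoted in the article.
CARDS = {
    "Radeon X1900 XTX": {"pixel_shaders": 48, "texture_units": 16, "clock_mhz": 650},
    "GeForce 7800 GTX 512": {"pixel_shaders": 24, "texture_units": 24, "clock_mhz": 550},
}


def billions_per_second(units, clock_mhz):
    """Peak unit-cycles per second, in billions (units x MHz / 1000)."""
    return units * clock_mhz / 1000


def peak_rates(card):
    """Return (shader cycles, texture fetches) in billions per second.

    Assumption: every unit does exactly one operation per clock.
    """
    return (
        billions_per_second(card["pixel_shaders"], card["clock_mhz"]),
        billions_per_second(card["texture_units"], card["clock_mhz"]),
    )


if __name__ == "__main__":
    for name, card in CARDS.items():
        shader, texture = peak_rates(card)
        print(f"{name}: {shader:.1f}B shader cycles/s, {texture:.1f}B texture fetches/s")
```

Under that assumption the X1900 XTX has well over twice the shader cycles, but the GTX 512 keeps the edge in peak texturing even after clock speed is factored in.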

This major upgrade in pixel shader power isn't the only tweak to the R580, though. ATI has also added a couple of minor wrinkles to boost performance in specific situations.

You may recall how we noted a while ago that design limitations prevent some GPUs from achieving optimal performance at really high resolutions. This limitation particularly affects GeForce 6 series GPUs; their performance drops off markedly at resolutions above two megapixels, like 1920x1440 or 2048x1536. NVIDIA isn't saying exactly what all is involved, but certain internal buffers or caches on the chip aren't sized to handle more than that. Recent ATI graphics chips, including the Radeon X1800 series, have a similar limitation: their Hierarchical Z buffer can only handle up to two megapixels of resolution. The performance impact isn't as stark as on the GeForce 6, but these Radeons can only use Hierarchical Z on a portion of the screen at very high resolutions; the rest of the screen must be rendered less efficiently. The R580 has a 50% larger on-chip buffer that raises that limit to three megapixels, so super-high-definition graphics should run more efficiently on Radeon X1900 cards.
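For a rough feel of what the larger buffer buys, the sketch below estimates how much of the screen Hierarchical Z can cover at a given resolution. It rests on two assumptions that ATI has not confirmed here: that coverage falls off in simple proportion to pixel count once the limit is exceeded, and that a "megapixel" means one million pixels. The function name `hiz_coverage` and its parameters are hypothetical.

```python
def hiz_coverage(width, height, limit_megapixels, pixels_per_megapixel=1_000_000):
    """Estimated fraction of the screen that Hierarchical Z can cover.

    Assumptions (not from ATI): coverage is proportional to pixel count,
    and a megapixel is one million pixels unless overridden.
    """
    pixels = width * height
    limit = limit_megapixels * pixels_per_megapixel
    return min(1.0, limit / pixels)


if __name__ == "__main__":
    for width, height in [(1600, 1200), (1920, 1440), (2048, 1536)]:
        old = hiz_coverage(width, height, limit_megapixels=2)
        new = hiz_coverage(width, height, limit_megapixels=3)
        print(f"{width}x{height}: 2MP limit {old:.0%}, 3MP limit {new:.0%}")
```

The exact percentages depend on those assumptions, so treat them as ballpark figures; the point is only that the three-megapixel limit covers all or nearly all of the screen at resolutions where the older chips fall short.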

The R580's other new trick is a simple but effective optimization. The GPU's texturing units are designed primarily to fetch traditional four-component textures—with red, green, blue, and alpha—from memory. However, some textures have only a single component, such as those used to store depth values for popular techniques like shadow maps. The R580's new Fetch4 capability allows it to fetch four adjacent values from a single-component texture at once, potentially raising texture sampling rates substantially.
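Here is a toy model of why that helps with something like shadow-map filtering. It is plain Python, not shader code, and it does not use any real ATI API: the `DepthTexture` class, its `fetch1` and `fetch4` methods, and `shadow_fraction` are all invented for illustration. It also assumes the four adjacent values form a 2x2 block.

```python
class DepthTexture:
    """A single-component texture that counts how many fetches it serves."""

    def __init__(self, rows):
        self.rows = rows      # rows[y][x] holds one depth value
        self.fetches = 0

    def fetch1(self, x, y):
        """Conventional fetch: one value per texture fetch."""
        self.fetches += 1
        return self.rows[y][x]

    def fetch4(self, x, y):
        """Fetch4-style fetch: a 2x2 block of values in one texture fetch."""
        self.fetches += 1
        return (
            self.rows[y][x], self.rows[y][x + 1],
            self.rows[y + 1][x], self.rows[y + 1][x + 1],
        )


def shadow_fraction(depths, receiver_depth):
    """Fraction of sampled depths that sit in front of the receiver."""
    return sum(1 for d in depths if d < receiver_depth) / len(depths)


if __name__ == "__main__":
    rows = [[0.2, 0.9], [0.4, 0.8]]

    slow = DepthTexture(rows)
    samples = [slow.fetch1(x, y) for y in (0, 1) for x in (0, 1)]
    print(shadow_fraction(samples, 0.5), "in", slow.fetches, "fetches")

    fast = DepthTexture(rows)
    print(shadow_fraction(fast.fetch4(0, 0), 0.5), "in", fast.fetches, "fetch")
```

Both paths produce the same shadow estimate; the Fetch4-style path simply gets there with a quarter of the texture fetches in this four-sample case.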

Neither of these optimizations will make the R520 look out-of-date overnight, but they should be nice to have.

So what is required to cram these tweaks plus 48 pixel shaders into a single chip? Lots of everything: a 90nm fab process, roughly 384 million transistors, and over 314 mm² of die space. The table below has some rough estimates; caveats to follow.

| GPU | Transistors (millions) | Process size (nm) | Approx. die size (sq. mm) |
|---|---|---|---|
| Radeon X1300 (RV515) | 105 | 90 | 95 |
| Radeon X1600 (RV530) | 157 | 90 | 132 |
| Radeon X1800 (R520) | 321 | 90 | 263 |
| Radeon X1900 (R580) | 384 | 90 | 314.5 |
| GeForce 6200 (NV44) | 75 | 110 | 110 |
| GeForce 6600 (NV43) | 143 | 110 | 156 |
| GeForce 6800 (NV41) | 190 | 110 | 210 |
| GeForce 7800 (G70) | 302 | 110 | 333 |

The transistor counts are from ATI and NVIDIA, with each company giving numbers for its own chips. Unfortunately, transistor counts are a source of consternation for both sides, because they seem to count transistors differently from one another. In other words, one should never, ever put those numbers together in a comparison table like the one above. That would be wrong.

Also, the die size numbers are based on my own plastic-ruler measurements of the chips. There are probably more accurate ways of getting this information, like random guessing. I've offered my numbers for whatever they're worth.

However you slice it, the R580 is a big chip. Even though it's fabbed using a 90nm process, it's nearly as large as NVIDIA's G70, and even with a disparity in counting methods, the R580 clearly must have many more transistors than the G70.
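If you want to see what the table's numbers imply for density, the quick sketch below divides the quoted transistor counts by the die sizes. All the caveats above apply in full: the two companies count transistors differently, and the die sizes come from a plastic ruler, so this is a rough indication at best. The `CHIPS` dictionary and the `density` helper are names made up for this example.

```python
# (transistors in millions, approximate die size in sq. mm), from the table.
CHIPS = {
    "Radeon X1800 (R520)": (321, 263),
    "Radeon X1900 (R580)": (384, 314.5),
    "GeForce 7800 (G70)": (302, 333),
}


def density(transistors_millions, die_mm2):
    """Millions of transistors per square millimeter of die."""
    return transistors_millions / die_mm2


if __name__ == "__main__":
    for chip, (transistors, area) in CHIPS.items():
        print(f"{chip}: {density(transistors, area):.2f}M transistors per sq. mm")
```

Taken at face value, the two 90nm ATI chips land at about the same density, and both come out denser than the 110nm G70, which is what you would expect from the smaller process.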