If you're a bigwig at ATI, you've got quite a few arrows in your quiver. You have a next-generation GPU design just recently introduced, loaded with the latest bells and whistles. You've already made the conversion to a 90nm chip fabrication process, so your transistor budgets are ample. And you have a pretty good idea what it might take to win back the performance crown. You'd probably order up a new chip with a whole lot more power in order to meet the competition head on. You'd want to do something gaudy, something that would be sure to raise eyebrows and also pack a heckuva wallop.
48 pixel shaders ought to do it, don't you think?
That's exactly how many pixel shader units ATI has packed into its new GPU, the Radeon X1900. Yes, you read that right: for-tee eight.
If you really wanted to make a splash, perhaps you'd hook two of them together into a CrossFire configuration for a total of 96 pixel shaders churning out eye candy by the bucketload. That oughta show 'em. And then you'd price 'em nice and high, but make sure that cards were widely available on their launch day, with thousands of those puppies lined up at online retailers, ready to sell.
Sounds like a plan to me. In fact, that is very much ATI's plan for the Radeon X1900 series, and your favorite website has benchmarked the stuffing out of the high-end lineups from ATI and NVIDIA in order to see how these new entries fit into the picture. With 96 pixel shaders tearing through F.E.A.R. like Michael Moore through a loaf of cheese bread, does NVIDIA stand a chance?
R580 pours on the pixel shaders
The chip that powers ATI's new Radeon X1900 lineup has been known through most of its life until now by its codename, R580. The R580 is the successor to the R520 GPU that powers the Radeon X1800 series of products, and it's derived from the same basic chip architecture. The only truly major change in the R580 is the expansion of the number of pixel shader units on the GPU. Since R500-class products are very modular internally, ATI can strategically add resources with relative ease. In a nutshell, that's how we've arrived at this mind-boggling situation where there are 48 pixel shader units on a single GPU.
The R580 block diagram above is very much a statement from ATI about where they think PC graphics is going. Like the mid-range Radeon X1600, the R580 is a radically asymmetrical design, heavy on the pixel shaders and general-purpose registers but relatively easy on some of the resources that have traditionally gone with them. ATI clearly believes that game developers will be making extensive use of the computational power of pixel shaders in future games, and they have spent the R580's transistor budget accordingly. To give you a better sense of what I'm talking about, have a look at the table below, which shows how the R580 stacks up against the competition in handling various stages of what used to be the pixel pipeline.
|GPU|Vertex shaders|Pixel shaders|Texture units|Render back-ends|Z compare units|Max. threads|
|---|---|---|---|---|---|---|
|Radeon X1300 (RV515)|4|4|4|4|4|128|
|Radeon X1600 (RV530)|5|12|4|4|8|128|
|Radeon X1800 (R520)|8|16|16|16|16|512|
|Radeon X1900 (R580)|8|48|16|16|16|512|
|GeForce 6200 (NV44)|3|4|4|2|2/4|?|
|GeForce 6600 (NV43)|3|8|8|4|8/16|?|
|GeForce 6800 (NV41)|5|12|12|8|8/16|?|
|GeForce 7800 GT (G70)|7|20|20|16|16/32|?|
|GeForce 7800 GTX (G70)|8|24|24|16|16/32|?|
So the basic story is: more pixel shaders. At 650MHz, the Radeon X1900 XTX should have at its disposal 31.2 billion pixel shader cycles per second. Pixel shader units and their computational capabilities vary greatly from one GPU architecture to the next, of course, but the GeForce 7800 GTX 512's 24 pixel shaders running at 550MHz will only reach 13.2 billion cycles per second. The R580's rich endowment of shaders is naturally a very good thing, because pixel shaders enable developers to employ all kinds of neat tricks like parallax occlusion mapping and high-dynamic-range lighting to make real-time graphics look more realistic. This bias toward shader power makes the R580 a very forward-looking design, and as is often the case with forward-looking designs, there may be a little bit of trouble with today's applications. Now, don't get me wrong here. A chip this capable can run just about any current game very well, but it may not reach its fullest potential while running them. Meanwhile, it has to compete with the likes of the GeForce 7800 GTX 512, which crams in more texturing capability per clock cycle than the R580. That's something to keep in mind when we turn toward the benchmark results.
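For those who like to check the math, the cycle counts quoted above are nothing more than unit count times clock speed. Here's a minimal sketch of that arithmetic. The `Gpu` class and its field names are made up for illustration, and I'm assuming the GeForce 7800 GTX 512 has the same 24 texture units listed for the 7800 GTX in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical container for the handful of specs used here."""
    name: str
    pixel_shaders: int
    texture_units: int  # assumption for the GTX 512: same as the 7800 GTX row
    core_clock_mhz: int

    def shader_cycles_per_second(self) -> int:
        # Peak figure only: units x clock. It says nothing about how much
        # work each shader unit can do in a single cycle.
        return self.pixel_shaders * self.core_clock_mhz * 1_000_000


x1900_xtx = Gpu("Radeon X1900 XTX", 48, 16, 650)
gtx_512 = Gpu("GeForce 7800 GTX 512", 24, 24, 550)

for gpu in (x1900_xtx, gtx_512):
    billions = gpu.shader_cycles_per_second() / 1e9
    print(f"{gpu.name}: {billions:.1f} billion shader cycles/s, "
          f"{gpu.texture_units} texture units per clock")
```

The same numbers show both sides of the trade-off: the Radeon has well over twice the raw shader cycles, while the GeForce has the edge in texture units per clock.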
This major upgrade in pixel shader power isn't the only tweak to the R580, though. ATI has also added a couple of minor wrinkles to boost performance in specific situations.
You may recall how we noted a while ago that design limitations prevent some GPUs from achieving optimal performance at really high resolutions. This limitation particularly affects GeForce 6 series GPUs; their performance drops off markedly at resolutions above two megapixels, like 1920x1440 or 2048x1536. NVIDIA isn't saying exactly what all is involved, but certain internal buffers or caches on the chip aren't sized to handle more than that. Recent ATI graphics chips, including the Radeon X1800 series, have a similar limitation: their Hierarchical Z buffer can only handle up to two megapixels of resolution. The performance impact isn't as stark as on the GeForce 6, but these Radeons can only use Hierarchical Z on a portion of the screen at very high resolutions; the rest of the screen must be rendered less efficiently. The R580 has a 50% larger on-chip buffer that raises that limit to three megapixels, so super-high-definition graphics should run more efficiently on Radeon X1900 cards.
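To put rough numbers on that limit, here's a back-of-the-envelope sketch of how much of the screen a Hierarchical Z buffer of a given size could cover at a few resolutions. Two big assumptions are baked in: that a "megapixel" here means 1024 x 1024 pixels, and that coverage is simply the buffer limit divided by the pixel count. ATI hasn't told us either of those things, and the function name is my own invention.

```python
MEGAPIXEL = 1024 * 1024  # assumption: binary megapixel, not 1,000,000


def hiz_coverage(width: int, height: int, limit_megapixels: int) -> float:
    """Fraction of the screen that fits within the Hierarchical Z limit.

    Simplified model: the buffer covers pixels up to its limit and the
    remainder of the screen goes without. Real hardware may differ.
    """
    pixels = width * height
    return min(1.0, limit_megapixels * MEGAPIXEL / pixels)


for width, height in [(1600, 1200), (1920, 1440), (2048, 1536)]:
    two_mp = hiz_coverage(width, height, 2)    # older Radeons, per the text
    three_mp = hiz_coverage(width, height, 3)  # R580, per the text
    print(f"{width}x{height}: {two_mp:.0%} covered at 2MP, "
          f"{three_mp:.0%} covered at 3MP")
```

Under those assumptions, 2048x1536 works out to exactly three megapixels, so the larger buffer would cover the whole screen where the two-megapixel limit would leave about a third of it uncovered.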
The R580's other new trick is a simple but effective optimization. The GPU's texturing units are designed primarily to fetch traditional four-component textures (with red, green, blue, and alpha) from memory. However, some textures have only a single component, such as those used to store depth values for popular techniques like shadow maps. The R580's new Fetch4 capability allows it to fetch four adjacent values from a single-component texture at once, potentially raising texture sampling rates substantially.
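The idea behind Fetch4 is easy to illustrate in software. The sketch below is a CPU-side toy, not ATI's implementation: `fetch4` and `pcf_shadow` are hypothetical names, the 2x2 footprint, texel ordering, and edge clamping are my assumptions, and the shadow test uses percentage-closer filtering purely as an example of a technique that wants four neighboring depth values at a time.

```python
def fetch4(texture, x, y):
    """Return four adjacent texels from a single-component texture.

    Assumption: the four values form a 2x2 block whose top-left corner
    is (x, y), clamped at the texture edges. The real hardware's texel
    ordering and addressing rules aren't covered here.
    """
    height = len(texture)
    width = len(texture[0])
    x1 = min(x + 1, width - 1)
    y1 = min(y + 1, height - 1)
    return (texture[y][x], texture[y][x1], texture[y1][x], texture[y1][x1])


def pcf_shadow(shadow_map, x, y, receiver_depth):
    """Fraction of the 2x2 neighborhood in which the receiver is lit."""
    samples = fetch4(shadow_map, x, y)  # one fetch instead of four
    lit = sum(1 for depth in samples if receiver_depth <= depth)
    return lit / len(samples)


# Tiny 2x2 shadow map of depth values, invented for the example.
shadow_map = [
    [0.5, 0.9],
    [0.3, 0.8],
]
print(pcf_shadow(shadow_map, 0, 0, receiver_depth=0.6))
```

Without a Fetch4-style operation, the same filter would need four separate texture fetches, each returning one useful value.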
Neither of these optimizations will make the R520 look out-of-date overnight, but they should be nice to have.
So what is required to cram these tweaks plus 48 pixel shaders into a single chip? Lots of everything: a 90nm fab process, roughly 384 million transistors, and over 314 mm² of die space. The table below has some rough estimates; caveats to follow.
|GPU|Transistors (millions)|Process size (nm)|Approx. die size (mm²)|
|---|---|---|---|
|Radeon X1300 (RV515)|105|90|95|
|Radeon X1600 (RV530)|157|90|132|
|Radeon X1800 (R520)|321|90|263|
|Radeon X1900 (R580)|384|90|314.5|
|GeForce 6200 (NV44)|75|110|110|
|GeForce 6600 (NV43)|143|110|156|
|GeForce 6800 (NV41)|190|110|210|
|GeForce 7800 (G70)|302|110|333|
The transistor counts are from ATI and NVIDIA, with each company giving numbers for its own chips. Unfortunately, transistor counts are a source of consternation for both sides, because they seem to count transistors differently from one another. In other words, one should never, ever put those numbers together in a comparison table like the one above. That would be wrong.
Also, the die size numbers are based on my own plastic-ruler measurements of the chips. There are probably more accurate ways of getting this information, like random guessing. I've offered my numbers for whatever they're worth.
However you slice it, the R580 is a big chip. Even though it's fabbed using a 90nm process, it's nearly as large as NVIDIA's G70, and even with a disparity in counting methods, the R580 clearly must have many more transistors than the G70.
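If you want to see that comparison as numbers, here's a quick sketch that divides the vendor-supplied transistor counts by my ruler-measured die sizes. All the caveats above apply in full: the two companies count transistors differently and the die areas are rough, so treat the output as a ballpark at best. The dictionary layout and function name are just for illustration.

```python
# (transistors in millions, approximate die area in mm^2), from the table.
chips = {
    "Radeon X1900 (R580)": (384, 314.5),
    "GeForce 7800 (G70)": (302, 333),
}


def density(transistors_millions: float, die_area_mm2: float) -> float:
    """Millions of transistors per square millimeter (rough estimate)."""
    return transistors_millions / die_area_mm2


for name, (transistors, area) in chips.items():
    print(f"{name}: {density(transistors, area):.2f}M transistors per mm^2")

r580_area = chips["Radeon X1900 (R580)"][1]
g70_area = chips["GeForce 7800 (G70)"][1]
print(f"R580 die area relative to G70: {r580_area / g70_area:.0%}")
```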