The R500 family
Because the R500 series decouples the different stages of the rendering pipeline from one another, ATI's chip designers have been able to allocate their transistor budgets in some interesting ways. We've seen this sort of thing before with NVIDIA's newer chips. For instance, the GeForce 6600 notably includes eight pixel shaders but only four "back end" units, or ROPs. Here's a quick look at how ATI has divvied up execution resources inside of the first three R500-series GPUs and how the competing NVIDIA chips compare.
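|GPU||Vertex shaders||Pixel shaders||Texture units||Render back-ends||Z compares per clock||Max threads|
|Radeon X1300 (RV515)||4||4||4||4||4||128|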
|Radeon X1600 (RV530)||5||12||4||4||8||128|
|Radeon X1800 (R520)||8||16||16||16||16||512|
|GeForce 6200 (NV44)||3||4||4||2||2/4||?|
|GeForce 6600 (NV43)||3||8||8||4||8/16||?|
|GeForce 6800 (NV41)||5||12||12||8||8/16||?|
|GeForce 7800 GT (G70)||7||20||20||16||16/32||?|
|GeForce 7800 GTX (G70)||8||24||24||16||16/32||?|
Comparing these very different architectures from ATI and NVIDIA isn't an exact science, of course, but I've tried to get it right. Because we don't know precisely how many threads each NVIDIA GPU can handle, I've left that question unanswered in the table. Also, the number of Z compares per clock gets tricky. As you can see, I've put a split into the NVIDIA GPUs' Z compare capabilities. That's because the NV4x and G70 GPUs can handle either one Z/stencil and one color pixel per clock, or they can do two Z/stencil pixels per clock. It's a tradeoff. The R500 series actually has a similar quirk; it can do two Z pixels per clock when multisampled antialiasing is enabled, twice as many as indicated in the table.
Anyhow, if we concentrate on the new ATI GPUs, we can see a couple of very straight-laced designs in the RV515 and the R520. Both have the same number of pixel shaders, texture units, render back-ends, and Z-compare units, all nicely balanced and politically correct, though not necessarily the optimal use of transistors.
Then we have the wild one of the bunch, the RV530. With five vertex shaders, 12 pixel shaders, eight Z-compare units, and only four texture units and render back ends, this design is wildly asymmetrical and generally disrespectful to its elders. The RV530 may also be a more optimal means of spending a transistor budget than the other two chips, although it is a pretty radical departure from the norm. The RV530 is something of a statement by ATI about where games are going, and the emphasis is decidedly on shaders. The X1600 cards based on this chip should be especially good at shader-heavy games with lots of complex geometry and shadowing, but they may be relatively weak in older games that rely on lots of texturing or just raw fill rate. RV530-based products may also suffer when subjected to the rigors of heavy anisotropic filtering or edge antialiasing, as in our test suite for this article. Whoops.
NVIDIA has gradually embraced more asymmetry between pixel shaders and other resources in its GPUs over time, culminating in the G70's use of 24 pixel shaders and 16 ROPs. As far as we know, NVIDIA's current architectures don't decouple everything ATI's new designs do. For instance, the number of texturing units is tied to the number of pixel shaders, and the render back-end and Z-compare unit are tied together inside what NVIDIA calls a ROP. Perhaps for this reason, or perhaps because it's just crazy, no GPU from NVIDIA has ever had a 3:1 ratio of pixel shaders to render back-ends like the RV530. I'll be curious to see whether the RV530 succeeds; it may be too far ahead of its time.
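To put a number on that asymmetry, here is a quick sketch that computes the ratio of pixel shaders to render back-ends for each chip. The unit counts are copied from the first table; the dictionary layout and the function name are my own, purely for illustration.

```python
# (pixel shaders, render back-ends or ROPs) per GPU, copied from the table.
# The names UNITS and shader_to_backend_ratio are illustrative, not vendor terms.
UNITS = {
    "RV515": (4, 4),
    "RV530": (12, 4),
    "R520": (16, 16),
    "NV44": (4, 2),
    "NV43": (8, 4),
    "NV41": (12, 8),
    "G70 (7800 GT)": (20, 16),
    "G70 (7800 GTX)": (24, 16),
}


def shader_to_backend_ratio(pixel_shaders, back_ends):
    """Return how many pixel shaders feed each render back-end."""
    return pixel_shaders / back_ends


ratios = {gpu: shader_to_backend_ratio(*units) for gpu, units in UNITS.items()}

# Print the chips from most to least shader-heavy.
for gpu, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gpu:16s} {ratio:.2f}:1")
```

By this measure the RV530 sits alone at 3:1, while the most lopsided NVIDIA chips in the table top out at 2:1.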
It's not in the table above, but I should mention the memory controllers on the three chips. R520 has the full 512-bit internal, 256-bit external memory controller with a ring topology. RV530's memory controller retains the ring topology but halves the bandwidth to 256 bits internally and 128 bits to memory. The humble RV515 doesn't have a ring-style memory controller, but it can support one, two, or four 32-bit memory channels.
Speaking of transistor budgets (which is a sure-fire way to pick up the chicks), let's have a look at where ATI's three new chips wound up.
|GPU||Transistors (millions)||Process size (nm)||Approx. die size (mm²)|
|Radeon X1300 (RV515)||105||90||95|
|Radeon X1600 (RV530)||157||90||132|
|Radeon X1800 (R520)||321||90||263|
|GeForce 6200 (NV44)||75||110||110|
|GeForce 6600 (NV43)||143||110||156|
|GeForce 6800 (NV41)||190||110||210|
|GeForce 7800 (G70)||302||110||333|
No table like this one would be possible without a heaping helping of disclaimers, so let's get started. First, it seems that ATI and NVIDIA estimate transistor counts using different methods, so the numbers here aren't necessarily entirely comparable. I didn't count them myself. Second, the die size measurements you see were produced by me, and are not entirely, 100% accurate. I used a plastic ruler, and I didn't measure fractions of a millimeter beyond the occasional .5 increment when really obvious. That said, these numbers should be more accurate than some others I've seen bandied about, so there you go.
Obviously, ATI's move to 90nm process tech gives it the ability to squeeze in more transistors per square millimeter, as the numbers suggest. Die size is related pretty directly to the cost of producing a chip, so ATI looks to have an advantage in each segment of the market. However, that advantage may be mitigated by less-than-stellar yields on these 90nm chips, so who knows?
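For a rough sense of that density gap, the sketch below divides each chip's transistor count by its measured die area. Treat the output as ballpark only: the die sizes are my ruler measurements, the two companies count transistors differently, and the variable names are just for illustration.

```python
# (transistors in millions, approximate die area in mm^2), copied from the table.
CHIPS = {
    "RV515": (105, 95),
    "RV530": (157, 132),
    "R520": (321, 263),
    "NV44": (75, 110),
    "NV43": (143, 156),
    "NV41": (190, 210),
    "G70": (302, 333),
}

# Millions of transistors per square millimeter for each chip.
density = {name: mtrans / area for name, (mtrans, area) in CHIPS.items()}

for name, value in density.items():
    print(f"{name:6s} {value:.2f} M transistors/mm^2")
```

Every 90nm ATI chip lands above 1.1 million transistors per square millimeter, while the 110nm NVIDIA parts all come in below 1.0.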
The real difficulty of handicapping things here comes in trying to sort out which of these new GPUs competes with which chip from NVIDIA. Truth be told, NVIDIA has already taped out multiple GPUs, presumably lower-end GeForce 7-series parts, to compete with ATI's new offerings. Only the G70 and the R520 are sure-fire direct competitors from the same generation. On that front, note that the G70 packs 24 pixel shader pipelines into only 302 million transistors, while the R520's sixteen pipes weigh in at 321 million transistors. That's quite the difference. NVIDIA says the G70 would translate to about 226 mm² at 90nm, were it to make the leap. The G70 hasn't made that leap, though.
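As a sanity check on those figures, here is a back-of-the-envelope sketch. It assumes an ideal optical shrink, where die area scales with the square of the feature size; real shrinks rarely scale that cleanly, so this is an approximation and not NVIDIA's method. It also divides whole-chip transistor counts by pixel pipeline count, which is crude because those transistors pay for far more than the pixel shaders. The function and variable names are hypothetical.

```python
def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Estimate die area after a process shrink, assuming perfect
    linear scaling in both dimensions (an idealization)."""
    return area_mm2 * (new_nm / old_nm) ** 2


# G70: roughly 333 mm^2 at 110nm, per my ruler measurement.
g70_at_90nm = ideal_shrink_area(333, 110, 90)

# Whole-chip transistors (millions) per pixel shader pipeline.
g70_per_pipe = 302 / 24
r520_per_pipe = 321 / 16

print(f"G70 ideal area at 90nm: {g70_at_90nm:.0f} mm^2")
print(f"G70:  {g70_per_pipe:.1f} M transistors per pixel pipe")
print(f"R520: {r520_per_pipe:.1f} M transistors per pixel pipe")
```

The idealized shrink comes out around 223 mm², in the same neighborhood as NVIDIA's 226 mm² figure, and the per-pipe numbers work out to roughly 12.6 million for the G70 versus about 20.1 million for the R520.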
In terms of transistor counts and basic capabilities, the RV530 falls somewhere between the NV41 chip powering the GeForce 6800 and the NV43 GPU on GeForce 6600 cards. Similarly, the RV515 falls between the NV43 and NV44, so direct competitors among NVIDIA's GPUs aren't easily identified. That leaves us to compare the actual cards based on these chips on the basis of price and performance, which is what we'll do next.