Two weeks ago today, Rich Heye, VP and GM of ATI’s desktop business unit, stood up in front of a room full of skeptical journalists and attempted to defuse those concerns. The problem with R520, he told us, was neither a snag caused by TSMC’s 90nm process tech nor a fundamental design issue. The chip was supposed to launch in June, he said, but was slowed by a circuit design bug: a simple problem, but one that was repeated throughout the chip. Once ATI identified the problem and fixed it, the R520 gained 150MHz in clock frequency. That may not sound like much if you’re thinking of CPUs, but in the world of 16-pipe-wide graphics processors, 150MHz can make the difference between competitive success and failure.
With those concerns addressed, ATI proceeded to unveil not just R520, but a whole family of Radeon graphics products ranging from roughly $79 to $549, based on three new GPUs that share a common heritage. It is one of the most sweeping product launches we’ve ever seen in graphics, intended to bring ATI up to feature parity with NVIDIA, and then some. Read on as we delve into the technology behind ATI’s new GPU lineup and then test its performance head to head against its direct competition.
Shader Model 3.0 and threading
Probably the most notable feature of the ATI R500-series graphics architecture is its support for the Shader Model 3.0 programming model. Shader Model 3.0 lives under the umbrella of Microsoft’s DirectX 9 API, the games and graphics programming interface for Windows. SM3.0 is the most advanced of several programming models built into DX9, and it’s the one used by all NVIDIA products in the GeForce 6 and 7 series product lines. SM3.0’s key features include a more CPU-like programming model for pixel shaders, the most powerful computational units on a GPU. SM3.0 pixel shaders must be able to execute longer programs, and they must support dynamic flow control within those programs: things such as looping and branching with conditionals. These pixel shaders must also do their computational work with 128-bit floating-point precision, or 32 bits of floating-point precision per color channel for red, green, blue, and alpha.
ATI’s new GPUs support all of these things, including 32-bit precision per color channel. That’s a step up in precision from ATI’s previous DirectX 9-class graphics processors, all of which did internal pixel shader calculations with 24 bits of FP precision. Unlike NVIDIA’s recent GPUs, the R500 series’ pixel shaders will not accept a “partial precision” hint from the programmer and cut back pixel shader precision to 16 bits per channel for some calculations in order to save on resources like internal chip register space. Instead, R500 GPUs do all pixel shader calculations with 32-bit precision. The GPU can, of course, store data in lower precision texture formats, but the internal pixel shader precision doesn’t change.
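To get a rough feel for what those bit counts mean, here is a minimal Python sketch that rounds a value to 16-bit and 32-bit floating point with the standard library’s `struct` module and reports the relative error. The helper names `round_to_format` and `relative_error` are made up for illustration, and the 24-bit format of ATI’s older chips has no standard-library equivalent, so it isn’t shown. This says nothing about how any GPU implements its math; it only illustrates the gap between the storage formats.

```python
import struct


def round_to_format(value, fmt):
    """Round a Python float to the nearest value storable in a struct format.

    "<e" is IEEE half precision (16 bits), "<f" is single precision (32 bits).
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value, fmt):
    """Relative rounding error introduced by storing value in the given format."""
    return abs(round_to_format(value, fmt) - value) / abs(value)


# Illustrative input only: 0.1 cannot be stored exactly in either format.
sample = 0.1
for name, fmt in (("FP16", "<e"), ("FP32", "<f")):
    stored = round_to_format(sample, fmt)
    print(f"{name}: stored as {stored!r}, relative error {relative_error(sample, fmt):.2e}")
```

The 16-bit error is several orders of magnitude larger than the 32-bit one, which is why a partial-precision hint is a tradeoff between register space and accuracy.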
The move from 24 to 32 bits of precision establishes a nice baseline for the future, but virtually no applications have yet layered enough rendering passes on top of one another to cause 24-bit precision to become a problem. As we have learned over the life of the GeForce 6 and 7 series, Shader Model 3.0’s true value doesn’t come in the form of visual improvements over the more common SM2.0, but in higher performance. The same sort of visual effects possible in SM3.0 are generally possible in 2.0, but they’re not always possible in real time. Through judicious use of longer shader programs with looping and dynamic branching, applications may use SM3.0 to enable new forms of eye candy.
In order to take best advantage of Shader Model 3.0’s capabilities, ATI has equipped the R520’s pixel shader engine with the scheduling and control logic capable of handling up to 512 parallel threads. Threads are important in modern graphics architectures because they’re used to keep a GPU’s many execution resources well fed; having lots of threads standing by for execution allows the GPU to mask latency by doing other work while waiting for something relatively slow to happen, such as a texture access. A Shader Model 3.0 GPU may have to wait for the result of a conditional (such as an if-else statement) to be returned before proceeding with execution of a dependent branch, so such latency masking becomes even more important with the addition of dynamic flow control.
Despite the ability for programs to branch and loop, though, Shader Model 3.0 GPUs retain their parallelism, and that pulls against the efficient execution of “branchy” code. Pixels (or more appropriately, fragments) are processed together in blocks, and when pixels in the same block take different forks of a branch, all pixels must traverse both forks of that branch. (The ones not affected by the active branch are simply masked out during processing.) In the R520, pixels are grouped into threads in four-by-four blocks, which ATI says is much finer grained threading than in competing GPUs.
To illustrate the improved efficiency of its architecture, ATI offers this example of a shadow mapping algorithm using an if-else statement:
The GPU with large thread sizes must send lots of pixels down both sides of a branch, and thus it doesn’t realize the benefits of dynamic flow control. A four-by-four block, like in the R520, is much more efficient by comparison.
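The effect of block size on branching can be sketched with a toy model. The Python below is not ATI’s example and makes several assumptions: blocks are square, the image is a hypothetical 64×64 grid, and the shadow edge is a simple diagonal. The function name `divergent_fraction` is invented for this sketch. It counts the share of pixels that land in a block where both sides of the if-else are taken, since those are the pixels that must traverse both forks.

```python
def divergent_fraction(mask, block):
    """Return the fraction of pixels in blocks where both branch outcomes occur."""
    height, width = len(mask), len(mask[0])
    divergent_pixels = 0
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            outcomes = {mask[y][x] for y in rows for x in cols}
            if len(outcomes) > 1:
                # Mixed block: every pixel in it walks both forks of the branch.
                divergent_pixels += len(rows) * len(cols)
    return divergent_pixels / (width * height)


# Hypothetical 64x64 image with a diagonal shadow edge: True means "in shadow".
SIZE = 64
shadow_mask = [[x > y for x in range(SIZE)] for y in range(SIZE)]

for block in (4, 16, 32):
    share = divergent_fraction(shadow_mask, block)
    print(f"{block}x{block} blocks: {share:.4f} of pixels run both forks")
```

In this toy case, the share of pixels stuck running both forks grows from about 6% with four-by-four blocks to 25% with 16-by-16 blocks and 50% with 32-by-32 blocks. Real scenes and real thread shapes will differ, but the trend is the point.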
Comparing the scheduling and threading capabilities of R520 to the competition isn’t the easiest thing to do, because NVIDIA hasn’t offered quite as much detail about exactly how its GPUs do things. NVIDIA seems to rely more on software, including the real-time compiler in its graphics drivers, to assist with scheduling. Yet NVIDIA says that its GPUs do indeed have logic on board for management of threads and branching, and that they keep “hundreds” of threads in flight in order to mask latency. As for threading granularity, some clever folks have tested the NV40 and G70 and concluded that the NV40 handles pixels in blocks of 4096, while the G70 uses blocks of 1024. NVIDIA claims those numbers aren’t entirely correct, however, pegging the NV40’s thread size at around 880 pixels and the G70’s at roughly a quarter of that. In fact, the G70’s pipeline structure was altered to allow for finer-grained flow control. The efficiency difference between groups of 200-some pixels and 16 pixels in ATI’s example above is pretty stark, but how often and how much this difference between the G70 and R520 will matter will depend on how developers use flow control in their shaders.
A new memory controller
Along with the reworked pixel shader engine, ATI has designed a brand-new memory controller for the R520. Past ATI graphics chips used a centralized memory controller located in the center of the chip, and this placement caused problems. Lots of wires had to come into the center of the chip, and high wire density caused hotspots. The R520’s new memory controller adopts a more distributed form of organization: a ring topology.
Communication is handled by means of a pair of 256-bit rings running in counter directions around the periphery of the chip. That makes for 512 bits of internal bandwidth, but external bandwidth to memory remains 256 bits, as in ATI’s past high-end GPUs. ATI expects this new design to be capable of much higher memory clock speeds. The new memory controller also offers more access granularity by dividing its external memory interface up into eight 32-bit channels rather than four 64-bit channels. This memory controller’s “smart” arbitration logic is programmable, and ATI expects to tune it on a per-application basis in the future using the Catalyst A.I. facility of its graphics drivers.
The R520 is tweaked in a number of other ways to help improve memory performance, as well. The texture, color, Z, and stencil caches are now fully associative, making them more effective than in the past. The chip’s Z compression has been revamped to achieve higher compression and maintain it longer. And the GPU’s hidden surface removal technique for its hierarchical Z buffer now uses floating-point math for more precise operation.
A grab-bag of other enhancements
There’s much more to R520 and its derivatives, of course, but time is running short, so I’m going to bust out the machine gun and spray you with bullet points to cover the rest of the highlights. Many of them are quick but notable. Among them:
- HDR support with AA: R520 and the gang can do filtering and blending of 16-bit per color channel floating-point texture formats, allowing for easier use of high-dynamic-range lighting effects, just like the GeForce 6 and 7 series. Unlike NVIDIA’s GPUs, the R500 series can also do multisampled antialiasing at the same time, complete with gamma-correct blends. ATI’s new chips also support a 10:10:10:2 integer buffer format that offers increased color precision at the expense of alpha transparency precision. This is a smaller 32-bit format, but it should still be suitable for high-dynamic-range rendering.
Interestingly enough, ATI claims the Avivo display engine in the R500 series can do tone mapping, converting HDR images to display color depths with no performance penalty. Unfortunately, DirectX 9 doesn’t currently offer a way to expose this capability to developers, and it’s not in the drivers yet, but apparently the hardware can do it.
- Better edge and texture antialiasing: ATI’s new “Adaptive AA” feature mirrors NVIDIA’s supersampled transparency AA, taking care of cases where alpha transparency creates jaggies that aren’t on polygon edges. This one is enabled via a simple checkbox in the driver.
Another driver checkbox addresses a long-standing complaint of mine: angle-dependent anisotropic filtering. The R500 series includes a new, higher quality anisotropic filtering method that’s not angle dependent. Most newer GPUs haven’t included the ability to turn off angle-dependent aniso, and I’m pleased to see that ATI has made it happen.
- Avivo video and display engine: Avivo is a branding concept that encompasses the whole of the “video pipeline,” from the capture stage using ATI’s Theater products to the decoding, playback, and display output stages that happen on a GPU. Avivo still uses pixel shaders rather than a dedicated video processing unit, but ATI is bringing over algorithms and capabilities from its Xilleon line of consumer electronics products in order to improve the Radeon’s video and display features. Avivo features end-to-end 10-bit integer color precision per channel for both analog and digital displays, and ATI proudly declares that Avivo will have the best H.264 acceleration around. We will have to cover Avivo in more depth in a future face-off against NVIDIA’s PureVideo.
I’ve left out more details than I care to admit, but it’s time to turn our attention to the composition of the R500 family.
The R500 family
Because the R500 series decouples the different stages of the rendering pipeline from one another, ATI’s chip designers have been able to allocate their transistor budgets in some interesting ways. We’ve seen this sort of thing before with NVIDIA’s newer chips. For instance, the GeForce 6600 notably includes eight pixel shaders but only four “back end” units, or ROPs. Here’s a quick look at how ATI has divvied up execution resources inside of the first three R500-series GPUs and how the competing NVIDIA chips compare.
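|GPU||Vertex shaders||Pixel shaders||Texture units||Render back-ends||Z compare units||Max threads|
|Radeon X1300 (RV515)||4||4||4||4||4||128|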
|Radeon X1600 (RV530)||5||12||4||4||8||128|
|Radeon X1800 (R520)||8||16||16||16||16||512|
|GeForce 6200 (NV44)||3||4||4||2||2/4||?|
|GeForce 6600 (NV43)||3||8||8||4||8/16||?|
|GeForce 6800 (NV41)||5||12||12||8||8/16||?|
|GeForce 7800 GT (G70)||7||20||20||16||16/32||?|
|GeForce 7800 GTX (G70)||8||24||24||16||16/32||?|
Comparing these very different architectures from ATI and NVIDIA isn’t an exact science, of course, but I’ve tried to get it right. Because we don’t know precisely how many threads each NVIDIA GPU can handle, I’ve left that question unanswered in the table. Also, the number of Z compares per clock gets tricky. As you can see, I’ve put a split into the NVIDIA GPUs’ Z compare capabilities. That’s because the NV4x and G70 GPUs can handle either one Z/stencil and one color pixel per clock, or they can do two Z/stencil pixels per clock. It’s a tradeoff. The R500 series actually has a similar quirk; it can do two Z pixels per clock when multisampled antialiasing is enabled, twice as many as indicated in the table.
Anyhow, if we concentrate on the new ATI GPUs, we can see a couple of very straight-laced designs in the RV515 and the R520. Both have the same number of pixel shaders, texture units, render back-ends, and Z-compare units, all nicely balanced and politically correct, though not necessarily the optimal use of transistors.
Then we have the wild one of the bunch, the RV530. With five vertex shaders, 12 pixel shaders, eight Z-compare units, and only four texture units and render back ends, this design is wildly asymmetrical and generally disrespectful to its elders. The RV530 may also be a more optimal means of spending a transistor budget than the other two chips, although it is a pretty radical departure from the norm. The RV530 is something of a statement by ATI about where games are going, and the emphasis is decidedly on shaders. The X1600 cards based on this chip should be especially good at shader-heavy games with lots of complex geometry and shadowing, but they may be relatively weak in older games that rely on lots of texturing or just raw fill rate. RV530-based products may also suffer when subjected to the rigors of heavy anisotropic filtering or edge antialiasing, as in our test suite for this article. Whoops.
NVIDIA has gradually embraced more asymmetry between pixel shaders and other resources in its GPUs over time, culminating in the G70’s use of 24 pixel shaders and 16 ROPs. As far as we know, NVIDIA’s current architectures don’t decouple everything ATI’s new designs do. For instance, the number of texturing units is tied to the number of pixel shaders, and the render back-end and Z-compare unit are tied together inside what NVIDIA calls a ROP. Perhaps for this reason, or perhaps because it’s just crazy, no GPU from NVIDIA has ever had a 3:1 ratio of pixel shaders to render back-ends like the RV530. I’ll be curious to see whether the RV530 succeeds; it may be too far ahead of its time.
It’s not in the table above, but I should mention the memory controllers on the three chips. R520 has the full 512-bit internal, 256-bit external memory controller with a ring topology. RV530’s memory controller retains the ring topology but halves the bandwidth to 256 bits internally and 128 bits to memory. The humble RV515 doesn’t have a ring-style memory controller, but it can support one, two, or four 32-bit memory channels.
Speaking of transistor budgets (which is a sure-fire way to pick up the chicks), let’s have a look at where ATI’s three new chips wound up.
|GPU||Transistors (millions)||Process (nm)||Die size (mm²)|
|Radeon X1300 (RV515)||105||90||95|
|Radeon X1600 (RV530)||157||90||132|
|Radeon X1800 (R520)||321||90||263|
|GeForce 6200 (NV44)||75||110||110|
|GeForce 6600 (NV43)||143||110||156|
|GeForce 6800 (NV41)||190||110||210|
|GeForce 7800 (G70)||302||110||333|
No table like this one would be possible without a heaping helping of disclaimers, so let’s get started. First, it seems that ATI and NVIDIA estimate transistor counts using different methods, so the numbers here aren’t necessarily entirely comparable. I didn’t count them myself. Second, the die size measurements you see were produced by me, and are not entirely, 100% accurate. I used a plastic ruler, and I didn’t measure fractions of a millimeter beyond the occasional .5 increment when really obvious. That said, these numbers should be more accurate than some others I’ve seen bandied about, so there you go.
Obviously, ATI’s move to 90nm process tech gives it the ability to squeeze in more transistors per square millimeter, as the numbers suggest. Die size is related pretty directly to the cost of producing a chip, so ATI looks to have an advantage in each segment of the market. However, that advantage may be mitigated by less-than-stellar yields on these 90nm chips, so who knows?
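For the curious, the density math is simple enough to sketch in a few lines of Python. The figures come straight from the table above, so they inherit all of its caveats: the transistor counts are vendor estimates made with differing methods, and the die sizes are my plastic-ruler measurements. The `density` helper is just an illustrative name.

```python
# (transistors in millions, die size in square millimeters), from the table above.
chips = {
    "Radeon X1300 (RV515)": (105, 95),
    "Radeon X1600 (RV530)": (157, 132),
    "Radeon X1800 (R520)": (321, 263),
    "GeForce 6200 (NV44)": (75, 110),
    "GeForce 6600 (NV43)": (143, 156),
    "GeForce 6800 (NV41)": (190, 210),
    "GeForce 7800 (G70)": (302, 333),
}


def density(transistors_millions, die_mm2):
    """Millions of transistors per square millimeter of die area."""
    return transistors_millions / die_mm2


for name, (transistors, area) in chips.items():
    print(f"{name}: {density(transistors, area):.2f} M transistors/mm^2")
```

By this rough reckoning, the 90nm Radeons land somewhere above 1.1 million transistors per square millimeter, while the 110nm GeForces sit at roughly 0.9 or below.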
The real difficulty of handicapping things here comes in trying to sort out which of these new GPUs competes with which chip from NVIDIA. Truth be told, NVIDIA has already taped out multiple GPUs, presumably lower-end GeForce 7-series parts, to compete with ATI’s new offerings. Only the G70 and the R520 are sure-fire direct competitors from the same generation. On that front, note that the G70 packs 24 pixel shader pipelines into only 302 million transistors, while the R520’s sixteen pipes weigh in at 321 million transistors. That’s quite the difference. NVIDIA says the G70 would translate to about 226 mm² at 90nm, were it to make the leap. The G70 hasn’t made that leap, though.
In terms of transistor counts and basic capabilities, the RV530 falls somewhere between the NV41 chip powering the GeForce 6800 and the NV43 GPU on GeForce 6600 cards. Similarly, the RV515 falls between the NV43 and NV44, so direct competitors among NVIDIA’s GPUs aren’t easily identified. That leaves us to compare the actual cards based on these chips on the basis of price and performance, which is what we’ll do next.
Cards in the X1000 series
Like I said, ATI is unleashing a whole range of cards based on these three chips. Here’s a look at all of ATI’s proposed models, from $79 on up.
|Model||GPU||Core clock (MHz)||Memory clock (MHz)||Memory (MB)||Price|
|Radeon X1300 HyperMemory||RV515||450||1000||32(128)||$79|
|Radeon X1300 Pro||RV515||600||800||256||$149|
|Radeon X1600 Pro||RV530||500||780||128||$149|
|Radeon X1600 Pro||RV530||500||780||256||$199|
|Radeon X1600 XT||RV530||590||1380||128||$199|
|Radeon X1600 XT||RV530||590||1380||256||$249|
|Radeon X1800 XL||R520||500||1000||256||$449|
|Radeon X1800 XT||R520||625||1500||256||$499|
|Radeon X1800 XT||R520||625||1500||512||$549|
That’s pretty much a sweep, save for a gaping hole between $249 and $449 that could use some attention. Most of ATI’s current Radeon lineup should go the way of the dodo once these cards filter out into the market in sufficient volume. Surely something will come along to replace the current Radeon X800 XL at $299-349, but we don’t yet know what that will be.
ATI supplied us with four cards from its new lineup for review. Since we’ve only had a week and a half with the cards, we had to leave out the Radeon X1300 Pro. Our test suite for the other cards wouldn’t be appropriate for the X1300 Pro or its competitors. We’ll address it in a future review.
Further up the ladder, we have the Radeon X1600 XT. This card packs 256MB of RAM and 12 pixel pipes running at 590MHz for a price of $249. LCD fanatics, please note the dual DVI outputs. ATI says the X1600 XT is a competitor for the GeForce 6600 GT, but that’s not quite right for this card. The more direct competitor is probably the GeForce 6800, a 12-pipe card that’s selling for $239 or so, sometimes less. I expect that ATI’s board partners will discount somewhat from the list price, so the comparison seems apt.
In the meaty part of the product lineup is the Radeon X1800 XL. This R520-based beast packs 256MB of memory and a 500MHz GPU clock. ATI’s suggested list price on this puppy is $449. That would probably make its most direct competition the GeForce 7800 GT, although some 7800 GT cards can presently be found for under $400.
Finally, there’s the Radeon X1800 XT. The XT rides on the same basic board design as the XL, but with a larger dual-slot cooler. The 512MB version, pictured above, will list at $549. That puts it into the high end of GeForce 7800 GTX territory. Word has it that NVIDIA is preparing a 512MB version of the 7800 GTX to go toe to toe with the Radeon X1800 XT, but we’ll have to compare the 512MB Radeon X1800 XT to the 256MB GeForce 7800 GTX for now.
You may be wondering when these products will actually be available for purchase. All I can tell you right now is what ATI has told me, so let me hit you with the actual slide from the actual ATI presentation, with actual dates. Actually.
Three of the cards, including the X1800 XL, should be available today, if all goes according to plan. The rest are slated to arrive in November. ATI hasn’t given us any projections about how broad or deep availability of these new cards will be, so we will have to wait and see about that. The red team’s track record on product availability has been rocky in the past year or so, but perhaps this round will be different.
Doubling up on Radeon X1000 cards via CrossFire
Yes, the Radeon X1000 series will get its own version of CrossFire. We don’t have too many details on it yet, but we do know several things. High-end X1000 CrossFire rigs will still require dedicated master cards and external DVI cables in order to work, but will support much higher display resolutions, up to 2048×1536 at 70Hz or better. The dual-link DVI output capabilities of the Avivo display engine will team up with a new, more powerful compositing chip on the master cards to make this happen. ATI says this compositing engine will also be fast enough to deliver CrossFire’s SuperAA modes with no performance hit.
Meanwhile, the X1300 series may not even require master cards, because it will use PCI Express to pass data between two graphics cards, as can NVIDIA’s low-end GPUs.
We don’t yet have a definite ETA on X1800 or X1600 master cards, unfortunately, but ATI says they’re on the way.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test systems were configured like so:
|Processor||Athlon 64 X2 4800+ 2.4GHz|
|System bus||1GHz HyperTransport|
|Motherboard||Asus A8N-SLI Deluxe||ATI CrossFire reference board|
|North bridge||nForce4 SLI||Radeon Xpress 200P CrossFire Edition|
|Chipset drivers||SMBus driver 4.45, SATA IDE driver 5.34||SMBus driver 5.10.1000.5, SATA IDE driver 188.8.131.52|
|Memory size||1GB (2 DIMMs)|
|Memory type||OCZ EL PC3200 DDR SDRAM at 400MHz|
|CAS latency (CL)||2|
|RAS to CAS delay (tRCD)||2|
|RAS precharge (tRP)||2|
|Cycle time (tRAS)||8|
|Hard drive||Maxtor DiamondMax 10 250GB SATA 150|
|Audio||with Realtek 184.108.40.20600 drivers||with Realtek 5.10.00.5152 drivers|
|Networking||NVIDIA Ethernet driver 4.82, Marvell Yukon 220.127.116.11 drivers, VIA Velocity v24 drivers||Marvell Yukon 18.104.22.168 drivers, VIA Velocity v24 drivers|
|Graphics||XFX GeForce 6800 256MB PCI-E with ForceWare 78.03 drivers||Radeon X1600 XT PCI-E with Catalyst 8.173.1-05921a-026915E drivers|
|GeForce 6800 Ultra 256MB PCI-E with ForceWare 78.03 drivers||Radeon X850 XT PCI-E with Catalyst 8.162.1-050811a-026057E drivers|
|Dual GeForce 6800 Ultra 256MB PCI-E with ForceWare 78.03 drivers||Dual Radeon X850 XT PCI-E with Catalyst 8.162.1-050811a-026057E drivers|
|XFX GeForce 7800 GT 256MB PCI-E with ForceWare 78.03 drivers||Radeon X1800 XL PCI-E with Catalyst 8.173.1-05921a-026915E drivers|
|Dual XFX GeForce 7800 GT 256MB PCI-E with ForceWare 78.03 drivers|
|MSI GeForce 7800 GTX 256MB PCI-E with ForceWare 78.03 drivers||Radeon X1800 XT PCI-E with Catalyst 8.173.1-05921a-026915E drivers|
|Dual MSI GeForce 7800 GTX 256MB PCI-E with ForceWare 78.03 drivers|
|OS||Windows XP Professional (32-bit)|
|OS updates||Service Pack 2|
Thanks to OCZ for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, OCZ’s RAM is definitely worth considering.
Unless otherwise specified, the image quality settings for both ATI and NVIDIA graphics cards were left at the control panel defaults.
The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- Doom 3 with our Delta Labs and trdemo2 demos
- Far Cry 1.33 with tr1-volcano and tr3-pier demos
- Splinter Cell: Chaos Theory 1.04 with trpenthouse demo
- The Chronicles of Riddick: Escape from Butcher Bay 1.1 with trdemo4
- Battlefield 2 1.02
- Guild Wars
- FEAR demo
- Half-Life 2
- FutureMark 3DMark05 Build 120
- FRAPS 2.6.4
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
We’ll begin with a look at the basic math on pixel-pushing power that helps determine overall performance. These numbers become less important over time as asymmetric designs like the Radeon X1600 XT become more common, but they’re still useful to consider.
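|Card||Core clock (MHz)||Pixels/clock||Peak fill rate (Mpixels/s)||Textures/clock||Peak fill rate (Mtexels/s)||Memory clock (MHz)||Memory bus width (bits)||Peak memory bandwidth (GB/s)|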
|Radeon X1600 XT||590||4||2360||4||2360||1380||128||22.1|
|GeForce 6600 GT||500||4||2000||8||4000||1000||128||16.0|
|GeForce 6800 GT||350||16||5600||16||5600||1000||256||32.0|
|Radeon X800 XL||400||16||6400||16||6400||980||256||31.4|
|GeForce 6800 Ultra||425||16||6800||16||6800||1100||256||35.2|
|GeForce 7800 GT||400||16||6400||20||8000||1000||256||32.0|
|Radeon X1800 XL||500||16||8000||16||8000||1000||256||32.0|
|Radeon X850 XT||520||16||8320||16||8320||1120||256||35.8|
|Radeon X850 XT Platinum Edition||540||16||8640||16||8640||1180||256||37.8|
|XFX GeForce 7800 GT||450||16||7200||20||9000||1050||256||33.6|
|Radeon X1800 XT||625||16||10000||16||10000||1500||256||48.0|
|GeForce 7800 GTX||430||16||6880||24||10320||1200||256||38.4|
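The numbers in the table fall out of some very basic arithmetic, sketched here in Python. The function names are my own, and the sketch assumes the memory clock column is the effective (data-rate) clock and that bandwidth is counted in decimal gigabytes, both of which are consistent with the table’s figures.

```python
def peak_fill_rate(core_mhz, units_per_clock):
    """Peak fill rate in Mpixels/s (or Mtexels/s): clock times units per clock."""
    return core_mhz * units_per_clock


def peak_memory_bandwidth(effective_mem_mhz, bus_width_bits):
    """Peak memory bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mem_mhz * bytes_per_transfer / 1000


# Radeon X1800 XT: 625MHz core, 16 pixels and 16 textures per clock,
# 1500MHz effective memory clock on a 256-bit bus.
print(peak_fill_rate(625, 16), "Mpixels/s")
print(peak_memory_bandwidth(1500, 256), "GB/s")

# GeForce 7800 GTX: 430MHz core, 16 pixels but 24 textures per clock.
print(peak_fill_rate(430, 16), "Mpixels/s")
print(peak_fill_rate(430, 24), "Mtexels/s")
```

These are theoretical peaks only; they say nothing about how close a given card gets to them in practice.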
The Radeon X1600 XT has more pixel shaders running at a much higher clock speed than the GeForce 6800 and a similar amount of memory bandwidth, but it has just over half the pixel and texel fill rate of the 6800. As I’ve noted, the X1600 XT’s relative performance should vary quite a bit depending on the application. The X1600 XT’s challenge is compounded by the fact that it’s competing in our tests against an XFX card that’s “overclocked in the box” at 350MHz core and 900MHz memory clock speeds. The clock speeds on consumer products based on NVIDIA chips are often faster than NVIDIA’s reference spec, and that’s the case here.
Meanwhile, the Radeon X1800 XL matches up well against the stock GeForce 7800 GT, with exactly the same texel fill rate and memory bandwidth. Unfortunately for ATI, our 7800 GT test subjects are also “overclocked” cards from XFX that run at slightly higher clock speeds, giving the 7800 GT cards a bit of an edge.
The most interesting matchup may be at the high end, where the Radeon X1800 XT squares off against the GeForce 7800 GTX. Here, ATI counters the brute force of GeForce 7800 GTX’s 24 pixel shaders and texture units with higher clock speeds. The Radeon X1800 XT’s 16 pipes run nearly 200MHz faster, yielding a theoretical peak texel fill rate nearly the same as the 7800 GTX. ATI’s top-end card leads by a wide margin in terms of peak memory bandwidth, nearly 10GB/s faster than the 7800 GTX.
So are the cards faithful to these numbers? Let’s see.
Few of the cards approach their theoretical peak limits in the single-textured fill rate test, save for the X1600 XT, whose limits are relatively low. With multiple textures per pixel, the cards get much closer to their peak capabilities. The Radeon X1800 XL magically pushes a few ticks past its theoretical peak throughput, but it still can’t catch the juiced XFX 7800 GT. The X1800 XT matches the GeForce 7800 GTX almost exactly, slightly behind it at lower resolutions, but slightly above it at 1600×1200.
Overall, it’s very close. That sets the stage for our game benchmarks, which are up next.
We’ve conducted our testing almost exclusively with 4X antialiasing and a high degree of anisotropic filtering. We generally used in-game controls when possible in order to invoke AA and aniso. In the case of Doom 3, we used the game’s “High Quality” mode in combination with 4X AA.
Our Delta Labs demo is typical of most of this game: running around in the Mars base, shooting baddies. The imaginatively named “trdemo2” takes place in the game’s Hell level, where the environment is a little more varied and shader effects seem to be more abundant.
It’s a disappointing start for the new ATI cards in Doom 3. OpenGL has long been an Achilles’ heel for Radeons, and the problem persists with the X1000 series.
Next up is Far Cry, which takes advantage of Shader Model 3.0 to improve performance. The game also has a path for ATI’s Shader Model 2.0b. Our first demo takes place in the jungle with lots of dense vegetation and even denser mercenaries. All of the quality settings in the game’s setup menu were cranked to the max. The second demo is relatively simpler in terms of geometry, but includes lots of “heat shimmer” effects.
The R500-series GPUs show us a little something in this first demo, outrunning the NVIDIA cards decidedly. The Radeon X1800 XT stomps the GeForce 7800 GTX and is nearly as fast as two 7800 GT cards in SLI. The X1800 XL outruns the GeForce 7800 GTX, too. Ow. Meanwhile, the Radeon X1600 XT still looks overmatched against the GeForce 6800.
Things balance out a little in the Volcano level, with an intriguing separation of sorts. The uber-high-end X1800 XT outperforms the GeForce 7800 GTX, but the still-pretty-high-end X1800 XL is matched almost evenly against the 7800 GT. The not-quite-as-high-end X1600 XT, though, can’t quite run with the GeForce 6800.
The Chronicles of Riddick: Escape from Butcher Bay
This OpenGL-based game has a Shader Model 3.0-ish mode for NVIDIA cards, but that mode exacts a big performance penalty, so I ran all cards with the SM2.0 path.
Another OpenGL game, another beating for the Radeon X1000 series. The fancy-pants Radeon X1800 XT gets a soul-compressing wedgie from the GeForce 6800 Ultra.
Splinter Cell: Chaos Theory
We’re using the 1.04 version of Splinter Cell: Chaos Theory for testing, and that gives us some useful tools for comparison. This new revision of the game includes support for Shader Model 2.0, the DirectX feature set used by Radeon X850 XT cards. The game also includes a Shader Model 3.0 code path that works on the newer NVIDIA and ATI GPUs.
In our first test, we enabled the game’s parallax mapping and soft shadowing effects. In the second, we’ve also turned on high-dynamic-range lighting and tone mapping, for some additional eye candy. Due to limitations in the game engine (and in NVIDIA’s hardware), we can’t use HDR lighting in combination with antialiasing, so the second test was run without edge AA.
The new Radeon GPUs handle themselves well in this Shader Model 3.0 game. Once more, the X1800 XT puts the hurt on the GeForce 7800 GTX, while the X1800 XL is more evenly matched with the 7800 GT. Here, the X1600 XT even puts up a good fight. Notice that when we turn off antialiasing and crank up high dynamic range lighting, the GeForce 7800 cards get relatively stronger. It’s not by much, but it’s enough to push the 7800 GT past the X1800 XL.
We tested the next few games using FRAPS and playing through a level of the game manually. For these games, we played through five 90-second gaming sessions per config and captured average and low frame rates for each. The average frames per second number is the mean of the average frame rates from all five sessions. We also chose to report the median of the low frame rates from all five sessions, in order to rule out outliers. We found that these methods gave us reasonably consistent results, but they are not quite as solid and repeatable as other benchmarks.
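The number crunching for these FRAPS runs can be sketched in Python like so. The session figures below are made up purely for illustration, and `summarize_sessions` is a hypothetical helper name, not a tool we actually used.

```python
from statistics import mean, median


def summarize_sessions(sessions):
    """Summarize FRAPS sessions given as (average_fps, low_fps) pairs.

    Returns the mean of the per-session averages and the median of the
    per-session lows, so a single freak dip can't drag the low score down.
    """
    averages = [average for average, _ in sessions]
    lows = [low for _, low in sessions]
    return mean(averages), median(lows)


# Hypothetical numbers for five 90-second sessions on one card.
example_sessions = [(60.0, 30), (62.0, 31), (58.0, 12), (61.0, 29), (59.0, 32)]
average_fps, low_fps = summarize_sessions(example_sessions)
print(f"average: {average_fps:.1f} fps, median low: {low_fps} fps")
```

Note how the one session that dipped to 12 frames per second doesn’t set the reported low; the median lands on 30 instead.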
Yet again, the Radeon X1800 XT outruns the GeForce 7800 GTX, but among cards that cost less than $500, the NVIDIA products are somewhat faster.
The F.E.A.R. demo looks purty, but that comes at the cost of frame rates. We actually had to drop back to 1024×768 resolution in order to hit playable frame rates, although we did have all of the image quality settings in the game cranked.
The new Radeons deliver consistently higher average frame rates than the green team’s competing cards in this new game, but the median low scores deflate the ATI cards a bit. The minimum frame rates on the NVIDIA cards are very similar, all told.
I played through an awful lot of battles with Plague Devourers outside of Old Ascalon in order to bring you these scores. The charm does wear off after your 50th battle with a big scorpion-looking thing, slightly diluting the crack-like addictiveness of this online RPG.
Chalk this one up as a win for the GeForce cards. Here, the X1600 XT receives a brutal noogie at the hands of the GeForce 6800. This game looks nice, but you’re not exactly knee-deep in advanced pixel shader effects when playing it.
Here’s one test where we didn’t feel the need to crank up 4X antialiasing and a high degree of anisotropic filtering to keep the high-end cards occupied. 3DMark05 also makes ample use of pixel shader effects and fairly complex geometry. All told, this test should allow the Radeon X1600 XT to strut its stuff.
As one might expect, the Radeon X1600 XT turns in a much better performance here than in most of our other tests, giving the slightly long-in-the-tooth GeForce 6800 a reason to contemplate its existence.
It’s a split on the R520-based cards, with the X1800 XT smoking the GeForce 7800 GTX, especially at lower resolutions, and the X1800 XL trailing the GeForce 7800 GT. Now we’ll look at the individual scores from the three game tests that make up 3DMark05’s overall score.
Relative performance changes a little from one test to the next, but stays reasonably true to the picture painted by 3DMark’s overall score.
Now we’ll look at the synthetic feature tests from 3DMark05.
The GeForce cards dominate 3DMark05’s relatively simple pixel shader test, but the tables turn in the vertex shader benchmarks, where the new Radeons steal the show. The X1600 XT’s irreverent personality comes out when it uses its five vertex shaders at 590MHz to somehow upstage the X1800 XL.
ShaderMark should allow us to look at pixel shader performance in quite a bit more detail. This program runs a whole host of different shader programs and tracks performance in each.
This one’s a back-and-forth battle between the two high-end cards, and it’s hard to declare a winner. The GeForce 7800 GTX seems to be faster in more of the tests than the X1800 XT, but the new Radeon crunches through some of the slower, more intensive shaders faster.
I should mention that I’ve included scores for the HDR shaders here, but they weren’t really running right on the X1000-series cards. There were slight but obvious visible problems with these shaders on the new Radeons, and I expect that a fix in either the application or ATI’s drivers could affect performance.
Before we move on, let’s take a look at the slow-motion instant replay from ShaderMark’s shadow mapping tests.
Here’s a pixel shader program that uses flow control, and the results are striking. The GeForce cards all suffer a minor performance penalty when flow control is in use, while the Radeon X1000-series cards show a big jump in performance. With flow control active, the ATI cards are also much faster than their NVIDIA counterparts. This is the sort of shader that can benefit from ATI’s finer threading granularity for looping and branching, obviously.
ShaderMark will also allow us to quantitatively analyze image quality, believe it or not. It does so by comparing output from the graphics card to the output of Microsoft’s DirectX 9 reference rasterizer, so this is more of a quantitative analysis of the deviation from Microsoft’s standard than anything else. The number is reported as the mean square error in the card’s output versus the reference image.
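If mean square error is unfamiliar, here is a minimal Python sketch of the general idea. It assumes each image is a flat list of color channel values of equal length. ShaderMark’s actual pixel format, color space, and normalization aren’t documented here, so treat the function and the sample values as hypothetical.

```python
def mean_square_error(card_pixels, reference_pixels):
    """Generic mean square error between two equally sized images.

    Both arguments are flat sequences of channel values. This is the
    textbook formula, not necessarily ShaderMark's exact computation.
    """
    if len(card_pixels) != len(reference_pixels):
        raise ValueError("images must contain the same number of values")
    if not card_pixels:
        raise ValueError("images must not be empty")
    # Square each per-value difference, then average over all values.
    squared = [(c - r) ** 2 for c, r in zip(card_pixels, reference_pixels)]
    return sum(squared) / len(squared)

# Made-up channel values for illustration only.
card = [10, 20, 30, 40]
reference = [10, 22, 30, 36]
print(mean_square_error(card, reference))  # prints 5.0
```

A score of zero would mean the card’s output matches the reference rasterizer exactly. Squaring the differences means a few badly wrong pixels count for more than many slightly wrong ones.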
Because of the obvious visual problems noted above, I’ve excluded the HDR shaders from our cumulative average. Otherwise, the ATI and NVIDIA GPUs are equally faithful to the Microsoft reference rasterizer.
I haven’t had time to do a full test and write-up of the X1000 series’ antialiasing performance, but I have confirmed that the new GPUs stay true to the sample patterns and modes used by the Radeon X800 series. In fact, there’s little change aside from adaptive antialiasing and the ability to do antialiasing in combination with high-dynamic-range image formats.
Adaptive AA is basically a clone of NVIDIA’s transparency AA, introduced in the GeForce 7800. This feature handles difficult cases where alpha transparency is used to create a see-through object like a fence, screen, grill, or grate. Rather than capture all new images, I’ve lifted some of them from my GeForce 7800 GTX review. If anything about transparency AA has changed in NVIDIA’s latest drivers, it won’t be reflected here.
Here’s a completely pathological case where the multiple layers of the chain link fence look like a scattered mess without adaptive/transparency AA. Things get progressively better as you scroll down through the shots of different modes.
I’d say both ATI’s 2X adaptive AA and NVIDIA’s multisampled transparency AA modes are pretty well useless. ATI’s 4X adaptive AA does a good job, though, just as NVIDIA’s 4X supersampled transparency AA does. ATI’s 6X adaptive AA mode looks pretty good, though not as nice as NVIDIA’s 8xS with supersampled transparency AA, but 8xS mode has always carried a pretty big performance penalty. Unfortunately, we ran out of time before we could test adaptive AA performance. We’ll have to look at that in a future article.
We measured total system power consumption at the wall socket using a watt meter. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The idle measurements were taken at the Windows desktop, and cards were tested under load running a loop of 3DMark05’s “Firefly Forest” test at 1280×1024 resolution.
Please note that these numbers aren’t as pure as the driven snow. Because we wanted to include CrossFire and SLI, we kept each brand of cards with its respective platform here, the ATI cards with the Radeon Xpress 200 motherboard and the NVIDIA cards with the nForce4 SLI mobo. Differences in power consumption between these motherboards will influence the overall result.
Even with the caveats, I think we can draw some provisional conclusions. Systems based on the new Radeons uniformly draw more power than the NVIDIA competition, which isn’t entirely a shock given the higher clock rates for the ATI GPUs. The system based on the Radeon X1800 XL, which has the lowest clock speed of the three, has power requirements under load similar to the GeForce 7800 GT-based rig. The X1800 XT system, with its stratospheric GPU clock speed, requires 25W more system power than the nForce4/GeForce 7800 GTX system.
These new Radeons also seem to draw quite a bit more power at idle than the GeForce cards, or than the Radeon X850 XT, which is on the same motherboard as the Radeon X1000 cards. The X1800 XT, in particular, is pulling an awful lot of juice. The R500 GPUs do clock gating to reduce idle power consumption, and they also ramp down clock speeds a small amount when the GPU is idle. Still, I wish ATI had used more extensive dynamic clock speed adjustments to help cut idle power use further.
We used an Extech model 407727 digital sound level meter to measure the noise created (primarily) by the cooling fans on our two test systems. The meter’s weightings were set to comply with OSHA standards. I held the meter approximately two inches above the tops of the graphics cards, right between the systems’ two PCI Express graphics slots.
The sound levels the meter picked up track pretty well with my perception of the X1000 series’ coolers. The two X1800 cards are reasonably quiet. The XT’s dual-slot cooler can be loud when it kicks into high gear as the system powers on, but otherwise, it just whispers. I suppose it might crank itself up inside of a warm case, but it was a model citizen on our open test bench. The X1800 XL, though, had an annoying habit of kicking its cooling fan into high gear for no apparent reason when idle on the Windows desktop.
The X1600 XT is another story. This is a loud card all of the time, whether idle or running a game. The fan just runs fast enough to make quite a bit of noise all day long, more than either of the X1800 cards do under load. ATI may need to put a beefier cooler on this one in order to keep fan speeds in check.
I should start by saying that, due to lack of time, we’ve not been able to test many of the R500-series GPU architecture’s new features as extensively as we usually would, including the chips’ Avivo video and display engine and some facets of edge and texture antialiasing. We also left out a minor feature known as the Radeon X1300 Pro. We’ll have to address those things at a later date, when time permits.
That said, we’ve learned quite a bit about ATI’s new GPUs in the preceding pages. We should probably break things down into pieces in order to make sense of it all.
From a pure graphics technology standpoint, the Radeon X1000 series of graphics processors doesn’t break new ground with bold innovations, but it does give ATI nearly every feature that the GeForce 7 series GPUs have had over the Radeon X800s. Not only that, but ATI has added a number of worthwhile capabilities, including multisampled antialiasing with high-dynamic-range color modes, “free” tone mapping via the Avivo display engine, and much finer-grained batch sizes for dynamic flow control in Shader Model 3.0. ATI has also caught up with NVIDIA on the internal chip architecture front by decoupling the computational units responsible for the various stages of the graphics pipeline from one another, allowing more flexibility for the development of adventurous variations on the core GPU architecture, like the RV530.
If the R520 architecture and derivatives have a weakness, it may be performance in current applications given the number of transistors. The R520 apparently requires more transistors for what’s basically a 16-pipe design than NVIDIA’s G70 does with 24 pixel shaders and texture units. ATI has made up the deficit with higher GPU and memory clock speeds, but that tradeoff leads to higher power consumption than G70, even though ATI’s GPU is manufactured on a smaller fab process. Then again, the Radeon X1800 XT outperforms the GeForce 7800 GTX in many cases. That performance gap may grow if future applications begin making extensive use of shaders with flow control, where ATI’s architecture is more efficient. However, I certainly wouldn’t expect that to happen overnight, and it may not happen during the lifespan of G70, R520, and their offspring.
These considerations are less of a concern in the middle of the graphics market than they are at the very high end, where ATI has resorted to extreme clock speeds to counter NVIDIA’s widest GPU architecture. Our measured power consumption under load on the Radeon X1800 XL system was within three watts of the GeForce 7800 GT-based system. That’s hardly a reason to panic. Smart use of asymmetric GPU design, as in the RV530 (if that proves to be smart), may help ATI achieve higher performance with fewer transistors and smaller power envelopes in the future, as well.
ATI faced some daunting challenges with this new GPU generation, and they have largely succeeded in meeting them. If they deliver the products on schedule, ATI could lay claim to bragging rights for having been first to deliver a top-to-bottom range of next-gen GPUs on 90nm process technology. As for the cards as products, well, let’s look at them one by one.
At the very high end of the market, the Radeon X1800 XT is indeed a worthy competitor for the GeForce 7800 GTX. In Direct3D games, the X1800 XT is usually faster than the 7800 GTX. Unfortunately, ATI’s weak showing in OpenGL games keeps the X1800 XT from capturing the undisputed heavyweight title. I also have a few concerns about the likely extent of the Radeon X1800 XT’s availability in the market, given its relatively high GPU clock speeds. We will have to wait and see about that.
There are also a number of GeForce 7800 GTX cards on the market now with higher clock speeds than the MSI cards we tested. I chose the MSI cards simply because I wanted a matched pair of cards for use in SLI, and I didn’t have a pair of “overclocked in the box” cards on hand. NVIDIA’s board partners have been pretty aggressive about ramping up the G70’s clock speeds and standing behind those cards with lifetime warranties, and those cards might give the Radeon X1800 XT more of a run for its money in Direct3D apps, at the expense of higher power consumption, of course.
We may also see a rematch shortly between the Radeon X1800 XT 512MB and a new 512MB version of the GeForce 7800 GTX, one that could produce a clear champ.
The battle between the Radeon X1800 XL and the GeForce 7800 GT is close, but I’d have to give the edge to the 7800 GT. On balance, the 7800 GT is faster, though there are occasions, like in Far Cry and the F.E.A.R. demo, where the X1800 XL outruns even the GeForce 7800 GTX. Assuming the Radeon X1800 XL can soon hit the lower price points that some 7800 GT cards are hitting, it ought to be a very competitive product.
The Radeon X1600 XT confounds me. Admittedly, our test suite was best suited for high-end graphics cards, and the extensive use of antialiasing and anisotropic filtering probably hurt the X1600 XT’s standing in our results. Still, this card lists for $249 and has 256MB of memory onboard, which is territory where I’d expect to be able to use high-quality edge and texture antialiasing in current games. Only in select cases can the Radeon X1600 XT keep pace with NVIDIA’s like-priced offering, the GeForce 6800. I appreciate ATI’s boldness in choosing an asymmetrical GPU design with the RV530; five vertex pipes and 12 pixel shaders at 590MHz aren’t easily discounted. However, in light of the performance we’ve seen and the size of the chip, this feels more like a $179 product to me, more of a true GeForce 6600 GT competitor. The X1600 XT may age well as more shader-laden games take hold, but I wouldn’t cough up $249 for one now when the GeForce 6800 can be had for less.