BACK IN THE DAY, graphics geeks would demonstrate a new effect using a model of a teapot. Nowadays, the object of our graphics demos is the sexpot. And if the sexpot demo is a measure of a graphics card, the new Radeon X800 series wins, hands down. The X800 is ATI’s second generation of DirectX 9-class GPUs with the floating-point datatypes that have taken real-time graphics to a new plane, and the Ruby demo ATI has created to show off its new GPU is simply stunning.
But I’m getting ahead of myself. First, let’s set the stage a bit. You’re probably already aware that ATI has had a very good past couple of years, since the introduction of the Radeon 9700 GPU (code-named R300). ATI’s arch-rival, NVIDIA, struggled mightily with its competing NV30-series of graphics chips, but just recently recovered quite nicely with the introduction of its NV40 chip, now better known as the GeForce 6800 series of GPUs.
You might do well to go read our GeForce 6800 Ultra review if you haven’t already, but for the lazy among us, I’ll sum things up. The GeForce 6800 Ultra is a massive chip with a sweet sixteen pixel pipelines and out-of-this-world performance several steps beyond ATI’s previous top-end chip, the Radeon 9800 XT. I concluded my review of the 6800 Ultra by saying that NVIDIA had left ATI with no margin for error, and I meant it. Fortunately, it looks like ATI hasn’t stumbled in the least, and the choice between graphics chips will come down to better versus best.
The Radeon X800 family
ATI’s new top-end graphics card is the $499 Radeon X800 XT Platinum Edition. With sixteen pixel pipes, a 520MHz core clock, and 256MB of GDDR3 memory running at 1.12GHz, the X800 XT will be making a strong claim on the title of “fastest graphics card on the planet.” Why the name Platinum Edition? I think in part because ATI didn’t want to have only one letter’s worth of difference between the name of its old $499 card and its new $499 card. Also, the company wanted to let folks know that this puppy is something out of the ordinary, and apparently the platinum name was available. That doesn’t mean, ATI assures us, that only 42 of these cards will ever ship to consumers; the X800 XT Platinum Edition will be available in quantities similar to the Radeon 9800 XT.
Further down the product line, ATI has the Radeon X800 Pro for the low, low price of $399. This card has twelve pixel pipes, a 475MHz GPU, and 256MB of GDDR3 memory at 900MHz. Although the X800 XT Platinum Edition is pictured above, the X800 Pro card looks identical. The Pro is probably the more exciting card for most folks, since its price is a little less stratospheric. Amazingly, this card will be shipping today, May 4, to retailers, so you should be able to buy one very soon. The X800 XT Platinum Edition will be following along a little later, on May 21.
Note some details in the pictures above. The cooler is a single-slot design for both cards. Both the X800 Pro and XT Platinum Edition require only one aux power connector, and they should work with any decent 300W power supply unit. In fact, one of the ATI engineers brought an X800-equipped Shuttle XPC to the X800 launch event, just to show that the card could run in a small-form-factor system.
Oh, and in case you’re wondering, that yellow, four-pin port next to the aux power connector is a composite video-in connector. ATI says this connector will likely be omitted from North American versions of the card, but the Euros will get them.
ATI waited for NVIDIA to release its GeForce 6 series before setting the final specifications for its Radeon X800 line. In doing so, ATI could be reasonably certain that its new cards would outperform NVIDIA’s GeForce 6800 and GeForce 6800 Ultra. However, NVIDIA didn’t play all of its, erm, cards when it introduced the GeForce 6800 Ultra, it seems. To counter ATI, NVIDIA delivered to us late last week a pair of new cards representing two new GeForce 6800 models, plus a new driver for these cards.
To face off against the X800 Pro, NVIDIA will be releasing its own $399 card, the GeForce 6800 GT. Like the X800 Pro, the GT has a single-slot cooling solution and requires only one auxiliary power connector. NVIDIA says the GeForce 6800 GT’s power supply requirements will be similar to those of the Radeon 9800 XT and GeForce FX 5900 Ultra, so a fairly “normal” PSU ought to suffice for it.
The GT will feature a full sixteen pixel pipes running at 350MHz, but its real ace in the hole will be its 256MB of GDDR3 memory running at 1GHz, or 100MHz more than the X800 Pro. The GT should be a tough competitor for the Radeon X800 Pro in the $399 price range, once it arrives. Expect to see 6800 GT cards on store shelves in “mid June,” according to NVIDIA.
Then we have the big momma, the GeForce 6800 Ultra “Extreme” card. Apparently aware that the 16-pipe, 400MHz GeForce 6800 Ultra might have its hands full with the 16-pipe, 520MHz Radeon X800 XT Platinum Edition, NVIDIA has decided to release a limited run of GeForce 6800 Ultra cards clocked at 450MHz. These beasts have dual-slot coolers, dual auxiliary power connectors, and require a 480W power supply, just like the regular GeForce 6800 Ultra.
The “Extreme” cards will be sold in systems made by select high-end PC builders like VoodooPC and Falcon Northwest, and through card makers like Gainward and XFX. I don’t have an official list of those partners just yet, but NVIDIA says to expect announcements at E3. Perhaps then we’ll learn more about what these puppies will cost, as well. These cards should be available in June in whatever quantities NVIDIA and its partners can muster.
I must note, by the way, that NVIDIA’s GeForce 6800 Ultra reference designs have dual DVI ports, while ATI’s Radeon X800 XT Platinum Edition has only one DVI port plus a VGA connector. Personally, I think any card that costs over 300 bucks or so should come with a pair of DVI ports, so that LCD owners can double up on digital goodness. Let’s hope NVIDIA’s board partners follow NVIDIA’s lead and include both DVI ports. Heck, let’s hope ATI’s board partners follow NVIDIA’s lead, as well.
For graphics freaks like us, one of the most exciting developments of the past few weeks has been NVIDIA’s new willingness to divulge information about the internals of its graphics processors. The NVIDIA media briefings on GeForce 6800 were chock full of block diagrams of the NV40 chip, its internal units, and all kinds of hairy detail about how things worked. This was a big change from the NV30 days, to say the least. We now have a pretty good idea how the internals of the NV40 look. Of course, ATI has long been fairly open about its R300 architecture, but NVIDIA’s newfound openness has forced ATI’s hand a bit. As a result, we now have a clearer understanding of the amount of internal parallelism and the richness of features in ATI’s graphics processors, both the new X800 (code-named R420) and the R300 series from which it’s derived.
In the R420, the ATI R3xx core has been reworked to allow for higher performance and better scalability. Let’s have a look at some of the key differences between the R420 and its predecessors.
- More parallelism The X800’s pixel pipelines have been organized into sets of four, much like those in NVIDIA’s NV40 chip. These pixel pipe quads can be enabled or disabled as needed, so the X800 chip can scale from four pipes to eight, twelve, or sixteen, depending on chip yields and market demands.
In addition to more pixel pipes, the Radeon X800 has two more vertex shader units than the Radeon 9800 XT, for a total of six. Combined with higher clock speeds, ATI is claiming the X800 has double the vertex shader power of the Radeon 9800 XT. Although the X800 Pro will have one of its pixel pipeline quads disabled, it will retain all six vertex shader units intact.
One more thing. You may recall that NVIDIA’s recent GPUs, including the NV3x and NV4x chips, can render additional pixels per clock when doing certain types of rendering, like Z or stencil operations. This has led to the NV40 being called a “16×1 or 32×0” design. Turns out ATI’s chips can render two Z/stencil ops per pipeline per clock, as well, so long as antialiasing is enabled.
- Better performance at higher resolutions Each pixel quad in the R420 has its own Z compression and hierarchical Z capability, including its own local cache. ATI has sized these caches to allow for Z-handling enhancements to operate at resolutions up to 1920×1080. (The R300’s global Z cache was sized for resolutions up to 1600×1200, and the RV3xx-series chips’ for less.) Also, on the R420, if the screen resolution is too high for the Z cache to accommodate everything, the chip will use its available Z cache to render a portion of the screen, rather than simply turning off Z enhancements.
ATI’s engineers have also boosted the X800’s peak Z compression ratio from 4:1 to 8:1, making the maximum possible peak compression (with 6X antialiasing and color compression) 48:1, but that’s just a fancy number they like to throw around to impress the ladies.
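For the curious, that headline figure is just the product of the individual ratios: 8:1 Z compression times the 6X antialiasing sample count. A quick sketch of the arithmetic (my own simplified model, not ATI's math):

```python
def peak_compression(z_ratio: int, aa_samples: int) -> int:
    """Peak combined compression ratio: with multisampled AA and color
    compression, the samples within a pixel can share one color value,
    and Z data compresses at z_ratio on top of that."""
    return z_ratio * aa_samples

# X800: 8:1 Z compression with 6X AA and color compression
print(peak_compression(8, 6))  # 48, i.e., the 48:1 peak ATI quotes

# R3xx-era chips topped out at 4:1 Z compression
print(peak_compression(4, 6))  # 24
```

This is a best-case ceiling, of course; real scenes rarely compress anywhere near that well.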
- A new memory interface One of the Radeon 9800 XT’s few weaknesses was its relatively slow memory speeds. The GeForce FX 5950 Ultra had 950MHz memory, while the Radeon 9800 XT’s RAM topped out at 730MHz. Part of the reason for this disparity, it turns out, was the chip’s memory interface, which didn’t like high clock speeds. ATI has addressed this problem by giving the X800 GPU a new memory interface capable of clock speeds up to 50% higher than the 9800 XT.
From 10,000 feet up, this new setup doesn’t look dramatically different from the 9800 XT’s; it’s a crossbar type design with four 64-bit data channels talking over a switch to four independent memory controllers. There are some important differences, though, beyond higher clock speeds. First and foremost, this memory interface can make use of the swanky new GDDR3 memory type that ATI helped create. Also, ATI says this new memory interface is more efficient, and it offers extensive tuning options. If a given application (or application type) typically accesses memory according to certain patterns, ATI’s driver team may be able to reconfigure the memory controllers to perform better in that type of app.
- Longer shader programs The X800’s pixel shaders still have 24 bits of floating-point color precision for each color channel (red, green, blue, and alpha), and they do not have the branching and looping capabilities of the pixel shaders in the NV40. They do, however, have the ability to execute shader programs with longer instruction lengths, up from 160 in the 9800 XT to 1,536 in the X800. The X800’s revised pixel shaders also have some register enhancements, including more temporary registers (up from 12 to 32). We’ll look at the question of pixel shaders and shader models in more detail below.
- 3Dc normal map compression Normal maps are simply grown-up bump maps. Like bump maps, they contain information about the elevation of a surface, but unlike bump maps, they use a three-component coordinate system to describe a surface, with X, Y, and Z coordinates. Game developers are now commonly taking high-polygon models and generating from them two things: a low-poly mesh and a normal map. When mated back together inside a graphics card, these elements combine to look like a high-poly model, but they’re much easier to handle and render. Trouble is, like all textures, normal maps tend to chew up video memory, but normal maps don’t tolerate well the artifacts introduced by compression algorithms like DirectX Texture Compression (DXTC). If a normal map becomes blocky, the perceived elevation of a surface will become blocky and uneven, ruining the effect. ATI has tackled this problem by adapting the DXTC algorithm for alpha channel compression to work on normal maps. Specifically, the DXT5 alpha compression algorithm is used on the red and green channels, which store X and Y coordinate info, respectively. (Z values are discarded and computed later in the pixel shader.) This format is reasonably well suited for normal maps, and offers 4:1 compression ratios. Like any texture compression method, it should allow the use of higher resolution textures in a given amount of texture memory.
The X800 GPU supports this method of normal map compression, dubbed 3Dc, in hardware. Both DirectX and OpenGL support 3Dc via extensions, and game developers should be able to take advantage of it with minimal effort.
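The Z-reconstruction step is simple enough to sketch. Since a normal is unit length, the shader can rebuild Z from the two stored channels. Here's the idea in Python (my own illustration, not ATI's shader code), assuming X and Y have already been unpacked from the texture into the [-1, 1] range:

```python
import math

def reconstruct_normal(x: float, y: float) -> tuple:
    """Rebuild the Z component of a unit-length normal from its X and Y,
    as a pixel shader would after sampling a 3Dc-compressed normal map.
    Z is clamped at zero, since stored normals face outward."""
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

print(reconstruct_normal(0.6, 0.0))  # (0.6, 0.0, 0.8)
print(reconstruct_normal(0.0, 0.0))  # (0.0, 0.0, 1.0): flat surface
```

Dropping Z in the texture and recomputing it per-pixel is what lets 3Dc spend all of its bits on the two channels that actually matter.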
- Temporal antialiasing This new antialiasing feature is actually a driver trick that exploits the programmability of ATI’s antialiasing hardware. We’ll discuss it in more detail in the antialiasing section of the review.
Those are some of the more notable changes and tweaks ATI has made to the X800. We’ll discuss some of them in more detail below, and we’ll see the impact of the performance tweaks, in particular, in our benchmark results.
Die size comparison
Both of these chips are large. NVIDIA says the NV40 GPU is 222 million transistors, while ATI says the X800 is roughly 160 million transistors. However, the two companies don’t seem to be counting transistors the same way. NVIDIA likes to count up all possible transistors on a chip, while ATI’s estimates are more conservative.
Regardless, transistor counts are less important, in reality, than die size, and we can measure that. ATI’s chips are manufactured by TSMC on a 0.13-micron, low-k “Black Diamond” process. The use of a low-capacitance dielectric can reduce crosstalk and allow a chip to run at higher speeds with less power consumption. NVIDIA’s NV40, meanwhile, is manufactured by IBM on its 0.13-micron fab process, though without the benefit of a low-k dielectric.
The picture on the right should give you some idea of the relative sizes of these two chips. Pictured between them is an Athlon XP “Thoroughbred” processor, also manufactured on a 0.13-micron process. As you can tell, these GPUs are much larger than your run-of-the-mill desktop CPU, and they are more similar in size to one another than the transistor counts would seem to indicate. By my measurements with my highly accurate plastic Garfield ruler (complete with Odie), the ATI R420 chip is 16.25mm by 16mm, or 260mm². The NV40 is slightly larger at 18mm by 16mm, or 288mm².
What Garfield is trying to tell us here is that these are not small chips, Jon. The NV40 isn’t massively larger than the R420, either, despite the different numbers coming out of the two companies.
In order to show off the power of the X800 chip, ATI set out to make a graphics demo very different from the usual single-object, single-effect demos we’re accustomed to seeing. They also went outside for creative help, tapping graphics studio RhinoFX as a partner. RhinoFX works with high-end, CPU-based rendering tools to produce most of its work, and the company’s hallmark has traditionally been realism. In this case, ATI wanted to create something unique in real time, so they gave RhinoFX some polygon, lighting, and animation budgets in line with what they expected the Radeon X800 to be able to handle. RhinoFX used its usual tools, like Maya and Renderman, to create an animation that worked within those budgets, and ATI’s demo team took on the task of converting this sequence into a real-time graphics demo.
The results are astounding. The demo is a full-on action short in a kind of realistic, comic-book-esque style. The sequence makes ample use of cinematic techniques like depth of field and motion-captured character animation to tell a story. Nearly every surface in the thing is covered with one or more shaders, from real-looking leather to convincing skin and just-waxed floors.
Oh yeah, and Ruby’s pretty hot, too. But then what did you expect?
ATI has produced a video on the making of Ruby, and they’ve made quite a bit of information available on how they did it. From what I gather, the ATI demo team spent the lion’s share of its time on shader research, working to create realistic shaders using ATI tools like RenderMonkey and ASHLI. I don’t have time to cover all of this work in great detail today, but I’m sure ATI will make a video of the Ruby demo and more information available to the public. For now, some screenshots from the demo. The screenshots below are unretouched, with the exception of the close-up of Ruby’s eyes. I had to resize that one.
I want to address one issue before we get to the benchmark results, and that’s a problem we found with the NVIDIA 61.11 beta drivers. On this page of our GeForce 6800 Ultra review, you’ll see that we disabled trilinear optimizations on the GeForce 6800 Ultra card in order to get a direct comparison to the Radeon. NVIDIA has adopted a shortcut method of handling blending between mip map levels, and we turned it off because ATI doesn’t use this method. NVIDIA’s trilinear optimizations produce slightly lower quality images than the commonly accepted method of trilinear filtering, although it is admittedly difficult to see with the naked eye. The performance impact of this optimization isn’t huge, but it can be significant in fill-rate-limited situations, as we demonstrated here.
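To picture the shortcut: true trilinear filtering blends the two nearest mip levels continuously across the whole transition zone, while the optimized mode squeezes that blend into a narrow band around the mip boundary and falls back to plain bilinear filtering elsewhere. A rough model of the difference (my own formulation; NVIDIA hasn't published the exact math, and the band width here is invented):

```python
def trilinear_weight(lod: float) -> float:
    """Blend weight toward the next-smaller mip level under true
    trilinear filtering: ramps 0 -> 1 across the full LOD fraction.
    Assumes lod >= 0."""
    return lod - int(lod)

def brilinear_weight(lod: float, band: float = 0.4) -> float:
    """Same weight, but the ramp is squeezed into a narrow band around
    the mip midpoint; outside that band, only one mip level is sampled,
    which is effectively bilinear filtering (hence 'brilinear')."""
    f = lod - int(lod)
    t = (f - (0.5 - band / 2)) / band
    return min(1.0, max(0.0, t))

print(trilinear_weight(2.25))  # 0.25: still blending two mip levels
print(brilinear_weight(2.25))  # 0.0: only the nearer mip is sampled
```

Skipping the second mip fetch over most of the screen is where the fill-rate savings come from, and the visible cost is the sharper mip transitions shown in the colored mip-map shots later in this review.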
Unfortunately, the 61.11 beta drivers we received this past Friday night didn’t behave as expected. We ticked the checkbox to disable trilinear optimizations, but our image quality tests showed that the driver didn’t disable all trilinear optimizations in DirectX games. I did have time to check at least one OpenGL app, and trilinear optimizations were definitely disabled there. We will show you the image quality impact of the 61.11 drivers’ odd behavior in the IQ section of the review.
NVIDIA’s Tony Tamasi confirmed for us that this behavior is a bug in the 61.11 drivers’ control panel, and says it will be fixed. Keep in mind that the scores you see for the GeForce cards in the following pages would, all other things being equal, be slightly lower if the driver behaved as expected. Also, remember that no recent NVIDIA driver we’ve encountered has allowed us to disable trilinear optimizations on a GeForce FX GPU, including the 5950 Ultra model tested here.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test system was configured like so:
|System||MSI K8T Neo|
|Processor||AMD Athlon 64 3400+ 2.2GHz|
|Chipset drivers||4-in-1 v.4.51|
|Memory size||1GB (2 DIMMs)|
|Memory type||Corsair TwinX XMS3200LL DDR SDRAM at 400MHz|
|Hard drive||Seagate Barracuda V 120GB SATA 150|
|Audio||Creative SoundBlaster Live!|
|OS||Microsoft Windows XP Professional|
|OS updates||Service Pack 1, DirectX 9.0b|
We used ATI’s CATALYST 4.4 drivers on the Radeon 9800 XT card and CATALYST beta version 8.01.3-040420a2-015068E on the X800 cards. For the GeForce 6800 series cards, we used Forceware 61.11 beta drivers, and for the GeForce FX 5950, we used Forceware 60.72 beta 2. One exception: at the request of FutureMark, we used NVIDIA’s 52.16 drivers for all 3DMark benchmarking and image quality tests on the GeForce FX 5950 Ultra.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- FutureMark 3DMark03 Build 340
- Far Cry 1.1 with trdemo1.
- PainKiller SP demo 2
- Prince of Persia: The Sands of Time 1.01
- Comanche 4 demo
- Quake III Arena v1.31 with trdemo1.dm_67
- Wolfenstein: Enemy Territory with demo0000.dm_82
- Serious Sam SE v1.07 with Demo0003
- Unreal Tournament 2004 with trdemo1.demo4
- Splinter Cell v1.2 with TRKalinatekDemo.bin
- ShaderMark 2.0 build 1e
- rthdribl 1.2
- D3D FSAA Viewer 5
- D3D RightMark beta 4
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
We’ll start off with our traditional look at fill rate, one of the more important factors in determining overall performance of a graphics card. Of course, raw theoretical fill rate can’t be divorced from its most likely real-world constraint, memory bandwidth, so we have numbers on that, too. The table below shows specs for some top-end graphics cards in the past two years or so, sorted in order of memory bandwidth.
|Card||Core clock (MHz)||Pixel pipelines||Peak fill rate (Mpixels/s)||Texture units per pixel pipeline||Peak fill rate (Mtexels/s)||Memory clock (MHz)||Memory bus width (bits)||Peak memory bandwidth (GB/s)|
|GeForce FX 5800 Ultra||500||4||2000||2||4000||1000||128||16.0|
|Radeon 9700 Pro||325||8||2600||1||2600||620||256||19.8|
|Radeon 9800 Pro||380||8||3040||1||3040||680||256||21.8|
|Radeon 9800 Pro 256MB||380||8||3040||1||3040||700||256||22.4|
|Radeon 9800 XT||412||8||3296||1||3296||730||256||23.4|
|GeForce FX 5900 Ultra||450||4||1800||2||3600||850||256||27.2|
|Radeon X800 Pro||475||12||5700||1||5700||900||256||28.8|
|GeForce FX 5950 Ultra||475||4||1900||2||3800||950||256||30.4|
|GeForce 6800 GT||350||16||5600||1||5600||1000||256||32.0|
|GeForce 6800 Ultra||400||16||6400||1||6400||1100||256||35.2|
|GeForce 6800 Ultra Extreme||450||16||7200||1||7200||1100||256||35.2|
|Radeon X800 XT Platinum Edition||520||16||8320||1||8320||1120||256||35.8|
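The derived columns in the table are simple products of the spec-sheet numbers, so you can sanity-check them yourself. A quick script (my own, not from either vendor) reproducing a couple of rows:

```python
def fill_rate_mpixels(core_mhz: int, pipes: int) -> int:
    """Peak fill rate in Mpixels/s: one pixel per pipeline per clock."""
    return core_mhz * pipes

def bandwidth_gbs(mem_mhz_effective: int, bus_bits: int) -> float:
    """Peak memory bandwidth in GB/s: effective (DDR) memory clock
    times bus width in bytes."""
    return mem_mhz_effective * bus_bits / 8 / 1000

# Radeon X800 XT Platinum Edition: 520MHz core, 16 pipes, 1.12GHz memory
print(fill_rate_mpixels(520, 16))          # 8320 Mpixels/s
print(round(bandwidth_gbs(1120, 256), 1))  # 35.8 GB/s

# GeForce 6800 Ultra: 400MHz core, 16 pipes, 1.1GHz memory
print(fill_rate_mpixels(400, 16))          # 6400 Mpixels/s
print(round(bandwidth_gbs(1100, 256), 1))  # 35.2 GB/s
```

The Mtexels/s column is just the Mpixels/s figure multiplied by the texture units per pipe, which is why the two-TMU GeForce FX parts double up there.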
If fill rate were money, the Radeon X800 XT Platinum Edition would be Bill Gates. If it were water, the X800 XT would be Niagara Falls. If it were transistors, the X800 XT would be a GeForce 6800 Ultra.
Err, strike that last one.
However you put it, the Radeon X800 XT Platinum Edition has a helluva lotta grunt. The GeForce 6800 Ultra Extreme isn’t far behind it, though, assuming they’ll both be largely limited by available memory bandwidth.
In the $399 category, the 12-pipe Radeon X800 Pro just leads the 16-pipe GeForce 6800 GT thanks to the Radeon’s much higher clock speeds. The memory bandwidth advantage, however, is clearly with the GeForce.
How do these numbers play out in synthetic fill rate tests?
In 3DMark03’s single-textured fill rate test, the GeForce 6800 Ultra cards actually manage to outdo the Platinum Edition, probably because of more efficient management of memory bandwidth during the execution of this test. When it comes time to apply more than one texture per pixel, the X800 XT PE edges ahead slightly. Let’s stop once more to remind ourselves what a huge performance leap we’re seeing from one generation to the next. The Radeon X800 XT PE delivers over twice the fill rate of the Radeon 9800 XT. Yikes.
RightMark allows us to scale up the number of textures, unleashing the X800 XT PE’s inner demons. When applying one texture, the NVIDIA cards have a big relative advantage. Moving two textures, the picture changes, and the ATI cards look relatively stronger. Once we get into three or more textures, though, the Radeons dominate. So it’s a bit of a wash, depending on the scenario. Keep these differences in texturing and fill rate performance in mind as you read the rest of the results, because they will be enlightening.
We’re using scaling graphs to show just how fill rate and pixel shader performance differences separate the men from the boys, but keep your eyes on the rightmost scores on each graph, and especially on the graphs with 4X antialiasing and 8X anisotropic filtering, if you want to see the most relevant numbers. All of the next-gen cards perform well in UT2004, but the Radeon X800s do best. The X800 XT PE beats the GeForce 6800 Ultra by just over 10 frames per second at 1600×1200 with 4X AA and 8X AF, though the Extreme card narrows that gap a little. And the X800 Pro also tops the GeForce 6800 Ultra, surprisingly enough.
Quake III Arena
In ye olde Quake III Arena, the situation is reversed, with the GeForce 6800 GT outperforming the Radeon X800 XT PE. NVIDIA has long done well in OpenGL games, and the results here suggest that trend will continue.
Far Cry
This game is the big one, Doom 3 and Half-Life 2 all rolled into one and released in mid-2004, ahead of either of those big-name titles. Far Cry uses DirectX 9 shaders, per-pixel lighting and shadowing, and a host of related effects to make one of the best looking games we’ve ever seen.
I tested Far Cry using its built-in demo record and playback function, but I used FRAPS to get frame rates out of it. I don’t know why. Far Cry‘s demos don’t record character interactions, but I did get a good cross section of the game in, starting inside and moving through some tunnels, then going outside through thick vegetation for a stroll on the beach. All of the game’s quality options were set to “Very High.” Anisotropic filtering was set to the highest level the game would allow, and antialiasing quality was set to medium.
The ATI cards take this one, with the X800 XT PE well ahead of the GeForce 6800 Ultra Extreme. The X800 Pro darn near ties with the GeForce 6800 Ultra. You may have noticed, though, that our scores are way up on the GeForce 6800 Ultra from our initial review. NVIDIA fixed a Z culling bug in its drivers, and performance has leaped as a result. Unfortunately, the GeForce 6800 drivers still need some work to fix a number of image quality issues, including intermittent problems with shadowing on weapons. NVIDIA is also working with CryTek to alleviate some lighting artifacts that folks have noted on the GeForce 6800 Ultra. Apparently, some of the problems are driver based, while others will require a patch from CryTek. NVIDIA’s driver team is making its piece of the problem a top priority.
For now, the Radeon X800 cards run this premier game title faster and with better image quality than NVIDIA’s GeForce 6800 series. That has to be a little embarrassing given that this is an NVIDIA “The Way it’s Meant to be Played” game, but the reality is that the GeForce 6800 series is still a very new product, not yet shipping, with beta drivers, while ATI’s Radeon X800s are second-generation silicon shipping today.
Painkiller
Painkiller is another brand-new game with gorgeous graphics, but this one has a decidedly old-school shooter feel. Heaven’s got a hitman… and a crate opener, it would seem.
Since the Painkiller demo doesn’t have any sort of demo recording, I played through a game sequence in the same way three times for each card and averaged the results recorded by FRAPS. I’ve gained more confidence in this “manual” method of benchmarking during this process, because results were remarkably consistent between runs.
Chalk up another win in a new game for the Radeon cards. The X800 XT PE leads the pack, with the GeForce 6800 Ultra Extreme nearly 20 frames per second behind it. Interestingly enough, though, the Radeon X800 Pro manages a higher minimum frame rate, suggesting better playability on the cheaper Radeon X800 card than on the Extreme.
Prince of Persia: The Sands of Time
In Prince of Persia, I played through the game’s opening sequence, which is a mixture of canned animations and live gameplay, all rendered in the game engine. As with Painkiller, results were remarkably consistent from one run to the next on the same card.
This one is a mirror image of the Painkiller results, with the GeForce 6800 Ultra Extreme pulling far ahead of the Radeons, and the GeForce 6800 GT hitting a higher minimum framerate than the Radeon X800 XT PE.
AquaMark is an interesting case, because the NVIDIA cards have a clear advantage at lower resolutions, but as the fill rate demands increase, the Radeon X800s pull into the lead. Ultimately, at 1600×1200 with antialiasing and aniso, the Radeon X800 Pro surpasses the GeForce 6800 Ultra.
Wolfenstein: Enemy Territory
Here’s another OpenGL game, and another big win for the NVIDIA cards. The GeForce 6800 GT outruns the high-end Radeon X800 XT PE by over 20 FPS.
Splinter Cell doesn’t work right with antialiasing, so we’ve limited our tests to non-AA modes. You can see the frame-by-frame performance of each card at 1600×1200 here, as well. However, with all the new GPUs bunched up at over 70 FPS, I’m thinking we’d better update our test suite.
Serious Sam SE
Serious Sam can run in either Direct3D or OpenGL. Since the game’s default mode is OpenGL, we ran our tests with that API. To keep things on an even playing field, we used the “Default settings” add-on to defeat Serious Sam SE’s graphics auto-tuning features.
The GeForce 6800 cards perform very well in our third and final OpenGL game test, making it a sweep.
At FutureMark’s request, we are using NVIDIA’s 52.16 drivers for the GeForce FX 5950 in this test.
Whoa, that’s a squeaker! The GeForce 6800 Ultra juuuuuust manages to eke out the win over the Radeon X800 XT PE at 1600×1200. The higher-clocked Ultra Extreme puts an exclamation point on the win. However, at 1024×768 where most 3DMark03 scores are recorded and compared, the X800 XT PE beats the GeForce 6800 Ultra, losing only to the Extreme edition. Things aren’t so close in the $399 card race, where the 6800 GT opens up a comfortable lead over the X800 Pro.
The GeForce 6800 cards take 3DMark03’s overall score by being faster in three of the four game tests. However, in the pixel-shader-laden Mother Nature scene, the Radeon cards lead the way.
3DMark’s pixel and vertex shader tests produce opposing results, with the GeForce 6800s running the pixel shader test faster, and the Radeon X800s leading in the vertex shader test.
The Mother Nature scene from 3DMark has been the source of some controversy over time, so I wanted to include some screenshots to show how the three cards compare. On this page and in all the following pages with screenshots, you’re looking at low-compression JPEG images. You can click on the image to open a new window with a lossless PNG version of the image.
The results look very similar between all the cards, at least to my eye.
ShaderMark is intended to test pixel shader performance with DirectX 9-class pixel shaders. Specifically, ShaderMark 2.0 is geared toward pixel shader revisions 2.0 and 2.0a. (Version 2.0a or “2.0+” uses longer programs.) ShaderMark also has the ability to use a “partial precision” hint on NVIDIA hardware to request 16-bit floating point mode. Otherwise, the test uses 32 bits of precision on NVIDIA cards and, no matter what, 24 bits per color channel on the Radeon chips due to their pixel shader precision limits. To keep the graph a little less busy, I’ve only included partial precision and 2.0a results for the GeForce 6800 Ultra.
The GeForce cards can’t run some of the tests because ShaderMark is requesting a surface format they don’t support. I’m hopeful ShaderMark will be revised to use a more widely supported format.
The results here are too varied to declare any one winner. You can see, though, that the GeForce 6800 Ultra Extreme and Radeon X800 XT PE are vying for top performance in most tests. Where they are not, the GeForce 6800 Ultra with partial precision enabled leads the pack. This is quite a good showing for the GeForce 6800 chips given their clock speed handicaps. NVIDIA seems to lead ATI in clock-for-clock performance here.
One of the quirks of running ShaderMark on the GeForce 6800 Ultra with 32-bit precision was a texture placement problem with the background texture (not the pixel shaders or the orb thingy, just the background.) The problem didn’t show up in “partial precision” mode, but it did in FP32 mode. Since the changing background texture is distracting, and since 16 bits per color channel is more than adequate for these pixel shaders, I’ve chosen to use the “partial precision” images from the GeForce 6800 Ultra.
The images shown are the GeForce 6800 Ultra screenshots, until you run your mouse over them, at which point the Radeon-generated images will appear. Be warned, though, the differences are very subtle.
All in all, not much to talk about. That’s good, I suppose.
Generally, I’ll show all the cards on a test like this, but this time around, I decided to include results from the GeForce 6800 Ultra using the 60.72 drivers. Since the 61.11 drivers don’t disable trilinear optimizations properly, the results wouldn’t show what I’d like. Instead, you can see results from the 60.72 drivers for the GeForce 6800 Ultra using three modes: the regular results without trilinear optimizations, with NVIDIA’s “brilinear” optimizations enabled, and with the “high quality” anisotropic filtering option checked. Note, also, that I’m testing in Direct3D mode here, not OpenGL, since Direct3D is more commonly used for games nowadays.
The top-end cards are nearly impervious to texture filtering loads in Serious Sam, especially the Radeon X800 XT PE. Overall, though, most of the cards scale fairly similarly. You can see how the “brilinear” mode helps the GeForce 6800 Ultra’s performance when anisotropic filtering is in use.
Here’s a sample scene from Serious Sam, grabbed in Direct3D mode, that shows texture filtering in action along the wall, the floor, and on the inclined surface between the two.
The difference between "brilinear" filtering and true trilinear is difficult to detect with a static screenshot and the naked eye; at least, it is for me. Remember that the GeForce FX 5950 is doing "brilinear" filtering in all cases.
Once we dye the mip maps different colors using Serious Sam's built-in developer tools, we can see the difference between the filtering methods more clearly.
Here are NVIDIA’s trilinear optimizations at work. Mip-map boundary transitions aren’t as smooth as they are on the Radeon X800 XT and on the GeForce 6800 Ultra with “brilinear” disabled. Notice the odd mixture of filtering going on in the 61.11 drivers with optimizations disabled. The floor and angled surface look like they should, but the wall’s mip map boundaries show the dark banding indicative of the “brilinear” method.
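To make the trilinear-versus-"brilinear" distinction concrete: true trilinear blends between adjacent mip levels across the entire transition, while "brilinear" compresses the blend into a narrow band around the mip boundary and falls back to cheaper bilinear filtering elsewhere. Here's a minimal Python sketch of the blend weighting; the band width is a made-up parameter of mine, not NVIDIA's actual value:

```python
def trilinear_weight(lod):
    """True trilinear: the blend fraction between adjacent mip
    levels is simply the fractional part of the level-of-detail."""
    return lod - int(lod)

def brilinear_weight(lod, band=0.5):
    """'Brilinear' sketch: blend only within a narrow band around
    the mip boundary (band = fraction of the LOD range that still
    blends); outside it, snap to the nearest mip level, i.e. plain
    bilinear filtering with fewer texture samples."""
    f = lod - int(lod)
    lo, hi = 0.5 - band / 2, 0.5 + band / 2
    if f <= lo:
        return 0.0                 # finer mip only
    if f >= hi:
        return 1.0                 # coarser mip only
    return (f - lo) / (hi - lo)    # compressed blend zone

# Midway between mip levels, the two methods agree...
print(trilinear_weight(2.5), brilinear_weight(2.5))    # 0.5 0.5
# ...but near a level, brilinear skips the blend entirely, which is
# what produces the visible banding at mip-map boundaries:
print(trilinear_weight(2.25), brilinear_weight(2.25))  # 0.25 0.0
```

The performance win comes from the snapped regions, where the hardware fetches from one mip level instead of two; the banding in the colored mip-map shots is those regions made visible.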
Once again here, all the cards look pretty good, and very similar, to the naked eye.
With colored mip maps and aniso filtering, the differences become very clear. With optimizations disabled, NVIDIA’s 60.72 drivers did filtering almost exactly equivalent to ATI’s, but the 61.11 drivers do something less than that, though the degree varies depending on the angle of the surface.
The Radeon X800’s antialiasing hardware hasn’t changed significantly from the Radeon 9800 XT, but ATI does have a few tricks up its sleeve. Let’s start with a look at traditional multisampling edge AA, which ATI and NVIDIA do fairly similarly.
The X800 cards’ AA performance is about what one might expect. The big news here, believe it or not, is the 8X AA mode from NVIDIA. The 6800 Ultra is performing much better than it did in 8X mode with older drivers. Let’s have a look at the AA sample patterns to see if we can discern why. Remember, green represents texture sample points, and red represents geometry sample points.
|GeForce FX 5950 Ultra|
|GeForce 6800 Ultra|
|Radeon 9800 XT|
|Radeon X800 XT|
The X800 doesn't vary from the path taken by its predecessor, with the same stock sample patterns for 2X, 4X, and 6X modes. The GeForce 6800, though, has a new sample pattern for both 4X and 8X AA. In fact, the 8X mode is now identified in NVIDIA's drivers as "8xS" mode, and its pattern differs from the one we saw in our initial review of the GeForce 6800 Ultra. The fact that there are only two geometry sample points suggests NVIDIA's new driver has enabled a 2X supersampling/4X multisampling mode, replacing the 4X supersampling/2X multisampling method we saw earlier. That would explain the performance jump in 8X mode for the GeForce 6800 cards, because multisampling is more efficient.
That’s progress for NVIDIA, but ATI’s sparse sampling pattern for 6X AA is not aligned to any kind of grid, making it very effective in practice. Also, ATI’s cards use gamma-correct blending to improve results. When we interviewed Tony Tamasi of NVIDIA recently, he said the GeForce 6800 is capable of gamma-correct blends, as well, but we’ve since confirmed that the feature won’t be enabled until a future driver release.
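Gamma-correct blending matters because framebuffer values are gamma-encoded, so averaging them directly produces edge pixels that look too dark on screen. ATI's hardware instead blends in linear light. A minimal Python sketch of the idea, with a generic 2.2 gamma standing in for whatever curve the hardware actually uses:

```python
def naive_blend(samples):
    """Average gamma-encoded sample values directly -- what
    hardware without gamma-correct AA does."""
    return sum(samples) / len(samples)

def gamma_correct_blend(samples, gamma=2.2):
    """Decode to linear light, average, then re-encode. This matches
    how the display emits light, so blended edge pixels land at the
    perceptually correct intermediate intensity."""
    linear = [s ** gamma for s in samples]
    return (sum(linear) / len(linear)) ** (1 / gamma)

# A 2-sample edge pixel half covered by white (1.0) over black (0.0):
print(round(naive_blend([1.0, 0.0]), 3))          # 0.5 -- too dark on screen
print(round(gamma_correct_blend([1.0, 0.0]), 3))  # ~0.73 -- correct coverage
```

That gap between 0.5 and roughly 0.73 is why gamma-correct edges show smoother, more even gradients on near-horizontal and near-vertical lines.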
Temporal AA varies the AA sample pattern from one frame to the next, taking advantage of display persistence to create the effect of a higher degree of antialiasing. In theory, at least, this trick should allow the performance of 2X AA while giving the perceived image quality of 4X AA. To give you some idea what’s going on here, the table of sample patterns below shows the two temporal patterns used in each mode, then superimposes them on top of one another.
|Temporal pattern 1|
|Temporal pattern 2|
That’s the basic layout of temporal AA. The trick is achieving that effective pattern, and that brings with it some limitations. For one, temporal AA has the potential to introduce noise into a scene at polygon edges by varying what’s being rendered from frame to frame. That’s not good. To mitigate this effect, ATI has instituted a frame rate threshold for temporal AA. If frame rates drop below 60 frames per second, the X800 will revert to a single, fixed sample pattern. Below 60Hz or so, depending on the persistence of the display, temporal AA is likely to show up as flickering edges. Above 60Hz, though, it can be rather effective.
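The selection logic described above can be sketched in a few lines of Python. The sub-pixel offsets here are made-up coordinates for illustration, not ATI's actual driver patterns:

```python
# Two hypothetical 2X sample-offset patterns (sub-pixel positions);
# the real patterns live in ATI's driver, not here.
PATTERN_A = [(0.25, 0.75), (0.75, 0.25)]
PATTERN_B = [(0.75, 0.75), (0.25, 0.25)]

def select_pattern(frame_index, fps, threshold=60.0):
    """Alternate AA sample patterns frame-to-frame, but fall back to
    a single fixed pattern when the frame rate drops below the
    threshold, where alternation would read as edge flicker instead
    of blending together on the display."""
    if fps < threshold:
        return PATTERN_A  # revert to plain, fixed 2X AA
    return PATTERN_A if frame_index % 2 == 0 else PATTERN_B

# Above the threshold, consecutive frames sample different positions,
# and display persistence averages them into an effective 4X result:
print(select_pattern(0, 90.0))
print(select_pattern(1, 90.0))
# Below it, every frame uses the same pattern:
print(select_pattern(0, 45.0) == select_pattern(1, 45.0))
```

The threshold check is the whole trick: the effective sample count doubles only while the frame rate is high enough for the eye to fuse consecutive frames.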
Temporal AA's other limitations aren't necessarily everyday drawbacks, but they present problems for the reviewer. For one, there should be little overhead for temporal AA, so an effective 8X temporal AA mode ought to perform about like a normal 4X AA mode. However, in order to work properly, temporal AA requires that vertical refresh sync be enabled, and with vsync enabled, testing performance becomes difficult. Basically, performance testing is limited to frame rates above 60 FPS and below a reasonable monitor refresh rate. I haven't yet devised a proper temporal AA performance test.
Also, capturing screenshots of temporal AA is hard. I think I can illustrate the effect, though. Have a look at the images below. The first two are bits of screenshots from UT2004, magnified to 4X their original size. The “averaged patterns” and “diff” shots were created in an image processing program.
|Pattern 1||Pattern 2||Averaged patterns||Diff between patterns 1 and 2|
The pattern 1 and 2 images are from ATI's 6X temporal AA mode, and you can see how the two sample patterns produce different results. When combined, even at 6X AA, the final result looks much smoother, as the averaged patterns image illustrates. To highlight the variance between the two patterns, I've also run a "diff" operation between the two sample images, and you can see in the result how the edge pixels are covered differently by the two patterns.
Here’s a larger picture showing the difference between two edge patterns. Unfortunately, the sky texture moves around in UT2004, so it shows up here, as well.
So there is good reason to vary the patterns, even with 6X AA.
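The "averaged patterns" and "diff" shots above came from an image processing program, but the operations themselves are simple per-pixel arithmetic. A small Python sketch on toy grayscale data (the pixel values are invented, not taken from the actual screenshots):

```python
def average_images(a, b):
    """Per-pixel average of two same-sized grayscale images (rows of
    intensity values) -- approximating what the eye sees when two
    temporal AA patterns alternate quickly."""
    return [[(pa + pb) / 2 for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def diff_images(a, b):
    """Per-pixel absolute difference; nonzero pixels mark where the
    two sample patterns covered an edge differently."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

# Tiny 2x3 "edge" renders from two hypothetical patterns:
frame1 = [[0, 128, 255], [0,  64, 255]]
frame2 = [[0, 192, 255], [0, 128, 255]]
print(average_images(frame1, frame2))  # edge pixels land between the two
print(diff_images(frame1, frame2))     # nonzero only along the edge
```

Fully covered and fully empty pixels diff to zero; only the partially covered edge pixels differ, which is exactly what the diff screenshot shows.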
In practice, temporal AA is nifty, but not perfect. With the right game and the right frame rates, 2X temporal AA looks for all the world like 4X AA. The difference is more subtle with higher AA modes, as one might expect, but temporal AA can be very effective. However, the 60 FPS cutoff seemed a little low on the Trinitron CRT I used for testing; I could see flicker on high-contrast edges from time to time, and I didn't like it. Also, the 60 FPS cutoff causes temporal AA to switch on and off periodically during gameplay, and the difference between 2X "regular" and 2X temporal (or 4X effective) AA is large enough that the transition distracted me.
That said, if I were using a Radeon X800 in my main gaming rig, I would probably enable temporal AA and use it, especially if I were already using 4X AA or the like. The Radeon X800 cards tend to spend enough time above 60 FPS to make it very useful.
We’ll start off with non-AA images, just to establish a baseline.
2X AA shows little difference between the cards.
As we’ve noted before, ATI’s gamma-correct blending seems to do a slightly better job eliminating stairstep jaggies on near-horizontal and near-vertical lines.
Here the difference becomes more apparent. Although NVIDIA’s new 8xS mode looks very good, ATI’s 6X mode is more effective than NVIDIA’s 8X mode, thanks to gamma-correct blends. Since ATI’s 6X mode also performs better, that’s quite the advantage. Add in temporal AA, and ATI has a pronounced lead in antialiasing overall right now.
However, I should mention that NVIDIA’s 8xS mode does do more than just edge AA, and there are image quality benefits to supersampling that likely won’t show up in a still screenshot.
With each of the graphics cards installed and running, I used a watt meter to measure total system power draw, with the test system configured as described in our Testing Methods section. The monitor was plugged into a separate power source. I took readings with the system idling at the Windows desktop and with it running a real-time HDR lighting demo at 1024×768 full-screen.
The Radeon X800 Pro actually pulls less juice than the Radeon 9800 XT, believe it or not, and the Radeon X800 XT PE only requires a little more power than the 9800 XT. At idle, the new ATI GPUs consume much less power than the 9800 XT, as well. The GeForce 6800 GT looks reasonably good on the power consumption front, too, but the GeForce 6800 Ultra Extreme is another story. This card runs hot and loud, its cooler blowing a notch faster than the other cards' even when it's just sitting idle.
High dynamic range rendering
Finally, I've got to include just one shot from the "dribble" demo, just so you can see what the frame rate counter's showing on the Radeon X800 XT Platinum Edition. You can compare frame rate counters from the other cards here.
ATI has really raised the performance bar dramatically with this chip. Getting three times the performance of a Radeon 9800 XT in this shader-intensive demo is no small thing.
The Radeon X800 series cards perform best in some of our most intensive benchmarks based on newer games or requiring lots of pixel shading power, including Far Cry, Painkiller, UT2004, and 3DMark03's Mother Nature scene, especially at high resolutions with edge and texture antialiasing enabled. The X800s also have superior edge antialiasing: their 6X multisampling mode reduces edge jaggies better than NVIDIA's 8xS mode, and the presence of temporal antialiasing only underscores ATI's leadership here. With a single-slot cooler, one power connector, and fairly reasonable power requirements, the Radeon X800 XT Platinum Edition offers all its capability with less inconvenience than NVIDIA's GeForce 6800 Ultra. What's more, ATI's X800 series will be in stores first, with a more mature driver than NVIDIA currently has for the GeForce 6800 line. The folks at ATI have improved mightily on the R300 design with the R420, successfully delivering the massive performance leap necessary to keep pace with NVIDIA's new GPUs. The demo team's achievement with the Ruby demo is a heckuva reminder that ATI knows what it's doing with DirectX 9-class graphics, and a strong argument that the X800's new, longer shader instruction limits make possible much higher quality real-time graphics than anything we've seen from game developers yet.
However, NVIDIA’s GeForce 6800 cards are no pushovers this time around. The GeForce 6 cards are faster in OpenGL, in many older games, and in Prince of Persia: The Sands of Time. ShaderMark 2.0 is very close, too, proving that NVIDIA’s new pixel shaders are very capable, even with a distinct clock speed deficit. The GeForce 6800 GPUs have some natural advantages, including support for Shader Model 3.0 with longer shader programs, dynamic flow control, and FP16 framebuffer blending and texture filtering. Down the road, these capabilities could prove useful for creating advanced visual effects with the highest possible fidelity.
Right now, though, NVIDIA needs to concentrate on getting some basics right. The NV40 is a novel chip architecture, and its drivers are very much in the beta stages. We’d like to see better results in newer titles like Far Cry, antialiasing blends that account for display gamma, and a consistent means of banishing “brilinear” filtering optimizations. Ideally, NVIDIA would make “brilinear” an option but not the default; the GeForce 6800 series is too good and too fast to need this crutch. It’s possible NVIDIA will have worked out all of these problems by the time GeForce 6800 cards arrive in stores.
At present, ATI appears to be slightly ahead of NVIDIA, but its superiority isn’t etched indelibly in silicon the way it was in the last generation of GPUs. The GeForce 6800 is an extremely capable graphics chip, and we don’t know yet how good it may become. Whatever happens, you can see why I said this generation of GPUs presents us with a choice between better and best. These cards are all killer performers, and having seen Far Cry running on them fluidly, I can actually see the logic in parting with four or five hundred bucks in order to own one.