ATI’s Radeon 9800 Pro graphics card

ATI’s Radeon 9700 Pro has been a resounding success, capturing the lead in both graphics technology and performance for ATI upon its introduction last fall and holding the crown to this day, at least among products shipping in volume. NVIDIA’s GeForce FX 5800 Ultra may have captured at least part of the technology and performance titles for itself, but the cards are so rare that we haven’t even been able to secure one for review.

The new Radeon 9800 Pro is all about solidly winning the graphics lead for ATI, and these cards are set to hit store shelves this month—quite possibly before any of the high-end GeForce FX cards arrive. With fast 256-bit DDR memory, improved pixel shaders, and more efficient use of memory bandwidth, the Radeon 9800 Pro looks to be the new king of the hill. Read on as we examine the 9800 Pro in detail, exploring the performance and technology behind ATI’s latest and greatest.

The R350 VPU debuts
The Radeon 9800 Pro is based on the chip code-named R350. R350 is, as you might expect, derived from ATI’s R300 chip, which powers ATI’s Radeon 9500 and 9700 lineups. We’ve already reviewed the Radeon 9700 Pro in some depth, and I will try to avoid repeating myself here. If you want to understand the technology from which the R350 is derived, please read our 9700 review.

The key things you need to know about the R350 chip are fairly basic. Like the R300, the R350 is manufactured using 0.15-micron process tech, and like the R300, it has 8 pipelines with one texture unit per pipe. The R350’s 256-bit DDR memory interface runs at a higher clock speed, which allows the chip to have even more memory bandwidth than the Radeon 9700 Pro. The Radeon 9800 Pro will debut with an effective 680MHz memory clock speed, which gives it a very healthy 21.8GB/s of memory bandwidth. The Radeon 9700 Pro, by contrast, topped out at 19.8GB/s.
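The bandwidth figures follow from simple arithmetic: peak bandwidth is the effective (DDR) memory clock times the bus width in bytes. Here is a minimal Python sketch of that calculation, assuming 1GB = 10^9 bytes, which is the convention the numbers above use:

```python
def peak_bandwidth_gbps(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8          # 256 bits -> 32 bytes
    transfers_per_sec = effective_clock_mhz * 1e6    # effective (DDR) transfer rate
    return transfers_per_sec * bytes_per_transfer / 1e9

r9800 = peak_bandwidth_gbps(680, 256)   # Radeon 9800 Pro
r9700 = peak_bandwidth_gbps(620, 256)   # Radeon 9700 Pro
print(f"9800 Pro: {r9800:.2f} GB/s")         # 21.76
print(f"9700 Pro: {r9700:.2f} GB/s")         # 19.84
print(f"Increase: {r9800 / r9700 - 1:.1%}")  # 9.7%
```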

No, a gain of roughly 10% in memory bandwidth isn’t huge, but it’s not bad. ATI has achieved more throughput than any other consumer graphics chip, and it has done so without resorting to a Dustbuster appendage like the GeForce FX 5800 Ultra’s cooler. Hard to argue with that.

ATI has taken several measures to help the R350 put its memory bandwidth to good use. The R350 runs at 380MHz, while the R300 peaked at 325MHz in the Radeon 9700 Pro. The company has also tuned the chip’s memory controller to better arbitrate reads and writes under heavy load, which should especially help performance when rendering antialiased pixels. Finally, the R350 has an improved cache for Z-buffer reads and writes, which eases the bandwidth-intensive task of handling pixel depth information (that is, info about each pixel’s position on the Z axis). ATI says this cache has been optimized to work better with stencil buffer data, which should help when developers use stencil shadow volumes to create shadowing effects in future games like Doom III.

Your new graphics catch phrase: F-buffer
The most significant piece of new technology in the R350, however, is more than a simple performance tweak. One of the NVIDIA GeForce FX’s key advantages over the Radeon 9700 is its ability to execute pixel shader programs as long as 1024 instructions. The pixel shaders on the R300 chip are limited to programs of 64 instructions, which simply isn’t enough to create some of the more compelling shader effects developers might want to use. To produce more complex effects, the R300 has to resort to multi-pass rendering. Multi-pass rendering is nifty because it overcomes a lot of technical limitations, but it’s a performance killer because it repeats lots of work, such as processing the scene’s geometry again, on every pass. Essentially, to provide really complex shader effects in real time, you want to avoid making multiple rendering passes, at least in the traditional sense of full passes through the GPU pipeline. The GeForce FX can run such long programs in a single pass; the R300 can’t.

To understand why all of this multi-pass stuff matters and to get a sense why I get all hot and bothered when talking about DirectX 9-class hardware, go read my article about such things. There, I identified pixel shader program lengths as a noteworthy advantage for NVIDIA way back in August.

ATI has addressed the R300’s pixel shader limitations in R350 by implementing something called an F-buffer. (ATI says the “F” stands for “fragment stream FIFO buffer,” in case you were wondering.) The R350’s F-buffer allows it to execute pixel shader programs of arbitrary instruction lengths, more than bringing it on par with NVIDIA’s GeForce FX. The genesis of the F-buffer idea was a paper by William R. Mark and Kekoa Proudfoot at Stanford University. Mark and Proudfoot suggested the F-buffer as a means of storing intermediate results of rendering passes without writing each pixel to the frame buffer and taking another trip through the graphics pipeline.
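To make the idea concrete, here is a toy Python model of F-buffer-style multi-pass shading. It is a sketch under stated assumptions, not ATI’s hardware design: each rasterized fragment gets its own FIFO entry, each pass consumes the entries in the order they were written and appends its results to a new FIFO, and nothing touches the framebuffer until the last pass. The `Fragment` record and the pass functions are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

@dataclass
class Fragment:
    """Hypothetical fragment record: pixel position plus an intermediate shader value."""
    x: int
    y: int
    value: float

def run_with_fbuffer(fragments: List[Fragment],
                     passes: List[Callable[[float], float]]) -> List[Fragment]:
    fifo: Deque[Fragment] = deque(fragments)   # rasterizer output, in order
    for shade in passes:
        next_fifo: Deque[Fragment] = deque()
        while fifo:
            frag = fifo.popleft()              # read back in the order written
            next_fifo.append(Fragment(frag.x, frag.y, shade(frag.value)))
        fifo = next_fifo                       # intermediates never hit the framebuffer
    return list(fifo)                          # only now would results be blended/written

# Two fragments land on pixel (0, 0); each keeps its own intermediate value.
frags = [Fragment(0, 0, 1.0), Fragment(0, 0, 2.0), Fragment(1, 0, 3.0)]
out = run_with_fbuffer(frags, [lambda v: v + 1, lambda v: v * 2])
print([f.value for f in out])   # [4.0, 6.0, 8.0]
```

The key property is that storage is per fragment rather than per pixel, so overlapping fragments don’t overwrite each other’s intermediate results.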


[Figure omitted. Source: ATI]

Storing intermediate results in a FIFO buffer not only offers the potential for big performance increases over traditional multi-pass techniques, it also sidesteps a number of problems. For instance, multi-pass rendering doesn’t handle transparent or translucent surfaces particularly well. In this case, the chip must perform a color blend operation before writing the pixel to the framebuffer, which can cause problems with the look of the final, rendered output. The F-buffer, however, can store both foreground and background pixel fragments and perform additional operations on them both—no blend ops needed between passes.
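A toy numeric example shows why. Assume two fragments cover the same pixel, an opaque background and a 50%-transparent foreground, and a two-pass shader whose second pass is non-linear. The pass functions and values are made up for illustration. With framebuffer-based multi-pass, the pixel has one storage slot, so the fragments’ pass-one results get blended before pass two runs; with per-fragment FIFO storage, each fragment finishes its shader before a single final blend.

```python
ALPHA = 0.5  # hypothetical opacity of the foreground fragment

def blend(src: float, dst: float, alpha: float = ALPHA) -> float:
    """Standard 'over' blend of a translucent source onto a destination."""
    return alpha * src + (1 - alpha) * dst

def pass1(v: float) -> float:
    return v + 0.25          # hypothetical first half of the shader

def pass2(v: float) -> float:
    return v * v             # hypothetical non-linear second half

background, foreground = 0.2, 0.6

# What we want: shade each fragment completely, then blend once.
reference = blend(pass2(pass1(foreground)), pass2(pass1(background)))

# Framebuffer multi-pass: one slot per pixel forces a blend between passes.
framebuffer_slot = blend(pass1(foreground), pass1(background))
framebuffer_result = pass2(framebuffer_slot)

# F-buffer style: one FIFO entry per fragment, blend only after the last pass.
fifo = [pass1(background), pass1(foreground)]
fifo = [pass2(v) for v in fifo]
fbuffer_result = blend(fifo[1], fifo[0])

print(round(reference, 4), round(framebuffer_result, 4), round(fbuffer_result, 4))
# 0.4625 0.4225 0.4625
```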

The F-buffer approach does have some limitations, but they aren’t show-stoppers, from what I gather. However, as with traditional multi-pass approaches, pixel shader programs will have to be structured to account for the GPU’s per-pass rendering limitations.

Of course, in the new worlds of DirectX 9 and OpenGL 2.0, such things generally ought to be handled by compilers. Shader programs will largely be written in high-level shading languages like MS’s HLSL and broken down into passes by a runtime compiler. With high-level shading languages, developers need not think much about the hardware’s per-pass limitations.
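As a rough illustration of the bookkeeping such a compiler has to do, here is a Python sketch that chops a straight-line shader into passes of at most `max_len` instructions and works out which intermediate registers must be saved across a pass boundary. The instruction format, register names, and greedy splitting strategy are all assumptions for illustration; real shader compilers also weigh texture-fetch limits, register counts, and cost. Under this naive scheme, a 200-instruction shader with a 64-instruction limit would need four passes.

```python
from typing import List, Set, Tuple

# Hypothetical instruction format: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]

def split_into_passes(program: List[Instr], max_len: int) -> List[List[Instr]]:
    """Greedily cut a straight-line program into chunks of at most max_len instructions."""
    return [program[i:i + max_len] for i in range(0, len(program), max_len)]

def saved_between(passes: List[List[Instr]], boundary: int) -> Set[str]:
    """Registers computed before pass `boundary` that later passes still read,
    found with a backward liveness scan over the remaining instructions."""
    later = [ins for p in passes[boundary:] for ins in p]
    live: Set[str] = set()
    for dst, srcs in reversed(later):
        live.discard(dst)          # overwritten here, so the earlier value is dead
        live.update(srcs)          # read here, so it must be available
    computed_before = {dst for p in passes[:boundary] for dst, _ in p}
    # Assumption: anything else (e.g. t0, t1) is an input re-supplied on every pass.
    return live & computed_before

program: List[Instr] = [
    ("r0", ("t0",)),
    ("r1", ("r0", "t1")),
    ("r2", ("r1",)),
    ("r3", ("r0", "r2")),
]
passes = split_into_passes(program, max_len=2)
print(len(passes), sorted(saved_between(passes, 1)))   # 2 ['r0', 'r1']
```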

Honestly, I didn’t expect ATI to address the R300’s 64-instruction pixel shader limit with this “half-generation” refresh chip, but it apparently has. The jury is still out on how the R350’s approach compares to NVIDIA’s, mainly because we don’t yet know enough about how the GeForce FX handles long shader programs. My sense is that the NV30 chip in the GeForce FX offers a little more complexity and flexibility than the F-buffer approach, but the real-world differences in performance and rendering output are likely to be minor.

In all, the F-buffer is a crucial enhancement to the R350 that reasserts ATI’s technology leadership in graphics. The concept is fundamentally simple, as many good innovations in computers are, but the impact of the change is profound.

 

The Radeon 9800 Pro card
OK, enough of the egghead graphics geek stuff. Let’s check out the new hardware. Here are some beauty shots of the card:

The 9800 Pro card isn’t wildly different from the Radeon 9700 Pro. Many of the same bits are in the same places, although the board itself is about a half-inch longer than the 9700 cards. All the standard graphics card stuff is there, including VGA, DVI, and TV-Out ports. One especially welcome addition is that cute little heatsink on the card’s voltage regulator. The metal plate on the back side of the original 9700 cards tended to get very hot, and the heatsink seems like a better design.

Remarkably, the card’s main heatsink and cooler is actually smaller than the stock ATI cooler on the 9700 Pro. The unit’s fan has more blades at a sharper angle than the 9700’s fan, but noise levels seem largely unchanged. As with the 9700, the Radeon 9800 Pro is barely audible amidst the noise generated by a standard CPU heatsink/fan combo and a power supply fan.

ATI uses Samsung DDR memory chips on the 9800 Pro, and unlike the 9700, the 9800 card uses a four-pin Molex-type connector for auxiliary power. These connectors are more abundant in most systems than the smaller floppy-drive power connector on the 9700 cards, so the change is welcome.

Now that we’ve ogled the card, let’s see how it performs.

 

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.

Our test system was configured like so:

| Component | Configuration |
|---|---|
| Processor | Athlon XP ‘Thoroughbred’ 2600+ 2.083GHz |
| Front-side bus | 333MHz (166MHz DDR) |
| Motherboard | Asus A7N8X Deluxe |
| Chipset | NVIDIA nForce2 |
| North bridge | nForce2 SPP |
| South bridge | nForce2 MCP-T |
| Chipset drivers | 2.03 |
| Memory size | 512MB (2 DIMMs) |
| Memory type | Corsair XMS3200 PC2700 DDR SDRAM (333MHz) |
| Sound | nForce2 APU |
| Storage | Maxtor DiamondMax Plus D740X 7200RPM ATA/100 hard drive |
| OS | Microsoft Windows XP Professional |
| OS updates | Service Pack 1, DirectX 9.0 |

The test system’s Windows desktop was set at 1024×768 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used Catalyst revision 7.84 drivers for the Radeon 9800 Pro and Catalyst 3.1 (7.83) drivers for the Radeon 9700 Pro. For the NVIDIA cards, we used NVIDIA’s Detonator 42.68 drivers, which aren’t yet publicly available (wink, wink) but include optimizations for 3DMark03.

We used the following versions of our test applications:

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

Synthetic tests
We’ll kick off our review with a series of synthetic tests, so we can see how the Radeon 9800 Pro’s enhancements have improved specific kinds of performance versus the 9700 Pro. After that, we’ll use more general application-based tests to gauge real-world performance.

Fill rate
First and foremost in graphics performance, as always, is fill rate. We are talking raw, pixel-pushing power here, and in this case, specs matter. Specifications aren’t destiny for a GPU, but they are a full-ride scholarship to the college of its choice plus free tutoring whenever needed. In this department, the 9800 is especially well-endowed. Behold, the trusty chip chart, sorted conveniently in order of memory bandwidth:

| Chip | Core clock (MHz) | Pixel pipelines | Peak fill rate (Mpixels/s) | Texture units per pipeline | Peak fill rate (Mtexels/s) | Memory clock (MHz, effective) | Memory bus width (bits) | Peak memory bandwidth (GB/s) |
|---|---|---|---|---|---|---|---|---|
| Radeon 9600 | 325 | 4 | 1300 | 1 | 1300 | 400 | 128 | 6.4 |
| GeForce4 Ti 4200 8X | 250 | 4 | 1000 | 2 | 2000 | 512 | 128 | 8.2 |
| Radeon 9500 | 275 | 4 | 1100 | 1 | 1100 | 540 | 128 | 8.6 |
| Radeon 9500 Pro | 275 | 8 | 2200 | 1 | 2200 | 540 | 128 | 8.6 |
| Radeon 9600 Pro | 400 | 4 | 1600 | 1 | 1600 | 600 | 128 | 9.6 |
| GeForce4 Ti 4600 | 300 | 4 | 1200 | 2 | 2400 | 650 | 128 | 10.4 |
| GeForce FX 5800 | 400 | 4 | 1600 | 2 | 3200 | 800 | 128 | 12.8 |
| GeForce FX 5800 Ultra | 500 | 4 | 2000 | 2 | 4000 | 1000 | 128 | 16.0 |
| Radeon 9700 | 275 | 8 | 2200 | 1 | 2200 | 540 | 256 | 17.3 |
| Parhelia-512 | 220 | 4 | 880 | 4 | 3520 | 550 | 256 | 17.6 |
| Radeon 9700 Pro | 325 | 8 | 2600 | 1 | 2600 | 620 | 256 | 19.8 |
| Radeon 9800 Pro | 380 | 8 | 3040 | 1 | 3040 | 680 | 256 | 21.8 |

The 9800 Pro is the pimp-daddy of pixel-pushing prowess, with over 3 gigapixels per second of single-textured fill rate. With two-layer multitexturing in the picture, the GeForce FX 5800 Ultra has a higher peak texel fill rate, but the FX may lack the memory bandwidth to keep up with the 9800 Pro. In the special case of four-layer multitexturing, Matrox’s Parhelia leads the pack, but theory and reality rarely collide there.
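The peak figures in the chart are simple products: pixel fill rate is core clock times pixel pipelines, and texel rate multiplies that by texture units per pipe. One rough way to see the bandwidth point is to divide peak bandwidth by the pixel rate each chip can sustain with two texture layers, assuming a chip with one texture unit per pipe must loop each pixel through the pipe twice (halving its pixel rate). This is a back-of-envelope Python sketch using the chart’s numbers; it ignores texture caches, compression, and everything else that makes real chips more efficient.

```python
import math

def multitex_pixel_rate(core_mhz: float, pipes: int, tmus_per_pipe: int, layers: int) -> float:
    """Peak Mpixels/s when every pixel needs `layers` textures (assumes loopback)."""
    clocks_per_pixel = math.ceil(layers / tmus_per_pipe)
    return core_mhz * pipes / clocks_per_pixel

def bytes_per_pixel(bandwidth_gbps: float, mpixels_per_sec: float) -> float:
    """Peak memory bandwidth available per output pixel."""
    return bandwidth_gbps * 1e9 / (mpixels_per_sec * 1e6)

# From the chart: (core MHz, pixel pipes, texture units per pipe, peak GB/s)
cards = {
    "Radeon 9800 Pro": (380, 8, 1, 21.8),
    "GeForce FX 5800 Ultra": (500, 4, 2, 16.0),
}
for name, (core, pipes, tmus, bw) in cards.items():
    rate = multitex_pixel_rate(core, pipes, tmus, layers=2)
    print(f"{name}: {rate:.0f} Mpixels/s, {bytes_per_pixel(bw, rate):.1f} bytes/pixel")
```

Under these assumptions the FX has about 8 bytes per pixel to work with, which a 32-bit color write plus a 32-bit Z read would already consume, versus roughly 14 bytes for the 9800 Pro.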

Overall, the fill rate increase (and corresponding memory bandwidth increase) from the 9700 Pro to the 9800 Pro is fairly modest. However, since the 9800 Pro supplants the 9700 Pro at the same price point, the gains are quite welcome.

Here’s how the 9800 Pro measured up in our synthetic tests.

The 9800 Pro nicely distances itself from its older sibling, and it absolutely crushes the previous-gen NVIDIA card in single-textured fill rate. With multitexturing, the 9800 Pro is still quite potent.

Diss enjoys playing theoretical versus actual with fill rate, so we can see how memory bandwidth limitations and other mitigating factors keep the chips from reaching their peak potential. (Of course, with synthetic tests, those limitations don’t always show up quite as much as they can in real applications.) I’ll indulge him. Here’s the dirt.

The 9800 Pro stays a ways below its theoretical peak pixel fill rate with single texturing, but when multiple textures are required, it’s able to achieve near-perfection in the 3DMark fill rate test.

Before we go on, I should note that for a really high-end card like the 9800 Pro, the most important sorts of fill rate performance aren’t the straight-up numbers we’re measuring here. The 9800 Pro can deliver amazing performance with anisotropic filtering and edge antialiasing enabled, which is where a card like this really excels. We will look at antialiasing and texture filtering performance later in the review, but I wanted to mention that fact now. The R350 GPU employs sophisticated techniques like 6:1 color compression to make AA performance especially smooth. In fact, our entire test suite should probably be revised to better account for antialiasing and the like, but we didn’t have time to make those revisions since the 9800 Pro arrived in our labs just this past Saturday.
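ATI hasn’t detailed how its color compression works, but one simple way to get a 6:1 best case with 6X multisampling is to notice that pixels fully inside one triangle have six identical samples, which can be stored once. The Python sketch below assumes that all-or-nothing scheme; edge pixels, whose samples differ, are stored uncompressed.

```python
from typing import List, Tuple

Color = Tuple[int, int, int, int]   # one RGBA8 sample

def stored_samples(samples: List[Color]) -> int:
    """Samples actually written: 1 if all match, otherwise all of them."""
    return 1 if all(s == samples[0] for s in samples) else len(samples)

def compression_ratio(pixels: List[List[Color]]) -> float:
    raw = sum(len(p) for p in pixels)
    stored = sum(stored_samples(p) for p in pixels)
    return raw / stored

red, blue = (255, 0, 0, 255), (0, 0, 255, 255)
interior = [red] * 6                 # pixel fully covered by one triangle
edge = [red] * 3 + [blue] * 3        # pixel straddling a triangle edge

print(compression_ratio([interior] * 10))           # 6.0
print(compression_ratio([interior] * 9 + [edge]))   # 4.0
```

In this model the ratio falls as more pixels sit on triangle edges, so the 6:1 figure is a best case.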

Occlusion detection
VillageMark tests fill rate, including the ability of a GPU to avoid drawing pixels that won’t make it onscreen—a.k.a. overdraw. These pixels are generally situated behind others in a scene and are therefore obscured. The R3x0 chips use a technique called Early Z to “virtually eliminate” overdraw. VillageMark renders a scene with gobs of overdraw, and it’s up to the cards to cope.
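Here is a toy Python model of why testing depth early saves work. It assumes a single per-fragment depth test and ignores the block-based tricks real hardware uses; it simply counts how many fragments reach the pixel shader. Early Z helps most when nearer surfaces are drawn first; drawn back to front, every fragment passes the depth test anyway.

```python
from typing import List, Tuple

# Hypothetical fragment: (pixel index, depth); smaller depth = closer to the viewer.
Fragment = Tuple[int, float]

def shaded_count(frags: List[Fragment], num_pixels: int, early_z: bool) -> int:
    """Count fragments that run the pixel shader, with or without an early depth test."""
    zbuf = [float("inf")] * num_pixels
    shaded = 0
    for pixel, depth in frags:
        visible = depth < zbuf[pixel]
        if early_z and not visible:
            continue                  # rejected before any shading work
        shaded += 1                   # shader runs (wasted work if not visible)
        if visible:
            zbuf[pixel] = depth       # keep the nearest depth seen so far
    return shaded

front_to_back = [(0, 1.0), (0, 2.0), (0, 3.0), (0, 4.0)]   # 4 layers on one pixel
back_to_front = list(reversed(front_to_back))

print(shaded_count(front_to_back, 1, early_z=False))  # 4
print(shaded_count(front_to_back, 1, early_z=True))   # 1
print(shaded_count(back_to_front, 1, early_z=True))   # 4
```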

The 9800 comes through looking good, although it doesn’t appear to be wildly more efficient than the 9700 Pro. The GeForce4 Ti chip lacks Early Z, though it does have some overdraw-reduction abilities, and the disparity shows.

 

Pixel shader performance
Next up, we have a series of pixel shader tests to see how these cards handle fancy-pants effects. Unfortunately, only one of our tests uses the pixel shader 2.0 standard from DirectX 9, and none of the tests will flex the F-buffer at all. (DX9’s pixel shader 2.0 model caps programs at 64 arithmetic instructions, plus 32 texture instructions.) These tests should, however, show us generally how the cards’ pixel shaders will perform in current games.

Guess what? The 9800 Pro is very fast, and the old GF4 Ti nearly gets a hernia trying to keep up. You can see why it’s crucial for NVIDIA to get GeForce FX products to stores soon.

Now let’s try NVIDIA’s own ChameleonMark.

For DX8-class pixel shaders, ATI’s R3x0 chips are worlds ahead of the GF4 Ti. Let’s try what’s currently our only DX9 pixel shader test, from the new 3DMark03. Because it’s a DX8-class chip, the GeForce4 Ti will have to sit this one out.

The extra clock speed and memory bandwidth give the 9800 Pro a slight edge over its predecessor here.

 
Polygon throughput and vertex shader performance
Now we’ll try out some vertex shader and lighting tests to see what kind of pain the 9800 Pro can inflict on its competition in this department.

Like R300, the R350 exhibits amazing vertex shader throughput, more than doubling the performance of the GF4 Ti 4600. Oddly, the R300 matches the R350 in the 3DMark2001 SE test until fill rate considerations intrude at higher resolutions. The 3DMark03 test, however, shows the R350 to be faster.

Let’s look at legacy transform and lighting now. As ever, all these chips use a vertex shader program to emulate an older-style T&L unit.
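In rough terms, the fixed-function work such an emulation shader performs per vertex is a matrix transform plus a lighting calculation. The Python sketch below shows a minimal version with one directional light and only a Lambertian diffuse term; it assumes a row-major 4×4 matrix and pre-normalized vectors, and it skips the normal transform, specular, and attenuation that full fixed-function lighting includes.

```python
from typing import List, Sequence

Mat4 = Sequence[Sequence[float]]

def transform(m: Mat4, p: Sequence[float]) -> List[float]:
    """Row-major 4x4 matrix times the homogeneous point (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def diffuse(normal: Sequence[float], to_light: Sequence[float],
            light_color: Sequence[float]) -> List[float]:
    """Lambertian term: light color scaled by max(0, N . L)."""
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, to_light)))
    return [c * n_dot_l for c in light_color]

translate = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(transform(translate, (1.0, 2.0, 3.0)))               # [6.0, 2.0, 3.0, 1.0]
print(diffuse((0, 0, 1), (0, 0, 1), (1.0, 0.5, 0.25)))     # [1.0, 0.5, 0.25]
print(diffuse((0, 0, 1), (0, 0, -1), (1.0, 0.5, 0.25)))    # [0.0, 0.0, 0.0]
```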

You were expecting something else? The 9800 Pro again prevails.

 

Quake III Arena
Now to the real game tests. We tested with a Quake III demo from a CPL match involving my bud Fatal1ty, of course. It’s a long-ish demo for benchmarking, but it should be a nice test. You can grab the demo from our server here, at least until we find out the thing is copyrighted somehow.

The GF4 Ti card puts in a relatively strong showing in this older game, but as screen resolutions increase, the 9800 Pro asserts its superiority.

Comanche 4
The shader-aware Comanche 4 offers us a chance to test DirectX gaming performance.

Here’s a case where our test system, based on an Athlon XP 2600+ with a 333MHz front-side bus and dual-channel DDR333 memory, is entirely the limiting performance factor. Only at 1600×1200 does the GF4 Ti 4600 card begin to dip below about 45-47 frames per second, and only by a smidgen. Oh well. Not every game is limited by graphics card performance.
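A simplified way to picture this: if CPU and GPU work overlap, the slower of the two sets the frame time, so a faster graphics card stops mattering once the CPU is the bottleneck. The timings below are hypothetical, chosen only to illustrate the shape of the curve, not measured from our test system.

```python
def fps(cpu_ms_per_frame: float, gpu_ms_per_frame: float) -> float:
    """Frames per second when CPU and GPU work fully overlap (a simplification)."""
    return 1000.0 / max(cpu_ms_per_frame, gpu_ms_per_frame)

cpu_ms = 20.0                              # hypothetical CPU cost per frame
for gpu_ms in (5.0, 10.0, 20.0, 25.0):     # hypothetical: fast card/low res to slow card/high res
    print(f"GPU {gpu_ms:4.1f} ms -> {fps(cpu_ms, gpu_ms):.1f} fps")
```

Only the last case, where the GPU takes longer than the CPU, drops below the 50 fps ceiling.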

Codecreatures Benchmark Pro
The Codecreatures engine is also a DirectX 8-class application, so it may give us some insights into DX8 performance that Comanche 4 couldn’t.

The 9800 Pro at 1600×1200 nearly matches the GF4 Ti card at 1024×768, and the 9800 Pro shows some decent performance gains over the 9700 Pro.

 

Unreal Tournament 2003
Now for a really real DirectX 8-class game engine.

Once more, the 9800 Pro comes out on top.

 

Serious Sam SE
We wanted to really stress the 9800 Pro for once, so this time around, we used Serious Sam’s “Extreme Quality” add-on to set graphics options. This should be largely an apples-to-apples comparison from one card to the next, but the Radeon cards are doing 16X anisotropic filtering here, while the Ti 4600 is doing its maximum of 8X aniso. However, with the adaptive aniso algorithms that ATI and NVIDIA use, the difference between 8X and 16X aniso is very minor.
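Why is the gap so small? Adaptive schemes take only as many samples as a pixel’s texture footprint needs. ATI and NVIDIA haven’t published their exact algorithms, so the Python sketch below uses the reference formula from OpenGL’s EXT_texture_filter_anisotropic extension as a stand-in: the sample count is the ratio of the footprint’s long axis to its short axis, capped at the maximum aniso level. Only surfaces viewed at very steep angles produce ratios above 8, so only they look different at 16X.

```python
import math

def aniso_samples(dudx: float, dvdx: float, dudy: float, dvdy: float,
                  max_aniso: int) -> int:
    """Samples per pixel: N = min(ceil(Pmax / Pmin), max_aniso), where Px and Py are
    the lengths of the pixel's texel-space footprint along screen x and y."""
    px = math.hypot(dudx, dvdx)
    py = math.hypot(dudy, dvdy)
    p_max, p_min = max(px, py), min(px, py)
    if p_min == 0.0:
        return max_aniso            # degenerate footprint: fall back to the cap
    return min(math.ceil(p_max / p_min), max_aniso)

# Footprint 3 texels long by 1 wide: the same work at 8X and 16X.
print(aniso_samples(3.0, 0.0, 0.0, 1.0, 8), aniso_samples(3.0, 0.0, 0.0, 1.0, 16))    # 3 3
# A steeply angled surface, 12:1: this is where 16X differs.
print(aniso_samples(12.0, 0.0, 0.0, 1.0, 8), aniso_samples(12.0, 0.0, 0.0, 1.0, 16))  # 8 12
```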

The 9800 Pro barely pulls ahead of the 9700 Pro here, but both Radeons deliver Serious Sam at extremely high-quality graphics settings at over 60 fps at 1280×1024—an impressive accomplishment.

Let’s look at a timescale graph to see how much the cards’ performance varies over the course of our test demo.

As you can see, the 9800 Pro delivers more than just nice frame rate averages over time. Even in our demo’s worst-case scenarios, it runs well ahead of the GF4 Ti card.

 

3DMark2001 SE
For completeness, I’ve included 3DMark2001, which is a decent DirectX 8 performance test. I won’t comment much on the results, because there’s not much more to say.

Again, the 9800 Pro rolls, especially at high resolutions.

 

3DMark03
3DMark03 has become a bit controversial. Please see my article about the controversy to better understand the issues. Nonetheless, we’ll offer results from this benchmark, because it uses more advanced rendering techniques than most anything else we can use to test.

The 3DMark03 composite score above is just a weighted average of the four game test results below. Note that the GF4 Ti card can’t run game test 4, because that test requires DX9 compatibility, and the missing score drags down the GF4’s composite score.
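The effect of a missing test is easy to see with a weighted sum. The weights and frame rates below are hypothetical placeholders, not Futuremark’s actual formula and not our results; the point is only that a test a card can’t run contributes zero.

```python
from typing import Dict, Optional

def composite_score(fps: Dict[str, Optional[float]], weights: Dict[str, float]) -> float:
    """Weighted sum of game-test frame rates; an unsupported test (None) counts as 0."""
    return sum(w * (fps.get(test) or 0.0) for test, w in weights.items())

weights = {"GT1": 5.0, "GT2": 40.0, "GT3": 40.0, "GT4": 40.0}     # hypothetical
dx9_card = {"GT1": 100.0, "GT2": 20.0, "GT3": 20.0, "GT4": 15.0}  # hypothetical
dx8_card = {"GT1": 100.0, "GT2": 20.0, "GT3": 20.0, "GT4": None}  # can't run GT4

print(composite_score(dx9_card, weights))  # 2700.0
print(composite_score(dx8_card, weights))  # 2100.0
```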

Games 2 and 3 use stencil shadow volumes for shadowing, so the R350’s Z-cache optimizations come into play here. Nevertheless, the 9800 Pro isn’t much faster than the 9700 Pro in these tests.
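For reference, the core of the stencil shadow volume technique is a per-pixel counter updated as shadow-volume faces are drawn, which is why it leans so hard on depth and stencil traffic. The Python sketch below models the basic depth-pass (“z-pass”) variant for a single pixel; the input format is an assumption, and real engines often use a depth-fail variant instead to handle a camera inside a shadow volume.

```python
from typing import List, Tuple

# Hypothetical per-pixel input: shadow-volume faces covering this pixel,
# as (depth of the face at this pixel, True if the face points toward the camera).
VolumeFace = Tuple[float, bool]

def in_shadow_zpass(surface_depth: float, faces: List[VolumeFace]) -> bool:
    stencil = 0
    for face_depth, front_facing in faces:
        if face_depth < surface_depth:            # face passes the depth test
            stencil += 1 if front_facing else -1  # enter (+1) or leave (-1) a volume
    return stencil != 0                           # inside at least one volume

volume = [(2.0, True), (5.0, False)]   # one volume spanning depths 2 to 5
print(in_shadow_zpass(3.0, volume))    # True: surface is inside the volume
print(in_shadow_zpass(6.0, volume))    # False: enters and leaves before the surface
print(in_shadow_zpass(1.0, volume))    # False: surface is in front of the volume
```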

 

Edge antialiasing
Now for a real test of the 9800 Pro’s abilities. As resolutions and antialiasing sample counts go up, even the 9800 Pro should be strained.

Amazingly, the 9800 Pro never dips below 100 frames per second, even with 6X antialiasing at 1280×1024. In current and older games, the 9800 Pro should be able to run 6X AA all the time without noticeable penalties. And ATI’s gamma-corrected AA is the best-looking AA we’ve ever seen.
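Gamma-corrected AA matters because monitors don’t respond linearly to pixel values. The Python sketch below compares a plain average of six samples with a resolve that converts to linear light first, averages, and converts back. It assumes a simple power-law display gamma of 2.2; ATI’s actual correction curve isn’t described here.

```python
from typing import Sequence

GAMMA = 2.2   # assumed display gamma

def resolve_naive(samples: Sequence[float]) -> float:
    """Average the stored (gamma-encoded) sample values directly."""
    return sum(samples) / len(samples)

def resolve_gamma_correct(samples: Sequence[float], gamma: float = GAMMA) -> float:
    """Average in linear light, then re-encode for the display."""
    linear = [s ** gamma for s in samples]
    return (sum(linear) / len(linear)) ** (1.0 / gamma)

# An edge pixel half covered by a white triangle on a black background (6X AA).
edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(resolve_naive(edge))                     # 0.5
print(round(resolve_gamma_correct(edge), 3))   # 0.73
```

On a gamma-2.2 display, the naive value of 0.5 shows up at only about 22% brightness, so edge pixels come out too dark; the corrected value displays at half brightness, which is what half coverage should look like.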

Texture antialiasing
Another way to enjoy the benefits of the 9800 Pro’s performance is to increase visual quality through texture filtering. We’ve already looked at how texture filtering improves images, complete with screenshots, in our 9700 Pro review. The 9800 Pro’s output shouldn’t look any different from the 9700 Pro’s, so we’ll skip to the benches.

With texture filtering, the 9800 Pro scales like what it essentially is: a faster Radeon 9700.

 
Conclusions
We will have to buy, beg, borrow, or steal a GeForce FX 5800 Ultra card before we can say with complete confidence that the Radeon 9800 Pro is the fastest, most feature-complete graphics card anywhere, but judging by our experience with the card, the FX will be hard pressed to keep up. This thing is amazingly fast, and preliminary reviews of the GeForce FX 5800 Ultra show it tying with, or slightly outrunning, the Radeon 9700 Pro. By all rights, the 9800 Pro should be the fastest card on the planet.

Technology-wise, the addition of the F-buffer is the single most important enhancement I could imagine ATI making to its R300 chip. It will probably be a very long time before we see the benefits of the F-buffer exercised in real games, but on the Quadro/FireGL (workstation) side of the house, this improvement should be widely used. Importantly, ATI has found an adequate response to the GeForce FX’s ability to run longer shader programs. Perhaps the only significant difference left between the two is maximum floating-point pixel shader precision, where the FX has the edge: 128 bits versus 96 bits.
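The 128-bit versus 96-bit figures describe four color components at 32 bits (FP32) versus 24 bits (FP24) each. FP24 is commonly described as 1 sign bit, 7 exponent bits, and 16 mantissa bits, against FP32’s 23 mantissa bits. The Python sketch below rounds a value to a given mantissa width to show what those extra bits buy; it ignores exponent range and denormals, so treat it as an approximation of the hardware formats, not a bit-exact model.

```python
import math

def round_to_mantissa(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with `mantissa_bits` stored fraction bits
    (plus the implicit leading 1), ignoring exponent range limits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x == m * 2**e, with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)
    return round(m * scale) / scale * 2.0 ** e   # round() is round-half-even

x = 1.0 + 2.0 ** -20
print(round_to_mantissa(x, 16) == 1.0)   # True: FP24-style rounding loses the increment
print(round_to_mantissa(x, 23) == x)     # True: FP32-style rounding keeps it
```

Put another way, values near 1.0 are spaced about 2^-16 apart with 16 mantissa bits and 2^-23 apart with 23, a difference that only shows up after long chains of dependent shader math.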

At $399, the Radeon 9800 Pro 128MB isn’t cheap. However, it may well arrive in stores before the GeForce FX 5800 line does, which would be a major coup for ATI. Retail availability is planned for later this month, and ATI’s ability to deliver a test card to us today may be a testament to its ability to build these cards in volume. NVIDIA is still unable to do so.

ATI is planning an “amateur” Radeon 9800 for $349 with slightly lower clock speeds (the exact speeds haven’t been finalized yet) and 128MB of memory. That version of the R350 should be available some time in the second quarter of this year. For the price of a small house, you’ll also be able to pick up a Radeon 9800 Pro card sporting 256MB of DDR-II memory. ATI claims this will be the first gamers’ card with 256MB of memory, and the new DDR-II memory should offer even more bandwidth than the current 9800 Pro’s.

If you’re a hard-core gamer who feeds on eye candy and can’t get enough antialiasing, high resolutions, and texture filtering, it’s time to put a kidney for sale on eBay. The Radeon 9800 Pro is coming, and you won’t want to miss it. 
