
ATI's Radeon 9800 Pro graphics card

Time to hock a kidney, kids
— 12:00 AM on March 6, 2003

ATI'S RADEON 9700 PRO HAS been a resounding success, capturing the leads in both graphics technology and performance for ATI upon its introduction last fall, and holding the crown to today—at least in terms of products shipping in volume. NVIDIA's GeForce FX 5800 Ultra may have captured at least part of the technology and performance titles for itself, but the cards are so rare, we haven't even been able to secure one for review.

The new Radeon 9800 Pro is all about solidly winning the graphics lead for ATI, and these cards are set to hit store shelves this month—quite possibly before any of the high-end GeForce FX cards arrive. With fast 256-bit DDR memory, improved pixel shaders, and more efficient use of memory bandwidth, the Radeon 9800 Pro looks to be the new king of the hill. Read on as we examine the 9800 Pro in detail, exploring the performance and technology behind ATI's latest and greatest.

The R350 VPU debuts
The Radeon 9800 Pro is based on the chip code-named R350. R350 is, as you might expect, derived from ATI's R300 chip, which powers ATI's Radeon 9500 and 9700 lineups. We've already reviewed the Radeon 9700 Pro in some depth, and I will try to avoid repeating myself here. If you want to understand the technology from which the R350 is derived, please read our 9700 review.

The key things you need to know about the R350 chip are fairly basic. Like the R300, the R350 is manufactured using 0.15-micron process tech, and like the R300, it has 8 pipelines with one texture unit per pipe. The R350's 256-bit DDR memory interface runs at a higher clock speed, which allows the chip to have even more memory bandwidth than the Radeon 9700 Pro. The Radeon 9800 Pro will debut with an effective 680MHz memory clock speed, which gives it a very healthy 21.8GB/s of memory bandwidth. The Radeon 9700 Pro, by contrast, topped out at 19.8GB/s.
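For the curious, those bandwidth numbers fall out of simple arithmetic: effective memory clock times the width of the bus in bytes. The short Python sketch below reproduces them. Note that the 620MHz effective memory clock for the Radeon 9700 Pro isn't stated above; it is an assumption inferred from the 19.8GB/s figure, and the function name is purely illustrative.

```python
def memory_bandwidth_gb_s(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8          # 256 bits -> 32 bytes
    return effective_clock_mhz * 1e6 * bytes_per_transfer / 1e9


# 680MHz effective clock and 256-bit bus are from the text above.
r9800_pro = memory_bandwidth_gb_s(680, 256)

# ASSUMPTION: 620MHz effective clock, inferred from the quoted 19.8GB/s.
r9700_pro = memory_bandwidth_gb_s(620, 256)

gain_pct = (r9800_pro / r9700_pro - 1) * 100

print(f"Radeon 9800 Pro: {r9800_pro:.1f} GB/s")
print(f"Radeon 9700 Pro: {r9700_pro:.1f} GB/s")
print(f"Gain: {gain_pct:.1f}%")
```

These are theoretical peaks, of course; real-world throughput depends on how well the memory controller keeps the bus busy.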

No, that's not a huge gain in terms of bandwidth overall, but it's not bad. The R350 has more memory bandwidth at its disposal than any other consumer graphics chip, and ATI has managed that without resorting to a Dustbuster appendage. Hard to argue with that.

ATI has taken several measures to allow the R350 to put its memory bandwidth to good use. The clock speed of the R350 chip is 380MHz, while the R300 peaked at 325MHz in the Radeon 9700 Pro. Also, the company has tuned the chip's memory controller to better arbitrate reads and writes during heavy use, which should especially help performance when rendering antialiased pixels. Finally, the R350 has an improved cache for Z-buffer reads and writes, to aid in the bandwidth-intensive task of handling pixel depth information. (That is, info about a pixel's position on the Z axis.) ATI says this cache has been optimized to work better with stencil buffer data, which should help when developers use stencil shadow volumes to create shadowing effects in future games like Doom III.
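To put the clock speed bump in perspective, here's a back-of-the-envelope sketch of theoretical peak pixel fill rate. It assumes each of the eight pipelines writes one pixel per clock, which is an idealization, and the helper name is made up for illustration. It also divides the peak memory bandwidth figures quoted earlier by the fill rate to estimate how many bytes of bandwidth are available per pixel at peak.

```python
PIPELINES = 8  # both R300 and R350, per the text


def peak_fill_rate_mpixels(core_clock_mhz: float, pipelines: int = PIPELINES) -> float:
    """Theoretical peak fill rate in Mpixels/s, assuming one pixel per pipe per clock."""
    return core_clock_mhz * pipelines


r350_fill = peak_fill_rate_mpixels(380)   # Radeon 9800 Pro
r300_fill = peak_fill_rate_mpixels(325)   # Radeon 9700 Pro

clock_gain_pct = (380 / 325 - 1) * 100

# Peak memory bandwidth figures as quoted in the article (GB/s -> bytes/s).
r350_bytes_per_pixel = 21.8e9 / (r350_fill * 1e6)
r300_bytes_per_pixel = 19.8e9 / (r300_fill * 1e6)

print(f"R350: {r350_fill:.0f} Mpixels/s, {r350_bytes_per_pixel:.2f} bytes/pixel")
print(f"R300: {r300_fill:.0f} Mpixels/s, {r300_bytes_per_pixel:.2f} bytes/pixel")
print(f"Core clock gain: {clock_gain_pct:.1f}%")
```

By this rough measure, the core clock has grown faster than memory bandwidth, so the R350 has slightly less bandwidth per peak pixel than the R300 did. That would be consistent with ATI spending effort on the memory controller and Z cache, though the real picture depends on much more than peak numbers.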

Your new graphics catch phrase: F-buffer
The most significant piece of new technology in the R350, however, is more than a simple performance tweak. One of the NVIDIA GeForce FX's key advantages over the Radeon 9700 is its ability to execute pixel shader programs as long as 1024 instructions. The pixel shaders on the R300 chip are limited to program lengths of 64 instructions, which simply isn't enough to create some of the more compelling shader effects developers might want to use. In order to produce more complex effects, the R300 would have to resort to multi-pass rendering. Multi-pass rendering is nifty because it overcomes a lot of technical limitations, but it's a performance killer because it duplicates lots of work unnecessarily. Essentially, to provide really complex shader effects in real time, you want to avoid making multiple rendering passes, at least in the traditional sense of full passes through the GPU pipeline. The GeForce FX can do so, but the R300 can't.
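A quick sketch shows how fast the pass count climbs on the R300. It assumes, very optimistically, that a long shader can be split cleanly into chunks at the per-pass instruction limit. Real shaders have dependencies and register limits that could push the count higher, and the function name here is hypothetical.

```python
def min_passes(shader_instructions: int, per_pass_limit: int) -> int:
    """Lower bound on rendering passes: ceiling of instructions / per-pass limit."""
    if shader_instructions <= 0 or per_pass_limit <= 0:
        raise ValueError("instruction counts must be positive")
    return -(-shader_instructions // per_pass_limit)  # integer ceiling division


R300_LIMIT = 64       # per-pass pixel shader limit cited above
GEFORCE_FX_LIMIT = 1024

for length in (64, 65, 256, 1024):
    print(f"{length:4d} instructions: "
          f"R300 needs >= {min_passes(length, R300_LIMIT)} pass(es), "
          f"GeForce FX needs >= {min_passes(length, GEFORCE_FX_LIMIT)}")
```

With traditional multi-pass rendering, each of those extra passes means another full trip through the pipeline, which is where the performance goes.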

To understand why all of this multi-pass stuff matters and to get a sense why I get all hot and bothered when talking about DirectX 9-class hardware, go read my article about such things. There, I identified pixel shader program lengths as a noteworthy advantage for NVIDIA way back in August.

ATI has addressed the R300's pixel shader limitations in R350 by implementing something called an F-buffer. (ATI says the "F" stands for "fragment stream FIFO buffer," in case you were wondering.) The R350's F-buffer allows it to execute pixel shader programs of arbitrary instruction lengths, more than bringing it on par with NVIDIA's GeForce FX. The genesis of the F-buffer idea was a paper by William R. Mark and Kekoa Proudfoot at Stanford University. Mark and Proudfoot suggested the F-buffer as a means of storing intermediate results of rendering passes without writing each pixel to the frame buffer and taking another trip through the graphics pipeline.

Source: ATI

Storing intermediate results in a FIFO buffer not only offers the potential for big performance increases over traditional multi-pass techniques, it also sidesteps a number of problems. For instance, multi-pass rendering doesn't handle transparent or translucent surfaces particularly well. With such surfaces, the chip must perform a color blend operation before writing each pass's pixel to the framebuffer, which can cause problems with the look of the final, rendered output. The F-buffer, however, can store both foreground and background pixel fragments and perform additional operations on them both, with no blend ops needed between passes.
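The translucency problem is easier to see in a toy model. The Python sketch below is not ATI's implementation; it is a minimal illustration that assumes a made-up two-part shader (the second part deliberately nonlinear) and two overlapping fragments at a single pixel. All names and values are hypothetical. It compares a single long pass, a framebuffer-based multi-pass that must blend intermediate results, and an F-buffer-style multi-pass that queues per-fragment intermediates in a FIFO.

```python
from collections import deque


def pass_one(value):
    """First half of a hypothetical shader."""
    return value * 0.5


def pass_two(value):
    """Second half of the shader; nonlinear, so blending early changes the result."""
    return value ** 2


def blend(src, dst, alpha):
    """Standard alpha blend of a source fragment over the destination pixel."""
    return alpha * src + (1.0 - alpha) * dst


# Fragments covering one pixel, back to front: (shader input, alpha).
fragments = [(0.8, 1.0), (0.4, 0.5)]


def single_pass_reference(frags):
    """What a chip with a long enough shader limit would produce."""
    pixel = 0.0
    for value, alpha in frags:
        pixel = blend(pass_two(pass_one(value)), pixel, alpha)
    return pixel


def framebuffer_multipass(frags):
    """Intermediates are blended into one framebuffer value before pass two."""
    pixel = 0.0
    for value, alpha in frags:
        pixel = blend(pass_one(value), pixel, alpha)
    return pass_two(pixel)  # per-fragment data is already gone here


def f_buffer_multipass(frags):
    """Intermediates are queued per fragment; blending happens once, at the end."""
    fifo = deque()
    for value, alpha in frags:
        fifo.append((pass_one(value), alpha))   # pass one writes to the FIFO
    pixel = 0.0
    while fifo:
        intermediate, alpha = fifo.popleft()    # pass two reads in the same order
        pixel = blend(pass_two(intermediate), pixel, alpha)
    return pixel


print("single pass :", single_pass_reference(fragments))
print("framebuffer :", framebuffer_multipass(fragments))
print("F-buffer    :", f_buffer_multipass(fragments))
```

In this toy case the FIFO version matches the single-pass result, while the framebuffer version drifts because the two fragments were blended together before the shader had finished with them.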

The F-buffer approach does have some limitations, though from what I gather, they aren't show-stoppers. For one, as with traditional multi-pass approaches, pixel shader programs will have to be structured to account for the GPU's per-pass rendering limitations.

Of course, in the new worlds of DirectX 9 and OpenGL 2.0, such things generally ought to be handled by compilers. Shader programs will largely be written in high-level shading languages like Microsoft's HLSL and broken down into passes by a runtime compiler. With high-level shading languages, developers need not think much about the hardware's per-pass limitations.

Honestly, I didn't expect ATI to address the R300's 64-instruction pixel shader limit with this "half-generation" refresh chip, but they've apparently done so. The jury is still out on how the R350's approach compares to that of NVIDIA's GeForce FX chips, mainly because we don't yet have enough information about the NVIDIA design to judge precisely. My sense is that the NV30 chip in the GeForce FX offers a little more complexity and flexibility than the F-buffer approach, but the real-world differences in performance and rendering output are likely to be minor.

In all, the F-buffer is a crucial enhancement to the R350 that reasserts ATI's technology leadership in graphics. The concept is fundamentally simple, as many good innovations in computers are, but the impact of the change is profound.