ATI’s Radeon 9700 Pro graphics card

THOSE OF YOU who have read my preview of next-gen graphics chips will know that I’m optimistic about the prospects for DirectX 9-class graphics hardware. The Radeon 9700 Pro graphics card we’re reviewing today is the first DX9-class product to hit the market, so we’re naturally excited about it.

However, I was surprised to see that the Radeon 9700 didn’t ship with a single piece of code capable of taking advantage of its most important new features, including its new floating-point datatypes that allow for very high color precision. Not only that, but I searched around online, and I couldn’t find anything there, either. Microsoft’s DirectX 9 isn’t available yet, and all those fancy demos ATI showed off at the Radeon 9700 launch were apparently written in Direct3D. ATI has a set of OpenGL extensions in the works, but those aren’t ready for prime time yet.

I asked ATI about the possibility we’d see any driver features from them like Matrox provides with Parhelia. The Parhelia drivers will force games and the Windows desktop into a 10-bit-per-color mode. It’s a hack, but it works. Unfortunately, ATI doesn’t have any plans to provide such a thing, nor are they aware of any plans at Microsoft to make use of 10-bits-per-channel color modes on the Windows desktop anytime soon.

The Radeon 9700 Pro card

So we’ll have to review the Radeon 9700 for what it is, effectively, for today’s buyers: an especially nice DirectX 8-class graphics card with some intriguing future potential. ATI has made some compromises in the Radeon 9700’s design in order to gear it toward the DX9 future. For instance, the chip can lay down only one texture per pixel pipeline in a clock cycle. The Radeon 9700 has eight pixel pipelines, so it’s still very fast, but current cards like the GeForce4 Ti (four pipes with two texture units each) and Matrox’s Parhelia (four pipes with four texture units each) have comparable pixel-pushing power. The number of texture units per pipe may seem important now, but in the future, when pixel shader programs generate textures procedurally—just run a “wood” shader or a “marble” shader—traditional texture units will probably matter less.

Nevertheless, the Radeon 9700 should be the ultimate DX8 card in many ways. The extra precision present throughout the chip’s pixel pipeline should help image quality in a few places, and its memory bandwidth is second to none. The R9700 can run most current games fluidly at high resolutions with edge and texture antialiasing features cranked up.

Before we go on, I’m going to have to stop and admonish you to go read my article on next-gen graphics chips so you can see how innovative the Radeon 9700 really is. It’s a quick read, and it probably won’t make your head hurt too much. I covered a lot of ground in that article, and I won’t drag you back over the same territory here. Suffice to say that the Radeon 9700 should change the graphics landscape dramatically.

Also, you’ll want to go read this page to get an exposition of the Radeon 9700’s most important new features. This chip is loaded with all of the latest goodies, including AGP 8X, eight pixel pipes, four vertex shaders, a 256-bit crossbar memory interface, and killer occlusion detection. It also has all of the non-3D bits that one would expect on a modern graphics card integrated onto a single chip: dual RAMDACs for analog monitors, a TMDS transmitter for digital flat panels, and a TV decoder/encoder unit for video output and capture. We will discuss many of the card’s new features in more detail as we frame our test results below.

All the standard output ports, plus an included DVI-to-VGA adapter

Of course, all these fancy features come at a price. This wonder of Moore’s Law weighs in at roughly 110 million transistors. Manufactured on a 0.15-micron fab process, the Radeon 9700 chip has a land mass roughly equal to that of France, provided France hasn’t surrendered any land lately. To give you some perspective, have a look at the picture below, which shows a Radeon 9700 chip next to an Athlon XP processor. The Athlon XP is made up of about 37 million transistors, and it’s manufactured on a 0.13-micron process.

The Athlon XP (Thoroughbred) is dwarfed by the R300

In order to provide this massive chip with the needed juice, ATI put an auxiliary power connector on the card. The AGP slot alone just isn’t up to snuff, so—a la 3dfx’s Voodoo 5—you’ll have to plug the card into your computer’s power supply unit. That’s really no big deal, and ATI provides a pass-through cable to make it as easy as possible.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.

Our test systems were configured like so:

Processor         Intel Pentium 4 2.8GHz
Front-side bus    533MHz (133MHz quad-pumped)
Motherboard       Asus P4T533C
Chipset           Intel 850E
North bridge      82850E MCH
South bridge      82801BA ICH2
Chipset drivers   Intel Application Accelerator 6.22
Memory size       512MB (4 RIMMs)
Memory type       Samsung PC1066 Rambus DRAM
Sound             Creative SoundBlaster Live!
Storage           Maxtor DiamondMax Plus D740X 7200RPM ATA/100 hard drive
OS                Microsoft Windows XP Professional
OS updates        None

The test systems’ Windows desktops were set at 1024×768 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used the following graphics cards and driver revisions:

  • ATI Radeon 8500 128MB — 7.74
  • ATI Radeon 9700 Pro 128MB — 7.75 (first shipping drivers)
  • Matrox Parhelia 128MB —
  • PNY Verto GeForce4 Ti 4600 128MB — Detonator 40.41

We used the following versions of our test applications:

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

The Radeon 9700 Pro’s heatsink comes with a TIM pad

Fill rate
To show exactly where a new product sits among the current crop of 3D graphics cards, we like to pull out the ol’ chip table and compare the specs. As always, these numbers can lie. The key ones, like memory bandwidth and fill rate, are theoretical peaks, not real-world numbers. Real-world performance will vary depending on implementations. Still, the chips’ specs are instructive, so here’s our trusty table:

Chip              Core clock  Pixel      Peak fill rate  Texture units       Peak fill rate  Memory clock  Memory bus    Peak memory
                  (MHz)       pipelines  (Mpixels/s)     per pixel pipeline  (Mtexels/s)     (MHz)         width (bits)  bandwidth (GB/s)
Radeon 8500       275         4          1100            2                   2200            550           128           8.8
GeForce4 Ti 4600  300         4          1200            2                   2400            650           128           10.4
Parhelia-512      220         4          880             4                   3520            550           256           17.6
Radeon 9700 Pro   325         8          2600            1                   2600            620           256           19.8

The Radeon 9700 Pro leads in nearly every category. It’s endowed with gobs of memory bandwidth and a blistering pixel fill rate that’s more than double that of the closest competition.
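For anyone who wants to check the arithmetic behind the theoretical peaks, here is a minimal sketch. It assumes the usual formulas (pixel fill rate is core clock times pipelines, texel fill rate is that times texture units per pipe, and bandwidth is memory clock times bus width in bytes) and treats the memory clock column as an effective data rate. The `CHIPS` dictionary and `peak_specs` function are names made up for this illustration.

```python
# Theoretical peaks derived from the spec columns in the chip table.
# Assumption: the memory clock column is the effective (data) rate in MHz.
CHIPS = {
    # name: (core MHz, pixel pipes, texture units per pipe, memory MHz, bus bits)
    "Radeon 8500": (275, 4, 2, 550, 128),
    "GeForce4 Ti 4600": (300, 4, 2, 650, 128),
    "Parhelia-512": (220, 4, 4, 550, 256),
    "Radeon 9700 Pro": (325, 8, 1, 620, 256),
}


def peak_specs(core_mhz, pipes, tmus, mem_mhz, bus_bits):
    """Return (Mpixels/s, Mtexels/s, GB/s) theoretical peaks."""
    mpixels = core_mhz * pipes                    # one pixel per pipe per clock
    mtexels = mpixels * tmus                      # one texel per texture unit per clock
    gb_per_s = mem_mhz * (bus_bits // 8) / 1000   # bytes per transfer times MHz
    return mpixels, mtexels, gb_per_s


for name, spec in CHIPS.items():
    mpixels, mtexels, bandwidth = peak_specs(*spec)
    print(f"{name:18s} {mpixels:5d} Mpixels/s {mtexels:5d} Mtexels/s {bandwidth:5.1f} GB/s")
```

By this math, 620MHz across a 256-bit bus works out to roughly 19.8GB/s for the 9700 Pro. These are still paper numbers, of course, which is why we run the benchmarks.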

One of the most important items in the table above is texel fill rate, which describes a card’s ability to produce pixels with texture maps applied to them. In current games, texel fill rate is key to good performance at high resolutions. The 9700 Pro’s texel fill rate is good, but it’s not head and shoulders above the other cards. As I said above, ATI’s use of only one texture unit per pixel pipeline is a bit of a compromise. The 9700 chip can apply a single texture per clock cycle, while others can apply two or four textures per clock.

However, with eight pixel pipelines, the 9700 chip has an advantage. Matrox’s Parhelia, for instance, has the highest theoretical peak texel fill rate, with four texture units in each of its four pixel pipes. Parhelia isn’t able to use all four texture units per pipe in most current games and apps, because those apps don’t apply four textures per rendering pass. As a result, Parhelia performs much slower than its stats-sheet specs would suggest. The 9700, on the other hand, can almost always put all of its texture units to use at once, even if an app only applies one texture per pass.

Should the 9700 need to apply more than one texture per rendering pass, it can send pixels back through its pipelines up to 16 times before sending the results out to a frame buffer. This process will chew up clock cycles, but it’s not nearly as much trouble as writing the results to memory, reading them back in, and then applying another texture. (For the record, the Radeon 8500 can “loop back” pixels in this manner up to three times per rendering pass, for a total of six textures per pass. The GeForce4 Ti can do it twice, delivering four textures per pass.)
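To get a feel for the tradeoff between more pipes and more texture units per pipe, here is a rough first-order model. It assumes each pipe applies as many textures per clock as it has texture units and spends one extra clock each time a pixel loops back for more. Memory bandwidth and the loopback limits are ignored, and the function name is invented for this sketch, so treat the output as illustrative rather than as a prediction.

```python
import math

# (core clock in MHz, pixel pipelines, texture units per pipeline), from the chip table
CHIPS = {
    "GeForce4 Ti 4600": (300, 4, 2),
    "Parhelia-512": (220, 4, 4),
    "Radeon 9700 Pro": (325, 8, 1),
}


def textured_pixel_rate(core_mhz, pipes, tmus_per_pipe, textures):
    """Peak Mpixels/s when every pixel needs `textures` texture layers."""
    # Each trip through the pipe applies up to tmus_per_pipe textures.
    clocks_per_pixel = math.ceil(textures / tmus_per_pipe)
    return core_mhz * pipes / clocks_per_pixel


for textures in (1, 2, 4):
    for name, spec in CHIPS.items():
        rate = textured_pixel_rate(*spec, textures)
        print(f"{textures} texture(s)  {name:18s} {rate:6.0f} Mpixels/s")
```

In this simplified model, the 9700 Pro stays ahead with one or two textures per pixel, and Parhelia's four texture units per pipe only pay off once an app applies four textures per pass.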

Of course, none of this pixel filling ability is worth much if the VPU—visual processing unit, don’t ask—can’t write those pixels out to memory as fast as it can process them. To make the most effective use of its memory, the Radeon 9700 includes a sophisticated crossbar memory interface very similar to the one in NVIDIA’s GeForce4 Ti chips. However, at 256 bits, the 9700 has double the number of paths to memory and thus double the raw memory bandwidth. The 9700’s memory interface incorporates four memory controllers on the VPU side and four 64-bit channels into the card’s local memory. Between the memory channels and the array of controllers is a switched fabric. Any one of the memory controllers can talk to any one (or two or four) of the memory channels via the switched fabric. As you might imagine, this approach is much more efficient than simply transferring data sequentially in 256-bit chunks.

A block diagram of the Radeon 9700’s memory controller. Source: ATI.

Let’s see how all of this technology translates into performance. 3DMark’s synthetic fill rate test measures pixel and texel fill rates. This test doesn’t exploit all of the advanced bandwidth conservation and pixel loopback techniques we’ve discussed, but it should give us a good idea about a card’s basic pixel-pushing prowess. We’ll test performance in real games at very high resolutions a bit later.

The theory works out rather nicely in practice with the 9700. The chip is by far the fastest in terms of pixel (or single-textured) fill rate, and it delivers the highest texel fill rate in all but one display resolution, where the Parhelia grabs a small edge. Notice, also, how much closer the R9700 Pro comes to its theoretical fill rate than the Parhelia.

Occlusion detection
Real application performance is determined by more than just raw fill rate, however. The 9700 packs the third generation of ATI’s HyperZ suite of bandwidth conservation techniques. All of these techniques have to do with the Z buffer, which stores depth information about each pixel in a scene. The HyperZ bag of tricks includes lossless Z buffer compression, fast clearing of the Z buffer, and a pair of occlusion detection methods, dubbed Hierarchical Z and Early Z.

If you read the words “occlusion detection” just now and your eyes glazed over, don’t fret. Occlusion in 3D graphics is a simple thing: when one object is in front of another, the object in back is occluded from view. Occluded objects are a problem in graphics, because 3D chips tend to draw everything in a scene, only figuring out which pixel belongs in front of another during the rendering process. Drawing pixels the viewer will never see is often referred to as overdraw, and overdraw is wildly inefficient. If a chip can figure out which pixels are occluded and avoid drawing them, lots of effort—and memory bandwidth and pixel processing power—will be saved.

ATI’s Hierarchical Z logic examines blocks of pixels to determine whether they will be occluded when a final scene is rendered. If none of the pixels in a block will be visible, the block doesn’t get drawn. Hierarchical Z doesn’t catch everything, but it can cut down on unneeded pixel processing. Early Z is a new addition in this third-gen HyperZ suite, and it does occlusion detection on a per-pixel level. ATI claims Early Z “virtually eliminates” overdraw. This new, more rigorous approach makes particular sense on a DX9 card, because performing complex pixel shading ops on occluded pixels would be especially painful.
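The two-level idea is easier to see in code. What follows is a toy software sketch, not ATI's hardware logic: the 4x4 tile size, the `DepthBuffer` class, and the flat rectangles it draws are all assumptions made for illustration. A coarse per-tile test throws out whole blocks, and a per-pixel depth test runs before any (pretend) shading work.

```python
TILE = 4  # hypothetical block size in pixels


class DepthBuffer:
    """Depth buffer with a coarse per-tile level; smaller z means nearer."""

    def __init__(self, width, height, far=1.0):
        self.depth = [[far] * width for _ in range(height)]
        # Coarse level: the farthest depth currently stored in each tile.
        self.tile_max = [[far] * (width // TILE) for _ in range(height // TILE)]
        self.tiles_rejected = 0   # blocks skipped by the coarse test
        self.pixels_rejected = 0  # pixels skipped by the per-pixel test
        self.pixels_shaded = 0    # pixels that reached the pixel shader

    def draw_rect(self, x0, y0, x1, y1, z):
        """Draw a flat rectangle covering [x0, x1) x [y0, y1) at depth z."""
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                if z >= self.tile_max[ty][tx]:
                    # Behind everything already in this tile: skip the block.
                    self.tiles_rejected += 1
                    continue
                self._draw_in_tile(tx, ty, x0, y0, x1, y1, z)

    def _draw_in_tile(self, tx, ty, x0, y0, x1, y1, z):
        left, top = tx * TILE, ty * TILE
        for y in range(max(y0, top), min(y1, top + TILE)):
            for x in range(max(x0, left), min(x1, left + TILE)):
                if z >= self.depth[y][x]:
                    self.pixels_rejected += 1  # depth test before shading
                    continue
                self.depth[y][x] = z
                self.pixels_shaded += 1        # the shader would run here
        # Refresh the coarse value for this tile.
        self.tile_max[ty][tx] = max(
            max(self.depth[y][left:left + TILE]) for y in range(top, top + TILE)
        )


buf = DepthBuffer(8, 8)
buf.draw_rect(0, 0, 8, 8, 0.2)   # a near wall fills the screen
buf.draw_rect(0, 0, 8, 8, 0.8)   # an object hidden behind the wall
buf.draw_rect(2, 2, 6, 6, 0.1)   # a small object in front of the wall
buf.draw_rect(0, 0, 8, 8, 0.15)  # a pane between the wall and the small object
print(buf.tiles_rejected, buf.pixels_rejected, buf.pixels_shaded)
```

In this little scene, the hidden object is discarded a block at a time, and the pane only loses the pixels that sit behind the small object. A chip with no occlusion detection would have processed every pixel of all four rectangles.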

To examine the effectiveness of the 9700’s occlusion detection, we’ll use VillageMark, which benchmarks an extreme worst-case scenario for overdraw. For what it’s worth, the Radeon 8500 and GeForce4 Ti 4600 both have limited forms of occlusion detection, but Parhelia doesn’t have any.

The Radeon 9700 Pro shows its mettle here, clearly outrunning the other cards.

Incidentally, some folks have speculated that VillageMark effectively tests texel fill rates more than anything because of its robust use of multitexturing. However, our results show something different. The Parhelia, which has the highest texel fill rate and no provisions to reduce overdraw, is slowest of the group. The R9700 Pro, with only one texture unit per pixel pipe, is fastest by a fair margin.

Pixel shader performance
Pixel shader performance is closely related to fill rate, occlusion detection, and memory bandwidth, but it’s not the same thing. Reasonably complex pixel shader programs can make the VPU the primary limiting factor in performance, and a good pixel shader implementation can be several times faster at a given task than a competing one.

The Radeon 9700’s pixel shaders, which meet the requirements for DirectX 9’s version 2.0 pixel shaders, are much more capable than the DX8-class pixel shaders in all of the competing cards we’re testing. The 9700’s pixel shaders can execute 64 color operations and 32 texture ops in a single rendering pass, which is four to eight times what the DX8-class cards can achieve. Also, the 9700 has 96 bits of precision in its pixel shaders, while older chips’ pixel shaders have no more than 48 bits of precision. More importantly for performance in current apps, the Radeon 9700 has eight pixel shaders, while the rest of the pack has only four.

This advantage in pixel shading capacity results in markedly better performance on synthetic tests, such as 3DMark’s DirectX 8.x pixel shader benchmarks.

The Radeon 9700 Pro nearly doubles the scores of the other cards, which is about what we’d expect in this case.

We’ll also use NVIDIA’s ChameleonMark to measure pixel shader performance. Although we’ve included Radeon 8500 scores, please note that ChameleonMark doesn’t produce the proper output on this card in the Glass and Shiny tests because the 8500 doesn’t support the cubemap function the program uses (ATI has its own implementation that works fine, however). The Radeon 9700 and Parhelia both run these tests flawlessly.

ATI’s new card utterly dominates NVIDIA’s pixel shader benchmark, especially in the tests where the cubemap function is employed. Still, true DirectX 9 applications with longer, more complex shader programs created in high-level shading languages should take even better advantage of the Radeon 9700’s pixel shading power. The R9700 Pro’s true increase in computational capacity isn’t fully reflected in these tests, but they do give us a taste of things to come.

Polygon throughput and vertex shader performance

A block diagram of a Radeon 9700 vertex shader unit. Source: ATI.

The Radeon 9700 has four DX9-class vertex shader units running at the chip’s 325MHz clock rate. That’s twice the number of DX8 shader units in the GF4 Ti and Radeon 8500 chips, although vertex shader implementations vary in their performance. Matrox’s Parhelia has four DX9-class vertex shaders running at the chip’s 220MHz clock speed. DX9’s vertex shader 2.0 spec incorporates support for flow control (branching, loops, and subroutines) in vertex programs, but no current apps can use this capability. Each of the Radeon 9700 chip’s four vertex shader pipelines has vector and scalar processing units, so they can process the two types of operations in parallel.

We’ll test vertex shader performance first with Matrox’s SharkMark. SharkMark was written by Matrox to show off the power of Parhelia’s quad vertex shader units, so it does a nice job taking advantage of the Radeon 9700’s four vertex shaders, as well.

In SharkMark, the Radeon 9700 delivers exactly twice the vertex processing ability of a GeForce4 Ti 4600, and it outruns the Parhelia, too. 3DMark2001 also includes a vertex shader test; let’s see how the 9700 performs there.

Again, the 9700 shows roughly twice the vertex processing power of competing chips. The Parhelia drops back into the pack in 3DMark for some reason, possibly because 3DMark’s vertex shader programs aren’t as complex as SharkMark’s—at least, to my eye they certainly aren’t.

Next, we’ll look at traditional DirectX 7/OpenGL style transform and lighting. I believe all of these cards implement T&L engines as vertex shader programs rather than using dedicated circuitry for T&L. Traditional transform and lighting performance is still the key to good performance in most current apps.

Here again, the 9700 leads the pack, although not by quite as much. Legacy T&L performance will matter less and less over time.

AGP write performance
For the sake of completeness, I’ll include another round of tests of AGP texture download performance. What we’re talking about here is the ability to move rendered images from a graphics card’s local memory over the AGP bus into main memory. Games don’t generally have a need to transfer data to main memory, but applications like video processing tools and high-quality rendering programs do. Please see my article on this subject if you want to know more.

Once again, none of these cards move data back over the AGP bus at anything close to an acceptable rate for real-time graphics applications. This problem can probably be fixed in software. In fact, these especially low transfer rates aren’t as much of a problem in Windows 98 or in OpenGL. But in Win2K/XP with Direct3D, AGP texture download rates are slow as molasses.

Our synthetic benches have shown that the Radeon 9700 Pro is faster than previous-generation cards in nearly every key category: fill rate, occlusion detection, pixel shading, vertex processing, and traditional T&L. Now we’ll test some current games and game engines to see how these capabilities affect everyday use.

Quake III Arena
We tested Q3A using the Q3Bench utility’s “Max” quality settings. Sound was enabled with 22kHz mixing.

The R9700 Pro just outclasses the other cards. The higher the resolution, the bigger the gap between the 9700 and everything else.

Jedi Knight II

Jedi Knight II tells an interesting story: the Radeon 9700 Pro simply isn’t fill-rate limited in this game. Raising the display resolution has almost no impact on performance. In fact, all of the cards seem limited to about 155 FPS in JK2, probably by the game engine itself.

Comanche 4

Comanche 4 is a true DirectX 8-class game with vertex and pixel shader support. The Radeon 9700 outruns the GF4 Ti 4600 here, but not by much. Again, at lower resolutions, performance is not limited by fill rate, with the possible exception of the preternaturally pokey Parhelia.

Unreal Tournament 2003
The UT 2003 demo is a late addition to this review. Epic released it right as we were finishing up, and we decided to include test results, because UT 2003 uses more advanced DX8 graphics features, more polygon detail, and larger textures than most current games. Also, we used just-released, brand-spanking-new video drivers for the UT tests—version 7.76 from ATI and version 1.01 from Matrox.

The UT demo’s default benchmarking function tests two things. The first test is a pair of fly-throughs of two of the game’s levels, and the second test is a pair of deathmatches between bots.

Things really slow down from the flyby to the botmatch, but in either case, the 9700 is generally fastest. The GF4 Ti 4600 is still very competitive, though.

Codecreatures Benchmark Pro

Here’s another DX8 application. The Radeon 9700 Pro separates itself from the pack here, no doubt thanks to its superior vertex and pixel shading power. Oddly enough, the Radeon 8500 and Parhelia are right on top of each other in all three resolutions.

Serious Sam SE
Serious Sam SE allows us a number of benchmark options, including the ability to switch between Direct3D and OpenGL, and the ability to plot second-by-second frame rate numbers.

The trouble is, the game engine auto-tunes its graphics settings to run optimally on various graphics cards, which makes apples-to-apples comparisons difficult. In the past, we’ve used Serious Sam SE as an application benchmark, allowing the game to tune itself, but we’d like to limit that effect this time around. Some reviewers have used the game’s “Max quality” add-on to defeat the engine’s auto-tuning feature, but that’s not a perfect solution. For instance, the game will use the strongest form of anisotropic filtering available to it, which varies from 2X to 16X, depending on the graphics card.

To keep things even, we’ve elected to use the game’s “Default quality” add-on. We won’t be using every possible feature and form of texture filtering, but we should have a solid basis for comparison.

The story is much the same in D3D and OpenGL. The GeForce4 is fastest, by just a little bit, in lower resolutions, probably thanks to NVIDIA’s highly optimized graphics drivers. At higher resolutions, the 9700 pulls ahead.

Next, we’ll plot frame rates over time to see something a little more revealing than a simple frame-rate average. These tests are conducted using OpenGL.

All of the cards demonstrate similar performance profiles; they all peak and bottom out at about the same places, and none of them shows any really inordinate extremes in either direction.

3DMark2001 SE

The R9700 Pro rules the roost in 3DMark. It’s nearly as fast at 1600×1200 as the Parhelia is at 640×480.

I’m not sure why the Ti 4600 is slower at 1280×1024 than at 1600×1200, but I ran this test a number of times to make sure the results were for real. This anomaly only showed up in this test, but it was consistent.

3DMark’s game tests confirm what its synthetic tests showed us: the Radeon 9700 is faster in nearly every way than the competition.

Workstation-class applications

The Radeon 9700 Pro is fastest in four out of six tests, but it’s not exceptionally faster than previous-gen cards. We’ll have to test out the FireGL X1, ATI’s workstation-class card based on the 9700 chip, to see how it performs with drivers properly optimized for workstation apps.

Speaking of drivers, when we initially tested the Parhelia in viewperf, it was extraordinarily slow; it barely scored above zero on the Unigraphics test. Matrox has just released new drivers that are certified for some of the applications in the viewperf suite, so we retested with the new 1.01 drivers and included those results above. The improvements in performance were notable. Unigraphics is still slow, but it’s better than before.


Antialiased text

Edge antialiasing
We’ll cover antialiasing in its two most important forms, edge AA and texture AA. Most newer graphics chips handle these two things separately. I happen to think texture AA has the bigger impact on overall visual quality, but edge AA gets most of the attention, so we’ll start with it. I’m going to break it down on a card-by-card basis, because each card’s edge antialiasing implementation is unique.

Fundamentally, though, all of these chips’ edge AA methods are similar. They all seek to smooth out rough edges by blending colors. To do so, the chips take multiple samples of the area covered by a pixel on the screen. These samples are offset slightly at a sub-pixel level according to some kind of pattern. The sampled colors are then blended. If the sub-pixel-level samples happen to straddle an edge, the resulting pixel output color is somewhere between the colors on either side of the edge.

Presto, jaggies melt away.

At least, that’s the theory. Most of us use edge antialiasing for text in our everyday computer use. Doing AA in 3D graphics is similar, but it does get a little tricky.
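The sample-and-blend idea can be sketched in a few lines. This is a toy model under stated assumptions: the two four-sample layouts below are illustrative and are not any chip's actual positions, the edge is perfectly vertical, and the function names are invented for the example.

```python
# Sub-pixel sample offsets inside a unit pixel (illustrative layouts only).
ORDERED_4 = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_4 = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage(edge_x, samples):
    """Fraction of samples left of a vertical edge crossing the pixel at x = edge_x."""
    return sum(1 for sx, _sy in samples if sx < edge_x) / len(samples)


def blend(fg, bg, cov):
    """Mix foreground and background colors in proportion to coverage."""
    return tuple(round(f * cov + b * (1 - cov)) for f, b in zip(fg, bg))


def distinct_levels(samples, steps=16):
    """Coverage values seen as the edge sweeps across the pixel."""
    return sorted({coverage(i / steps, samples) for i in range(steps + 1)})


print(distinct_levels(ORDERED_4))  # shades available with a grid layout
print(distinct_levels(ROTATED_4))  # shades available with a tilted layout
print(blend((200, 100, 0), (0, 100, 200), coverage(0.3, ROTATED_4)))
```

With the same number of samples, the tilted layout yields more intermediate shades on a vertical edge, because no two of its samples share a column. That is the sort of detail that makes sample placement matter as much as sample count.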

  • GeForce4 — The GeForce4 Ti chips use an antialiasing algorithm known as multisampling. This algorithm singles out pixels on polygon edges by checking the Z-buffer for each pixel. If a pixel is covered by more than one polygon, the chip will do a texture read for each of the samples it’s grabbing. If not, the chip will avoid the texture read, conserving effort and memory bandwidth. The multiple samples are then blended. In this way, multisampling algorithms effectively do edge-only antialiasing. The GeForce4 Ti’s multisampled AA is so efficient, there’s barely any performance penalty for turning on 2-sample (or 2X) antialiasing.

    The GF4 Ti collects its samples according to one of two patterns. In 4X mode, the chip uses a traditional ordered grid sampling pattern (the leftmost of the two patterns at right)—like graph paper. In 2X mode, the chip samples according to a rotated grid pattern (turn that graph paper 35 degrees or so, like the rightmost pattern on the right). The human eye can lock on to patterns easily, and an ordered grid sample is an easy one to spot. Also, on-screen pixels are themselves organized in an ordered grid. The point of using an offset or tilted grid pattern is to disrupt the correspondence between the display’s grid of pixels and the underlying AA sampling. Doing so produces more effective antialiasing, especially on near-vertical and near-horizontal edges, where ordered-grid sampling patterns sometimes aren’t very effective. The GF4 Ti also has a mysterious mode dubbed “4XS” that is apparently a combination of 2X multisampling and 4X supersampling (we’ll get to supersampling in a second).

    GeForce4 Ti 4600 — No AA, 2X AA, 4X AA, “4XS” mode, diff between No AA and 4X AA

    Uncompressed versions of our edge AA sample images are available here.

    The results speak for themselves. Even 2X antialiasing produces much cleaner edges than a stock image. Look at the rightmost tile of the image above to see something a little unusual. That’s the result of a mathematical “diff” operation between the non-AA image and the 4X AA image. You can see that the only pixels really modified by the GeForce4 Ti’s multisampling AA routine are edge pixels. Multisampling is efficient, yet still effective.

    Call the GeForce4’s two AA techniques ordered grid multisampling and rotated grid multisampling.

  • Radeon 8500 — The Radeon 8500 is a different case. It uses more sophisticated sample placement, but it doesn’t use multisampling. Instead, the 8500 uses supersampling, which is simply rendering a scene at two or four times its original size (or more) and then scaling it back down, blending two or four pixels into one. Supersampling provides full-scene AA, not just edge AA, but it’s much less efficient than multisampling—the algorithm has to do a texture read for each sample it processes.

    The Radeon 8500’s SMOOTHVISION AA uses a nifty trick for sample placement. Instead of sampling according to a grid pattern, the 8500 uses a programmable template to determine sample placement. Have a look at the two squares at the right for a visual example. The leftmost image is a sample template with six possible sample positions, and the rightmost one shows resulting sample patterns for four adjacent pixels in 4X AA mode. The 8500 uses the template to determine sample positions in a quasi-random fashion, which disrupts the eye’s ability to detect patterns. This technique isn’t completely random, but it’s much closer to true stochastic sampling than other cards’ methods.

    Radeon 8500 — No AA, 2X AA, 4X AA, diff between No AA and 4X AA

    In practice, the Radeon 8500’s 2X AA is surprisingly ineffective. The 4X mode, however, is quite good. In motion, the 8500’s 4X SMOOTHVISION AA is visibly more effective than the GeForce4’s ordered grid 4X mode. Have a look at the “diff” image on the right to see how pixels in the scene are affected by the 8500’s supersampled AA. The 8500 is antialiasing edges as well as textures. Unfortunately, it’s neither as effective nor as efficient at texture AA as texture-specific routines like anisotropic filtering.

    The 8500 also has a “performance” mode that samples on an ordered grid, so the Radeon 8500 does both ordered grid supersampling and programmable jittered supersampling.

  • Parhelia — The Parhelia’s unique Fragment AA method does edge-only antialiasing with lots of samples—sixteen, to be exact. Fragment AA does a Z-buffer check to determine which pixels are on polygon edges, then segregates edge pixels from the rest of the scene. Non-edge pixels are rendered normally, while edge pixels are sent to a fragment buffer and rendered using 16X ordered grid supersampling (see the pattern image at right). Although Fragment AA doesn’t use a rotated or quasi-random sampling pattern, it’s very effective, because the number of samples is more important than the sample pattern. 16X AA is hard to beat. This method is theoretically more efficient than multisampling. Multisampling avoids multiple texture reads for non-edge pixels, but those pixels are still processed by the algorithm. Fragment AA operates only on edge pixels, so Parhelia can sample more often for those pixels with little performance penalty. However, Fragment AA sometimes causes visual artifacts and other compatibility problems, so Matrox included a 4X ordered-grid supersampling fallback mode. Because supersampling is less efficient, the 4X mode is slower than 16X FAA.

    Oddly, the Parhelia drivers won’t respond to Direct3D API calls for 2X or 4X AA, so users have to manually choose 16X FAA or 4X ordered-grid AA.

    Parhelia — No AA, 4X AA, 16X AA, diff between No AA and 4X AA, diff between No AA and 16X AA

    FAA at 16X does a very nice job cleaning up the near-vertical lines in our sample image. You can see from the “diff” images how the 4X supersampled mode modifies quite a few pixels, while 16X Fragment AA touches only edge pixels. Although this method is distinct from multisampling, its results are generally similar.

  • Radeon 9700 Pro — The Radeon 9700 Pro improves on the 8500 by shedding supersampling for multisampling. This change should help performance immensely. The 9700 supports 2X, 4X, and 6X modes. Beyond that, ATI has included a number of innovations in the 9700’s antialiasing hardware; most should help image quality and performance.

    Notably, the Radeon 9700 retains the programmability of the 8500 in terms of sampling patterns, but as I understand it (at least with current drivers, and probably over the long term) the 9700 doesn’t vary sample points from pixel to pixel in a quasi-random fashion like the 8500 does. Still, the chip’s programmed sampling pattern is a little bit wilder than a simple rotated grid. The example at right is based on a sampling pattern shown in an ATI whitepaper, but I’m not sure of the 9700’s exact pattern in current drivers. Obviously, the pattern shown is for 6X AA modes. Also, unlike the Radeon 8500, the 9700 can make use of Z-buffer compression in conjunction with antialiasing, which should help performance.

    On the image quality front, the Radeon 9700 employs a simple but effective new twist: gamma correct blending. When the chip blends sampled color values together, it applies a gamma curve. Gamma correction at various points in the pixel pipeline is encouraged in the DirectX 9 spec. Because graphics chips must operate in a gamma colorspace, this technique only makes sense.

    The pictures below illustrate the impact of gamma correction. The first, from an ATI whitepaper, shows how applying a gamma curve ensures smooth gradients on antialiased edges. The second, generated in a paint program using a screenshot from Serious Sam SE, shows how incrementing color values with a simple brightness filter washes out colors, while a gamma-correct filter preserves contrast. This second image isn’t strictly about antialiasing, but it demonstrates clearly the impact of accounting for gamma.

    Gamma-correct blending maintains smooth gradients. Source: ATI.

    Brightness washes out colors, while gamma preserves contrast.

    The Radeon 9700 includes provisions to address one of the weaknesses of multisampling: textures with transparency. Multisampling handles antialiasing at the edges of polygons well, but it doesn’t address “edges” inside of textures with transparency. Unfortunately, these provisions appear to require application-level support, and current apps don’t provide it.

    Radeon 9700 — No AA, 2X AA, 4X AA, 6X AA, diff between No AA and 6X AA

    Nevertheless, the Radeon 9700’s antialiasing is excellent. Thanks to its irregular sampling pattern and gamma correction, the Radeon 9700’s 6X AA mode looks at least as good as Matrox’s 16X FAA to my eye—and much better than a GeForce4 Ti. Of course, that perception may be colored by the fact that the Radeon 9700’s frame rates in 6X AA are much higher than the Parhelia’s in 16X FAA, and frame rate is an important component of effective antialiasing. Without fluid motion, the AA effect doesn’t work as well.
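For readers who want to see why gamma-correct blending matters on an antialiased edge, here is a minimal Python sketch. It is not ATI's hardware logic: it assumes a pure power-law gamma of 2.2 (the article does not state the curve the 9700 uses), 8-bit color values, and a hypothetical edge pixel where half of the six samples land on black and half on white.

```python
# Sketch only: assumes a pure power-law display gamma of 2.2.
# The Radeon 9700's actual curve is not specified in this article.

def to_linear(value, gamma=2.2):
    """Decode an 8-bit gamma-encoded value to linear light in [0, 1]."""
    return (value / 255.0) ** gamma

def to_gamma(linear, gamma=2.2):
    """Encode linear light in [0, 1] back to an 8-bit gamma-encoded value."""
    return round(255.0 * linear ** (1.0 / gamma))

def blend_naive(samples):
    """Average the stored (gamma-encoded) values directly."""
    return round(sum(samples) / len(samples))

def blend_gamma_correct(samples, gamma=2.2):
    """Average in linear light, then re-encode for the display."""
    linear_mean = sum(to_linear(s, gamma) for s in samples) / len(samples)
    return to_gamma(linear_mean, gamma)

# Hypothetical 6X AA edge pixel: three samples on black, three on white.
edge_samples = [0, 0, 0, 255, 255, 255]
print(blend_naive(edge_samples))          # 128
print(blend_gamma_correct(edge_samples))  # 186
```

Under these assumptions, the naive average stores 128, which a gamma 2.2 display shows as noticeably darker than half brightness. The gamma-correct result of 186 is the stored value that the display actually renders as half brightness, which is what keeps the gradient along an edge looking even.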

Now that we’ve covered the theory, we’ll look at how the various cards perform in their various antialiasing modes. For the sake of simplicity, we’ve omitted a few less useful AA modes, like the 3X, 5X, and 6X modes on the Radeon 8500, which are simply too slow and cumbersome to matter, and the 4xS and Quincunx modes on the GeForce4 Ti. 4xS mode is only available in Direct3D, and Quincunx is simply 2X AA plus a blurring filter; it sacrifices too much texture quality and overall sharpness to be worthwhile.

The Radeon 9700 Pro’s antialiasing performance is very impressive. The 9700 has the best combination of edge antialiasing quality, compatibility, and performance of any consumer graphics chip.

Texture antialiasing
ATI has cleaned up some of the texture filtering quirks from the Radeon 8500, as well. The 9700 can do anisotropic filtering up to 16X, and the 9700’s anisotropic filtering algorithm is improved.

The best way to show you the effect of anisotropic filtering is with visuals. (Plus, I spent way too much time on edge AA, so this is what you get.) The images below show how higher degrees of anisotropy provide increased texture clarity; anisotropic filtering provides this clarity without aliasing while the image is in motion. The key to anisotropic filtering is taking into account the slope of the surface from the perspective of the viewer. Conventional isotropic mip map filters don’t do this, so they have to cut out texture detail in order to reduce aliasing.
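The idea of accounting for the slope of the surface can be sketched in a few lines. The Python below follows the general approach described in the OpenGL anisotropic filtering extension, not the Radeon 9700's own adaptive algorithm, which ATI has not detailed here. The function names and the derivative inputs (how far the texture coordinates move, in texels, per screen pixel in x and y) are assumptions for illustration.

```python
import math

EPS = 1e-12  # guards against division by zero and log2(0)

def footprint(dudx, dvdx, dudy, dvdy):
    """Return the long and short axes of the pixel's footprint in texels."""
    px = math.hypot(dudx, dvdx)  # texels covered per pixel step in x
    py = math.hypot(dudy, dvdy)  # texels covered per pixel step in y
    return max(px, py, EPS), max(min(px, py), EPS)

def isotropic_lod(dudx, dvdx, dudy, dvdy):
    """Conventional mip selection: sized to the long axis, so detail is lost."""
    p_max, _ = footprint(dudx, dvdx, dudy, dvdy)
    return math.log2(p_max)

def anisotropic_params(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Return (number of probes along the long axis, mip level to sample)."""
    p_max, p_min = footprint(dudx, dvdx, dudy, dvdy)
    probes = min(math.ceil(p_max / p_min), max_aniso)
    return probes, math.log2(p_max / probes)

# Hypothetical floor polygon seen at a steep angle: the footprint is
# 1 texel wide but 8 texels long.
print(isotropic_lod(1, 0, 0, 8))                    # 3.0
print(anisotropic_params(1, 0, 0, 8))               # (8, 0.0)
print(anisotropic_params(1, 0, 0, 8, max_aniso=2))  # (2, 2.0)
```

In this example, the isotropic filter drops three mip levels to avoid aliasing along the long axis, while the anisotropic filter stays at the full-detail level and takes eight probes instead. Capping the degree of anisotropy at 2X falls in between, which is consistent with the gradual gain in clarity as the aniso setting rises.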

GeForce4 Ti 4600 — Isotropic filtering, 2X aniso, 4X, 8X, diff between isotropic and 8X aniso

Radeon 9700 — Isotropic filtering, 2X aniso, 4X, 8X, 16X aniso, diff between isotropic and 16X aniso

If you’d like to see uncompressed versions of these images, go right here. Be warned, however, that the files are rather large.

The Radeon 8500’s anisotropic filtering method has a couple of annoying quirks. Its adaptive filtering algorithm acts selectively on certain surfaces, and sometimes, textures aren’t filtered as they should be. ATI says the Radeon 9700’s anisotropic filtering algorithm better accounts for these polygons, so filtering is applied more thoroughly.

Also, the 8500 couldn’t apply anisotropic filtering and trilinear filtering simultaneously. Users were forced to choose between blurry textures with smooth mip map transitions (bilinear + trilinear filtering) or sharp textures with stark mip map transitions (anisotropic). Personally, I couldn’t stand playing Serious Sam with anisotropic filtering on a Radeon 8500 for this very reason. That “line” out on the ground in front of me everywhere I went was just too much to take. Ugh.
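To make that “line” concrete, here is a minimal sketch of how the two modes pick mip levels. It is a textbook formulation rather than either chip's actual hardware, and the function names are hypothetical. Each function returns the mip levels sampled and the weight given to each.

```python
import math

def bilinear_mip(lod, num_levels):
    """Bilinear: snap to the single nearest mip level."""
    level = min(max(int(lod + 0.5), 0), num_levels - 1)
    return [(level, 1.0)]

def trilinear_mips(lod, num_levels):
    """Trilinear: blend the two adjacent mip levels by the fractional LOD."""
    lod = min(max(lod, 0.0), num_levels - 1)
    lower = int(math.floor(lod))
    upper = min(lower + 1, num_levels - 1)
    frac = lod - lower
    return [(lower, 1.0 - frac), (upper, frac)]

# Two neighboring spots on the ground, just either side of a transition:
print(bilinear_mip(1.4, 8))    # [(1, 1.0)]
print(bilinear_mip(1.6, 8))    # [(2, 1.0)]
print(trilinear_mips(1.5, 8))  # [(1, 0.5), (2, 0.5)]
```

With bilinear selection, the sampled level jumps from 1 to 2 at a fixed distance from the viewer, which produces a visible seam that travels with the camera. Trilinear blending fades between the two levels, so there is no hard boundary to see.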

Colorized mip maps show anisotropic + bilinear filtering

Colorized mip maps show anisotropic + trilinear filtering

Happily, the 9700 addresses this quirk, although ATI requires users to check a “quality” radio button in its drivers in order for aniso and trilinear to work together. I’d prefer they do it like everyone else and allow the application settings for trilinear filtering to determine what happens. Nevertheless, the 9700’s image output is positively gorgeous. Our screenshots with colored mipmaps show long, smooth trilinear gradients with anisotropic filtering active.

Mip map transitions are handled beautifully

As you might expect, the Radeon 9700’s texture filtering performance is excellent, as well.

All told, the Radeon 9700 Pro provides more and better texture antialiasing, at higher frame rates, than anything else on the market.

I believe I’ve said enough about the Radeon 9700 Pro by now. You have seen the results and screenshots for yourself, and you know that it’s got more pixel-pushing power, faster pixel shaders, better vertex shaders, more poly throughput, and better real-world performance than any other graphics card you can buy. The image quality is second to none, especially with antialiasing enabled. I’ve racked my brains and raked this thing over the coals trying to find a significant weakness, and I’ll tell you what: I haven’t found one.

Oh, sure, there are always odd driver bugs, incompatibilities, and other teething problems, and I won’t pretend to be able to keep tabs on all of those things. (The card was problem-free in my testing.) If you’re thinking about buying a Radeon 9700 Pro card, you’ll definitely need to factor those considerations into your decision.

But as a 3D graphics chip, the Radeon 9700 is darn near perfect. ATI has taken the time with this chip to increase precision and expand registers and tweak functional units to the point where everything works as advertised. The Radeon 8500, good as it was, was nowhere near this good.

We may run into some problems or limitations when DirectX 9 arrives in force. I’m still a little wary about how this first generation of chips with support for 64- and 128-bit color depths will perform at those depths. The data throughput and computational requirements for 128-bit pixels are enormous, and that presents every opportunity to fudge by splitting 128-bit chunks into two and the like. Tricks like that could sap performance and cause problems with next-gen games. Of course, those games probably won’t be arriving for months and months.
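A little back-of-the-envelope arithmetic shows why the data throughput worries me. The Python below is only a rough sketch: the 1600x1200 resolution and 60 frames per second are my assumptions, and it counts a single color write per pixel per frame, ignoring overdraw, Z-buffer traffic, texture reads, and antialiasing samples.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of one color buffer at the given depth."""
    return width * height * bits_per_pixel // 8

# Assumed display mode and frame rate, for illustration only.
WIDTH, HEIGHT, FPS = 1600, 1200, 60

for bits in (32, 64, 128):
    size = framebuffer_bytes(WIDTH, HEIGHT, bits)
    print(f"{bits:3d}-bit: {size:,} bytes per frame, "
          f"{size * FPS:,} bytes per second")

# Output:
#  32-bit: 7,680,000 bytes per frame, 460,800,000 bytes per second
#  64-bit: 15,360,000 bytes per frame, 921,600,000 bytes per second
# 128-bit: 30,720,000 bytes per frame, 1,843,200,000 bytes per second
```

Even by this generous count, 128-bit pixels move four times the data of today’s 32-bit color for every pixel written, and real scenes touch each pixel more than once.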

110 million transistors of joy

At the end of the day, the Radeon 9700 Pro is the culmination of an era and the beginning of something new. The 9700 is the last in a line, the ultimate graphics chip as a collection of fixed functional units that accelerate specific parts of the rendering process in hardware. But it also ushers in the era of generally programmable graphics chips, VPUs as general SIMD machines. As programmability and high-level shading languages gain a foothold, graphics chips will probably become both simpler and more powerful—general SIMD processors with less custom circuitry for accelerating specific effects.

The 9700 bridges these two eras, running today’s games better than anything else, and providing the potential for artists to create graphics unlike anything you have ever seen done in real time.
