Radeon 8500 vs. GeForce3 Ti 500

This article is late because we’ve been occupied by trying to pin down ATI on Radeon 8500 clock speeds (which we eventually did) and delving into ATI’s apparent use of cheats on Quake III Arena benchmarks. Then we visited with ATI (and with NVIDIA) at Comdex. While we were there, ATI released a new driver for the Radeon 8500, so we had to go back to the drawing board with our testing.

That’s all probably just as well, however, because the new drivers make this comparison much more interesting. Before ATI released the latest drivers for the Radeon 8500, this GeForce3 Ti 500-versus-Radeon 8500 comparison would have read like this:

Don’t buy a Radeon 8500. Buy a GeForce3.

End of story (only with more graphs). However, ATI’s latest drivers take care of a great many problems—the Quake III “optimizations,” Athlon XP incompatibilities, surprisingly low performance—that the Radeon 8500 brought with it when it first arrived on retail shelves. And once you get under that crusty old ATI veneer of lousy drivers and purposely vague public statements, the Radeon 8500 looks like a darned good graphics processor.

Good enough to take on NVIDIA’s vaunted GeForce3 Titanium series? Just maybe. Keep reading to find out.

GeForce goes Titanium
The GeForce3 Titanium series cards are new, but not really novel. They’re simply GeForce3 chips set to run at different core and memory clock speeds. Rather than use its traditional “Ultra” and “Pro” tags, NVIDIA chose to use the “Titanium” name for its fall product line this year. The new GeForce3 Ti 500 runs at a 240MHz core clock speed with 500MHz (DDR) memory—just a bit faster than the 200/460MHz of the original GeForce3. The Ti 200, meanwhile, runs at a 175MHz core speed with 400MHz memory—slower than the GeForce3, but then it’s priced much lower, too. Beyond the clock speed changes, the Ti series of chips is essentially identical to the original GeForce3.

Not that there’s anything wrong with that. In fact, the GeForce3 is still one of the most amazing graphics chips ever. If it weren’t for the Radeon 8500, the GeForce3 would have no real rivals. If you aren’t familiar with what makes the GF3 so special, go check out my review of the GeForce3 right now. It will bring you up to speed on the new approach to real-time graphics that NVIDIA pioneered with the GeForce3, including fancy-pants things like vertex shaders and pixel shaders. When you’re done reading that article, you’ll be much better equipped to follow this one.

The Ti cards hit the scene at the same time NVIDIA released its Detonator XP video drivers. These new drivers brought with them substantial performance gains, and they “turned on” a few features already present in the GeForce3 hardware but not yet implemented in driver software. Among them:


    A 3D textured object with a
    scoop taken out of the corner
  • 3D textures — Also known as volumetric texturing. 3D textures are just what they sound like, but the concept can be tricky to picture. Think of it this way: traditional 2D textures cover the surface of an object, while 3D textures permeate the entire space an object occupies. Chop off a corner of the object, and what’s inside is textured, as well. The picture at the right will help illustrate the concept, and the first sketch after this list shows the addressing idea in code.

    Once you’ve grasped the basic idea, the mind-blowing stuff comes along. NVIDIA’s implementation of 3D textures includes quad-linear filtering and up to 8:1 compression of 3D textures.

    Poof! Mind blown.

    NVIDIA has licensed its 3D texture compression scheme to Microsoft for use in DirectX, but this scheme remains unique to NVIDIA products in OpenGL.

  • Shadow buffers — Shadow buffers are one of the better ways to produce shadows in real-time 3D rendering. They work well with complex scenes, allow for shadows with soft edges, and even allow objects to be self-shadowing. (The basic depth-comparison test is sketched after this list.) Shadow buffers and shadow maps may not be the end-all, be-all for 3D shadowing; we’ll have to wait and see how often and how thoroughly developers choose to implement them. But they’re much better than nothing.
  • Improved occlusion detection — This is where the big performance gain comes along. Occlusion detection helps the graphics chip to avoid one of the biggest bandwidth hogs in graphics: overdraw. Overdraw happens whenever a graphics chip renders a pixel that will end up behind another pixel (and thus not visible) once an entire scene is rendered. A conventional graphics chip renders hundreds of thousands of pixels per frame that won’t be visible. That’s a huge waste of a graphics card’s most precious resource: memory bandwidth.

    Although its approach isn’t as radical as the Kyro II’s, the GeForce3 has the ability to determine, at least some of the time, when a pixel will be occluded, so the chip can avoid drawing unneeded pixels. The Detonator XP drivers improve the GeForce3’s occlusion detection, boosting the chip’s effective pixel-pushing power—or fill rate—substantially. (The last sketch below shows, in toy form, where those savings come from.)
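
To make the 3D texture idea concrete, here is a minimal sketch, in Python, of the addressing concept: a color lookup keyed on three coordinates instead of two, with the same sort of linear filtering 2D textures get. The function and data layout are my own illustration, not NVIDIA’s implementation; the real hardware layers mip maps (hence “quad-linear” filtering) and compression on top of this.

```python
# Toy illustration of volumetric (3D) texturing: instead of a (u, v)
# lookup on a surface, every point *inside* an object gets a color
# from a (u, v, w) lookup. Names and data layout here are made up.

def lerp(a, b, t):
    return a + (b - a) * t

def sample_3d_texture(volume, u, v, w):
    """Trilinearly sample a volume texture.

    volume: 3D list indexed as volume[z][y][x], values are floats.
    u, v, w: normalized coordinates in [0, 1).
    """
    nx, ny, nz = len(volume[0][0]), len(volume[0]), len(volume)
    # Map normalized coordinates into texel space.
    x, y, z = u * (nx - 1), v * (ny - 1), w * (nz - 1)
    x0, y0, z0 = int(x), int(y), int(z)
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0
    # Blend the eight neighboring texels along each axis in turn.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)
```

Quad-linear filtering is just one more interpolation on top of this trilinear one, blending between two mip levels of the volume.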
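
Here is the shadow-buffer test in the same hedged, toy form. The idea: render the scene’s depth from the light’s point of view into a buffer, then, while rendering the normal view, check each visible point against that buffer. Every name here is hypothetical.

```python
# Toy shadow-buffer test (hypothetical names). Pass 1 renders the
# scene's depth from the light's point of view into shadow_depth.
# Pass 2, sketched here, decides per point whether it is lit.

def in_shadow(shadow_depth, light_transform, world_point, bias=0.002):
    # Project the point into the light's view; light_transform is
    # assumed to return light-space (x, y) in [0, 1] plus depth.
    lx, ly, point_depth = light_transform(world_point)
    sx = int(lx * (len(shadow_depth[0]) - 1))
    sy = int(ly * (len(shadow_depth) - 1))
    # If something sat closer to the light along this ray when the
    # shadow buffer was rendered, this point is in shadow. The bias
    # keeps surfaces from shadowing themselves ("shadow acne").
    return point_depth > shadow_depth[sy][sx] + bias
```

Soft-edged shadows come from blending several such tests per pixel instead of accepting a single hard yes or no.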
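
Finally, here is a toy sketch of why occlusion detection pays off. A conventional pipeline does all of its texture reads before the depth test throws a hidden pixel away; testing depth first skips that work entirely. Note the caveat in the comments: rejection only triggers when an occluder has already been drawn, which is why the GeForce3 catches occluded pixels only some of the time.

```python
# Toy depth-tested pixel loop showing where overdraw costs go. With
# an early depth test, a hidden fragment is rejected before any
# texture reads or color writes happen; without one, the chip does
# all that work and then discards the result.

def shade(fragment):
    # Stand-in for the expensive part: texture reads and color math.
    return fragment["color"]

def render(fragments, width, height):
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[0] * width for _ in range(height)]
    shaded = rejected = 0
    for f in fragments:                  # in submission order
        x, y, z = f["x"], f["y"], f["z"]
        if z >= depth[y][x]:             # early reject: occluded
            rejected += 1                # no texture traffic at all
            continue
        depth[y][x] = z
        color[y][x] = shade(f)
        shaded += 1
    # Rejection only happens when an occluder was drawn first, so
    # submission order determines how much bandwidth gets saved.
    print(f"shaded {shaded}, rejected {rejected} before any texture reads")
    return color
```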

The combination of the Ti 500’s higher clock speeds and Detonator XP’s new features makes the GeForce3 a more formidable competitor than ever. The gap between this newest generation and previous-generation chips like the GeForce2 and the original Radeon keeps growing, and ATI looks like the only graphics company with a prayer of keeping up.

 

ATI answers
The Radeon 8500 chip is ATI’s answer to the GeForce3. Like the original Radeon, it comes to market months behind the competing NVIDIA product. Also like the original Radeon, the 8500 implements nearly everything the competing NVIDIA product can do, plus some additional functionality. This strategy of coming in behind NVIDIA and trying to leapfrog them is tricky, but it almost seems to be working for ATI.

The Radeon 8500 is a case in point. The Radeon 8500 and GeForce3 chips are very similar to one another in many ways. They share the same basic configuration: four pixel pipelines with two texture units per pipeline. In its retail boxed form, the Radeon 8500 runs at 275MHz, with a 550MHz (DDR) memory clock, which is just slightly faster than the GeForce3 Ti 500 at 240/500MHz. Both chips implement occlusion detection and other memory bandwidth-saving techniques. Also, the Radeon 8500 GPU implements vertex and pixel shaders much like the GeForce3.

However, ATI claims that the Radeon 8500 implementations of vertex and pixel shaders both include improvements over the “DirectX 8.0” (read: GeForce3) implementations. To give you a sense of where ATI is coming from, let me give you an overview of the Radeon 8500.

The Radeon 8500
By my count, ATI has coined eight marketing buzzwords in order to describe the Radeon 8500. Like the name RADEON, these terms are supposed to be written in all caps. I DON’T KNOW WHY. Just for the record, here are the terms:

TRUFORM, SMARTSHADER, SMOOTHVISION, CHARISMA ENGINE II, PIXEL TAPESTRY II, VIDEO IMMERSION II, HYPER Z II, HYDRAVISION.

Those are some fine marketing terms, and they really jump out in all caps, don’t they? I will try to cover all the concepts wrapped up in these marketing terms, but it’s probably best to label the Radeon 8500’s features for what they are. Here are some of the highlights:

  • Vertex shader — As in the GeForce3, the vertex shader replaces the old fixed-function transform and lighting (T&L) unit of the GeForce2/Radeon with a programmable unit capable of bending and flexing entire meshes of polygons as organic units.

    The Radeon 8500 also includes an entire fixed-function T&L unit (part of what ATI calls CHARISMA ENGINE II), which can operate in parallel with the 8500’s vertex shader. The GeForce3, by contrast, implements its backward-compatible fixed-function T&L capability as a vertex shader program.

  • Pixel shader — ATI’s pixel shaders essentially duplicate the GeForce3’s capabilities, too, but in this case, ATI’s improvements over the GeForce3 are really notable. ATI’s implementation of pixel shaders is markedly different from NVIDIA’s, and in some ways, it offers more flexibility.

    For instance, the Radeon 8500 can apply six textures to a pixel per rendering pass. That statement may seem a bit perplexing, because the Radeon 8500 has only two texture units per rendering pipeline. Here’s the distinction: the chip can apply two textures per clock cycle, but it can “loop back” and apply two more textures in the next cycle—and two more in the next—all in the same rendering pass. The GeForce3 uses a similar “loopback” method, but it can only apply four textures per pass. In order for the GeForce3 to render shader effects that require six texture operations per pixel, the chip has to do part of the work, write an image out to the frame buffer, and complete its work in a second rendering pass. Multipass rendering saps performance and potentially degrades image quality. (The pass-count arithmetic is sketched just after this list.)

    The Radeon 8500’s pixel shaders use a different instruction set than the GeForce3’s; ATI claims its instruction set is simpler yet more powerful than NVIDIA’s. Also, ATI’s pixel shader programs can be as long as 22 instructions, while the GeForce3 is limited to 12. Finally, the 8500’s pixel shaders can perform operations on texture addresses as well as on color values, potentially allowing much more advanced shader techniques than the GeForce3.


    Left: The original models
    Right: TRUFORM-enhanced models
  • Higher-order surfaces — ATI and NVIDIA both support higher-order surfaces, a means of describing a complex (usually curved) 3D surface with mathematical formulas instead of polygons. Using higher-order surfaces saves precious AGP bus and memory bandwidth, and it allows a fast chip to supply much more detail than a simple polygonal model. They are, in short, a good idea.

    Trouble is, NVIDIA and ATI are backing competing standards that are incompatible with one another. NVIDIA supports polynomial surfaces, and ATI employs N-patches, a.k.a. TRUFORM. TRUFORM has the advantage of working reasonably well in current games, provided the developers are willing to enable support for it. No additional information is required, because TRUFORM smooths out existing 3D models by adding polygon detail.

    However, TRUFORM adds detail to 3D models by—kind of—guessing. It looks at existing, low-detail 3D models, generates a higher-order surface, and then rebuilds the models with more polygons. It can make a rounded surface look much smoother, but it can make an object built from flat surfaces look, well, puffy. (A toy version of the underlying construction is sketched just after this list.)

    Still, it’s a neat trick, and I’ll betcha it works well most of the time.

  • Memory bandwidth conservation — The Radeon 8500 gets ATI’s second-generation HYPER Z II, a trio of techniques used to make the most of available memory bandwidth. All the techniques—fast Z clear, Z compression, and hierarchical Z—center around the depth buffer, or Z buffer (the GeForce3 includes similar optimizations). The most notable of these technologies is hierarchical Z, which is ATI’s way of doing occlusion detection.
  • Advanced anti-aliasing — We’ll talk more about ATI’s new SMOOTHVISION anti-aliasing later in this article. Keep reading…
  • Dual monitor support — The Radeon 8500 is the first and only high-end graphics card to incorporate support for dual monitors. ATI’s version of dual monitor support, HYDRAVISION, doesn’t allow for independent resolutions or bit depths on the two displays in Windows 2000, but otherwise, it’s quite good.
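
First, the pass-count arithmetic promised above, as a back-of-the-envelope sketch. This is my arithmetic, not vendor data: both chips apply two textures per clock, loopback raises the per-pass ceiling to six textures on the Radeon 8500 and four on the GeForce3, and anything past the ceiling forces another full rendering pass.

```python
from math import ceil

# Back-of-the-envelope pass counting. "Loopback" lets both chips
# apply more textures per pass over several clocks, up to a
# per-pass ceiling: six on the Radeon 8500, four on the GeForce3.

def passes_needed(textures, per_pass_limit):
    return ceil(textures / per_pass_limit)

for effect_textures in (2, 4, 6):
    r8500 = passes_needed(effect_textures, 6)   # Radeon 8500 ceiling
    gf3 = passes_needed(effect_textures, 4)     # GeForce3 ceiling
    print(f"{effect_textures} textures: Radeon 8500 {r8500} pass(es), "
          f"GeForce3 {gf3} pass(es)")
# 6 textures: Radeon 8500 1 pass(es), GeForce3 2 pass(es)
```

Each extra pass means writing an intermediate image to the frame buffer and reading it back, which is exactly the bandwidth drain and precision loss described above.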
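
Second, the TRUFORM guesswork. N-patches are based on the published curved PN-triangles construction; below is a simplified sketch of the key move, curving an edge midpoint using the endpoint positions and normals. ATI hasn’t published its exact hardware math, so treat this as an illustration of the technique, not a spec.

```python
# Toy N-patch-style smoothing (a simplified take on the published
# curved PN-triangles construction that N-patches are based on; the
# hardware's exact math isn't public). Each edge midpoint is pulled
# off the straight line according to the vertex normals, so a
# low-polygon silhouette rounds out.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def curved_midpoint(p1, n1, p2, n2):
    """Curved position for the midpoint of edge (p1, n1)-(p2, n2).

    p1, p2: vertex positions; n1, n2: unit vertex normals. Derived
    from the cubic PN-triangle edge control points evaluated at the
    midpoint of the edge.
    """
    w12 = dot([b - a for a, b in zip(p1, p2)], n1)   # (p2 - p1) . n1
    w21 = dot([b - a for a, b in zip(p2, p1)], n2)   # (p1 - p2) . n2
    return [
        (a + b) / 2 - (w12 * c + w21 * d) / 8
        for a, b, c, d in zip(p1, p2, n1, n2)
    ]

# Example: endpoints with diverging normals bulge the midpoint out.
print(curved_midpoint([0, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0]))
```

When the normals belong to a genuinely curved surface, the midpoints bulge the right way; when they belong to flat faces that merely share smoothed normals, everything bulges anyway, which is precisely the puffiness described above.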

That’s the Radeon 8500 in a (gigantic, bulging, distended) nutshell. If that’s not enough punishment for you, stare at this block diagram for twenty minutes:

 

Radeon 8500 in action
With that out of the way, I’d like to give you a look at what the Radeon 8500, with its fancy pixel shaders, can do. The following screenshots are from an ATI demo. They were all generated in real time, and the demo runs at nice, smooth frame rates—even at 1280×1024 resolution.


The bubble models deform and flex courtesy of the vertex shader, while
pixel shader effects make the bubbles reflect and refract light


This character casts a realistic shadow that falls properly over an uneven surface


Bump mapping lends realism to this scene

All in all, as you can probably tell by looking, the Radeon 8500 is a worthy competitor to the GeForce3.

 

The chip specs
Now that we’ve established the identities of our two main contenders, we’re about ready to get down to the performance tests. Before we do that, however, let’s take a look at the basic specs of each chip and see how they stack up against some of the other graphics solutions floating around out there. The following specs are handy for comparing graphics chips, but as these chips become more complex, these sorts of numbers tend to matter less and less. The memory bandwidth and pixel throughput numbers, in particular, are just theoretical peaks.
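
Those peak numbers fall straight out of the clock speeds, so here is the arithmetic in sketch form, checked against the Radeon 8500 and Ti 500 rows of the table below.

```python
# The "theoretical peak" columns are simple arithmetic on the clocks.

def peak_specs(core_mhz, pipes, tex_units, mem_mhz_effective, bus_bits):
    mpixels = core_mhz * pipes                          # Mpixels/s
    mtexels = mpixels * tex_units                       # Mtexels/s
    bandwidth = mem_mhz_effective * bus_bits / 8 / 1000 # GB/s
    return mpixels, mtexels, bandwidth

print(peak_specs(275, 4, 2, 550, 128))  # Radeon 8500: (1100, 2200, 8.8)
print(peak_specs(240, 4, 2, 500, 128))  # GF3 Ti 500:  (960, 1920, 8.0)
```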

 

| Chip | Core clock (MHz) | Pixel pipelines | Fill rate (Mpixels/s) | Texture units per pipeline | Fill rate (Mtexels/s) | Memory clock (MHz, effective) | Memory bus width (bits) | Memory bandwidth (GB/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kyro II | 175 | 2 | 350 | 1 | 350 | 175 | 128 | 2.8 |
| GeForce2 GTS | 200 | 4 | 800 | 2 | 1600 | 333 | 128 | 5.3 |
| GeForce2 Ultra | 250 | 4 | 1000 | 2 | 2000 | 460 | 128 | 7.4 |
| GeForce3 Ti 200 | 175 | 4 | 700 | 2 | 1400 | 400 | 128 | 6.4 |
| GeForce3 | 200 | 4 | 800 | 2 | 1600 | 460 | 128 | 7.4 |
| GeForce3 Ti 500 | 240 | 4 | 960 | 2 | 1920 | 500 | 128 | 8.0 |
| Radeon 64MB DDR | 183 | 2 | 366 | 3 | 1100 | 366 | 128 | 5.9 |
| Radeon 8500 | 275 | 4 | 1100 | 2 | 2200 | 550 | 128 | 8.8 |

It’s close, but the Radeon 8500 has an edge over the GeForce3 Ti 500 in both peak theoretical fill rate and memory bandwidth. Let’s see how that plays out in the real world…

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. All tests were run at least twice, and the results were averaged.

The test systems were built using:

|  | Socket A platform | Pentium 4 platform |
| --- | --- | --- |
| Processor | AMD Duron 1GHz, AMD Athlon 1.2GHz, AMD Athlon 1.4GHz, AMD Athlon XP 1800+ 1.53GHz | Intel Pentium 4 1.6GHz, 1.8GHz, 2GHz |
| Front-side bus | 133MHz (266MHz DDR) | 100MHz (400MHz quad-pumped) |
| Motherboard | Gigabyte GA-7DX rev. 4.0 | Intel D850MD |
| Chipset | AMD 760/VIA hybrid | Intel 850 |
| North bridge | AMD 761 | 82850 MCH |
| South bridge | VIA VT82C686B | 82801BA ICH2 |
| Memory size | 256MB (1 DIMM) | 256MB (2 RIMMs) |
| Memory type | Micron PC2100 DDR SDRAM CAS 2 | Samsung PC800 Rambus DRAM |
| Sound | Creative SoundBlaster Live! | Creative SoundBlaster Live! |
| Storage | IBM 75GXP 30.5GB 7200RPM ATA/100 hard drive | IBM 75GXP 30.5GB 7200RPM ATA/100 hard drive |
| OS | Microsoft Windows XP Professional | Microsoft Windows XP Professional |

We ran the bulk of the tests on our Socket A test platform with an Athlon XP 1800+ processor.

For comparative purposes, we used the following video cards and drivers:

  • ATI Radeon 64MB DDR with 6.13.3276 drivers
  • ATI Radeon 8500 with 6.13.3286 drivers
  • NVIDIA GeForce2 Ultra 64MB (NVIDIA reference card) with Detonator XP 21.83 drivers
  • NVIDIA GeForce3 64MB (NVIDIA reference card) with Detonator XP 21.83 drivers
  • Hercules 3D Prophet 4500 64MB with 9.031 drivers
  • VisionTek Xtasy 6964 (NVIDIA GeForce3 Ti 500) with Detonator XP 21.83 drivers

We also included a “simulated” GeForce3 Ti 200, because we could. We used PowerStrip to underclock our GeForce3 card to Ti 200 speeds and ran the tests. The performance of the GeForce3 at this speed should be identical to a “real” GeForce3 Ti 200. If you can’t handle the concept of a simulated GeForce3 Ti 200 card, pretend those results aren’t included.

We used the following versions of our test applications:

A word about our benchmark selection. I wanted to test with a number of newer games, like Aquanox, Dronez, Max Payne, and the Wolfenstein Multiplayer Demo, but for one reason or another, I chose not to test with them. Dronez and Aquamark are still works in progress with some compatibility problems. In the case of Max Payne, I was unsure whether benchmarking the game’s cut scenes really worked properly; on these very fast video cards, the game seemed to hit an internal limit at about 75 frames per second, even with vsync disabled. The Wolfenstein MP demo was more troublesome, because I couldn’t get it to produce consistent results.

I’m aware that other sites—and heck, even other reviewers here at TR—have used Wolf MP and Max Payne with apparent success, and I’m not saying those results aren’t valid. Still, for the sake of this review, I decided to play it safe and omit these tests.

The test systems’ Windows desktop was set at 1024×768 in 32-bit color at a 75Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

VillageMark
We’ll start our tests with VillageMark, which was handcrafted by PowerVR (the Kyro people) to demonstrate the weaknesses of conventional 3D chips and the strengths of the Kyro chips. Kyro chips use deferred rendering to cut out overdraw, and VillageMark is full of overdraw. Since the GeForce3 and Radeon 8500 both include some occlusion detection capabilities, VillageMark is a good way to test these cards, as well.

The Radeon 8500’s occlusion detection appears to be quite a bit more effective in VillageMark than the GeForce3’s. Still, notice how the original GeForce3 card beats out the GeForce2 Ultra, which has the exact same 7.4GB/s of memory bandwidth, by over 20 frames per second. No, none of these cards is as effective at eliminating overdraw as the Kyro II chip on the 3D Prophet 4500, but the newest generation of chips shows considerable improvement—especially the Radeon 8500.

Now, on to some more real world tests…

Serious Sam
This OpenGL-based first-person shooter gives us a chance to make some extremely odd-looking graphs, or plots, or whatever. That’s because Serious Sam records frame rates second by second as its benchmark runs. The graphs you see below show us more than just a frame rate average; we can see peaks, valleys, and relative performance.

As we’ll do in most of our tests, we’ve run Serious Sam at a range of resolutions, from 640×480 to 1600×1200. We’ve done so in order to test several different aspects of 3D chip performance. At low resolutions, performance is dictated largely by driver efficiency and the performance of a chip’s T&L unit or vertex shader. At very high resolutions, the limiting factor is fill rate, or pixel-pushing prowess. Generally, memory bandwidth limitations determine a graphics card’s real-world fill rate, but in newer apps with lots of pixel shader effects, the graphics chip itself may become the bottleneck. At intermediate resolutions, performance is determined by a mix of fill rate and polygon throughput.
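
To put rough numbers on that, here is a back-of-the-envelope estimate (mine, not a measurement) of the frame-buffer traffic implied by resolution, overdraw, and a target frame rate. Texture reads would add still more traffic on top.

```python
# Rough estimate of why high resolutions become bandwidth-bound:
# every rendered pixel costs color and Z traffic, multiplied by
# overdraw. The overdraw figure and per-pixel costs are assumptions.

def gbps_needed(width, height, fps, overdraw=2.5,
                color_bytes=4, z_bytes=4):
    # Assume each rendered pixel does a Z read, a Z write, and a
    # color write; texture fetches would add more on top of this.
    per_pixel = color_bytes + 2 * z_bytes
    return width * height * overdraw * per_pixel * fps / 1e9

print(f"{gbps_needed(640, 480, 100):.1f} GB/s at 640x480")      # ~0.9
print(f"{gbps_needed(1600, 1200, 100):.1f} GB/s at 1600x1200")  # ~5.8
```

At 640×480, the memory system loafs; at 1600×1200, the same workload already approaches these cards’ theoretical 8GB/s or so once texturing is added, which is why the field spreads out as the resolution climbs.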

At low res, the pack is pretty well bunched up. However, the NVIDIA chips are all running just a bit faster than the rest of the pack. Now you can see why NVIDIA has a reputation for producing good drivers.

At 1024×768, the GeForce3 cards bunch together at the top. Just below them, the Radeon 8500 and GeForce2 Ultra are more or less tied. You can see, though, how different the shape of the Radeon 8500 and GeForce2 Ultra lines are; their peaks and valleys come in different places.

As the resolution increases, the Radeon 8500 keeps looking better and better. The GeForce3 Ti 500, however, is an absolute monster. It’s averaging over 100 fps at 1280×1024 in one of the most visually rich games around.

The highest resolution gives us a peek at a purely fill rate-limited scenario. The 8500 manages to catch the original GeForce3, but the Ti 500 is faster still.

 

Quake III Arena
Now that we have all of that cheating brouhaha out of the way and ATI has fixed its drivers, we can test Q3A fair and square. Rather than use the old “demo001” test that’s been around forever, we chose to use the “four.dm_66” demo from the new 1.30 version of Quake III.

Although the graphs aren’t as complicated, the low-resolution results look strikingly similar to Serious Sam: the NVIDIA cards are bunched up together, and they’re simply a cut above everything else.

The pack starts to separate as we increase the resolution. Once again, the GF3 Ti 500 is leading. Amazingly, at 1280×1024, the next-gen GeForce3 and Radeon 8500 cards are well over twice as fast as the original Radeon 64MB DDR.

The Radeon 8500 again pulls even with the GeForce3 in a fill rate-limited situation.

MDK2
Our “simulated” GeForce3 Ti 200 wouldn’t complete the MDK2 test; I’m not quite sure why. As a result, its scores are omitted below.

Once again, the NVIDIA cards are a notch above everything else at low resolutions in OpenGL.

Yet again, the Radeon 8500 rises up the ranks as the resolution increases, but yet again, the GeForce3 Ti 500 dominates, even at 1600×1200.

 

Vulpine GLMark
Our last stop in the OpenGL gaming tests is Vulpine GLMark, which is intended as an OpenGL-based alternative to 3DMark.

What was it I was saying about NVIDIA drivers, poly throughput, and low resolutions? I forget.

The Radeon 8500 is simply outclassed here. The bottom line: for OpenGL games, the GeForce3 Ti 500 is the fastest game in town. The Radeon 8500 is about as fast as a GeForce3 Ti 200 at intermediate resolutions, where most folks will want to play their games.

3DMark 2000
We’ll kick off our Direct3D tests with good ol’ 3DMark 2000. We decided to include 3DMark 2000 for a couple of reasons. First, the world needs more Direct3D-based games with built-in benchmarking functions. Second, most of the PC world’s current 3D games are written using DirectX 7, not DirectX 8. Since 3DMark 2001 tests specifically and extensively for DirectX 8, it made sense to include 3DMark 2000 as a DirectX 7 performance metric.

Here’s how the cards performed:

The Radeon 8500 puts in a strong showing, but chalk up another win for the Ti 500. Let’s break down 3DMark 2000’s individual game tests to see how the competitors matched up.

Once again, in DirectX 7 as in OpenGL, the GeForce3 Ti 500 is just a little bit faster all around. The Radeon 8500 puts in a very strong second-place showing, especially at high detail levels.

 

3DMark 2001
Now for the main event: a real DirectX 8 benchmark that takes advantage of vertex shaders, pixel shaders, and other advanced features. This is where the next-gen cards are supposed to shine. Will the Ti 500 win yet another yawner here?

Not this time. ATI’s new card finally takes a win—and in one of the most important tests for predicting performance on future games. Fortunately, 3DMark 2001 lets us delve into the individual components of the benchmark, so we can see exactly how the Radeon 8500 managed to win this one.

Game tests
Don’t let the legend on the graphs fool you. Compared to our other tests, 3DMark’s game scenes are either “high detail” or “painfully high detail.” There is no “low detail” here.

As you can see, the Radeon 8500 isn’t fastest on every test, but it’s strongest overall. In Dragothic, the 8500 outruns the Ti 500 by 20-plus frames per second.

Next up is the Nature test, the super-intensive outdoors scene with real trees, flowing water, and lots of advanced effects. The previous-gen cards won’t even run this one.

The Radeon 8500 ties the original GeForce3 in this grueling test, but can’t keep pace with the GF3 Ti 500.

 

Fill rate and pixel shader performance
I’ve tried to group these 3DMark 2001 results as logically as possible. The next few tests measure pixel-pushing ability, either in terms of raw fill rate or pixel shader speed. We’ll start with 3DMark’s standard fill rate tests.

The Radeon 8500 wallops the GeForce3 Ti 500 in both fill rate tests. That’s not surprising given the two cards’ respective specs, but it is surprising in light of the way the Ti 500 outperformed the 8500 at 1600×1200 in all of our other benchmarks.

Let’s look at some specialized tests, where advanced effects, either hard-wired or via the pixel shaders, are on display.

The GeForce3 cards are faster at both flavors of bump mapping. What happens when we dive into real DirectX 8-class pixel shaders, where only the latest chips can play?

ATI’s new chip is by far the fastest here, beating NVIDIA at the game NVIDIA invented.

Poly throughput and vertex shader performance
The first couple of tests here measure performance on fixed-function transform and lighting, not vertex shaders.

Well, that pretty much settles that! The Radeon 8500’s ability to run its vertex shader in parallel with its fixed-function T&L unit may help explain its astounding performance here. Whatever the case, the Radeon 8500 nearly triples the Ti 500’s performance on the 8-light test.

Notice also that the GeForce2 Ultra, with its fixed-function T&L unit, outperforms the GeForce3, which implements its fixed-function T&L as a vertex shader program.

Point sprites are especially useful for creating special effects like snowstorms, waterfalls, or crackling fires. They’re generally handled by a T&L engine, when one is available. That’s good news for the Radeon 8500, as it wins convincingly again.

Finally, we have 3DMark’s vertex shader tests…

And again, the Radeon 8500 mops the floor with everything else. Oddly, the GeForce2 Ultra beats out the GeForce3 cards here. In theory, at least, a true vertex program ought to run faster on a card with a vertex shader than on one without. DX8’s software vertex shaders would no doubt run fast on our test system’s Athlon XP 1800+ processor, but that doesn’t explain why the GeForce2 Ultra is so much faster than the Radeon DDR and Kyro II cards here. I suspect this isn’t a true test of hardware vertex shader capabilities.

Nonetheless, these tests are as close as we have to low-level hardware benchmarks for graphics chips, and they show that there’s a beast lurking inside the Radeon 8500. Its T&L capabilities are breathtaking. I just wish they’d show up in more of our game tests.

 

Giants: Citizen Kabuto
We searched long and hard for a decent Direct3D game benchmark, and finally we found Giants. This game uses bump mapping and lots of polys to achieve its look. The Radeon 8500 put in such a strong showing in 3DMark 2001 that we’re wondering whether it will carry over into this D3D game.

Even in this relatively new and feature-rich Direct3D game, the Radeon 8500 can’t quite keep up with the Ti 500.

SPECviewperf
SPECviewperf measures performance in high-end OpenGL applications, the kind one would run on a workstation. Both the Radeon 8500 and GeForce3 have high-falutin’ cousins (the FireGL and Quadro lines, respectively) that are built around, basically, the same chips. NVIDIA and ATI give these cards some tweaks, many of them in software, to make workstation-class apps run a little better. Regardless, we’ll test these much less expensive consumer cards to see how they perform.

Overall, it’s a wash. The Ti 500 and Radeon 8500 are about equal.

 

CPU scaling
We’ll cap off our benchmarking with a look at how these two cards handle working with different types and speeds of CPUs.

In Quake III, the two cards scale just about the same, though the Radeon 8500 is a little slower overall. The only real exception is the difference between the Athlon XP 1800+ and the Pentium 4 1.8GHz. The Ti 500 is a little faster with the Pentium 4, and the Radeon 8500 is a little faster with the Athlon XP. We’re talking about teensy little differences here, though, so it really doesn’t mean much.

As an application, MDK2 is quite a bit happier running on an Athlon. In terms of performance scaling, though, the two cards are identical. There’s no reason to pick one graphics card over the other to get the best performance with a particular brand of CPU.

 

3D image quality
I want to touch on the subject of image quality for a second, because there are a few things worth mentioning. Like many folks, I’ve judged in the past that the Radeon’s image quality is superior to the GeForce2’s. Naturally, one might think that situation carries over to the current generation of cards, but not necessarily. The GeForce3 has lots of internal precision, so even with multipass rendering, it looks great.

The Radeon 8500 looks quite good, too, and most folks would be hard pressed to tell the difference between the two most of the time. However, there are a couple of differences in how these two chips implement common filtering methods. First, have a look at these three screenshots to see the difference between the chips’ trilinear filtering implementations. I’m using the psychedelic “r_colormiplevels” variable in Quake III to show exactly where the mip map levels are.
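
For reference, here is what r_colormiplevels visualizes, in toy form: each pixel picks a mip level based on how much texture it covers, and trilinear filtering blends the two nearest levels. This is the textbook math, not either chip’s exact hardware, and that gap is the whole point of the screenshots below.

```python
from math import log2

# Textbook mip level selection and trilinear blending, in toy form.
# Real chips approximate this math differently, which is exactly
# what the r_colormiplevels screenshots make visible.

def mip_level(texels_per_pixel):
    """Level of detail: log2 of the texture footprint per pixel."""
    return max(0.0, log2(max(texels_per_pixel, 1e-6)))

def trilinear(sample, u, v, texels_per_pixel):
    lod = mip_level(texels_per_pixel)
    lo, frac = int(lod), lod - int(lod)
    # Bilinear sample from each of the two nearest mip levels,
    # blended by the fractional LOD: smooth bands, no hard seams.
    # 'sample' is assumed to do a bilinear fetch at a given level.
    a = sample(u, v, lo)
    b = sample(u, v, lo + 1)
    return a + (b - a) * frac
```

Quantize frac to a couple of coarse steps instead of a smooth ramp, and you get exactly the two-level blending the 3286 drivers exhibit.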

The first shot is the GeForce3 with trilinear filtering enabled, and the next two are the Radeon 8500 with trilinear filtering enabled, first with the 6.13.3276 drivers (with cheats disabled), and next with the 6.13.3286 drivers.


The GeForce3’s mip map bands are uniformly distant
from the player’s point of view


3276 drivers: The Radeon 8500 mip maps are bounded by two lines that intersect at an angle,
though mip map boundaries are blended smoothly


3286 drivers: The Radeon 8500 mip maps are still bounded by two lines that intersect at an angle,
and now mip map blending has little precision (two levels)

Now, the difference between these two trilinear filtering implementations isn’t night and day, and folks who are gonna fight endlessly over these things in online forums really need to get out more. But I believe that the GeForce3’s implementation, where the mip map boundaries form an arc at a set distance from the “camera,” is a little more proper.

The Radeon 8500’s mip map boundaries are a little less precise. They tend to move and jump around, too, and not in a uniform way. You can see how the boundaries in the screenshots there aren’t symmetrical; they don’t really meet in the center of the screen. Sometimes, they intersect way off center. It’s hard to describe unless you see it, but ATI is kind of guesstimating where mip map boundaries ought to be.

Of more concern, however, is the way the 3286 drivers cut down on blending between mip map levels. They’ve essentially fudged trilinear filtering, probably to gain a little performance edge. Or maybe they just broke it somehow.

Either way: tsk, tsk.

 
The next thing I want to show you is how these two chips handle the combination of anisotropic filtering and trilinear filtering. Below are two more screenshots, the first from the GeForce3 and the second from the Radeon 8500.


The GeForce3 with anisotropic and trilinear filtering enabled

The GeForce3 implements trilinear filtering properly with anisotropic filtering, and it produces beautiful, smooth gradients.


The Radeon 8500 with anisotropic filtering enabled.

With anisotropic filtering enabled on the Radeon 8500, two things happen. First, trilinear filtering goes away. Transitions between mip maps are stark. Second, see how the mip map transition is positioned oddly on the walkway in the screenshot. The lack of precision in determining mip map boundaries makes for some weird results with anisotropic filtering enabled. I believe both of these problems are carried over from the original Radeon, and it’s a shame to see them in the 8500.

 

Antialiasing
Now we finally get to talk about SMOOTHVISION, the Radeon 8500’s new antialiasing mode. ATI calls SMOOTHVISION “programmable multisample anti-aliasing,” which is kind of correct. Let’s review the various methods of anti-aliasing out there, so we can see where SMOOTHVISION fits in.

The methods


    Ordered-grid
    sampling pattern
  • Ordered-grid supersampling (OGSS) — This approach is employed by GeForce/GeForce2, Radeon, and Kyro/Kyro II cards. The 3D scene is rendered at a higher resolution than the intended output resolution, and then scaled down. For instance, to produce a scene using 4X OGSS at 640×480 resolution, the card would render the scene at 1280×960, and then scale down the result to 640×480. Thus, the color of each pixel on the screen would be determined by averaging the color values for four sub-pixels. This is a very basic approach to antialiasing, and given the right sample size, it can be fairly effective. OGSS can potentially affect almost every pixel on the screen, cutting down on “jaggies” at edges and helping improve texture quality on surfaces. (A toy sketch of these sampling patterns and the final downsampling step appears after this list.)

    Rotated-grid
    sampling pattern
  • Rotated-grid supersampling (RGSS) — 3dfx’s approach with the Voodoo 5. This approach is a modified version of OGSS. Rather than doing its supersampling on an ordered grid, the Voodoo 5 renders its supersample images on a grid with a tilt angle to it. In the case of 4X FSAA, the samples are “jittered” from one sample to the next in a diamond-shaped pattern. This slightly “off” pattern doesn’t correspond so neatly to the display’s grid of pixels, making it harder for the eye to recognize aliasing patterns.

    At a given sample size, RGSS is likely to produce results slightly superior to OGSS. However, sample size matters more than the difference between these two techniques. 4X OGSS almost always looks superior to 2X RGSS, no matter what the few remaining 3dfx fanboys would have you believe.

  • Multisampling — This is the GeForce3’s AA method. Like the Voodoo 5, the GeForce3’s sampling pattern follows a rotated grid—but only in 2X AA mode. In 4X mode, the GeForce3 uses an ordered grid. Either way, though, the GeForce3 takes a different approach to sampling called multisampling.

    Multisampling isn’t full-scene antialiasing like the two techniques I’ve described above. Instead, it does edge antialiasing. Take 2X antialiasing, for instance. The GeForce3 captures two versions of the scene to be rendered, with samples distributed according to a rotated grid pattern, as you’d expect. But it doesn’t do a texture read for both of those two sub-pixels. Instead, the chip does a single texture read for the first sub-pixel and uses that value to determine the color of both sub-pixels. So you’ve saved a little bit by avoiding an extra texture read, although the memory bandwidth requirements for storing the two sub-pixels are the same as with supersampling.

    Here’s the trick with multisampling: If a pixel is on the edge of a polygon, its sub-pixels probably “belong” to different polygons which comprise different objects in a scene—a lighter object in the foreground, say, and a darker one in the background.


    Quincunx in action

    If this is the case, the GeForce3 will perform a blend on the colors of the sub-pixels to determine the final color of the pixel. Thus, the chip effectively antialiases polygon edges, but not surfaces or the textures mapped to them—and it does so more efficiently, with fewer texture reads.

  • Quincunx — NVIDIA’s Quincunx is simply this: 2X multisampled AA (on a rotated grid) plus a blurring filter. Once the 2X AA image is produced, the blurring filter is applied. Yes, this filter looks at the three adjacent pixels—not subpixels, but actual pixels—to do its thing, but it’s not looking up any additional information. Compared to straight-up 2X AA, Quincunx sacrifices clarity in order to reduce high-frequency texture shimmer and edge crawling. It’s a trade-off, pure and simple. Personally, I’d rather use 2X AA.

    SMOOTHVISION’s
    pseudo-random
    sampling pattern
    template
  • SMOOTHVISION — Last but not least is SMOOTHVISION. ATI’s new AA method is billed as a “multisample” method, but SMOOTHVISION is actually supersampling, not GeForce3-style multisampling. That means it affects more than just edge pixels; SMOOTHVISION touches nearly every pixel on the screen, so it provides a little more texture antialiasing than true multisampling. It’s also a little less efficient than multisampling, because it performs more texture reads.

    However, the real innovation with SMOOTHVISION is its pseudo-random sample distribution. ATI scatters the samples it pulls for each pixel according to a template, which you can see in the diagram at the right. That disorderly mess is even less aligned with the grid of pixels in a display than 3dfx’s diamond-shaped RGSS pattern. To get a feel for how it works, take a look at the diagram below, which shows where sub-pixel samples might be located for four neighboring pixels.


    Four different pixels might pull samples from these
    locations with 4X SMOOTHVISION AA

    This approach might seem counter-intuitive at first, but it works. By distributing samples in this randomized fashion, SMOOTHVISION disrupts the eye’s ability to key in on the grid of pixels that comprises the moving images before it. In full motion, SMOOTHVISION is wildly effective. It’s easily the best antialiasing solution on the market, and I’m darn near convinced that 2X SMOOTHVISION is as good as 4X ordered-grid solutions. It’s not perfect—sharp, high-contrast edges look kind of “fizzy” in 2X mode as the sub-pixel samples move around. A more random sample distribution would be even more effective. Still, SMOOTHVISION is easily better than the present competition.

    Sadly, the effect is lost in static screenshots. You’ve got to have motion for the pseudo-randomized sampling to work its magic. Heck, I think SMOOTHVISION’s static screenshots look worse than everything else, because the edge AA blending isn’t as “perfect.” But that won’t stop me from dumping a load of screenshots on you for comparison.

    Note that in the following screenshots, we’ve got two pictures from the Radeon 8500 for each AA mode. One is marked “quality” and the other “performance,” because ATI’s drivers offer two options. Near as I can tell, “quality” mode is pseudo-randomized SMOOTHVISION, and “performance” mode is ordered-grid supersampling.
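
To tie the methods above together, here is a toy sketch of the sampling side of antialiasing. The offsets are illustrative, not any vendor’s actual patterns; what matters is the shape of each sample set and the final averaging step, which is common to every supersampling scheme described above.

```python
import random

# Toy versions of the sampling patterns described above. The offsets
# are illustrative only, not any vendor's exact pattern; what varies
# between the schemes is where the sub-samples land within a pixel.

ORDERED_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_4X = [(0.375, 0.125), (0.875, 0.375),
              (0.125, 0.625), (0.625, 0.875)]   # tilted, diamond-ish

def jittered_4x(rng):
    # SMOOTHVISION-style idea: scatter samples per pixel from a
    # template, so neighboring pixels don't share one fixed grid.
    return [(rng.random(), rng.random()) for _ in range(4)]

def supersample_pixel(render_at, px, py, offsets):
    """Average sub-samples taken inside pixel (px, py)."""
    samples = [render_at(px + dx, py + dy) for dx, dy in offsets]
    return sum(samples) / len(samples)

# e.g. supersample_pixel(scene, 10, 20, jittered_4x(random.Random(0)))
# A 2x2 ordered grid at 640x480 is equivalent to rendering at
# 1280x960 and averaging each 2x2 block down to one pixel.
```

A multisampling chip like the GeForce3 would run the same average but fetch the texture color once per pixel, reusing it for every sub-sample and blending only where the sub-samples straddle different polygons.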

 

Non-AA reference images
The next few pages will show you exactly how the antialiasing output of the cards we’ve tested compares. The first page is nothing more than non-antialiased images, included for reference.

Voodoo 5, no AA

Radeon, no AA

Radeon 8500, no AA

GeForce2 series, no AA

GeForce3 series, no AA

3D Prophet 4500, no AA

 

2X antialiasing

Voodoo 5, 2X AA

Radeon, 2X AA

Radeon 8500, 2X AA, performance

Radeon 8500, 2X AA, quality

GeForce2 series, 2X AA

GeForce3 series, 2X AA

3D Prophet 4500, 2X AA (Horizontal)

3D Prophet 4500, 2X AA (Vertical)

 

4X antialiasing

Voodoo 5, 4X AA

Radeon, 4X AA

Radeon 8500, 4X AA, performance

Radeon 8500, 4X AA, quality

GeForce2 series, 4X AA

GeForce3 series, 4X AA

3D Prophet 4500, 4X AA

 

Antialiasing performance
Now that you’ve spent the requisite time leaning into the monitor to detect subtle differences in antialiased images, let’s look at how the cards perform with anti-aliasing enabled. Remember, though, that we’re not entirely comparing apples to apples here. Some of these AA modes look better than others, so higher performance isn’t necessarily the whole story.

In 2X AA, the GeForce3-based cards run well ahead of everything else. Let’s see if the picture changes in 4X AA.

4X AA is pretty much the same story. SMOOTHVISION is pretty, but it’s really not playable in Quake III at 1024x768x32, while NVIDIA’s 4X multisampled AA is.

 
Here’s how the various AA modes look in Serious Sam. We’ve chosen to omit the Radeon 8500’s “performance” mode here, so the Radeon 8500 scores are for “quality” mode, or true SMOOTHVISION AA.

Especially in 4X mode, it’s easy to see how the GeForce3’s more efficient multisampling AA produces better performance.

 

Conclusions
It’s hard to know exactly what to make of the results we’ve seen, but some things are clear. In every gaming test we threw at it and in most of the more synthetic benchmarks, the GeForce3 Ti 500 showed that it’s the fastest graphics card on the market, bar none—an absolute screamer. Turn on anti-aliasing, and the Ti 500 only wins by a larger margin. The GeForce3 chips include a bundle of new capabilities that developers are only now beginning to really use, so it’s reasonably future-proof, too. On top of all that, the GeForce3 chips handle trilinear and anisotropic filtering perfectly, while the Radeon 8500 seems to fudge a little.

You just can’t go wrong buying a GeForce3 Ti 500.

On the other hand, the Radeon 8500 showed signs of greatness in 3DMark 2001. In terms of both fill rate and poly throughput—or put another way, in terms of pixel shaders, fixed-function T&L, and vertex shaders—the Radeon 8500 looked scary fast. Unfortunately, the same card turned out to be just a little bit slower than a Ti 500 in all of our OpenGL and DirectX 7 gaming tests.

Still, the 8500 has much to recommend it. It has a potential leg up in terms of advanced vertex and pixel shader features. Whether developers ever make use of them is another story. We’ve seen ATI have a bit of an advantage in terms of hardware features in the past, but it didn’t really matter. I can’t name off the top of my head a single game that uses Radeon-specific features to give players a better visual experience than could be had with a GeForce2. ATI needs to get more developers behind its technology this time around. I’m a little skeptical that will happen.

ATI’s efforts with TRUFORM may just prove me wrong, though. Already, versions of Serious Sam and Half-Life are available with TRUFORM support.

For some of us, the Radeon 8500’s dual-display support will be a big advantage over the GeForce3. If you want next-gen graphics and multiple monitor support in a single card, the 8500 is the only game in town. Frankly, I’m puzzled why anyone ready to drop upwards of $250 on a video card wouldn’t expect dual video outputs. If cheapy cards like the Radeon VE and GeForce2 MX come with dual-display capability, these high-dollar cards ought to, as well.

And finally, antialiasing addicts will want to see SMOOTHVISION in action. It’s simply the best thing going, AA-wise. 
