NVIDIA’s GeForce3 graphics processor

NVIDIA’s position has only gotten stronger since the TNT chip. The GeForce series of chips—extending up into the high end with the Ultra and Quadro lines, and down into the value market with the MX line—has devastated the competition. Since the introduction of the first GeForce chip, S3 has spun away, Matrox has dug safely into its niche markets, and 3dfx finally conceded defeat, selling out to NVIDIA. ATI remains standing, but it’s not clear whether the company is gathering its strength for a renewed attack or wobbling in wait for the knock-out punch—probably some combination of the two.

Now comes NVIDIA’s GeForce3 chip, the company’s most ambitious undertaking yet. It’s also one of the most poorly named products in recent memory, because the GeForce3 name implies that this chip is a follow-on to the previous GeForce chips. In reality, GeForce3 is a completely new graphics chip that initiates a novel approach to real-time 3D graphics.

It was a risk for NVIDIA to design and produce this chip at this point in time. They could have concentrated on adding raw pixel-pushing power and refined the conventional approach to real-time 3D graphics. Doing so would have arguably delivered more performance on present-day games, and it would have avoided shaking up a market that NVIDIA more or less owned. The fact they didn’t choose this approach is a credit to the company’s leadership—at least, assuming GeForce3 manages to deliver on its promise. Let’s take a look and see whether NVIDIA has pulled it off.

The foundation for a new beginning
The keys to the GeForce3’s new approach to rendering are two of the chip’s main functional units, dubbed vertex and pixel shaders. By now, it’s likely you’ve read a fair amount about these things, and if you’re like me, your eyes probably glaze over at the mention of a vertex shader. But I think these two things—vertex shaders and pixel shaders—are worth understanding well.

Besides, they are, at heart, easy concepts.

OK, not “circle is round” simple or “Rosie O’Donnell is annoying” intuitive, but they’re easy enough to grasp, nonetheless. My first exposure to the GeForce3 came at Comdex last fall, when NVIDIA showed us early silicon with early drivers and a very early software demo. That demo evolved to become the best single demonstration of both vertex and pixel shaders in action together that I’ve seen yet.

But before we get to that, I’m gonna have to take a crack at explaining pixel and vertex shaders. If you can already feel your eyes glazing over, skip ahead to the next section. I won’t hate you. Much.

 

The vertex shader
The vertex shader is analogous to the transformation and lighting unit in previous GeForce chips, except it’s been injected with elasticity.

Stick with me here. Here’s what I mean. In the past, 3D cards were capable of rendering static 3D objects, or of rendering models with very simple joints. This is why Quake II is populated by mutant robots: mutant robot joints are easy to animate in 3D. There’s a blocky torso, and then there’s a simple hinge joint on which an arm or leg is connected. It all looks very mechanical and not at all organic, so hey, the bad guys are robots!

Then along came the first-gen T&L-capable cards, the GeForce line and the Radeon, which promised, well, faster robot joints and even more frantic mechanical hinge action. These cards also included some basic facilities to get beyond such rudimentary animation—especially the Radeon—but such features weren’t widespread or capable enough to win support from developers. So it didn’t happen.

The GeForce3 changes all that by sporting a programmable vertex shader. With this chip, developers can write programs to control the motion of polygon meshes, define how they interact with one another, and create organic-looking real-time animation.

But why call it a vertex shader? Vertices are the corner points of triangles—each triangle has three of them—and they’re the basic pieces of information a computer uses to keep track of 3D objects. Each of these little points carries a lot of data with it, from its color to its coordinates in 3D space to special attributes like transparency. The vertex shader manipulates vertex data in concert, calculating how a mesh of polygons will move and bend together, and how light sources will influence the color and intensity of those vertices.

So it handles vertices, and it calculates lighting. Hence the term “vertex shader.” It does a lot more than just apply shading to vertices, but what can ya do? That’s what NVIDIA likes to call it.

The vertex shader does have its limitations. For instance, it’s no help washing the dishes. More immediately, the vertex shader can’t add or subtract polygons; it only manipulates the polygons that are there. But the vertex shader is a programmable, reconfigurable part of the GeForce3 graphics processor. It can be made to perform whacked-out deformation effects, like bullets denting metal, or the all-important animation of organic 3D models with skin and muscles stretched over virtual skeletons. Or whatever else those devious graphics guys can think up. Developers will be able to write low-level programs to run on this high-level bit of hardware, directly controlling how graphics operations will be computed.
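
Here and in a few spots below, I’ll sketch the concepts in plain Python rather than in any real shader language. This snippet is not GeForce3 code, and the bones, weights, and colors are invented for illustration; it just shows the kind of per-vertex math a vertex program performs—blend a vertex between two “bone” transforms for skinning, then compute simple per-vertex diffuse lighting.

```python
# Conceptual sketch (not GeForce3 shader code): the sort of per-vertex math a
# vertex program performs -- blend two "bone" transforms for skinning, then
# compute simple diffuse lighting from a directional light.
import numpy as np

def vertex_program(position, normal, bone_a, bone_b, weight, light_dir, diffuse):
    # Skinning: blend the vertex between two bone matrices by its weight.
    pos_a = bone_a @ np.append(position, 1.0)
    pos_b = bone_b @ np.append(position, 1.0)
    skinned = weight * pos_a + (1.0 - weight) * pos_b

    # Per-vertex lighting: N dot L, clamped to zero, scales the diffuse color.
    n = normal / np.linalg.norm(normal)
    intensity = max(np.dot(n, -light_dir), 0.0)
    color = intensity * np.asarray(diffuse)

    return skinned[:3], color

# One vertex from a hypothetical chameleon mesh, bent halfway between two bones.
bone_a = np.eye(4)
bone_b = np.eye(4)
bone_b[0, 3] = 0.5                               # second bone shifted along X
pos, color = vertex_program(
    position=np.array([1.0, 0.0, 0.0]),
    normal=np.array([0.0, 1.0, 0.0]),
    bone_a=bone_a, bone_b=bone_b, weight=0.5,
    light_dir=np.array([0.0, -1.0, 0.0]),
    diffuse=(0.4, 0.8, 0.3),
)
print(pos, color)
```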

So graphics guys are going to be sounding more and more like CPU geeks, complaining about the number of registers available to them or other esoteric things. But the effects they create will be mind-blowing.

The pixel shader
The pixel shader, on the other hand, handles per-pixel lighting duties. Its precursor was the GeForce-series’ shading rasterizer with its nifty register combiners. I got all weepy over register combiners right here and especially here when writing my GeForce2 review. Click through if you want to know what they were all about. And bring a tissue.

The pixel shader is an evolved version of GeForce’s shading rasterizer. If your eyes didn’t glaze over reading about the vertex shader above, congrats. You probably know that the vertex shader handles lighting, or shading, of vertices. So basic lighting is taken care of before pixel shaders come into the picture. What the pixel shader does is—you guessed it—shade pixels.

Per-pixel lighting augments the lighting calculated by the vertex shader by allowing each pixel’s color to be altered by a number of different effects. Pixel shader effects are created through the creative application of textures to polygons. Basic texture mapping is used everywhere in 3D now; wrapping pictures around polygons is a foundation of 3D as we have known it. What pixel shaders do is allow textures to be used to generate a range of effects. Some of these effects are familiar and well defined: dot-product bump mapping, for instance. Others haven’t even been dreamed up yet.

The math involved in applying some of these individual effects at the pixel level—of applying a texture to a pixel using a certain formula and spitting out the results—is usually relatively simple floating-point math. The cumulative results of applying a number of these effects at once can be breathtaking. Taken to an extreme, pixel shader effects theoretically could produce just about any effect one might want. Adding reflectivity, glossiness, bumpiness, or recreating the effects of LSD without all the harmful side effects—it’s all possible.
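
To make that concrete, here’s a toy sketch of one familiar pixel shader effect, dot-product (dot3) bump mapping. It isn’t GeForce3 register-combiner code, and the texel values and light direction are made up; it just shows the simple per-pixel math involved: pull a normal out of a normal map, dot it with the light direction, and scale the base texture color by the result.

```python
# Conceptual sketch: dot3 bump mapping done per pixel.
import numpy as np

def dot3_bump(base_texel, normal_texel, light_dir):
    # Normal maps store XYZ in the 0..1 range; expand back to -1..1.
    n = normal_texel * 2.0 - 1.0
    n = n / np.linalg.norm(n)
    intensity = max(np.dot(n, light_dir), 0.0)
    return base_texel * intensity          # darker where the "bump" faces away

base = np.array([0.6, 0.5, 0.2])           # brown chameleon-skin texel
flat = np.array([0.5, 0.5, 1.0])           # normal pointing straight out
tilted = np.array([0.8, 0.5, 0.7])         # normal leaning away from the light
light = np.array([0.0, 0.0, 1.0])

print(dot3_bump(base, flat, light))        # full brightness
print(dot3_bump(base, tilted, light))      # dimmer: the surface "bumps" away
```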

In reality, though, these pixel shaders have some substantial shortcomings. GeForce3’s pixel shaders are extended versions of NVIDIA’s register combiners; they’re definitely improved in some important ways, but they aren’t yet flexible enough to fulfill the full promise of the pixel shader concept. (See John Carmack’s comments on the subject here.) Internal restrictions will limit the effects developers can create.

Also, the pixel shaders can only apply so many textures per rendering pass. In this case, the GeForce3 maxes out at four textures per pixel without resorting to another pass (additional passes harm performance). That’s more than the GeForce2 (two textures per pixel) or the Radeon (three). NVIDIA’s new baby is more flexible in applying those textures than either of those chips, too, but four textures per pass is a limitation.

In these and other areas, there’s room for improvement in future products.

 


[Screenshots at right, top to bottom: vertices; polygons; solid, no textures; per-pixel shading effects (two shots); environment mapped]

Vertex and pixel shaders in action
It is a well-worn cliché in hardware reviews such as this one to spend some time looking at a manufacturer’s demo showing off a product’s features. You will have to forgive me for the following, but this demo makes my eyes pop outta my head every time I watch it. And just to prove I’m an incorrigible geek, I get excited about what’s happening behind the scenes, not just the eye-popping finished product. It’s like when I got my first glimpse of Star Raiders on the Atari 800 back in the day, only about 10 million times better.

NVIDIA’s Chameleon demo shows off both vertex and pixel shaders, and it’s a great object lesson in the new capabilities GeForce3 brings to the desktop. Since there’s about zero software for the GeForce3 at present other than demos from NVIDIA or its partners, this one’s worth watching. Well, what really makes it worth watching is the graphics being created in real time by the GeForce3. To give you a good handle on exactly what this demo is doing, I’ve captured some high-res, low-compression screenshots, augmented by a video of this thing in action.

Vertex shader stills
At the right, you can see screen shots of the demo. The first three pictures aren’t exactly stunning, but they will give you a good sense of what the vertex shader is doing. The top picture shows the vertices of the chameleon model used in the demo. These are the points of data the graphics processor manipulates to create motion. Nearly all of those points in the chameleon’s torso, from the neck to the base of the tail, are part of a vertex shader program that flexes the entire polygon mesh as the chameleon walks.

The second picture from the top is the same model represented as polygons. It’s a relatively high-poly model by most standards. Now look at the picture below it. That shows the model as a solid object. Note how the solid version of the model is lit. The lighting is the work of the vertex shader, and with as many polys as this model has, it looks quite good. However, though the model is detailed enough to look very smooth, it lacks the extra polys necessary to give texture to the chameleon’s skin.

Those first three screenshots could have been produced on a GeForce2. What’s impressive about the vertex shader is how it animates things, as we’ll see shortly. But the pixel shader is up next.

Pixel shader stills
The bottom three screenshots at the right are truly impressive, even as stills. These pictures show the same chameleon model with the same number of polygons, only this time, the model is mapped with color, specular, and bump maps. The first two pixel shader screenshots, with the brown and green chameleons, show how pixel shader effects can create realistic-looking lighting. The final shot, the “liquid metal” chameleon, is everybody’s favorite; it uses cubic environment mapping to simulate the model reflecting its surroundings.

Taken together, the vertex and pixel shaders can create the kinds of effects previously reserved for professional, frame-by-frame rendering packages. This isn’t quite Shrek or the Final Fantasy movie, but it’s surprisingly close.

Moving pictures


See the movie. (5MB)
Download Win32 video codec here.

Now let’s set things in motion with a video, so I can really give you a sense of how these things work. Click on the links at the left to get going. (The video is in DivX ;-) format, in case you need to find a non-Win32 codec to watch it.) There are several things you should notice when watching the video.

First, you can see my mouse pointer in the video, and the camera jumps around some. The GeForce3 generates this animation in real time, and I can control the motion interactively. The motion is fluid, too.

Next, the torso of the model is bending and flexing as part of a vertex shader program. This model is skinned, and has a 25-bone skeleton. The vertex shader, not the CPU, processes movement calculations. Watch as I move the camera around, especially when it’s looking down on the model, and you can see the chameleon’s torso move as an organic-looking unit.

Also, notice how the texture of the chameleon’s skin shifts along with its colors. These are changing pixel shader effects, including varied specular and reflection maps.

Finally, that’s me losing bladder control off camera. Happens every time I watch this demo.

 

The chip specs
In order to do all of these crazy new things, the GeForce3 chip is one of the most complex consumer-oriented semiconductors you can buy. It’s made up of more transistors than an Athlon or Pentium 4 chip, and it’s over twice the transistor count of the GeForce2. Let’s spell it out for you in blue and white…

 

                   Transistors (millions)   Process width (microns)
GeForce2 MX        19                       0.18
GeForce2 GTS       25                       0.18
GeForce2 Ultra     25                       0.18
GeForce3           57                       0.15
Radeon 64MB DDR    30                       0.18
Voodoo 5 5500      15 * 2                   0.25

At 57 million transistors, the GeForce3 dwarfs previous graphics chip designs. Fabbed by TSMC on a new, 0.15-micron process, the chip is small enough, and runs cool enough, that GeForce3 cards don’t look visibly different from GeForce2 Ultra cards. There are other similarities with the GeForce2 Ultra, as well.

Below are the basic clock speeds and 3D pipeline specs for some of the more common video cards out there. The fill rate numbers are worthy of note for several reasons. First, fill rate indicates the pixel-pushing power of a video card, which is a very important metric for performance. Generally, “texel” fill rate is more important than pixel fill rate, because most games and apps layer a number of textures on each pixel, and a “texel” is simply a textured pixel. However, the fill rate numbers below are theoretical peaks, and they’re almost meaningless in many cases. We’ll discuss why below.

 

                   Core clock   Pixel pipelines   Fill rate (Mpixels/sec)   Textures per pixel   Fill rate (Mtexels/sec)
GeForce2 MX        175MHz       2                 350                       2                    700
GeForce2 GTS       200MHz       4                 800                       2                    1600
GeForce2 Ultra     250MHz       4                 1000                      2                    2000
GeForce3           200MHz       4                 800                       2                    1600
Radeon 64MB DDR    183MHz       2                 366                       3                    1100
Voodoo 5 5500      166MHz       2 * 2             667                       1                    667

One thing that’s not noted on the chart: though the GeForce3’s peak texel fill rate per cycle is 1600 Mtexels per second, the card is capable of laying down an additional two textures in a second clock cycle without another rendering pass. It uses a “loopback” method to accomplish this task, boosting its effective texel fill rate.
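
For reference, here’s where the peak figures in that table come from. This is just arithmetic, sketched in Python: pixels per second is core clock times pixel pipelines, and texels per second is that times the textures applied per pixel in a single pass.

```python
# Back-of-the-envelope check on the peak fill rate figures in the table above.
def peak_fill_rate(core_mhz, pipelines, textures_per_pixel):
    mpixels = core_mhz * pipelines
    mtexels = mpixels * textures_per_pixel
    return mpixels, mtexels

print(peak_fill_rate(200, 4, 2))   # GeForce3: (800, 1600)
print(peak_fill_rate(250, 4, 2))   # GeForce2 Ultra: (1000, 2000)
print(peak_fill_rate(183, 2, 3))   # Radeon: (366, 1098) -- the table rounds to 1100
```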

In clock speed and pixel fill rate, the GeForce2 Ultra is actually faster. All of which is very interesting, but doesn’t mean much, because right now, the primary bottleneck for video card performance is memory bandwidth. The GeForce3 won’t even come close to hitting its theoretical peak fill rate the vast majority of the time, because there’s no way the card’s video memory can move the amount of data necessary to make that happen.

NVIDIA has been very aggressive about using the fastest DDR SDRAM available to keep memory bandwidth high. Here’s how our examples stack up on that front.

 

                   Memory clock           Bus width           Memory bandwidth
GeForce2 MX        166MHz                 128 bits            2.7GB/sec
GeForce2 GTS       333MHz (166MHz DDR)    128 bits            5.2GB/sec
GeForce2 Ultra     460MHz (230MHz DDR)    128 bits            7.4GB/sec
GeForce3           460MHz (230MHz DDR)    64 bits DDR * 4     7.4GB/sec
Radeon 64MB DDR    366MHz (183MHz DDR)    128 bits            5.87GB/sec
Voodoo 5 5500      166MHz                 128 bits * 2        5.2GB/sec
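
And here’s where the bandwidth numbers come from—again, plain arithmetic, nothing card-specific: effective memory clock (DDR counts twice) times bus width in bytes.

```python
# Memory bandwidth = effective memory clock x bus width in bytes.
def bandwidth_gb_per_sec(mem_mhz, ddr, bus_bits):
    effective_mhz = mem_mhz * (2 if ddr else 1)
    return effective_mhz * 1e6 * (bus_bits / 8) / 1e9

print(bandwidth_gb_per_sec(166, False, 128))   # GeForce2 MX: ~2.7 GB/s
print(bandwidth_gb_per_sec(230, True, 128))    # GeForce2 Ultra: ~7.4 GB/s
print(bandwidth_gb_per_sec(230, True, 64 * 4)) # GeForce3: four 64-bit DDR channels, ~7.4 GB/s
print(bandwidth_gb_per_sec(183, True, 128))    # Radeon: ~5.9 GB/s
```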

Yep, the GeForce2 Ultra and GeForce3 have the same amount of memory bandwidth. However, the way they arrive at those numbers is different. That odd notation for the GeForce3’s bus width is the best way I could express the configuration of the chip’s crossbar memory controller.

“Its what?”

 

Opening the memory bottleneck
NVIDIA has endowed the GeForce3 with a number of tricks to make more efficient use of its memory bandwidth. If you read my ATI Radeon review, you may be familiar with some of those tricks. Among them:

  • Z compression — Like the Radeon, the GeForce3 is capable of compressing and decompressing Z data on the fly. This info, which describes the depth of each pixel (its position on the Z axis), chews up a lot of memory bandwidth. NVIDIA claims a peak compression ratio of 4:1 on Z data, and that compression routine is “lossless,” so visual fidelity isn’t compromised.
  • Z occlusion culling — Much like the Radeon’s Hierarchical Z feature, the GeForce3 implements Z occlusion culling. A fancy name for a simple concept, Z occlusion culling attempts to prevent the chip from rendering pixels that will end up behind other pixels, and thus not visible, once the complete scene is drawn.

    Drawing pixels that wind up being occluded by other pixels is known as overdraw, and most scenes have a fair amount of overdraw. Generally, 3D chips draw two to three times the number of pixels needed in order to render a given scene. Doing so consumes fill rate and memory bandwidth unnecessarily. (For a look at a rendering architecture that eliminates overdraw, see our Kyro review.) Scenes with more depth complexity are even worse, and as 3D gets more advanced, the average depth complexity is rising.

    Z occlusion culling is an important first step toward cutting down on overdraw. NVIDIA claims an average memory bandwidth savings of 50 to 100% in real-world apps. We’ll test that theory below.

  • Crossbar memory controller — Traditionally found in high-end systems from Sun or SGI where multiple devices are contending for resources, a crossbar memory controller design allows more efficient use of available memory bandwidth. To understand how the crossbar approach is implemented in the GeForce3, first consider a GeForce2 memory controller. It transfers data to its local memory over a 128-bit bus two times per clock cycle (with DDR memory), effectively reading or writing 256 bits at once. Should a piece of data be only 64 bits wide, the controller wastes the other 192 bits of bandwidth for that clock cycle.

    The GeForce3’s crossbar controller breaks data into 64-bit pieces, using four independent paths to memory and, effectively, four memory controllers. When 256 bits of data need to be transferred at once, it can use all four paths to move those 256 bits in the same amount of time as a GeForce2-style controller. When only 64 bits of data need to be read or written, one controller can handle that task, leaving the other three controllers available for other work. (A simplified sketch of the idea follows this list.)

  • Higher-order surfaces — All of the tricks above are intended to cut down on pixel-pushing memory use, which consumes a video card’s path to its local memory. Geometry data to describe a 3D scene presents another bandwidth problem at a different place: the AGP bus. The system has to transfer thousands of polygons across the AGP bus in order for a graphics chip to build a 3D scene. NVIDIA has given the GeForce3 the ability to understand geometrically complex curved surfaces as formulas, which are much smaller and easier on the AGP bus.

    Incidentally, ATI has announced support for a competing approach to higher-order surfaces in its upcoming follow-on to the Radeon, so it may be a while before these things sort themselves out and the feature is widely used.
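
As promised above, here’s a simplified sketch of the crossbar idea. The cycle-counting model is my own assumption for illustration—a real memory controller deals with latencies, banks, and scheduling that this toy ignores—but it shows why four narrow channels waste less of each clock than one wide one.

```python
# Rough illustration of the crossbar advantage: a single 256-bit-per-cycle
# controller spends a whole cycle on every request, while four independent
# 64-bit channels only occupy as many channels as a request actually needs.
import math

def cycles_single_256(requests_bits):
    # One monolithic controller: every request costs a full 256-bit cycle.
    return len(requests_bits)

def cycles_crossbar_4x64(requests_bits):
    # Four 64-bit channels: each request occupies ceil(bits/64) channel-slots,
    # and four slots fit in one clock cycle.
    slots = sum(math.ceil(bits / 64) for bits in requests_bits)
    return math.ceil(slots / 4)

requests = [64, 64, 256, 64, 128, 64]   # a mix of small and large accesses
print(cycles_single_256(requests))      # 6 cycles, much of each one wasted
print(cycles_crossbar_4x64(requests))   # 3 cycles for the same data
```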

All told, these bandwidth conservation techniques ought to allow the GeForce3 to outperform a GeForce2 Ultra, even though they use the same kind of memory at the same clock speed. Let’s test it and find out.

 

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. All tests were run at least twice, and the results were averaged.

The test system was built using:

Processor: AMD Athlon processor – 1.33GHz on a 266MHz (DDR) bus

Motherboard: Gigabyte GA7-DX motherboard – AMD 761 North Bridge, Via VT82C686B South Bridge

Memory: 256MB PC2100 DDR SDRAM in two 128MB DIMMs

Audio: Creative SoundBlaster Live!

Storage: IBM 75GXP 30.5GB 7200RPM ATA/100 hard drive

The system was equipped with Windows 2000 SP1 with DirectX 8.0a. No, we didn’t use Win9x/ME for this test, and no, we don’t regret it. For Microsoft operating systems, Win2K is the present and the future. If a product can’t perform well in Win2K, it deserves to be counted as a poor performer.

For comparative purposes, we used the following video cards and drivers:

  • 3dfx Voodoo 5 5500 64MB with 1.04.00 drivers
  • ATI Radeon 64MB DDR with 5.13.1.3132 drivers
  • NVIDIA GeForce2 GTS 64MB (Asus AGP-V7700) with 11.01 NVIDIA reference drivers
  • NVIDIA GeForce2 Ultra 64MB (NVIDIA reference card) with 11.01 NVIDIA reference drivers
  • NVIDIA GeForce3 64MB (NVIDIA reference card) with 11.01 NVIDIA reference drivers

We used the following versions of our test applications:

  • 3DMark 2001 Build 200
  • Evolva Rolling Demo with Bump Mapping
  • Quake III Arena 1.17
  • Serious Sam v1.00
  • SPECviewperf 6.1.2
  • VillageMark 1.17
  • Vulpine GLMark 1.1

The test system’s Windows desktop was set at 1024×768 in 32-bit color at a 75Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests. Most of the 3D gaming tests used the default or “normal” image quality settings, with the exception that the resolution was set to 640×480 in 32-bit color.

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

VillageMark
One of the best ways to test the effectiveness of the GeForce3’s memory optimizations is with VillageMark. Created by the folks behind the Kyro chips, VillageMark has gobs of overdraw and is extremely fill-rate intensive. That makes it a great test for Z occlusion culling, Z compression, and the like.

The GeForce3 outpaces all of the rest of our test cards, easily outperforming the GeForce2 Ultra with the same speed memory. The Radeon, with its own set of HyperZ tricks, beats out the GeForce2 Ultra, despite the Radeon’s slower memory. It’s an impressive showing for both of the cards able to make more intelligent use of memory.

Serious Sam
Next up in our performance tests is Croteam’s Serious Sam. The source of hours of procrastination during the overly long gestation process of this review, Serious Sam is serious fun. It’s also very nice as a benchmark, because it provides us with more and better data than most other games. As a result, we’ve been able to create a different type of benchmark graph than what you may be accustomed to seeing. The graphs show second-by-second performance, plotted over time. That gives us a lot more information than a simple frame rate average.

Before we move on, though, I should mention that Serious Sam creates some potential problems for reviewers. The Serious Sam game engine is smart enough to adapt to the hardware on which it’s running, which is good. For instance, the game knows how to take advantage of the Radeon’s three texture units and the GeForce3’s quad texturing capabilities. Some apps aren’t smart enough to do that. However, the game has been known to adjust a range of settings for various video cards, including some image quality settings one might wish to keep constant in an ideal testing scenario, making it tough to do a true apples-to-apples comparison. For the sake of these tests, I elected to use the game as a real-world application benchmark, allowing it to auto-adjust to each video card as it wanted. Keep that in mind as you take in the results below.

Phew. Now certain pointy-heads won’t be sending me nasty e-mail.

Moving on, let’s look at those results.

At a low resolution, the pack is bunched up pretty tightly. Interestingly enough, the cards all follow the same basic performance pattern—the lines are the same shape, only a little higher or lower than one another. The GeForce3 is not the fastest card here. The Ultra takes that distinction, but it’s close. The moribund Voodoo 5 brings up the rear—without T&L and multi-texturing, the V5 can’t always compete with the others.

 

At an intermediate resolution, the GeForce3 separates itself from its predecessors, showing off its superior fill rate. Notice how the performance lines for each type of card follow their own contours at this resolution. The two GeForce2 cards are nearly the same, while the other architectures are unique. The performance equation at 1024×768 is complex, because both fill rate and polygon throughput play a part. Each type of chip answers those two challenges in different ways.

In this super-hi-res mode, it’s all about fill rate. Here, the GeForce3 has no problem beating out everything else, showing off its memory efficiency and its quad texturing abilities. Even at its performance lows, the GeForce3 runs faster than the Ultra at its performance highs. Very impressive.

 

Quake III Arena
Now for some Quake III, where we can be more assured that visual quality settings are kept on equal footing. Will the GeForce3 repeat its performance?

Once more, at low res where polygon throughput matters most, the Ultra proves fastest. It is close, though, and all the cards run very fast.

At 1024×768, the GeForce3 wins easily. Will the gap widen as the resolution goes up?

Not this time. The Ultra pulls back into contention, neck-and-neck with the GeForce3. This result illustrates a point: the GeForce3’s memory conservation measures can help, as they did with Serious Sam, but they aren’t always so effective.

 

3DMark 2001
Next up is 3DMark 2001, which isn’t entirely fair. It’s very much a DirectX 8 benchmark, and NVIDIA and Microsoft have worked together to develop DirectX 8. One of DX8’s main aims is to expose GeForce3-class hardware features. There are portions of 3DMark 2001, including the “Nature” game test and the pixel shader test, which only the GeForce3 could complete. That difference is reflected in the overall 3DMark scores…

Nevertheless, many of 3DMark’s individual tests are intriguing, so we’ll spend some time looking at them. Let’s start with fill rate.

Fill rate

Here again, on a synthetic test, we see that the GeForce3 is the all-out fill rate king. Its “loopback” quad texturing abilities are a big help in the multi-texturing test. Similarly, the Radeon’s three texture units allow it to move ahead of the Ultra in the multi-texturing test.

This is about as pure a synthetic fill rate test as we have, and if you look back at the chip charts earlier in the review, you’ll see that none of the chips comes even close to fulfilling its theoretical peak fill rate capacity. The GeForce2 Ultra, for instance, is rated at 1000 Mpixels per second and 2000 Mtexels per second. In reality, it’s performing at just over a third of its theoretical peak. The other cards tell the same story. With that in mind, the GeForce3’s effective pixel fill rate is relatively efficient: 586.5 out of a theoretical 800 Mpixels per second.

 

Poly throughput
Now let’s turn attention to polygon throughput with some T&L-oriented tests.

The trend is clear: the GeForce2 Ultra is faster than the GeForce3 at T&L. Here’s why: the GeForce3 chip has both its programmable vertex shader and a fixed-function T&L unit to serve legacy apps. (That’s one reason the GeForce3 chip is 57 million transistors.) If a program doesn’t specifically call for a vertex shader, the hard-wired T&L engine is used. In all likelihood, the fixed-function T&L engine on the GeForce3 is very similar to the same engine on the GeForce2. However, the GeForce2 Ultra runs at a higher clock speed than the GeForce3—250 versus 200MHz, respectively. Thus, the Ultra pumps out more polys in these lighting tests.

Were these true vertex shader tests, the rest of the cards might not even be able to complete them. If they did, however, they’d have to use the CPU for vertex manipulation, and performance would drop through the floor. Even 3DMark’s vertex shader test doesn’t use real vertex programs.

It’s no great surprise that 3DMark doesn’t use the vertex shaders’ full capabilities. If it did, the graph above would probably have only one bar on it.

 

Bump mapping
One area where we can compare these cards is bump mapping. All of the cards except the Voodoo 5 can handle dot-product bump mapping. Only the Radeon and GeForce3 can handle environment-mapped bump mapping.

As you can see, the GeForce3 is significantly quicker rendering bump-mapped scenes than the competition.

 

Game tests
Finally, 3DMark 2001 has its three gaming tests. They generally pack in a lot more polygons than today’s games, even in “low detail” mode.

In every case, the GeForce3 is faster than all the rest. These tests aren’t representative of current games; they have way too much polygon detail for that. But they may be a good indicator of games coming down the pike in six to twelve months.

 

Evolva rolling demo
Evolva gives us another chance to look at bump mapping performance. Specifically, this rolling demo of Evolva uses dot-3 bump mapping. Once again, the Voodoo 5 can’t play in our reindeer games.

Clearly, there’s much less of a performance penalty for dot-3 bump mapping using the GeForce3’s pixel shaders.

Vulpine GLMark
This test offers us the chance to use OpenGL to access some of the GeForce3’s new features. All of the cards succeeded in running the test with “Advanced OpenGL features” enabled except—you guessed it—the Voodoo 5. Additionally, we tested the GeForce3 with some advanced, GeForce3-only features (basically, really cool looking water) enabled, to see how that affected its performance.

The GeForce3 comes out on top once more, taking very little performance hit with its extra features enabled. Such a performance can only encourage developers to take fuller advantage of the GeForce3’s abilities.

 

SPECviewperf
Next, let’s see how these cards handle workstation-class OpenGL apps. These cards aren’t really aimed at the workstation market (although NVIDIA’s workstation-class Quadro cards are extremely similar to its GeForce cards). Nonetheless, a good consumer card with a solid OpenGL driver ought to be able to handle occasional duty with workstation apps. The viewperf suite employs a number of high-end 3D applications to test such things.

Looking at the results, it’s easy to see NVIDIA’s cards have benefited from their close relationship to the Quadro line and from NVIDIA’s overall excellence in driver development. The GeForce3 and the Ultra trade off for the top spot, while the Radeon consistently comes in fourth. Perennially last is the Voodoo 5, which actually completed the DRV and DX tests, but no score is displayed above because viewperf recorded the scores as zero. That’s no great surprise, since it took long, overnight sessions for the V5 to complete its viewperf test runs. I’m just shocked the thing didn’t crash.

 

Antialiasing
More silly claims and nonsense have flown around the web over antialiasing than just about any other 3D chip feature. Perhaps because 3dfx’s death throes included a series of giant flails over the Voodoo 5’s antialiasing method, AA became an unusually emotional topic. Whatever the case, antialiasing is a nifty feature, but it’s not something most of us have grown to use with any regularity. The GeForce3’s innovations might help change that.

The methods
There were two basic methods of AA on the scene before the GeForce3, and they were actually very similar. The GeForce3’s approach to AA is markedly different. These three approaches to AA break down like this:

  • Ordered-grid supersampling (OGSS) — This approach is employed by GeForce/GeForce2, Radeon, and Kyro/Kyro II cards. The 3D scene is rendered at a higher resolution than the intended output resolution, and then scaled down. For instance, to produce a scene using 4X OGSS at 640×480 resolution, the card would render the scene at 1280×960, and then scale down the result to 640×480. Thus, the color of each pixel on the screen would be determined by averaging the color values for four sub-pixels. This is a very basic approach to antialiasing, and given the right sample size, it can be fairly effective. OGSS can potentially affect almost every pixel on the screen, cutting down on “jaggies” at edges and helping improve texture quality on surfaces.
  • Rotated grid supersampling (RGSS) — 3dfx’s approach with the Voodoo 5. This hybrid approach is a modified version of OGSS. Rather than doing its supersampling on an ordered grid, the Voodoo 5 renders its supersample images on a grid with a slight tilt angle to it. In the case of 4X FSAA, the samples are “jittered” from one sample to the next in a diamond-shaped pattern. This slightly “off” pattern doesn’t correspond so neatly to the display’s grid of pixels, making it harder for the eye to recognize aliasing patterns.

    At a given sample size, RGSS is likely to produce results slightly superior to OGSS. However, sample size matters more than the difference between these two techniques. 4X OGSS almost always looks superior to 2X RGSS, no matter what the 3dfx fanboys would have you believe.

  • Multisampling — This is the GeForce3’s AA method. Like the Voodoo 5, the GeForce3’s sampling pattern follows a rotated grid—but only in 2X AA mode. In 4X mode, the GeForce3 uses an ordered grid. Either way, though, the GeForce3 takes a different approach to sampling called multisampling.

    You may have noticed that I’m not talking about FSAA here—just AA. That’s because multisampling isn’t full-scene antialiasing like the two techniques I’ve described above. Instead, it does edge antialiasing. Take 2X antialiasing, for instance. The GeForce3 captures two versions of the scene to be rendered, with samples distributed according to a rotated grid pattern, as you’d expect. But it doesn’t do a texture read for both of those two sub-pixels. Instead, the chip does a single texture read for the first sub-pixel and uses that value to determine the color of both sub-pixels. So you’ve saved a little bit by avoiding an extra texture read, although the memory bandwidth requirements for storing the two sub-pixels are the same as with supersampling.

    Here’s the trick with multisampling: If a pixel is on the edge of a polygon, its sub-pixels probably “belong” to different polygons which comprise different objects in a scene—a lighter object in the foreground, say, and a darker one in the background. If so, the sub-pixels will have different color values—not because of multiple texture reads per pixel, but because one pixel sits in front of another due to overdraw. If this is the case, the GeForce3 will perform a blend on the colors of the sub-pixels to determine the final color of the pixel. Thus, the chip effectively antialiases polygon edges, but not surfaces or the textures mapped to them.

    Clear as mud?

    Multisampling and AA theory are difficult subjects, and I’m afraid I can’t do them justice here. For more on multisampling, see Dave Barron’s helpful article on the subject (though it’s not about the GeForce3’s particular implementation). Also, see Tom P. Monkish’s excellent article on AA theory.

The long and the short of it is this: all three techniques are reasonably effective at antialiasing. Multisampling is a little more efficient than supersampling, but supersampling provides more texture clarity.
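
To make the supersampling idea concrete, here’s a toy sketch—nothing hardware-specific, and the tiny 4×4 “frame” is invented—showing how each 2×2 block of sub-pixels gets averaged down to one output pixel, which is what softens jagged edges. Multisampling stores the same sub-pixels but reuses one texture read for all of them, so only polygon edges end up blended.

```python
# Toy sketch of 4X ordered-grid supersampling: render at twice the width and
# height, then average each 2x2 block of sub-pixels down to one output pixel.
def downsample_4x(high_res):
    h, w = len(high_res), len(high_res[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = [high_res[y][x], high_res[y][x + 1],
                     high_res[y + 1][x], high_res[y + 1][x + 1]]
            row.append(sum(block) / 4.0)     # average the four sub-pixels
        out.append(row)
    return out

# A hard black/white polygon edge rendered large becomes a soft grey edge when scaled down.
edge = [[0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 1.0, 1.0, 1.0],
        [0.0, 1.0, 1.0, 1.0]]
print(downsample_4x(edge))   # [[0.0, 1.0], [0.5, 1.0]]
```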

Now that we’ve got all that theory under our belts, I’ll show you a couple of images that help demonstrate what antialiasing is doing. These images are enlarged 400%, so each pixel in the original image now takes up four. I’ve blown them up to show exactly what’s going on. What I’ve done is steal a page from that article by Tom Monkish I’ve linked above. I’ve taken a non-AA image and a 4X AA image, both from the GeForce3, and performed a DIFF operation in an image processing program. The resulting image visually shows the differences between the regular and antialiased images. You can see how the GeForce3’s AA method affects primarily edges. The brighter the pixel, the starker the difference between the two source images.

The difference between no AA and 4X AA

To make things a little easier to see, here’s a negative of the same image.

A negative of the image above

 

Quincunx
As if all of that weren’t complex enough, NVIDIA adds a new twist to the AA scene with the GeForce3’s “Quincunx” AA mode. [Note: Before I go on, I should pause to crack a joke about the name Quincunx. It’s almost impossible to resist. Especially that one Forge came up with a while back. Instead, however, Wumpus would prefer that I remind you “quincunx” is, in fact, a real word. It describes the pattern of five dots on the “five” face of a six-sided die, and that same pattern is used by the GeForce3’s Quincunx AA mode.] NVIDIA claims Quincunx offers near-4X quality antialiasing at near-2X speeds. Given the GeForce3’s current performance levels, an AA mode somewhere between 2X and 4X is an attractive prospect. The right combination of image quality and playability has proven an elusive target for the first generation of AA-capable cards.

Quincunx uses a combination of 2X multi-sampled AA and what NVIDIA calls a “reconstruction filter.” Poke around for “quincunx” and “reconstruction filter,” and you’ll find lots of esoteric information about wavelet compression theory, usually involving motion video. When it comes to compression, the term “reconstruction filter” is apt, because some of the detail in the original image has been lost in the compression process. When it comes to antialiasing, the term isn’t quite right.

The key to the Quincunx sampling pattern is this reconstruction filter. To achieve a sampling pattern that looks like the side of a die with five dots, the GeForce3 takes two samples like it would during 2X multi-sampled AA, and then grabs three more samples from adjacent pixels. Thus the sampling pattern looks like so:

The Quincunx pattern: 2X multisampling plus three samples from three adjacent pixels

Notably, the GeForce3 does not grab those three additional samples at a sub-pixel level. The reconstruction filter is applied after the 2X multi-sampled image is generated. Quincunx’s three extra samples require no additional memory; its memory requirements match those of 2X AA mode. The reconstruction filter simply looks at nearby pixels and smoothes out the color of the pixel on which it’s operating.
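
Here’s a rough sketch of the shape of that operation. The weights are my own assumption for illustration—NVIDIA doesn’t spell them out—but the structure matches the description above: a pixel’s own 2X samples blended with three samples borrowed from adjacent pixels, arranged like the five pips on a die.

```python
# Illustrative sketch only: the 50/50 weighting below is an assumption, not
# NVIDIA's published filter.  The point is the shape of the operation -- a
# pixel's own 2X sub-samples averaged with samples from adjacent pixels.
def quincunx(own_samples, neighbour_samples):
    own = sum(own_samples) / len(own_samples)              # the pixel's 2X AA samples
    borrowed = sum(neighbour_samples) / len(neighbour_samples)
    return 0.5 * own + 0.5 * borrowed                      # pull toward the neighbours

# A bright pixel next to darker neighbours gets softened slightly.
print(quincunx(own_samples=[1.0, 0.9], neighbour_samples=[0.4, 0.5, 0.6]))
```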

Of course, we have a word for such things: blurring. The GeForce3’s reconstruction filter is just like a Photoshop filter that blurs an image ever so slightly. While this blurring effect may reduce aliasing, it does so at the expense of image clarity and precision. In practice, users may find Quincunx helpful, but personally, I’m not a big fan. I find it hard to tell the difference between Quincunx and 2X AA, and I dislike blurry text.

 

The riddle of the Quincunx
There is one more mystery to Quincunx. The reconstruction filter sounds an awful lot like the post-filter 3dfx used in its Voodoo-series cards to reduce dithering artifacts in 16-bit color mode. (This was the source of the great “22-bit color” debate.) Both 3dfx’s post-filter and Quincunx’s reconstruction filter blur the output image in order to cut down on artifacts. 3dfx’s filter made taking screenshots difficult, because it altered images at the RAMDAC, after the image left the card’s frame buffer. Screenshots—generally pulled out of the frame buffer—couldn’t capture the effect of this filter.

What I’m wondering is whether NVIDIA’s reconstruction filter isn’t a similar, RAMDAC-based mechanism. I’ve had a devil of a time telling the difference between screenshots taken with Quincunx and those taken in 2X AA mode. Have a look for yourself:

GeForce3, 2X AA

GeForce3, Quincunx AA

Now, it’s possible 3DMark’s image quality test, which generated these screenshots, isn’t doing something right. I wrote MadOnion about the problem but received no reply. I also tried using Quake III Arena screenshots to produce differences between 2X AA and Quincunx mode, to no avail—the images were identical. Here are some official Quincunx images from NVIDIA demonstrating the GeForce3’s various AA modes:

The GeForce3’s AA modes compared

Could it be that the reconstruction filter is not applied to the frame buffer? I’ve got a query in to NVIDIA, and I’m hoping to find out. It’s not a huge difference in the real-world performance of the chip either way, but it is an interesting question.

 

Non-AA reference images
Enough talking about AA theory. The next few pages will show you exactly how the antialiasing output of the cards we’ve tested compares.

Voodoo 5, no AA

Radeon, no AA

GeForce2, no AA

GeForce3, no AA

 

2X antialiasing

Voodoo 5, 2X AA

Radeon, 2X AA

GeForce2, 2X AA

GeForce3, 2X AA

GeForce3, Quincunx AA

 

4X antialiasing

Voodoo 5, 4X AA

Radeon, 4X AA

GeForce2, 4X AA

GeForce3, 4X AA

 

Antialiasing performance
At long last, we’re ready to compare AA performance. As you’d expect, AA performance is about two things: the efficiency of the card’s AA method, and the card’s capacity for fill rate/memory bandwidth. Here’s how our contenders stack up…

In 2X AA mode, the GeForce3 runs well ahead of the pack. The Voodoo 5 finally climbs out of the cellar in one of our tests, staying just long enough to remind everyone who was first with this AA stuff.

In 4X mode, the GeForce3 is again fastest, turning in an impressive 130 fps at 640×480. Once we reach 1024×768, though, Quincunx is the only playable AA mode on any card. It’s not really fair to stick that bar on a graph and compare it against true 4X AA modes, but hey, I’ve spent enough time in Excel already.

 

Aniso.. err, texture filtering
You might be wondering why I didn’t get more excited about losing texture clarity when moving from supersampling on older cards to multisampling on the GeForce3. This is why. The GeForce3 is capable of a much more potent form of texture antialiasing called anisotropic filtering.

Anisotropic filtering is hard to spell.

It’s not so hard to get your head around, fortunately. Anisotropic filtering improves the fidelity of textures mapped to the various surfaces in a scene. Textures mapped to surfaces that move away from one’s viewpoint at steep angles—the floor in a first-person shooter, for instance—need extra help in order to avoid all kinds of ugliness: texture sparkling, jaggies, loss of detail, and general fuzziness. If you’ve used a 3D card, you’ve probably seen it. Anisotropic filtering is a much more sophisticated method of minimizing texture aliasing than the supersampling AA techniques we discussed above. It takes more samples, and it takes them directly from source textures, as needed.
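
If you like to see these things spelled out, here’s a very loose sketch of the idea. This is not the GeForce3’s actual algorithm, and the checkerboard “texture,” coordinates, and sample spacing are invented; it just shows the notion of taking several texture samples spread along the direction in which the texture is being squashed on screen and averaging them, up to the chosen degree.

```python
# Loose sketch of anisotropic filtering: instead of one blended lookup per
# pixel, take several samples spread along the axis of anisotropy and average
# them, up to the chosen degree (2X..8X).
def anisotropic_sample(texture_lookup, u, v, axis_u, axis_v, degree):
    total = 0.0
    for i in range(degree):
        t = (i + 0.5) / degree - 0.5               # spread samples along the axis
        total += texture_lookup(u + t * axis_u, v + t * axis_v)
    return total / degree

# A hypothetical checkerboard texture viewed at a steep angle.
def checker(u, v):
    return float((int(u * 8) + int(v * 8)) % 2)

print(anisotropic_sample(checker, 0.37, 0.52, axis_u=0.0, axis_v=0.25, degree=8))
```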

Just as common full-scene and edge antialiasing techniques depend on sample size, anisotropic filtering is affected by the degree of anisotropy involved. The GeForce3 is capable of high degrees of anisotropy, up to 8X. The only other card with similar capabilities is the Radeon, but unlike the Radeon, the GeForce3 can properly handle trilinear filtering and anisotropic filtering at the same time. Also, the Radeon’s anisotropic filtering implementation is rather quirky, as we’ve noted in the past. By comparison, the GeForce3’s anisotropic filtering is superior, and it looks great. Below is a screenshot comparing basic trilinear filtering to 8X anisotropic filtering plus trilinear. The difference in clarity is striking.

There is a fill rate penalty for anisotropic filtering, however, just as there’s overhead for other AA methods. Let’s look at performance.

Once we reach higher resolutions and higher degrees of anisotropy, things start to slow down fast. Still, I’m hooked on the GeForce3’s anisotropic filtering, and I’m tempted to leave it on despite the performance penalty. Combined with multisampled antialiasing, the image quality can be amazing.

 

Conclusions
You’ve probably gathered by now that I’m impressed with the technology in the GeForce3. At the end of the day, I have very few qualms about this product. It’s easily best in its class, and it advances the state of the art in innovative ways. I did have a few problems making the NVIDIA reference design test card work with my Via KT133A motherboard, as I mentioned in a frustrated news post here not long ago, but I don’t blame the GeForce3 for that. (And I did eventually get it working.)

The tough question with this card is whether it’s a good value for consumers. I’m not going to pretend I can make an objective call on this one. GeForce3 cards cost between US$359 and $450 as I write. Perfectly acceptable, speedy GeForce2 Pro cards are selling for about a third of that. But you have to consider two things.

First, the GeForce3’s vertex and pixel shaders will make all previous 3D cards obsolete. Once applications are widely available to take full advantage of the GeForce3’s new capabilities, owning a GeForce2 will be akin to owning a video card without any hardware acceleration at all. This we know. That day may not come soon, but once it does, those folks who saw fit to pony up now for a GeForce3 will be pleased to see that their video card still has some spark left.

Second, we know how that day will come. It will come gradually, in fits and starts, as new Xbox ports and other DirectX 8 and advanced OpenGL applications trickle out into the market. You will probably be asking yourself at what point waiting for prices to drop further, or for other new technologies to hit the market, will no longer be worth the agony. You may have the experience of visiting a friend’s house to check out his new video card and losing all bladder control over a demo.

How humiliating.

You may then be wondering how many games, apps, or demos it will take before you can justify throwing almost $400, or whatever’s the going rate, at a video card. For some of us—real PC enthusiasts who are mad about graphics—that day is already here.  
