ARE YOU ready? The era of cinematic graphics is upon us.
Err, well, it will be soon.
Or, uhm, maybe not really, but we can show you how it might have looked.
You see, we have our hot little hands on NVIDIA’s NV30 chip, better known as the GeForce FX 5800 Ultra, that was intended to usher in a new era of movie-quality graphics when it debuted some time last year. Instead, it’s months into 2003, and GeForce FX cards still aren’t widely available.
Let’s just say NVIDIA’s had a wee problem with its vaunted execution. That fact hurts even more because NVIDIA’s rival, ATI, hasn’t. The Radeon 9700 debuted last fall to deservedly positive reviews. The 9700 lived up to the high expectations we’d set for this new generation of DirectX 9-class graphics chips, delivering amazing new shiny objects and lots of rippling glowy things onscreen. Err, delivering true floating-point color formats and amazing amounts of real-time graphics processing power.
But the story of the GeForce FX 5800 Ultra isn’t just the story of a late semiconductor. No, it’s also the story of an impressively “overclocked” piece of silicon with a massive, Dustbuster-like appendage strapped to its side. Most intriguingly, it’s the story of a product that may never reach store shelves in any kind of volume, because all signs point to its premature demise.
That said, the GeForce FX 5800 Ultra is still interesting as heck. I’ve managed to finagle one for review, so buckle up, and we’ll see what this puppy can do.
What’s all this talk about cinematic rendering?
Let’s start right off with the graphics egghead stuff, so we can fit the GeForce FX into the proper context.
The GeForce FX 5800 Ultra card is powered by the new NVIDIA GPU, widely known by its code name, NV30, during its development. The NV30 is NVIDIA’s first attempt at the generation of graphics chips capable of a whole range of features anticipated by the specifications for Microsoft’s DirectX 9 software layer.
The single most important advance in this new generation of graphics chips is a rich, new range of datatypes available to represent graphics data, especially pixels and textures. This capability is the crux of NVIDIA’s marketing push for “cinematic graphics.” By adding the ability to represent graphics data with more precision, chips like NV30 and ATI’s R300 series can render images in real time (or something close to it) nearly as compelling as those produced by movie studios. Now, the GeForce FX may not be ready to replace banks and banks of high-powered computers running in professional render farms just yet, but you might be surprised at how close it comes.
I’ve written more about this new generation of graphics chips and what it means for the world right here. Go read up if you want to understand how such things are possible.
Along with richer datatypes, this new generation of GPUs offers more general programmability, which makes them much more powerful computational devices. Contemporary GPUs have two primary computational units, vertex shaders and pixel shaders. Vertex shaders handle manipulation and lighting (shading) of sets of coordinates in 3D space. Vertex shader programs can govern movements of models and objects, creating realistic bouncing and flexing motions as a CG dinosaur tromps across a scene. Pixel shaders, on the other hand, apply lighting and shading effects algorithms to sets of pixels. Pixel shader programs can produce sophisticated per-pixel lighting effects and generate mind-bending effects like reflections, refractions, and bump mapping. DX9 pixel and vertex shaders incorporate more CPU-like provisions for program execution, including new vector and scalar instructions and basic flow control for subroutines.
Of course, DirectX 9-class pixel shaders include support for floating-point datatypes, as well.
By offering more flexibility and power to manipulate data, vertex and pixel shaders amplify the graphics power of a GPU. Some fairly recent theoretical insights in graphics have shown us that traditional, less-programmable graphics chips could render just about any scene given rich datatypes and enough rendering passes. This realization has led to the development of high-level shading languages like Microsoft’s HLSL and NVIDIA’s Cg. These shading languages can compile complex graphics operations into multiple rendering passes using DirectX or OpenGL instructions. Programmable hardware shaders can produce equivalent results in fewer rendering passes, bringing the horizon for real-time cinematic rendering closer.
So that is, in a nutshell, where all this fancy-pants talk about cinematic rendering comes from. Built on a 0.13-micron manufacturing process with over 125 million transistors, NVIDIA’s GeForce FX is a powerful graphics rendering engine capable of doing things of which previous graphics chips could only dream. The leap from GeForce4 to GeForce FX is as large a leap forward as we’ve seen since 3D graphics has been driven by custom chips.
The big, green, shiny fly in the ointment, however, is ATI’s Radeon R300 series of chips, which brought similar graphics capabilities to the mass market last September. In the following pages, we’ll review the capabilities of NVIDIA’s GeForce FX with an eye toward its competition. We’ll break it down into key areas like pixel-pushing power, vertex and pixel shaders, texture and edge antialiasing, and performance in real-world games to see how the GFFX stacks up.
Cinematic graphics: Lubed up and a bit chilly
ATI counters the nymph with a chimp. Canny.
Before we dig into the GPU performance stuff, though, we should stop and talk about the GeForce FX 5800 Ultra card itself. Words can’t do it justice, so let’s just skip to the pictures:
BFG Tech’s Asylum GeForce FX 5800 Ultra
The passive cooler on the back side of the card is no slouch, either
The Radeon 9800 is smaller in every dimension than the GeForce FX 5800 Ultra
It’s big. Double-stacked big. Like an indulgent Oreo for the obese, it takes up twice as much room as a standard AGP card, encroaching on the nearest PCI slot to get room for its cooler.
You may not be able to tell this from the picture, but the GeForce FX card is also heavy. The card’s cooler has more copper in it than a roll of pennies, and using my highly scientific “hold one in each hand” method, I’ve determined the GeForce FX card weighs about the same as three Radeon 9700 cards.
The most infamous feature of the GFFX 5800 Ultra card, though, has to be its Dustbuster cooler. (I coined the term in reference to the FX cooler in this article, thanks very much.) NVIDIA’s “FX Flow” cooler design is reminiscent of the cooler on Abit’s OTES card, which we reviewed a while back. A heat pipe design pulls heat from the surface of the GPU and into the copper fins. The FX Flow’s blower pulls air into the PC case through the lower set of vents, then pushes the air out over the copper fins you see in the upper chamber of its plastic case.
Incidentally, this card and cooling design is not the work of BFG Tech, even though we’re looking at their card. All GeForce FX 5800 Ultra cards are essentially the same, because for this product, NVIDIA generally is supplying its partners with complete cards rather than chips. Likewise, the cooler you see on our BFG Tech card is NVIDIA’s reference cooler with an Asylum sticker slapped on the side. Other NVIDIA partners have cooked up alternative cooler designs for the FX 5800 Ultra, but most cards will probably be similar to this one.
For general non-3D applications, the FX Flow blower on our BFG Tech card doesn’t run at all, leaving the cooling job to the card’s massive copper heat sink apparatus. The copper heatsinks on the card get too hot to touch, even when the PC is sitting in power-save mode, but the fan stays silent, and all is peaceful. Once we kick off a 3D game or application, the blower spins up to speed, emitting a high-pitched whine that elicits flashbacks to my days on the flight deck. And I was never even in the service. After the 3D application ends, the blower will generally spin right back down to a stop. Sometimes it stays on for a while afterwards to bring the chip’s temperature into check.
To give you an idea of how loud it is, I used a digital sound level meter (Extech model 407727) to measure the sound of our test system with the GeForce FX and with a Radeon 9800 Pro. The meter was mounted on a tripod about two feet away from our test system, whose only other real noisemakers are a bog-standard AMD Athlon XP retail cooler and a Codegen PSU with a fairly quiet fan inside. (The hard drive is a Maxtor DiamondMax D740X.) I tested noise levels at the Windows desktop and while running Unreal Tournament 2003. Here are the results:
This thing is loud. Decibels come on a logarithmic scale, so the numeric difference you see here may not capture the difference in noise levels adequately.
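To put a number on that logarithmic scale: a difference in decibels maps to a ratio of sound power as 10^(dB/10). Here’s a minimal sketch of the arithmetic. The function name and the sample differences are made up for illustration; they are not the readings from our meter.

```python
def power_ratio(db_difference):
    """Ratio of sound power implied by a difference in decibels."""
    return 10 ** (db_difference / 10)

# Hypothetical differences, NOT our measured results.
for diff in (3, 10, 20):
    print(f"+{diff} dB -> {power_ratio(diff):.1f}x the sound power")
```

A gap that looks small on a chart can therefore stand for a severalfold jump in the energy coming out of the cooler.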
Let me be brutally honest here. I hate this cooler. It’s louder than a bright plaid leisure suit, and the white noise repeatedly lulled me to sleep as I was trying to complete this review. I didn’t like Abit’s original OTES card, and I don’t like NVIDIA’s expensive knock-off of it any better. Perhaps I’m just not hard-core enough about this stuff (I don’t like high-CFM CPU coolers, either), but this thing isn’t for me. You will want to think long and hard before you decide that you can live with one of these cards in your PC.
Now, let’s get on with testing this thing in action.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test system was configured like so:
|Processor||Athlon XP ‘Thoroughbred’ 2600+ 2.083GHz|
|Front-side bus||333MHz (166MHz DDR)|
|Motherboard||Asus A7N8X Deluxe|
|North bridge||nForce2 SPP|
|South bridge||nForce2 MCP-T|
|Memory size||512MB (2 DIMMs)|
|Memory type||Corsair XMS3200 PC2700 DDR SDRAM (333MHz)|
|Storage||Maxtor DiamondMax Plus D740X 7200RPM ATA/100 hard drive|
|OS||Microsoft Windows XP Professional|
|OS updates||Service Pack 1, DirectX 9.0|
The test system’s Windows desktop was set at 1024×768 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used Catalyst revision 7.84 drivers for the Radeon 9800 Pro and Catalyst 3.1 (7.83) drivers for the Radeon 9700 Pro. We used NVIDIA’s 42.68 drivers for the GeForce4 Ti 4600, and we used the brand-spankin’-new 43.45 drivers for the GeForce FX 5800 Ultra.
We used the following versions of our test applications:
- FutureMark 3DMark 2001 SE Build 330
- FutureMark 3DMark03
- Codecreatures Benchmark Pro
- Comanche 4 demo benchmark
- Quake III Arena v1.31
- Serious Sam SE v1.07
- VillageMark v1.17
- Unreal Tournament 2003 with 2199 patch
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Even today, pixel-pushing power is still one of the key determinants of the performance of a graphics chip. Getting a handle on the GeForce FX 5800 Ultra’s fill rate performance, however, is more than a little slippery. Let’s start by looking at our chip table, and then I’ll tell you why it’s not exactly right.
| ||Core clock (MHz)||Pixel pipelines||Peak fill rate (Mpixels/s)||Texture units per pixel pipeline||Peak fill rate (Mtexels/s)||Memory clock (MHz)||Memory bus width (bits)||Peak memory bandwidth (GB/s)|
|GeForce4 Ti 4600||
|GeForce FX 5800||400||4||1600||2||3200||800||128||12.8|
|GeForce FX 5800 Ultra||500||4||2000||2||4000||1000||128||16.0|
|Radeon 9700 Pro||325||8||2600||1||2600||620||256||19.8|
|Radeon 9800 Pro||380||8||3040||1||3040||680||256||21.8|
The GeForce FX 5800 Ultra runs at much higher clock rates than its competition, and its DDR-II memory does, too. NVIDIA chose a cutting-edge approach to developing the GeForce FX, relying on newer technologies and higher clock rates to deliver performance. The chip’s 500MHz core clock speed gives it relatively high pixel and texel fill rates. The NV30 has four independent memory controllers in a crossbar arrangement, which is essentially the same as the GeForce3 and GeForce4 Ti chips, with the exception that the NV30 has been tweaked to support DDR-II-style signaling. Its memory bus is only 128 bits wide, but 1GHz DDR-II memory gives the GFFX 5800 Ultra memory throughput of 16GB/s.
ATI has taken a different approach with the Radeon 9700 and 9800 series, settling for lower clock rates but getting more work done each clock cycle. The high-end ATI chips have 256-bit-wide memory interfaces (with four 64-bit memory controllers), which give them more memory bandwidth than the GeForce FX 5800 cards, even with conventional DDR memory and the concomitant lower memory clock speeds.
Now, here’s why the above isn’t quite right. Our usual assumptions about graphics chip pipelines don’t entirely apply to the GeForce FX. NVIDIA is very coy about exactly how the NV30 GPU looks inside. For quite a while, most of the world believed NV30 was an 8-pipeline design with one texture unit per pipe. Turns out that isn’t so. Instead, the FX 5800 Ultra is… well, complicated. Lately, the company’s representatives have taken to talking about arrays of functional units instead of pixel pipelines. It’s sometimes hard to penetrate.
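For anyone who wants to check the chip table’s math, the peak numbers fall out of simple multiplication. This is only a sketch: the inputs are copied from the table, the dictionary and function names are mine, and “GB” is assumed to mean 1000 MB, which is what the table’s figures imply.

```python
# Clock speeds, pipeline counts, and bus widths copied from the chip table.
# Memory clocks are the effective (DDR) rates listed there.
CHIPS = {
    "GeForce FX 5800":       dict(core_mhz=400, pipes=4, tmus=2, mem_mhz=800,  bus_bits=128),
    "GeForce FX 5800 Ultra": dict(core_mhz=500, pipes=4, tmus=2, mem_mhz=1000, bus_bits=128),
    "Radeon 9700 Pro":       dict(core_mhz=325, pipes=8, tmus=1, mem_mhz=620,  bus_bits=256),
    "Radeon 9800 Pro":       dict(core_mhz=380, pipes=8, tmus=1, mem_mhz=680,  bus_bits=256),
}

def peak_rates(chip):
    """Return (Mpixels/s, Mtexels/s, GB/s) for one table row."""
    mpixels = chip["core_mhz"] * chip["pipes"]          # pixels per clock x clock
    mtexels = mpixels * chip["tmus"]                    # texture units per pipe
    gb_per_s = chip["mem_mhz"] * chip["bus_bits"] / 8 / 1000  # bits -> bytes -> GB
    return mpixels, mtexels, gb_per_s

for name, chip in CHIPS.items():
    mpixels, mtexels, bandwidth = peak_rates(chip)
    print(f"{name}: {mpixels} Mpixels/s, {mtexels} Mtexels/s, {bandwidth:.1f} GB/s")
```

These are theoretical peaks, of course. They say nothing about how close a chip gets to them in practice.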
When asked, NVIDIA explains the NV30’s capabilities like so:
8 z pixels per clock
8 stencil ops per clock
8 textures per clock
8 shader ops per clock
4 color + z pixels per clock with 4x multisampling enabled
It is architected to perform those functions.
Basically, its 8 pipes with the exception of color blenders for traditional ROP operations, for which it has hardware to do 4 pixels per clock for color & Z. It is that it has 8 “full” pipes that can blend 4 pixels per clock with color.
Now, the phrase “color + Z pixels” in there is key for our discussion, because that’s generally the kind of pixels most current 3D applications are rendering. That’s your standard pixel, with a color value, situated in 3D space. When doing this sort of conventional rendering, the NV30 can produce four pixels per clock cycle with up to two textures applied to each.
This configuration gives the NV30 a bit of a disadvantage next to ATI’s R3x0 series in terms of single-textured fill rate. Our table above reflects that difference, and it’s generally correct so far.
However, the NV30 can do certain types of operations, including stencil ops, at 8 pixels per clock. This ability makes the NV30 more formidable than a straight-up “4 x 2” pixel pipeline specification might indicate. NVIDIA claims the rendering and shadowing techniques used in upcoming games like Doom III will take particular advantage of the NV30’s eight-pipe capabilities.
Of course, we can measure these things. Here are the scores from 3DMark2001’s fill rate test, which is a simple test of traditional “color + Z” rendering.
The GeForce FX turns in performance more or less like we’d expect given the numbers on our chip table above, very much like a “4 x 2” design. Its single-textured fill rate is lower than the Radeons, but its multi-textured fill rate is second to none.
These numbers bode well for the GeForce FX in scenarios where games and applications apply at least two textures per pixel. However, some newer games are substituting pixel shader effects for additional textures, so the underlying strength of ATI’s true eight-pipe approach remains formidable.
In situations where applications want to apply more than one or two textures to a pixel, both the NV30 and the R300 series chips can “loop back” pixels through their pipelines in order to apply more textures. Both chips can apply up to 16 textures without resorting to multipass rendering.
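A rough way to picture the trade-off between a “4 x 2” and an “8 x 1” layout is to cap the pixel rate by whichever runs out first, pixel pipes or texture units. The sketch below is a deliberately simplified model of my own, not anything NVIDIA or ATI has published. It assumes fill rate is the only limit and ignores memory bandwidth, shaders, and loop-back overhead entirely.

```python
def effective_mpixels(core_mhz, pipes, tmus_per_pipe, textures):
    """Peak Mpixels/s when every pixel needs `textures` texture lookups.

    Simplified model: the chip is limited either by how many pixels it
    can write per clock or by how many texels it can sample per clock.
    """
    pixel_rate = core_mhz * pipes
    texel_rate = pixel_rate * tmus_per_pipe
    return min(pixel_rate, texel_rate / textures)

for textures in (1, 2, 4, 16):
    fx = effective_mpixels(500, 4, 2, textures)     # GeForce FX 5800 Ultra as "4 x 2"
    r350 = effective_mpixels(380, 8, 1, textures)   # Radeon 9800 Pro as "8 x 1"
    print(f"{textures:2d} textures: FX {fx:.0f} vs 9800 Pro {r350:.0f} Mpixels/s")
```

Under those assumptions, the 8 x 1 design wins with a single texture, and the 4 x 2 design pulls ahead once two or more textures are applied per pixel.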
The primary constraint for fill rate is sometimes memory bandwidth rather than GPU processing power, and memory bandwidth bottlenecks are more of a concern than ever when dealing with 64-bit and 128-bit floating-point color modes. In cases where memory bandwidth is a primary limitation, the Radeon 9700 and 9800 cards will have the advantage.
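To see why the wider color formats lean so hard on memory, consider how much data a single full-screen color buffer holds at each bit depth. This is back-of-the-envelope arithmetic only. The 1600×1200 resolution is just an example, and the sketch ignores Z data, overdraw, texture reads, and compression.

```python
WIDTH, HEIGHT = 1600, 1200  # example resolution, chosen for illustration

def color_buffer_mb(bits_per_pixel, width=WIDTH, height=HEIGHT):
    """Size of one uncompressed color buffer in megabytes (10^6 bytes)."""
    return width * height * bits_per_pixel / 8 / 1_000_000

for bits in (32, 64, 128):
    print(f"{bits:3d}-bit color: {color_buffer_mb(bits):.2f} MB per frame")
```

Each step up in precision doubles the bytes written per pixel, so the same memory bus fills half as many pixels per second.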
Then again, modern GPUs employ a whole range of tricks to make better use of their fill rate and memory bandwidth, including compression of various types of graphics data. Both the NV30 and its R3x0-series competition can compress Z data (depth information on pixels) and color (framebuffer) data using lossless algorithms. Color compression is new in this generation of chips, and it’s most useful when edge antialiasing techniques are active, where multiple samples are often the same color. As I understand it, the NV30’s color compression is always active, while ATI only turns on color compression with antialiasing.
NVIDIA doesn’t talk much about it, but I believe the NV30 employs an “Early Z” occlusion detection algorithm, like the R3x0 chips, to reduce the possibility the chip will render a pixel that would be situated behind another pixel (and thus not visible) in the final scene. With fancy shader programs in the mix, pixels become more expensive to render, so eliminating occluded pixels up front becomes a higher priority.
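The general idea behind an early Z test can be sketched in a few lines. To be clear, this is a textbook-style illustration with names I made up, not a description of what the NV30 or R3x0 hardware actually does internally.

```python
def shaded_fragments(fragments, early_z):
    """Count how many fragments reach the pixel shader.

    `fragments` is a list of (pixel, depth) pairs in draw order;
    a smaller depth means closer to the viewer.
    """
    depth_buffer = {}
    shaded = 0
    for pixel, depth in fragments:
        visible = depth < depth_buffer.get(pixel, float("inf"))
        if early_z and not visible:
            continue  # rejected before any shader work is done
        shaded += 1   # the (possibly expensive) shader would run here
        if visible:
            depth_buffer[pixel] = depth
    return shaded

# One pixel covered by three surfaces (hypothetical depths).
front_to_back = [((0, 0), 0.2), ((0, 0), 0.5), ((0, 0), 0.9)]
back_to_front = list(reversed(front_to_back))

print(shaded_fragments(front_to_back, early_z=True))   # only the nearest is shaded
print(shaded_fragments(front_to_back, early_z=False))  # all three are shaded
print(shaded_fragments(back_to_front, early_z=True))   # draw order defeats the test
```

Note the last case: an early test only saves work when nearer surfaces happen to be drawn first.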
All of these methods of bandwidth conservation improve efficiency, and if implemented well, they offer the NV30 some hope of outperforming the ATI chips, even with less memory bandwidth. We’ll use VillageMark to test these chips’ fill rate and occlusion detection capabilities, both with and without antialiasing enabled.
Even with less memory bandwidth, the GFFX 5800 Ultra matches up well against the ATI cards, leading the pack in the non-AA tests. The ATI cards lead when 4X edge antialiasing and 8X anisotropic filtering are enabled, but even then, the GeForce FX runs remarkably close to the Radeon 9700.
The NV30’s pixel shaders can execute shader programs with as many as 1024 instructions in a single rendering pass with dynamic branching and looping. This ability should allow the NV30 to handle complex shader effects with grace. The chip’s limits are quite a bit higher than the 64-instruction limit imposed by DirectX 9’s pixel shader 2.0 specification.
By contrast, the Radeon 9700 Pro conforms largely to the PS 2.0 specification; it can execute a maximum 64 instructions per pass, with few exceptions. More complex shader programs will only be possible on the 9700 with the aid of a high-level shading language, which can break down effects into multiple rendering passes. The overhead associated with multiple rendering passes can harm performance, as well.
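For a rough feel for what those limits mean, you can compute the fewest passes a shader of a given length would need under a per-pass instruction cap. This is a lower bound under a big assumption, namely that a shader can be cut cleanly at any instruction. Real compilers have to carry intermediate values between passes and can’t always split that neatly.

```python
import math

def min_passes(instructions, per_pass_limit):
    """Fewest rendering passes needed if a shader splits perfectly."""
    return math.ceil(instructions / per_pass_limit)

# Limits quoted in the text: 1024 instructions (NV30), 64 (PS 2.0).
for length in (48, 100, 1024):
    print(f"{length:4d} instructions: "
          f"{min_passes(length, 1024)} pass(es) at a 1024 limit, "
          f"{min_passes(length, 64)} at a 64 limit")
```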
ATI addressed this limitation in the R350 chip by adding an “F-buffer” that stores intermediate pixel fragment values between passes through the pixel shaders; no trip through the rest of the graphics pipeline is required. With the F-buffer, the R350 can execute pixel shader programs of any length. You can read more about it in our Radeon 9800 Pro review.
We’ll have to see whether NVIDIA’s approach is superior to ATI’s when we have tests available to us that compile to our test hardware from high-level shading languages. Currently, the benchmarks available to us are limited to basic DX8 and DX9 shader variants, from 1.1 to 2.0.
Differences between ATI’s and NVIDIA’s pixel shader implementations add more complexity to the task of comparing the two companies’ chips. The NV30’s pixel shaders are composed of an array of arithmetic logic units, and the bit depth of pixel shader data affects clock-for-clock performance. In order to balance performance versus precision, NV30 offers support for two floating-point color bit depths, 16 bits per color channel (or 64 bits total) and 32 bits per color channel (or 128 bits total). ATI split the difference between the two; the R3x0-series pixel shaders process data at 24 bits per color channel, or 96 bits total, even when using 128-bit framebuffer modes.
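The bit counts are easy to tally, and the precision gap is easy to demonstrate. The sketch below assumes the 16-bit and 32-bit formats behave like the standard half- and single-precision floats that Python’s `struct` module can pack. That is my assumption for illustration, not something confirmed about the hardware. There is no stock 24-bit float to show ATI’s middle ground, so it is left out of the rounding demo.

```python
import struct

def quantize(value, fmt):
    """Round-trip a Python float through a narrower float format."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

BITS_PER_CHANNEL = {"NV30 half": 16, "R3x0": 24, "NV30 full": 32}
for name, bits in BITS_PER_CHANNEL.items():
    print(f"{name}: {bits} bits x 4 channels = {bits * 4} bits per pixel")

value = 0.1
fp16 = quantize(value, "<e")  # 16-bit half precision
fp32 = quantize(value, "<f")  # 32-bit single precision
print(f"fp16 error: {abs(fp16 - value):.2e}")
print(f"fp32 error: {abs(fp32 - value):.2e}")
```

Small per-operation errors like these can pile up over a long shader program, which is why the choice of precision matters more as shaders get longer.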
NVIDIA’s approach offers developers more flexibility. They can choose higher-precision datatypes when needed, and they can fall back to 64-bit color when it is sufficient. However, ATI’s 96-bit compromise isn’t necessarily a bad one for this first generation of chips with high-color capabilities. The 96-bit limitation will probably be more of a disadvantage in professional rendering applications where the R3x0 chips will live on FireGL cards instead of Radeons.
I should mention one more thing about the GeForce FX’s pixel shaders. All of this funky talk about “arrays of computational units” got me wondering whether the FX architecture doesn’t share computational resources between its vertex and pixel shaders, which is likely to happen as graphics hardware evolves. So I asked NVIDIA, and the answer was straightforward:
The GeForce FX architecture uses separate, dedicated computation units for vertex shading versus pixel shading. The benefit is that there is never a trade-off between vertex horsepower and pixel horsepower.
In future GPUs, pixel and vertex shader instruction sets are likely to merge, and the walls between these two units may begin to dissolve. This change may enable some impressive new vertex-related effects. However, we’re not there yet.
Now, let’s test what we can, which is DX8 and early DX9 pixel shader performance.
Despite its advantage in clock speed, the GeForce FX comes out behind the Radeon 9800 Pro in all of our DirectX 8-class pixel shader tests. The FX decisively outperforms the 9700 Pro in one benchmark, 3DMark 2001’s pixel shader test, but otherwise runs behind both ATI chips.
Our sole DirectX 9-class shader test is 3DMark03, and here we come to a complication. NVIDIA has apparently optimized its 43.45 drivers specifically for 3DMark03, possibly by cutting pixel shader precision from 128 to 64 bits, or maybe less in some cases. (NVIDIA has given itself some cover on this front by raising objections to 3DMark03’s testing methodology.) To illustrate the performance difference, I’ve also tested the GeForce FX with revision 43.00 drivers, which don’t include the 3DMark03-specific optimizations. Both sets of results are presented below.
3DMark03’s pixel shader 2.0 test creates procedural volumetric textures of wood and marble via pixel shader programs. This is but one use of pixel shaders, but it may be common in future apps and games.
With the 43.45 drivers, the GeForce FX is very competitive with the ATI cards. With the 43.00 drivers, the GeForce FX can’t keep up. We’ll explore the issue of 3DMark03 optimizations in more detail below.
From what we can tell, though, the GeForce FX isn’t significantly more powerful than the ATI chips when running DirectX 8 or early DX9-class pixel shader programs. In NVIDIA’s own ChameleonMark, it’s consistently slower than the ATI cards, in fact.
If you were expecting me to tell you the number of vertex shader units the GeForce FX 5800 Ultra has, you probably haven’t been paying attention. ATI has laid it out clearly: the R300 and R350 each have four vertex shader units, each consisting of a 128-bit vector processor and a 32-bit scalar processor. The GeForce FX 5800 Ultra has (you guessed it) an “array” of computational units.
Ah, the disclosure.
Anyhow, finding out the relevant info about the FX isn’t too difficult. We’ll test the cards and see how they perform.
The ATI chips’ vertex shaders appear to be more powerful than the NV30, despite the fact the NV30 runs at a higher clock speed. The FX benefits from specific optimizations in 3DMark03, but not quite enough to catch up with the 9700.
Now let’s test legacy transformation and lighting performance, which is generally provided by a vertex program on newer cards.
The FX excels here, probably because old-school T&L runs on a single vertex shader unit, and the FX’s higher clock speed becomes more of an asset.
Quake III Arena
Now for the gaming tests. We tested with a Quake III demo from a CPL match involving my bud fatal1ty, of course. It’s a longish demo for benchmarking, but it should be a nice test. You can grab the demo from our server here, at least until we find out the thing is copyrighted somehow.
The GeForce FX 5800 Ultra performs very well in Quake III, outpacing the competition ever more decisively as the display resolution increases. With anisotropic filtering and edge AA enabled, the FX’s advantage grows slimmer, though.
Comanche 4 is obviously limited by our test platform rather than by graphics cards when AA and aniso aren’t enabled. However, once we turn up AA and aniso, the GeForce FX 5800 Ultra runs neck-and-neck with the Radeon 9700 Pro. The 9800 Pro, though, is faster at higher resolutions.
Codecreatures Benchmark Pro
The Codecreatures benchmark makes extensive use of DirectX pixel shaders, so it’s an intriguing test for these cards.
Here we see the opposite result of what we saw in Quake III. The Radeon 9800 Pro is fastest without aniso and AA, but the FX pulls way out into the lead once we turn up the eye candy.
Unreal Tournament 2003
Here we have the latest and greatest multiplayer first-person shooter engine. UT2003 uses larger textures and more polygons than older games, and it uses pixel shaders for some effects.
Without aniso and AA, the FX battles the ATI cards for supremacy. The FX does relatively better with UT’s high detail settings than with lower detail.
With AA and aniso, the pattern reverses itself. The FX is competitive at the low-detail settings, but has trouble keeping up at the higher detail level.
Serious Sam SE
In order to test Serious Sam on a more-or-less level playing field, we used Serious Sam’s “Extreme Quality” add-on to set graphics options. For the most part, that will mean the graphics settings are comparable from card to card, with one exception. The Radeon cards are doing 16X anisotropic filtering here, and the NVIDIA cards are at their maximums of 8X aniso. However, with the adaptive aniso algorithm that ATI uses, the difference between 8X aniso and 16X aniso is very minor.
The GeForce FX blows away the Radeon cards here, averaging about 20 fps faster at everything but the lowest resolution.
Serious Sam lets us look at actual performance over time rather than just averages, so let’s see what we can see. One thing to look for: performance dips. In many ways, the quality of the gaming experience is defined by what happens in the worst-case scenarios.
The GeForce FX’s dominance isn’t adequately captured by its average scores. At 1600×1200, the FX is so fast, its performance low-points are higher than the previous-gen GeForce4’s performance peaks.
With 4X AA plus 8X aniso, the FX’s performance lead narrows, but remains intact. Let’s look at the performance over time.
Again, the FX shows impressive strength across the board.
The FX splits the difference with the Radeon 9700 Pro, losing the two lower resolution tests, but making up ground as fill rate requirements rise. The Radeon 9800 Pro, however, is fastest overall.
You can see how the cards perform here in each of the individual 3DMark game tests. The most interesting test is the Nature game, where DX8 pixel shaders play a prominent role. The FX is sandwiched neatly between the Radeon 9700 Pro and 9800 Pro in the Nature test.
Now we’re back to 3DMark03, where we need two sets of drivers in order to test adequately. NVIDIA’s optimizations in the 43.45 drivers affect not only performance, but image quality. I’ll show you what I mean, but let’s look at the performance results first.
The GeForce FX 5800 Ultra is outright fastest in 3DMark03 when using the 43.45 driver, but it’s much slower with 43.00. The individual game test scores make up this composite “3DMark03” score. The results of those individual tests follow.
Obviously, NVIDIA has made considerable performance progress through driver optimizations, particularly in the Game 4 test, Mother Nature, which uses DirectX 9-class pixel shaders. Let me show you why that is.
3DMark03 image quality and driver optimizations
There are several screenshots below taken from frame 1,799 of 3DMark03’s Mother Nature test. These images are fairly low-compression JPEGs, but you can click them to see full-screen versions of the images in lossless PNG format. I’ve applied a gamma correction of 1.1 to the JPEG versions to make things more visible in them. The PNG versions are unaltered.
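If you’re curious what a gamma adjustment of 1.1 does to the pixel values, the usual power-law formula looks like the sketch below. I’m assuming the conventional definition here. The exact math an image editor applies may differ, and the function name is my own.

```python
def gamma_correct(value, gamma=1.1):
    """Apply power-law gamma to one 8-bit channel value (0-255)."""
    normalized = value / 255
    return round(255 * normalized ** (1 / gamma))

# Black and white stay put; the midtones are lifted slightly.
for v in (0, 64, 128, 192, 255):
    print(f"{v:3d} -> {gamma_correct(v):3d}")
```

The adjustment brightens dark and middle tones a touch without clipping either end of the range.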
You’ll want to notice several things about the images. The first image, for instance, was produced by the DirectX 9 reference rasterizer. It’s a software-only tool that takes hours to produce this single image, which is useful for comparison’s sake. The next image comes from the Radeon 9800 Pro, and it looks very similar to the output from the reference rasterizer.
The next two images come from the GeForce FX with two different driver revisions: 43.00, which hasn’t been optimized for 3DMark03, and 43.45, which has. The 43.00 image has some corruption on the rock’s edge, right beneath the flowing stream. Also, the butterfly’s shadow is too large and exhibits artifacts in the 43.00 image. 43.45 corrects these problems, but creates a new problem of its own: the dynamic range on the sky simply isn’t what it should be. Somehow, NVIDIA has cut the precision down enough in the process of optimization that the scene doesn’t look as it should.
I asked NVIDIA how they’d optimized for 3DMark03 in their newer drivers, but they didn’t answer me. Had they simply cut all pixel shader precision to 64 bits in floating point, I wouldn’t have objected too strenuously. A case could be made for using 64-bit FP datatypes for this sort of performance test. However, whatever’s going on with the sky in the 43.45 drivers looks like it goes well beyond such a reasonable adjustment.
Next up is antialiasing performance, which we’ve been testing in most of our game tests by turning up 4X AA and 8X aniso. Now we’ll isolate the various AA and texture filtering modes to see how the cards scale with each of them.
These AA tests are intended to show performance scaling, but they are not entirely comprehensive. The GeForce FX includes a broad range of edge antialiasing modes, and we didn’t test all of them. For instance, we skipped NVIDIA’s quirky “Quincunx” mode, which combines 2X AA with a blurring filter. I find that mode essentially useless.
Also, NVIDIA has a couple of AA modes called “4XS” and “6XS” that combine multisampled edge AA with full-screen supersampling. When I tested “4X” AA, I used the usual multisampled mode. Since the FX doesn’t have a straight-up 6X multisampled mode, I compared the FX’s 6XS mode to the Radeon cards in their 6X mode.
Finally, the 43.45 drivers have a slider option for 16X AA, but I found no performance difference between 8X and 16X mode at all. I didn’t include those results, because I couldn’t be sure the 16X mode was functioning properly.
The FX takes a relatively big hit when going from 4X mode to 6XS mode, which is to be expected, since supersampling is much less efficient than multisampling. All in all, though, the FX performs pretty well with edge AA in use. The FX scales much better than the GeForce4 Ti, and it’s comparable to the ATI chips.
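The efficiency gap between the two approaches comes down to how often the pixel pipeline has to compute a color. Here’s a simplified model of my own, not NVIDIA’s description of its hardware: multisampling computes one color per pixel and stores it at several sample positions, while supersampling computes a separate color for every sample. Mixed modes like 4XS and 6XS would land somewhere in between, and I haven’t tried to model their exact sample split.

```python
def shading_cost(width, height, samples, supersampled):
    """Return (colors computed, samples stored) for one frame."""
    pixels = width * height
    shaded = pixels * samples if supersampled else pixels
    stored = pixels * samples
    return shaded, stored

for label, supersampled in (("4X multisampling", False), ("4X supersampling", True)):
    shaded, stored = shading_cost(1600, 1200, 4, supersampled)
    print(f"{label}: {shaded:,} colors computed, {stored:,} samples stored")
```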
There’s much more to say about these chips’ edge antialiasing methods, but I will have to try to address those in more detail in another article. For now, we need to concentrate on another type of antialiasing: texture filtering.
To test texture filtering, I used Quake III Arena’s timedemo function at 1600×1200 resolution with various filtering settings.
The GeForce FX runs Quake III especially well, so it has a bit of an advantage here in overall performance. In terms of scaling, there aren’t many surprises, either. The FX takes a little more of a performance hit with trilinear filtering than the GeForce4 does, but it’s so much faster overall, no one will complain.
Incidentally, I used the “Quality” setting in the NVIDIA “Performance & Quality Settings” control panel throughout my performance testing. There are two other options, “Application” and “Performance”. Let’s take a look at what the three settings do.
Below are some low-compression JPEG images from Quake III that show the effects of the various filtering settings. You can click the images to see full-quality PNG versions. I’ve arranged the screenshots from highest quality filtering to lowest, and the difference is very visible in the look of the textures in the scene, especially in the long bridge surface stretching out in front of the viewpoint.
The difference between “Application” and “Quality” is difficult to see with the naked eye, but if you look closely, you can spot it in the definition of the seams between the tiles underneath the railgun. (The railgun is in the player’s weapon sights, floating above that little platform there.) The definition of textures is cleaner with “Application,” even right up next to the viewpoint.
I’m not sure how NVIDIA can call the “Performance” option 8X anisotropic filtering. The number of samples here is obviously lower, and there’s visible banding along the Z axis as the eye scans up the platform. The tile seams under the railgun are no longer distinct.
I included this sample of trilinear + bilinear filtering to show you how little difference there is between this filtering mode and the “8X” “Performance” option. Compare this image to the one at the top of the group to see how much difference anisotropic filtering can really make.
Fair warning: if you’re on dial-up, I recommend skipping the next two pages. I’ve embedded PNG images into the page because JPEG Simply Won’t Do, and you dial-up users would do better to just jump past those pages.
The “Application” setting looks gorgeous, with long, smooth gradients between a small number of mip-map levels. The “Quality” option isn’t quite as pretty, and the texture level of detail is a little lower here. Gradients are still smooth, but less so.
The “Performance” option looks very similar to the trilinear + bilinear option, but remarkably, the transitions between mip maps are very sudden. That explains the banding along the Z axis that we saw in our previous set of example shots.
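For readers who want to see why sudden mip-map transitions show up as bands, here is a minimal sketch of the general idea. It assumes the textbook definitions: trilinear filtering blends the two nearest mip levels according to the fractional level of detail, while mip selection without that blend snaps to a single level. The `mip_weights` function and its parameters are hypothetical. This is not NVIDIA's driver logic, which we can only judge by its output.

```python
import math


def mip_weights(lod, trilinear=True, max_level=8):
    """Return [(mip_level, weight), ...] for a given level-of-detail value.

    lod is a continuous value: 0.0 is the full-size texture, and each whole
    step is the next smaller mip map. max_level is an assumed chain length.
    """
    lod = min(max(lod, 0.0), float(max_level))
    if not trilinear:
        # Snap to the nearest level: the choice flips abruptly at x.5.
        return [(int(math.floor(lod + 0.5)), 1.0)]
    lower = int(math.floor(lod))
    frac = lod - lower
    if frac == 0.0:
        return [(lower, 1.0)]
    # Blend the two neighboring levels so the change is gradual.
    return [(lower, 1.0 - frac), (lower + 1, frac)]


if __name__ == "__main__":
    # Walk away from the viewer: lod rises steadily with distance.
    for lod in (1.25, 1.49, 1.51, 1.75):
        print(lod, mip_weights(lod), mip_weights(lod, trilinear=False))
```

With the blend enabled, the weights shift smoothly as the level of detail rises with distance. Without it, the selected level jumps from one mip map to the next between 1.49 and 1.51, and a jump like that is the kind of hard edge that reads as a band on a long floor surface.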
NVIDIA’s “Performance” mode is an interesting beast. Perhaps some folks will find this combination of filtering methods useful if they have GeForce FX 5200 and 5600 cards, but I have a hard time imagining why a GeForce FX 5800 Ultra owner would choose to compromise his card’s image quality in this way.
You’ve probably seen enough of the GeForce FX 5800 Ultra now to draw some of your own conclusions. For me, wrapping up this review is a little odd, because it feels more like a post-mortem than a new product review. NVIDIA won’t confirm it, but the near-universally-accepted rumor is that very few NV30-based products, either the GeForce FX 5800 Ultra or the plain-jane GeForce FX 5800, will ever see store shelves in North America. NVIDIA’s add-in board partners haven’t been receiving supplies of NV30-based cards in any kind of volume, and none I’ve talked to are optimistic about ever receiving them.
Meanwhile, rumors are flying that the follow-on to NV30, the NV35, is up and running in NVIDIA’s labs and coming to market fairly soon. NVIDIA has formed a new manufacturing partnership with IBM, whose semiconductor fabrication technology is top-notch. The NV35 is expected to outperform the current Radeon 9800 Pro without the need for a Dustbuster cooling system.
But I do have a few definite opinions about the GeForce FX 5800 Ultra before it rides off to take its place alongside the 3dfx Voodoo 5 6000 in the Museum of Large Video Cards That Didn’t Quite Make It. This would have been a great product had it arrived six months earlier at this same clock speed with lower heat levels, a more reasonable cooler, and lower prices. As it stands, the GeForce FX 5800 Ultra is not a good deal, and I wouldn’t recommend buying one. Yes, it’s very fast, especially in current games. It’s also loud, expensive, and did I mention loud?
Go get a Radeon 9800 Pro if you want a high-end graphics card. As time passes, I continue to marvel at how much ATI got right with the R300 chip and its derivatives. It simply has very few weaknesses. The technology is sound, and the choices they made in building the product were very smart.
Also, I’m not compelled by NVIDIA’s talk of “arrays of computational units” and the like to describe the NV30. I’m not convinced they’ve changed their way of designing GPUs; I think they’ve mainly just changed their way of talking about them. In doing so, NVIDIA has managed to obscure some basic information about the chip. The NV30 doesn’t offer any uniquely compelling technological advantages over the R300-series GPUs, judging by what we’ve seen, and I’m still waiting to find out how the NV30’s purportedly different design offers capabilities or flexibility unmatched by its competitors.
The NV30’s basic technology will live on across NVIDIA’s product line, from the upcoming NV35 to the dirt-cheap GeForce FX 5200, all of which will have DirectX 9-class rendering abilities. As a foundation for NVIDIA’s lineup, the NV30 isn’t bad. We’re still waiting to test truly intensive next-generation software written with high-level shading languages to see how the NV30 handles those. Indications are, though, that the ATI chips have more powerful pixel and vertex shaders, not just on a clock-for-clock basis, but even with a pronounced clock-speed disadvantage. Whether NVIDIA can compete well with NV30-derived technology will depend on how good those particular implementations (NV31, NV34, and NV35) truly are. NV30 technology will give those chips rough feature parity with ATI, but performance is the question.