Even so, the GeForce FX series was relatively slow at pixel shader programs and was behind the curve in some respects. ATI’s R3x0-series chips had more pixel pipelines, better antialiasing, and didn’t require as much tuning in order to achieve optimal performance. More importantly, NVIDIA itself had lost some of its luster as the would-be Intel of the graphics world. The company clearly didn’t enjoy being in second place, and sometimes became evasive or combative about its technology and the issues surrounding it.
But as of today, that’s all ancient history. NVIDIA is back with a new chip, the NV40, produced by a new, crystal-clear set of design principles. The first NV40-based product is the GeForce 6800 Ultra. I’ve been playing with one for the past few days here in Damage Labs, and I’m pleased to report that it’s really, really good. For a better understanding of how and why, let’s look at some of the basic design principles that guided NV40 development.
- Massive parallelism: Processing graphics is about drawing pixels, an inherently parallelizable task. The NV40 has sixteen real, honest-to-goodness pixel pipelines, with “no funny business,” as the company put it in one briefing. By contrast, NV30 and its high-end derivatives had a four-pipe design with two texture units per pipe that could, in special cases involving Z and stencil operations, process eight pixels per clock. The NV40 has sixteen pixel pipes with one texture unit per pipe, and in special cases, it can produce 32 pixels per clock. To feed these pipes, the NV40 has six vertex shader units, as well.
All told, NV40 weighs in at 222 million transistors, roughly double the count of an ATI Radeon 9800 GPU and well more than even the largest desktop microprocessor. To give you some context, the most complex desktop CPU is Intel’s Pentium 4 Prescott at “only” 125 million transistors. Somewhat surprisingly, the NV40 chip is fabricated by IBM on a 0.13-micron fabrication process, not by traditional NVIDIA partner TSMC.
By going with a 0.13-micron fab process and sixteen pipes, NVIDIA is obviously banking on its chip architecture, not advances in manufacturing techniques and higher clock speeds, to provide next-generation performance.
- Scalability: With sixteen parallel pixel pipes comes scalability, and NVIDIA intends to exploit this characteristic of NV40 by developing a top-to-bottom lineup of products derived from this high-end GPU. They will all share the same features and differ primarily in performance. You can guess how: the lower end products will have fewer pixel pipes and fewer vertex shader units.
Contrast that plan with the reality of NV3x, which NVIDIA admits was difficult to scale from top to bottom. The high-end GeForce FX chips had four pixel pipes with two texture units each (a 4×2 design), while the mid-range chips were a 4×1 design. Even more oddly, the low-end GeForce FX 5200 was rumored to be an amalgamation of NV3x pixel shaders and fixed-function GeForce2-class technology.
NVIDIA has disavowed the “cascading architectures” approach where older technology generations trickle down to fill the lower rungs of the product line. Developers should soon be able to write applications and games with confidence that the latest features will be supported, in a meaningful way, with decent performance, on a low-end video card.
- More general computational power: The NV40 is a more capable general-purpose computing engine than any graphics chip that came before it. The chip supports pixel shader and vertex shader versions 3.0, as defined in Microsoft’s DirectX 9 spec, with support for long instruction programs, looping, branching, and dynamic flow control. Also, NV40 can process data internally with 32 bits of floating-point precision per color channel (red, green, blue, and alpha) with no performance penalty. Combined with the other features of 3.0 shaders, this additional precision should allow developers to employ more advanced rendering techniques with fewer compromises and workarounds.
- More performance per unit of transistors: Although GPUs are gaining more general programmability, this trend stands in tension with the usual mission of graphics chips, which has been to accelerate graphics functions through custom logic. NVIDIA has attempted to strike a better balance in NV40 between general computing power and custom graphics logic, with the aim of achieving more efficiency and higher overall performance. As a result, NV40’s various functional units are quite flexible, but judiciously include logic to accelerate common graphics functions.
By following these principles, NVIDIA has produced a chip with much higher performance limits than the previous generation of products. Compared to the GeForce FX 5950 Ultra, NVIDIA says the NV40 has two times the geometry processing power, four to eight times the 32-bit floating-point pixel shading power, and four times the occlusion culling performance. The company modestly says this is the biggest single performance leap between product generations in its history. For those of us who are old enough to remember the jump from the Riva 128 to the TNT, or even from the GeForce3 to the GeForce4 Ti, that’s quite a claim to be making. Let’s see if they can back it up.
The GeForce 6800 Ultra reference card is an AGP-native design; there is no bridge chip. Rumor has it a PCI Express-native version of NV40 will be coming when PCI-E motherboards arrive, but this first spin of the chip will fit into current mobos without any extra help.
This card is also quite a sight to behold, with yet another custom NVIDIA cooler onboard, and two Molex connectors for auxiliary power.
NVIDIA recommends a power supply rated to at least 480W for the GeForce 6800 Ultra, and the two Molex connectors should come from different rails on the power supply, not just a Y cable. For most of us, buying and installing a GeForce 6800 Ultra will require buying and installing a beefy new power supply, as well. (Not that it won’t be worth it.)
Clock speeds for the GeForce 6800 Ultra are 400MHz for the GPU and 550MHz for the 256MB of onboard DDR3 memory (or 1.1GHz once you factor in the double data rate thing.) For the privilege of owning this impressive piece of technology, you can expect to pay about $499.
Although the cooler design on our GeForce 6800 Ultra reference card is a dual-slot affair (that is, it hangs out over the PCI slot adjacent to the AGP slot), NVIDIA does have a single-slot cooler design that it will make available to its board partners. Here’s a picture the company provided:
Expect most GeForce 6800 Ultra boards to occupy two slots, at least initially. However, judging by how much it burns my hand when I touch it, the 6800 Ultra runs quite a bit cooler than the downright sizzlin’ GeForce FX 5950 Ultra. I wouldn’t be shocked to see single-slot designs come back into favor amongst NVIDIA card makers. Then again, the dual-slot cooler runs nice and quiet, and card makers may not want to sacrifice peace for an extra PCI slot.
There will be a non-Ultra version of the GeForce 6800 available before long at $299. That product will be a single slot design with a single Molex connector. However, it will have only twelve pixel pipes, and its clock speeds haven’t been determined yet.
NVIDIA’s previous generation of cards presented a couple of intriguing problems for benchmarking. Most prominent among those, perhaps, was NVIDIA’s use of an “optimized” method of trilinear filtering, a common texture filtering technique. This “optimized” mode reduces image quality for the sake of additional performance, and over time, it has earned the nickname “brilinear” filtering, because it seems to be a halfway version of real trilinear, with some proximity to bilinear filtering only. Through the course of multiple driver revisions, NVIDIA introduced this new technique, pledged to make it a driver checkbox option, made it a driver checkbox option, turned it on selectively in spite of the driver checkbox setting, and removed the checkbox from the driver.
All of this drama has presented a conundrum for reviewers, because the change really does influence the card’s visual output, if only slightly, and its performance, mildly but perhaps a little less slightly. For many people, some of this stuff must sound like arguing over how many angels can fit on the head of a pin, but we do try to perform apples-to-apples comparisons whenever possible. Fortunately, NVIDIA has provided a checkbox in its NV40 drivers that disables “trilinear optimizations.”
I will show you the visual and performance differences between “brilinear” and trilinear filtering in the course of this review, so you can see what you think of it. For the sake of fairness, we disabled NVIDIA’s trilinear optimizations on NV40 for the bulk of our comparative benchmark testing. ATI’s Radeon 9800 XT produces very similar image output to the GeForce 6800 Ultra with trilinear optimizations disabled, as you will see.
Unfortunately, NVIDIA’s 60.72 drivers do not provide the option of disabling trilinear optimizations on the GeForce FX 5950 Ultra, so we were unable to test NVIDIA’s previous generation card at the same image quality as its replacement and its primary competitor.
NVIDIA was also kind enough to expose a setting in its driver that allows the user to disable its adaptive anisotropic filtering optimizations by choosing “High quality” image settings. In this case, both ATI and NVIDIA use adaptive aniso, so disabling this optimization wouldn’t really be fair. Also, I spent some time trying to find visual differences between NVIDIA’s adaptive aniso and non-adaptive aniso (using different angles of inclination, looking for mip-map level of detail changes, doing mathematical “diff” operations between screenshots) and frankly, I didn’t find much of anything. I did benchmark the two modes, as you’ll see in our texture filtering section, but I could find no reason not to leave NVIDIA’s adaptive aniso turned on during our tests.
In the end, none of these settings impact performance more than a few percentage points, and the GeForce 6800 Ultra doesn’t need to worry about its handicap versus the previous generation of cards.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test system was configured like so:
|System||MSI K8T Neo|
|Processor||AMD Athlon 64 3400+ 2.2GHz|
|Chipset drivers||4-in-1 v.4.51|
|Memory size||1GB (2 DIMMs)|
|Memory type||Corsair TwinX XMS3200LL DDR SDRAM at 400MHz|
|Hard drive||Seagate Barracuda V 120GB SATA 150|
|Audio||Creative SoundBlaster Live!|
|OS||Microsoft Windows XP Professional|
|OS updates||Service Pack 1, DirectX 9.0b|
We used ATI’s CATALYST 4.4 drivers on the Radeon card and ForceWare 60.72 beta 2 on the GeForce cards. One exception: at the request of FutureMark, we used NVIDIA’s 52.16 drivers for all 3DMark benchmarking and image quality tests on the GeForce FX 5950 Ultra.
The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- FutureMark 3DMark03 Build 340
- Far Cry 1.1 with trdemo1.
- Comanche 4 demo
- Quake III Arena v1.31 with trdemo1.dm_67
- Wolfenstein: Enemy Territory with demo0000.dm_82
- Serious Sam SE v1.07 with Demo0003
- Unreal Tournament 2004 with trdemo1.demo4
- Splinter Cell v1.2 with TRKalinatekDemo.bin
- ShaderMark 2.0 build 1e
- rthdribl 1.2
- D3D FSAA Viewer 4
- D3D RightMark beta 4
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Pixel filling power
Let’s kick off this party with a look at fill rate, that ancient graphics standard of pixel-pushing power. The GeForce 6800 Ultra’s theoreticals for fill rate are staggering. Here are some common high-end cards from the past year or so for comparison, with the 6800 Ultra at the bottom of the table.
|Card||Core clock (MHz)||Pixel pipelines||Peak fill rate (Mpixels/s)||Texture units per pixel pipeline||Peak fill rate (Mtexels/s)||Memory clock (MHz)||Memory bus width (bits)||Peak memory bandwidth (GB/s)|
|GeForce FX 5800 Ultra||500||4||2000||2||4000||1000||128||16.0|
|Radeon 9700 Pro||325||8||2600||1||2600||620||256||19.8|
|Radeon 9800 Pro||380||8||3040||1||3040||680||256||21.8|
|Radeon 9800 Pro 256MB||380||8||3040||1||3040||700||256||22.4|
|Radeon 9800 XT||412||8||3296||1||3296||730||256||23.4|
|GeForce FX 5900 Ultra||450||4||1800||2||3600||850||256||27.2|
|GeForce FX 5950 Ultra||475||4||1900||2||3800||950||256||30.4|
|GeForce 6800 Ultra||400||16||6400||1||6400||1100||256||35.2|
Holy Moses that’s a lotta oomph! Thanks to its sixteen pixel pipes, the GeForce 6800 Ultra nearly doubles the theoretical peak pixel and textured pixel (texel) fill rates of the Radeon 9800 XT. Of course, memory bandwidth often limits actual fill rates, and any chip as powerful as the GeForce 6800 Ultra is likely to be limited by today’s memory technology. However, with 35.2GB/s of peak memory bandwidth thanks to a 256-bit path to its DDR3 memory, the GeForce 6800 Ultra isn’t exactly deprived.
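For the arithmetically inclined, the peak figures in the table fall out of some simple multiplication. Here’s a minimal Python sketch; the function names are made up for illustration, and the memory clock is assumed to be the effective (double data rate) figure listed in the table.

```python
def peak_fill_rate_mpixels(core_mhz, pixel_pipes):
    """Peak pixel fill rate in Mpixels/s: one pixel per pipe per clock."""
    return core_mhz * pixel_pipes


def peak_texel_rate_mtexels(core_mhz, pixel_pipes, tex_units_per_pipe):
    """Peak texel fill rate in Mtexels/s: one texel per texture unit per clock."""
    return core_mhz * pixel_pipes * tex_units_per_pipe


def peak_bandwidth_gbs(effective_mem_mhz, bus_width_bits):
    """Peak memory bandwidth in GB/s, counting 1 GB as 1000 MB like the table."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mem_mhz * bytes_per_transfer / 1000


# GeForce 6800 Ultra: 400MHz core, 16 pipes, 1 texture unit per pipe,
# 1100MHz effective memory clock on a 256-bit bus.
print(peak_fill_rate_mpixels(400, 16))      # 6400
print(peak_texel_rate_mtexels(400, 16, 1))  # 6400
print(peak_bandwidth_gbs(1100, 256))        # 35.2
```

Plug in the Radeon 9800 XT’s 412MHz and eight pipes and you get 3296 Mpixels/s, which puts the 6800 Ultra’s advantage at about 1.94X.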
Let’s put the theory to test with some synthetic fill rate benchmarks.
The theory works out pretty well in this test. With only a single texture, the NV40 can only reach half its theoretical peak, but with multiple textures per pixel, the GPU gets very near to its theoretical peak, and in doing so, it absolutely trounces the Radeon 9800 XT and GeForce FX 5950 Ultra.
For a different look at fill rate, we also tested with D3D RightMark, which lets us see performance with up to eight textures applied.
The GeForce 6800 Ultra again outclasses the other guys by miles, especially in the more common cases of one, two, and four textures per pixel. Because it is an 8×1 design, the Radeon 9800 XT sees its performance decay much like the 16×1 NV40, while the 4×2 GeForce FX produces a zig-zag pattern, performing much better with even numbers of textures per pixel.
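Why would a 4×2 design prefer even texture counts? Here’s a rough Python sketch of one way to think about it. It rests on a big assumption: that a pipe simply needs one clock per group of textures its units can apply at once, and that nothing else (memory bandwidth, caches, drivers) gets in the way. Real hardware is messier, and the function names are purely illustrative.

```python
import math


def pixel_rate_mpixels(core_mhz, pipes, tex_units_per_pipe, textures):
    """Idealized pixel rate when every pixel needs `textures` texture layers.

    Assumes a pipe spends ceil(textures / tex_units_per_pipe) clocks per pixel.
    """
    clocks_per_pixel = math.ceil(textures / tex_units_per_pipe)
    return core_mhz * pipes / clocks_per_pixel


def texture_unit_utilization(tex_units_per_pipe, textures):
    """Fraction of texture-unit slots doing useful work under the same model."""
    clocks_per_pixel = math.ceil(textures / tex_units_per_pipe)
    return textures / (clocks_per_pixel * tex_units_per_pipe)


for n in range(1, 9):
    fx_rate = pixel_rate_mpixels(475, 4, 2, n)     # GeForce FX 5950 Ultra, 4x2
    nv40_rate = pixel_rate_mpixels(400, 16, 1, n)  # GeForce 6800 Ultra, 16x1
    print(n, round(fx_rate), texture_unit_utilization(2, n), round(nv40_rate))
```

In this model, the x1 designs lose pixel rate steadily with each added texture, while the 4×2 design leaves a texture unit idle whenever the count is odd, so its utilization bounces between partial and full.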
Now, on to some real games.
We’ve presented our benchmark results in line graphs, and we’ve elected to test a full range of resolutions both with and without edge antialiasing and anisotropic texture filtering. That may seem deeply dorky to you as you look at some of these results, because in older games, none of these cards is straining at all at 1600×1200 without AA and aniso. However, I happen to think that format is illuminating, so bear with me. Sometimes, every once in a while, performance in a game is limited by something other than fill rate, and tests at lower resolutions will show that.
Unreal Tournament 2004
UT2004 seems to be limited to about 70 frames per second no matter what, but once we reach higher resolutions, even without AA and aniso, the gap between the GeForce 6800 Ultra and the older cards begins to become apparent. As the fill rate requirements grow, the NV40 pulls further and further away.
The above numbers were taken with the game’s default settings. To better test these high-end cards, I also tried UT2004 with all the visual quality settings maxed out.
The UT announcer does a lot of cursing, but these cards don’t seem to mind the highest settings one bit. Frame rates are barely affected.
Quake III Arena
Q3A isn’t shy about reaching much higher framerates at low res than UT2004, and once we turn up the resolution and the eye candy, the GeForce 6800 Ultra starts to look like a monster.
Like UT2004, Comanche 4 can’t seem to break 70 frames per second no matter what. This older DirectX 8-class game doesn’t slow down at all at 1600×1200, but with antialiasing, the GeForce 6800 Ultra again pulls far ahead of the Radeon 9800 XT and GeForce FX 5950 Ultra.
Wolfenstein: Enemy Territory
Wolf: ET is based on the Quake III engine, but it appears to use much larger textures and perhaps more polygons than Q3A. Again, the GeForce 6800 Ultra delivers a stompin’.
The Radeon 9800 XT really seems to struggle here, more so than it did with older driver revisions. I’m not quite sure why.
Far Cry has easily the most advanced engine of any game on store shelves today, with extensive shadowing, lush vegetation, real DirectX 9-class lighting, pixel shader water effects, the works. Of our game benchmarks, this may be the most representative of future games.
However, Far Cry benchmarking is a little dicey, because the game’s demo recording feature doesn’t record things like environmental interactions with perfect accuracy. I was able to manage a fairly repeatable benchmarking scenario by recording the opening stages of the game and using the save/load game feature. The player walks through some tunnels, comes out into the open, and walks down to the beach. There’s no real interaction with the bad guys, which makes the sequence repeatable. Far Cry’s eye candy is still on prominent display. I used FRAPS to capture frame rates during playback.
Rather than test both with and without forced AA and aniso, I used the in-game settings. For this test, I cranked up the machine spec setting and all the advanced visual settings to “Very High” with the aniso setting maxed out at 4. These settings are significantly more demanding than the game’s defaults, but I really wanted to push these cards for once.
The 5950 trails well behind the Radeon 9800 XT in this DX9 game, as we’ve long feared might be the case. However, the GeForce 6800 Ultra has no trouble outpacing the 9800 XT.
AquaMark appears to be vertex shader limited at lower resolutions, because the GeForce 6800 Ultra outruns the older cards by a fair margin, even at 640×480. The situation doesn’t change much as resolutions increase, either. The NV40 just can’t be denied.
Splinter Cell is fill rate limited on the Radeon and GeForce FX, but clearly not on the GeForce 6800 Ultra, even at 1600×1200. Let’s look at frame rates over time, so we can see where the peaks and valleys are during gameplay.
At higher resolutions, the other cards’ peaks are looking a lot like the GeForce 6800 Ultra’s valleys. Not only are frame rate averages up with NV40; frame rates are up across the board.
Serious Sam SE
Serious Sam can run in either Direct3D or OpenGL. Since the game’s default mode is OpenGL, we ran our tests with that API. To keep things on an even playing field, we used the “Default settings” add-on to defeat Serious Sam SE’s graphics auto-tuning features.
The GeForce FX and Radeon 9800 XT are pretty evenly matched, but the GeForce 6800 Ultra barely breaks a sweat.
With AA and aniso on, the GeForce 6800 Ultra’s frame rate doesn’t dip below 80 FPS at 1600×1200. The Radeon 9800 XT doesn’t once reach 60 FPS.
At FutureMark’s request, we are using NVIDIA’s 52.16 drivers for the GeForce FX 5950 in this test. FutureMark says newer NVIDIA drivers have not been validated for use with 3DMark03. Unfortunately, we have no way of using a validated driver with the GeForce 6800 Ultra, and FutureMark has given us no choice but to report the results with the 60.72 drivers here. Make of these results what you will.
In 3DMark03’s overall composite score and in each of the component tests, the GeForce 6800 Ultra positively dominates, nearly doubling the Radeon 9800 XT’s overall score at 1024×768. You may have heard rumors that the NV40 could hit over 12,000 in 3DMark03. I’m sure that’s quite true. Our Athlon 64 3400+ test rig isn’t the fastest available platform for 3DMark03, not even close. Coupled with a Pentium 4 Extreme Edition at 3.4GHz, the GeForce 6800 Ultra might top 13K in 3DMark03.
The synthetic pixel and vertex shader tests confirm the GeForce 6800 Ultra’s proficiency. The vertex shader scores appear to support NVIDIA’s claim that NV40 has twice the vertex shader power of NV38.
3DMark image quality
The Mother Nature scene from 3DMark has been the source of some controversy over time, so I wanted to include some screenshots to show how the three cards compare. On this page and in all the following pages with screenshots, you’re looking at low-compression JPEG images. You can click on the image to open a new window with a lossless PNG version of the image.
The results look very similar between the three cards, at least to my eye.
ShaderMark is intended to test pixel shader performance with DirectX 9-class pixel shaders. Specifically, ShaderMark 2.0 is geared toward pixel shader revisions 2.0 and 2.0a. (Version 2.0a or “2.0+” uses longer programs and flow control.) ShaderMark also has the ability to use a “partial precision” hint on NVIDIA hardware to request 16-bit floating point mode. Otherwise, the test uses 32 bits of precision on NVIDIA cards and, no matter what, 24 bits per color channel on the Radeon 9800 XT due to that chip’s pixel shader precision limits.
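To put those bit counts in perspective, here’s a small Python sketch that compares how finely each format can resolve values near 1.0. It assumes the commonly documented layouts: 10 stored mantissa bits for FP16, 16 for ATI’s FP24, and 23 for FP32. The helper names are made up for this example, and the rounding is a simplification that only holds for values between 1 and 2.

```python
def relative_step(mantissa_bits):
    """Spacing between adjacent representable values just above 1.0."""
    return 2.0 ** -mantissa_bits


def quantize(value, mantissa_bits):
    """Round a value in [1, 2) to the given number of mantissa bits."""
    scale = 2 ** mantissa_bits
    return round(value * scale) / scale


# Assumed stored mantissa widths for each pixel shader precision mode.
FORMATS = {"FP16": 10, "FP24": 16, "FP32": 23}

for name, bits in FORMATS.items():
    stored = quantize(1.2345678, bits)
    print(name, relative_step(bits), stored, abs(stored - 1.2345678))
```

Each step up in format shrinks the rounding error by a few orders of magnitude, which matters most when a shader chains many operations together.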
Unfortunately, even on NV40, some of ShaderMark’s shaders won’t run right. We found this to be the case with the DeltaChrome S8 Nitro, as well, so I suspect the problem has to do with the program looking for ATI-specific capabilities of some sort.
I’d say NVIDIA has put its pixel shader performance problems behind it. The performance delta between the GeForce FX and the GeForce 6800 series is massive. The Radeon 9800 XT is about half the speed, with 24 bits of precision, of the GeForce 6800 Ultra with 32 bits. And in this context, the NV40’s ability to use 16-bit precision looks like another big advantage, because 16 bits is all that’s necessary to render any of these shaders without visible artifacts.
ShaderMark image quality
One of the quirks of running ShaderMark on the GeForce 6800 Ultra with 32-bit precision was a texture placement problem with the background texture (not the pixel shaders or the orb thingy, just the background.) The problem didn’t show up in “partial precision” mode, but it did in FP32 mode. Since the changing background texture is distracting, and since 16 bits per color channel is more than adequate for these pixel shaders, I’ve chosen to use the “partial precision” images from the GeForce 6800 Ultra. Also, both modes showed an apparent gamma problem with the pixel shaded objects on the NV40. They’re very bright. There’s nothing I could do about that.
The images shown are the GeForce 6800 Ultra screenshots, until you run your mouse over them, at which point the Radeon-generated images will appear.
Per pixel diffuse lighting (move mouse over the image to see the Radeon 9800 XT’s output)
Point phong lighting
Spot phong lighting
Directional anisotropic lighting
Bump mapping with phong lighting
Self shadowing bump mapping with phong lighting
Procedural stone shader
Procedural wood shader
Aside from the gamma differences, the output is nearly identical.
To measure texture filtering performance, we used Serious Sam SE running at 1600×1200 resolution in Direct3D mode. Note that I’ve tested the GeForce 6800 Ultra with several different settings. The “base” GeForce 6800 Ultra setting is with trilinear optimizations disabled and adaptive anisotropic filtering enabled. I also tested with “brilinear” filtering (or trilinear optimizations) enabled. Finally, the “High Quality” setting disabled adaptive anisotropic filtering, forcing full aniso filtering on all surfaces.
Well, it really doesn’t matter which optimizations are disabled. The GeForce 6800 Ultra is mind-numbingly fast regardless. However, you can see that fudging trilinear with the “brilinear” optimization really does make a difference in performance. Adaptive aniso’s performance impact is much smaller, as is its impact on image quality.
Texture filtering quality
Here’s a sample scene from Serious Sam, grabbed in Direct3D mode, that shows texture filtering in action along the wall, the floor, and on that 45-degree inclined surface between the two.
The difference between “brilinear” filtering and true trilinear is difficult to detect with a static screenshot and the naked eye. At least, it is for me. Remember that the GeForce FX 5950 is doing “brilinear” filtering in all cases.
Once we dye the mip maps different colors using Serious Sam’s built-in developer tools, we can see the difference between the filtering methods more clearly.
Here’s NVIDIA’s trilinear optimization at work. Mip-map boundary transitions aren’t as smooth as they are on the Radeon 9800 XT and on the GeForce 6800 Ultra with “brilinear” disabled.
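To make the idea concrete, here’s a hedged Python sketch of how a reduced trilinear blend could work. True trilinear blends two adjacent mip levels using the fractional part of the level of detail (LOD). The `brilinear_weight` function below is a hypothetical model, not NVIDIA’s actual algorithm, which hasn’t been spelled out: it blends only inside a band around the midpoint between two levels and samples a single level (plain bilinear) everywhere else. The `band` parameter is an invented knob.

```python
def trilinear_weight(lod):
    """Blend weight toward the next-smaller mip level: the fractional LOD."""
    return lod - int(lod)


def brilinear_weight(lod, band=0.5):
    """Hypothetical reduced-blend weight for a non-negative LOD.

    Blends only within `band` of the LOD range, centered on the midpoint
    between two mip levels. Outside the band the weight snaps to 0 or 1,
    so only one mip level needs to be read. band=1.0 gives full trilinear.
    """
    frac = lod - int(lod)
    start = 0.5 - band / 2
    t = (frac - start) / band
    return min(1.0, max(0.0, t))


for lod in (2.0, 2.1, 2.25, 2.5, 2.75, 2.9):
    print(lod, round(trilinear_weight(lod), 2), round(brilinear_weight(lod), 2))
```

A weight of exactly 0 or 1 means one fewer mip level to fetch, which is where the speedup would come from. The steeper ramp in between is the sort of thing that would show up as sharper transitions in dyed mip maps.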
Anisotropic texture filtering quality
At long last, NVIDIA cards are able to do 16X anisotropic filtering plus trilinear. Again, not that we can tell the difference much, at least in this example. All of the cards look great.
With our mip maps dyed various colors, the trilinear optimizations become very easy to see here. Notice how much more aggressive the filtering is on the inclined surface between the floor and the wall in the bottom two shots.
You’ve already seen the GeForce 6800 Ultra put up some amazing performance numbers with 4X antialiasing and 8X anisotropic filtering. Let’s have a look at how it scales across the various AA modes.
Not bad, although that 8X AA mode is a killer.
Like ATI, NVIDIA uses multisampling for antialiasing, a technique that avoids unnecessary texture reads on subpixels (or fragments) not on the edge of an object boundary. This is a pretty efficient way to do edge AA, but it does produce some artifacts. Fortunately, NVIDIA has improved the NV40’s antialiasing over the NV30 series in several key ways, one of which is support for “centroid sampling,” which avoids retrieving incorrect texture samples and prevents one of the artifacts associated with multisampling. There was a brouhaha over centroid sampling in relation to Half-Life 2 not long ago, because ATI’s R300 series supports centroid sampling and NV3x requires a workaround. Now, when applications request it, they can get centroid sampling from the NV40.
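Here’s a simplified Python sketch of what centroid sampling is getting at. The sample positions are illustrative rather than actual hardware values, the “triangle” is just the half-plane to the left of a vertical edge, and the function names are invented for the example. When a triangle covers some of a pixel’s samples but not the pixel center, center-based shading evaluates texture coordinates at a point outside the triangle; centroid sampling moves that point inside the covered area.

```python
# Illustrative 4-sample positions inside a unit pixel (not real hardware values).
SAMPLES = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]
PIXEL_CENTER = (0.5, 0.5)


def inside(point, edge_x):
    """Model the triangle as the half-plane x < edge_x."""
    return point[0] < edge_x


def shading_position(edge_x, centroid=True):
    """Return where texture coordinates are evaluated for this pixel."""
    covered = [s for s in SAMPLES if inside(s, edge_x)]
    if not covered:
        return None  # the triangle doesn't touch any sample in this pixel
    if not centroid or inside(PIXEL_CENTER, edge_x):
        return PIXEL_CENTER
    # Center is outside the triangle: use the centroid of the covered samples.
    cx = sum(x for x, _ in covered) / len(covered)
    cy = sum(y for _, y in covered) / len(covered)
    return (cx, cy)


print(shading_position(0.45, centroid=False))  # (0.5, 0.5), outside the triangle
print(shading_position(0.45, centroid=True))   # (0.25, 0.375), inside it
```

Sampling outside the triangle is what can pull in texels from the wrong part of a texture, which is the multisampling artifact centroid sampling is meant to avoid.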
Another improvement to NV40’s antialiasing is a rotated grid pattern for 4X AA. I’m not sure why, but the GeForce FX series used a sampling grid aligned with the screen. Rotated grid patterns have been around since at least the 3dfx Voodoo 5 series, and they have the advantage of handling near-vertical and near-horizontal edges better. Rotated grids also help throw off the eye’s pattern recognition instincts when a scene is in motion, arguably producing superior results. The holy grail in AA is a truly random sampling pattern from pixel to pixel, but a rotated grid is a nice first step.
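A quick way to see the advantage is to count how many distinct coverage values a pixel can produce as an axis-aligned edge sweeps across it. The Python sketch below uses textbook-style sample positions for an ordered grid and a rotated grid; they’re assumptions chosen for illustration, not the exact positions any of these chips use.

```python
# Illustrative 4X sample positions inside a unit pixel.
ORDERED_GRID = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage_levels(samples, axis):
    """Distinct coverage fractions as an axis-aligned edge sweeps the pixel.

    axis=0 models a vertical edge moving left to right,
    axis=1 a horizontal edge moving top to bottom.
    """
    n = len(samples)
    levels = {0.0}
    for threshold in sorted({s[axis] for s in samples}):
        covered = sum(1 for s in samples if s[axis] <= threshold)
        levels.add(covered / n)
    return sorted(levels)


print(coverage_levels(ORDERED_GRID, 0))  # [0.0, 0.5, 1.0]
print(coverage_levels(ROTATED_GRID, 0))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With the ordered grid, two samples share each column and row, so a near-vertical or near-horizontal edge gets only one intermediate shade. The rotated grid gives each sample its own column and row, so the same edge gets three.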
I used a cool little antialiasing pattern viewer to show each card’s AA patterns at different sample rates. The red dots are the geometry sample points, and the green dots are texture sample points. You can see below how the GeForce 6800 Ultra has a rotated grid pattern at 4X AA, while the GeForce FX does not.
The GeForce FX does have rotated grid multisampling at 2X and 8X AA, but not at 4X. The NV40 corrects that oversight. Notice, also, how NVIDIA’s 8X modes appear to do texture sampling closer to the geometry sample points. If this is a correct representation, NVIDIA’s method may eliminate some of the artifacts associated with multisampling without resorting to true centroid sampling.
Also, check out that funky sample pattern for the Radeon 9800 XT at 6X AA. You have got to like that.
We’ll start off with non-AA images, just to establish a baseline.
2X AA shows little difference between the three cards.
At 4X AA, the GeForce 6800 Ultra is clearly superior to the GeForce FX 5950 Ultra on near-vertical and near-horizontal edges, like the edges of the big bomber’s tail wings. To my eye, the Radeon 9800 XT and GeForce 6800 Ultra are pretty well matched. You may want to click through to look at the uncompressed PNG images and see for yourself. I think perhaps, just maybe, the Radeon 9800 XT does a better job of smoothing out jaggies. Just a little.
Notice a couple of clear differences here. Check out the tail section of the plane in the lower left corner of the screen. The Radeon 9800 XT’s 6X AA mode has done a better job smoothing out rough edges than the GeForce 6800 Ultra at 8X AA. That may be the result of ATI’s funky sampling pattern; I’m not sure. Then check out the dark emblem at the top of the tail fin. On the Radeon, it’s very blurry, while on the GeForce 6800 Ultra it’s quite defined. If you click back through and look at the pictures without AA, that marking looks more like it does in the Radeon 6X AA shot.
Interesting. Not sure what to make of that.
To illustrate the effects of antialiasing, I’ve run a “diff” operation between each card’s original output and 4X antialiased output.
The GeForce 6800 Ultra clearly “touches” more edge pixels than its predecessor. The edge outlines are more pronounced in the GeForce 6800 Ultra image. However, the Radeon 9800 XT touches more pixels than either, though not necessarily more edge pixels. The Radeon appears to modify more of the screen than the GeForce 6800 Ultra, which is probably less efficient.
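For the curious, a “diff” of this sort boils down to a per-pixel subtraction. Here’s a minimal Python sketch of the general idea, not the specific tool used for these images. It assumes each image is a list of rows of (r, g, b) tuples with 0-255 values, and the tiny test images are made up.

```python
def diff_image(a, b):
    """Per-pixel, per-channel absolute difference of two same-sized images."""
    return [
        [tuple(abs(ca - cb) for ca, cb in zip(pa, pb)) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def touched_pixels(diff, threshold=0):
    """Count pixels whose difference exceeds `threshold` in any channel."""
    return sum(1 for row in diff for px in row if max(px) > threshold)


# Made-up 2x2 images: antialiasing has softened one edge pixel.
no_aa = [[(0, 0, 0), (255, 255, 255)], [(0, 0, 0), (255, 255, 255)]]
aa_4x = [[(0, 0, 0), (191, 191, 191)], [(0, 0, 0), (255, 255, 255)]]

d = diff_image(no_aa, aa_4x)
print(d[0][1])            # (64, 64, 64)
print(touched_pixels(d))  # 1
```

Pixels that AA left alone come out black in the diff, so anything that lights up is a pixel the card modified.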
High dynamic range image-based lighting
The GeForce 6800 Ultra now supports the proper floating-point texture formats for my favorite DirectX 9 demo. Have a look at some screenshots to see how it looks.
The banding is obvious on the GeForce FX, but the GeForce 6800 Ultra runs this demo like a champ, faster and I think perhaps with discernibly better output in some cases thanks to its FP32 pixel shaders.
Here’s another example where the GeForce FX struggled.
The FX just couldn’t handle this lighting technique, at least as this program implemented it. The NV40 has no such problems.
In fact, I should mention in relation to high-dynamic-range lighting that the NV40 includes provisions to improve performance and image fidelity for HDR lighting techniques over what ATI’s current GPUs support. John Carmack noted one of the key limitations of first-gen DirectX 9 hardware, including R300, in his .plan file entry from January 2003:
The future is in floating point framebuffers. One of the most noticeable thing this will get you without fundamental algorithm changes is the ability to use a correct display gamma ramp without destroying the dark color precision. Unfortunately, using a floating point framebuffer on the current generation of cards is pretty difficult, because no blending operations are supported, and the primary thing we need to do is add light contributions together in the framebuffer. The workaround is to copy the part of the framebuffer you are going to reference to a texture, and have your fragment program explicitly add that texture, instead of having the separate blend unit do it. This is intrusive enough that I probably won’t hack up the current codebase, instead playing around on a forked version.
So in order to handle light properly, the cards had to use a pixel shader program, causing a fair amount of overhead. The NV40, on the other hand, can do full 16-bit floating-point blends in the framebuffer, making HDR lighting much more practical. Not only that, but NV40 uniquely supports 16-bit floating-point precision for texture filtering, including trilinear and anisotropic filtering up to 16X. I’d hoped to see something really eye-popping, like Debevec’s Fiat Lux running in real time using this technique, but no such luck yet. Perhaps someone will cook up a demo soon.
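Here’s a small Python sketch of why adding light contributions in a floating-point framebuffer matters. It compares an additive blend in an 8-bit integer buffer, which saturates at white, with the same blend rounded to IEEE half precision. The light values are made up, the function names are illustrative, and this models only the arithmetic, not how any GPU’s blend unit is built.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def blend_add_8bit(dest, src):
    """Additive blend in an 8-bit integer framebuffer: saturates at 255."""
    return min(255, dest + src)


def blend_add_fp16(dest, src):
    """Additive blend in an FP16 framebuffer: values above 1.0 survive."""
    return to_fp16(dest + src)


lights = [0.6, 0.7, 0.5]  # made-up light contributions, 1.0 = display white

ldr = 0
hdr = 0.0
for light in lights:
    ldr = blend_add_8bit(ldr, round(light * 255))
    hdr = blend_add_fp16(hdr, to_fp16(light))

print(ldr / 255)  # 1.0: everything past white has been thrown away
print(hdr)        # roughly 1.8: the overbright value is still there
```

With the integer buffer, the second light already pushes the pixel to white and the third adds nothing. The floating-point buffer keeps the full sum around, so a later tone mapping pass has real data to work with.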
So what happens when you plug a graphics chip with 222 million transistors running at 400MHz into your AGP slot? I got my trusty watt meter and took some readings to find out. I tested the whole system, as shown in our Testing Methods section, to see how many watts it pulled. The monitor was plugged into a separate power source. I took readings with the system idling at the Windows desktop and with it running the real-time HDR lighting demo at 1024×768 full-screen. For the “under load” readings, I let the demo run a good, long while before taking the reading, so the card had plenty of time to heat up and kick on its cooler if needed.
At idle, the NVIDIA chips do a nice job of cutting their power consumption. Under load, the GeForce 6800 Ultra can really suck up the juice. Surprisingly, though, it’s not much worse than the GeForce FX 5950 Ultra.
We’ve covered a lot of ground in this review, and I haven’t touched on so many things. NVIDIA was very open about sharing NV40 architectural details, but I haven’t discussed them in any depth. The chip has an all-new video processing unit, too, that deserves some attention, especially because it can encode and decode MPEG 2 and 4 video. There are big issues to address, such as DirectX shader model 3.0 and its differences from 2.0. (Thumbnail sketch: Pixel shader 3.0 will matter for performance more than anything else, because developers will not be writing shaders in assembly. Some elements of the PS 3.0 spec will enhance image quality by requiring FP32 precision, as well.) I’ve even left out some of NV40’s new 3D capabilities, like vertex texture fetch and displacement mapping.
There was simply no time for me to do all this testing and address all of these things in the past week or so. However, at the end of the day, we need to know more about what ATI is cooking up before we can put many of the NV40’s more advanced features into context. That day will come soon enough, and we’ll examine these issues in more detail when it does.
For now, I think we’ve established a few things about the NV40 with all of our testing. First and foremost among these is the fact that NVIDIA isn’t blowing smoke this time around. Many of our crazy tests in this mega-long review are aimed at exposing weaknesses or just verifying proper operation of the GPU, and the GeForce 6800 Ultra passed with flying colors. The NV40 is exceptionally good, with no notable weaknesses in performance or capability. NVIDIA has caught up to ATI’s seminal R300 chip in virtually every respect, while adding a host of new features that make NV40 a better graphics processor, including long shader programs, more mathematical precision, and floating-point framebuffer blends.
And it’s a freaking titan of graphics performance.
Not only is the sixteen-pipe GeForce 6800 Ultra fast, but it’s also rather efficient, extracting gobs more framebuffer bandwidth out of just a little more memory bandwidth, relatively speaking, than the GeForce FX 5950 Ultra.
All of this bodes well for the coming generation of graphics processors based on NV40, from the twelve-pipe GeForce 6800 on down the line. An eight-pipe version of this chip would make a dandy $199 card, and it would almost certainly give an eight-pipe R300 more than it could handle. NVIDIA has left ATI with no margin for error.