Fill rate
Even today, pixel-pushing power is still one of the key determinants of the performance of a graphics chip. Getting a handle on the GeForce FX 5800 Ultra's fill rate performance, however, is more than a little slippery. Let's start by looking at our chip table, and then I'll tell you why it's not exactly right.

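| Chip | Core clock (MHz) | Pixel pipelines | Peak fill rate (Mpixels/s) | Texture units per pixel pipeline | Peak fill rate (Mtexels/s) | Memory clock (MHz) | Memory bus width (bits) | Peak memory bandwidth (GB/s) |
|---|---|---|---|---|---|---|---|---|
| GeForce4 Ti 4600 | 300 | 4 | 1200 | 2 | 2400 | 650 | 128 | 10.4 |
| GeForce FX 5800 | 400 | 4 | 1600 | 2 | 3200 | 800 | 128 | 12.8 |
| GeForce FX 5800 Ultra | 500 | 4 | 2000 | 2 | 4000 | 1000 | 128 | 16.0 |
| Radeon 9700 | 275 | 8 | 2200 | 1 | 2200 | 540 | 256 | 17.3 |
| Parhelia-512 | 220 | 4 | 880 | 4 | 3520 | 550 | 256 | 17.6 |
| Radeon 9700 Pro | 325 | 8 | 2600 | 1 | 2600 | 620 | 256 | 19.8 |
| Radeon 9800 Pro | 380 | 8 | 3040 | 1 | 3040 | 680 | 256 | 21.8 |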

The GeForce FX 5800 Ultra runs at much higher clock rates than its competition, and its DDR-II memory does, too. NVIDIA chose a cutting-edge approach to developing the GeForce FX, relying on newer technologies and higher clock rates to deliver performance. The chip's 500MHz core clock speed gives it relatively high pixel and texel fill rates. The NV30 has four independent memory controllers in a crossbar arrangement, essentially the same layout as in the GeForce3 and GeForce4 Ti chips, except that the NV30 has been tweaked to support DDR-II-style signaling. Its memory bus is only 128 bits wide, but 1GHz DDR-II memory gives the GFFX 5800 Ultra memory throughput of 16GB/s.

ATI has taken a different approach with the Radeon 9700 and 9800 series, settling for lower clock rates but getting more work done each clock cycle. The high-end ATI chips have 256-bit-wide memory interfaces (with four 64-bit memory controllers), which give them more memory bandwidth than the GeForce FX 5800 cards, even with conventional DDR memory and the concomitant lower memory clock speeds.
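For anyone who wants to check the table's arithmetic, here is a minimal sketch of how those peak numbers fall out of clock speeds and unit counts. The function name `peak_rates` and the dictionary layout are my own; the inputs come from the chip table, and the memory clock is treated as the effective (DDR) data rate, which is what the table's bandwidth figures imply.

```python
def peak_rates(core_mhz, pipelines, tex_units_per_pipe, mem_mhz, bus_bits):
    """Return (Mpixels/s, Mtexels/s, GB/s) as theoretical peaks."""
    mpixels = core_mhz * pipelines              # one pixel per pipe per clock
    mtexels = mpixels * tex_units_per_pipe      # one texel per texture unit per clock
    # mem_mhz is the effective data rate; bus_bits / 8 is bytes per transfer.
    bandwidth_gbs = mem_mhz * bus_bits / 8 / 1000
    return mpixels, mtexels, bandwidth_gbs


# Figures taken from the chip table.
chips = {
    "GeForce FX 5800 Ultra": (500, 4, 2, 1000, 128),
    "Radeon 9700 Pro": (325, 8, 1, 620, 256),
}

for name, spec in chips.items():
    mpix, mtex, bw = peak_rates(*spec)
    print(f"{name}: {mpix} Mpixels/s, {mtex} Mtexels/s, {bw:.1f} GB/s")
```

These are paper numbers only. They say nothing about how close a given chip gets to its peak in a real application.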

Now, here's why the above isn't quite right. Our usual assumptions about graphics chip pipelines don't entirely apply to the GeForce FX. NVIDIA is very coy about exactly how the NV30 GPU looks inside. For quite a while, most of the world believed NV30 was an 8-pipeline design with one texture unit per pipe. Turns out that isn't so. Instead, the FX 5800 Ultra is... well, complicated. Lately, the company's representatives have taken to talking about arrays of functional units instead of pixel pipelines. It's sometimes hard to penetrate.

When asked, NVIDIA explains the NV30's capabilities like so:

It renders:

8 z pixels per clock
8 stencil ops per clock
8 textures per clock
8 shader ops per clock
4 color + z pixels per clock with 4x multisampling enabled

It is architected to perform those functions.

Basically, it's 8 pipes with the exception of color blenders for traditional ROP operations, for which it has hardware to do 4 pixels per clock for color & Z. It is that it has 8 "full" pipes that can blend 4 pixels per clock with color.

Now, the phrase "color + Z pixels" in there is key for our discussion, because that's generally the kind of pixels most current 3D applications are rendering. That's your standard pixel, with a color value, situated in 3D space. When doing this sort of conventional rendering, the NV30 can produce four pixels per clock cycle with up to two textures applied to each.

This configuration gives the NV30 a bit of a disadvantage next to ATI's R3x0 series in terms of single-textured fill rate. Our table above reflects that difference, and it's generally correct so far.

However, the NV30 can do certain types of operations, including stencil ops, at 8 pixels per clock. This ability makes the NV30 more formidable than a straight-up "4 x 2" pixel pipeline specification might indicate. NVIDIA claims the rendering and shadowing techniques used in upcoming games like Doom III will take particular advantage of the NV30's eight-pipe capabilities.
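To put rough numbers on that, here is a small sketch that treats NVIDIA's per-clock figures as a lookup table. The per-clock rates are the ones NVIDIA quoted above; the names `NV30_PER_CLOCK`, `mpixels_per_second`, and `frame_clocks`, along with the example workload, are hypothetical and purely illustrative. The model is idealized and ignores memory bandwidth and every other bottleneck.

```python
# Pixels (or ops) per clock by operation type, per NVIDIA's description of NV30.
NV30_PER_CLOCK = {"z_only": 8, "stencil": 8, "color_z": 4}


def mpixels_per_second(core_mhz, op):
    """Idealized throughput for one operation type, in Mpixels/s."""
    return core_mhz * NV30_PER_CLOCK[op]


def frame_clocks(pixels_by_op):
    """Idealized clock count for a frame, given pixel counts per operation type."""
    return sum(count / NV30_PER_CLOCK[op] for op, count in pixels_by_op.items())


# Hypothetical stencil-shadow style workload: a depth pre-pass, stencil passes,
# then conventional color + Z rendering.
workload = {"z_only": 1_000_000, "stencil": 2_000_000, "color_z": 1_000_000}

print(mpixels_per_second(500, "stencil"))   # stencil rate at the Ultra's 500MHz
print(mpixels_per_second(500, "color_z"))   # conventional color + Z rate
print(frame_clocks(workload))               # clocks with the 8-per-clock fast paths
print(sum(workload.values()) / 4)           # clocks if everything ran at 4 per clock
```

The more of a frame's work that lands in the Z-only and stencil categories, the further the chip pulls away from what a plain "4 x 2" label would suggest.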

Of course, we can measure these things. Here are the scores from 3DMark2001's fill rate test, which is a simple test of traditional "color + Z" rendering.

The GeForce FX turns in performance more or less like we'd expect given the numbers on our chip table above, very much like a "4 x 2" design. Its single-textured fill rate is lower than the Radeons', but its multi-textured fill rate is second to none.

These numbers bode well for the GeForce FX in scenarios where games and applications apply at least two textures per pixel. However, some newer games are substituting pixel shader effects for additional textures, so the underlying strength of ATI's true eight-pipe approach remains formidable.

In situations where applications want to apply more than one or two textures to a pixel, both the NV30 and the R300 series chips can "loop back" pixels through their pipelines in order to apply more textures. Both chips can apply up to 16 textures without resorting to multipass rendering.
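A simple way to think about loop-back is that each extra trip through the pipeline costs additional clocks per pixel. The sketch below assumes one clock per trip and no other bottlenecks, which is my simplification rather than anything either company has documented. The function names are illustrative, and the 16-texture limit is the one mentioned above.

```python
import math

MAX_TEXTURES = 16  # single-pass limit cited for both chips


def clocks_per_pixel(textures, tex_units_per_pipe):
    """Trips through the pipeline needed to apply the requested textures."""
    if not 0 <= textures <= MAX_TEXTURES:
        raise ValueError("texture count outside the single-pass range")
    return max(1, math.ceil(textures / tex_units_per_pipe))


def effective_mpixels(core_mhz, pipelines, tex_units_per_pipe, textures):
    """Idealized pixel fill rate (Mpixels/s) when each pixel gets `textures` layers."""
    return core_mhz * pipelines / clocks_per_pixel(textures, tex_units_per_pipe)


for textures in (1, 2, 4):
    fx = effective_mpixels(500, 4, 2, textures)      # GeForce FX 5800 Ultra as "4 x 2"
    r300 = effective_mpixels(325, 8, 1, textures)    # Radeon 9700 Pro as "8 x 1"
    print(f"{textures} texture(s): FX {fx:.0f} Mpixels/s, 9700 Pro {r300:.0f} Mpixels/s")
```

Under those assumptions, the eight-pipe design leads with a single texture, and the "4 x 2" design pulls ahead once two or more textures are applied per pixel.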

The primary constraint for fill rate is sometimes memory bandwidth rather than GPU processing power, and memory bandwidth bottlenecks are more of a concern than ever when dealing with 64-bit and 128-bit floating-point color modes. In cases where memory bandwidth is a primary limitation, the Radeon 9700 and 9800 cards will have the advantage.
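Some back-of-the-envelope math shows why the deeper color modes hurt. This sketch counts only color writes at the peak pixel fill rate and compares them against peak memory bandwidth; it deliberately ignores Z traffic, texture reads, and compression, so treat it as a rough floor on demand rather than a prediction. The function name is my own.

```python
def color_write_gbs(mpixels_per_s, bits_per_pixel):
    """GB/s needed just to write color at the given fill rate and color depth."""
    return mpixels_per_s * bits_per_pixel / 8 / 1000


PEAK_MPIXELS = 2000      # GeForce FX 5800 Ultra, from the chip table
PEAK_BANDWIDTH = 16.0    # GB/s, from the chip table

for bits in (32, 64, 128):
    need = color_write_gbs(PEAK_MPIXELS, bits)
    verdict = "exceeds" if need > PEAK_BANDWIDTH else "fits within"
    print(f"{bits}-bit color: {need:.1f} GB/s of writes, {verdict} {PEAK_BANDWIDTH} GB/s")
```

At 128 bits per pixel, color writes alone would call for twice the bandwidth the card has on paper, before any other memory traffic is counted.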

Occlusion detection
Then again, modern GPUs employ a whole range of tricks to make better use of their fill rate and memory bandwidth, including compression of various types of graphics data. Both the NV30 and its R3x0-series competition can compress Z data (depth information on pixels) and color (framebuffer) data using lossless algorithms. Color compression is new in this generation of chips, and it's most useful when edge antialiasing techniques are active, where multiple samples are often the same color. As I understand it, the NV30's color compression is always active, while ATI only turns on color compression with antialiasing.
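Neither company has published its compression scheme, so the following is only a toy illustration of why antialiased pixels compress well: when every sample in a pixel holds the same color, one value plus a flag is enough, and nothing is lost. The block format and function names here are invented for the example and are not a description of either chip's hardware.

```python
def compress_samples(samples):
    """Losslessly pack one pixel's multisample colors (expects a non-empty list)."""
    first = samples[0]
    if all(s == first for s in samples):
        return ("uniform", first)          # interior pixel: store a single color
    return ("raw", tuple(samples))         # edge pixel: keep every sample


def decompress_samples(block, sample_count):
    """Recover the original per-sample colors from a packed block."""
    kind, payload = block
    if kind == "uniform":
        return [payload] * sample_count
    return list(payload)


interior = [0xFF8040] * 4                            # 4X AA, all samples identical
edge = [0xFF8040, 0xFF8040, 0x000000, 0x000000]      # polygon edge crosses the pixel

for samples in (interior, edge):
    block = compress_samples(samples)
    assert decompress_samples(block, len(samples)) == samples
    print(block[0], "block for", len(samples), "samples")
```

Since most pixels in a typical scene sit in polygon interiors, most blocks would take the cheap path, which is where the bandwidth savings would come from.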

NVIDIA doesn't talk much about it, but I believe the NV30 employs an "Early Z" occlusion detection algorithm, like the R3x0 chips, to reduce the possibility the chip will render a pixel that would be situated behind another pixel—and thus not visible—in the final scene. With fancy shader programs in the mix, pixels become more expensive to render, so eliminating occluded pixels up front becomes a higher priority.
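The idea behind an early Z test can be shown in a few lines. This is a conceptual sketch, not a model of NV30 or R3x0 hardware: it counts how many times an expensive pixel shader would run for a stream of fragments, with and without a depth check ahead of shading. The function name and fragment format are assumptions for the example, and smaller depth values are taken to mean closer to the viewer.

```python
def shade_count(fragments, early_z):
    """Count shader runs for (x, y, depth) fragments given in submission order."""
    depth_buffer = {}
    shaded = 0
    for x, y, depth in fragments:
        nearest = depth_buffer.get((x, y), float("inf"))
        if early_z and depth >= nearest:
            continue                       # occluded: rejected before shading
        shaded += 1                        # the expensive shader runs here
        if depth < nearest:
            depth_buffer[(x, y)] = depth   # depth test and write after shading
    return shaded


front_to_back = [(0, 0, 0.2), (0, 0, 0.5), (0, 0, 0.8)]
back_to_front = list(reversed(front_to_back))

print(shade_count(front_to_back, early_z=False))
print(shade_count(front_to_back, early_z=True))
print(shade_count(back_to_front, early_z=True))
```

Note that the savings depend on draw order. With front-to-back submission, the early test throws out both hidden fragments; with back-to-front submission, every fragment passes and nothing is saved.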

All of these methods of bandwidth conservation improve efficiency, and if implemented well, they offer the NV30 some hope of outperforming the ATI chips, even with less memory bandwidth. We'll use VillageMark to test these chips' fill rate and occlusion detection capabilities, both with and without antialiasing enabled.

Even with less memory bandwidth, the GFFX 5800 Ultra matches up well against the ATI cards, leading the pack in the non-AA tests. The ATI cards lead when 4X edge antialiasing and 8X anisotropic filtering are enabled, but even then, the GeForce FX runs remarkably close to the Radeon 9700.

Copyright ©1999-2009 The Tech Report. All rights reserved.