Fill rate
To show exactly where a new product sits among the current crop of 3D graphics cards, we like to pull out the ol' chip table and compare the specs. As always, these numbers can lie. The key ones, like memory bandwidth and fill rate, are theoretical peaks, not real-world numbers. Real-world performance will vary depending on implementations. Still, the chips' specs are instructive, so here's our trusty table:

| | Core clock (MHz) | Pixel pipelines | Peak fill rate (Mpixels/s) | Texture units per pixel pipeline | Peak fill rate (Mtexels/s) | Memory clock (MHz) | Memory bus width (bits) | Peak memory bandwidth (GB/s) |
|---|---|---|---|---|---|---|---|---|
| Radeon 8500 | 275 | 4 | 1100 | 2 | 2200 | 550 | 128 | 8.8 |
| GeForce4 Ti 4600 | 300 | 4 | 1200 | 2 | 2400 | 650 | 128 | 10.4 |
| Parhelia-512 | 220 | 4 | 880 | 4 | 3520 | 550 | 256 | 17.6 |
| Radeon 9700 Pro | 325 | 8 | 2600 | 1 | 2600 | 620 | 256 | 19.8 |

The Radeon 9700 Pro leads in nearly every category. It's endowed with gobs of memory bandwidth and a blistering pixel fill rate that's more than double that of the closest competition.
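For the curious, the peak figures in the table are simple arithmetic on clock speeds and unit counts. Here's a minimal Python sketch of that math. The `Chip` class and its field names are our own invention for illustration, and the memory clocks are taken to be the effective (DDR) rates listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical container for the spec-sheet numbers in the table."""
    name: str
    core_mhz: int
    pipelines: int
    tex_units_per_pipe: int
    mem_mhz: int   # effective (DDR) memory clock, as listed in the table
    bus_bits: int

    @property
    def mpixels_per_s(self) -> int:
        # One pixel per pipeline per core clock.
        return self.core_mhz * self.pipelines

    @property
    def mtexels_per_s(self) -> int:
        # Each pipeline can apply one texel per texture unit per clock.
        return self.mpixels_per_s * self.tex_units_per_pipe

    @property
    def bandwidth_gb_per_s(self) -> float:
        # MHz * bytes per transfer gives MB/s; divide by 1000 for GB/s.
        return self.mem_mhz * (self.bus_bits // 8) / 1000


CHIPS = [
    Chip("Radeon 8500", 275, 4, 2, 550, 128),
    Chip("GeForce4 Ti 4600", 300, 4, 2, 650, 128),
    Chip("Parhelia-512", 220, 4, 4, 550, 256),
    Chip("Radeon 9700 Pro", 325, 8, 1, 620, 256),
]

for chip in CHIPS:
    print(
        f"{chip.name:18} {chip.mpixels_per_s:5} Mpixels/s "
        f"{chip.mtexels_per_s:5} Mtexels/s "
        f"{chip.bandwidth_gb_per_s:5.1f} GB/s"
    )
```

Remember, these are the same theoretical peaks we warned about above, not measurements.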

One of the most important items in the table above is texel fill rate, which describes a card's ability to produce pixels with texture maps applied to them. In current games, texel fill rate is key to good performance at high resolutions. The 9700 Pro's texel fill rate is good, but it's not head and shoulders above the other cards. As I said above, ATI's use of only one texture unit per pixel pipeline is a bit of a compromise. Each of the 9700's pipelines can apply a single texture per clock cycle, while the other chips' pipelines can apply two or four textures per clock.

However, with eight pixel pipelines, the 9700 chip has an advantage. Matrox's Parhelia, for instance, has the highest theoretical peak texel fill rate, with four texture units in each of its four pixel pipes. But Parhelia can't use all four texture units per pipe in most current games and apps, because those apps don't apply four textures per rendering pass. As a result, Parhelia performs much more slowly than its spec sheet would suggest. The 9700, on the other hand, can almost always put all of its texture units to use at once, even if an app only applies one texture per pass.
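To put rough numbers on that, here's a small Python sketch of how much of each chip's peak texel rate is usable when an app applies only a few textures per pass. It's a simplified model of our own: it assumes a pipeline's extra texture units simply sit idle when there aren't enough textures to feed them, and it ignores memory bandwidth entirely. The function name `usable_texel_rate` is hypothetical.

```python
def usable_texel_rate(core_mhz, pipelines, tex_units_per_pipe, textures_per_pass):
    """Mtexels/s a chip can sustain under a simplified idle-unit model.

    Assumption: texture units beyond the number of textures applied
    per pass do no useful work.
    """
    busy_units = min(tex_units_per_pipe, textures_per_pass)
    return core_mhz * pipelines * busy_units


# (core MHz, pixel pipelines, texture units per pipeline) from the table
SPECS = {
    "GeForce4 Ti 4600": (300, 4, 2),
    "Parhelia-512": (220, 4, 4),
    "Radeon 9700 Pro": (325, 8, 1),
}

for textures in (1, 2, 4):
    print(f"{textures} texture(s) per pass:")
    for name, (mhz, pipes, units) in SPECS.items():
        rate = usable_texel_rate(mhz, pipes, units, textures)
        peak = mhz * pipes * units
        print(f"  {name:18} {rate:5} of {peak:5} Mtexels/s peak")
```

Under this model, Parhelia only reaches its peak when an app applies four textures per pass, while the 9700's single texture unit per pipe is busy in every case.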

Should the 9700 need to apply more than one texture per rendering pass, it can send pixels back through its pipelines up to 16 times before sending the results out to a frame buffer. This process will chew up clock cycles, but it's not nearly as much trouble as writing the results to memory, reading them back in, and then applying another texture. (For the record, the Radeon 8500 can "loop back" pixels in this manner up to three times per rendering pass, for a total of six textures per pass. The GeForce4 Ti can do it twice, delivering four textures per pass.)
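The loopback math is easy to sketch in Python. The numbers below come straight from the paragraph above, with one assumption on our part: we read the 9700's 16 trips through its single texture unit as 16 textures per pass, the same way three trips through the 8500's two units yield six. The function names are hypothetical.

```python
import math

# (texture units per pipeline, max trips through the pipeline per pass)
LOOPBACK = {
    "Radeon 8500": (2, 3),
    "GeForce4 Ti": (2, 2),
    "Radeon 9700": (1, 16),  # assumption: 16 trips means 16 textures per pass
}


def textures_per_pass(units, trips):
    """Most textures a chip can apply before writing to the frame buffer."""
    return units * trips


def passes_needed(textures, units, trips):
    """Rendering passes required to apply `textures` textures to a pixel."""
    return math.ceil(textures / textures_per_pass(units, trips))


for name, (units, trips) in LOOPBACK.items():
    print(
        f"{name:12} {textures_per_pass(units, trips):2} textures per pass, "
        f"{passes_needed(8, units, trips)} pass(es) for 8 textures"
    )
```

Every extra pass means a round trip to memory, which is exactly the cost that loopback is meant to avoid.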

Of course, none of this pixel filling ability is worth much if the VPU (visual processing unit, don't ask) can't write those pixels out to memory as fast as it can process them. To make the most effective use of its memory, the Radeon 9700 includes a sophisticated crossbar memory interface very similar to the one in NVIDIA's GeForce4 Ti chips. However, at 256 bits, the 9700 has double the number of paths to memory and thus, clock for clock, double the raw memory bandwidth. The 9700's memory interface incorporates four memory controllers on the VPU side and four 64-bit channels into main memory. Between the memory channels and the array of controllers is a switched fabric. Any one of the memory controllers can talk to any one (or two, or all four) of the memory channels via the switched fabric. As you might imagine, this approach is much more efficient than simply transferring data sequentially in 256-bit chunks.


A block diagram of the Radeon 9700's memory controller. Source: ATI.
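Why would four 64-bit channels beat one 256-bit pipe? Here's a toy Python model of the usual explanation, which is that small requests waste less of a narrow channel. This is purely our own illustration, not ATI's documented behavior: it assumes every access moves a whole number of bus-width chunks, it ignores DRAM burst lengths and latency, and the request sizes are made up.

```python
import math


def bytes_moved(request_bytes, granularity_bytes):
    """Bytes actually transferred when requests round up to whole chunks."""
    return math.ceil(request_bytes / granularity_bytes) * granularity_bytes


def efficiency(requests, granularity_bytes):
    """Fraction of transferred bytes that were actually requested."""
    useful = sum(requests)
    moved = sum(bytes_moved(r, granularity_bytes) for r in requests)
    return useful / moved


# Hypothetical mix of request sizes in bytes, invented for illustration.
requests = [8, 8, 32, 16, 4, 64]

monolithic = efficiency(requests, 256 // 8)  # one 256-bit path: 32-byte chunks
crossbar = efficiency(requests, 64 // 8)     # 64-bit channels: 8-byte chunks

print(f"256-bit chunks: {monolithic:.0%} of transferred bytes are useful")
print(f"64-bit chunks:  {crossbar:.0%} of transferred bytes are useful")
```

The real hardware is far more complicated, but the basic idea holds: finer granularity means fewer wasted bytes on small accesses.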

Let's see how all of this technology translates into performance. 3DMark's synthetic fill rate test measures pixel and texel fill rates. This test doesn't exploit all of the advanced bandwidth conservation and pixel loopback techniques we've discussed, but it should give us a good idea about a card's basic pixel-pushing prowess. We'll test performance in real games at very high resolutions a bit later.

The theory works out rather nicely in practice with the 9700. The chip is by far the fastest in terms of pixel (or single-textured) fill rate, and it delivers the highest texel fill rate in all but one display resolution, where the Parhelia grabs a small edge. Notice, also, how much closer the Radeon 9700 Pro comes to its theoretical fill rate than the Parhelia does.