Texturing
We've talked quite a bit about the top portion of that SP cluster, but not much about the lower part. Attached to each group of 16 SPs is a texture address and filtering unit. Each of these units can handle four texture address operations per clock (basically, grabbing a texture sample to apply to a fragment), so the eight units across the chip add up to 32 texture address units. These units run at the G80's core clock speed of 575MHz, not the 1.35GHz of the SPs. The ability to apply 32 textures per clock is formidable, even if shader power is becoming relatively more important. Here's how the math breaks down versus previous top-end graphics cards:

Card | Core clock (MHz) | Pixels/clock | Peak fill rate (Mpixels/s) | Textures/clock | Peak fill rate (Mtexels/s) | Effective memory clock (MHz) | Memory bus width (bits) | Peak memory bandwidth (GB/s)
GeForce 7900 GTX | 650 | 16 | 10400 | 24 | 15600 | 1600 | 256 | 51.2
Radeon X1950 XTX | 650 | 16 | 10400 | 16 | 10400 | 2000 | 256 | 64.0
GeForce 8800 GTS | 500 | 20 | 10000 | 24 | 12000 | 1600 | 320 | 64.0
GeForce 8800 GTX | 575 | 24 | 13800 | 32 | 18400 | 1800 | 384 | 86.4
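Every derived column in the table is plain multiplication: a fill rate is the core clock times the per-clock rate, and peak memory bandwidth is the effective memory clock times the bus width in bytes. The minimal Python sketch below recomputes those columns from the table's own inputs as a sanity check. The `Card` class and its field names are illustrative only, not taken from any benchmark tool. It also spells out where the G80's 32 texture address units come from: eight groups of 16 SPs, each with a four-address texture unit.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One row of the table (illustrative class, not from any benchmark tool)."""
    name: str
    core_mhz: int        # core clock (MHz)
    pixels_per_clk: int  # pixels output per clock
    texels_per_clk: int  # textures applied per clock
    mem_mhz: int         # effective memory clock (MHz)
    bus_bits: int        # memory bus width (bits)

    def pixel_fill(self) -> int:
        """Peak pixel fill rate in Mpixels/s."""
        return self.core_mhz * self.pixels_per_clk

    def texel_fill(self) -> int:
        """Peak texture fill rate in Mtexels/s."""
        return self.core_mhz * self.texels_per_clk

    def bandwidth_gbs(self) -> float:
        """Peak bandwidth in GB/s: MHz x bytes per transfer gives MB/s, /1000 gives GB/s."""
        return self.mem_mhz * self.bus_bits / 8 / 1000


# G80 (GeForce 8800 GTX): 8 groups of 16 SPs, 4 texture address units each.
G80_TEXELS_PER_CLK = 8 * 4

CARDS = [
    Card("GeForce 7900 GTX", 650, 16, 24, 1600, 256),
    Card("Radeon X1950 XTX", 650, 16, 16, 2000, 256),
    Card("GeForce 8800 GTS", 500, 20, 24, 1600, 320),
    Card("GeForce 8800 GTX", 575, 24, G80_TEXELS_PER_CLK, 1800, 384),
]

for c in CARDS:
    print(f"{c.name:18}{c.pixel_fill():7} Mpixels/s{c.texel_fill():7} Mtexels/s"
          f"{c.bandwidth_gbs():6.1f} GB/s")
```

Running it reproduces each derived figure in the table; for the GeForce 8800 GTX, for example, 575 MHz × 32 = 18,400 Mtexels/s and 1800 MHz × 384 bits / 8 = 86.4 GB/s.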

So in theory, the G80's texturing capabilities are quite strong. Its 18.4 Gtexel/s theoretical peak isn't vastly higher than the GeForce 7900 GTX's 15.6 Gtexel/s, but its memory bandwidth advantage over the G71 (86.4 vs. 51.2 GB/s) is pronounced. As for pixel fill rates, both ATI and Nvidia seem to have decided that something in the range of 10 to 14 Gpixels/s is sufficient for the time being.
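To put numbers on that comparison, this short Python snippet computes the GeForce 8800 GTX's percentage advantage over the GeForce 7900 GTX (G71) for each peak rate, using only figures from the table. The `advantage` helper and the dictionary keys are illustrative names, not part of any tool.

```python
# Peak figures copied from the table: GeForce 7900 GTX (G71) vs. 8800 GTX (G80).
G71 = {"texel_fill": 15600, "bandwidth": 51.2, "pixel_fill": 10400}
G80 = {"texel_fill": 18400, "bandwidth": 86.4, "pixel_fill": 13800}


def advantage(new: float, old: float) -> float:
    """Percent by which `new` exceeds `old` (illustrative helper)."""
    return (new / old - 1) * 100


for metric in G71:
    print(f"{metric:11} +{advantage(G80[metric], G71[metric]):.0f}%")
```

It works out to roughly 18 percent more texel throughput, 69 percent more memory bandwidth, and 33 percent more pixel fill rate.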


The G80's texture address and filtering units. Source: NVIDIA.

The G80 appears capable of delivering on its theoretical promise in practice, and then some. I've included the 3DMark fill rate tests mainly because they show how the G71's pixel fill rate scales up with display resolution (freaky!) and how close these GPUs can get to their theoretical peak texturing capabilities (answer: very close). However, I prefer RightMark's test overall, and it shows the G80 achieving just under twice the texturing capacity of the R580+.

The G80's texturing abilities are also superior to the G71's in a way that our results above don't show. The G71 uses one of the ALUs in each pixel shader processor to serve as a texture address unit. This sharing arrangement is sometimes very efficient, but it can cause slowdowns in texturing and shader operations, especially when the two are tightly interleaved. The G80's texture address units are decoupled from the stream processors and operate independently, so that texturing can happen freely alongside shader processing—just like, dare I say it, ATI's R580+.
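The cost of sharing can be pictured with a deliberately crude toy model. This is an assumption for illustration, not how either GPU actually schedules work: if texture addressing borrows the same ALU issue slot as shader math, the two kinds of work serialize, whereas decoupled address units let them overlap. The function names below are hypothetical.

```python
# Toy model (an assumption, not documented NVIDIA or ATI scheduling):
# a shader needs `math_ops` ALU operations and `tex_ops` texture address
# operations per fragment. Dependencies, latency hiding, and the G71's
# second ALU are all ignored.


def cycles_shared(math_ops: int, tex_ops: int) -> int:
    """One ALU issues either a math op or a texture address op each cycle,
    so the two kinds of work are serialized."""
    return math_ops + tex_ops


def cycles_decoupled(math_ops: int, tex_ops: int) -> int:
    """Separate texture address units run alongside the shader ALU,
    so the longer of the two streams sets the pace."""
    return max(math_ops, tex_ops)


# A tightly interleaved shader: 8 math ops and 4 texture lookups per fragment.
print(cycles_shared(8, 4), cycles_decoupled(8, 4))
```

Under this model, that shader takes 12 cycles with the shared ALU but 8 with decoupled units, while a shader with no texture work costs the same either way. Real hardware hides latency and overlaps work in ways the model ignores, so treat it only as a picture of why tight interleaving hurts a shared design.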

More impressive than the G80's texture addressing capability, though, is its capacity for texture filtering. You'll see eight filtering units (TF) and four address units (TA) in the diagram above. The G80 has twice the texture filtering capacity per address unit of the G71, so it can do either 2X anisotropic filtering or bilinear filtering of FP16-format textures at full speed, that is, 32 pixels per clock. (Aniso 2X and FP16 filtering combined happen at 16 pixels per clock.) These units can also filter textures in FP32 format for extremely high-precision results. All of this means, of course, that the G80 should be able to produce very nice image quality without compromising performance.
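The quoted filtering rates are consistent with a simple cost model, sketched below in Python. The model is an assumption, not an NVIDIA specification. It takes 32 address units and, following the diagram's ratio of eight TF to four TA, 64 filter units. It treats a plain bilinear sample as one unit of filter work and assumes FP16 data and 2X aniso each double that work. Throughput is then capped by whichever runs out first, the address units or the filter units. `texels_per_clock` and the constants are hypothetical names.

```python
# Hypothetical cost model (an assumption that reproduces the article's rates,
# not a documented NVIDIA formula).
ADDRESS_UNITS = 32                  # 8 texture units x 4 TA each
FILTER_UNITS = 2 * ADDRESS_UNITS    # diagram: 8 TF per 4 TA


def texels_per_clock(fp16: bool = False, aniso_2x: bool = False) -> int:
    """Filtered texels per clock across the chip under the assumed model."""
    cost = 1           # plain bilinear sample: one unit of filter work
    if fp16:
        cost *= 2      # assumed: FP16 texels take twice the filter work
    if aniso_2x:
        cost *= 2      # assumed: 2X aniso takes two bilinear samples
    # Each address unit handles at most one texel per clock.
    return min(ADDRESS_UNITS, FILTER_UNITS // cost)


for fp16, aniso in [(False, False), (True, False), (False, True), (True, True)]:
    print(f"fp16={fp16}, aniso_2x={aniso}: {texels_per_clock(fp16, aniso)} per clock")
```

Under those assumptions it gives 32 per clock for plain bilinear, FP16 bilinear, or 2X aniso alone, and 16 per clock for FP16 with 2X aniso, matching the figures above.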