Texturing, ROP hardware, and memory interface
Ah, the basic math that—outside of shaders—determines so much of a GPU's character. Let's have a look at the numbers, and then we'll talk about why they are the way they are.

                    Peak pixel   Peak bilinear     Peak bilinear FP16   Peak memory
                    fill rate    texel filtering   texel filtering      bandwidth
Card                (Gpixels/s)  rate (Gtexels/s)  rate (Gtexels/s)     (GB/s)
GeForce 8800 GTX    13.8         18.4              18.4                 86.4
GeForce 9800 GTX    10.8         43.2              21.6                 70.4
GeForce 9800 GX2    19.2         76.8              38.4                 128.0
GeForce GTX 260     16.1         36.9              18.4                 111.9
GeForce GTX 280     19.3         48.2              24.1                 141.7
Radeon HD 2900 XT   11.9         11.9              11.9                 105.6
Radeon HD 3870      12.4         12.4              12.4                 72.0
Radeon HD 3870 X2   26.4         26.4              26.4                 115.2
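
The table's theoretical peaks are simple products of per-clock unit throughput and clock speed. The sketch below reproduces the GeForce rows. The core clocks, memory data rates, and per-clock unit counts in it are assumptions taken from the cards' published specs, not figures stated in this article. It also assumes dual-GPU boards simply double one GPU's peaks.

```python
# Sketch: rebuild the table's GeForce peaks from unit counts and clocks.
# ASSUMPTION: every clock, data rate, and unit count below comes from published
# card specs and is not stated in the article itself.
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    gpus: int                    # GPUs on the board; dual-GPU peaks simply double
    core_mhz: float              # core clock per GPU
    pixels_per_clock: int        # ROP output per GPU
    texels_per_clock: int        # effective bilinear texels per GPU
    fp16_texels_per_clock: int   # effective FP16 bilinear texels per GPU
    bus_bits: int                # memory interface width per GPU
    mem_mtps: float              # effective memory data rate in MT/s

    def peaks(self):
        """Return (Gpixels/s, Gtexels/s, FP16 Gtexels/s, GB/s)."""
        ghz = self.core_mhz / 1000
        return (
            self.gpus * self.pixels_per_clock * ghz,
            self.gpus * self.texels_per_clock * ghz,
            self.gpus * self.fp16_texels_per_clock * ghz,
            self.gpus * (self.bus_bits / 8) * self.mem_mtps / 1000,
        )


# The 8800 GTX's bilinear and FP16 rates are both capped at 32 per clock
# by its texture addressing, so both effective rates are 32 here.
CARDS = [
    Card("GeForce 8800 GTX", 1, 575, 24, 32, 32, 384, 1800),
    Card("GeForce 9800 GTX", 1, 675, 16, 64, 32, 256, 2200),
    Card("GeForce 9800 GX2", 2, 600, 16, 64, 32, 256, 2000),
    Card("GeForce GTX 260", 1, 576, 28, 64, 32, 448, 1998),
    Card("GeForce GTX 280", 1, 602, 32, 80, 40, 512, 2214),
]

for card in CARDS:
    print(card.name, *(f"{x:.1f}" for x in card.peaks()))
```

Each printed row should match the corresponding table row to one decimal place.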

Each of the GT200's thread processing clusters can address and bilinearly filter eight texels per clock, just like the G92's. That's up from the G80, whose TPCs were limited to addressing four texels per clock and filtering eight. As in both of those chips, the GT200 filters FP16 texture formats at half the usual rate. Because the new GPU has 10 TPCs, its texturing capacity rises from 64 texels per clock in G92 to 80 texels per clock in GT200. That's not a huge gain in texture filtering throughput, but Nvidia expects more efficient scheduling to bring the GT200 closer to its theoretical peak than the G92 gets.
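
One way to see how those per-TPC limits combine is to treat the effective bilinear rate as the smaller of the addressing and filtering rates, with filtering halved for FP16. This is a minimal sketch under that assumption; the 8-TPC counts for G80 and G92 are also assumptions, since the text gives only the GT200's count.

```python
# Sketch: effective bilinear texels per clock = min(addressing, filtering).
# ASSUMPTION: G80 and G92 each have 8 TPCs; the article states only GT200's 10.


def texels_per_clock(tpcs, address_per_tpc, filter_per_tpc, fp16=False):
    addressing = tpcs * address_per_tpc
    filtering = tpcs * filter_per_tpc
    if fp16:
        filtering //= 2  # FP16 formats filter at half the usual rate
    return min(addressing, filtering)


# (TPCs, texels addressed per TPC per clock, texels filtered per TPC per clock)
CHIPS = {
    "G80": (8, 4, 8),
    "G92": (8, 8, 8),
    "GT200": (10, 8, 8),
}

for chip, config in CHIPS.items():
    print(chip, texels_per_clock(*config), texels_per_clock(*config, fp16=True))
```

This prints 32 and 32 for G80, 64 and 32 for G92, and 80 and 40 for GT200. The G80 line is why the 8800 GTX's bilinear and FP16 columns in the table are identical: its addressing limit, not its filtering hardware, sets the ceiling.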

Meanwhile, the GT200's ROP partitions runneth over. It has eight of 'em, a third more than the G80 and twice the number in the G92. Each of its ROP partitions can output four pixels per clock, which means the GT200 can draw 32 pixels per clock cycle. As a result, the single-GPU GeForce GTX 280's theoretical peak pixel-pushing power narrowly surpasses even the dual-GPU GeForce 9800 GX2's (19.3 vs. 19.2 Gpixels/s). Beyond the increase in number, the ROP hardware is largely unchanged, although it can now perform frame-buffer blends in one clock cycle instead of two, so the GT200's blend rate is 32 samples per clock, versus 12 per clock on the G80.
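
The ROP arithmetic is short enough to check directly. This sketch assumes four pixels per partition on all three chips and six partitions on the G80 (consistent with its 12-per-clock blend rate at two clocks per blend). The core clocks used for the GTX 280 versus 9800 GX2 comparison are assumed published specs, not figures from the text.

```python
# Sketch: ROP pixel output and blend rates per clock.
# ASSUMPTIONS: four pixels per partition on G80, G92, and GT200; G80 has six
# partitions and G92 has four.

PIXELS_PER_PARTITION = 4


def pixels_per_clock(partitions):
    return partitions * PIXELS_PER_PARTITION


def blends_per_clock(partitions, clocks_per_blend):
    return pixels_per_clock(partitions) // clocks_per_blend


print("G80:", pixels_per_clock(6), "pixels,", blends_per_clock(6, 2), "blends per clock")
print("GT200:", pixels_per_clock(8), "pixels,", blends_per_clock(8, 1), "blends per clock")

# ASSUMED core clocks (MHz), not stated in the article: GTX 280 at 602,
# each of the 9800 GX2's two G92s at 600.
gtx280_gpix = pixels_per_clock(8) * 602 / 1000
gx2_gpix = 2 * pixels_per_clock(4) * 600 / 1000
print(f"GTX 280: {gtx280_gpix:.1f} Gpixels/s, 9800 GX2: {gx2_gpix:.1f} Gpixels/s")
```

The margin over the GX2 is tiny (about 0.06 Gpixels/s), which is why "surpasses" here means "barely edges past."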

To me, the GT200's healthy complement of ROP partitions is the most welcome development of all because, especially on Nvidia's GPUs, the ROP hardware plays a big role in antialiasing performance. Lots of ROP capacity means better frame rates with higher levels of antialiasing, which is always a good thing.

Another thing the wealth of ROP partitions provides is an ample path to memory, 512 bits in all. An interface that wide means the GT200 needs lots of traces running from the GPU to memory and lots of die space dedicated to I/O pads, and some folks have questioned the wisdom of such things. After all, the last GPU we saw with a 512-bit interface was the Radeon HD 2900 XT, and it turned out to be awfully large for the performance it delivered. Nvidia insists the primary limiter of the GT200's size is its shader cores and says the I/O pads are roughly balanced against them. Although the GT200 sticks with tried-and-true GDDR3 memory, it can also support GDDR4, not that it may ever need to. The GTX 280's whopping 142 GB/s of bandwidth outdoes anything we've seen to date, even the dual-GPU cards.
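
Peak bandwidth follows from bus width and memory data rate: bytes per transfer times transfers per second. The sketch below assumes each ROP partition carries its own 64-bit memory channel and that the GTX 280's GDDR3 runs at 1107 MHz (double data rate). It also assumes the GTX 260 enables seven partitions with memory at 999 MHz. None of those figures appear in the text above.

```python
# Sketch: memory interface width and peak bandwidth from ROP partition count.
# ASSUMPTIONS: one 64-bit channel per ROP partition; GTX 280 GDDR3 at 1107 MHz,
# GTX 260 with seven partitions and GDDR3 at 999 MHz.

CHANNEL_BITS = 64


def bus_width_bits(rop_partitions):
    return rop_partitions * CHANNEL_BITS


def peak_bandwidth_gbs(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    """GDDR3 moves data on both clock edges, hence two transfers per clock."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * transfers_per_clock / 1000


gtx280_bus = bus_width_bits(8)
gtx260_bus = bus_width_bits(7)
print("GTX 280:", gtx280_bus, "bits,", round(peak_bandwidth_gbs(gtx280_bus, 1107), 1), "GB/s")
print("GTX 260:", gtx260_bus, "bits,", round(peak_bandwidth_gbs(gtx260_bus, 999), 1), "GB/s")
```

Both results land on the table's 141.7 and 111.9 GB/s, which is consistent with the partition-per-channel assumption.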

Speaking of bandwidth, we've found that synthetic tests of pixel fill rate tend to be limited more by memory bandwidth than by anything else. That seems to be the case here, since none of the cards comes anywhere close to its theoretical peak and the top four finish in order of memory bandwidth.
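
A crude model shows why bandwidth can cap fill rate before the ROPs do: attainable fill rate is the lesser of the ROP peak and bandwidth divided by bytes moved per pixel. The per-pixel cost here is a hypothetical assumption, not the benchmark's documented behavior. It assumes 32-bit color with blending (a 4-byte read plus a 4-byte write), no color compression, and no depth traffic. Peaks and bandwidths come from the table.

```python
# Sketch: fill rate limited by whichever runs out first, ROPs or memory bandwidth.
# HYPOTHETICAL test conditions (assumptions): 32-bit color with blending, so each
# pixel costs a 4-byte read plus a 4-byte write; no compression, no Z traffic.

BYTES_PER_PIXEL = 8


def bandwidth_bound_fill(peak_gpix, bandwidth_gbs, bytes_per_pixel=BYTES_PER_PIXEL):
    """Attainable fill rate in Gpixels/s under the simple model above."""
    return min(peak_gpix, bandwidth_gbs / bytes_per_pixel)


# (card, peak pixel fill rate in Gpixels/s, peak memory bandwidth in GB/s)
CARDS = [
    ("GeForce 8800 GTX", 13.8, 86.4),
    ("GeForce 9800 GTX", 10.8, 70.4),
    ("GeForce 9800 GX2", 19.2, 128.0),
    ("GeForce GTX 260", 16.1, 111.9),
    ("GeForce GTX 280", 19.3, 141.7),
    ("Radeon HD 2900 XT", 11.9, 105.6),
    ("Radeon HD 3870", 12.4, 72.0),
    ("Radeon HD 3870 X2", 26.4, 115.2),
]

for name, peak, bandwidth in sorted(CARDS, key=lambda c: c[2], reverse=True):
    bound = bandwidth_bound_fill(peak, bandwidth)
    limiter = "memory" if bound < peak else "ROPs"
    print(f"{name:18} {bound:5.1f} Gpixels/s ({limiter}-bound)")
```

Under these assumptions every card except the Radeon HD 2900 XT is bandwidth-bound, and the caps simply track bandwidth. Real hardware compresses color data and real tests differ in what they write, so treat this only as a plausibility check.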

The texturing results prove to be more interesting, in part because the numbers and units don't correspond to these GPUs' abilities at all: they're typically a little more than ten times the theoretical peak. I've looked at Futuremark's whitepaper and even inquired directly with the company about what's going on here, but I haven't yet received an answer. Still, the results do appear to make sense for what this test is: a relative comparison of FP16 texel fill rate.

RightMark's fill-rate test uses integer texture formats, so it's a little different. Here, the GTX 280's texel throughput is essentially double that of the GeForce 8800 GTX. The GT200's more efficient scheduling does seem to help a little bit, as well; the GTX 260 matches the GeForce 9800 GTX despite having a lower theoretical peak (36.9 vs. 43.2 Gtexels/s).
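
To set those RightMark results against the theoretical numbers, here is a quick ratio check that uses only the table's peak bilinear texel rates. It is a simple illustration, not part of the benchmark.

```python
# Theoretical peak bilinear texel rates in Gtexels/s, copied from the table.
PEAK_TEXEL_RATE = {
    "GeForce 8800 GTX": 18.4,
    "GeForce 9800 GTX": 43.2,
    "GeForce GTX 260": 36.9,
    "GeForce GTX 280": 48.2,
}


def peak_ratio(card, baseline):
    """Ratio of two cards' theoretical peak bilinear texel rates."""
    return PEAK_TEXEL_RATE[card] / PEAK_TEXEL_RATE[baseline]


print(f"{peak_ratio('GeForce GTX 280', 'GeForce 8800 GTX'):.2f}")  # prints 2.62
print(f"{peak_ratio('GeForce GTX 260', 'GeForce 9800 GTX'):.2f}")  # prints 0.85
```

A measured doubling leaves the GTX 280 short of its roughly 2.6x theoretical edge over the 8800 GTX. Meanwhile, a GTX 260 that matches the 9800 GTX outright is extracting about 1/0.85, or roughly 1.17 times, as large a share of its theoretical peak.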