Testing RV770's mettle
So how do the rearchitected bits of RV770 work when you put them all together? Let's have a look. First, here's a quick table showing the theoretical peak capacities of some relevant GPUs, which we can use for reference.
| |Peak pixel fill rate (Gpixels/s)|Peak bilinear texel filtering rate (Gtexels/s)|Peak bilinear FP16 texel filtering rate (Gtexels/s)|Peak memory bandwidth (GB/s)|
|---|---|---|---|---|
|GeForce 8800 GTX|13.8|18.4|18.4|86.4|
|GeForce 9800 GTX|10.8|43.2|21.6|70.4|
|GeForce 9800 GX2|19.2|76.8|38.4|128.0|
|GeForce GTX 260|16.1|36.9|18.4|111.9|
|GeForce GTX 280|19.3|48.2|24.1|141.7|
|Radeon HD 2900 XT|11.9|11.9|11.9|105.6|
|Radeon HD 3870|12.4|12.4|12.4|72.0|
|Radeon HD 3870 X2|26.4|26.4|26.4|115.2|
|Radeon HD 4850|10.0|25.0|12.5|63.6|
|Radeon HD 4870|12.0|30.0|15.0|115.2|
Oddly enough, on paper, the RV770's numbers don't look all that impressive. The Radeon HD 4850 trails the GeForce 9800 GTX in every category, and the 4870 isn't much faster in most departments (except for memory bandwidth, of course, thanks to GDDR5). But what happens when we measure throughput with a synthetic test?
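For a rough sense of how the paper specs stack up, here's a quick sketch that divides each Radeon's theoretical peaks by the GeForce 9800 GTX's. The figures are copied from the table above; the category labels are assumed from the surrounding discussion, and the `ratios` helper is purely illustrative.

```python
# Theoretical peaks copied from the table above. Column meanings are assumed:
# pixel fill (Gpixels/s), texel filtering (Gtexels/s),
# FP16 texel filtering (Gtexels/s), memory bandwidth (GB/s).
PEAKS = {
    "GeForce 9800 GTX": (10.8, 43.2, 21.6, 70.4),
    "Radeon HD 4850": (10.0, 25.0, 12.5, 63.6),
    "Radeon HD 4870": (12.0, 30.0, 15.0, 115.2),
}
CATEGORIES = ("pixel fill", "texel filtering", "FP16 filtering", "memory bandwidth")


def ratios(card, baseline):
    """Return each of card's peaks as a fraction of baseline's (hypothetical helper)."""
    return {
        category: round(a / b, 2)
        for category, a, b in zip(CATEGORIES, PEAKS[card], PEAKS[baseline])
    }


r4850 = ratios("Radeon HD 4850", "GeForce 9800 GTX")
r4870 = ratios("Radeon HD 4870", "GeForce 9800 GTX")

for name, result in (("Radeon HD 4850", r4850), ("Radeon HD 4870", r4870)):
    print(name, "vs. GeForce 9800 GTX:", result)
```

Every ratio for the 4850 comes out below 1.0, while the 4870 pulls ahead only in pixel fill rate and, by a wide margin, memory bandwidth.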
Color fill rate tests like this one tend to be limited mainly by memory bandwidth, as seems to be largely the case here. The Radeon HD 4850 manages to outdo the GeForce 9800 GTX, though, despite a slightly lower memory clock. As for the 4870, well, it beats out the GeForce GTX 260 and the Radeon HD 3870 X2, which would seem to suggest that its GDDR5 memory is fast and relatively efficient. The GTX 260 and 3870 X2 have similar memory bandwidth in theory, but they're slower in practice.
This is a test of integer texture filtering performance, so many of the GPUs should be faster here than in our next test. The RV770 doesn't look too bad, and its performance scales down gracefully as the number of filter taps increases. But Nvidia's GPUs clearly have more texture filtering capacity, both in theory and in practice, with 32-bit texture formats.
This test, however, measures FP16 texture filtering throughput, and here, the tables turn. Amazingly, the Radeon HD 4850 outdoes the GeForce GTX 280, and the 4870 is faster still. Only the "X2" cards, with dual GPUs onboard, are in the same league. It would seem Nvidia's GPUs have some sort of internal bottleneck preventing them from reaching their full potential with FP16 filtering. If so, they're in good company: the Radeon HD 3870's theoretical peak for FP16 filtering is almost identical to the Radeon HD 4850's, yet the 4850 is much faster.
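To put the theoretical side of that comparison in numbers, this small sketch expresses each card's peak FP16 filtering rate relative to the Radeon HD 4850's. The values come from the reference table earlier on this page (assuming its third column is the FP16 filtering peak in Gtexels/s); the `relative_to` helper is only for illustration.

```python
# Theoretical peak FP16 filtering rates from the reference table
# (assumed to be in Gtexels/s).
FP16_PEAK = {
    "GeForce GTX 280": 24.1,
    "Radeon HD 3870": 12.4,
    "Radeon HD 4850": 12.5,
    "Radeon HD 4870": 15.0,
}


def relative_to(baseline, peaks=FP16_PEAK):
    """Express every card's peak as a multiple of the baseline card's peak."""
    base = peaks[baseline]
    return {card: round(rate / base, 2) for card, rate in peaks.items()}


fp16_vs_4850 = relative_to("Radeon HD 4850")
for card, ratio in fp16_vs_4850.items():
    print(f"{card}: {ratio}x the 4850's theoretical FP16 peak")
```

On paper, the GTX 280 has nearly twice the 4850's FP16 filtering capacity and the 3870 is within about one percent of it, which is what makes the measured results so surprising.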
Incidentally, if the gigatexel numbers produced by 3DMark seem confusing to you, well, I'm right there with you. I asked Futuremark about this problem, and they've confirmed that the values are somehow incorrect. They say they're looking into it now (or, well, after folks are back from their summer vacations). In the meantime, I'm assuming we can trust the relative performance reported by 3DMark, even if the units in which it's reported are plainly wrong. Let's hope I'm right about that.