The theory—and practice
You've already heard me ramble on at length about texture filtering and memory bandwidth, and you may be wondering why. Well, specs are pretty important in graphics cards, even to this day. No, they're not destiny—a more efficient architecture might outperform a less efficient one, even if the latter had higher peak performance in theory. In fact, it happens all the time. But constraints like memory bandwidth do tend to dictate relative performance, especially among similar products from the same GPU maker. Below, I've compiled some key numbers for the cards we're testing and a few of the higher-end ones we're not, so you can get a sense of the landscape. Please note that these numbers are based on the actual clock speeds of the cards we're testing, not the "stock" clocks established by the GPU makers for each GPU type.

Card | Peak pixel fill rate (Gpixels/s) | Peak bilinear texel filtering rate (Gtexels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s)
GeForce 9500 GT | 4.4 | 8.8 | 4.4 | 25.6
GeForce 9600 GSO | 6.7 | 26.6 | 13.3 | 38.5
GeForce 9600 GT | 11.6 | 23.2 | 11.6 | 62.2
GeForce 9800 GT | 9.6 | 33.6 | 16.8 | 57.6
GeForce 9800 GTX+ | 11.8 | 47.2 | 23.6 | 70.4
GeForce 9800 GX2 | 19.2 | 76.8 | 38.4 | 128.0
GeForce GTX 260 | 16.1 | 36.9 | 18.4 | 111.9
GeForce GTX 280 | 19.3 | 48.2 | 24.1 | 141.7
Radeon HD 4650 | 4.8 | 19.2 | 9.6 | 16.0
Radeon HD 4670 | 6.0 | 24.0 | 12.0 | 32.0
Radeon HD 3850 | 11.6 | 11.6 | 11.6 | 57.6
Radeon HD 4850 | 10.0 | 25.0 | 12.5 | 63.6
Radeon HD 4870 | 12.0 | 30.0 | 15.0 | 115.2
Radeon HD 4870 X2 | 24.0 | 60.0 | 30.0 | 230.4
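Each number in the table is simply a unit count multiplied by a clock speed. The Python sketch below reproduces the Radeon HD 4670 row. It rests on assumptions this article doesn't state: 8 ROPs, 32 texture units, a 750 MHz core clock, FP16 filtering at half the bilinear rate, and a 128-bit memory interface running at 2.0 GT/s. `GpuConfig` and `peak_rates` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class GpuConfig:
    # Hypothetical description of one card; the values used below are assumptions.
    rops: int                # pixels written per clock
    texture_units: int       # bilinear texels filtered per clock
    core_clock_ghz: float
    fp16_rate_divisor: int   # 2 = FP16 filtering at half the bilinear rate
    bus_width_bits: int
    memory_rate_gtps: float  # effective memory transfers per second, in billions

def peak_rates(cfg: GpuConfig) -> dict:
    """Theoretical peaks, in the same units as the table."""
    texel_rate = cfg.texture_units * cfg.core_clock_ghz
    return {
        "pixel_fill_gpixels": cfg.rops * cfg.core_clock_ghz,
        "bilinear_gtexels": texel_rate,
        "fp16_gtexels": texel_rate / cfg.fp16_rate_divisor,
        # bus width in bits / 8 = bytes moved per transfer
        "bandwidth_gb_per_s": cfg.bus_width_bits / 8 * cfg.memory_rate_gtps,
    }

hd4670 = GpuConfig(rops=8, texture_units=32, core_clock_ghz=0.75,
                   fp16_rate_divisor=2, bus_width_bits=128, memory_rate_gtps=2.0)
for name, value in peak_rates(hd4670).items():
    print(f"{name}: {value:.1f}")
```

The FP16 divisor is a parameter because it varies by chip. The Radeon HD 3850 row, for instance, shows the same figure for bilinear and FP16 filtering.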

Notice the tremendous range between the cheapest video cards and the most expensive. The Radeon HD 4870 X2 has nine times the memory bandwidth of the GeForce 9500 GT, and we're using the more expensive, GDDR3-equipped version of the 9500 GT. In a very real sense, you get what you pay for when you buy a graphics card: the key specs do tend to track with price.
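The spread is easy to quantify from the table. This short Python sketch divides the Radeon HD 4870 X2's peak figures by the GeForce 9500 GT's, column by column. The helper name `spread` is purely illustrative.

```python
# Peak figures copied from the table: (pixel fill, bilinear, FP16, bandwidth)
SPECS = {
    "GeForce 9500 GT": (4.4, 8.8, 4.4, 25.6),
    "Radeon HD 4870 X2": (24.0, 60.0, 30.0, 230.4),
}
COLUMNS = ("pixel fill", "bilinear texel", "FP16 texel", "memory bandwidth")

def spread(high: str, low: str) -> dict:
    """Ratio of one card's peak rates to another's, column by column."""
    return {col: h / l for col, h, l in zip(COLUMNS, SPECS[high], SPECS[low])}

for col, ratio in spread("Radeon HD 4870 X2", "GeForce 9500 GT").items():
    print(f"{col}: {ratio:.1f}x")
```

By this arithmetic, the bandwidth gap works out to exactly nine times. The other columns come in between roughly five and seven times.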

The Radeon HD 4670 is an important baseline for us because, at 80 bucks, it has roughly twice the texture filtering capacity of the pricier Radeon HD 3850, and its newer architecture is almost assuredly more efficient, too. Heck, the 4670 has nearly as much filtering capacity as the 4850. The 3850 and 4850, though, both have considerably more memory bandwidth: about twice as much. I'm intrigued to see whether the 4670 can overcome that deficit. If it can, even in part, that will say something important about the viability of low-end graphics cards.
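One way to frame the 4670's balance is to divide its peak filtering rate by its peak bandwidth. This ratio is an illustrative metric of our own, not one used elsewhere in this review. A higher value means the chip can filter more texels per byte it fetches, which also makes it more likely to stall on memory before it runs out of texture units.

```python
# Peak bilinear rate (Gtexels/s) and peak memory bandwidth (GB/s), from the table
CARDS = {
    "Radeon HD 4670": (24.0, 32.0),
    "Radeon HD 3850": (11.6, 57.6),
    "Radeon HD 4850": (25.0, 63.6),
}

def texels_per_byte(texel_rate: float, bandwidth: float) -> float:
    """Filtering capacity per byte of memory traffic (the giga prefixes cancel)."""
    return texel_rate / bandwidth

for name, (texels, bandwidth) in CARDS.items():
    print(f"{name}: {texels_per_byte(texels, bandwidth):.2f} texels/byte")
```

The 4670 lands at 0.75 texels per byte. That is nearly double the 4850's figure of about 0.39 and more than triple the 3850's figure of about 0.20.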

The 4670's would-be competitor from Nvidia, the rebate-driven GeForce 9600 GSO, has slightly higher peak rates than the 4670 in every category. That should make for interesting times. Here's what happens when we test these cards with a directed benchmark.
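Here is how large "slightly higher" is. This Python sketch computes the 9600 GSO's percentage advantage over the 4670 in each column of the table. The function name `advantage_pct` is illustrative.

```python
COLUMNS = ("pixel fill", "bilinear texel", "FP16 texel", "memory bandwidth")
GSO_9600 = (6.7, 26.6, 13.3, 38.5)   # GeForce 9600 GSO row of the table
HD_4670 = (6.0, 24.0, 12.0, 32.0)    # Radeon HD 4670 row of the table

def advantage_pct(a, b):
    """Percentage by which each of a's peaks exceeds the matching peak in b."""
    return [100.0 * (x / y - 1.0) for x, y in zip(a, b)]

gains = advantage_pct(GSO_9600, HD_4670)
for col, pct in zip(COLUMNS, gains):
    print(f"{col}: {pct:+.1f}%")
print("higher in every category:", all(p > 0 for p in gains))
```

The GSO leads by about 11 percent in the three fill-rate columns and by about 20 percent in memory bandwidth.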

3DMark's color fill rate test measures the graphics card's ability to draw pixels, essentially, and we've found that this test tends to be limited by memory bandwidth more than anything else. The cards are largely true to form here, with the exception that the Radeon HD 4850 edges ahead of the GeForce 9800 GTX+.
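A rough way to see why a color fill test can end up bandwidth-bound is to cap each card's peak pixel rate by the number of pixels its memory bus can carry. This Python sketch assumes 4 bytes per pixel for a plain 32-bit color write, and 8 bytes when blending forces a read as well as a write. Both figures are our assumptions about the workload, not details of how 3DMark's test works. The sketch also ignores color compression and caching, so treat its output as an illustration rather than a prediction of scores.

```python
def effective_fill(peak_gpixels: float, bandwidth_gbs: float, bytes_per_pixel: int) -> float:
    """Lower of the ROP limit and the memory-bandwidth limit, in Gpixels/s."""
    return min(peak_gpixels, bandwidth_gbs / bytes_per_pixel)

# Peak pixel fill (Gpixels/s) and peak bandwidth (GB/s), from the table
CARDS = {
    "Radeon HD 4670": (6.0, 32.0),
    "GeForce 9600 GSO": (6.7, 38.5),
    "Radeon HD 4850": (10.0, 63.6),
}

# Assumed per-pixel memory traffic: 4 B = write only, 8 B = read plus write (blending)
for name, (fill, bw) in CARDS.items():
    for bpp in (4, 8):
        print(f"{name}, {bpp} B/pixel: {effective_fill(fill, bw, bpp):.1f} Gpixels/s")
```

Under the 8-byte assumption, every card in this sketch hits the bandwidth cap before its ROP limit. That fits the observation that this test tends to track memory bandwidth.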

The texture fill test is arguably more important for performance in many games, and it's typically less limited by memory bandwidth alone. In fact, the Radeon HD 4670 outperforms both the Radeon HD 3850 and the GeForce 9600 GT in this test, despite having less memory bandwidth. Still, the 4670 can't keep pace with the 4850; the 4850 almost doubles the 4670's texturing throughput, just as it has about double the memory bandwidth.

Card | Peak shader arithmetic, single-issue (GFLOPS) | Peak shader arithmetic, dual-issue (GFLOPS)
GeForce 9500 GT | 90 | 134
GeForce 9600 GSO | 259 | 389
GeForce 9600 GT | 237 | 355
GeForce 9800 GT | 339 | 508
GeForce 9800 GTX+ | 470 | 705
GeForce 9800 GX2 | 768 | 1152
GeForce GTX 260 | 477 | 715
GeForce GTX 280 | 622 | 933
Radeon HD 4650 | 384 | -
Radeon HD 4670 | 480 | -
Radeon HD 3850 | 464 | -
Radeon HD 4850 | 1000 | -
Radeon HD 4870 | 1200 | -
Radeon HD 4870 X2 | 2400 | -

Back to the texture fill rate test for a moment. And no, its units don't track with our expectations at all; they seem to be off by miles. I've contacted the folks at Futuremark about this problem repeatedly, and they've told me repeatedly that the people who might address it are on vacation. This has been going on since, oh, June-ish, so I suggest you apply for a job at Futuremark today. The benefits are excellent.

The shader arithmetic table above shows the next piece of the GPU performance picture, and increasingly the most important one: shader processing power. We've split these theoretical peak numbers into two columns to account for a quirk of Nvidia's shader processors. In certain cases, they can issue an additional multiply instruction, which raises their theoretical peak shader arithmetic capacity by half. They can't use this additional MUL in every situation, though. Because of some architectural constraints, older GeForces can use it less often than the newer GTX 200 series can.
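Peak GFLOPS is the number of shader units times the shader clock times the flops each unit completes per clock. A multiply-add counts as two flops. Nvidia's co-issued MUL adds a third, which is why each dual-issue figure in the table is one and a half times the single-issue figure. The unit counts and clocks in this Python sketch are assumptions not stated in this article: 240 stream processors at a 1296 MHz shader clock for the GTX 280, and 320 ALUs (64 five-wide units) at 750 MHz for the HD 4670.

```python
def peak_gflops(shader_units: int, shader_clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak = units x clock (GHz) x flops each unit retires per clock."""
    return shader_units * shader_clock_ghz * flops_per_clock

MAD = 2            # multiply-add counts as two flops
MAD_PLUS_MUL = 3   # Nvidia's extra co-issued MUL adds a third flop

# Assumed unit counts and clocks (not figures from this article)
gtx_280 = (240, 1.296)   # stream processors, shader clock in GHz
hd_4670 = (320, 0.750)   # ALUs (64 five-wide units), core clock in GHz

print(f"GTX 280 single-issue: {peak_gflops(*gtx_280, MAD):.0f}")
print(f"GTX 280 dual-issue:   {peak_gflops(*gtx_280, MAD_PLUS_MUL):.0f}")
print(f"HD 4670:              {peak_gflops(*hd_4670, MAD):.0f}")
```

Under these assumptions, the results match the table's 622, 933, and 480 GFLOPS.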

That said, the Radeons have some constraints of their own, including the arguably more difficult instruction scheduling required by their five-ALU-wide superscalar execution units. So, of course, we'll want to measure shader power with a few directed tests, as well.

To set the stage for that, note that the Radeon HD 4670 again looks like a potential overachiever. Its 480 GFLOPS of peak shader power essentially match the single-issue figure for the GeForce GTX 260 (477 GFLOPS), a much more expensive card. The GeForce 9600 GSO, the 4670's would-be competitor, trails it by quite a bit, at least in theory. However, the 4670's relatively weak memory bandwidth could hold it back.
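The tension between the 4670's shader power and its memory bus shows up when you divide peak GFLOPS by peak bandwidth. This is an illustrative ratio of our own, computed from the two tables, not a measured result. A higher value means a shader must do more math on each byte it fetches to keep the ALUs busy.

```python
# Single-issue peak GFLOPS and peak bandwidth (GB/s), from the two tables
CARDS = {
    "Radeon HD 4670": (480, 32.0),
    "Radeon HD 3850": (464, 57.6),
    "GeForce 9600 GSO": (259, 38.5),
    "GeForce GTX 260": (477, 111.9),
}

def flops_per_byte(gflops: float, bandwidth_gbs: float) -> float:
    """Peak arithmetic available per byte of peak memory traffic."""
    return gflops / bandwidth_gbs

for name, (gflops, bw) in CARDS.items():
    print(f"{name}: {flops_per_byte(gflops, bw):.1f} flops/byte")
```

By this measure, the 4670 needs about 15 flops per byte to stay compute-bound. The 3850 needs about 8 and the GTX 260 under 5. Shader workloads that lean heavily on memory are therefore where the 4670 would be expected to struggle.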

That appears to be just what happens. Despite having a little more theoretical shader prowess than the Radeon HD 3850, the 4670 trails the 3850 in each one of the shader tests. The GeForce 9600 GSO also has a leg up on the 4670 in each case.

All of this sets the stage nicely for what comes next: real game tests. Games are less likely to hinge on any single one of these performance factors. Instead, each game will stress its own mix of them. The question is: how will our cheap graphics cards, with their obvious strengths and weaknesses, hold up overall?