The theory and practice
You've already heard me ramble on at length about texture filtering and memory bandwidth, and you may be wondering why. Well, specs are pretty important in graphics cards, even to this day. No, they're not destiny: a more efficient architecture might outperform a less efficient one, even if the latter had higher peak performance in theory. In fact, it happens all the time. But constraints like memory bandwidth do tend to dictate relative performance, especially among similar products from the same GPU maker. Below, I've compiled some key numbers for the cards we're testing and a few of the higher-end ones we're not, so you can get a sense of the landscape. Please note that these numbers are based on the actual clock speeds of the cards we're testing, not the "stock" clocks established by the GPU makers for each GPU type.
| | Peak pixel fill rate (Gpixels/s) | Peak bilinear texel filtering rate (Gtexels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) |
|---|---|---|---|---|
| GeForce 9500 GT | 4.4 | 8.8 | 4.4 | 25.6 |
| GeForce 9600 GSO | 6.7 | 26.6 | 13.3 | 38.5 |
| GeForce 9600 GT | 11.6 | 23.2 | 11.6 | 62.2 |
| GeForce 9800 GT | 9.6 | 33.6 | 16.8 | 57.6 |
| GeForce 9800 GTX+ | 11.8 | 47.2 | 23.6 | 70.4 |
| GeForce 9800 GX2 | 19.2 | 76.8 | 38.4 | 128.0 |
| GeForce GTX 260 | 16.1 | 36.9 | 18.4 | 111.9 |
| GeForce GTX 280 | 19.3 | 48.2 | 24.1 | 141.7 |
| Radeon HD 4650 | 4.8 | 19.2 | 9.6 | 16.0 |
| Radeon HD 4670 | 6.0 | 24.0 | 12.0 | 32.0 |
| Radeon HD 3850 | 11.6 | 11.6 | 11.6 | 57.6 |
| Radeon HD 4850 | 10.0 | 25.0 | 12.5 | 63.6 |
| Radeon HD 4870 | 12.0 | 30.0 | 15.0 | 115.2 |
| Radeon HD 4870 X2 | 24.0 | 60.0 | 30.0 | 230.4 |
Notice the tremendous range we're looking at between the cheapest video cards and the most expensive. The Radeon HD 4870 X2 has nine times the memory bandwidth of the GeForce 9500 GT, and we're using the more expensive version of the 9500 GT with GDDR3. There is a real sense in which you get what you pay for when you buy a graphics card. The key specs do tend to track with price.
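If you'd like to check that spread for yourself, here's a minimal Python sketch that ranks the cards by peak memory bandwidth and expresses each one as a multiple of the GeForce 9500 GT. The bandwidth figures are copied straight from the table; the names `memory_bandwidth_gbps` and `bandwidth_ratio` are just my own labels for illustration.

```python
# Peak memory bandwidth in GB/s, copied from the table above.
memory_bandwidth_gbps = {
    "GeForce 9500 GT": 25.6,
    "GeForce 9600 GSO": 38.5,
    "GeForce 9600 GT": 62.2,
    "GeForce 9800 GT": 57.6,
    "GeForce 9800 GTX+": 70.4,
    "GeForce 9800 GX2": 128.0,
    "GeForce GTX 260": 111.9,
    "GeForce GTX 280": 141.7,
    "Radeon HD 4650": 16.0,
    "Radeon HD 4670": 32.0,
    "Radeon HD 3850": 57.6,
    "Radeon HD 4850": 63.6,
    "Radeon HD 4870": 115.2,
    "Radeon HD 4870 X2": 230.4,
}


def bandwidth_ratio(card_a, card_b):
    """Return how many times card_a's peak bandwidth exceeds card_b's."""
    return memory_bandwidth_gbps[card_a] / memory_bandwidth_gbps[card_b]


# Sort from the most bandwidth to the least.
ranked = sorted(memory_bandwidth_gbps, key=memory_bandwidth_gbps.get, reverse=True)

for name in ranked:
    ratio = bandwidth_ratio(name, "GeForce 9500 GT")
    print(f"{name:<20} {memory_bandwidth_gbps[name]:6.1f} GB/s  {ratio:4.1f}x the 9500 GT")
```

One thing the ranking makes plain: going by the table, the Radeon HD 4650 sits even lower than the 9500 GT on this measure.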
The Radeon HD 4670 is an important baseline for us because, at 80 bucks, it easily has more texture filtering capacity than the pricier Radeon HD 3850, and its new architecture is almost assuredly more efficient, too. Heck, the 4670 has nearly as much filtering capacity as the 4850. The 3850 and 4850, though, both have quite a bit more memory bandwidth, about twice as much. I'm intrigued to see whether the 4670 can overcome that deficit. If it can, at least in part, it will signal something important about the viability of low-end graphics cards.
The 4670's would-be competition from Nvidia, the rebate-driven GeForce 9600 GSO, boasts slightly higher capacities in every category than the 4670. That will make for interesting times. Here's what happens when we test these things with a directed benchmark.
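To put rough numbers on those comparisons, here's a small Python sketch that divides each rival's peak figures by the 4670's. The specs come from the table above; `SPECS`, `CATEGORIES`, and `relative_to` are names I've made up for the example, and the ratios are only as good as the theoretical peaks they're built from.

```python
# (pixel fill Gpixels/s, bilinear Gtexels/s, FP16 Gtexels/s, bandwidth GB/s),
# all copied from the table above.
SPECS = {
    "Radeon HD 4670": (6.0, 24.0, 12.0, 32.0),
    "Radeon HD 3850": (11.6, 11.6, 11.6, 57.6),
    "Radeon HD 4850": (10.0, 25.0, 12.5, 63.6),
    "GeForce 9600 GSO": (6.7, 26.6, 13.3, 38.5),
}
CATEGORIES = ("pixel fill", "bilinear filtering", "FP16 filtering", "memory bandwidth")


def relative_to(baseline, rival):
    """Return the rival's peak rates as multiples of the baseline card's."""
    return {
        category: rival_value / base_value
        for category, base_value, rival_value in zip(
            CATEGORIES, SPECS[baseline], SPECS[rival]
        )
    }


for rival in ("Radeon HD 3850", "Radeon HD 4850", "GeForce 9600 GSO"):
    ratios = relative_to("Radeon HD 4670", rival)
    summary = ", ".join(f"{cat}: {ratio:.2f}x" for cat, ratio in ratios.items())
    print(f"{rival} vs. Radeon HD 4670 -> {summary}")
```

A ratio above 1.0 means the rival has the higher theoretical peak in that category; below 1.0, the 4670 does.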


3DMark's color fill rate test measures the graphics card's ability to draw pixels, essentially, and we've found that this test tends to be limited by memory bandwidth more than anything else. The cards are largely true to form here, with the exception that the Radeon HD 4850 edges ahead of the GeForce 9800 GTX+.
The texture fill test is arguably more important for performance in many games, and it's typically less limited by memory bandwidth alone. In fact, the Radeon HD 4670 outperforms both the Radeon HD 3850 and the GeForce 9600 GT in this test, despite having less memory bandwidth. Still, the 4670 can't keep pace with the 4850; the 4850 almost doubles the 4670's texturing throughput, just as it has about double the memory bandwidth.
And no, the units in the texture fill rate test don't seem to track with our expectations at all; they seem to be off by miles. I've contacted the folks at FutureMark about this problem repeatedly, and they've told me repeatedly that the people who might address it are on vacation. This has been going on since, oh, June-ish, so I suggest you apply for your job at FutureMark today. The benefits are excellent.
The table to the left shows the next piece of the GPU performance picture, and increasingly the most important one: shader processing power. We've split these theoretical peak numbers into two columns in order to allow room for a quirk of Nvidia's shader processors: they can issue an additional multiply instruction in certain cases, raising their theoretical peak shader arithmetic capacity by half, from two operations per clock to three. They can't use this additional MUL in every situation, though, and older GeForces can't use it as often as the newer GTX 200 series due to some architectural constraints.
That said, the Radeons have some constraints of their own, including the arguably more difficult instruction scheduling required by their five-ALU-wide superscalar execution units. So, of course, we'll want to measure shader power with a few directed tests, as well.
To set the stage for that, note that the Radeon HD 4670 again is a potential overachiever. Its 480 GFLOPS of peak shader power match the single-issue numbers for the GeForce GTX 260, a much more expensive card. The GeForce 9600 GSO, the would-be competitor to the 4670, trails it by quite a bit, theoretically. However, the 4670's relatively weak memory bandwidth could hold it back.
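For the curious, theoretical shader peaks like these are usually worked out as stream processors times shader clock times floating-point operations per clock. Here's a rough Python sketch of that arithmetic. Fair warning: the ALU counts and clock speeds below are my assumptions (commonly quoted stock figures), not numbers taken from this article, and the helper `peak_gflops` is hypothetical. Only the 480 GFLOPS result for the 4670 is stated in the text.

```python
def peak_gflops(alus, clock_mhz, flops_per_clock):
    """Theoretical peak = ALUs x clock (MHz) x FLOPs per ALU per clock, in GFLOPS."""
    return alus * clock_mhz * flops_per_clock / 1000.0


MAD = 2           # a multiply-add counts as two floating-point operations
MAD_PLUS_MUL = 3  # Nvidia's extra MUL, when it can be issued, adds a third

# ASSUMPTION: 320 stream processors at 750 MHz for the Radeon HD 4670.
radeon_4670 = peak_gflops(320, 750, MAD)

# ASSUMPTION: 192 stream processors at a 1242 MHz shader clock for the GTX 260.
gtx_260_single = peak_gflops(192, 1242, MAD)
gtx_260_dual = peak_gflops(192, 1242, MAD_PLUS_MUL)

print(f"Radeon HD 4670:              {radeon_4670:.1f} GFLOPS")
print(f"GeForce GTX 260 (MAD only):  {gtx_260_single:.1f} GFLOPS")
print(f"GeForce GTX 260 (MAD + MUL): {gtx_260_dual:.1f} GFLOPS")
```

Under those assumptions, the 4670 and the single-issue GTX 260 land within a few GFLOPS of each other, which is the sense in which the two "match." Cards running at non-stock clocks will come out a little differently.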




That appears to be just what happens. Despite having a little more theoretical shader prowess than the Radeon HD 3850, the 4670 trails the 3850 in each one of the shader tests. The GeForce 9600 GSO also has a leg up on the 4670 in each case.
All of this sets the stage nicely for what comes next, which is real game tests. Games will most likely care less about any single one of these performance factors. Instead, each one will stress its own distinct mix of them. The question is: how will our cheap graphics cards, with their obvious strengths and weaknesses, hold up overall?