
Doing the math

| Card | Peak pixel fill rate (Gpixels/s) | Peak bilinear texel filtering rate (Gtexels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) | Peak shader arithmetic (GFLOPS) |
|---|---|---|---|---|---|
| GeForce 9600 GT | 10.4 | 20.8 | 10.4 | 57.6 | 208 |
| Palit GeForce 9600 GT | 11.2 | 22.4 | 11.2 | 64.0 | 224 |
| GeForce 8800 GT | 9.6 | 33.6 | 16.8 | 57.6 | 336 |
| GeForce 8800 GTS | 10.0 | 12.0 | 12.0 | 64.0 | 230 |
| GeForce 8800 GTS 512 | 10.4 | 41.6 | 20.8 | 62.1 | 416 |
| GeForce 8800 GTX | 13.8 | 18.4 | 18.4 | 86.4 | 346 |
| GeForce 8800 Ultra | 14.7 | 19.6 | 19.6 | 103.7 | 384 |
| GeForce 9800 GTX | 10.8 | 43.2 | 21.6 | 70.4 | 432 |
| GeForce 9800 GX2 | 19.2 | 76.8 | 38.4 | 128.0 | 768 |
| Radeon HD 2900 XT | 11.9 | 11.9 | 11.9 | 105.6 | 475 |
| Radeon HD 3850 | 10.7 | 10.7 | 10.7 | 53.1 | 429 |
| Radeon HD 3870 | 12.4 | 12.4 | 12.4 | 72.0 | 496 |
| Diamond Radeon HD 3870 1GB | 13.3 | 13.3 | 13.3 | 55.7 | 531 |
| Radeon HD 3870 X2 | 26.4 | 26.4 | 26.4 | 115.2 | 1056 |

The table above shows how Diamond's HD 3870 1GB stacks up in theoretical terms. Its higher GPU core clock grants it more fill rate and shader power than the stock HD 3870, although its lower memory clock cuts bandwidth considerably. The 1GB card's memory bandwidth is still comparable to that of its GeForce 9600 GT and 8800 GT competition, though.
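To put rough numbers on that comparison, here is a minimal Python sketch that computes percentage differences between cards. It uses only the peak figures from the table; the `PEAKS` layout and the `percent_delta` helper are illustrative names of my own, not part of any benchmarking tool.

```python
# Theoretical peaks copied from the table above.
# Units: fill = Gpixels/s, texel = Gtexels/s, bandwidth = GB/s, gflops = GFLOPS.
PEAKS = {
    "Radeon HD 3870": {
        "fill": 12.4, "texel": 12.4, "bandwidth": 72.0, "gflops": 496,
    },
    "Diamond Radeon HD 3870 1GB": {
        "fill": 13.3, "texel": 13.3, "bandwidth": 55.7, "gflops": 531,
    },
    "GeForce 9600 GT": {
        "fill": 10.4, "texel": 20.8, "bandwidth": 57.6, "gflops": 208,
    },
    "GeForce 8800 GT": {
        "fill": 9.6, "texel": 33.6, "bandwidth": 57.6, "gflops": 336,
    },
}


def percent_delta(card, baseline, metric):
    """Return how far `card` sits above (+) or below (-) `baseline`, in percent."""
    value = PEAKS[card][metric]
    reference = PEAKS[baseline][metric]
    return (value - reference) / reference * 100.0


if __name__ == "__main__":
    diamond = "Diamond Radeon HD 3870 1GB"
    for metric in ("fill", "gflops", "bandwidth"):
        delta = percent_delta(diamond, "Radeon HD 3870", metric)
        print(f"{metric:>9} vs. stock HD 3870: {delta:+.1f}%")
    for rival in ("GeForce 9600 GT", "GeForce 8800 GT"):
        delta = percent_delta(diamond, rival, "bandwidth")
        print(f"bandwidth vs. {rival}: {delta:+.1f}%")
```

By these figures, the Diamond card has roughly 7% more fill rate and shader arithmetic than the stock HD 3870 and roughly 23% less memory bandwidth, which still leaves it within a few percent of the 9600 GT and 8800 GT.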

The more interesting question here involves overall performance. Not to give too much away, but the 3870 has somewhat underachieved versus the 9600 GT and 8800 GT given its raw shader FLOPS capacity. Why is that? One possibility is that the RV670 GPU's five-wide superscalar execution units don't process data as efficiently as Nvidia's scalar units. I'm not sold on that explanation, though. AMD has implemented all sorts of voodoo magic in its driver compiler, including serializing a pixel shader program for execution on that fifth ALU while another executes in vector fashion on the other four ALUs. Also, the performance of the 9600 GT argues against shader power being a primary constraint in today's games. The more likely explanations involve the RV670's relatively weak texturing capacity and the fact that R6x0-series GPUs (either by design or because of a rumored flaw in the ROP logic) cannot perform the resolve step for multisampled antialiasing in their ROP hardware; they must use the shader core for this task.
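One rough way to see the texturing imbalance is to divide each card's peak shader arithmetic by its peak bilinear filtering rate. The sketch below does that with figures taken straight from the table above; the `flops_per_texel` helper and the `CARDS` list are hypothetical names for illustration, and the ratio is only a back-of-the-envelope indicator, not a performance model.

```python
# (card, peak bilinear texel rate in Gtexels/s, peak shader arithmetic in GFLOPS)
# Values copied from the table above.
CARDS = [
    ("GeForce 9600 GT", 20.8, 208),
    ("GeForce 8800 GT", 33.6, 336),
    ("Radeon HD 3870", 12.4, 496),
    ("Diamond Radeon HD 3870 1GB", 13.3, 531),
]


def flops_per_texel(name):
    """Peak FLOPS available per bilinear-filtered texel for the named card."""
    for card, gtexels, gflops in CARDS:
        if card == name:
            # Giga cancels out: GFLOPS / (Gtexels/s) = FLOPs per texel.
            return gflops / gtexels
    raise KeyError(name)


if __name__ == "__main__":
    for card, _, _ in CARDS:
        print(f"{card}: {flops_per_texel(card):.1f} FLOPs per texel")
```

On paper, the Radeons have about four times as much shader arithmetic per filtered texel as the two GeForces (roughly 40 FLOPs per texel versus 10). That fits the idea that texturing, not math, is the scarcer resource on the RV670.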

Another possibility, I suppose, is that the RV670 doesn't compress and manage memory as efficiently as the GeForces do. If so, Diamond's 1GB card may be an answer.

3DMark lets us measure performance in some of our theoretical categories. In practice, sheer pixel throughput tends to be limited by memory bandwidth, which is why Diamond's 1GB card scores lower in single-texture fill rate than the 512MB GDDR4 version of the HD 3870. Multitextured fill rate hits no such limits; the 1GB card nearly reaches its theoretical peak capacity. However, that capacity is appreciably lower than the GeForce 9600 GT's, let alone the 8800 GT's.

The 3870 1GB shows its shader power, mixing it up with the GeForce cards from test to test. One intriguing result: the stock Radeon HD 3870's performance suffers in the simple vertex shader test, likely due to GDDR4's higher access latencies. With its GDDR3 memory, Diamond's 3870 1GB avoids that fate.

As ever, these results don't track perfectly with performance in actual games, although they do give us some insight. For gaming performance, we have... actual games.