Doing the math
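|Card||Peak pixel fill rate (Gpixels/s)||Peak bilinear texel filtering rate (Gtexels/s)||Peak bilinear FP16 texel filtering rate (Gtexels/s)||Peak memory bandwidth (GB/s)||Peak shader arithmetic (GFLOPS)|
|GeForce 9600 GT||10.4||20.8||10.4||57.6||208|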
|Palit GeForce 9600 GT||11.2||22.4||11.2||64.0||224|
|GeForce 8800 GT||9.6||33.6||16.8||57.6||336|
|GeForce 8800 GTS||10.0||12.0||12.0||64.0||230|
|GeForce 8800 GTS 512||10.4||41.6||20.8||62.1||416|
|GeForce 8800 GTX||13.8||18.4||18.4||86.4||346|
|GeForce 8800 Ultra||14.7||19.6||19.6||103.7||384|
|GeForce 9800 GTX||10.8||43.2||21.6||70.4||432|
|GeForce 9800 GX2||19.2||76.8||38.4||128.0||768|
|Radeon HD 2900 XT||11.9||11.9||11.9||105.6||475|
|Radeon HD 3850||10.7||10.7||10.7||53.1||429|
|Radeon HD 3870||12.4||12.4||12.4||72.0||496|
|Diamond Radeon HD 3870 1GB||13.3||13.3||13.3||55.7||531|
|Radeon HD 3870 X2||26.4||26.4||26.4||115.2||1056|
The table above shows how Diamond's HD 3870 1GB stacks up in theoretical terms. Its higher GPU core clock grants it more fill rate and shader power than the stock HD 3870, although its lower memory clock cuts bandwidth considerably. The 1GB card's memory bandwidth is still comparable to that of its GeForce 9600 GT and 8800 GT competition, though.
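For readers who want to see where numbers like these come from, here is a minimal sketch of the arithmetic. The unit counts, clock speeds, and the figure of two FLOPS per stream processor per clock are assumptions taken from commonly cited reference specs rather than from this article, and the `GpuSpec` and `peak_rates` names are purely illustrative. With those inputs, the results line up with the stock Radeon HD 3870 and GeForce 9600 GT rows above.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    # All values below are assumed reference specs, not figures from the article.
    name: str
    core_mhz: float          # clock that drives the ROPs and texture units
    shader_mhz: float        # clock that drives the stream processors
    rops: int                # pixels written per core clock
    texture_units: int       # bilinear texels filtered per core clock
    stream_processors: int
    flops_per_sp_clock: int  # assumed 2 (one multiply-add per clock)
    bus_width_bits: int
    memory_mtps: float       # effective memory data rate, millions of transfers/s


def peak_rates(gpu: GpuSpec) -> dict:
    """Return theoretical peaks: units per clock multiplied by clock speed."""
    return {
        "pixel_fill_gpixels": gpu.rops * gpu.core_mhz / 1000,
        "texel_rate_gtexels": gpu.texture_units * gpu.core_mhz / 1000,
        "bandwidth_gbs": gpu.bus_width_bits / 8 * gpu.memory_mtps / 1000,
        "shader_gflops": gpu.stream_processors
        * gpu.flops_per_sp_clock
        * gpu.shader_mhz
        / 1000,
    }


hd3870 = GpuSpec("Radeon HD 3870", 775, 775, 16, 16, 320, 2, 256, 2250)
gf9600gt = GpuSpec("GeForce 9600 GT", 650, 1625, 16, 32, 64, 2, 256, 1800)

for gpu in (hd3870, gf9600gt):
    rates = peak_rates(gpu)
    print(gpu.name, {key: round(value, 1) for key, value in rates.items()})
```

The point of the exercise is that each column is simply a unit count multiplied by a clock speed, so a higher core clock paired with a lower memory clock moves the fill rate and bandwidth columns in opposite directions.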
The more interesting question here involves overall performance. Not to give too much away, but the 3870 has somewhat underachieved versus the 9600 GT and 8800 GT given its raw shader FLOPS capacity. Why is that? One possibility is that the RV670 GPU's five-wide superscalar execution units don't process data as efficiently as Nvidia's scalar units. I'm not sold on that explanation, though. AMD has implemented all sorts of voodoo magic in its driver compiler, including serializing a pixel shader program for execution on that fifth ALU while another executes in vector fashion on the other four ALUs. Also, the performance of the 9600 GT argues against shader power being a primary constraint in today's games. The more likely explanations involve the RV670's relatively weak texturing capacity and the fact that R6x0-series GPUs, either by design or because of a rumored flaw in the ROP logic, cannot perform the resolve step for multisampled antialiasing in their ROP hardware; they must use the shader core for this task.
Another possibility, I suppose, is that the RV670 doesn't compress and manage memory as efficiently as the GeForces do. If so, Diamond's 1GB card may be an answer.
3DMark lets us measure performance in some of our theoretical categories. In practice, sheer pixel throughput tends to be limited by memory bandwidth, which is why Diamond's 1GB card scores lower in single-texture fill rate than the 512MB GDDR4 version of the HD 3870. Multitextured fill rate hits no such limits; the 1GB card nearly reaches its theoretical peak capacity. However, that capacity is appreciably lower than the GeForce 9600 GT's, let alone the 8800 GT's.
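A rough back-of-the-envelope check shows why bandwidth can become the limit for single-texture fill. The sketch below uses the peak pixel fill rate and memory bandwidth figures from the table and assumes, purely for illustration, that each pixel costs one uncompressed 32-bit (4-byte) color write with no Z or texture traffic. Real hardware compresses data and moves other traffic too, so this is not a prediction of 3DMark scores, and the function names are hypothetical.

```python
# Assumption: one uncompressed 32-bit color write per pixel, nothing else.
BYTES_PER_PIXEL = 4

# (peak pixel fill rate in Gpixels/s, peak memory bandwidth in GB/s), from the table
cards = {
    "Radeon HD 3870": (12.4, 72.0),
    "Diamond Radeon HD 3870 1GB": (13.3, 55.7),
}


def bandwidth_needed_gbs(fill_gpixels: float) -> float:
    """Bandwidth required to sustain the peak fill rate under the assumption above."""
    return fill_gpixels * BYTES_PER_PIXEL


def bandwidth_share(fill_gpixels: float, bandwidth_gbs: float) -> float:
    """Fraction of peak memory bandwidth consumed by color writes at peak fill."""
    return bandwidth_needed_gbs(fill_gpixels) / bandwidth_gbs


for name, (fill, bandwidth) in cards.items():
    share = bandwidth_share(fill, bandwidth)
    print(f"{name}: needs {bandwidth_needed_gbs(fill):.1f} of {bandwidth:.1f} GB/s "
          f"({share:.0%} of peak)")
```

Under that simplified assumption, color writes alone would consume nearly all of the Diamond card's peak bandwidth at its theoretical fill rate, while the stock GDDR4 card has considerably more headroom.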
The 3870 1GB shows its shader power, mixing it up with the GeForce cards from test to test. One intriguing result: the stock Radeon HD 3870's performance suffers in the simple vertex shader test, likely due to GDDR4's higher access latencies. With its GDDR3 memory, Diamond's 3870 1GB avoids that fate.
As ever, these results don't track perfectly with performance in actual games, although they do give us some insight. For gaming performance, we have... actual games.