The best way to solve these shader performance disputes, of course, is to test the chips. We have a few tests that may give us some insight into these matters.
The Radeon 2900 XT comes out looking good in 3DMark's vertex shader tests, solidly ahead of the GeForce 8800 GTX. Oddly, though, we've seen similar or better performance in this test out of a mid-range GeForce 8600 GTS than we see here from the 8800 GTX. The GTX may be limited by other factors here or simply not allocating all of its shader power to vertex processing.
This particle test runs a physics simulation in a shader, using vertex texture fetch to store and access the results. Here, the Radeon 2900 XT is slower than the 8800 GTX, but well ahead of the GTS. The Radeon X1950 XTX can't participate since it lacks vertex texture fetch.
Futuremark says the Perlin noise test "computes six octaves of 3-dimensional Perlin simplex noise using a combination of arithmetic instructions and texture lookups." They expect such things to become popular in future games for use in procedural modeling and texturing, although procedural texturing has always been right around the corner and never seems to make its way here. If and when it does, the R600 should be well prepared, because it runs this shader quite well.
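For readers wondering what "six octaves" means in practice, here is a rough Python sketch of the usual way octaves of noise are layered: each octave samples the noise at a higher frequency and adds it at a lower amplitude. This is not Futuremark's shader. The function names, the frequency doubling (lacunarity of 2.0), and the amplitude halving (gain of 0.5) are my assumptions, and stand_in_noise is a placeholder, not Perlin or simplex noise.

```python
import math
from typing import Callable

# Any function mapping a 3D point to a value in roughly [-1, 1].
Noise3 = Callable[[float, float, float], float]


def stand_in_noise(x: float, y: float, z: float) -> float:
    """Placeholder noise source. NOT Perlin or simplex noise."""
    return math.sin(12.9898 * x + 78.233 * y + 37.719 * z)


def fractal_noise(noise: Noise3, x: float, y: float, z: float,
                  octaves: int = 6, lacunarity: float = 2.0,
                  gain: float = 0.5) -> float:
    """Sum `octaves` layers of noise at rising frequency and falling amplitude."""
    total = 0.0
    frequency = 1.0
    amplitude = 1.0
    for _ in range(octaves):
        total += amplitude * noise(x * frequency, y * frequency, z * frequency)
        frequency *= lacunarity  # assumed: each octave doubles the frequency
        amplitude *= gain        # assumed: each octave halves the amplitude
    return total


if __name__ == "__main__":
    print(fractal_noise(stand_in_noise, 0.3, 0.7, 0.1))
```

Each octave costs one more noise evaluation per pixel, which is where the mix of arithmetic and texture lookups in the real shader would add up.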
Next up is a series of shaders in ye olde ShaderMark, a test that's been around forever but may yet offer some insights.
The Radeon HD 2900 XT lands somewhere north of the GeForce 8800 GTS, but it can't match the full-fledged G80 in ShaderMark generally.
ShaderMark also gives us an intriguing look at image quality by quantifying how closely each graphics card's output matches that of Microsoft's reference rasterizer for DirectX 9. We can't really quantify image quality, but this does tell us something about the computational precision and adherence to Microsoft's standards in these GPUs.
DirectX 10 has much tighter standards for image quality, and these DX10-class GPUs are remarkably close together, both overall and in individual shaders.
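To give a sense of what scoring a GPU's output against a reference image might involve, here is a minimal Python sketch that computes the mean squared error between two images. I don't know which metric ShaderMark actually uses, so treat the choice of mean squared error, the 8-bit RGB pixel format, and the function name as assumptions made for illustration.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed format: 8-bit RGB


def mean_squared_error(rendered: Sequence[Pixel],
                       reference: Sequence[Pixel]) -> float:
    """Average squared per-channel difference; 0.0 means a perfect match."""
    if len(rendered) != len(reference):
        raise ValueError("images must have the same number of pixels")
    if not rendered:
        raise ValueError("images must not be empty")
    total = 0
    for got, want in zip(rendered, reference):
        total += sum((g - w) ** 2 for g, w in zip(got, want))
    return total / (3 * len(rendered))


if __name__ == "__main__":
    reference = [(10, 10, 10), (200, 100, 50)]
    rendered = [(13, 10, 10), (200, 100, 50)]
    print(mean_squared_error(rendered, reference))  # 9 / 6 = 1.5
```

A lower score means the GPU's output sits closer to the reference, which speaks to precision rather than to how good the image looks.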
Finally, here's a last-minute addition to our shader tests courtesy of AMD. Apparently already aware of the trash talk going on about the potential scheduling pitfalls of its superscalar shading core, AMD sent out a simple set of DirectX 10 shader tests in order to prove a point. I decided to go ahead and run these tests and present you with the results, although the source of the benchmarks is not exactly an uninterested third party, to say the least. The results are informative, though, because they present some difficult scheduling cases for the R600 shader core. You can make of them what you will. First, the results, and then the test explanations:
1) "float MAD serial" - Dependant Scalar Instructions Basically this test issues a bunch of scalar MAD instructions that are sequentially executed. This way only one out of 5 slot of the super-scalar instruction could be utilized. This is absolutely the worst case that would rarely be seen in the real-world shaders.
2) "float4 MAD parallel" - Vector Instructions This test issues 2 sequences of MAD instructions operating on float4 vectors. The smart compiler in the driver is able to split 4D vectors among multiple instructions to fill all 5 slots. This case represents one of the best utilization cases and is quite representative of instruction chains that would be seen in many shaders. This also demontrates [sic] the flexibility of the architecture where not only trivial case like 3+2 or 4+1 can be handled.
3) "float SQRT serial" - Special Function This is a test that utilizes the 5th "supped up" [sic] scalar instruction slot that can execute regular (ADD, MUL, and etc.) instructions along with transcendental instructions.
4) "float 5-instruction issue" - Non Dependant Scalar Instructions This test has 5 different types of scalar instructions (MUL, MAD, MIN, MAX, SQRT), each with it's own operand data, that are co-issued into one super-scalar instruction. This represents a typical case where in-driver shader compiler is able to co-issue instructions for maximal efficiency. This again shows how efficiently instructions can be combined by the shader compiler.
5) "int MAD serial" - Dependant DX10 Integer Instructions This test shows the worst case scalar instruction issue with sequential execution. This is similar to test 1, but uses integer instructions instead of floating point ones.
6) "int4 MAD parallel" - DX10 Integer Vector Instructions Similar to test 2, however integer instructions are used instead of floating point ones.
The GeForce 8800 GTX is just under three times the speed of the Radeon HD 2900 XT in AMD's own worst-case scenario, the float MAD serial with dependencies preventing superscalar parallelism. From there, the R600 begins to look better. The example of the float4 MAD parallel is impressive, since AMD's compiler does appear to be making good use of the R600's potential when compared to G80. The next two floating-point tests make use of the "fat" ALU in the R600, and so the R600 looks quite good.
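The gap between the serial and parallel cases in the test descriptions above is easier to see with a toy model. The Python sketch below greedily packs scalar instructions into five-slot bundles, letting an instruction issue only once every value it reads was produced by an earlier bundle. It is not AMD's compiler: the packing rule, the register names, and the helper functions are all my own assumptions, and it ignores real constraints such as the fifth slot being the only one that handles transcendentals.

```python
from typing import List, Sequence, Tuple

# One scalar instruction: (destination register, source registers).
# Register names are illustrative; each destination is written only once.
Instr = Tuple[str, Tuple[str, ...]]

SLOTS = 5  # scalar slots per superscalar instruction, per AMD's descriptions


def pack(instrs: Sequence[Instr], slots: int = SLOTS) -> List[List[Instr]]:
    """Greedily pack instructions, in program order, into bundles."""
    produced = {dest for dest, _ in instrs}  # values computed by the shader
    done = set()                             # values from earlier bundles
    pending = list(instrs)
    bundles: List[List[Instr]] = []
    while pending:
        bundle: List[Instr] = []
        waiting: List[Instr] = []
        for instr in pending:
            _, srcs = instr
            # Sources never produced by any instruction count as shader inputs.
            ready = all(s in done or s not in produced for s in srcs)
            if ready and len(bundle) < slots:
                bundle.append(instr)
            else:
                waiting.append(instr)
        if not bundle:
            raise ValueError("circular dependency between instructions")
        done.update(dest for dest, _ in bundle)
        bundles.append(bundle)
        pending = waiting
    return bundles


def utilization(bundles: List[List[Instr]], slots: int = SLOTS) -> float:
    """Fraction of available slots that hold an instruction."""
    if not bundles:
        return 0.0
    return sum(len(b) for b in bundles) / (slots * len(bundles))


def serial_mads(count: int) -> List[Instr]:
    """A chain in which each MAD reads the previous MAD's result."""
    return [(f"t{i}", (f"t{i - 1}" if i else "x", "a", "b"))
            for i in range(count)]


def parallel_float4_mads(sequences: int, length: int) -> List[Instr]:
    """Several float4 MAD chains, expanded into four scalar lanes each."""
    lanes = [f"s{s}{c}" for s in range(sequences) for c in "xyzw"]
    return [(f"{lane}_{step}",
             (f"{lane}_{step - 1}" if step else f"{lane}_in", "a", "b"))
            for step in range(length) for lane in lanes]


if __name__ == "__main__":
    print(utilization(pack(serial_mads(10))))              # 0.2
    print(utilization(pack(parallel_float4_mads(2, 5))))   # 1.0
```

In this model, ten dependent MADs need ten bundles and fill one slot in five, while two float4 chains of five MADs each (40 scalar operations) fit in eight full bundles. Real shaders fall somewhere between those extremes.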
We get the point, I think. Computationally, the R600 can be formidable. One worry is that these shaders look to be executing pure math, with no texture lookups. We should probably talk about texturing rather than dwell on these results.