
Pixel shader performance
Pixel shader performance is closely related to fill rate, occlusion detection, and memory bandwidth, but it's not the same thing. Reasonably complex pixel shader programs can make the VPU the primary limiting factor in performance, and a good pixel shader implementation can be several times faster at a given task than a competing one.

The Radeon 9700's pixel shaders, which meet the requirements for DirectX 9's version 2.0 pixel shaders, are much more capable than the DX8-class pixel shaders in all of the competing cards we're testing. The 9700's pixel shaders can execute 64 color operations and 32 texture ops in a single rendering pass, which is four to eight times what the DX8-class cards can achieve. Also, the 9700 has 96 bits of precision in its pixel shaders, while older chips' pixel shaders have no more than 48 bits of precision. More importantly for performance in current apps, the Radeon 9700 has eight pixel shaders, while the rest of the pack has only four.
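To put those figures in perspective, here is a rough back-of-the-envelope sketch that uses only the numbers quoted above. It assumes, and this is our assumption rather than anything stated in the specs cited here, that shader precision is divided evenly across four color channels (red, green, blue, and alpha). The function and variable names are purely illustrative.

```python
# Back-of-the-envelope comparison built from the figures quoted in the text.
# ASSUMPTION (not from the article): total precision is split evenly across
# four color channels (red, green, blue, alpha).
CHANNELS = 4


def bits_per_channel(total_bits: int, channels: int = CHANNELS) -> int:
    """Bits available to each color channel under the even-split assumption."""
    return total_bits // channels


def patterns_per_channel(total_bits: int, channels: int = CHANNELS) -> int:
    """Number of distinct bit patterns a single channel can hold."""
    return 2 ** bits_per_channel(total_bits, channels)


# Figures quoted in the article.
R9700_PRECISION_BITS = 96
DX8_MAX_PRECISION_BITS = 48
R9700_PIXEL_SHADERS = 8
DX8_PIXEL_SHADERS = 4

# Ratio of pixel shader units; ignores clock speed and memory bandwidth.
shader_ratio = R9700_PIXEL_SHADERS / DX8_PIXEL_SHADERS

if __name__ == "__main__":
    print("Radeon 9700 bits per channel:", bits_per_channel(R9700_PRECISION_BITS))
    print("DX8-class bits per channel:  ", bits_per_channel(DX8_MAX_PRECISION_BITS))
    print("Pixel shader unit ratio:     ", shader_ratio)
```

Under that assumption, the 9700 works with 24 bits per channel against at most 12 for the older chips, and it has twice as many pixel shader units. The unit ratio is only a crude ceiling on the speedup, since it ignores clock speeds and memory bandwidth.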

This advantage in pixel shading capacity results in markedly better performance on synthetic tests, such as 3DMark's DirectX 8.x pixel shader benchmarks.

The Radeon 9700 Pro nearly doubles the scores of the other cards, which is about what we'd expect from a chip with eight pixel shaders competing against chips with four.

We'll also use NVIDIA's ChameleonMark to measure pixel shader performance. Although we've included Radeon 8500 scores, please note that ChameleonMark doesn't produce the proper output on this card in the Glass and Shiny tests because the 8500 doesn't support the cubemap function the program uses (ATI has its own implementation that works fine, however). The Radeon 9700 and Parhelia both run these tests flawlessly.

ATI's new card utterly dominates NVIDIA's pixel shader benchmark, especially in the tests where the cubemap function is employed. Still, true DirectX 9 applications with longer, more complex shader programs created in high-level shading languages should take even better advantage of the Radeon 9700's pixel shading power. The R9700 Pro's true increase in computational capacity isn't fully reflected in these tests, but they do give us a taste of things to come.