Pixel shaders
The NV30's pixel shaders can execute shader programs with as many as 1024 instructions in a single rendering pass with dynamic branching and looping. This ability should allow the NV30 to handle complex shader effects with grace. The chip's limits are quite a bit higher than the 64-instruction limit imposed by DirectX 9's pixel shader 2.0 specification.
By contrast, the Radeon 9700 Pro conforms largely to the PS 2.0 specification; it can execute a maximum of 64 instructions per pass, with few exceptions. More complex shader programs will only be possible on the 9700 with the aid of a high-level shading language, which can break down effects into multiple rendering passes. The overhead associated with those extra passes can also harm performance.
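To put rough numbers on the multipass penalty, here is a back-of-the-envelope sketch. It uses the two per-pass instruction limits quoted above; the `passes_needed` helper is my own illustration, not anything from ATI's or NVIDIA's tools, and it assumes a shader can be cut cleanly at any instruction boundary. Real compilers have to respect data dependencies and temporary storage, so treat the result as a lower bound.

```python
import math

# Per-pass instruction limits as quoted in the text above.
PS20_INSTRUCTION_LIMIT = 64     # DirectX 9 pixel shader 2.0 / Radeon 9700 Pro
NV30_INSTRUCTION_LIMIT = 1024   # GeForce FX (NV30)


def passes_needed(program_length: int, per_pass_limit: int) -> int:
    """Return a lower bound on rendering passes for a shader program.

    Hypothetical helper: assumes the program can be split at any
    instruction boundary, which real shader compilers cannot always do.
    """
    if program_length < 0:
        raise ValueError("program_length must be non-negative")
    if per_pass_limit <= 0:
        raise ValueError("per_pass_limit must be positive")
    # Even an empty program still costs one pass to draw anything.
    return max(1, math.ceil(program_length / per_pass_limit))


if __name__ == "__main__":
    for length in (48, 64, 65, 300, 1024):
        print(
            f"{length:>5} instructions: "
            f"{passes_needed(length, PS20_INSTRUCTION_LIMIT)} pass(es) at 64/pass, "
            f"{passes_needed(length, NV30_INSTRUCTION_LIMIT)} pass(es) at 1024/pass"
        )
```

By this estimate, a shader that fills the NV30's 1024-instruction budget would need at least 16 passes on hardware limited to 64 instructions per pass.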
ATI addressed this limitation in the R350 chip by adding an "F-buffer" that stores intermediate pixel fragment values between passes through the pixel shaders; no trip through the rest of the graphics pipeline is required. With the F-buffer, the R350 can execute pixel shader programs of any length. You can read more about it in our Radeon 9800 Pro review.
We'll have to see whether NVIDIA's approach is superior to ATI's when we have tests available to us that compile to our test hardware from high-level shading languages. Currently, the benchmarks available to us are limited to basic DX8 and DX9 shader variants, from 1.1 to 2.0.
Differences between ATI's and NVIDIA's pixel shader implementations add more complexity to the task of comparing the two companies' chips. The NV30's pixel shaders are composed of an array of arithmetic logic units, and the bit depth of pixel shader data affects clock-for-clock performance. In order to balance performance versus precision, NV30 offers support for two floating-point color bit depths, 16 bits per color channel (or 64 bits total) and 32 bits per color channel (or 128 bits total). ATI split the difference between the two; the R3x0-series pixel shaders process data at 24 bits per color channel, or 96 bits total, even when using 128-bit framebuffer modes.
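The "64-bit," "96-bit," and "128-bit" labels are simply the per-channel depth multiplied across the color channels. The sketch below spells out that arithmetic, assuming four channels per pixel (red, green, blue, and alpha), which is what the totals above imply. The dictionary keys and the `bits_per_pixel` helper are illustrative names of my own.

```python
# Assumption: four channels per pixel (red, green, blue, alpha),
# as implied by the per-channel and total figures in the text.
CHANNELS = 4

# Floating-point bits per color channel, as quoted in the text.
BITS_PER_CHANNEL = {
    "NV30 16-bit mode": 16,
    "R3x0 24-bit mode": 24,
    "NV30 32-bit mode": 32,
}


def bits_per_pixel(bits_per_channel: int, channels: int = CHANNELS) -> int:
    """Total bits carried per pixel for a given per-channel depth."""
    return bits_per_channel * channels


if __name__ == "__main__":
    for name, depth in BITS_PER_CHANNEL.items():
        print(f"{name}: {depth} bits x {CHANNELS} channels = "
              f"{bits_per_pixel(depth)} bits per pixel")
```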
NVIDIA's approach offers developers more flexibility. They can choose higher-precision datatypes when needed, and they can fall back to 64-bit color when it is sufficient. However, ATI's 96-bit compromise isn't necessarily a bad one for this first generation of chips with high-color capabilities. The 96-bit limitation will probably be more of a disadvantage in professional rendering applications where the R3x0 chips will live on FireGL cards instead of Radeons.
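To get a feel for what a developer gives up by falling back to 16 bits per channel, the sketch below round-trips a couple of values through IEEE half-precision and single-precision storage using Python's standard `struct` module. This is only an approximation of the trade-off: I am assuming the NV30's 16-bit and 32-bit modes behave like the IEEE formats, the helper names are my own, and the R3x0's 24-bit format is left out because `struct` has no equivalent.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a Python float into the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def as_fp16(value: float) -> float:
    """Value after storage as IEEE half precision (16 bits)."""
    return round_trip(value, "<e")


def as_fp32(value: float) -> float:
    """Value after storage as IEEE single precision (32 bits)."""
    return round_trip(value, "<f")


if __name__ == "__main__":
    # Sample values: a fractional color-like value and a large
    # coordinate-like value (both chosen purely for illustration).
    for value in (0.1, 2049.0):
        print(f"value={value!r}  fp16={as_fp16(value)!r}  fp32={as_fp32(value)!r}")
```

With only 11 significant bits to work with, half precision can't distinguish 2049 from 2048, while single precision stores it exactly. Errors like that are invisible in plain color blending but can matter when a shader computes positions or texture coordinates.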
I should mention one more thing about the GeForce FX's pixel shaders. All of this funky talk about "arrays of computational units" got me wondering whether the FX architecture doesn't share computational resources between its vertex and pixel shaders, which is likely to happen as graphics hardware evolves. So I asked NVIDIA, and the answer was straightforward:
The GeForce FX architecture uses separate, dedicated computation units for vertex shading versus pixel shading. The benefit is that there is never a trade-off between vertex horsepower and pixel horsepower.
In future GPUs, pixel and vertex shader instruction sets are likely to merge, and the walls between these two units may begin to dissolve. This change may enable some impressive new vertex-related effects. However, we're not there yet.
Now, let's test what we can, which is DX8 and early DX9 pixel shader performance.



Despite its advantage in clock speed, the GeForce FX comes out behind the Radeon 9800 Pro in all of our DirectX 8-class pixel shader tests. The FX decisively outperforms the 9700 Pro in one benchmark, 3DMark 2001's pixel shader test, but otherwise runs behind both ATI chips.
Our sole DirectX 9-class shader test is 3DMark03, and here we come to a complication. NVIDIA has apparently optimized its 43.45 drivers specifically for 3DMark03, possibly by cutting pixel shader precision from 128 to 64 bits, or maybe less in some cases. (NVIDIA has given itself some cover on this front by raising objections to 3DMark03's testing methodology.) To illustrate the performance difference, I've also tested the GeForce FX with revision 43.00 drivers, which don't include the 3DMark03-specific optimizations. Both sets of results are presented below.
3DMark03's pixel shader 2.0 test creates procedural volumetric textures of wood and marble via pixel shader programs. This is but one use of pixel shaders, but it may be common in future apps and games.

With the 43.45 drivers, the GeForce FX is very competitive with the ATI cards. With the 43.00 drivers, the GeForce FX can't keep up. We'll explore the issue of 3DMark03 optimizations in more detail below.
From what we can tell, though, the GeForce FX isn't significantly more powerful than the ATI chips when running DirectX 8 or early DX9-class pixel shader programs. In NVIDIA's own ChameleonMark, it's consistently slower than the ATI cards, in fact.
