Polygon throughput and vertex shader performance

A block diagram of a Radeon 9700 vertex shader unit. Source: ATI.

The Radeon 9700 has four DX9-class vertex shader units running at the chip's 325MHz clock rate. That's twice the number of DX8 shader units in the GF4 Ti and Radeon 8500 chips, although vertex shader implementations vary in their performance. Matrox's Parhelia has four DX9-class vertex shaders running at the chip's 220MHz clock speed. DX9's vertex shader 2.0 spec incorporates support for flow control (branching, loops, and subroutines) in vertex programs, but no current apps can use this capability.
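For a rough sense of scale, the unit counts and clock speeds quoted above can be multiplied out. This is only back-of-envelope arithmetic: the function name `unit_cycles_per_second` is made up for illustration, and the result is aggregate unit-cycles, not a vertex rate, since cycles per vertex depend on the shader program and on each chip's implementation.

```python
# Back-of-envelope only: unit count times clock, using figures from the text.
# This is NOT a measured vertex rate; it ignores per-vertex instruction counts
# and implementation differences between the chips.

def unit_cycles_per_second(units: int, clock_mhz: float) -> float:
    """Aggregate vertex-shader-unit cycles per second (hypothetical helper)."""
    return units * clock_mhz * 1e6

# (number of vertex shader units, core clock in MHz), as quoted in the article
chips = {
    "Radeon 9700": (4, 325.0),
    "Parhelia": (4, 220.0),
}

totals = {name: unit_cycles_per_second(u, mhz) for name, (u, mhz) in chips.items()}
for name, total in totals.items():
    print(f"{name}: {total / 1e9:.2f} billion unit-cycles/s")

ratio = totals["Radeon 9700"] / totals["Parhelia"]
print(f"Radeon 9700 / Parhelia: {ratio:.2f}x")
```

On paper, then, the 9700's clock advantage alone is worth a little under 1.5x over the Parhelia, before any differences in shader design come into play.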

Each of the Radeon 9700 chip's four vertex shader pipelines has vector and scalar processing units, so they can process the two types of operations in parallel.
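To illustrate why separate vector and scalar units can help, here is a toy cycle-count model. It is a sketch under loud assumptions, not ATI's documented scheduling: it assumes one vector op and one scalar op can issue together each cycle, ignores dependencies between instructions, and uses a made-up instruction mix.

```python
# Toy model, NOT ATI's actual scheduler: each cycle a pipeline may issue one
# vector op and one scalar op together. Data dependencies are ignored, so the
# co-issue figure is a best case.

def cycles_serial(ops: str) -> int:
    """One op per cycle. 'V' marks a vector op, 'S' a scalar op."""
    return len(ops)

def cycles_coissue(ops: str) -> int:
    """Best case with perfect pairing: the longer of the two op queues."""
    return max(ops.count("V"), ops.count("S"))

program = "VVSVSVVS"  # hypothetical instruction mix: 5 vector ops, 3 scalar ops
print(cycles_serial(program), cycles_coissue(program))  # prints "8 5"
```

In this model the gain depends entirely on the mix: a program made of nothing but vector ops sees no benefit at all.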

We'll test vertex shader performance first with Matrox's SharkMark. SharkMark was written by Matrox to show off the power of Parhelia's quad vertex shader units, so it does a nice job taking advantage of the Radeon 9700's four vertex shaders, as well.

In SharkMark, the Radeon 9700 delivers exactly twice the vertex processing ability of a GeForce4 Ti 4600, and it outruns the Parhelia, too. 3DMark2001 also includes a vertex shader test; let's see how the 9700 performs there.

Again, the 9700 shows roughly twice the vertex processing power of competing chips. The Parhelia drops back into the pack in 3DMark for some reason, possibly because 3DMark's vertex shader programs aren't as complex as SharkMark's. At least, to my eye they certainly aren't.

Next, we'll look at traditional DirectX 7/OpenGL style transform and lighting. I believe all of these cards implement T&L engines as vertex shader programs rather than using dedicated circuitry for T&L. Traditional transform and lighting performance is still the key to good performance in most current apps.
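As an illustration of what "T&L as a vertex shader program" means, here is a minimal sketch of the per-vertex work: transform a position by a combined matrix, then compute simple diffuse lighting. The function names are hypothetical, and the sketch assumes a single directional light and a normal already expressed in the same space as the light direction. It is not any vendor's driver code.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def normalize(v):
    """Scale a nonzero vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def tnl_vertex(position, normal, mvp, light_dir, diffuse_color):
    """Hypothetical fixed-function-style vertex program.

    Assumes one directional light, with light_dir pointing from the surface
    toward the light, and a normal in the same space as light_dir.
    """
    clip = mat_vec(mvp, position + [1.0])          # transform: w = 1 for points
    n_dot_l = sum(a * b for a, b in zip(normalize(normal), normalize(light_dir)))
    intensity = max(0.0, n_dot_l)                  # clamp back-facing light to 0
    color = [c * intensity for c in diffuse_color]
    return clip, color

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
clip, color = tnl_vertex([1.0, 2.0, 3.0], [0.0, 0.0, 1.0], identity,
                         [0.0, 0.0, 2.0], [0.5, 0.5, 0.5])
print(clip, color)
```

A real fixed-function pipeline does more than this (multiple lights, specular terms, fog, texture coordinate generation), but each of those is likewise a short run of vector math per vertex.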

Here again, the 9700 leads the pack, although not by quite as much. Legacy T&L performance will matter less and less over time.

AGP write performance
For the sake of completeness, I'll include another round of tests of AGP texture download performance. What we're talking about here is the ability to move rendered images from a graphics card's local memory over the AGP bus into main memory. Games don't generally have a need to transfer data to main memory, but applications like video processing tools and high-quality rendering programs do. Please see my article on this subject if you want to know more.

Once again, none of these cards moves data back over the AGP bus at anything close to an acceptable rate for real-time graphics applications. This problem can probably be fixed in software. In fact, these especially low transfer rates aren't as much of a problem in Windows 98 or in OpenGL. But in Win2K/XP with Direct3D, AGP texture download rates are slow as molasses.
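For a sense of what a real-time readback would demand, here is some rough arithmetic. The resolution, color depth, and frame rate below are assumptions picked for illustration, not figures from these tests, and `readback_mb_per_s` is a made-up helper name.

```python
# Rough arithmetic only: bandwidth needed to copy every rendered frame from
# the card's local memory back to main memory. All inputs are assumptions.

def readback_mb_per_s(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Required transfer rate in MB/s (1 MB = 1,000,000 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6

# Assumed workload: 1024x768, 32-bit color, 30 frames per second
needed = readback_mb_per_s(1024, 768, 4, 30)
print(f"{needed:.1f} MB/s")  # prints "94.4 MB/s"
```

Any card whose measured download rate falls well short of a figure like this can't feed a frame-by-frame video or rendering pipeline in real time.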