FutureMark's answer
Three days after 3DMark03's release, FutureMark published its response to NVIDIA's criticisms (as reported by the enthusiast press). The paper restates the general case for synthetic benchmarks as one part of an overall 3D performance evaluation, and it addresses some of NVIDIA's specific complaints. Let's leave the general arguments about benchmarking aside for now and look at FutureMark's response to the specific technical issues.

  • Not enough multitexturing in game test 1 — FutureMark contends game test 1 is typical of current games in using a single texture for a skybox, and lists several games as examples: Crimson Skies, IL-2 Sturmovik, and Star Trek: Bridge Commander. The paper also shows signs of a past conflict with NVIDIA over this issue:
    As this issue was brought up already during 3DMark03 development, we did a test by adding a second texture layer to the skybox. The performance difference stayed within the error margin (3%), and in our opinion the additional layer did not significantly add to the visual quality of the test. Thus, there were no game development or technical reasons for implementing a multitextured skybox.
    Clearly, FutureMark and NVIDIA had already been at odds over this issue.

  • The stencil shadow volumes implementation — FutureMark takes on NVIDIA's whitepaper directly here, arguing that the efficiency of vertex shader skinning justifies its approach. What's more, NVIDIA's example doesn't quite fit FutureMark's implementation, as explained in sparkling Finnish English:
    Since each light is performance-wise expensive, game developers have level designs optimized so that as few lights as possible are used concurrently on one character. Following this practice, 3DMark03 sometimes uses as many as two lights that reach a character concurrently, not five as mentioned in some instances.
    ...instances like, perhaps, NVIDIA's whitepaper? Hmm.

    To back up its claims, FutureMark suggests running 3DMark03 at several resolutions to see whether game tests 2 and 3 are bottlenecked by vertex shader performance. "If the benchmark was vertex shader limited, you would get the same score on all runs, since the amount of vertex shader work remains the same despite the resolution change."

    That's easy enough. Let's have a look.

    Indeed, the game test results scale with resolution, the way a fill-rate-limited test would, suggesting vertex shaders are not a primary performance limiter here, especially in the case of the DirectX 9-class GPUs. This result may not fully justify FutureMark's stencil shadow volume implementation, but it certainly shoots down some claims made in NVIDIA's whitepaper.

  • Too much pixel shader 1.4 — Because pixel shader 1.4 is a standard forged by ATI and Microsoft to accommodate ATI's R200-series chips, we looked at 3DMark03's use of pixel shader 1.4 with some skepticism. After all, other GPUs like the Matrox Parhelia and SiS Xabre support PS1.3, but among non-DX9 chips, only ATI hardware supports PS1.4. Rather than refer to FutureMark's whitepaper, let me offer the question we put to FutureMark's Tero Sarkkinen, along with his direct response:
    TR: Why did you use pixel shader 1.4 with a fallback to 1.1 instead of 1.3? Doesn't this choice unfairly disadvantage NVIDIA cards and other non-ATI GPUs?

    Tero Sarkkinen: Firstly, when we design a benchmark, we do not care which manufacturer happens to have what type of hardware out there. We follow DirectX standard and what game developers are doing. Pixel shader 1.4 is NOT an ATI specific technology, it is technology that belongs in the DirectX standard.

    Fallback to 1.3 (instead of fallback to 1.1) would not have changed the performance at all. We tried it. There is very very little change from 1.1 to 1.2 to 1.3, the real change comes from 1.3 to 1.4. The 1.4 pixel shader only needs a single rendering path for each light (and the depth pass, which is similar to how Doom3 works). Note that 1.3 pixel shaders only add a few instructions to 1.1 pixel shaders. However, 1.4 pixel shaders allow 6 texture stages, compared to 4 in 1.1 (or in 1.3) pixel shaders. 1.4 shaders further allow each texture to be read twice.

    That's FutureMark's story. We'll explore the issue of pixel shader versions in more depth below.

  • Not enough DirectX 9 — FutureMark contends that game test 4 uses an appropriate mix of pixel shader types. "Because each shader model is a superset of the prior shader models, this will be very efficient on all DirectX 9 hardware." Also, the scene's most striking elements, the water, sky, and leaves, use 2.0 shaders.

    Furthermore, FutureMark claims the test's workload is appropriate for DX9-class hardware, with an average of 780,000 polys and "well over 100MB of graphics content" per frame. The paper states with confidence that "there will be a clear correlation between 3DMark03 and game benchmark results" once 3D games start using pixel and vertex shaders more thoroughly.
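FutureMark's resolution test for game tests 2 and 3 rests on a simple bottleneck argument: vertex work per frame is fixed, while pixel work grows with the number of pixels drawn. The Python sketch below models that argument under loudly stated assumptions. The per-frame vertex cost, the per-pixel cost, and the 3% tolerance (borrowed from the error margin FutureMark cites for game test 1) are all hypothetical numbers, and frame time is simply taken as the slower of the two stages, which ignores CPU, memory, and driver limits.

```python
# Toy bottleneck model (an assumption for illustration, not FutureMark's method):
# each frame takes as long as its slower stage. Vertex work is fixed per frame;
# pixel work grows with the number of pixels rendered.

RESOLUTIONS = [(640, 480), (1024, 768), (1600, 1200)]


def frame_rate(width, height, vertex_ms, pixel_ns_per_pixel):
    """Frames per second if the vertex and pixel stages fully overlap."""
    pixel_ms = width * height * pixel_ns_per_pixel / 1e6  # ns -> ms
    return 1000.0 / max(vertex_ms, pixel_ms)


def scores(vertex_ms, pixel_ns_per_pixel):
    """Frame rate at each test resolution, rounded to one decimal place."""
    return [round(frame_rate(w, h, vertex_ms, pixel_ns_per_pixel), 1)
            for w, h in RESOLUTIONS]


def looks_vertex_limited(fps_by_resolution, tolerance=0.03):
    """True if scores stay within `tolerance` (hypothetical 3%) across resolutions."""
    spread = max(fps_by_resolution) - min(fps_by_resolution)
    return spread <= tolerance * max(fps_by_resolution)


if __name__ == "__main__":
    # Hypothetical GPU A: 50 ms of vertex work per frame, cheap pixels.
    vertex_bound = scores(vertex_ms=50.0, pixel_ns_per_pixel=10.0)
    # Hypothetical GPU B: 5 ms of vertex work per frame, expensive pixels.
    fill_bound = scores(vertex_ms=5.0, pixel_ns_per_pixel=20.0)
    print(vertex_bound, looks_vertex_limited(vertex_bound))  # [20.0, 20.0, 20.0] True
    print(fill_bound, looks_vertex_limited(fill_bound))      # [162.8, 63.6, 26.0] False
```

In this toy model, a vertex-limited GPU posts the same score at every resolution, while a fill-limited one slows down as resolution rises. The inference only runs one way: scores that scale with resolution rule out a vertex shader bottleneck, but flat scores could just as easily point to a CPU limit.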

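Sarkkinen's pass-count argument can be made concrete with a crude model. The Python sketch below takes the limits he states (4 texture stages per pass for PS 1.1 through 1.3; 6 stages for PS 1.4, each readable twice), adds the depth pass he mentions, and uses the two concurrent lights FutureMark cites for a character. The number of texture reads each light needs is a hypothetical value, `choose_path` assumes any hardware supporting PS 1.4 or later takes the 1.4 path, and the model ignores arithmetic-instruction limits and everything else that shapes real shader performance.

```python
import math

# Texture reads available in one rendering pass, per Sarkkinen's description:
# PS 1.1-1.3 expose 4 texture stages; PS 1.4 exposes 6, each readable twice.
READS_PER_PASS = {"1.1": 4, "1.3": 4, "1.4": 6 * 2}


def choose_path(max_supported):
    """Use PS 1.4 when the hardware supports it (or later), else fall back to 1.1."""
    return "1.4" if max_supported >= (1, 4) else "1.1"


def passes_per_frame(ps_version, lights, reads_per_light):
    """One depth pass plus enough lighting passes to cover each light's texture reads."""
    passes_per_light = math.ceil(reads_per_light / READS_PER_PASS[ps_version])
    return 1 + lights * passes_per_light


if __name__ == "__main__":
    READS_PER_LIGHT = 6  # hypothetical; the real per-light texture needs are not stated
    for version in ("1.1", "1.3", "1.4"):
        print(version, passes_per_frame(version, lights=2, reads_per_light=READS_PER_LIGHT))
    # 1.1 5
    # 1.3 5
    # 1.4 3
    print(choose_path((1, 3)), choose_path((2, 0)))  # 1.1 1.4
```

With these assumed numbers, a PS 1.3 path needs exactly as many passes as PS 1.1, in line with Sarkkinen's claim that a 1.3 fallback would not have changed performance, while PS 1.4 collapses each light into a single pass.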
FutureMark defends the usefulness of its benchmark and claims the test's impartiality is key. The implication is clear: sometimes it's not easy being a benchmark house that produces unbiased products.