NVIDIA's beef
NVIDIA was kind enough to allow me time to speak at length with two key employees, Tony Tamasi, the company's General Manager of Desktop Graphics Processors, and Mark Daly, Director of Technical Marketing, who manages the teams responsible for benchmarking and making NVIDIA's graphics technology demos. Daly and Tamasi were very helpful in stating NVIDIA's case against 3DMark03 and very patient in answering my (sometimes-boneheaded) questions. They were also both very consistently "on message," sticking to the company line on 3DMark03 like George Bush sticks to a Karl Rove script on the campaign trail. I mention this fact because it's so very, well, remarkable coming from techie types talking tech.
NVIDIA's problems with 3DMark03 seem to encompass nearly everything about the benchmark. That is, the company sees very little good in the test as it exists now. However, NVIDIA's complaints generally fall into two categories: general, overarching criticisms and specific, technical critiques. NVIDIA's big-picture complaints can be summed up in two points:
- 3DMark03 is a bad benchmark: This is a big point with lots of little sub-points, but the complaints all fall easily under this banner. NVIDIA's key contention is that 3DMark03 isn't representative of actual games. Near as I can tell, that means not now, nor ever in the future, although there is some ambiguity on this point: NVIDIA's specific technical criticisms bounce back and forth between the present and the future without much discernible pattern. NVIDIA suggests synthetic benchmarks are not a useful component of a graphics performance test suite, and recommends testing only with "actual games."
- Wasted resources: Optimizing for 3DMark03, says NVIDIA, pulls critical software engineering resources away from other tasks. Because 3DMark03 isn't representative of actual games, optimizations for 3DMark are in no way beneficial for actual games. What's more, online reviewers and editors who choose to use 3DMark in their performance evaluations create an irresistible need for NVIDIA to keep wasting resources optimizing code paths never used by real applications.
These larger complaints only make sense if NVIDIA's more targeted technical criticisms of 3DMark03 hold up. I won't cover all of the technical complaints in exacting detail, but in truth, NVIDIA's whitepaper essentially makes four main complaints about 3DMark's four game tests. A weighted average of these four tests alone determines the "overall" 3DMark score most users like to compare between systems.
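To make that scoring step concrete, here is a minimal sketch of a weighted average over the four game-test frame rates. The weights and frame rates are hypothetical placeholders invented for illustration, not FutureMark's actual coefficients or measured results. The point is only that each game test's influence on the overall number depends on its weight, which is why complaints about individual game tests matter for the headline score.

```python
# Hypothetical per-test weights, invented for illustration.
# These are NOT FutureMark's actual 3DMark03 coefficients.
HYPOTHETICAL_WEIGHTS = {"game1": 1.0, "game2": 4.0, "game3": 5.0, "game4": 4.0}


def overall_score(fps_by_test: dict, weights: dict) -> float:
    """Weighted average of per-test frame rates."""
    missing = weights.keys() - fps_by_test.keys()
    if missing:
        raise ValueError(f"missing results for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[test] * fps_by_test[test] for test in weights) / total_weight


def contribution_shares(fps_by_test: dict, weights: dict) -> dict:
    """Fraction of the weighted total contributed by each game test."""
    weighted = {test: weights[test] * fps_by_test[test] for test in weights}
    total = sum(weighted.values())
    return {test: value / total for test, value in weighted.items()}


if __name__ == "__main__":
    # Made-up frame rates for a hypothetical system.
    fps = {"game1": 150.0, "game2": 30.0, "game3": 25.0, "game4": 20.0}
    print(f"overall: {overall_score(fps, HYPOTHETICAL_WEIGHTS):.1f}")
    for test, share in contribution_shares(fps, HYPOTHETICAL_WEIGHTS).items():
        print(f"{test}: {share:.0%} of the weighted total")
```

With these made-up numbers, the lightly weighted game test 1 still supplies roughly a third of the weighted total because its frame rate is so high, so a test's real influence depends on both its weight and its typical frame rate.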
- Not enough multitexturing in game test 1: The first game test is a WWII-era air battle scene supposedly representative of legacy DirectX 7-class games, and much of what's on screen at any given time is simply sky or ground. These elements are made up of very few polygons, and only one texture is applied to the skybox and ground surfaces. As a result, NVIDIA claims, game test 1 is largely a test of single-textured (or pixel) fill rate, which isn't representative of current or future games. Furthermore, 3DMark2001 was more "forward looking" than this test, because it employed multitexturing in its three DX7-class game tests.
- The stencil shadow volumes implementation: Game tests 2 and 3 use the same basic rendering paths, and they both use stencil shadow volumes to create a realistic shadowing effect. However, NVIDIA's whitepaper claims 3DMark03's rendering method is "bizarre" because it requires objects to be skinned many times in the vertex shader for each frame rendered:
3DMark03 uses an approach that adds six times the number of vertices required for the extrusion. In our five light example, this is the equivalent of skinning each object 36 times! No game would ever do this. This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs.
The paper suggests caching the results of the vertex skinning operation between passes would be more efficient, and more like John Carmack's implementation in Doom III.
- Too much pixel shader 1.4: Game tests 2, 3, and 4 all use pixel shader programs based on the pixel shader 1.4 specification from DirectX 8.1. In the case of game tests 2 and 3, pixel shader 1.4 is inappropriate because pixel shader 1.4 "is virtually non-existent in DX8 games." Furthermore, if 1.4 pixel shaders aren't available, the benchmark falls back to pixel shader 1.1 instead of 1.3 in order to render the scenes.
- Not enough DirectX 9: The game 4 test, dubbed "Mother Nature," doesn't use enough of DX9's new features. Only two of the nine pixel shaders use the new PS 2.0 spec; the other seven use PS 1.4. Thus, "the amount of DX9 represented in the 3DMark03 score is negligible. It's not a DX9 benchmark."
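NVIDIA's figure of 36 skinning passes can be reproduced with simple arithmetic under one assumed pass structure: one base pass, then for each light a lighting pass that skins the mesh once plus a shadow-volume pass carrying six times the mesh's vertices. That breakdown is my reconstruction, not something the whitepaper excerpt spells out, and the function name below is hypothetical. The sketch contrasts it with the caching approach the whitepaper recommends, where skinning happens once per frame and the results are reused.

```python
def skinning_cost_per_frame(num_lights: int,
                            shadow_vertex_multiplier: int = 6,
                            cache_skinned_vertices: bool = False) -> int:
    """Skinning work for one object per frame, in units of "one full skin of the mesh".

    Hypothetical pass structure (an assumption, not FutureMark's documented pipeline):
      * one base pass that skins the mesh once;
      * per light, one lighting pass (skins the mesh once) and one shadow-volume
        pass whose extrusion geometry holds `shadow_vertex_multiplier` times as
        many vertices, each of which is skinned again.
    """
    if num_lights < 0:
        raise ValueError("num_lights must be non-negative")
    if cache_skinned_vertices:
        # Skin once, store the results, and reuse them for every later pass.
        # Extrusion still costs vertex work, but no further skinning is done.
        return 1
    base_pass = 1
    per_light = 1 + shadow_vertex_multiplier  # lighting pass + shadow-volume pass
    return base_pass + num_lights * per_light


if __name__ == "__main__":
    for lights in range(1, 6):
        uncached = skinning_cost_per_frame(lights)
        cached = skinning_cost_per_frame(lights, cache_skinned_vertices=True)
        print(f"{lights} light(s): {uncached} skins uncached vs {cached} cached")
```

Under this assumed structure, five lights give 1 + 5 × 7 = 36, matching the whitepaper's number, and the uncached cost grows linearly with the light count. That growth is the core of NVIDIA's argument that the vertex stage, not the pixel pipeline, becomes the bottleneck.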
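The pixel shader fallback complaint comes down to a path-selection rule. The sketch below, with hypothetical function names, encodes the behavior described for game tests 2 and 3 (use PS 1.4 if the hardware supports it, otherwise drop straight to PS 1.1) next to the alternative NVIDIA's complaint implies (use the highest of 1.4, 1.3, or 1.1 the hardware can run). The PS 1.3 path in the second function is an assumption for illustration; the article only says NVIDIA objected that the benchmark uses 1.1 instead.

```python
from typing import Optional, Tuple

# A pixel shader version as (major, minor), e.g. (1, 3) for PS 1.3.
# Python compares tuples element by element, so (1, 3) < (1, 4) < (2, 0).
PSVersion = Tuple[int, int]


def fallback_as_described(max_supported: PSVersion) -> Optional[PSVersion]:
    """Game tests 2 and 3 as described above: PS 1.4, else PS 1.1."""
    if max_supported >= (1, 4):
        return (1, 4)
    if max_supported >= (1, 1):
        return (1, 1)
    return None  # no usable pixel shader path


def fallback_highest_available(
    max_supported: PSVersion,
    paths: Tuple[PSVersion, ...] = ((1, 4), (1, 3), (1, 1)),
) -> Optional[PSVersion]:
    """Hypothetical alternative: pick the highest path the hardware supports.

    `paths` must be listed from highest to lowest version.
    """
    for path in paths:
        if max_supported >= path:
            return path
    return None


if __name__ == "__main__":
    for card_max in [(2, 0), (1, 4), (1, 3), (1, 1)]:
        print(card_max, fallback_as_described(card_max), fallback_highest_available(card_max))
```

For hardware whose pixel shaders top out at version 1.3, the two rules diverge: the described rule returns (1, 1) while the alternative returns (1, 3). That is the case NVIDIA objected to.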
While presenting these concerns, Tamasi acknowledged the difficulty of FutureMark's task in constructing a good benchmark, and he expressed a deep skepticism about the feasibility of ever building a good forward-looking synthetic test representative of future games. He pointed out the difficulty of FutureMark's business model, as well. NVIDIA seemed to be concerned that this one company could have so much power in determining the industry's performance metrics, and Tamasi pointed to the example of the SPEC committee as a possible alternative to FutureMark's approach.
Tamasi also stressed the need for developers to include performance tests in their games, and said NVIDIA's developer relations team has long been encouraging just that and offering resources to help make it possible.
I believe that's a fair summation of NVIDIA's complaints about 3DMark03. According to NVIDIA, these problems were serious enough to prompt the company to remove itself from FutureMark's beta program and begin discouraging use of the benchmark created by a former partner.