New insights invalidate parts of the theory in this article. We stated that the R420 is clearly slower at discarding pixels during the Z/stencil pass and that, as a result, the whole rendering process would be slowed down. This doesn't seem to apply as strongly as we stated. The NV40 might have lower peak performance when discarding pixels, but it has finer granularity. In the end there might very well be a small advantage for the NV40. There are also some situations where the R420 can't use its Hierarchical Z mechanism.

So the theories about some sort of inherent problem with ATI hardware seem to be off base. I recently asked someone at ATI about 3DCenter's theories, and they said the 3DCenter article wasn't quite right. Looks like they weren't kidding. With the advent of Catalyst A.I., it's possible ATI could catch up eventually.
Still, the differences in performance per clock can't be fully explained by that alone. Nvidia seems to have used driver optimizations similar to those that have helped it win Q3A benchmarks for quite some time now. The driver intercepts CPU-demanding calls and replaces them with less demanding ones. Special optimizations probably favor both the Q3A and Doom 3 engines. In addition, Nvidia may have shown considerable creativity in replacing shaders. The company seems to have found a solution that allows shaders to be replaced without affecting image quality.