More on the GeForce FX architecture

We've been trying to sort out how many rendering pipelines the GeForce FX has, and I thought I should post an update with some relevant links and some new information from NVIDIA. First, if you haven't yet, you'll want to read Dave Salvator's article on the GeForce FX's architecture, in which NVIDIA's Tony Tamasi and David Kirk dance around the issues like virtuoso performers. Nonetheless, the article is revealing. Here's Kirk on the number of pipes in the FX:
Pipes don't mean as much as they used to. . . . There are really 32 functional units that can do things in various multiples. We don't have the ability in NV30 to actually draw more than eight pixels per cycle. It's going to be a less meaningful question as we move forward...[GeForceFX] isn't really a texture lookup and blending pipeline with stages and maybe loop back anymore. It's a processor, and texture lookups are decoupled from this hard-wired pipe.
Kirk said that back in November, believe it or not. The recent round of speculation about the FX appears to have started with Dave Baumann's post over at Beyond3D. Fuad (Mike gets mad when I call him "Fraud") over at The Inq got the first confirmation from NVIDIA that the FX can't do traditional rendering at 8 pixels per clock.

I asked NVIDIA a pair of follow-up questions after the company recently confirmed to us that the FX architecture isn't exactly what we'd thought. These are my own feeble attempts to evoke a useful answer from the NVIDIA PR machine, so take them for what you will. The first addresses concerns about relatively slow pixel shader performance on the FX, which has been coupled with speculation about reduced clock-for-clock capacity in higher color modes:

TR: Is pixel shading done at 8 pixels per clock in all color modes, including 128-bit FP color?

NVIDIA: Yes. All pixel shading is done at 8 ops per clock at 128-bit color.

In the second question, I flail about wildly trying to smack the piñata of truth about NVIDIA engineering:

TR: How exactly is the GFFX arranged internally? Does it have something like four "full" rendering pipes and four partial/adjunct pipes with facilities for shading, stencil, and texture ops? Does it have parts of the pipeline clocked at twice the speed of the rest of the chip, like the simple ALUs in the Pentium 4? Or is it a more conventional 4 pipe by 2 texture unit design with beefed up pixel shaders and Z/stencil-test faculties?

NVIDIA: It renders:

8 z pixels per clock
8 stencil ops per clock
8 textures per clock
8 shader ops per clock
4 color + z pixels per clock with 4x multisampling enabled

It is architected to perform those functions.

Basically, its 8 pipes with the exception of color blenders for traditional ROP operations, for which it has hardware to do 4 pixels per clock for color & Z. It has 8 "full" pipes that can blend 4 pixels per clock with color.
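
For anyone who wants to turn those per-clock figures into peak rates, here is a rough Python sketch. The per-clock numbers come straight from NVIDIA's reply above. Everything else is my own assumption for illustration: the function names, the 500MHz core clock (a placeholder, not a figure from NVIDIA's answer), and the simplification that peak rate is just units per clock times clock speed. It also compares a classic 8 x 1 layout against a 4 x 2 layout to show why the two can match on texturing yet differ on color fill.

```python
# Per-clock throughput figures as quoted in NVIDIA's reply.
PER_CLOCK = {
    "z_pixels": 8,
    "stencil_ops": 8,
    "textures": 8,
    "shader_ops": 8,
    "color_z_pixels": 4,  # the color + Z figure from the reply
}


def peak_rates(core_clock_mhz, per_clock=PER_CLOCK):
    """Theoretical peak rates in millions per second (units/clock * MHz)."""
    return {name: count * core_clock_mhz for name, count in per_clock.items()}


def config_rates(pipes, textures_per_pipe, core_clock_mhz):
    """Peak (pixel, texel) rates in millions/s for a pipes x textures layout."""
    pixels = pipes * core_clock_mhz
    texels = pipes * textures_per_pipe * core_clock_mhz
    return pixels, texels


if __name__ == "__main__":
    clock_mhz = 500  # assumed placeholder clock, not taken from the reply

    for name, rate in peak_rates(clock_mhz).items():
        print(f"{name}: {rate} M/s")

    # Same texel rate, but half the color pixel rate for the 4 x 2 layout.
    print("8 x 1:", config_rates(8, 1, clock_mhz))
    print("4 x 2:", config_rates(4, 2, clock_mhz))
```

These are theoretical ceilings only. Real games are limited by memory bandwidth, shader complexity, and plenty else, so treat the output as a way to compare layouts, not as a performance prediction.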

The long and the short of these revelations about the FX's architecture is this: at least in current games, the FX performs more like a conventional 4 x 2 pipe design. Only time and testing will tell how much of a disadvantage this limitation proves to be for the FX, especially against competing chips from ATI and others. We'll keep watching for new developments.