The GeForce4 Ti
NVIDIA describes the features of the "Titanium" version of the GeForce4 with a number of marketing terms, including Lightspeed Memory Architecture II, nFinite FX II Engine, Accuview AA, and nView dual-display support. To better understand how these pieces fit into the whole, have a look at NVIDIA's diagram of the GeForce4 Ti chip:
By and large, this chip is very similar to the GeForce3. The most important changes are simple ones: the chip now runs at speeds as high as 300MHz, and memory runs as fast as 325MHz (or 650MHz DDR). GeForce4 cards will come with memory chips in a ball-grid array (BGA) package, which should cut costs and improve heat dissipation. ATI's new 128MB Radeon 8500 cards will have BGA memory, too. It's hip.
GeForce4 Ti cards will come in several flavors. The GF4 Ti 4600 will feature 128MB of memory and the aforementioned 300/325MHz core/memory clock speed. At these speeds, the Ti 4600 cards will be unquestionably faster, in terms of fill rate, than ATI's top-end Radeon 8500.
GF4 Ti 4400 cards will be clocked slower, but NVIDIA's not stating exactly how much slower; it may be up to the discretion of card makers.
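To put the Ti 4600's clock speeds in perspective, here is a rough sketch of the peak numbers they imply. The 300MHz core and 325MHz memory clocks come from the text above; the four pixel pipelines, two texture units per pipeline, and 128-bit memory bus are assumptions made for illustration, not figures stated here. These are theoretical peaks, not measured results.

```python
# Back-of-the-envelope peak rates for the GeForce4 Ti 4600.
# Clock speeds come from the article. The pipeline count, texture units per
# pipeline, and memory bus width are ASSUMPTIONS made for illustration.

CORE_MHZ = 300          # core clock (from the article)
MEM_MHZ = 325           # memory clock (from the article; 650MHz effective DDR)
PIXEL_PIPELINES = 4     # assumed
TEXTURES_PER_PIPE = 2   # assumed
BUS_WIDTH_BITS = 128    # assumed


def pixel_fill_rate_mpixels(core_mhz, pipelines):
    """Peak pixels written per second, in millions."""
    return core_mhz * pipelines


def texel_fill_rate_mtexels(core_mhz, pipelines, textures_per_pipe):
    """Peak textured samples per second, in millions."""
    return core_mhz * pipelines * textures_per_pipe


def memory_bandwidth_gb_per_s(mem_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s; DDR moves data twice per clock."""
    bytes_per_transfer = bus_width_bits // 8
    return mem_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(pixel_fill_rate_mpixels(CORE_MHZ, PIXEL_PIPELINES), "Mpixels/s")
    print(texel_fill_rate_mtexels(CORE_MHZ, PIXEL_PIPELINES, TEXTURES_PER_PIPE), "Mtexels/s")
    print(memory_bandwidth_gb_per_s(MEM_MHZ, BUS_WIDTH_BITS), "GB/s")
```

Under those assumptions, the Ti 4600 peaks at 1200 Mpixels/s, 2400 Mtexels/s, and 10.4GB/s of memory bandwidth. A slower-clocked Ti 4400 would scale down linearly in this simple model.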
The GF4 Ti chip itself weighs in at 63 million transistors, which is hefty by any measure. Still, the GeForce3 was 57 million transistors, which ain't exactly small. (An Athlon XP processor, for reference, is only about 37.5 million transistors.) Like the GeForce3, the GeForce4 Ti is manufactured on a 0.15-micron fab process.
So what did NVIDIA add with those six million new transistors? Nothing radically new. NVIDIA's engineers have tweaked a few things here and there, however. With some pointed questioning and a little work, we've been able to determine what most of the changes are.
- Vertex shaders: The vertex shader unit itself is the same as the one in the GeForce3, but the GF4 Ti includes two of 'em. That puts the GF4 Ti on par with the Xbox GPU and ATI's Radeon 8500. We've already seen how a second vertex shader can help: the Radeon 8500 is an absolute beast in 3DMark 2001.
Also, since the GeForce3's fixed-function T&L (useful for running apps written to take advantage of a GeForce/GeForce2-class T&L engine) is implemented as a vertex program, the GeForce4 Ti ought to be able to accelerate fixed-function T&L very well.
- Pixel shaders: NVIDIA's pixel shaders are perhaps most in need of improvement. Most of us are familiar with the controversy over pixel shader versions between DirectX 8 and 8.1. ATI lobbied for some changes to the pixel shader API to better support the Radeon 8500, and NVIDIA countered that ATI's "new" pixel shader versions didn't add any real functionality.
However, ATI has since demonstrated some convincing advantages of its pixel shader implementation. I keep going back to an old .plan update by John Carmack, written at the introduction of the GeForce3, that I think has since been overlooked. What he said is enlightening.
"Now we come to the pixel shaders, where I have the most serious issues. I can just ignore this most of the time, but the way the pixel shader functionality turned out is painfully limited, and not what it should have been.
"DX8 tries to pretend that pixel shaders live on hardware that is a lot more general than the reality.
"Nvidia's OpenGL extensions expose things much more the way they actually are: the existing register combiners functionality extended to eight stages with a couple tweaks, and the texture lookup engine is configurable to interact between textures in a list of specific ways.
"I'm sure it started out as a better design, but it apparently got cut and cut until it really looks like the old BumpEnvMap feature writ large: it does a few specific special effects that were deemed important, at the expense of a properly general solution.
"Yes, it does full bumpy cubic environment mapping, but you still can't just do some math ops and look the result up in a texture. I was disappointed on this count with the Radeon as well, which was just slightly too hardwired to the DX BumpEnvMap capabilities to allow more general dependent texture use.
"Enshrining the capabilities of this mess in DX8 sucks. Other companies had potentially better approaches, but they are now forced to dumb them down to the level of the GF3 for the sake of compatibility. Hopefully we can still see some of the extra flexibility in OpenGL extensions."
ATI's pixel shaders do offer more general dependent texture addressing, and their superior flexibility and precision allow for more convincing effects. (See here for an example.) The long and short of it: NVIDIA's pixel shaders could stand some improvements.
Apparently, the GeForce4 Ti's pixel shaders have been improved incrementally in order to address some of these concerns. There are a few new pixel shader instructions in the GF4 Ti, including dependent texture lookups into both 2D and 3D textures and Z-correct bump mapping. These changes may or may not bring NVIDIA's pixel shaders up to par with ATI's, but the addition of more general dependent texture addressing is a step in the right direction, even if it isn't a major improvement over the GeForce3.
- Texturing: The GeForce3 can apply two textures per pixel in a single clock cycle, but through a "loopback" method it can apply an additional two textures per rendering pass. That handy trick improves performance and image quality by keeping the chip from having to resort to multi-pass rendering. The GF4 Ti offers improved support for three and four textures per pass. NVIDIA has optimized the GF4 Ti chip's caches and pipelines to improve this loopback feature, especially when anisotropic and/or trilinear filtering is in use.
- LMA-II: The GF4 Ti includes the same crossbar memory controller as the GeForce3. NVIDIA has tweaked its Z-buffer compression routine so that it now provides lossless compression as tight as an 8:1 ratio. (At its launch, NVIDIA claimed 4:1 for the GeForce3's Z compression.)
Also, NVIDIA says the GF4 Ti's occlusion detection is improved, though when pressed about how it's improved, all they would say is that it's "more aggressive." We'll seek more detail if we can. These are hardware-level changes, though, not just driver tweaks.
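Two of the items above lend themselves to quick arithmetic: the loopback texturing scheme and the Z compression ratios. The sketch below is a simplified model, not a description of how the hardware actually schedules work. The two-textures-per-clock and four-textures-per-pass limits and the 4:1 and 8:1 ratios come from the text; the 1024x768 resolution and 32-bit Z-buffer are illustrative assumptions, and the function names are made up for this example.

```python
import math

TEXTURES_PER_CLOCK = 2   # from the article: two textures per pixel per clock
TEXTURES_PER_PASS = 4    # from the article: loopback adds two more per pass


def texturing_cost(num_textures):
    """Return (rendering passes, clocks) per pixel in this simplified model."""
    passes = math.ceil(num_textures / TEXTURES_PER_PASS)
    clocks = math.ceil(num_textures / TEXTURES_PER_CLOCK)
    return passes, clocks


def z_buffer_mb(width, height, bytes_per_z, compression_ratio=1):
    """Size of one full Z-buffer sweep in MB at a given compression ratio.

    The ratios quoted by NVIDIA are best cases ("as tight as"), so treat the
    compressed figures as lower bounds on real traffic.
    """
    return width * height * bytes_per_z / compression_ratio / 1e6


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        passes, clocks = texturing_cost(n)
        print(f"{n} textures: {passes} pass(es), {clocks} clock(s) per pixel")

    # Assumed scenario: 1024x768 with a 32-bit (4-byte) Z-buffer.
    for ratio in (1, 4, 8):
        print(f"{ratio}:1 -> {z_buffer_mb(1024, 768, 4, ratio):.3f} MB per sweep")
```

In this model, four textures still fit in a single pass at a cost of two clocks per pixel, which is why loopback beats multi-pass rendering. Touching every Z value once at the assumed resolution moves about 3.15MB uncompressed, about 0.79MB at 4:1, and about 0.39MB at 8:1 in the best case.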