JUST ABOUT ACCORDING TO schedule, NVIDIA has unveiled a top-to-bottom refresh of its entire desktop graphics product line. The new NVIDIA chips, dubbed GeForce4 Ti and GeForce4 MX, bring with them a number of new features and better performance, which is always a good thing. However, they do little to advance the state of the art in 3D graphics, nor has the GeForce4 Ti unambiguously recaptured the technology lead from ATI’s Radeon 8500.
As always, the GeForce4 chips have been launched with much fanfare (NVIDIA knows how to throw a mean party) and with a torrent of new “marketing terms” to help describe the chip’s technology to the public. And as always, our analysis of the GeForce4 will go beyond the marketing terms to give you the skinny on the GeForce4 scene. Read on to find out what’s new (and what’s not) in NVIDIA’s latest GPUs.
The modular approach
NVIDIA’s rise to the top in graphics has been fueled by the company’s aggressive six-month product development cycle. Typically, NVIDIA launches a truly new product once a year, with a minor refresh (usually involving higher clock speeds) at six months. One reason this approach has been successful for NVIDIA is that its chip designs are modular, so functional units can be reused from one chip to the next. Lately, NVIDIA has taken this approach to an extreme, integrating and interchanging a number of technologies across the GeForce3, the Xbox GPU, and the nForce core logic chipset. Doing so has allowed the company to introduce a number of extremely complex chips in relatively short order.
By my watch, it’s time for NVIDIA to launch another truly new product. Instead, NVIDIA has elected to introduce two brand-new chips, and though they share some technology, they’re fundamentally very different from one another. Naturally, the GeForce4 MX is the low-end chip, and the GeForce4 Ti is the new high-end chip. These new chips do incorporate some new and improved functional units, but they’re not what you might be expecting from NVIDIA a year after the launch of the GeForce3.
We’ll look at each chip in some detail to see which bits have been recycled from previous products. Before we dive in deep, however, we’d better pull out the trusty ol’ chip chart to see how these chips stack up in terms of basic pixel-pushing power.
Here’s how the hardware specs match up:
|   | Core clock (MHz) | Pixel pipelines | Peak fill rate (Mpixels/s) | Texture units per pixel pipeline | Peak fill rate (Mtexels/s) | Memory clock (MHz) | Memory bus width (bits) | Peak memory bandwidth (GB/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Radeon 64MB DDR | 183 | 2 | 366 | 3 | 1098 | 366 | 128 | 5.9 |
| GeForce3 Ti 200 | 175 | 4 | 700 | 2 | 1400 | 400 | 128 | 6.4 |
| GeForce3 Ti 500 | 240 | 4 | 960 | 2 | 1920 | 500 | 128 | 8.0 |
| GeForce4 MX 460 | 300 | 2 | 600 | 2 | 1200 | 550 | 128 | 8.8 |
| GeForce4 Ti 4600 | 300 | 4 | 1200 | 2 | 2400 | 650 | 128 | 10.4 |
Now remember, as always, that the fill rate (megapixels and megatexels) numbers above are simply theoretical peaks. The other peak theoretical number in the table, memory bandwidth, will often have a lot more to do with a card’s actual pixel-pushing power than the fill rate numbers will. ATI and NVIDIA have implemented some similar tricks to help their newer chips make more efficient use of memory bandwidth, so newer chips will generally outrun older ones, given the same amount of memory bandwidth.
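For the curious, those peak numbers fall straight out of clock speeds and unit counts, which is exactly why they’re theoretical ceilings rather than real-world figures. Here’s a quick sketch of the arithmetic behind the chart (the little `peak_specs` helper is just for illustration):

```python
# Illustrative only: how the "peak" numbers in the chart are derived.

def peak_specs(core_mhz, pipes, tex_units, mem_mhz_effective, bus_bits):
    pixel_fill = core_mhz * pipes                          # Mpixels/s
    texel_fill = pixel_fill * tex_units                    # Mtexels/s
    bandwidth  = mem_mhz_effective * bus_bits / 8 / 1000   # GB/s
    return pixel_fill, texel_fill, bandwidth

# GeForce4 Ti 4600: 300MHz core, 4 pipes, 2 texture units per pipe,
# 650MHz effective DDR, 128-bit bus
print(peak_specs(300, 4, 2, 650, 128))   # (1200, 2400, 10.4)

# Radeon 64MB DDR: 183MHz core, 2 pipes, 3 texture units per pipe,
# 366MHz effective DDR, 128-bit bus
print(peak_specs(183, 2, 3, 366, 128))   # (366, 1098, ~5.9)
```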
In fact, the chart above captures so little of the real picture that we’ll augment it with another chart showing a few key features of these newer GPUs.
|   | Vertex shaders | Pixel shaders? | Textures per rendering pass | Z compression? | Hardware occlusion culling? |
| --- | --- | --- | --- | --- | --- |
That’s by no means everything that’s important to know about these chips, but it’s what I could include with confidence about the precise specs. Also, the implementations of many of these features vary, so the fact that both the GeForce4 and the Radeon 8500 have “hardware occlusion culling” doesn’t say much all by itself. The GF4’s culling might be much more effective, or vice versa.
Still, this chart is revealing. As you can see, there are a few surprises for those of us familiar with the GeForce3. The GeForce4 Ti includes a second vertex shader unit, while the GeForce4 MX has no vertex shader at all.
What does it all mean? Let’s take it one chip at a time.
The GeForce4 Ti
NVIDIA describes the features of the “Titanium” version of the GeForce4 with a number of marketing terms, including Lightspeed Memory Architecture II, nFinite FX II Engine, Accuview AA, and nView dual-display support. To better understand how these pieces fit into the whole, have a look at NVIDIA’s diagram of the GeForce4 Ti chip:
By and large, this chip is very similar to the GeForce3. The most important changes are simple ones: the chip now runs at speeds as high as 300MHz, and memory runs as fast as 325MHz (or 650MHz DDR). GeForce4 cards will come with memory chips in a ball-grid array (BGA) package, which should cut costs and improve heat dissipation. ATI’s new 128MB Radeon 8500 cards will have BGA memory, too. It’s hip.
GeForce4 Ti cards will come in several flavors. The GF4 Ti 4600 will feature 128MB of memory and the aforementioned 300/325MHz core/memory clock speed. At these speeds, the Ti 4600 cards will be unquestionably faster, in terms of fill rate, than ATI’s top-end Radeon 8500.
GF4 Ti 4400 cards will be clocked slower, but NVIDIA’s not stating exactly how much slower; it may be up to the discretion of card makers.
The GF4 Ti chip itself weighs in at 63 million transistors, which is hefty by any measure. Still, the GeForce3 was 57 million transistors, which ain’t exactly small. (An Athlon XP processor, for reference, is only about 37.5 million transistors.) Like the GeForce3, the GeForce4 Ti is manufactured on a 0.15-micron fab process.
So what did NVIDIA add with those six million new transistors? Nothing radically new. NVIDIA’s engineers have tweaked a few things here and there, however. With some pointed questioning and a little work, we’ve been able to determine what most of the changes are.
- Vertex shaders: The vertex shader unit itself is the same as the one in the GeForce3, but the GF4 Ti includes two of ’em. That puts the GF4 Ti on par with the Xbox GPU and ATI’s Radeon 8500. We’ve already seen how a second vertex shader can help: the Radeon 8500 is an absolute beast in 3DMark 2001.
Also, since the GeForce3’s fixed-function T&L (useful for running apps written to take advantage of the GeForce/GeForce2-class T&L engine) is implemented as a vertex program, the GeForce4 Ti ought to be able to accelerate fixed-function T&L very well.
- Pixel shaders: NVIDIA’s pixel shaders are perhaps most in need of improvement. Most of us are familiar with the controversy over pixel shader versions between DirectX 8 and 8.1. ATI lobbied for some changes to the pixel shader API to better support the Radeon 8500, and NVIDIA countered that ATI’s “new” pixel shader versions didn’t add any real functionality.
However, ATI has since demonstrated some convincing advantages of its pixel shader implementation. I keep going back to an old .plan update by John Carmack, written at the introduction of the GeForce3, that I think has since been overlooked. What he said is enlightening.
Now we come to the pixel shaders, where I have the most serious issues. I can just ignore this most of the time, but the way the pixel shader functionality turned out is painfully limited, and not what it should have been.
DX8 tries to pretend that pixel shaders live on hardware that is a lot more general than the reality.
Nvidia’s OpenGL extensions expose things much more the way they actually are: the existing register combiners functionality extended to eight stages with a couple tweaks, and the texture lookup engine is configurable to interact between textures in a list of specific ways.
I’m sure it started out as a better design, but it apparently got cut and cut until it really looks like the old BumpEnvMap feature writ large: it does a few specific special effects that were deemed important, at the expense of a properly general solution.
Yes, it does full bumpy cubic environment mapping, but you still can’t just do some math ops and look the result up in a texture. I was disappointed on this count with the Radeon as well, which was just slightly too hardwired to the DX BumpEnvMap capabilities to allow more general dependent texture use.
Enshrining the capabilities of this mess in DX8 sucks. Other companies had potentially better approaches, but they are now forced to dumb them down to the level of the GF3 for the sake of compatibility. Hopefully we can still see some of the extra flexibility in OpenGL extensions.
ATI’s pixel shaders do offer more general dependent texture addressing, and their superior flexibility and precision allow for more convincing effects. (See here for an example.) The long and short of it: NVIDIA’s pixel shaders could stand some improvements.
Apparently, the GeForce4 Ti’s pixel shaders have been improved incrementally in order to address some of these concerns. There are a few new pixel shader instructions in the GF4 Ti, including dependent texture lookups into both 2D and 3D textures and Z-correct bump mapping. These changes may or may not bring NVIDIA’s pixel shaders up to par with ATI’s, but the addition of more general dependent texture addressing is a step in the right direction, even if it isn’t a major improvement over the GeForce3.
- Texturing: The GeForce3 can apply two textures per pixel in a single clock cycle, but through a “loop back” method it can apply an additional two textures per rendering pass. That handy trick improves performance and image quality by keeping the chip from having to resort to multi-pass rendering. The GF4 Ti offers improved support for three and four textures per pass. NVIDIA has optimized the GF4 Ti chip’s caches and pipelines to improve this “loopback” feature, especially when anisotropic and/or trilinear filtering is in use.
- LMA-II: The GF4 Ti includes the same crossbar memory controller as the GeForce3. NVIDIA has tweaked its Z-buffer compression routine so that it now provides lossless compression as tight as an 8:1 ratio. (At its launch, NVIDIA claimed 4:1 for the GeForce3’s Z compression.)
Also, NVIDIA says the GF4 Ti’s occlusion detection is improved, though when pressed about how it’s improved, all they would say is that it’s “more aggressive.” We’ll seek more detail if we can. These are hardware-level changes, though, not just driver tweaks.
- Accuview anti-aliasing: The GeForce4 Ti’s basic approach to anti-aliasing (multisampling) is the same as the GeForce3’s, but NVIDIA has made some very worthwhile tweaks to the AA implementation in the GF4 Ti, and they’ve given it a new name. Accuview AA uses NVIDIA’s multisampling approach, which effectively provides edge-only antialiasing in a more efficient manner than the traditional super-sampling approach.
Among the improvements:
- NVIDIA claims the GF4 Ti includes wider internal data paths, so it can accommodate multisampled anti-aliasing with very little performance loss.
- The sample patterns for Accuview are improved. The GeForce3 used a rotated grid sample pattern for 2X AA and an inferior ordered grid pattern for 4X mode. The GeForce4 family uses rotated grid patterns for both 2X and 4X modes. NVIDIA denotes this difference in 4X mode by dubbing the rotated grid 4X mode “4XS”. The grid rotation is undoubtedly a good thing; it interrupts the regularity of the pixel grid, helping fool the eye into perceiving less aliasing. (For a concrete picture of ordered versus rotated grids, see the sketch after this feature list.) Unfortunately, Accuview doesn’t include a semi-randomized sampling pattern like the Radeon 8500’s SMOOTHVISION does; that would be even better than a rotated grid.
- Also, as you can see in the diagrams below, Accuview’s AA sampling points are more centered in the pixel. At the subpixel level, the GeForce3 was sampling at the very center and at the edge of the pixel. NVIDIA claims these new sampling patterns provide more accuracy than the previous arrangement.
My understanding is that these new sample patterns ought to cause NVIDIA’s multisampling routine to decide to do blending more often, which is probably why NVIDIA didn’t use these patterns with the GeForce3, out of a concern for performance.
- Quincunx remains, for those of you who prefer 2X AA plus full-screen blurring. NVIDIA claims turning on Quincunx doesn’t slow performance at all versus 2X mode. I hear adjusting your monitor to run out of focus doesn’t, either.
- Accuview AA now encompasses anisotropic filtering as well as edge AA. This combination of edge antialiasing (multisampling) and texture antialiasing (anisotropic filtering) is simply The Right Thing To Do. Multisampling is just as effective as supersampling for edge AA, and anisotropic filtering is more effective than supersampling for texture AA. The GeForce3 could do both things at once, but the features weren’t logically grouped together as they are with Accuview.
All in all, Accuview is a sensible upgrade to the GeForce3’s multisampling AA. The combination of efficient edge AA, better sample patterns, and anisotropic filtering (with trilinear, if you so choose) probably puts the GF4 Ti on a level with the Radeon 8500 for AA. The Radeon’s edge AA may look a little nicer, but it’s probably a fair amount slower than Accuview. We’ll see.
- nView multi-display support: The GeForce4 chips both incorporate dual RAMDACs, dual TMDS transmitters, and a TV-out encoder, so they can drive a wide variety of display combinations, from a single VGA monitor to dual digital flat panels. The GF4 cards are the first cards outside of the Matrox G500 to have dual DVI output capability, which puts them into an industry-leading position. This is also the first time NVIDIA has emphasized dual-display features in one of its high-end graphics chips. On this front, the GF4 moves ahead of the dual-display Radeon 8500, which lacks a second DVI out.
NVIDIA has underscored the utility of multi-monitor support by introducing a new feature set in its drivers that helps manage multiple displays, virtual desktops, and the like. The nView software suite was designed by former Appian engineers, so it ought to be very nice. The nView feature set will extend back to existing NVIDIA cards, as well.
We were able to confirm that the GF4 will be able to run displays concurrently in multiple, independent resolutions. However, we’ll believe it can happen in Windows 2000/XP when we see it, because such things are famously difficult.
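Since sample patterns can be hard to picture, here’s the small sketch promised above of the difference between ordered-grid and rotated-grid 4X layouts. The offsets and the rotation angle below are generic illustrations of the two approaches, not NVIDIA’s actual Accuview values, which haven’t been published in this detail:

```python
# Illustrative sketch of ordered-grid vs. rotated-grid 4X sample layouts,
# with the pixel spanning (0,0)-(1,1).
import math

# Ordered grid: samples sit on a regular 2x2 lattice inside the pixel,
# so their positions line up from one pixel to the next.
ordered = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

def rotate(points, degrees, cx=0.5, cy=0.5):
    """Rotate sample points about the pixel center."""
    a = math.radians(degrees)
    return [(cx + (x - cx) * math.cos(a) - (y - cy) * math.sin(a),
             cy + (x - cx) * math.sin(a) + (y - cy) * math.cos(a))
            for x, y in points]

# Rotated grid: the same lattice, turned so no two samples share an X or Y
# coordinate. Near-horizontal and near-vertical edges then cross more
# distinct sample rows/columns, which smooths the worst-case stairsteps.
rotated = rotate(ordered, 26.6)   # angle chosen for illustration only

print(rotated)
```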
And that’s about it for the GeForce4 Ti. The improvements aren’t anything special in terms of 3D capabilities, but they do bring NVIDIA’s high-end product offering up to snuff feature-wise versus the Radeon 8500. And the GF4 Ti 4600 will no doubt be a supremely fast graphics card.
The GeForce4 MX
The GeForce4 MX chip, also known as the NV17, is the new low-end GPU from NVIDIA, and it’s an intriguing combination of technology culled from the GeForce2, GeForce3, and GeForce4 Ti. The fastest GF4 MX variant will be the GF4 MX 460, clocked at 300MHz with 64MB of DDR memory at 275MHz (550MHz DDR). From there, it gets hazy. The GF4 MX 440 will also use DDR memory, and it will run somewhat slower than the 460. The GF4 MX 420 will be mated with plain ol’ SDRAM.
The strangest thing about the GeForce4 MX is that its 3D rendering core is ripped directly from its predecessor, the GeForce2 MX. The GF4 MX has two pixel pipelines capable of laying down two pixels per clock, and it has a fixed-function T&L engine. There aren’t any pixel or vertex shaders in sight (unless you count the GeForce2’s register combiners as primitive pixel shaders, I suppose). In terms of 3D technology, the GF4 MX is significantly less advanced than the GeForce3 or the Radeon 8500.
It’s possible NVIDIA might try to implement a software vertex shader. DirectX 8 has its own set of routines to handle vertex shader programs on the CPU if no vertex unit is present. NVIDIA might choose to write its own, highly optimized software vertex shader (perhaps making some use of the GF4 MX’s fixed-function T&L unit) to help improve performance. However, the fact remains: the GeForce4 MX lacks a vertex shader.
And pixel shaders can’t really be emulated.
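For a rough sense of what “running vertex shaders on the CPU” amounts to, here’s a toy sketch of per-vertex transform work done in software. It’s purely illustrative; DirectX 8’s actual software vertex pipeline (and any NVIDIA-optimized version) is far more elaborate, with SIMD code paths and the full shader instruction set.

```python
# Toy illustration of CPU-side vertex processing: transform each vertex by
# a 4x4 matrix, the core of what a basic vertex program (or the
# fixed-function T&L path) does per vertex.

def transform_vertex(m, v):
    """Multiply a 4x4 row-major matrix by an (x, y, z, w) vertex."""
    return tuple(sum(m[row][col] * v[col] for col in range(4))
                 for row in range(4))

def software_vertex_shader(matrix, vertices):
    # The CPU walks every vertex each frame; this per-vertex loop is the
    # work a hardware vertex shader unit would otherwise absorb.
    return [transform_vertex(matrix, v) for v in vertices]

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(software_vertex_shader(identity, [(1.0, 2.0, 3.0, 1.0)]))
```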
Beyond that, the NV17 does include some worthwhile new tech, like the Lightspeed Memory Architecture bits lifted from the GF3/GF4 Ti. The one big modification to LMA for NV17 is that two of the four memory controllers in the crossbar config are eliminated. Those controllers are paired up with rendering pipelines, and the NV17 has only two pipes. Even so, with Z compression, occlusion detection, and fast DDR memory, the GF4 MX 460 ought to bring a monster fill rate to the party. If all you want to do is push pixels in DirectX 7-class games, the NV17 will certainly do so.
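To make the crossbar idea concrete in simplified form, here’s a toy model of how requests could be interleaved across independent memory controllers. The granule size and address mapping below are assumptions for illustration, not NVIDIA’s actual design.

```python
# Toy model of a crossbar memory controller: addresses are interleaved
# across several narrow, independent controllers, so small scattered
# accesses don't tie up one wide bus.

GRANULE = 64  # bytes per interleave granule (assumed for illustration)

def controller_for(address, num_controllers):
    """Pick which memory controller services a given address."""
    return (address // GRANULE) % num_controllers

# GeForce3 / GeForce4 Ti style: four controllers
print([controller_for(a, 4) for a in range(0, 512, 64)])  # [0, 1, 2, 3, 0, 1, 2, 3]

# GeForce4 MX (NV17): two of the four controllers are gone
print([controller_for(a, 2) for a in range(0, 512, 64)])  # [0, 1, 0, 1, 0, 1, 0, 1]
```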
The GF4 MX also lifts the antialiasing and display units from the GF4 Ti, so the chip will include one of the better AA implementations around, plus excellent dual-display output capabilities and the nView feature set.
The one piece of unique technology in the NV17 is a nod to the fact the NV17 will find its way into lots of laptop PCs. The chip includes a full MPEG2 decoder, so DVD playback should require almost no CPU overhead. NVIDIA says they left the MPEG2 decoder out of the GF4 Ti because big, fat desktop PCs with GF4 Ti cards don’t need much help with DVD playback. Makes sense to me.
What doesn’t make sense to me is why in the world NVIDIA is introducing this product, with this 3D rendering pipeline, at the beginning of 2002. One would expect a “GeForce4 MX” to include a cut-down version of the GeForce3/4 rendering pipeline, perhaps with two pixel shader/rendering pipes and a single vertex shader. Instead, we’re getting a card that’s incapable of taking advantage of all of the new 3D graphics programming techniques NVIDIA pioneered with the GeForce3.
With every GF4 MX that NVIDIA sells, the installed base for yesterday’s 3D technology will grow, and resistance against truly ground-breaking games and other software will be strengthened. Not only that, but attaching the “GeForce4” name to a chip with a GeForce2 MX rendering core seems deceptive to me, especially since the correlation between the GeForce2 and the GeForce2 MX was pretty tight.
Yes, the GF4 MX will be fast; it will have nice dual-display capabilities; and it will be cheap. But this cheap and easy date will be a nightmare the morning after.
NVIDIA will be “gracefully” phasing out its current products in favor of this new lineup. For the most part, that’s a fine thing. The GF4 Ti is much stronger competition for the Radeon 8500 than the Ti 500 was. With dual vertex shaders, Accuview AA, and dual-display support, the GF4 Ti has few weaknesses now. And with the core and memory clock speeds NVIDIA is suggesting, the GF4 Ti ought to be the fastest GPU on the planet.
Still, I can’t imagine recommending that any current GF3 or GF3 Ti 500 owner upgrade to a GF4 Ti. The only really solid reason to do so would be for the dual-display capabilities. Beyond that, most of the improvements are too minor, too incremental to justify an upgrade.
The loss of the GeForce3 Ti 200, however, will be a step backwards. Unless the GF4 Ti 4400 cards can get very cheap very soon, NVIDIA may be stuck with the GeForce4 MX 460 card as its mainstream $149-199 product. Given the fact that ATI has just introduced a Radeon 8500LE card with 128MB RAM for $199 list, the choice may be a simple one. The GF4 MX matches up nicely against the Radeon 7500, but with real vertex and pixel shaders, the Radeon 8500LE will blow it away.
Whatever the case, it’s safe to say this is one of NVIDIA’s least memorable “spring cycle” product releases. After the GeForce3 last year and the original GeForce the year before that, the GeForce4 family is a little underwhelming. The GeForce killed off NVIDIA competitors in droves, and the GeForce3 was a revolutionary product. Now that the GeForce4 has arrived, though, ATI is still in a pretty good position. After 3dfx and some of the other 3D chipmakers kicked the bucket, many of us were wondering whether there would be two major players in the graphics market or only one. Looks like it’s gonna be two.