Nvidia’s GeForce 8800 GT graphics processor

This is an absolutely spectacular time to be a PC gamer. The slew of top-notch and hotly anticipated games hitting store shelves is practically unprecedented, including BioShock, Crysis, Quake Wars, Unreal Tournament 3, and Valve’s Orange Box trio of goodness. I can’t remember a time quite like it.

However, this may not be the best time to own a dated graphics card. The latest generation of high-end graphics cards brought with it pretty much twice the performance of previous high-end cards, and to add insult to injury, these GPUs added DirectX 10-class features that today’s games are starting to exploit. If you have last year’s best, such as a GeForce 7900 or Radeon X1900, you may not be able to drink in all the eye candy of the latest games at reasonable frame rates.

And if you’ve played the Crysis demo, you’re probably really ready to upgrade. I’ve never seen a prettier low-res slide show.

Fortunately, DirectX 10-class graphics power is getting a whole lot cheaper, starting today. Nvidia has cooked up a new spin of its GeForce 8 GPU architecture, and the first graphics card based on this chip sets a new standard for price and performance. Could the GeForce 8800 GT be the solution to your video card, er, Crysis? Let’s have a look.


Meet the G92

In recent years, graphics processor transistor budgets have been ballooning at a rate even faster than Moore’s Law, and that has led to some, um, exquisitely plus-sized chips. This fall’s new crop of GPUs looks to be something of a corrective to that trend, and the G92 is a case in point. This chip is essentially a die shrink of the G80 graphics processor that powers incumbent GeForce 8800 graphics cards. The G92 adds some nice new capabilities, but doesn’t double up on shader power or anything quite that earth-shaking.


Here’s an extreme close-up of the G92, which may convince your boss/wife that you’re reading something educational and technically edifying right about now. We’ve pictured it next to a U.S. quarter in order to further propagate the American hegemonic mindset. Er, I mean, to provide some context, size-wise. The G92 measures almost exactly 18 mm by 18 mm, or 324 mm². TSMC manufactures the chip for Nvidia on a 65nm fab process, which somewhat miraculously manages to shoehorn roughly 754 million transistors into this space. By way of comparison, the much larger G80—made on a 90nm process—had only 681 million transistors. AMD’s R600 GPU packs 700 million transistors into a 420 mm² die area.
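For a rough sense of what the die shrink buys, here is a minimal sketch that turns the transistor counts and die areas quoted above into densities. The G80 is left out because its die area isn't given here, and the dictionary layout and function name are just illustrative choices.

```python
# Transistor counts (in millions) and die areas (in mm^2) as quoted in the text.
chips = {
    "G92 (65nm)": {"transistors_millions": 754, "die_area_mm2": 18 * 18},
    "R600": {"transistors_millions": 700, "die_area_mm2": 420},
}


def density(chip):
    """Return millions of transistors per square millimeter."""
    return chip["transistors_millions"] / chip["die_area_mm2"]


for name, chip in chips.items():
    print(f"{name}: {density(chip):.2f} million transistors per mm^2")
```

By this rough measure, the G92 packs noticeably more transistors into each square millimeter than the R600 does.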

Why, you may be asking, does the G92 have so many more transistors than the G80? Good question. The answer is: a great many little additions here and there, including some we may not know about just yet.

One big change is the integration of the external display chip that acted as a helper to the G80. The G92 natively supports twin dual-link DVI outputs with HDCP, without the need for a separate display chip. That ought to make G92-based video cards cheaper and easier to make. Another change is the inclusion of the VP2 processing engine for high-definition video decoding and playback, an innovation first introduced in the G84 GPU behind the GeForce 8600 lineup. The VP2 engine can handle the most intensive portions of H.264 video decoding in hardware, offloading that burden from the CPU.

Both of those capabilities are pulled in from other chips, but here’s a novel one: PCI Express 2.0 support. PCIe 2.0 effectively doubles the bandwidth available for communication between the graphics card and the rest of the system, and the G92 is Nvidia’s first chip to support this standard. This may be the least-hyped graphics interface upgrade in years, in part because PCIe 1.1 offers quite a bit of bandwidth already. Still, PCIe 2.0 is a major evolutionary step, though I doubt it chews up too many additional transistors.
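To put a number on that doubling, here is a back-of-the-envelope sketch. The per-lane signaling rates (2.5 GT/s for PCIe 1.x, 5 GT/s for PCIe 2.0) and the 8b/10b encoding overhead come from the PCI Express specifications rather than from anything measured in this review, and the x16 slot width is an assumption about how the card is installed.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s, lanes=16):
    """Peak one-direction bandwidth in GB/s for a PCIe 1.x/2.0 link.

    Assumes 8b/10b encoding: 8 payload bits for every 10 bits on the wire.
    """
    payload_bits_per_s = transfer_rate_gt_s * 1e9 * 8 / 10
    bytes_per_s_per_lane = payload_bits_per_s / 8
    return bytes_per_s_per_lane * lanes / 1e9


pcie1 = pcie_bandwidth_gb_s(2.5)  # PCIe 1.x, x16 slot
pcie2 = pcie_bandwidth_gb_s(5.0)  # PCIe 2.0, x16 slot
print(f"PCIe 1.x x16: {pcie1:.1f} GB/s per direction")
print(f"PCIe 2.0 x16: {pcie2:.1f} GB/s per direction")
```

These are theoretical peaks per direction; real-world transfers will land somewhat lower.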

So where else do the G92’s additional transistors come from? This is where things start to get a little hazy. You see, the GeForce 8800 GT doesn’t look to be a “full” implementation of G92. Although this chip has the same basic GeForce 8-series architecture as its predecessors, the GeForce 8800 GT officially has 112 stream processors, or SPs. That’s seven “clusters” of 16 SPs each. Chip designers don’t tend to do things in odd numbers, so I’d wager an awful lot of Nvidia stock that the G92 actually has at least eight SP clusters onboard.

Eight’s probably the limit, though, because the G92’s SP clusters are “fatter” than the G80’s; they incorporate the G84’s more robust texture addressing capacity of eight addresses per clock, up from four in the G80. That means the GeForce 8800 GT, with its seven SP clusters, can sample a total of 56 texels per clock—well beyond the 24 of the 8800 GTS and 32 of the 8800 GTX. We’ll look at the implications of this change in more detail in a sec.
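The cluster arithmetic is easy to check. In this sketch, the cluster counts for the 8800 GTS (six) and 8800 GTX (eight) are inferred from the 24 and 32 texels-per-clock figures above at four addresses per cluster; the function names are illustrative.

```python
SPS_PER_CLUSTER = 16


def cluster_count(sp_count):
    """Number of SP clusters for a given stream processor count."""
    return sp_count // SPS_PER_CLUSTER


def texels_per_clock(clusters, addresses_per_cluster):
    """Texture addresses the whole chip can generate each clock."""
    return clusters * addresses_per_cluster


gt_clusters = cluster_count(112)            # 8800 GT: 112 SPs
gt_texels = texels_per_clock(gt_clusters, 8)  # G92-style clusters, 8 per clock
gts_texels = texels_per_clock(6, 4)           # G80-style clusters, 4 per clock
gtx_texels = texels_per_clock(8, 4)

print(gt_clusters, gt_texels, gts_texels, gtx_texels)
```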

Another area where the GeForce 8800 GT may be sporting a bit of trimmed down G92 functionality is in the ROP partitions. These sexy little units are responsible for turning fully processed and shaded fragments into full-blown pixels. They also provide much of the chip’s antialiasing grunt, and in Nvidia’s GeForce 8 architecture, each ROP partition has a 64-bit interface to video memory. The G80 packs six ROP partitions, which is why the full-blown GeForce 8800 GTX has a 384-bit path to memory and the sawed-off 8800 GTS (with five ROP partitions) has a 320-bit memory interface. We don’t know how many ROP partitions the G92 has lurking inside, but the 8800 GT uses only four of them. As a result, it has a 256-bit memory interface, can output a maximum of 16 finished pixels per clock, and has somewhat less antialiasing grunt on a clock-for-clock basis.
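The relationship between ROP partitions, memory bus width, and pixel output can be sketched in a few lines. The four-pixels-per-partition figure is inferred from the 8800 GT's 16 pixels per clock across four partitions, not from an official spec sheet.

```python
BITS_PER_PARTITION = 64    # each ROP partition's path to video memory
PIXELS_PER_PARTITION = 4   # inferred: 16 pixels per clock / 4 partitions


def memory_bus_width(partitions):
    """Total memory interface width in bits."""
    return partitions * BITS_PER_PARTITION


def pixels_per_clock(partitions):
    """Maximum finished pixels written per clock."""
    return partitions * PIXELS_PER_PARTITION


for card, partitions in [("8800 GTX", 6), ("8800 GTS", 5), ("8800 GT", 4)]:
    print(f"{card}: {memory_bus_width(partitions)}-bit, "
          f"{pixels_per_clock(partitions)} pixels/clock")
```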

How many ROPs does G92 really have? I dunno. I suspect we’ll find out before too long, though.

The 8800 GT up close

What the 8800 GT lacks in functional units, it largely makes up in clock speed. The 8800 GT’s official core clock speed is 600MHz, and its 112 SPs run at 1.5GHz. The card’s 512MB of GDDR3 memory runs at 900MHz—or 1.8GHz effective, thanks to the memory’s doubled data rate.
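Those memory numbers translate directly into peak bandwidth. This sketch combines the 900MHz GDDR3 clock with the 8800 GT's 256-bit memory interface; the function name is just for illustration.

```python
def memory_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s for double-data-rate memory."""
    effective_mt_s = memory_clock_mhz * transfers_per_clock  # mega-transfers/s
    bytes_per_transfer = bus_width_bits / 8
    return effective_mt_s * bytes_per_transfer / 1000


gt_bandwidth = memory_bandwidth_gb_s(900, 256)
print(f"8800 GT peak memory bandwidth: {gt_bandwidth:.1f} GB/s")
```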



MSI’s NX8800GT

Here’s a look at MSI’s rendition of the GeForce 8800 GT. Note the distinctive MSI decal. This card is further differentiated in a way that really matters: it comes hot from the factory, with a 660MHz core clock and 950MHz memory. This sort of “overclocking” has become so common among Nvidia’s board partners, it’s pretty much expected at this point. MSI doesn’t disappoint.
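If you're curious how much extra headroom MSI's clocks represent, here's a quick sketch comparing them against Nvidia's stock speeds. Higher clocks don't translate one-to-one into higher frame rates, so treat these as upper bounds on any gain.

```python
def overclock_percent(stock_mhz, factory_mhz):
    """Percentage increase of a factory clock over the stock clock."""
    return (factory_mhz - stock_mhz) / stock_mhz * 100


core_gain = overclock_percent(600, 660)    # core clock
memory_gain = overclock_percent(900, 950)  # memory clock

print(f"Core: +{core_gain:.1f}%, memory: +{memory_gain:.1f}%")
```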

I don’t want to give too much away, since we’ve measured noise levels on a decibel meter, but you’ll be pleased to know that the 8800 GT’s single-slot cooler follows in the tradition of Nvidia’s coolers for its other GeForce 8800 cards. The thing is whisper-quiet.

The sight of a single-slot cooler may be your first hint that this is not the sort of video card that will put an ugly dent in your credit rating. Here’s another hint at the 8800 GT’s mainstream aspirations. Nvidia rates the power consumption of the 8800 GT at 110W, which makes the single-slot cooler feasible and also means the 8800 GT needs just one auxiliary PCIe power connector, of the six-pin variety, in order to do its thing.
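Here's a sketch of why one connector is enough. The 75W limits for the PCIe x16 slot and for a six-pin auxiliary connector are assumptions taken from the PCI Express power specifications, not figures from Nvidia's 8800 GT documentation.

```python
import math

PCIE_SLOT_WATTS = 75    # assumption: power available from an x16 graphics slot
SIX_PIN_WATTS = 75      # assumption: power available from one six-pin connector
CARD_RATED_WATTS = 110  # Nvidia's rating for the 8800 GT


def six_pin_connectors_needed(card_watts):
    """Six-pin connectors required beyond what the slot itself supplies."""
    if card_watts <= PCIE_SLOT_WATTS:
        return 0
    return math.ceil((card_watts - PCIE_SLOT_WATTS) / SIX_PIN_WATTS)


connectors = six_pin_connectors_needed(CARD_RATED_WATTS)
headroom = PCIE_SLOT_WATTS + connectors * SIX_PIN_WATTS - CARD_RATED_WATTS
print(f"{connectors} six-pin connector(s), {headroom}W of headroom")
```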



The 8800 GT sports a single six-pin PCIe aux power connector

Another place where the 8800 GT sports only one connector is in the SLI department. That probably means the 8800 GT won’t be capable of ganging up with three or four of its peers in a mega-multi-GPU config. Two-way SLI is probably the practical limit for this card.

Here’s the kicker, though. 8800 GT cards are slated to become available today for between $199 and $249.

Doing the math

So that’s a nice price, right? Well, like so many things in life—and I sure as heck didn’t believe this in high school—it all boils down to math. If you take the 8800 GT’s seven SP clusters and 112 SPs and throw them into the blender with a 1.5GHz shader clock, a 256-bit memory interface, along with various herbs and spices, this is what comes out:

                    Peak pixel   Peak texel     Peak bilinear     Peak bilinear FP16  Peak memory  Peak shader
                    fill rate    sampling rate  texel filtering   texel filtering     bandwidth    arithmetic
                    (Gpixels/s)  (Gtexels/s)    rate (Gtexels/s)  rate (Gtexels/s)    (GB/s)       (GFLOPS)
GeForce 8800 GT     9.6          33.6           33.6              16.8                57.6         504
GeForce 8800 GTS    10.0         12.0           12.0              12.0                64.0         346
GeForce 8800 GTX    13.8         18.4           18.4              18.4                86.4         518
GeForce 8800 Ultra  14.7         19.6           19.6              19.6                103.7        576
Radeon HD 2900 XT   11.9         23.8           11.9              11.9                105.6        475

In terms of texture sampling rates, texture filtering capacity, and shader arithmetic, the 8800 GT is actually superior to the 8800 GTS. It’s also quicker than the Radeon HD 2900 XT in most of those categories, although our FLOPS estimate for the GeForce GPUs is potentially a little rosy—another way of counting would reduce those numbers by a third, making the Radeon look relatively stronger. Also, thanks to its higher clock speed, the 8800 GT doesn’t suffer much in terms of pixel fill rate (and corresponding AA grunt) due to its smaller ROP count. The 8800 GT’s most noteworthy numbers may be its texture sampling and filtering rates. Since its SPs can grab twice as many texels per clock as the G80’s, its texture filtering performance with standard 8-bit integer color formats could be more than double that of the 8800 GTS.
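Here's roughly how the 8800 GT's row falls out of its specs. This is a sketch under stated assumptions: the three FLOPs per SP per clock figure is our reading of the rosier counting method, the two-FLOP alternative is the more conservative one, and the half-rate FP16 filtering is inferred from the 8800 GT's numbers in the table.

```python
def peak_rates(core_ghz, shader_ghz, pixels_per_clock, texels_per_clock,
               sp_count, flops_per_sp_per_clock=3):
    """Theoretical peak rates derived from clocks and unit counts."""
    return {
        "pixel_fill_gpixels": pixels_per_clock * core_ghz,
        "texel_rate_gtexels": texels_per_clock * core_ghz,
        # Assumption: FP16 filtering runs at half the integer rate on this card.
        "fp16_filter_gtexels": texels_per_clock * core_ghz / 2,
        "shader_gflops": sp_count * shader_ghz * flops_per_sp_per_clock,
    }


# 8800 GT: 600MHz core, 1.5GHz shaders, 16 pixels and 56 texels per clock, 112 SPs.
gt = peak_rates(0.6, 1.5, 16, 56, 112)
gt_conservative = peak_rates(0.6, 1.5, 16, 56, 112, flops_per_sp_per_clock=2)

for name, value in gt.items():
    print(f"{name}: {value:.1f}")
print(f"conservative shader_gflops: {gt_conservative['shader_gflops']:.0f}")
```

Counting two FLOPs per clock instead of three yields 336 GFLOPS, which is the one-third reduction mentioned above.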

In graphics performance, math like this isn’t quite destiny, but it’s close. The only place where the 8800 GT really trails the 8800 GTS or the 2900 XT is in memory bandwidth. And, believe it or not, memory bandwidth is arguably at less of a premium these days, since games produce “richer” pixels that spend more time looping through shader programs and thus occupying on-chip storage like registers and caches.

Bottom line: the 8800 GT should generally be as good as or better than the 8800 GTS, for under 250 bucks. Let’s test that theory.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processor: Core 2 Extreme X6800 2.93GHz
System bus: 1066MHz (266MHz quad-pumped)
Motherboard: XFX nForce 680i SLI
BIOS revision: P31
North bridge: nForce 680i SLI SPP
South bridge: nForce 680i SLI MCP
Chipset drivers: ForceWare 15.08
Memory size: 4GB (4 DIMMs)
Memory type: 2 x Corsair TWIN2X20488500C5D DDR2 SDRAM at 800MHz
CAS latency (CL): 4
RAS to CAS delay (tRCD): 4
RAS precharge (tRP): 4
Cycle time (tRAS): 18
Command rate: 2T
Audio: Integrated nForce 680i SLI/ALC850 with RealTek 6.0.1.5497 drivers
Graphics: GeForce 8800 GT 512MB PCIe with ForceWare 169.01 drivers
Graphics: XFX GeForce 8800 GTS XXX 320MB PCIe with ForceWare 169.01 drivers
Graphics: EVGA GeForce 8800 GTS OC 640MB PCIe with ForceWare 169.01 drivers
Graphics: Radeon HD 2900 XT 512MB PCIe with Catalyst 7.10 drivers
Hard drive: WD Caviar SE16 320GB SATA
OS: Windows Vista Ultimate x86 Edition
OS updates: KB36710, KB938194, KB938979, KB940105, DirectX August 2007 Update

Please note that we’re using “overclocked in the box” versions of the 8800 GTS 320MB and 640MB, while we’re testing a stock-clocked GeForce 8800 GT reference card from Nvidia.

Thanks to Corsair for providing us with memory for our testing. Their quality, service, and support are easily superior to what you get with no-name DIMMs.

Our test systems were powered by PC Power & Cooling Silencer 750W power supply units. The Silencer 750W was a runaway Editor’s Choice winner in our epic 11-way power supply roundup, so it seemed like a fitting choice for our test rigs. Thanks to OCZ for providing these units for our use in testing.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Crysis demo

The Crysis demo is still fresh from the oven, but we were able to test the 8800 GT in it. Crytek has included a GPU benchmarking facility with the demo that consists of a fly-through of the island in which the opening level of the game is set, and we used it. For this test, we set all of the game’s quality options at “high” (not “very high”) and set the display resolution to—believe it or not—1280×800 with 4X antialiasing.

Even at this low res, these relatively beefy graphics cards chugged along. The game looks absolutely stunning, but obviously it’s using a tremendous amount of GPU power in order to achieve the look.
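To put "low res" in perspective, here's a quick sketch comparing the number of pixels per frame at 1280×800 with the higher resolutions used for other games in this review. Pixel count is only a rough proxy for GPU load, so take the ratios as a guide rather than a prediction.

```python
def pixels_per_frame(width, height):
    """Total pixels rendered in a single frame at a given resolution."""
    return width * height


crysis_res = pixels_per_frame(1280, 800)
resolutions = {"1920x1200": (1920, 1200), "2560x1600": (2560, 1600)}

for label, (width, height) in resolutions.items():
    ratio = pixels_per_frame(width, height) / crysis_res
    print(f"{label}: {ratio:.2f}x the pixels of 1280x800")
```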

The demo is marginally playable at these settings, but I’d prefer to turn antialiasing off in order to get smoother frame rates on the 8800 GT. That’s what I did when I played through the demo, in fact.

Notice several things about our results. Although the 8800 GT keeps up with the 8800 GTS 640MB in terms of average frame rates, it hit lower lows of around 10 FPS, probably due to its lesser memory bandwidth or its smaller amount of total RAM onboard. Speaking of memory, the card for which the 8800 GT is arguably a replacement, the 320MB version of the GTS, stumbles badly here. This is why we were lukewarm on the GTS 320MB when it first arrived. Lots of GPU power isn’t worth much if you don’t have enough video memory. GTS 320MB owners will probably have to drop to “medium” quality in order to run Crysis smoothly.

Unreal Tournament 3 demo

We tested the UT3 demo by playing a deathmatch against some bots and recording frame rates during 60-second gameplay sessions using FRAPS. This method has the advantage of duplicating real gameplay, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent and trustworthy results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
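Here's a small sketch of why the median is the better summary for the lows. The session numbers below are made up purely for illustration and are not results from our testing; the assumption that the average frame rates are simply averaged across sessions is ours as well.

```python
from statistics import mean, median

# Hypothetical results from five 60-second sessions: (average FPS, low FPS).
# One session has an outlier low of 12 FPS, say from a momentary hitch.
sessions = [(48.2, 31), (50.1, 29), (47.5, 12), (49.3, 30), (51.0, 33)]


def summarize(results):
    """Return (mean of session averages, median of session lows)."""
    averages = [avg for avg, _ in results]
    lows = [low for _, low in results]
    return mean(averages), median(lows)


avg_fps, median_low = summarize(sessions)
mean_low = mean(low for _, low in sessions)

print(f"Average FPS: {avg_fps:.1f}")
print(f"Median low: {median_low} FPS (the mean of the lows would be {mean_low})")
```

With these made-up numbers, a single bad session drags the mean of the lows down to 27 FPS, while the median stays at 30 FPS.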

Because the Unreal engine doesn’t support multisampled antialiasing, we tested without AA. Instead, we just cranked up the resolution to 2560×1600 and turned up the demo’s quality sliders to the max. I also disabled the demo’s 62 FPS frame rate cap before testing.

All of these cards can play the UT3 demo reasonably well at this resolution, the 8800 GT included. I noticed some brief slowdowns on the GTS 320MB right as I started the game, but those seemed to clear up after a few seconds.

Team Fortress 2

For TF2, I cranked up all of the game’s quality options, set anisotropic filtering to 16X, and used 4X multisampled antialiasing at 2560×1600 resolution. I then hopped onto a server with 24 players duking it out on the “ctf_2fort” map. I recorded a demo of me playing as a soldier, somewhat unsuccessfully, and then used the Source engine’s timedemo function to play the demo back and report performance.

The 8800 GT leads all contenders in TF2. Even at 2560×1600 with 4X AA and 16X aniso, TF2 is perfectly playable with this card, although that didn’t help my poor soldier guy much.

BioShock

We tested this game with FRAPS, just like we did the UT3 demo. BioShock’s default settings in DirectX 10 are already very high quality, so we didn’t tinker with them much. We just set the display res to 2560×1600 and went to town. In this case, I was trying to take down a Big Daddy, another generally unsuccessful effort.

A low of 23 FPS for the 8800 GT puts it right on the edge of smooth playability. The 8800 GT pretty much outclasses the Radeon HD 2900 XT here, amazingly enough. The 2900 XT couldn’t quite muster a playable frame rate at these settings, which my seat-of-the-pants impression confirmed during testing.

Lost Planet: Extreme Condition

Here’s another DX10 game. We ran this game in DirectX 10 mode at 1920×1200 with all of its quality options maxed out, plus 4X AA and 16X anisotropic filtering. We used the game’s built-in performance test, which tests two very different levels in the game, a snowy outdoor setting and a cave teeming with flying doodads.

Here’s another case where the 8800 GTS 320MB stumbles, while the 8800 GT does not. Although the Radeon HD 2900 XT lists for $399, it looks like an also-ran in most of our tests.

Power consumption

We measured total system power consumption at the wall socket using an Extech power analyzer model 380803. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement.

The idle measurements were taken at the Windows Vista desktop with the Aero theme enabled. The cards were tested under load running BioShock in DirectX 10 at 2560×1600 resolution, using the same settings we did for performance testing.

Nvidia has done a nice job with the G92’s power consumption. Our 8800 GT-based test system draws over 20 fewer watts at idle than any of the others tested. Under load, the story is similar. Mash up these numbers with the performance results, and you get a very compelling power efficiency picture.

Noise levels

We measured noise levels on our test systems, sitting on an open test bench, using an Extech model 407727 digital sound level meter. The meter was mounted on a tripod approximately 14″ from the test system at a height even with the top of the video card. We used the OSHA-standard weighting and speed for these measurements.

You can think of these noise level measurements much like our system power consumption tests, because the entire systems’ noise levels were measured, including the stock Intel cooler we used to cool the CPU. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.

Nvidia’s single-slot coolers have too often been gratuitously small and noisy in the past year or two, but the 8800 GT is different. This may be the quietest single-slot cooler we’ve ever tested (save for the passive ones), and it doesn’t grow audibly louder under load. That’s a pleasant surprise, since the thing can get very loud during its initial spin-up at boot time. Fortunately, it never visited that territory for us when running games.

Conclusions

You’ve seen the results for yourself, so you pretty much know what I’m going to say. The 8800 GT does a very convincing imitation of the GeForce 8800 GTS 640MB when running the latest games, even at high resolutions and quality settings, with antialiasing and high-quality texture filtering. Its G92 GPU has all of the GeForce 8800 goodness we’ve come to appreciate in the past year or so, including DX10 support, coverage-sampled antialiasing, and top-notch overall image quality. The card is quiet and draws relatively little power compared to its competitors, and it will only occupy a single slot in your PC. That’s a stunning total package, sort of what it would be like if Jessica Biel had a brain.

With pricing between $199 and $249, I find it hard to recommend anything else, especially since we found generally playable settings at 2560×1600 resolution in some of the most intensive new games (except for Crysis, which is in a class by itself). I expect we may see some more G92-based products popping up in the coming weeks or months, but for most folks, this will be the version to have.

The one potential fly in the ointment for the 8800 GT is its upcoming competition from AMD. As we were preparing this review, the folks from AMD contacted us to let us know that the RV670 GPU is coming soon, and that they expect it to bring big increases in performance and power efficiency along with it. In fact, the AMD folks sound downright confident they’ll have the best offering at this price point when the dust settles, and they point to several firsts they’ll be offering as evidence. With RV670, they expect to be the first to deliver a GPU fabbed on a 55nm process, the first to offer a graphics processor compliant with the DirectX 10.1 spec, and the first to support four-way multi-GPU configs in Windows Vista. DirectX 10.1 is a particular point of emphasis for AMD, because it allows for some nifty things like fast execution of global illumination algorithms and direct developer control of antialiasing sample patterns. Those enhancements, of course, will be pretty much academic if RV670-based cards don’t provide as compelling a fundamental mix of performance, image quality, and power efficiency as the GeForce 8800 GT. We’ll know whether they’ve achieved that very soon.

This concludes our first look at the 8800 GT, but it’s not the end of our evaluation process. I’ve been knee-deep in CPUs over the past month or so, culminating today with our review of the 45nm Core 2 Extreme QX9650 processor, and that’s kept me from spending all of the time with the 8800 GT that I’d like. Over the next week or so, I’ll be delving into multi-GPU performance, some image quality issues, HD video playback, more games, and more scaling tests. We may have yet another new video card for you shortly, too.
