Seems like we’ve been waiting for these new GeForces for a long time now. Nvidia gave us a first glimpse at its latest GPU architecture about half a year ago, right around the time that AMD was introducing its Radeon HD 5870. In the intervening six months, AMD has fleshed out its Radeon HD 5000 series with a full suite of DirectX 11-class GPUs and graphics cards. Meanwhile, Nvidia’s GF100 chip is later than a stoner to study hall.
Fortunately, our wait is coming to an end. The GeForce GTX 470 and 480 are expected to be available soon, and we’ve wrangled several of them for testing. We can say with confidence that the GF100 is nothing if not fascinating, regardless of whether it succeeds or fails.
A pair of GeForce GTX 480 cards
Fermi is GF100 is GTX 480
We’ve already covered the GF100 graphics chip and its architecture rather extensively here at TR, so we won’t cover the same ground again in any great detail here. There is much to know about this three-billion-transistor behemoth, though, so we’ll try to bring you up to speed in brief.
Our first look at the GF100 was focused solely on the GPU architecture, dubbed Fermi, and how that architecture has been adapted to serve the needs of the nascent market for GPU-based computing devices. Nvidia intends this chip to serve multiple markets, from consumer gaming cards to high-performance computing clusters, and the firm has committed an awful lot of time, treasure, and transistors toward making the GF100 well suited for GPU computing. Thus, the GF100 has a number of compute-centric capabilities that no other GPU can match. The highlights include improved scheduling with the ability to execute multiple, concurrent kernels; a real, fully coherent L2 cache; robust support for double-precision floating-point math; ECC-protected memories throughout the hierarchy; and a large, unified address space with support for C++-style pointers. Some of these provisions (better scheduling and caching, for instance) may have side benefits for consumers, whose GeForce cards have the potential to be especially good at GPU-based video transcoding or in-game physics simulations. Most of them, however, will be practically useless in a desktop PC, particularly since they have no utility in real-time graphics.
After we considered the compute-focused parts of the Fermi architecture, Rys reminded us all that the GF100 is still very much a graphics chip by offering his informed speculation about the specifics of its graphics hardware. Nvidia eventually confirmed many of his hunches when it revealed the details of the GF100’s graphics architecture to us just after CES. As expected, the move to a DX11 feature set means the GF100 adopts nearly every major graphics feature its competitor has, but we were thrown for a loop by how extensively Nvidia’s architects chose to overhaul the GF100’s geometry processing capabilities. Not only does Fermi support DirectX 11’s hardware tessellation, by means of which the GPU can amplify the polygon detail in a scene dramatically, but Nvidia believes it is the world’s first parallel architecture for geometry processing. With quad rasterizers and a host of geometry processing engines distributed across the chip, the GF100 has the potential nearly to quadruple the number of polygons possible in real-time graphics compared to even its fastest contemporaries (GT200 and AMD’s Cypress). In this way, the GF100 is just as audacious an attempt at advancing the state of the art in graphics as it is in computing.
The trouble is that ambitious architectures and major technological advances aren’t easy to achieve. New capabilities add time to the design cycle and complexity to the design itself. Nvidia may well have had both eyes on the potential competition from Intel and its vaunted Larrabee project when conceiving the GF100, with too little focus on the more immediate threat from AMD. Now that the first-generation Larrabee has failed to materialize as a consumer product, the GF100 must face its sole rival in the form of the lean, efficient, and much smaller Cypress chip in AMD’s new Radeons. With a 50% wider path to memory and roughly half again as many transistors as Cypress, the GF100 ought to have no trouble capturing the overall graphics performance title. Yet the GF100 project has been dogged by delays and the inevitable rumors about the problems that have caused them, among them the time-honored classics of chip yield, heat, and power issues.
In this context, we’ve made several attempts at handicapping the key throughput rates of GF100-based products, and we’ve constantly had to revise our expectations downward with each trickle of new information. Now that the flagship GeForce GTX 480 is set to ship soon, we can make one more downward revision that brings us to the final numbers.
Nvidia has elected to disable one of the GF100’s 16 shader multiprocessor groups even in the top-of-the-line GTX 480. That fact suggests some yield issues with this very large chip, and indeed, the company says the concession was needed in order to ensure sufficient initial supplies of the GTX 480. This change reduces the number of ALUs or “CUDA cores” by 32 in the final product, along with one texture unit that would have been good for sampling and filtering four texels per clock. With this modification and the settling of the base GPU clock at 700MHz, the shader ALUs at twice that, and the memory clock at 924MHz, the GTX 480’s key rates become apparent.
| ||GeForce GTX 285||GeForce GTX 480||Radeon HD 5870|
|Process node||55 nm @ TSMC||40 nm @ TSMC||40 nm @ TSMC|
|Core clock||648 MHz||700 MHz||850 MHz|
|“Hot” (shader) clock||1476 MHz||1401 MHz||—|
|Memory clock||1300 MHz||924 MHz||1200 MHz|
|Memory transfer rate||2600 MT/s||3696 MT/s||4800 MT/s|
|Memory bus width||512 bits||384 bits||256 bits|
|Memory bandwidth||166.4 GB/s||177.4 GB/s||153.6 GB/s|
|Peak single-precision arithmetic rate||0.708 Tflops||1.35 Tflops||2.72 Tflops|
|Peak double-precision arithmetic rate||88.5 Gflops||168 Gflops||544 Gflops|
|ROP rate||21.4 Gpixels/s||33.6 Gpixels/s||27.2 Gpixels/s|
|INT8 bilinear texel rate (half rate for FP16)||51.8 Gtexels/s||42.0 Gtexels/s||68.0 Gtexels/s|
The GTX 480 is a straightforward near-doubling of peak shader arithmetic and ROP throughput rates versus the GeForce GTX 285, but memory bandwidth is only marginally higher. In theory, the GTX 480 is, amazingly, slower at texturing than the GTX 285, but Nvidia expects the GF100 to deliver higher real-world texturing performance thanks to some texture cache optimizations that should reduce conflicts during sampling.
(Those of you familiar with the Fermi architecture may wonder why double-precision math performance is only doubled versus the GTX 285. In theory, GF100 does DP math at half the single-precision rate. However, Nvidia has elected to reserve all of that double-precision power for its professional-level Tesla products. GeForce cards will get only a quarter of the DP math rate.)
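For readers who want to check the arithmetic, the GTX 480’s key rates can be reproduced from unit counts and clocks. What follows is a minimal Python sketch rather than Nvidia’s own math. The function name `peak_rates` is ours, the 48-ROP count is inferred from the 33.6 Gpixels/s figure at 700MHz, and the one-eighth double-precision ratio is simply a quarter of the half-rate DP described above.

```python
# Sketch: derive the GTX 480's theoretical peaks from unit counts and clocks.
# Values marked "inferred" are assumptions, not figures stated in the article.

def peak_rates(alus, texels_per_clock, rops, core_mhz, shader_mhz,
               mem_mts, bus_bits, dp_fraction):
    """Return theoretical peak rates as a dict of rounded figures."""
    # One multiply-add per ALU per shader clock counts as two flops.
    sp_gflops = alus * 2 * shader_mhz / 1000.0
    return {
        # transfers/s * bus width in bytes
        "bandwidth_gbs": round(mem_mts * bus_bits / 8 / 1000.0, 1),
        "sp_gflops": round(sp_gflops),
        "dp_gflops": round(sp_gflops * dp_fraction),
        "rop_gpixels": round(rops * core_mhz / 1000.0, 1),
        "int8_gtexels": round(texels_per_clock * core_mhz / 1000.0, 1),
    }

gtx480 = peak_rates(
    alus=512 - 32,            # one 32-ALU multiprocessor disabled
    texels_per_clock=15 * 4,  # 15 active multiprocessors, four texels each
    rops=48,                  # inferred from 33.6 Gpixels/s at 700MHz
    core_mhz=700,
    shader_mhz=1401,
    mem_mts=3696,
    bus_bits=384,
    dp_fraction=0.5 * 0.25,   # half-rate DP, then a quarter of that on GeForce
)

for name, value in gtx480.items():
    print(name, value)
```

The single-precision result comes out at about 1345 Gflops, which the table rounds to 1.35 Tflops.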
More troubling are the comparisons to the Radeon HD 5870. Yes, the GTX 480 is badly beaten in the FLOPS numbers by the 5870. That was also true of the comparison between the GTX 280 and Radeon HD 4870 in the prior generation, yet it was never a real problem, because Nvidia’s scheduling methods are more efficient than AMD’s. The sheer magnitude of the gap in FLOPS here is unsettling, but the areas of potentially greater concern include memory bandwidth, ROP rate, and texturing. Theoretically, the GTX 480 is only slightly quicker in the first two categories, since relatively low GDDR5 clock rates look to have hampered its memory bandwidth. And it’s substantially slower than the 5870 at texturing. As we’ve said before, the GF100 will have to make up in efficiency what it lacks in brute-force capacity, assuming the competition leaves room for that.
Clearly, the GF100 missed its targets on a number of fronts. The fact that it’s still tops in memory bandwidth and ROP rate illustrates how Nvidia’s strategy of building a very large chip mitigates risk, in a sense. Even when dialed back, the GF100 is in the running for the performance title. The question is whether capturing that title will be worth it: to Nvidia in terms of manufacturing costs and delays, and to the consumer in terms of power draw, heat, and noise. Last time around, the GT200 wasn’t really a paragon of architectural efficiency, but Nvidia was able to reach some fairly satisfactory compromises on clock speed, power, and performance.
GeForce GTX 480: The big daddy warms up his pipes
The GeForce GTX 480
For what it is, the GeForce GTX 480 is a rather impressive specimen. At first glance, it looks like any other high-end graphics card. Take stock, though, and several items stand out. The four heatpipes menacingly snaking up and back down into its cooling shroud suggest serious cooling horsepower, and the need for it. The plated metal grooves on the side of the card are handsome, and a heatsink surface. Above the dual DVI ports is a rather unusual Mini HDMI connector, surely chosen because it’s compact enough not to impinge on the venting area in the adjacent slot cover. (Nvidia expects cards to ship with adapters to regular-sized HDMI connectors, of course.) In all of these ways, the GTX 480 looks to be designed to expel and radiate heat as efficiently as any video card we can recall.
From left to right: Radeon HD 5850, GeForce GTX 470, GeForce GTX 480, Radeon HD 5870, Radeon HD 5970
In spite of those omens, the GTX 480 is pretty much exactly 10.5″ long, or ever-so-slightly shorter than a Radeon HD 5870. Unlike the 5870, though, the GTX 480 requires both a six-pin auxiliary power connector and an eight-pin one, because the 250W max power rating of the board exceeds the limits of two six-pin connectors.
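The connector requirement follows from simple power budgeting. Here is a small Python sketch of that check, assuming the usual PCI Express ratings of 75W from the slot, 75W per six-pin plug, and 150W per eight-pin plug. The helper names are ours, for illustration only.

```python
# Sketch: does a board's max power rating fit within its connector budget?
# Assumed PCI Express ratings in watts: slot 75, six-pin 75, eight-pin 150.
SLOT_W = 75
AUX_W = {"6-pin": 75, "8-pin": 150}

def power_budget(aux_connectors):
    """Total watts available from the slot plus the listed aux connectors."""
    return SLOT_W + sum(AUX_W[c] for c in aux_connectors)

def fits(max_board_w, aux_connectors):
    """True if the board's rated max power is within the connector budget."""
    return max_board_w <= power_budget(aux_connectors)

GTX_480_MAX_W = 250

print(power_budget(["6-pin", "6-pin"]), fits(GTX_480_MAX_W, ["6-pin", "6-pin"]))
print(power_budget(["6-pin", "8-pin"]), fits(GTX_480_MAX_W, ["6-pin", "8-pin"]))
```

Under those ratings, two six-pin plugs top out at 225W, short of the GTX 480’s 250W rating, while the six-plus-eight arrangement allows up to 300W.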
Remove the cooler, and you can see the source of all of the commotion: the GF100 chip under its expansive metal cap. Nvidia remains mum on the GF100’s exact die area, but it’s gotta be roughly the size of a TARP loan. Flanking the GF100 are 12 pieces of GDDR5 memory, totaling 1536MB.
The GTX 480’s cooler is a nifty bit of engineering in its own right, with five heatpipes that come into direct contact with the metal cap atop the GPU. Hint: do not touch said heatpipes. I measured one at 141° F with an IR thermometer. Moreover, at one point as I was uninstalling a card from our test system, I personally confirmed, with my fingertip, that this temperature is above the threshold of pain.
Nvidia expects the GeForce GTX 480 to sell for $499 at online retailers. That price positions the GTX 480 a step above the Radeon HD 5870, whose list price was supposed to be $399 until the reality of 40-nm supply problems intruded; the 5870’s prevailing street price now looks to be about $419. The closest competition for the GTX 480 may come in the form of upcoming 2GB variants of the Radeon HD 5870, such as the Eyefinity6 edition that’s likely to be priced at $499, or the Asus Matrix card we’ll show you on the next page.
Speaking of supply problems, AMD’s fastest graphics card, of course, is the dual-GPU Radeon HD 5970. That is, if you can find one. They’re currently out of stock at Newegg, with stated prices ranging from $699 to $739, well above the card’s initially projected $599 price tag.
Product availability is one of the big questions about the GeForce GTX 400 series, as well. When Nvidia first briefed us on the GTX 470 and 480, we were told to expect to see cards selling at online retailers within a week after last Friday’s official product announcement. Late last week, that time frame changed to the week of April 12. That’s well past the self-imposed first-quarter ship date Nvidia had pledged for the GF100-based cards at CES, which doesn’t inspire loads of confidence. Still, Nvidia’s Nick Stam told us last week that the firm is “building tens of thousands of units for initial availability.” Whether that many cards will truly be available in mid-April, and whether that supply will be sufficient to meet demand, is anyone’s guess. We’ll just have to wait and see.
GeForce GTX 470: Fermi attenuated
Ultimately, the card that may arouse more interest in the buying public is the GeForce GTX 470, another GF100 variant that’s been detuned a bit. For this product, Nvidia disables a second SM unit, along with a ROP partition and its associated memory controller. The resulting specs: 448 ALUs, 56 texture units, 40 ROPs, and a 320-bit path to memory. The GTX 470’s core clock speed is 607MHz, and its 1280MB of GDDR5 memory ticks along at 837MHz, or 3348 MT/s.
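To see how those specs fall out of the disabled units, here is a rough Python sketch. The per-partition figures are our inference rather than anything stated outright: going from the GTX 480 to the GTX 470 drops the bus from 384 to 320 bits and the ROP count to 40, which implies 64 bits and 8 ROPs per partition and six partitions on the full chip. The function name `gf100_config` is hypothetical.

```python
# Sketch: GF100 unit counts as a function of what gets disabled.
FULL_SMS = 16
ALUS_PER_SM = 32
TEX_UNITS_PER_SM = 4           # 56 texture units across 14 SMs on the GTX 470
FULL_ROP_PARTITIONS = 6        # inferred: 384 bits / 64 bits per partition
ROPS_PER_PARTITION = 8         # inferred from the GTX 470's 40 ROPs
BUS_BITS_PER_PARTITION = 64    # inferred from the 384-bit vs. 320-bit buses

def gf100_config(disabled_sms, disabled_partitions):
    """Return the active unit counts for a cut-down GF100."""
    sms = FULL_SMS - disabled_sms
    partitions = FULL_ROP_PARTITIONS - disabled_partitions
    return {
        "alus": sms * ALUS_PER_SM,
        "texture_units": sms * TEX_UNITS_PER_SM,
        "rops": partitions * ROPS_PER_PARTITION,
        "bus_bits": partitions * BUS_BITS_PER_PARTITION,
    }

gtx480_units = gf100_config(disabled_sms=1, disabled_partitions=0)
gtx470_units = gf100_config(disabled_sms=2, disabled_partitions=1)
print(gtx480_units)
print(gtx470_units)
```

With two SMs and one partition switched off, the sketch lands on the GTX 470’s 448 ALUs, 56 texture units, 40 ROPs, and 320-bit bus.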
This is a smaller, 9.5″-long card with a 215W max power rating and only two six-pin aux power plugs. With an expected e-tail price of $349, the GTX 470 should probably slot in between the Radeon HD 5850 and the 5870. However, the 5850 long ago left its initial list price of $259 in the dust and has climbed into the $299-329 range, not far from where the GTX 470 is expected to land.
The GTX 470 sports the same set of display outputs as the GTX 480: two DVI ports and a Mini HDMI connector. We should note, though, that the GF100’s display block can only drive two outputs at a time. Nvidia announced plans at CES to counter AMD’s Eyefinity feature with a triple-monitor Surround Gaming capability, along with 3D Vision Surround, which will incorporate support for 3D glasses to add the impression of depth. Thanks to the GF100’s display output limitations, you’ll need a pair of GTX-400-series graphics cards in SLI in order to drive three monitors. That’s a shame, since a card like this is easily powerful enough to drive at least three multi-megapixel displays competently in modern games, as AMD’s Eyefinity initiative has proven. On the flip side, given our experiences with 3D Vision, we expect you’d need two very fast GPUs in order to get decent performance across three displays with it.
Driver support for both Surround Gaming and 3D Vision Surround is still pending. Nvidia tells us it will add these features in its 256 driver release, slated for “early April.” This driver will purportedly come out of the gate with bezel compensation, a capability AMD has only recently added to its Catalyst drivers. Still, given the fact that AMD has a broad lineup of Radeon HD 5000-series cards capable of supporting three monitors and a six-monitor Eyefinity6 card whose release is imminent, Nvidia has miles to go to catch up on the multi-monitor front.
Team AMD’s new ringer
Since it’s had Radeon HD 5870 cards in the market for half a year, AMD has the luxury of meeting the new GeForces with some specially tailored competition. The card pictured above arrived in Damage Labs just as the GF100 cards did, and it’s a Radeon HD 5870 that’s hopped up on Adderall. Part of Asus’ Matrix series, this 5870 has a custom cooler and a pair of eight-pin power connectors. Together, those things should allow a fair bit of additional overclocking headroom. Also, right out of the box, the Matrix card should be a little faster than a vanilla 5870 thanks to its 894MHz core clock (44MHz above stock) and 2GB of onboard RAM.
Cards like this one should be available for purchase in early April. We don’t have pricing yet, but I’d expect them to be priced at or below the GeForce GTX 480’s $499 mark.
We’ve included the Matrix 5870 2GB in our tests, alongside a regular Radeon HD 5870. The Matrix card is labeled as “Radeon HD 5870 2GB,” but keep in mind that its performance may be more influenced by its bumped-up core clock speed than by the additional video RAM. In fact, memory may be a limiting factor in this card’s performance, since its memory clock hasn’t budged from the usual 1200MHz.
We’ve concentrated our tests of these new GeForces primarily on games this time around, rather than delving into the specifics of the GPU architecture. That’s due in part to limited time with the cards prior to publication, and in part to my dissatisfaction with the current state of the tools we have to measure these things. With luck, we should be able to go back and take a closer look at some architectural specifics at a later date.
For now, in order to give you the broadest comparisons possible, we’ve broken our testing into two rounds. Round one incorporates test data from our Radeon HD 5830 review, which includes results from 13 different graphics cards dating back to the Radeon HD 3870 and the GeForce 7900 GTX. To those results, we’ve added the GeForce GTX 470, 480, and the Matrix 5870. In round two, we’ve concentrated our attention on newer games, higher quality settings, DirectX 11, and a few directed tests. This round includes some higher-end hardware, including dual GTX 480 cards in SLI, 5870s in CrossFire, and some single-card multi-GPU solutions.
You can see our basic test configurations for round one in the testing methods section of our 5830 review. Our test config for the newer cards was the same, with the sole exception of video drivers. We tested the GTX 470/480 with the ForceWare 197.17 drivers and the Matrix 5870 with the Catalyst 10.3a preview drivers (complete with the latest profile update) for both rounds. The other cards in round one used slightly older drivers. That may be notable because AMD says it’s delivered some general performance improvements for all Radeon HD 4000- and 5000-series GPUs in its latest public driver release. However, I suspect the pre-release drivers we used for the 5830 review already incorporated many of those optimizations, and our results would seem to bear that out.
For round two, our test systems were configured per the information below.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we’ve reported the median result.
Our test systems were configured like so:
|Processor||Core i7-965 Extreme 3.2GHz|
|North bridge||X58 IOH|
|Memory type||Corsair Dominator DDR3 SDRAM at 1333MHz|
|Chipset drivers||INF update 22.214.171.1245
Rapid Storage Technology 126.96.36.1994
with Realtek 188.8.131.5269 drivers
|Graphics|| Radeon HD
with Catalyst 8.712-100313a-097309E (10.3a preview) drivers
Profile update 1.0
EAH5870 Radeon HD 5870 1GB
with Catalyst 8.712-100313a-097309E (10.3a preview) drivers
Profile update 1.0
Radeon HD 5870 1GB
with Catalyst 8.712-100313a-097309E (10.3a preview) drivers
Profile update 1.0
Matrix Radeon HD 5870 2GB
with Catalyst 8.712-100313a-097309E (10.3a preview) drivers
Profile update 1.0
| Radeon HD
with Catalyst 8.712-100313a-097309E (10.3a preview) driver
Profile update 1.0
ENGTX285 TOP GeForce GTX 285 1GB
with ForceWare 197.13 drivers
|Asus ENGTX295 GeForce GTX
with ForceWare 197.13 drivers
with ForceWare 197.17 drivers
with ForceWare 197.17 drivers
GTX 480 1536MB
with ForceWare 197.17 drivers
|Hard drive||WD Caviar SE16 320GB SATA|
|Power supply||PC Power & Cooling Silencer 750 Watt|
|OS||Windows 7 Ultimate x64 Edition, DirectX runtime update February 2010|
Thanks to Intel, Corsair, Gigabyte, and PC Power & Cooling for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, XFX, Asus, Diamond, and Gigabyte supplied the graphics cards for testing, as well.
Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.
We used the following test applications:
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Round 1: Running the numbers
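| ||Peak pixel fill rate (Gpixels/s)||Peak bilinear INT8 texel rate (Gtexels/s)||Peak bilinear FP16 texel rate (Gtexels/s)||Peak memory bandwidth (GB/s)||Peak shader arithmetic, single-issue (GFLOPS)||Peak shader arithmetic, dual-issue (GFLOPS)|
|GeForce 8800 GT||11.2||39.2||19.6||64.6||392||588|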
|GeForce 9800 GTX||10.8||43.2||21.6||70.4||432||648|
|GeForce GTS 250||12.3||49.3||24.6||71.9||484||726|
|GeForce GTX 260 (216 SPs)||18.2||46.8||23.4||128.8||605||907|
|GeForce GTX 275||17.7||50.6||25.3||127.0||674||1011|
|GeForce GTX 285||21.4||53.6||26.8||166.4||744||1116|
|GeForce GTX 470||24.3||34.0||17.0||133.9||1089||–|
|GeForce GTX 480||33.6||42.0||21.0||177.4||1345||–|
|Radeon HD 3870||13.6||13.6||13.6||73.2||544||–|
|Radeon HD 4850||11.2||28.0||14.0||63.6||1120||–|
|Radeon HD 4870||12.0||30.0||15.0||115.2||1200||–|
|Radeon HD 4890||14.4||36.0||18.0||124.8||1440||–|
|Radeon HD 5750||11.2||25.2||12.6||73.6||1008||–|
|Radeon HD 5770||13.6||34.0||17.0||76.8||1360||–|
|Radeon HD 5830||12.8||44.8||22.4||128.0||1792||–|
|Radeon HD 5850||23.2||52.2||26.1||128.0||2088||–|
|Radeon HD 5870||27.2||68.0||34.0||153.6||2720||–|
We’ve already looked at the GTX 480’s theoretical capacities in some detail, but the table above gives us a wider view and includes the GeForce GTX 470, as well. These numbers are instructive and sometimes helpful for handicapping things, but delivered performance is what matters most, so we won’t fixate on them too much. For instance, note how the GeForce GTX 470’s basic specs compare to its likely chief rival, the Radeon HD 5850. The pixel fill rate and memory bandwidth of the two cards are comparable, with a slight edge to the GeForce in both categories. The 5850 has roughly 1.5 times the peak texturing rate and almost double the peak shader arithmetic rate, though. The balances here are very different in theory, but I doubt the real-world performance will diverge nearly that dramatically.
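To put numbers on that comparison, the sketch below computes the 5850-to-GTX 470 ratio in each category. It reads the table’s columns as pixel fill rate, INT8 texel rate, memory bandwidth, and single-precision GFLOPS, an interpretation that matches the GTX 480 figures quoted earlier. The dictionary layout and the `ratios` helper are ours.

```python
# Sketch: theoretical-rate ratios between two cards from the table above.
cards = {
    "GeForce GTX 470": {"pixel": 24.3, "texel": 34.0, "bandwidth": 133.9, "gflops": 1089},
    "Radeon HD 5850":  {"pixel": 23.2, "texel": 52.2, "bandwidth": 128.0, "gflops": 2088},
}

def ratios(numerator, denominator):
    """Per-category ratio of one card's peak rates to another's."""
    a, b = cards[numerator], cards[denominator]
    return {key: round(a[key] / b[key], 2) for key in a}

r = ratios("Radeon HD 5850", "GeForce GTX 470")
for category, value in r.items():
    print(f"{category}: {value}x")
```

Ratios just under 1.0 for pixel fill and bandwidth reflect the GeForce’s slight edge, while the texel and GFLOPS ratios show where the Radeon’s paper advantage lies.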
We can use some synthetic tests to give us a better sense of things. We’re not especially enamored with 3DMark Vantage overall, but its directed benchmarks will have to do for now.
The results of this test tend to track pretty closely with theoretical memory bandwidth numbers, and the new GeForces appear to follow suit. That puts the GTX 480 at the top of the heap, with the GTX 470 settling in just below the Radeon HD 5850. Of course, ROP rates are crucial for things other than just raw pixel fill rates, most notably for antialiasing performance. We’ll look at that aspect of things in the following pages.
We learned in the process of putting together our Radeon HD 5830 review that 3DMark’s texture fill rate test really is just that and nothing more: textures aren’t even sampled bilinearly. The test does employ FP16 texture formats, but the result is to put GeForce GTX 200-series GPUs (and older Nvidia parts) at a disadvantage, since they can only sample FP16 textures at the rate they filter them, while newer Radeons can sample FP16 textures at full speed. Fortunately, the GF100 can also sample FP16 formats at full speed, so it’s not handicapped by the peculiarities of this test.
In fact, both GF100 cards prove to be quite efficient here, coming close to their projected peak rates. Then again, the Radeons look to be similarly efficient, with markedly higher peak throughput.
These directed shader tests have long been a study in contrasts, with the architectures from rival camps showing their relative strengths on different workloads. If anything, that contrast is heightened by the arrival of the GF100. The Radeons rule the Perlin noise and parallax occlusion mapping tests, while the new GeForces dominate the GPU cloth and particle simulations. If we can learn anything from these results, I’d say the lesson is that the Radeon HD 5870’s gaudy advantage in peak FLOPS rate on paper doesn’t necessarily translate into superior shader performance.
Round 1: DiRT 2
This excellent racer packs a nicely scriptable performance test. For round one, we tested at the game’s “high” quality presets with 4X antialiasing in DirectX 9 mode. (We’ll look at DX11 in round two.) Because this automated test uses computer A.I. and involves some variance, we tested five times at each resolution and have reported the median results.
Our first game test does not produce a happy outcome for the new GeForces. At the highest resolution, the GeForce GTX 480 ties the Radeon HD 5870. The GeForce GTX 470 is in a dead heat with the Radeon HD 5850, too.
Round 1: Borderlands
We tested Gearbox’s post-apocalyptic role-playing shooter by using the game’s built-in performance test. We tested with all of the in-game quality options at their max. We couldn’t enable antialiasing, because the game’s Unreal Engine doesn’t support it.
This sort of result may be what Nvidia was counting on when it set the prices for its new graphics cards. The GTX 480 and 470 take the top two spots at the highest resolution, followed surprisingly enough by the GTX 285.
I’m not entirely sure what to make of the minimum frame rates reported by the Borderlands performance test. I wouldn’t put too much stock in them, though, when the GTX 470 has a higher minimum in a couple of cases than the GTX 480.
Round 1: Left 4 Dead 2
In Left 4 Dead 2, we got our data by recording and playing back a custom timedemo composed of several minutes of gameplay. We had L4D2’s image quality settings maxed out, with multi-core rendering enabled, along with 4X AA and 16X anisotropic filtering.
The Radeons are faster at lower resolutions, but as the display resolution climbs and the GPU becomes unquestionably the primary performance constraint, the GTX 480 rises to the top spot, though only by a paltry two frames per second. The GTX 470 doesn’t appreciably outperform the Radeon HD 5850, either.
Round 1: Modern Warfare 2
Call of Duty: Modern Warfare 2 generally runs pretty well on most modern PC hardware, but it does have some parts where lots of activity and heavy use of shader effects can slow it down. We chose to test performance in one such area, where you’re in a firefight inside of an office building. This close-quarters fight involves lots of flying debris, smoke, and a whole mess of enemy soldiers cooped up with your squad in close proximity.
To test, we played through this scene for 60 seconds while recording frame rates with FRAPS. This firefight is chaotic enough that there’s really no hope of playing through it exactly the same way each time, although we did try the best we could. We conducted five play-throughs on each card and then reported the median of the average and minimum frame rate values from all five runs. The frame-by-frame results come from a single, representative test session.
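The reduction from five play-throughs to a single pair of numbers can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual tooling: the function name is hypothetical, and the sample data below is made up and much shorter than a real 60-second frame rate log.

```python
# Sketch: median of per-run average and minimum frame rates.
from statistics import median

def summarize_runs(runs):
    """runs: one list of frames-per-second samples per play-through.

    Returns (median of the per-run averages, median of the per-run minimums).
    """
    averages = [sum(samples) / len(samples) for samples in runs]
    minimums = [min(samples) for samples in runs]
    return median(averages), median(minimums)

# Hypothetical data: five runs, three samples each.
example_runs = [
    [60, 50, 40],
    [62, 52, 42],
    [58, 48, 38],
    [61, 51, 41],
    [59, 49, 39],
]

avg_fps, min_fps = summarize_runs(example_runs)
print(avg_fps, min_fps)
```

Taking the median rather than the mean keeps one unusually chaotic play-through from skewing the reported result.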
We had all of MW2’s image quality settings maxed out, with 4X antialiasing enabled, as well. We could only include a subset of the cards at this resolution, since the slower ones couldn’t produce playable frame rates.
The trend we’ve established in our prior round-one tests continues here: the new GeForces just aren’t appreciably faster than their closest Radeon competitors, even though they cost more. Disappointingly, the GeForce GTX 480 really isn’t that much faster than the GeForce GTX 285, either.
Round 2: Unigine Heaven demo
And now we begin round two, with a little more focus on high-end solutions and DirectX 11. As you have hopefully gathered by now, one of the distinguishing characteristics of the Fermi architecture is its potential for geometry processing throughput. Nvidia has made a big bet here on the usage model for real-time graphics shifting in the direction of much, much more geometric detail. Generally speaking, the Radeon HD 5870 seems to be quite good at tessellation, although once you get to a certain point with a very large number of triangles in a scene, the GF100 ought to outperform the 5870 thanks to its quad rasterizers and the like.
Few current games (and quite likely, not many in the near future) will really push on this point. Even if they do use tessellation, they’re not likely to put an inordinate strain on the geometry processing throughput of Radeon HD 5000-series GPUs. To do that, we’ll turn to a synthetic benchmark: Unigine’s Heaven 2.0 demo.
This DirectX 11 demo looks pretty nice, overall. One of the major highlights of the demo is its use of tessellation, and the new 2.0 release takes that even further than earlier versions. I think this demo presents us with a nice opportunity to peek into the Fermi architecture’s geometry processing abilities, but I do have some reservations to note.
Although the Heaven demo does push through a lot of triangles, it’s apparently not especially smart about how it does so. To my eye, the three levels of tessellation available in the demo, “moderate,” “normal,” and “extreme,” are visually indistinguishable unless you switch to a wireframe view. That suggests, well, that one shouldn’t use the higher levels of geometry subdivision, since they’re not really helping.
Once you do switch into wireframe view, you can begin to see the problem. Here’s an example from the demo.
That’s a tremendous amount of detail: large portions of the screen are just white with polygon edges. Nifty, right? But look at the close-up of the hull of the ship, and you can see that its silhouette isn’t a smooth curve as one would expect with a nicely tessellated object. Instead, major transitions between large polygons create visible seams, while all of the detail goes toward subdividing the geometry inside of those large, essentially flat triangles, where it’s pretty much useless.
Toggling tessellation on and off for this object reveals that tessellation imparts a kind of “inflated” look to it, where the white areas covered by lots of detail tend to swell as if they’d been exposed to a Joe Biden speech. This is one use of tessellation, I suppose, but it doesn’t seem to be a very good one. One hopes Nvidia won’t be tempted to persuade game developers to make inappropriate use of tessellation just so it can demonstrate the hidden virtues of the Fermi architecture. True progress on this front will require perceptible and unequivocal visual improvements, not just… inflation.
Nevertheless, we do have a heaping load of triangles, and we can see how the GF100 handles them compared to the competition.
The GF100 cards perform quite well in the Heaven demo, and they become relatively stronger when we move from the “normal” tessellation level to “extreme.” It is nice to see that this attribute of the Fermi architecture pays off in a measurable way. Of course, whether or not that will translate into higher image quality and better performance in real games any time soon is another question, one whose answer may not be so positive.
Round 2: Antialiasing scaling with Far Cry 2
My thought here was to show you how performance scaled at different antialiasing levels, because this has been a point of distinction between AMD and Nvidia GPUs in recent years. Starting with the 4000 series, Radeons have handled 8X multisampled AA with a fairly small performance hit compared to 4X AA. Meanwhile, the GeForce GTX 200 series has seen a steeper drop in frame rates when going from 4X to 8X MSAA. (The obvious solution with GeForce cards has been to use one of Nvidia’s coverage-sampled AA modes, which achieve similar image quality with less of a performance hit. However, doing so complicated direct comparisons between GPU brands, since AMD has no direct analog to CSAA.) Nvidia expects the GF100 to remedy that problem thanks to improved color compression in its ROPs.
Trouble is, I’m not sure these results show us what we expected to see. The GeForce GTX 285 doesn’t slow down inordinately when asked to run in 8X MSAA mode, upending our expectations. The contest between the Radeon HD 5870 2GB and the GeForce GTX 470 may be instructive, though. The 5870 2GB churns out more frames per second without antialiasing and with 2X AA enabled. At 4X and 8X, the GTX 470 proves quicker. I’d like to confirm this with a similar set of directed tests using another game, but perhaps the GF100’s ROPs have been brought up to snuff in this respect.
Round 2: DiRT 2
Our second round of DiRT 2 tests raises the ante by switching to DirectX 11 and cranking up the in-game visual quality settings to the “Ultra” preset. We’re also using a newer version of DiRT 2 with some graphical tweaks, so these results truly aren’t comparable to our round-one set. In DX11, this game tessellates the surfaces of the water puddles on the track so they behave more realistically. Tessellation is also used to increase the detail of the characters in the crowd beside the track.
Once we get to the highest resolution, the GTX 480 once again is no faster than the Radeon HD 5870 in this game, although the GTX 480 SLI config scales better than two 5870s in a CrossFire pairing.
Round 2: Battlefield: Bad Company 2
I considered getting Crysis Warhead into the mix of games we’d test for this review, until I got a good look at Bad Company 2. I think we have a new winner in the ongoing sweeps for best-looking PC game. BC2 looks how the Crysis games wanted to look, before they got sidetracked by all of that on-screen high-frequency noise.
BC2 uses DirectX 11, but according to this interview, DX11 is mainly used to speed up soft shadow filtering. The DirectX 10 rendering path produces the same images.
Since these are all relatively fast graphics cards, we turned up all of the image quality settings in the game. Our test sessions took place in the first 60 seconds of the “Heart of Darkness” level.
Here we are again, with a major new game that really takes good advantage of a modern PC’s capabilities, and the new GeForces are no faster than their less-expensive rivals from AMD. Heck, the GTX 470 performs almost exactly like the GeForce GTX 285.
Contrary to appearances, though, Nvidia gets points for a better multi-GPU implementation in BC2. Although the CrossFire solutions produce higher performance numbers, both the 5970 and the dual 5870s exhibited image quality issues in this game, even with the latest profile update from AMD.
Incidentally, the average frame rates you’re seeing in the graph above may seem relatively low for the slower cards, but notice that the minimum frame rates are fairly high. All of these cards delivered a fluid, playable experience during our testing. Our frame-by-frame plots illustrate that fact visually.
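To make the distinction between average and minimum frame rates concrete, here is a rough sketch of how the two could be derived from a per-frame log. It assumes a list of cumulative frame timestamps in milliseconds, similar to what a frame-time capture tool records; the function name and the sample data are made up for illustration and are not our actual test tooling or results.

```python
from collections import Counter


def fps_summary(timestamps_ms):
    """Return (average_fps, minimum_fps) from cumulative frame timestamps in ms.

    The minimum is the lowest frame count seen in any complete one-second
    window; a partial final second is ignored.
    """
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frames")
    start = timestamps_ms[0]
    duration_s = (timestamps_ms[-1] - start) / 1000.0
    if duration_s <= 0:
        raise ValueError("timestamps must increase")

    # Frames rendered after the first timestamp, divided by elapsed time.
    average_fps = (len(timestamps_ms) - 1) / duration_s

    # Count how many frames land in each whole-second bucket.
    buckets = Counter(int((t - start) // 1000) for t in timestamps_ms)
    full_seconds = int(duration_s)
    if full_seconds == 0:
        return average_fps, average_fps
    minimum_fps = min(buckets.get(s, 0) for s in range(full_seconds))
    return average_fps, minimum_fps


# Hypothetical capture: one second at 60 FPS, then one second at 30 FPS.
sample = [i * 1000 / 60 for i in range(60)] + [1000 + i * 1000 / 30 for i in range(31)]
avg, low = fps_summary(sample)
print(f"average {avg:.1f} FPS, minimum {low} FPS")
```

A card with a modest average but a minimum close to that average will usually feel smoother than one with a high average and deep dips, which is why we report both.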
Round 2: Metro 2033
If Bad Company 2 has a rival for the title of best-looking game, it’s gotta be Metro 2033. This game uses DX10 and DX11 to create some of the best visuals on the PC today. You can get essentially the same visuals using either version of DirectX, but with DirectX 11, Metro 2033 offers a couple of additional options: tessellation and a DirectCompute-based depth of field shader. If you have a GeForce card, Metro 2033 will use it to accelerate some types of in-game physics calculations, since it uses the PhysX API. We didn’t enable advanced PhysX effects in our tests, though, since we wanted to do a direct comparison to the new Radeons. Perhaps next time. See here for more on this game’s exhaustively state-of-the-art technology.
Where a card was capable, we tested it three ways: with DX10, DX11, and DX11 with tessellation and the advanced depth-of-field shader activated. Also, out of curiosity, I made a late addition, testing the GTX 480 and the 5870 2GB with only tessellation enabled, without the advanced depth-of-field shader. Otherwise, we had Metro 2033’s best visual quality settings selected, along with the game engine’s built-in adaptive antialiasing scheme.
To get repeatable results with FRAPS, we decided to test this game a little bit differently. We stood in one spot, in the scene shown above, and captured frame rates over a 20-second period. This scene is packed with non-player characters and is relatively intensive compared to many areas in the game, so consider these frame rates more of a likely minimum. Many areas of the game will run faster than this particular spot does.
Without the advanced features enabled, this game runs at about the same speed with DX10 and DX11, and the new GeForces are a smidgen quicker than the closest competing Radeons. Enabling both tessellation and advanced depth of field exacts a hefty performance penalty from every solution we tested, and frankly, the visual differences are tough to detect. I wouldn’t bother with those features.
As I said, out of curiosity, I tried using just tessellation on a couple of cards. In that configuration, only three frames per second separate the GTX 480 from the Radeon HD 5870 2GB, and neither card really manages acceptable frame rates.
We measured total system power consumption at the wall socket using our fancy new Yokogawa WT210 digital power meter. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The cards were plugged into a motherboard on an open test bench.
The idle measurements were taken at the Windows desktop with the Aero theme enabled. The cards were tested under load running Left 4 Dead at a 2560×1600 resolution with 4X AA and 16X anisotropic filtering. We test power with Left 4 Dead because we’ve found that this game’s fairly simple shaders tend to cause GPUs to draw quite a bit of power, so we think it’s a solidly representative peak gaming workload.
Well, that’s not very good. At idle, the GTX 470’s power draw isn’t too scary, but the GTX 480 pulls more juice than the dual-GPU Radeon HD 5970. Two idle GTX 480s in SLI draw just 20W less than a Radeon HD 5850 does while running a game.
The new GeForces also draw substantially more power than the competing Radeons when running a game. You’ve gotta take power circuitry inefficiencies into account, of course, but our GTX 480 system pulls 105W more under load than the same system with a Radeon HD 5870 installed. Wow.
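As a back-of-the-envelope illustration of what accounting for power circuitry inefficiencies means, the sketch below scales a wall-socket difference by a power supply efficiency to estimate the difference on the DC side. The 85% efficiency is purely an assumed figure for the example; we did not measure our PSU's efficiency, and it varies with load.

```python
def dc_side_delta(wall_delta_watts, psu_efficiency):
    """Estimate the DC-side power difference from a wall-socket difference.

    psu_efficiency is the fraction of AC input power delivered as DC
    output (0 < psu_efficiency <= 1). Treating it as constant across
    both load levels is a simplification.
    """
    if not 0 < psu_efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return wall_delta_watts * psu_efficiency


# 105W is the wall-socket gap cited above; 0.85 is an ASSUMED efficiency.
estimate = dc_side_delta(105.0, 0.85)
print(f"Estimated DC-side difference: about {estimate:.0f}W")
```

Under that assumption, the gap attributable to the components themselves would be somewhat smaller than the figure at the wall, though still large.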
We measured noise levels on our test system, sitting on an open test bench, using an Extech model 407738 digital sound level meter. The meter was mounted on a tripod approximately 8″ from the test system at a height even with the top of the video card. We used the OSHA-standard weighting and speed for these measurements.
You can think of these noise level measurements much like our system power consumption tests, because the entire systems’ noise levels were measured. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.
The GF100 cards’ higher power draw numbers translate pretty directly into higher noise level readings on our meter. In spite of some slick engineering in the GTX 480’s cooler, the card is quite a bit noisier than a stock Radeon HD 5870. The only single-GPU solution that’s louder is Asus’ overclocking-oriented Matrix 5870 2GB, which is tuned to keep temperatures down.
I should note that I did all of my GTX 480 SLI testing with the cards installed together in adjacent PCIe x16 slots. Nvidia’s reviewer’s guide suggests spacing the cards apart if possible, but doing so would have sacrificed eight lanes of PCIe connectivity on our X58 motherboard, and possibly some performance. The GTX 480 SLI config might have been quieter, and slower, if I had taken Nvidia’s advice.
For what it’s worth, in case the graphs don’t really convey this, two GTX 480s in SLI are really frickin’ loud.
We used GPU-Z to log temperatures during our load testing. In the case of multi-GPU setups, we recorded temperatures on the primary card.
Looks like Nvidia has tuned its fan control profiles to achieve lower noise levels at the cost of higher GPU temperatures. The GTX 480 runs 12° hotter than the GTX 285, and the GTX 470 is a tick hotter still. With that said, Nvidia hasn’t really forged any new territory. The Radeon HD 5870 runs at similar temperatures, and it gets even warmer in a CrossFire team.
We don’t usually report idle GPU temperatures because they can vary quite a bit, depending on the conditions, and aren’t always primarily determined by fan speed control profiles. I should point out, though, that both our GTX 470 and 480 cards tended to drop back to much lower temperatures as they idled on our open-air test bench. We didn’t see the sort of constant 90° temperatures that we saw from, say, the first wave of Radeon HD 4850s.
These two new GeForces draw more power, generate more heat and noise, and have higher price tags than the closest competing Radeons, but they’re not substantially faster at running current games. For many, that will be the brutal bottom line on the GeForce GTX 470 and 480. Given the complexity and the rich feature sets of modern graphics processors, that hardly seems fair, but the GF100 is facing formidable competition that made it to market first and is clearly more efficient in pretty much every way that matters. The GF100’s major contribution to real-time graphics, beyond the DirectX 11 features that its competitor also possesses, is an increased geometry processing facility that has little value for today’s games and questionable value for tomorrow’s. As a graphics geek, I don’t find it hard to admire this aspect of the GF100, but I think it will be difficult for gamers to appreciate for quite some time, perhaps throughout the useful life of these graphics cards.
Then again, given the supply problems and inflated prices that we’ve seen in the graphics card market over the past six months, we’re just glad to see Nvidia back at the table. Even if the value propositions on the GeForce GTX 470 and 480 aren’t spectacular, they’re a darn sight better than zero competition for AMD.
Also, I suspect some folks will still find these graphics cards attractive for a host of pretty decent reasons. If consumer-level GPU computing takes off, the GF100 may be the GPU to have. We haven’t formally tested its compute prowess against the latest Radeons, but the GTX 480’s exceptional performance in 3DMark’s cloth and particle simulations is a positive indicator. Nvidia is also constantly pushing on initiatives that can give its GPUs an exclusive advantage over any competitor, whether it be games with advanced PhysX effects or 3D Vision, or just plain old driver solidity and instant compatibility with the newest games, an area where I still think Nvidia has an edge on AMD.
Then again, AMD seems to be making inroads with game developers, which is what happens when you’re first to market with a whole family of DX11 GPUs.
For what it’s worth, the GF100 may not be a disappointment in all markets. With its geometry processing throughput, it should make a fantastic Quadro workstation graphics card. GF100-based Tesla cards could still succeed in the realm of dedicated GPU computing, too. The Fermi architecture really is ahead of any of its competitors there for a number of reasons that can’t be ignored, and the question now is whether Nvidia can build a considerable business around it. The firm seemed to be expecting huge progress in this regard when it revealed the first details of this architecture to us.
We’re curious to see how good a graphics chip this generation of Nvidia’s technology could make when it’s stripped of all the extra fat needed to serve other markets: the extensive double-precision support, ECC, fairly large caches, and perhaps two or three of its raster units. You don’t need any of those things to play games, or even to transcode video on a GPU. A leaner, meaner mid-range variant of the Fermi architecture might make a much more attractive graphics card, especially if Nvidia can get some of the apparent chip-level issues worked out and reach some higher clock speeds.