As you may know, the GeForce GTX 480 had a troubled childhood. The GF100 chip that powered it was to be Nvidia’s first DirectX 11-class graphics processor, based on the ambitious new Fermi architecture. But the GF100 was famously tardy, hitting the market over six months after the competition’s Radeon HD 5000-series of DX11-capable chips. When it did arrive aboard the GTX 470 and 480, the GF100 had many of the hallmarks of a shaky semiconductor product: clock speeds weren’t as fast as we’d anticipated, power consumption and heat production were right at the ragged edge of what’s acceptable, and some of the chip’s processing units were disabled even on the highest-end products. Like Lindsay Lohan, it wasn’t living up to its potential. When we first tested the GTX 480 and saw that performance wasn’t much better than the smaller, cooler, and cheaper Radeon HD 5870, we were decidedly underwhelmed.
Yet like Miss Lohan, the GF100 had some rather obvious virtues, including formidable geometry processing throughput and, as we learned over time, quite a bit of room for performance increases through driver updates. Not only that, but it soon was joined by a potent younger sibling with a different take on the mix of resources in the Fermi architecture, the GF104 chip inside the very competitive GeForce GTX 460 graphics cards.
Little did we know at the time, but back in February of this year, before the first GF100 chips even shipped in commercial products, the decision had been made in the halls of Nvidia to produce a new spin of the silicon known as GF110. The goal: to reduce power consumption while improving performance. To get there, Nvidia engineers scoured each block of the chip, employing lower-leakage transistors in less timing-sensitive logic and higher-speed transistors in critical paths, better adapting the design to TSMC’s 40-nm fabrication process.
At the same time, they made a few targeted tweaks to the chip’s 3D graphics hardware to further boost performance. The first enhancement was also included in the GF104, a fact we didn’t initially catch. The texturing units can filter 16-bit floating-point textures at full speed, whereas most of today’s GPUs filter this larger format at half their peak speed. The additional filtering oomph should improve frame rates in games where FP16 texture formats are used, most prominently with high-dynamic-range (HDR) lighting algorithms. HDR lighting is fairly widely used these days, so the change is consequential. The caveat is that the GPU must have the bandwidth needed to take advantage of the additional filtering capacity. Of course, the GF110 has gobs of bandwidth compared to most.
The second enhancement is unique to GF110: an improvement in Z-culling efficiency. Z culling is the process of ruling out pixels based on their depth; if a pixel won’t be visible in the final, rendered scene because another pixel is in front of it, the GPU can safely neglect lighting and shading the occluded pixel. More efficient Z culling can boost performance generally, although the Z-cull capabilities of current GPUs are robust enough that the impact of this tweak is likely to be modest.
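To make the idea concrete, here is a minimal software sketch of a depth test in Python. It is only an illustration of the principle: real GPUs do this in dedicated hardware, often on whole blocks of pixels at once, and the `Fragment` layout and `count_shaded` helper are hypothetical names invented for this example.

```python
from typing import List, Tuple

# (x, y, depth); a smaller depth means closer to the camera.
Fragment = Tuple[int, int, float]


def count_shaded(fragments: List[Fragment], width: int, height: int) -> int:
    """Return how many fragments pass the depth test and would be shaded."""
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x, y, depth in fragments:
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            shaded += 1  # visible so far, so it gets lit and shaded
        # otherwise a closer fragment already owns this pixel: skip the work
    return shaded


# Two overlapping fragments at pixel (0, 0) plus one at pixel (1, 0).
near_first = [(0, 0, 0.2), (0, 0, 0.8), (1, 0, 0.5)]
far_first = [(0, 0, 0.8), (0, 0, 0.2), (1, 0, 0.5)]

print(count_shaded(near_first, 2, 1))  # the occluded fragment is rejected
print(count_shaded(far_first, 2, 1))   # the far fragment is shaded, then overwritten
```

Submission order matters in this toy version: when the far fragment arrives first, it is shaded and then thrown away, which is exactly the wasted work that more efficient culling tries to avoid.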
The third change is pretty subtle. In the Fermi architecture, the shader multiprocessors (SMs) have 64KB of local data storage that can be partitioned either as 16KB of L1 cache and 48KB of shared memory or vice-versa. When the GF100 is in a graphics context, the SM storage is partitioned in a 16KB L1 cache/48KB shared memory configuration. The opposite split, with 48KB of L1 cache and 16KB of shared memory, is only available for GPU computing contexts. The GF110 is capable of running with a 48KB L1 cache/16KB shared memory split for graphics, which Nvidia says “helps certain types of shaders.”
Now, barely nine months since the chip’s specifications were set, the GF110 is ready to roll aboard a brand-new flagship video card, the GeForce GTX 580. GPU core and memory clock speeds are up about 10% compared to the GTX 480: the GPU core is 772MHz, shader ALUs are double-clocked to 1544MHz, and the GDDR5 memory now runs at 4.0 GT/s. All of the chip’s graphics hardware is enabled, and Nvidia claims the GTX 580’s power consumption is lower, too.
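|Card||Peak pixel fill rate (Gpixels/s)||Peak bilinear FP16 texel filtering rate (Gtexels/s)||Peak memory bandwidth (GB/s)||Peak shader arithmetic (GFLOPS)||Peak rasterization rate (Mtris/s)|
|GeForce GTX 460 1GB 810MHz||25.9||47.6||124.8||1089||1620|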
|GeForce GTX 480||33.6||21.0||177.4||1345||2800|
|GeForce GTX 580||37.1||49.4||192.0||1581||3088|
|Radeon HD 5870||27.2||34.0||153.6||2720||850|
|Radeon HD 5970||46.4||58.0||256.0||4640||1450|
On paper, the changes give the GTX 580 a modest boost over the GTX 480 in most categories that matter. The gain in FP16 filtering throughput, though, is obviously more prodigious. Add in the impact of the Z-cull improvement, and the real-world performance could rise a little more.
A question of balance
Notably, other than the increase in FP16 filtering rate, the GF110 retains the same basic mix of graphics resources as the GF100. We’ll raise an eyebrow at that fact because the GF104 is arguably more efficient, yet it hits some very different notes. Versus GF100/110, the GF104’s ROP rate, shader ALUs, and memory interface width are lower by a third, the rasterization rate is cut in half, yet the texturing rate remains constant.
Not reflected in the tables above is another element: the so-called polymorph engines in the Fermi architecture, dedicated hardware units that handle a host of pre-rasterization geometry processing chores (including vertex fetch, tessellation/geometry expansion, viewport transform, attribute setup, and stream output). The GF104 has eight such engines, while the GF100/110 have 16. Only 15 of the 16 are enabled on the GTX 480, while the GTX 580 uses them all, so it possesses even more geometry processing capacity than anything to come before. (If you want to more fully understand the configuration of the different units on the chip and their likely impact on performance, we’d refer you to our Fermi graphics architecture overview for the requisite diagrams and such. Nearly everything we said about the GF100 still applies to the GF110.)
Also still kicking around inside of the GF110 are the compute-focused features of the GF100, such as ECC protection for on-chip storage and the ability to handle double-precision math at half the rate of single-precision. These things are essentially detritus for real-time graphics, and as a consequence of product positioning decisions, the GTX 580’s double-precision rate remains hobbled at one-quarter of the chip’s peak potential.
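For a rough sense of scale, the arithmetic can be sketched as follows. This assumes "one-quarter of the chip’s peak potential" refers to a quarter of the half-rate double-precision capability, and it uses the 1581 GFLOPS single-precision peak from the table above; the variable names are ours.

```python
SP_GFLOPS = 1581.0  # GTX 580 peak single-precision rate from the table above

# The GF110 silicon can run double-precision math at half the SP rate.
chip_dp_potential = SP_GFLOPS / 2

# Assumption: the GeForce product is capped at one-quarter of that potential.
gtx580_dp_rate = chip_dp_potential / 4

print(f"Chip DP potential: {chip_dp_potential:.1f} GFLOPS")
print(f"GTX 580 DP rate:   {gtx580_dp_rate:.1f} GFLOPS")
```

Under that reading, the card’s double-precision rate works out to one-eighth of its single-precision peak.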
We detected some trepidation on Nvidia’s part about the reception the GF110 might receive, given these facts. After all, there was a fairly widespread perception that GF100’s troubles were caused at least in part by two things: its apparent dual focus on graphics and GPU computing, and its clear emphasis on geometry processing power for graphics. The GF104’s singular graphics mission and improved efficiency in current games only fed this impression.
Nvidia’s counter-arguments are worth hearing, though. The firm contends that any high-end GPU like this one has plenty of throughput to handle today’s ubiquitous console ports, with their Unreal engine underpinnings and all that entails. The GF110’s relative bias toward geometry processing power is consistent with Nvidia’s vision of where future games based on DirectX 11 should be headed, with more complex models, higher degrees of tessellation, and greater geometric realism. In fact, Drew Henry, who runs Nvidia’s GeForce business, told us point blank that the impetus behind the GF110 project was graphics, not GPU computing products. That’s a credible statement, in our estimation, because the GF100-based Tesla cards have essentially zero competition in their domain, while the GeForces will face a capable foe in AMD’s imminent Cayman GPU.
Our sense is that, to some extent, the GF110’s success will depend on whether game developers and end users are buying what Nvidia is selling: gobs and gobs of polygons. If that’s what folks want, the GF110 will deliver in spades. If not, well, it still has 50% more memory bandwidth, shader power, and ROP throughput than the GF104, making it the biggest, baddest GPU on the planet by nearly any measure, at least for now.
The card and cooler
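|Card||GPU clock (MHz)||Shader ALUs||Textures filtered/clock||ROP pixels/clock||Memory transfer rate||Memory interface width (bits)||Peak power draw||Price|
|GeForce GTX 580||772||512||64||48||4.0 Gbps||384||244W||$499.99|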
Amazingly, I’ve put enough information in the tables, pictures, and captions on this page that I barely have to write anything. Crafty, no? We’ve already given you some of the GTX 580’s vitals on the previous page, but the table above fills out the rest, including the $500 price tag. Nvidia expects GTX 580 cards to be selling at online retailers for that price starting today.
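The theoretical peak rates quoted earlier can be reproduced from the specs in the table above. The sketch below is a back-of-the-envelope calculation, not Nvidia’s published method. It assumes one fused multiply-add (two FLOPs) per ALU per shader clock, and four raster engines that each handle one triangle per core clock; the variable names are ours.

```python
# GTX 580 specs as listed in the table above.
core_mhz = 772
shader_alus = 512
texels_per_clock = 64
rop_pixels_per_clock = 48
mem_gbps_per_pin = 4.0
mem_bus_bits = 384

shader_mhz = core_mhz * 2  # shader ALUs are double-clocked (1544MHz)

pixel_fill = rop_pixels_per_clock * core_mhz / 1000  # Gpixels/s
texel_rate = texels_per_clock * core_mhz / 1000      # Gtexels/s
bandwidth = mem_bus_bits / 8 * mem_gbps_per_pin      # GB/s

# Assumption: each ALU retires one fused multiply-add (2 FLOPs) per clock.
gflops = shader_alus * 2 * shader_mhz / 1000

# Assumption: four raster engines, one triangle per clock each.
raster_rate = 4 * core_mhz                           # Mtris/s

print(f"Pixel fill:    {pixel_fill:.1f} Gpixels/s")
print(f"Texel filter:  {texel_rate:.1f} Gtexels/s")
print(f"Bandwidth:     {bandwidth:.1f} GB/s")
print(f"Shader math:   {gflops:.0f} GFLOPS")
print(f"Rasterization: {raster_rate} Mtris/s")
```

Because texture filtering runs at full speed for FP16 formats on this chip, the same texel rate applies to both integer and FP16 textures.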
Although we didn’t find it to be especially loud in a single-card config, the GeForce GTX 480 took some criticisms for the noise and heat it produced. The noise was especially a problem in SLI, when the heat from two cards together had to be dissipated. Nvidia has responded to that criticism by changing its approach on several fronts with the GTX 580. For one, the end of the cooling shroud, pictured above, is angled more steeply in order to allow air into the blower.
Rather than using quad heatpipes, the GTX 580’s heatsink has a vapor chamber in its copper base that is purported to distribute heat more evenly to its aluminum fins. Meanwhile, the blades on the blower have been reinforced with a plastic ring around the outside. Nvidia claims this modification prevents the blades from flexing and causing turbulence that could translate into a rougher sound. The GTX 580 also includes a new adaptive fan speed control algorithm that should reduce its acoustic footprint.
The newest GeForce packs an additional power safeguard, as well. Most GeForces already have temperature-based safeguards that will cause the GPU to slow down if it becomes too hot. The GTX 580 adds a power monitoring capability. If the video card is drawing too much current through the 12V rails, the GPU will slow down to keep itself within the limits of the PCIe spec. Amusingly, this mechanism seems to be a response to the problems caused almost solely by the FurMark utility. According to Nvidia, outside of a few apps like that one, the GTX 580 should find no reason to throttle itself based on power delivery.
So who’s the competition?
AMD’s decision a couple of years ago to stop making really large GPUs and instead address the high-end video card market with multi-GPU solutions makes sizing up the competition for the GTX 580 a little bit complicated. We have some candidates, though, with various merits.
If you’re insistent on a single-GPU solution, Asus’ Republic of Gamers Matrix 5870 might be your best bet. This card has 2GB of GDDR5 memory, double that of the standard Radeon HD 5870, and runs at 894MHz, or 44MHz higher than AMD’s stock clocks for the Radeon HD 5870. That’s a modest bump in clock frequency, but Asus has given this card a host of overclocking-centric features, so the user can take it up from there. The ROG Matrix 5870 lists for $499.99 at Newegg, although it’s currently out of stock. We’ve included this card in our test results on the following pages, though it’s looking a little dated these days.
AMD’s true high-end solution is the Radeon HD 5970, pictured above. With dual Cypress GPUs, the 5970 is pretty much a performance titan. The 5970 has always been something of a strange specimen, for various reasons, including the fact that it has been incredibly scarce, and thus pricey, for much of the time since its introduction late last year. The card itself is incredibly long at 12.16″, ruling it out of an awful lot of mid-sized PC enclosures. As a dual-GPU solution based on AMD’s CrossFireX technology, the 5970 has the same potential for compatibility headaches and performance scaling pitfalls as a dual-card CrossFireX config. Also, the 5970 isn’t quite as fast as dual Radeon HD 5870s. Instead, it leaves some of its performance potential on the table, because its clock rates have been held down to keep power consumption in check. If your PSU can handle it, though, AMD expects the 5970 to reach dual-5870 clock speeds with the aid of a third-party overclocking tool with voltage control.
Oddly enough, the 5970 emerged from obscurity in the past week, when AMD notified us of the availability of cards like this Sapphire 5970 at Newegg at a new, lower price. Not coincidentally, that price is $499.99, just the same as the GeForce GTX 580. There’s a $30 mail-in rebate attached, too, for those who enjoy games of chance. That’s a heckuva welcome for Nvidia’s latest, don’t you think?
We tested the 5970 at its stock speed, and in hindsight, perhaps we should have tested it overclocked to dual 5870 speeds, as well, since that’s part of its appeal. However, we had our eyes set on a third, more interesting option.
The introduction of the Radeon HD 6870 largely obsoleted the Radeon HD 5870, and we’d argue that a pair of 6870 cards might be AMD’s best current answer to the GTX 580. Gigabyte’s version of the 6870 is going for $244.99 at Newegg. (You can grab a 6870 for $239 if you’re willing to try your luck with PowerColor or HIS.) Two of these cards will set you back about as much as a single GTX 580. If you have the expansion slot real estate and PSU leads to accommodate them, they might be a good pick. Heck, they’re what we chose for the Double-Stuff build in our most recent system guide.
Naturally, we’ve tested both single- and dual-6870 configurations, along with some in-house competition for the GTX 580 in the form of a pair of GeForce GTX 460 1GB cards clocked at over 800MHz. These GTX 460 cards are the most direct competitors for the 6870, as well.
Our testing methods
Many of our performance tests are scripted and repeatable, but for some of the games, including Battlefield: Bad Company 2, we used the Fraps utility to record frame rates while playing a 60-second sequence from the game. Although capturing frame rates while playing isn’t precisely repeatable, we tried to make each run as similar as possible to all of the others. We raised our sample size, testing each Fraps sequence five times per video card, in order to counteract any variability. We’ve included second-by-second frame rate results from Fraps for those games, and in those cases, you’re seeing the results from a single, representative pass through the test sequence.
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we’ve reported the median result.
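As a small illustration of why the median is a sensible choice for repeated runs, consider the sketch below. The frame rates are made up for the example and are not measured results.

```python
from statistics import mean, median

# Hypothetical average frame rates (FPS) from five passes; not measured data.
runs = [61.2, 58.9, 60.4, 63.0, 59.7]
reported = median(runs)

# One pass disrupted by, say, a background task barely moves the median,
# while it drags the mean down noticeably.
runs_with_outlier = [61.2, 58.9, 60.4, 63.0, 41.5]

print("median:", reported, "->", median(runs_with_outlier))
print("mean:  ", round(mean(runs), 2), "->", round(mean(runs_with_outlier), 2))
```

The median simply picks the middle value once the runs are sorted, so a single odd pass has little influence on the reported number.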
Our test systems were configured like so:
|Processor||Core i7-965 Extreme 3.2GHz|
|North bridge||X58 IOH|
|Memory size||12GB (6 DIMMs)|
|Memory type||Corsair Dominator CMD12GX3M6A1600C8 DDR3 SDRAM at 1600MHz|
|Memory timings||8-8-8-24 2T|
|Chipset drivers||INF update 18.104.22.1685
Rapid Storage Technology 22.214.171.1244
with Realtek R2.51 drivers
|Asus Radeon HD
with Catalyst 10.10c drivers
Matrix Radeon HD
with Catalyst 10.10c drivers
| Radeon HD 5970
with Catalyst 10.10c drivers
|XFX Radeon HD
with Catalyst 10.10c drivers
Radeon HD 670 1GB + XFX Radeon HD
with Catalyst 10.10c drivers
Talon Attack GeForce GTX 460 1GB 810MHz
with ForceWare 260.99 drivers
Talon Attack GeForce GTX 460 1GB 810MHz +
EVGA GeForce GTX 460 FTW 1GB 850MHz
with ForceWare 260.99 drivers
Galaxy GeForce GTX 470 1280MB GC
with ForceWare 260.99 drivers
GeForce GTX 480 1536MB
with ForceWare 260.99 drivers
GeForce GTX 580 1536MB
with ForceWare 262.99 drivers
|Hard drive||WD RE3 WD1002FBYS 1TB SATA|
|Power supply||PC Power & Cooling Silencer 750 Watt|
|OS||Windows 7 Ultimate x64 Edition, DirectX runtime update June 2010|
Thanks to Intel, Corsair, Western Digital, Gigabyte, and PC Power & Cooling for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.
Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.
We used the following test applications:
- D3D RightMark beta 4
- Unigine Heaven 2.1
- ShaderToyMark 0.1.0
- TessMark 0.2.2
- 3DMark Vantage 1.0.2
- Aliens vs. Predator benchmark
- Battlefield: Bad Company 2
- HAWX 2
- DiRT 2
- Left 4 Dead 2
- Sid Meier’s Civilization V
- Metro 2033
- StarCraft II
- Fraps 3.2.3
- GPU-Z 0.4.7
Some further notes on our methods:
We measured total system power consumption at the wall socket using a Yokogawa WT210 digital power meter. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The cards were plugged into a motherboard on an open test bench.
The idle measurements were taken at the Windows desktop with the Aero theme enabled. The cards were tested under load running Left 4 Dead 2 at a 1920×1080 resolution with 4X AA and 16X anisotropic filtering. We test power with Left 4 Dead 2 because we’ve found that the Source engine’s fairly simple shaders tend to cause GPUs to draw quite a bit of power, so we think it’s a solidly representative peak gaming workload.
We measured noise levels on our test system, sitting on an open test bench, using an Extech 407738 digital sound level meter. The meter was mounted on a tripod approximately 10″ from the test system at a height even with the top of the video card.
You can think of these noise level measurements much like our system power consumption tests, because the entire system’s noise level was measured. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.
We used GPU-Z to log GPU temperatures during our load testing.
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
We’ll kick off our testing with power and noise, to see whether the GTX 580 lives up to its promise in these areas. Notice that the cards marked with asterisks in the results below have custom cooling solutions that may perform differently than the GPU maker’s reference solution.
The GTX 580’s power use when idling at the Windows desktop is quite reasonable for such a big chip. Our test system requires 13W less with a GTX 580 than with a GTX 480, a considerable drop. When we fire up Left 4 Dead 2, a game we’ve found causes GPUs to draw quite a bit of power, then the GTX 580 pulls a little more juice than its predecessor. That’s not bad considering the higher clock rates and additional enabled units, but Nvidia claims the GTX 580 draws less power than the 480, even at peak.
When we asked Nvidia what the story was, they suggested we try a DX11-class workload, such as the Unigine Heaven demo, to see the difference between the GTX 480 and GTX 580. So we did:
Peak power draw is lower for both video cards, but the GTX 580 uses substantially less than the 480. Obviously, much depends on the workload involved. We may have to consider using multiple workloads for power testing going forward.
Noise levels and GPU temperatures
In spite of its relatively high peak power draw, the GTX 580 is among the quietest cards we tested. That’s a testament to the effectiveness of this card’s revised cooling design. Our positive impression of the cooler is further cemented by the fact that the GTX 580 runs 10°C cooler than the GTX 480, though its power use is comparable. The Radeon HD 5970 draws slightly less power than the GTX 580 but runs hotter and generates substantially more noise.
We do have some relatively boisterous contestants lined up here. MSI has tuned its dual-fan cooler on the GTX 460 to shoot for very low GPU temperatures, making it rather noisy. That dynamic gets worse when we slap another card next to it in SLI, blocking the intake for both fans. GPU temperatures shoot up to match the rest of the pack, and noise levels climb to a dull roar. Unfortunately, many GTX 460 cards have similar custom coolers that don’t fare well in SLI. The Radeon HD 6870’s blower is a better bet for multi-GPU use, but it’s still rather loud, especially for a single-card config. Cards that draw substantially more power are quieter and run at comparable temperatures.
Pixel fill and texturing performance
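|Card||Peak pixel fill rate (Gpixels/s)||Peak bilinear integer texel filtering rate (Gtexels/s)||Peak bilinear FP16 texel filtering rate (Gtexels/s)||Peak memory bandwidth (GB/s)|
|GeForce GTX 460 1GB 810MHz||25.9||47.6||47.6||124.8|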
|GeForce GTX 470 GC||25.0||35.0||17.5||133.9|
|GeForce GTX 480||33.6||42.0||21.0||177.4|
|GeForce GTX 580||37.1||49.4||49.4||192.0|
|Radeon HD 5870||27.2||68.0||34.0||153.6|
|Radeon HD 6870||28.8||50.4||25.2||134.4|
|Radeon HD 5970||46.4||116.0||58.0||256.0|
We’ve already looked at some of the theoretical peak numbers above, but we’ve reiterated them here a little more fully. These figures aren’t destiny for a video card. Different GPU architectures will deliver on their potential in different ways, with various levels of efficiency. However, these numbers do matter, especially among chips with similar architectural DNA.
You’d think 3DMark’s color fill rate test would track with the first column of scores above, but it turns out delivered performance is more directly affected by memory bandwidth. That’s why, for instance, the Radeon HD 6870 trails the 5870 here, in spite of a higher ROP rate. The GTX 580 is the fastest single-GPU solution, though it can’t keep up with the similarly priced multi-GPU options.
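A simple model shows how memory bandwidth can cap delivered fill rate below the ROP rate. The sketch below takes its ROP rates and bandwidth figures from the table above. The 8 bytes written per pixel is purely our assumption for illustration (a 64-bit color format with no blending), not a documented property of the 3DMark test.

```python
# Peak ROP rate (Gpixels/s) and memory bandwidth (GB/s) from the table above.
cards = {
    "Radeon HD 5870": (27.2, 153.6),
    "Radeon HD 6870": (28.8, 134.4),
    "GeForce GTX 580": (37.1, 192.0),
}

BYTES_PER_PIXEL = 8  # assumption: 64-bit color writes, no blending


def deliverable_fill(rop_rate: float, bandwidth: float) -> float:
    """Fill rate is capped by whichever limit is reached first."""
    return min(rop_rate, bandwidth / BYTES_PER_PIXEL)


for name, (rop_rate, bandwidth) in cards.items():
    print(f"{name}: {deliverable_fill(rop_rate, bandwidth):.1f} Gpixels/s")
```

Under this assumption, all three cards are bandwidth-bound, and the 6870 lands behind the 5870 despite its higher ROP rate, which matches the ordering described above.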
We’ve shunned 3DMark’s texture fill test recently because it doesn’t involve any sort of texture filtering. That’s tragic and sad, since texture filtering rates are almost certainly more important than sampling rates in the grand scheme of things. Still, this is a decent test of FP16 texture sampling rates, so we’ll use it to consider that aspect of GPU performance. Texture storage is, after all, essentially the way GPUs access memory, and unfiltered access speeds will matter to routines that store data and retrieve it without filtering.
AMD’s samplers are very fast indeed, as the Radeon HD 6870 keeps pace with Nvidia’s biggest, baddest GPU. The Radeon HD 5970 is more than twice as fast in this specific case.
Here’s a more proper test of texture filtering, although it’s focused entirely on integer texture formats, not FP16. Texture formats like these are still widely used in games.
AMD’s texture filtering hardware is generally quite a bit faster than Nvidia’s with integer formats. The deficit narrows as we move to higher quality filtering levels, but the year-old Radeon HD 5870 remains faster than the GeForce GTX 580.
Happily, after struggling in the dark for a while, we finally have a proper test of FP16 filtering rates, courtesy of the guys at Beyond3D. Nvidia says the GF104 and GF110 can filter FP16 textures at their full rates rather than half. What kind of performance can they really achieve?
The GTX 580 comes pretty darned close to its theoretical peak rate, and it’s nearly twice the speed of the Radeon HD 5870. That’s quite the reversal. The GeForce GTX 460 moves up the chart, too, but doesn’t come anywhere near as close as the GTX 580 to reaching its peak potential. The GTX 580’s additional memory bandwidth and larger L2 cache (50% better on both fronts) likely account for the difference in delivered performance.
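|Card||Peak shader arithmetic (GFLOPS)||Peak rasterization rate (Mtris/s)||Peak memory bandwidth (GB/s)|
|GeForce GTX 460 1GB 810MHz||1089||1620||124.8|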
|GeForce GTX 470 GC||1120||2500||133.9|
|GeForce GTX 480||1345||2800||177.4|
|GeForce GTX 580||1581||3088||192.0|
|Radeon HD 5870||2720||850||153.6|
|Radeon HD 6870||2016||900||134.4|
|Radeon HD 5970||4640||1450||256.0|
In recent months, our GPU reviews have been missing a rather important element: tests of GPU shader performance or processing power outside of actual games. Although some of today’s games use a fairly rich mix of shader effects, they also stress other parts of the GPU at the same time. We can better understand the strengths and weaknesses of current GPU architectures by using some targeted shader benchmarks. The trouble is: what tests are worth using?
Fortunately, we have several answers today thanks to some new entrants. The first of those is ShaderToyMark, a pixel shader test based on six different effects taken from the nifty ShaderToy utility. The pixel shaders used are fascinating abstract effects created by demoscene participants, all of whom are credited on the ShaderToyMark homepage. Running all six of these pixel shaders simultaneously easily stresses today’s fastest GPUs, even at the benchmark’s relatively low 960×540 default resolution.
You may be looking between the peak arithmetic rates in the table at the top of the page and the results above and scratching your head, but the outcome will be no surprise to those familiar with these GPU architectures. The vast SIMD arrays on AMD’s chips do indeed have higher peak theoretical rates, but their execution units can’t always be scheduled as efficiently as Nvidia’s. In this case, the GTX 580 easily outperforms the single-GPU competition. Unfortunately, this test isn’t multi-GPU compatible, so we had to leave out those configs.
Incidentally, this gives us our first good look at the shader performance differences between the Radeon HD 5870 and 6870. The 6870 is based on the smaller Barts GPU and performs nearly as well as the 5870 in many games, but it is measurably slower in directed tests, as one might expect.
Up next is a compute shader benchmark built into Civilization V. This test measures the GPU’s ability to decompress textures used for the graphically detailed leader characters depicted in the game. The decompression routine is based on a DirectX 11 compute shader. The benchmark reports individual results for a long list of leaders; we’ve averaged those scores to give you the results you see below.
These results largely mirror what we saw above in terms of relative performance, with the added spice of multi-GPU outcomes. Strangely, the Radeon HD 5970 stumbles a bit here.
Finally, we have the shader tests from 3DMark Vantage.
Clockwise from top left: Parallax occlusion mapping, Perlin noise, GPU cloth, and GPU particles
These first two tests use pixel shaders to do their work, and the Radeons fare relatively well in both. The Perlin noise test, in particular, is very math intensive, and this looks to be a case in which the Radeons’ stratospheric peak arithmetic rates actually pay off.
These two tests involve simulations of physical phenomena using vertex shaders and the DirectX 10-style stream output capabilities of the GPUs. In both cases, the GeForces are substantially faster, with the GTX 580 again at the top of the heap.
Geometry processing throughput
The most obvious area of divergence between the current GPU architectures from AMD and Nvidia is geometry processing, which has become a point of emphasis with the advent of DirectX 11’s tessellation feature. Both GPU brands support tessellation, which allows for much higher geometric detail than usual to be generated and processed on the GPU. The extent of that support is the hot-button issue. With Fermi, Nvidia built the first truly parallel architecture for geometry processing, taking one of the last portions of the graphics pipeline that was processed serially and distributing it across multiple hardware units. AMD took a more traditional, serial approach with less peak throughput.
We can measure geometry processing speeds pretty straightforwardly with a couple of tools. The first is the Unigine Heaven demo. This demo doesn’t really make good use of additional polygons to increase image quality at its highest tessellation levels, but it does push enough polys to serve as a decent synthetic benchmark.
Notice that the multi-GPU solutions scale nicely in terms of geometry processing power; the alternate-frame rendering method most commonly used for load balancing between GPUs offers nearly perfect scaling on this front. Even so, the GTX 580 is still roughly a third faster than the Radeon HD 5970. Among the AMD solutions, only the dual Radeon HD 6870s can challenge the GTX 580 here, in part because of some tessellation optimizations AMD put into the Barts GPU.
TessMark’s multiple tessellation levels give us the chance to push the envelope even further, down to, well, insanely small polygons, and past the Radeons’ breaking point. This vast difference in performance once polygon counts get to a certain level will help inform our understanding of some important issues ahead. We can already see how Nvidia’s architectural choices have given the GTX 580 a distinct advantage on this front.
Now we enter into disputed territory, and in doing so, we put some of those architectural differences we’ve discussed into play. The developers of HAWX 2 have made extensive use of DirectX 11 tessellation for the terrain in this brand-new game, and they’ve built a GPU benchmark tool based on the game, as well. HAWX 2 is slated for release this coming Friday, and in advance of that, Nvidia provided us with a stand-alone copy of the benchmark tool. We like to test with new games that take advantage of the latest features, but we’ve been hearing strong objections to the use of this game, from none other than AMD. Here’s AMD’s statement on the matter, first released prior to the introduction of the Radeon HD 6800 series and then sent out to us again yesterday:
It has come to our attention that you may have received an early build of a benchmark based on the upcoming Ubisoft title H.A.W.X. 2. I’m sure you are fully aware that the timing of this benchmark is not coincidental and is an attempt by our competitor to negatively influence your reviews of the AMD Radeon HD 6800 series products. We suggest you do not use this benchmark at present as it has known issues with its implementation of DirectX 11 tessellation and does not serve as a useful indicator of performance for the AMD Radeon HD 6800 series. A quick comparison of the performance data in H.A.W.X. 2, with tessellation on, and that of other games/benchmarks will demonstrate how unrepresentative H.A.W.X. 2 performance is of real world performance.
AMD has demonstrated to Ubisoft tessellation performance improvements that benefit all GPUs, but the developer has chosen not to implement them in the preview benchmark. For that reason, we are working on a driver-based solution in time for the final release of the game that improves performance without sacrificing image quality. In the meantime we recommend you hold off using the benchmark as it will not provide a useful measure of performance relative to other DirectX 11 games using tessellation.
Interesting, no? I don’t need to tell you that Nvidia objects. With six days to produce our Radeon HD 6800 series review, we simply didn’t have time to look at HAWX 2 back then. I’m not sure we can come up with a definitive answer about who’s right, because that would require more knowledge about the way future games will use tessellation, and we won’t know that until, you know, later. But we can explore some of the issues briefly.
With tessellation, they’re much more complex
First, above is a close-up look at the impact of HAWX 2‘s tessellation on some of the mountains over which the entire benchmark scene takes place. This is just a small portion of a higher-resolution screen capture that hasn’t been resized. Clearly, tessellation adds tremendous complexity to the shape of the mountains.
Above are some images, provided by Nvidia, that reveal the sort of geometric complexity involved in this scene. The lower shot, in wireframe mode, gives us a sense of the polygon sizes. Unfortunately, these images were obviously resized by Nvidia before they came to us, so we can’t really estimate the number of pixels per polygon by looking at them. Still, we have a clear sense that many, many polygons are in use, more than in most of today’s games.
Is this an egregious overuse of polygons, as AMD contends? I’m not sure, but I’d say it’s less than optimal, for a few reasons. One oddity is demonstrated, albeit poorly, in the image on the right. Although everything is turned at a nearly 45° angle, what you’re seeing at the center of the picture is something important: essentially flat ground. That flat surface is covered with a large number of polygons, all finely subdivided into a complex mesh. A really good dynamic tessellation algorithm wouldn’t find any reason to subdivide a flat area into so many triangles. We’ve seen such routines in operation before, or we wouldn’t know to point this out. And there’s a fair amount of flat ground in the HAWX 2 benchmark map, as shown between the mountains in the image below.
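To illustrate what a flatness-aware routine might look like, here is a deliberately simple Python sketch. It is a hypothetical heuristic of our own, not the method used by HAWX 2 or by any particular engine: it picks a subdivision factor for a patch edge based on how far the terrain’s height samples stray from a straight line between the edge’s endpoints. The `edge_tess_factor` name and the tolerance value are invented for the example; the cap of 64 matches DirectX 11’s maximum tessellation factor.

```python
from typing import Sequence


def edge_tess_factor(heights: Sequence[float], tolerance: float = 0.5,
                     max_factor: int = 64) -> int:
    """Pick a subdivision factor from the edge's deviation from a straight line.

    heights holds at least two terrain height samples taken at evenly spaced
    points along one patch edge.
    """
    n = len(heights)
    first, last = heights[0], heights[-1]
    worst = 0.0
    for i, h in enumerate(heights):
        # Height a perfectly straight (flat or evenly sloped) edge would have here.
        expected = first + (last - first) * i / (n - 1)
        worst = max(worst, abs(h - expected))
    if worst <= tolerance:
        return 1  # straight enough: extra triangles would add nothing
    return min(max_factor, 1 + int(worst / tolerance))


flat_ground = [10.0, 10.0, 10.0, 10.0, 10.0]
even_slope = [0.0, 1.0, 2.0, 3.0, 4.0]
jagged_ridge = [0.0, 5.0, -3.0, 6.0, 0.0]

for name, samples in [("flat", flat_ground), ("slope", even_slope), ("ridge", jagged_ridge)]:
    print(name, edge_tess_factor(samples))
```

Flat ground and an even slope both stay at a factor of 1, while the jagged ridge earns more subdivision, which is the behavior one would hope for from an adaptive scheme.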
The scene above shows us another potential issue, too, which is especially apparent in the full-resolution screenshot: the silhouettes of the mountains off in the distance appear to be just as jagged and complex as those up close, yet the texture resolution on those distant peaks is greatly reduced. Now, there are reasons to do things this way (including, notably, the way light behaves as it’s being filtered through the atmosphere), but a more conventional choice would be to use dynamic LOD to reduce both the texture resolution and geometric complexity of the far-away peaks.
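To make those two complaints concrete, here is a minimal Python sketch of how an adaptive scheme might pick a tessellation factor per terrain patch. It is not Ubisoft’s code or any vendor’s recommended method. The function names, the 3x3 height grid, and the `roughness_for_max` and `lod_distance` thresholds are all assumptions made up for illustration; the cap of 64 is simply the maximum tessellation factor DirectX 11 allows.

```python
def patch_roughness(heights):
    """Largest absolute second difference across a 3x3 grid of terrain heights.

    Second differences are zero on any plane, so flat ground scores 0
    even when that ground is sloped.
    """
    rows = [abs(r[0] - 2 * r[1] + r[2]) for r in heights]
    cols = [abs(heights[0][c] - 2 * heights[1][c] + heights[2][c]) for c in range(3)]
    return max(rows + cols)


def tess_factor(heights, distance, max_factor=64.0,
                roughness_for_max=1.0, lod_distance=1000.0):
    """Hypothetical per-patch tessellation factor between 1 and max_factor.

    roughness_for_max and lod_distance are illustrative tuning knobs,
    not values taken from any real game or driver.
    """
    # Flat patches get no extra subdivision, however close they are.
    detail = min(patch_roughness(heights) / roughness_for_max, 1.0)
    # Beyond lod_distance, geometric detail falls off with distance.
    falloff = min(1.0, lod_distance / max(distance, 1e-6))
    return 1.0 + (max_factor - 1.0) * detail * falloff


flat_slope = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]   # a tilted but planar patch
bumpy = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]        # a sharp peak in the middle

print(tess_factor(flat_slope, 100.0))   # 1.0: flat ground is left alone
print(tess_factor(bumpy, 100.0))        # 64.0: rough and close, fully subdivided
print(tess_factor(bumpy, 4000.0))       # 16.75: same peak, far away
```

A real implementation would live in a hull shader and would likely measure screen-space edge length rather than world-space distance, but the principle is the same: spend triangles where the surface curves and the viewer is close.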
Finally, although close-up mountains in HAWX 2 look amazing and darn-near photorealistic, very little effort has been spent on the hostile airplanes in the sky. The models have very low poly counts, with obvious polygon edges visible. The lighting is dead simple, and the surfaces look flat and dull. Again, that’s an odd choice.
For its part, Nvidia freely acknowledged the validity of some of our criticisms but claimed the game’s developers had the final say on what went into it. Ubisoft, they said, took some suggestions from both AMD and Nvidia, but refused some suggestions from them both.
On the key issue of whether the polygon counts are excessive, Nvidia contends Ubisoft didn’t sabotage its game’s performance on Radeon hardware. Instead, the developers set their polygon budget to allow playable frame rates on AMD GPUs. In fact, Nvidia estimates HAWX 2 with tessellation averages about 18 pixels per polygon. Interestingly, that’s just above the 16 pixel/polygon limit that AMD Graphics CTO Eric Demers argued, at the Radeon HD 6800 series press day, is the smallest optimal polygon size on any conventional, quad-based GPU architecture.
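For a rough feel for what those figures imply, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: the 1920x1080 resolution and the polygon count are hypothetical inputs, not numbers from Nvidia or from our testing, and the simple ratio ignores overdraw, back-facing triangles, and off-screen geometry.

```python
QUAD_LIMIT = 16.0  # pixels per polygon, the threshold AMD cited


def pixels_per_polygon(width, height, visible_polygons):
    """Average screen pixels covered by each visible polygon."""
    return (width * height) / visible_polygons


def polygons_at_density(width, height, px_per_polygon):
    """Visible polygon count implied by a given average density."""
    return (width * height) / px_per_polygon


# Hypothetical example: a 1920x1080 frame at Nvidia's estimated density.
width, height = 1920, 1080
print(polygons_at_density(width, height, 18))            # 115200.0
density = pixels_per_polygon(width, height, 115200)
print(density, density >= QUAD_LIMIT)                    # 18.0 True
```

The 16-pixel figure follows from the way conventional GPUs shade pixels in 2x2 quads: as triangles shrink, a growing share of each quad falls outside the triangle and the shading work on those pixels is wasted.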
With all of these considerations in mind, let’s have a look at HAWX 2 performance with DX11 terrain tessellation enabled.
Having seen these tests run, I’d say we’re getting fluid enough frame rates out of all of the cards to keep the game playable. The GeForces are much faster, though, which you might have guessed.
We don’t have any great lessons to draw from all of this controversy, but we hope you understand the issues a little better, at least. We expect to see more skirmishes of this type in the future, especially if AMD’s Cayman GPU doesn’t move decidedly in the direction of higher polygon throughput. We’re also curious to see exactly how AMD addresses this one via its promised “driver-based solution” and whether or not that solution alters the game’s image quality.
Lost Planet 2
Our next stop is another game with a built-in benchmark that makes extensive use of tessellation, believe it or not. We figured this and HAWX 2 would make a nice bridge between our synthetic tessellation benchmark and the rest of our game tests. This one isn’t quite so controversial, thank goodness.
Here’s a quick look at the subject of this benchmark, a big, slimy slug/tank character from the game. Like a lot of other current games, LP2 has its DX11 effects tacked on in a few places, mostly in the level-end boss characters like this one. Tessellation adds some detail to the slug thing, mostly apparent in the silhouette of its tail and its, uh, knees. Whatever they’re doing here works, because wow, that thing is creepy looking. I just don’t understand why the guts don’t ooze out immediately when the guys shoot it.
This benchmark emphasizes the game’s DX11 effects, as the camera spends nearly all of its time locked onto the tessellated giant slug. We tested at two different tessellation levels to see whether it made any notable difference in performance. The difference in image quality between the two is, well, subtle.
The Radeons don’t fare too poorly here, all things considered. The only solution that’s faster than the GeForce GTX 580 is a pair of 6870s in CrossFire, in fact. Unfortunately, the low clock speeds on the 5970’s individual GPUs keep it from performing as well as the 6870s in geometry-limited scenarios.
In addition to the compute shader test we’ve already covered, Civ V has several other built-in benchmarking modes, including two we think are useful for testing video cards. One of them concentrates on the world leaders presented in the game, which is interesting because the game’s developers have spent quite a bit of effort on generating very high quality images in those scenes, complete with some rather convincing material shaders to accent the hair, clothes, and skin of the characters. This benchmark isn’t necessarily representative of Civ V’s core gameplay, but it does measure performance in one of the most graphically striking parts of the game. As with the earlier compute shader test, we chose to average the results from the individual leaders.
The Radeons pretty much clean up here, quite possibly due to their pixel shading prowess. The GTX 580 only manages to match the Radeon HD 5870.
Another benchmark in Civ V focuses, rightly, on the most taxing part of the core gameplay, when you’re deep into a map and have hundreds of units and structures populating the space. This is when an underpowered GPU can slow down and cause the game to run poorly. Unfortunately, this test doesn’t present its results in terms of frames per second. We can use the scores it produces to compare between video cards, but we can’t really tell whether performance will be competent based on these numbers alone. For what it’s worth, I’m pretty confident most of these cards are capable of producing tolerable frame rates at this very high resolution with 8X multisampling.
In this measure of actual gameplay performance, the GTX 580 comes in just behind the Radeon HD 5970. The only single-GPU solution that comes close is the GTX 480, and it’s more than 10% slower than the 580.
Up next is a little game you may have heard of called StarCraft II. We tested SC2 by playing back a match from a recent tournament using the game’s replay feature. This particular match was about 10 minutes in duration, and we captured frame rates over that time using the Fraps utility. Thanks to the relatively long time window involved, we decided not to repeat this test multiple times, like we usually do when testing games with Fraps.
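For readers who want to do something similar at home, here is a small Python sketch of how an average frame rate falls out of a log of frame timestamps. It assumes you already have cumulative per-frame timestamps in milliseconds in a list; the function name and the sample data are hypothetical, and this is not a parser for any particular tool’s output file.

```python
def average_fps(timestamps_ms):
    """Average frames per second over a run of cumulative timestamps (ms).

    The first timestamp marks the start of the window, so N timestamps
    describe N - 1 completed frames.
    """
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    elapsed_ms = timestamps_ms[-1] - timestamps_ms[0]
    if elapsed_ms <= 0:
        raise ValueError("timestamps must increase")
    return (len(timestamps_ms) - 1) * 1000.0 / elapsed_ms


# Made-up sample: four timestamps 20 ms apart, i.e. three frames in 60 ms.
sample = [0.0, 20.0, 40.0, 60.0]
print(average_fps(sample))  # 50.0
```

Over a ten-minute window, a few hitches barely move this number, which is one reason a single long run can stand in for several short ones.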
We tested at the settings shown above, with the notable exception that we also enabled 4X antialiasing via these cards’ respective driver control panels. SC2 doesn’t support AA natively, but we think this class of card can produce playable frame rates with AA enabled, and the game looks better that way.
Once more, the GTX 580 is the fastest single-GPU solution, but it can’t quite catch the dual 6870s or GTX 460s. The Radeon HD 5970 struggles yet again, oddly enough, while the 6870 CrossFireX config performs quite well.
Battlefield: Bad Company 2
BC2 uses DirectX 11, but according to this interview, DX11 is mainly used to speed up soft shadow filtering. The DirectX 10 rendering path produces the same images.
We turned up nearly all of the image quality settings in the game. Our test sessions took place in the first 60 seconds of the “Heart of Darkness” level.
The GTX 480 is in a virtual tie with the Radeon HD 5870, and the GTX 580 breaks that deadlock. Still, all of the dual-GPU solutions are faster, including the Radeon HD 5970.
This time around, we decided to test Metro 2033 at multiple image quality levels rather than multiple resolutions, because there’s quite a bit of opportunity to burden these GPUs simply using this game’s more complex shader effects. We used three different quality presets built into the game’s benchmark utility, with the performance-destroying advanced depth-of-field shader disabled and tessellation enabled in each case.
Interestingly, the Radeons grow relatively stronger as the complexity of the shader effects rises. By the time we reach the highest quality settings, the 5970 has matched the GTX 580, and the dual 6870s are faster still.
Aliens vs. Predator
AvP uses several DirectX 11 features to improve image quality and performance, including tessellation, advanced shadow sampling, and DX11-enhanced multisampled anti-aliasing. Naturally, we were pleased when the game’s developers put together an easily scriptable benchmark tool. This benchmark cycles through a range of scenes in the game, including one spot where a horde of tessellated aliens comes crawling down the floor, ceiling, and walls of a corridor.
For these tests, we turned up all of the image quality options to the max, with two exceptions. We held the line at 2X antialiasing and 8X anisotropic filtering simply to keep frame rates in a playable range with most of these graphics cards.
Once more, the GTX 580 is the fastest single-GPU solution, supplanting the GTX 480 while solidly improving on its performance.
DiRT 2: DX9
This excellent racer packs a scriptable performance test. We tested at DiRT 2’s “ultra” quality presets in both DirectX 9 and DirectX 11. The big difference between the two is that the DX11 mode includes tessellation on the crowd and water. Otherwise, they’re hardly distinguishable.
DiRT 2: DX11
When DiRT 2 is at its most strenuous, in the last few graphs above, the GTX 580 handily outperforms the GTX 480. This is one of the games where Nvidia says the GF110’s architectural enhancements, particularly the faster FP16 filtering, pay off.
On the whole, the GeForce GTX 580 delivers on much of what it promises. Power draw is reduced versus the GTX 480, at least at idle, and the card runs cooler while generating less noise than its predecessor. Performance is up substantially, between about 10 and 20%, depending on the application, which is more than enough to cement the GTX 580’s position as the fastest single-GPU graphics card on the planet.
We said earlier that the GTX 580’s competitive picture is a bit murky, but our test results have provided some additional clarity. The Radeon HD 5970 is faster than the GTX 580 across most games we tested, but we still don’t like its value proposition for several reasons. One of those reasons is the 5970’s relatively weak performance with high degrees of tessellation, a consequence of its low default clock speeds. The 5970 is also a very long card and relatively noisy.
Most of all, though, we simply prefer the option of grabbing a pair of Radeon HD 6870s. The 6870 CrossFireX setup was easily the best overall performer of any solution we tested across our suite of games. If you’re willing to live with the multi-GPU compatibility issues of the 5970, you might as well step up to two discrete cards for the same price. Heck, even though the 6870’s cooler isn’t very quiet, the dual 6870 config generated less noise under load than the 5970. We’re also much more comfortable with running two 6870s sandwiched together than we are with two of any flavor of GeForce GTX 460 we’ve yet seen. The custom coolers on various GTX 460 cards are often fine for single-card setups, but they mostly don’t play well with SLI.
With that said, we don’t entirely consider any current multi-card solution a truly direct competitor for the GeForce GTX 580. For one thing, the GTX 580 would make a heckuva building block for an SLI setup itself, and that would be one very powerful solution, likely with decent acoustics, too. (Note to self: test ASAP.) For another, we know that AMD’s Cayman GPU is coming very soon, and it should present more formidable competition for Nvidia’s new hotness.
In fact, we don’t have enough information to evaluate the GTX 580 properly just yet. This card is a nice advancement over the GTX 480, but is it improved enough to hold off Cayman? If the rumors are true, we may know the answer before the month is out.