So I guess our work is cut out for us.
By most measures, the introduction of a card like the Radeon HD 7990 should be simple, because it is unreservedly the most powerful graphics card the world has ever seen. The formula is straightforward enough: two Tahiti GPUs, like those driving the Radeon HD 7970, working together in CrossFire on one graphics card. That’s like… twin Clydesdales pulling your wagon, a pair of Ferrari V12s driving all four wheels, like various other poor analogies involving large-scale parallelism and testosterone. Point is, the 7990’s hardware is world-class, second-to-none stuff, capable of crunching more flops, bits, and texels than anything else you can plug into a PCIe slot.
Reviewing this thing should be, you know, fun.
But we have some difficult questions to ask about the 7990’s true performance along the way. I think that makes our task interesting, at least. Although we have loads of data collected by multiple tools, our goal is to take a very practical approach that should yield some definitive answers to the questions at hand. Let’s have a closer look at the 7990’s formidable hardware, and then we’ll dive into the performance results.
From Tahiti to New Zealand to Malta
The Radeon HD 7990’s introduction is more interesting than usual because it’s either really late, already over, or nearly didn’t happen. I’m not sure which, entirely. You see, back when AMD unveiled the Radeon HD 7970 at the end of 2011, the firm let slip a code name, New Zealand, for an upcoming dual-GPU graphics card and said it was “coming soon.” Since we in the media are given to fits of speculation, we pretty much expected to see a dual-Tahiti graphics card from AMD at some point in early 2012. That product didn’t arrive as anticipated, and we nearly gave up hope that it ever would.
Eventually, several board makers, including Asus and PowerColor, slapped two Tahiti chips onto a single card, but those products didn’t ship until late last year, in extremely limited volumes. We tried to get our hands on one of the water-cooled Asus ARES II cards for review but were told the cards were completely sold out practically as soon as the product was introduced.
We figured that was it for the 7990, but then the news broke that AMD would be extending the tenure of the Radeon HD 7000 series until the end of 2013. At that time, the company told us it had more 7000-series products on the way. Then came GDC last month, when we got our first peek at the 7990. Now here we are, well over a year since the Radeon HD 7970 was introduced, looking at an official Radeon HD 7990 reference card.
This thing even has its own code name, “Malta.” AMD tells us New Zealand is an umbrella code name that refers to all dual-Tahiti products from itself and its partners, including those in the FirePro lineup, while Malta refers specifically to this reference design, proving once and for all that code names are almost infinitely malleable. The fact that Malta exists as a reference design from AMD matters, though. AMD tells us this card will be widely available through all of its partners, a true mass-market product. Also, the level of refinement evident in this card and cooler goes well beyond what we’d expect out of a science project from a board maker.
The biggest revelation about this reference design comes courtesy of those three fans spread atop a massive, board-long array of heatpipes and fins. This card is much, much quieter than its predecessor, the Radeon HD 6990, which set some records on the Damage Labs decibel meter. Heck, the 7990 seems like a faint whisper next to the reference 7970’s cooler. AMD has put some work into finding a quieter cooling solution. The firm claims turbulence, not the fans themselves, is the primary noise source in most coolers; this setup reduces turbulence by pushing air straight down through the heatsink fins. The payoff will be obvious to your ears.
Pretty much anything else you can say about the 7990 requires big numbers. With dual Tahiti GPUs, it has a total of 4096 shader ALUs providing 8.2 teraflops of compute power. The board packs 6GB of GDDR5 memory. The true memory capacity is half that, with 3GB allocated to each GPU, but the memory interface is effectively 768 bits wide, with all the bandwidth that entails.
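Those peak-compute numbers are easy to sanity-check. Here’s the back-of-the-envelope math, assuming each shader ALU executes one fused multiply-add (two floating-point ops) per cycle at the 1000MHz boost clock:

```python
# Rough sanity check of the 7990's quoted 8.2-teraflop peak.
# Assumption: one fused multiply-add (2 flops) per ALU per clock at boost.
alus = 2 * 2048          # two Tahiti GPUs, 2048 shader ALUs apiece
flops_per_clock = 2      # multiply + add
boost_hz = 1000e6        # 1000MHz boost clock

peak_tflops = alus * flops_per_clock * boost_hz / 1e12
print(peak_tflops)       # 8.192 -- rounds to the quoted 8.2
```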
The twin GPUs are joined by a bridge chip from PLX with 48 lanes of PCIe bandwidth, 16 lanes to each GPU and 16 to the PCIe slot. AMD claims the board has “96 GB/s” of “inter-GPU bandwidth,” but do the math with me. Each PCIe x16 link can transfer 16 GB/s in one direction or 32 GB/s bidirectionally. That means GPU 1 can transfer, say, a big texture to GPU 2 at a peak rate of 16 GB/s. That’s, you know, considerably less than the claimed 96 GB/s. It should be more than sufficient, regardless.
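As a sketch of where that marketing figure likely comes from, here’s the same arithmetic in code. This assumes PCIe 3.0 signaling (8 GT/s per lane with 128b/130b encoding); summing all 48 of the bridge chip’s lanes in both directions lands near the claimed 96 GB/s, while the link any one GPU can actually use for a transfer is far narrower:

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding -> ~0.985 GB/s per lane, per direction.
lane_gbs = 8 * (128 / 130) / 8           # GB/s per lane, one direction

x16_one_way = 16 * lane_gbs              # what one GPU sees for a transfer: ~15.8 GB/s
all_lanes_both_ways = 48 * lane_gbs * 2  # likely source of "96 GB/s": ~94.5, rounded up

print(round(x16_one_way, 1), round(all_lanes_both_ways, 1))
```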
The card can drive up to five displays simultaneously, four via DisplayPort and one via DVI, though plug adapters of various types are available. As you can see above, the board has a CrossFire connector, so you can double up on 7990s if you’re feeling a little unsure whether just two Tahiti chips will suffice. AMD says a quad CrossFire config would be ideal for driving 4K display resolutions. (Note to self: test this claim ASAP.)
Poking up out of the top of the 7990 are two eight-pin auxiliary power plugs, to the surprise of no one. The board requires 375W of power, which is 25% less than the power requirements for two separate Radeon HD 7970 GHz Edition cards. Some of the power savings likely come from clock frequencies that are a smidgen lower. The 7970 GHz Edition has a 1GHz base clock and a 1050MHz “boost” frequency. The 7990 clocks in at 950MHz base and 1000MHz boost.
At 12″, the 7990 is a full inch longer than its most direct competitor, the GeForce GTX 690, and at this point, the jokes just write themselves. The Radeon’s additional endowment may prove to be inconvenient, though, if you’re trying to install the board into any sort of mid-sized PC enclosure. You’ll want to check to see whether your case has enough room before ordering up a 7990.
Speaking of which, perhaps the largest number of all associated with this card is its price: $999.99. Gulp.
Yep. Nvidia started it with the GTX 690 and Titan, and AMD is following suit by pricing its latest premium graphics card at one shiny penny shy of a grand. I was initially surprised by this move, since the 7990 doesn’t have the distinctive industrial design touches that the GTX 690 and Titan do, such as magnesium-and-aluminum cooling shrouds, LED-lit logos, and blowers with Crisco-and-gold-dust bearing lube. The 7990 is handsome—and I’ve already told you the cooler is quiet—but it mostly just looks like another Radeon covered in shiny plastic. Asking this much is also a bit of a risk because you can buy a Radeon HD 7970 for 400 bucks at Newegg right now, so two of them presumably would be 800 bucks, which I understand is less than a grand.
However, AMD has a couple of awfully decent justifications for charging as much as it does. First of all, it’s taking this whole Never Settle game bundle concept to its terrifyingly wondrous logical conclusion. The 7990 will come with a coupon—right there in the box, to prevent redemption hassles—for the following games: BioShock Infinite, Tomb Raider, Crysis 3, Far Cry 3, Far Cry 3 Blood Dragon, Hitman: Absolution, Sleeping Dogs, and Deus Ex: Human Revolution. That’s an eleventy billion dollar value, purchased separately. But the 7990 comes with all of ’em.
The other reason AMD can get away with asking a grand for this card is simply that the specs justify it. Have a look at the 7990 versus the competition:
[Table: peak theoretical rates — pixel fill, texture filtering, shader flops, and memory bandwidth — for the GeForce GTX 680, GTX 690, Titan, Radeon HD 7970 GHz Edition, and Radeon HD 7990]
As I’ve said, the GeForce GTX 690 is the 7990’s nearest competitor, and the 7990 has substantially higher peak rates in the two most critical categories for modern GPU hardware: shader flops and memory bandwidth. Granted, the GK104 chips driving the GTX 690 have proven to be formidably efficient performers, but Tahiti’s larger shader array and 384-bit memory interface cannot be denied. One can see why AMD would price the 7990 directly opposite the GTX 690 and its theoretically less powerful single-GPU sibling, the GTX Titan. Add in the value of that stupendously stuffed game bundle, and the 7990 practically looks like a bargain by comparison. Also, I think there’s some funky psychology at work at the ultra high end of the market: lower prices communicate inferiority, and any signal that carries that message is directly at odds with the 7990’s whole mission.
Here’s the deal
Last time out, by using the FCAT tools that measure how frames are delivered to the display, we found some troubling problems with Radeon CrossFire multi-GPU configs.
We’ve known for a while about multi-GPU microstuttering, a timing problem related to the alternate frame rendering method of load-balancing employed by both CrossFire and SLI. Frames are doled out to one GPU and then the other in interleaved fashion, but sometimes, the GPUs can get pretty far out of sync. The result is that frames are dispatched in an uneven manner, introducing a tight pattern of jitter into the animation. Here’s an example from my original Inside the second article.
We can detect such problems with software tools like Fraps, which can detect when the game engine signals to the DirectX API that it has handed off a completed frame for processing. That’s relatively early in the frame rendering process—at the orange line marked “Present()” in the simplified diagram below.
We learned using frame capture tools that the microstuttering patterns in CrossFire solutions can become exaggerated by the time frames reach the display. Instead of something like the mild case of jitter in the plot above, the true pattern of frames arriving onscreen could look more like this:
In this example, the “short” frames in the sequence arrive only a fraction of a millisecond behind the “long” frames. With vsync disabled, those short or “runt” frames may only occupy a handful of horizontal lines across the screen, adding virtually no additional visual information to the picture. Here’s a zoomed-in example from BF3 with the FCAT overlay on the left showing a different color for each individual frame rendered by the GPU:
Yes, a slice the height of that olive-colored bar is all you see of a fully rendered GPU frame. In other cases, we found that CrossFire simply dropped frames entirely, never showing even a portion of them onscreen. Yet those runt and dropped frames are counted by software benchmark tools as entirely valid, inflating FPS averages and the like. Nvidia’s SLI solutions don’t have this problem, interestingly enough, because they employ a frame-metering technique to even out the delivery of frames to the display.
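To make that accounting concrete, here’s a hypothetical sketch of how runts and drops might be tallied from per-frame scanline counts in a capture. The 21-scanline cutoff is our assumption for illustration, not FCAT’s actual threshold:

```python
RUNT_SCANLINES = 21   # assumed cutoff for illustration only

def tally_frames(scanlines_per_frame):
    """Classify captured frames as full, runt, or dropped by on-screen height."""
    dropped = sum(1 for h in scanlines_per_frame if h == 0)
    runts = sum(1 for h in scanlines_per_frame if 0 < h < RUNT_SCANLINES)
    full = len(scanlines_per_frame) - dropped - runts
    return {"full": full, "runt": runts, "dropped": dropped}

# A 1440-line screen split across five rendered frames: one a sliver, one never shown.
print(tally_frames([480, 3, 477, 0, 480]))   # {'full': 3, 'runt': 1, 'dropped': 1}
```

A naive FPS counter would credit all five of those frames equally, which is exactly how the averages get inflated.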
All of that seemed like quite an indictment of CrossFire, but we had lingering questions about the practical impact of microstuttering on real-world performance. Does it impact the smoothness of in-game animation in a meaningful way? We couldn’t tell conclusively from our first set of test results. In the example from Skyrim shown above, the “long” frame times are so quick—less than 15 milliseconds—that the display would be getting new frames even faster than its typical 60 Hz refresh cycle. In a situation like that, you’re getting plenty of new information each time the screen is painted, so there’s really not much of a practical issue. We needed more data.
So, for this article, we set out to test the Radeon HD 7990 and friends with a very practical question in mind: in a truly performance-constrained scenario, where one GPU struggles to get the job done, does adding a second GPU help? If so, how much does it help?
To answer that question, we had to find scenarios where two of today’s top GPUs would struggle to produce smooth animation—and we had to find them within the limits of our FCAT setup, which captures frames from a single monitor at up to 2560×1440 at 60Hz. (In theory, one could use the colored FCAT overlay with a multi-display config, but that complicates things quite a bit.) Fortunately, via a little creative tinkering with image quality settings, we were able to tune up five of the latest, most graphically advanced games to push the limits of these cards. All we need to do now is step through the results from each game and ask our very practical questions about the impact of adding a second GPU to the mix.
Testing with FCAT at 2560×1440 and 60Hz requires capturing uncompressed video at a constant rate of 422 MB/s. Your storage array can’t miss a beat, or you’ll get dropped frames and invalidate the test run. As before, our solution to this problem was our RAID 0 array of four Corsair Neutron 256GB SSDs, which holds nearly a terabyte of data and writes at nearly a gigabyte per second. This array is held together with my patented hillbilly rigging:
Hey, it works.
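That 422 MB/s figure falls straight out of the capture parameters, assuming the capture card grabs 16 bits per pixel (we haven’t confirmed FCAT’s actual pixel format, so treat this as a plausibility check):

```python
# Uncompressed capture bandwidth at 2560x1440 and 60Hz.
# Assumption: 16 bits (2 bytes) per pixel; the real capture format may differ.
width, height, refresh_hz = 2560, 1440, 60
bytes_per_pixel = 2

bytes_per_sec = width * height * bytes_per_pixel * refresh_hz  # 442,368,000
print(round(bytes_per_sec / 2**20))   # ~422 MB/s in binary megabytes
```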
Trouble is, the SSD array offers less than a terabyte of storage, and that just won’t do. A single 60-second test session produces a 25GB video. For this article, we planned to test six different configs in five games, with three test sessions per card and game. When the reality of the storage requirements began to dawn on us, we reached out to Western Digital, who kindly agreed to provide the class of storage we needed in the form of two WD Black 7,200-RPM 4TB hard drives.
The Black is the fastest 4TB drive on the market, and thank goodness it exists. We put two of them into a RAID 1 array for additional data integrity, and we were able to store all of our video in one place. We’re already contemplating a RAID 10 array with four of these drives in order to improve transfer speeds and total capacity.
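The storage requirement that sent us to Western Digital is simple multiplication across the test plan:

```python
# Total capture storage for this article's test plan.
configs, games, runs_per_config = 6, 5, 3
gb_per_run = 25                      # one 60-second FCAT capture

total_gb = configs * games * runs_per_config * gb_per_run
print(total_gb)   # 2250 GB -- more than double the SSD array's capacity
```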
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Our test systems were configured like so:
[Table: test system configuration]
- Memory: 16GB (4 DIMMs) of DDR3 SDRAM at 1600MHz
- Chipset drivers: INF update with Rapid Storage Technology Enterprise and Realtek audio drivers
- System storage: Deneva 2 240GB SATA SSD
- OS: Windows 7 with Service Pack 1
- Graphics cards: the Radeon HD 7970 GHz Edition and the Radeon HD 7990 (950MHz base, 1000MHz boost, 1500MHz memory, 2 × 3072MB), both with Catalyst 13.5 beta 2 drivers; a second dual-GPU entry in the table listed 915MHz base, 1020MHz boost, 1502MHz memory, and 2 × 2048MB
Thanks to Intel, Corsair, and Gigabyte for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.
Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.
In addition to the games, we used the following test applications:
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Crysis 3
Crysis 3 easily stresses these video cards at its highest image quality settings with 4X MSAA. You can see a video of our test run and the exact settings we used below.
Here’s where things become a little bit (more?) complicated. You’ll see in the graphs below that I’ve included results from two different tools, Fraps and FCAT. As I noted last time out, both tools are essential to understanding animation smoothness, particularly in multi-GPU configs where the possibility of microstuttering looms in the backdrop.
Fraps does its sampling early in the frame production pipeline, so its timing info should correspond pretty closely to what the game engine sees—and the game engine determines the content of the frames by advancing its physical simulation of the game world. An interruption in Fraps frame times, if it’s large enough, will yield a perceptible interruption in animation, even if buffering and frame metering later in the pipeline smooths out the delivery of frames to the display. We saw an example of this phenomenon in our last article, and we’ve seen others in our own testing.
Meanwhile, FCAT tells you exactly when a frame arrives at the display. An interruption there, if it’s large enough to be perceptible, will obviously cause a problem, as well. Because it monitors the very end of the frame production pipeline, FCAT may show us problems that software-based tools don’t see.
One complication we ran into is that, in most newer games, Fraps and the FCAT overlay will not play well together. You can’t use them both at the same time. That means we can’t provide you with neat, perfectly aligned data from the same test runs per card, captured with both Fraps and FCAT. What we’ve done instead is conduct three test runs with FCAT and three with Fraps. We’ve then mashed the two together. The plots you’ll see come from the second of the three test runs for each card and test type. Since the Fraps and FCAT plots come from different test sessions, conducted manually, the data in them won’t align perfectly. Both tools should be producing results that are correct for what they measure, but they are measuring different test runs and at different points in the pipeline.
Since we’re looking at the question of microstuttering, I’ve included zoomed-in snippets of our frame time plots, so that we can look at those jitter patterns up close. Note that you can switch between all three plots for each GPU by pressing one of the buttons below.
If you click through the different GPUs, several trends become apparent. Even though they’re from different test runs, the Fraps and FCAT data for the single-GPU products tend to be quite similar. As we’ve seen before, there’s a little more variability in the Fraps frame times, but the timing gets smoothed out by buffering by the time FCAT does its thing.
For the multi-GPU cards like the GTX 690 and 7990, though, the Fraps and FCAT results diverge. That familiar microstuttering jitter is apparent in the Fraps results from the 7990, and the swing from frame to frame grows even larger by the time those frames reach the display. This is the sort of situation we’d hoped to avoid. The “long” frame times on the 7990 reach 70 ms and beyond—the equivalent of 14 FPS—while the shorter frame times in FCAT are well under 10 ms. When you’re waiting nearly 70 milliseconds for every other frame to come along, you’re not talking about creamy smooth animation.
The GTX 690 isn’t immune to microstuttering issues, either. A pronounced jitter shows up in its Fraps results, even though it’s smoothed out by SLI frame metering before the frames reach the display. The impact of frame metering here is pretty remarkable, but it’s not a cure-all. Frames are still being generated according to the game engine’s timing. As I understand it, the vast majority of game engines tend to sample the time from a Windows timer and use that to decide how to advance their simulations. As a result, the content of frames generated unevenly would advance unevenly, even if the frames hit the display at more consistent intervals. The effect would likely be subtle, since the stakes here are tens of milliseconds, but it’s still less than ideal.
The results for both of the multi-GPU cards illustrate an undesirable trait of the microstuttering problem. As frame times grow, so does the opportunity for frame-to-frame jitter. The 7990 seems to be taking especially full advantage of that opportunity. This is one reason why we wanted to test SLI and CrossFire in truly tough conditions. Looks like, when you need additional performance the most, multi-GPU configs may fail most spectacularly to deliver.
The 7990 wins the FPS beauty contest, confirming its copious raw GPU power, if nothing else. The 99th percentile frame time focuses instead on frame latencies and, because it denotes the threshold below which all but 1% of the frames were produced, offers a tougher assessment of animation smoothness. All of the cards are over the 50-ms mark here, so they’re all producing that last 1% of frames at under 20 FPS. (Told you this was a tough scenario.)
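For readers who want the metric pinned down, here’s a minimal sketch of the 99th-percentile frame time calculation on a list of frame times (our production tooling differs in the details, but the idea is the same):

```python
def percentile_frame_time(frame_times_ms, pct=99):
    """Frame time below which pct% of frames were produced."""
    ordered = sorted(frame_times_ms)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx]

# 99 quick frames plus one 80-ms straggler: the FPS average looks healthy,
# but the 99th percentile exposes the slow tail.
times = [12.0] * 99 + [80.0]
print(percentile_frame_time(times))        # 80.0
print(1000 / (sum(times) / len(times)))    # ~78 "FPS" average hides it
```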
Also, the 99th percentile numbers for Fraps and FCAT tend to differ, which makes sense in light of the variations in the two sets of plots above. What do we make of them? My sense is that a good solution should avoid slowdowns at both points in the pipeline, at frame generation and delivery. If so, then true performance will be determined by the slower of the two sets of results for each GPU.
Picking out the correct points of focus in the graph above made my eyes cross, though. We can tweak the colors to highlight the lower-performance number for each card:
I think that’s helpful. By this standard, the single-GPU GeForce GTX Titan is the best performer. The GTX 690 and Radeon HD 7990 aren’t far behind, but they’re not nearly as far ahead of the GTX 680 and 7970 as the FPS numbers suggest.
Here’s a look at the entire latency curve. You can see how the 7990 FCAT curve starts out strangely low, thanks to the presence of lots of unusually short frame times in that jitter sequence, and then ramps up aggressively. By the last few percentage points, the 7990’s FCAT frame times catch up to the 7970’s. Although the 7990 is generally faster than the 7970, it’s not much better when dealing with the most difficult portions of the test run. The GTX 690’s Fraps curve suffers a similar fate compared to the Titan’s, but by any measure, the GTX 690 still performs quite a bit better than the GTX 680.
This last set of results gives us a look at “badness,” at those worst-case scenarios when rendering is slowest. What we’re doing is adding up any time spent working on frames that take longer than 50 ms to render—so a 70-ms frame would contribute 20 ms to the total, while a 53-ms frame would only add 3 ms. We’ve picked 50 ms as our primary threshold because it seems to be something close to a good perceptual marker. A steady stream of 50-ms frames would add up to a 20 FPS frame rate. Go slower than that, and animation smoothness will likely be compromised.
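In code, the time-spent-beyond-50-ms tally described above looks like this (a sketch; other thresholds work the same way):

```python
def time_beyond(frame_times_ms, threshold_ms=50):
    """Sum only the portion of each frame time past the threshold."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)

# A 70-ms frame contributes 20 ms, a 53-ms frame 3 ms, quicker frames nothing.
print(time_beyond([70, 53, 30, 48]))   # 23
```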
I’ve taken the liberty of coloring the slower of the two results for each card here, as well, to draw our focus. The outcomes are intriguing. The 7990 spends about 40% less time above the 50-ms threshold than its single-GPU sibling, the 7970. That’s a respectable improvement, in spite of everything. The gains from a single GTX 680 to the 690 are even more dramatic, in part because the single 680 performs so poorly. The Titan again comes out on top.
You’re probably wondering what all of these numbers really mean to animation smoothness. We’ve captured every frame of animation during our FCAT testing, and I’ve spent some time watching the videos from the different cards back to back, trying to decide what I think. I quickly learned that being precise in subjective comparisons like these is incredibly difficult. To give you a sense of things, I’ve included short snippets from several of the video cards below. These are FCAT recordings slowed down to half speed (30 FPS) and compressed for YouTube. They won’t give you any real sense of image quality, but they should demonstrate the fluidity of the animation. We’ll start with the Radeon HD 7970:
And the 7990:
And now the GTX 690:
Finally, the Titan:
Like I said, making clear distinctions can be difficult, both with these half-speed online videos and with the source files (or while playing). I do think we can conclude that the FPS results suggesting the multi-GPU solutions are twice as fast as the single-GPU equivalents appear to be misleading. Watching the videos, you’d never guess you were seeing a “17 FPS” solution versus a “30 FPS” one; the 7990 is an improvement over the 7970, but the difference is much subtler. The same is true for the GTX 690 versus the 680. I do think the Titan comes out looking the smoothest overall. In fact, I’m more confident than ever that our two primary metrics track well with human perception after this little exercise. The basic sorting that they have done—with the Titan in the lead, the multi-GPU offerings next, and their single-GPU counterparts last—fits with my impressions.
So, are the dual-GPU cards better than their single-GPU versions? Yes, in this scenario, I believe they are. How much better? Not nearly enough to justify paying over twice the price of a 7970 for a 7990.
Tomb Raider
Getting the new Tomb Raider to stress these high-end graphics cards was easy once we found the option to invoke 4X supersampled antialiasing. Supersampling looks great, but it essentially requires rendering every pixel on the screen four times, so it’s fairly GPU-intensive.
Flip over to the Radeon HD 7970’s plots to start, if you will. This is a classic example of a noteworthy phenomenon that affects even single-GPU configs. The 7970’s Fraps plot shows a troubling series of frame latency spikes throughout, many above 50 ms and a few even above 60 ms. Whenever one of those upward spikes occurs, a downward spike for the next frame follows. What Fraps is detecting here is back-pressure in the frame production pipeline. Due to a delay somewhere later in the process, the submission queue for frames is filling up, and the game has to wait to submit the next frame. Hence the upward spike. Once the delay is resolved, the submission queue quickly drains, since work has continued on queued frames during the wait. The game is able to submit one or two more frames in quick succession before the queue fills back up. Hence the downward spike. FCAT doesn’t even notice these hiccups—its plot is much smoother—since a few frames are buffered at the end of the pipeline, and none of the delays are large enough to exhaust the buffer and interrupt frame delivery.
We’ve seen this pattern many times while testing, and it can be accompanied by perceptible jerkiness in game animation, as in our example from Skyrim. The interruption in Fraps alone is sufficient to affect the timing of the game engine, and thus the content of the frames, disrupting the fluidity of motion. The distinctive thing about the 7970 in Tomb Raider is that these spikes in Fraps do not translate into perceptible stuttering. The 7970’s output isn’t exactly a model of fluidity, with frame times averaging around 40 milliseconds (or 25 FPS), but there aren’t any perceptible pauses or hitches. In fact, the 7970 cranks out clearly smoother animation than the GTX 680.
We’ve seen a similar situation with Hitman: Absolution where, with the Catalyst 13.2 drivers, there were spikes in Fraps and small but noticeable stutters in our test scenario. AMD addressed the issue with the Cat 13.3 beta driver. The spikes in Fraps were reduced in size—from ~60 ms to ~40 ms—but not eliminated. The game then seemed to run perfectly smoothly.
I’ve had some conversations with David Nalasco, technical marketing guru at AMD, who has cited scenarios like this one in a presentation critical of Fraps-based benchmarking. He has a point. Some games appear to tolerate ~40ms and longer delays in Fraps without issue. However, others do not. The difference probably has to do with how each game advances its internal simulation timing. Other factors, such as the way the camera is moving through the game world and what’s taking place onscreen, may affect how much the player perceives any slowdowns. Trouble is, those issues with stuttering in Hitman with Catalyst 13.2 are very real and don’t show up at all in FCAT results. That’s why I keep asserting that we need to use both Fraps and FCAT to see what’s really happening. In fact, we need to use every tool at our disposal, including subjective evaluations of gameplay and videos, to understand what the numbers mean. We also need to see more instances like this one, so we can get a better handle on what factors contribute to whether a delay is perceptible.
The 7990’s situation here isn’t nearly so complicated, since both Fraps and FCAT agree that it’s plagued with some serious microstuttering issues. The plots show that, in FCAT, frame times frequently approach zero, indicating the presence of “runt” frames that likely occupy only a small portion of the screen.
By contrast, the GTX 690’s plot doesn’t look too bad at all, with a small amount of jitter in Fraps—on the order of about 10 ms—that’s removed before the frames reach the display. This plot looks more like what we’ve come to expect from frame-metered SLI setups; the pronounced jitter we saw with Fraps in Crysis 3 is unusual. My crackpot theory is that, in many cases, the small delays inserted by frame metering must exert pressure that makes its way back up the pipeline, keeping the Fraps samples from developing too much of a see-saw pattern. Something certainly seems to be keeping them in check here compared to the huge swings in the 7990’s Fraps plot.
The 7990 once again cranks out the top FPS average, but it puts in a relatively poor showing in our latency-sensitive performance metrics. In fact, if you look at the latency curve, you’ll see the 7990’s FCAT curve rising past the 7970’s at about the 95th percentile. That’s not a problem for the GTX 690, whose curves both indicate markedly lower frame latencies than the GTX 680.
I’m going to pull out some half-speed videos again, because I think these illustrate things more plainly than the videos from Crysis 3.
And the 7990:
And now the GTX 690:
I see little difference in terms of fluidity between the 7970 and 7990. They both seem to deliver about the same experience. The GTX 690, though, is easily the best. You can particularly see the added fluidity in the motion of Lara’s arms and at the edges of the screen, where the terrain is moving by rapidly.
Back to our questions. Is the 7990 better than its single-GPU counterpart? Not here. What about the GTX 690? Yes, in fact, it’s probably the best overall solution.
Far Cry 3
The thing to note about the Far Cry 3 results is that the Fraps and FCAT numbers should align perfectly, since we were able to use both tools simultaneously with this game. Click through the results from the different GPUs, and you can see how they correspond, generally with a little more variance from Fraps than FCAT. The 7990 is something of an exception to this rule, since it has a lot going on, including latency spikes in Fraps that don’t show up in FCAT and multi-GPU jitter visible to both tools. By contrast, the GTX 690 has microstuttering almost completely under control.
The low FPS numbers alone tell us this scenario is tough going for the 7970 and GTX 680. None of the other metrics are kind to those cards, either. The two top solutions in the latency-focused measurements are the GTX 690 and the Titan. The GTX 690 looks to be faster generally, but it suffers from a much larger hiccup about 40% of the way into the test run, which pushes up its time spent beyond 50 ms. The Titan handles that same speed bump much better.
After that, well, it’s complicated. By the numbers, the 7990 appears to split with the 7970; one or the other comes out on top, depending. What you don’t see in the numbers, though, is the herky-jerky motion that the 7990 produces, apparently as the result of some kind of CrossFire issue. We noted this problem with dual 7970s in our Titan review months ago, and AMD hasn’t fixed it yet. We don’t need to slow down the video in order to illustrate this problem. Just have a look at the 7970 versus the 7990 below. We’ll start with the 7970:
And now the 7990:
And the GTX 690:
Whatever’s wrong with the 7990, the 7970 isn’t affected, and neither is the dual-GPU GTX 690.
Does the 7990’s second GPU give it an advantage over the 7970? Not in Far Cry 3, not with that awful motion. I’d take a single 7970 over that anytime. The GTX 690, on the other hand, comes out looking pretty solid, definitely better than the GTX 680 or the 7970.
Sleeping Dogs
The plots for most of the cards look nice and clean in this case, with only a handful of occasional frame time spikes as we drive through the cityscape and the game streams in new data. Many of those spikes appear only in the Fraps results and are evened out in FCAT. The 7990’s plot is something else entirely.
Although the 7990 produces the most total frames of any of the cards, and thus wins the FPS sweeps, it cranks out those frames unevenly. As a result, the 7990’s latency curves push above the single 7970’s in the last five percent or so of frames produced. Also, notice how the 7990’s FCAT latency curve starts out, running along the axis near zero. That indicates the presence of some “runt” frames that aren’t likely to have much visual impact.
Does the 7990 provide a better gaming experience in this scenario than the 7970? The numbers, again, are mixed.
Let’s look at some videos. First, the 7970:
And the GTX 690:
When I watch the source videos at 60 FPS, I see something similar to what we saw in Tomb Raider, where the 7970 and 7990 are hard to differentiate, but the GTX 690 is visibly smoother. I’m not so sure that sense comes through in these 30-FPS YouTube videos, though.
Going back again to our practical question, I think we can say the Radeon HD 7990 isn’t appreciably superior to the 7970 in our Sleeping Dogs test scene.
You might not think of BioShock Infinite as a game that would put much stress on the latest graphics cards, and generally speaking, you’d be right. However, I played through almost the entire game on the Radeon HD 7990 at the settings below, and I encountered intermittent stutters while moving around Columbia much more often than one would hope. I figured this game could offer a nice test of a different sort of performance question: how well does adding a second GPU mitigate those occasional hiccups and hitches?
You can see the intermittent spikes on the 7990’s plot, from both Fraps and FCAT, into the 60-80 ms range. Flip over to the Radeon HD 7970 plot and… uh oh. The 7970 only spikes beyond 60 ms once in the entire test run. Overall, the hitches on the single-GPU card are much smaller.
The 7990 manages the highest FPS average and the lowest 99th percentile frame time, but all of the cards are under 35 ms for that latter figure, so they're all plenty quick, generally. The trouble is those intermittent slowdowns, and there's a distinct pecking order in the time spent beyond 50 ms: the Radeon HD 7970 is the best of the bunch, followed by the GTX 690, the Titan, and the GTX 680. The 7990 takes up the rear, with the worst case of stuttering of the group.
As usual, we’ll begin the videos with the 7970:
And the GTX 690:
Watch right after the player has exited through the gate. That’s where the game tends to stutter the most, and the problem is worst on the 7990.
I expect this stuttering issue could perhaps be fixed via a driver update. However, at present, the Radeon HD 7990 actually offers a worse experience in this game than the 7970. A single Tahiti GPU outperforms any of the GeForces, but adding a second GPU spoils the broth.
A ray of hope: a prototype driver with frame pacing capability
As you’ve seen, the 7990’s multi-GPU microstuttering issues can translate into very real performance problems. The folks at AMD are aware of this fact, and since our initial FCAT article, have been working on a software driver with a potential remedy. AMD calls this feature “frame pacing,” and the principle is the same as Nvidia’s frame metering: the driver attempts to even out the pace of frame delivery by inserting a small amount of delay when needed.
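To make the frame-pacing principle concrete, here's a conceptual sketch in Python. The real logic lives deep inside the display driver and is surely more sophisticated than this toy; all the sketch shows is the basic idea both AMD and Nvidia are pursuing: hold back frames that arrive "early" so frames reach the screen at an even cadence.

```python
# Conceptual sketch of frame pacing / frame metering. This is NOT
# AMD's or Nvidia's actual implementation -- just an illustration of
# the principle: track the recent average frame interval, and delay
# any frame that would otherwise appear sooner than that after its
# predecessor. Late frames are never delayed further.

class FramePacer:
    def __init__(self):
        self.avg_interval = None  # running estimate of the frame interval
        self.last_ready = None    # when the previous frame finished rendering
        self.last_present = None  # when the previous frame was shown

    def present_time(self, ready_time):
        """Given when a frame is ready, return when to show it."""
        if self.last_ready is None:
            self.last_ready = self.last_present = ready_time
            return ready_time
        interval = ready_time - self.last_ready
        self.last_ready = ready_time
        # Exponential moving average of recent render intervals.
        if self.avg_interval is None:
            self.avg_interval = interval
        else:
            self.avg_interval = 0.9 * self.avg_interval + 0.1 * interval
        # Show the frame no sooner than one average interval after the
        # previous one; an early frame eats a few milliseconds of delay.
        self.last_present = max(ready_time, self.last_present + self.avg_interval)
        return self.last_present

# Classic AFR microstutter: frames ready in an alternating
# short/long pattern (5 ms, 28 ms, 5 ms, 28 ms, ...).
pacer = FramePacer()
t = 0.0
presents = [pacer.present_time(t)]
for i in range(200):
    t += 5.0 if i % 2 == 0 else 28.0
    presents.append(pacer.present_time(t))
intervals = [b - a for a, b in zip(presents, presents[1:])]
print(intervals[-4:])  # presentation intervals even out over time
```

Feed this pacer a jittery short/long delivery pattern, and the presentation intervals settle toward the average frame time, which is exactly the trade AMD describes: smoother animation in exchange for a few milliseconds of added lag on some frames.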
Happily, AMD already has a very early driver, which it classifies as a “prototype,” that it shared with us for use in 7990 testing. I should emphasize that this is still a developmental piece of software, and we don’t yet have any firm timetable from AMD about when to expect a final—or even a beta—driver with this feature present. Also, this prototype driver is based on an older version of the Catalyst code, so it doesn’t incorporate the latest optimizations. We had to switch our test system’s OS to Windows 8 in order to test this prototype driver, since it wouldn’t install on Windows 7.
Does it work? Well, have a look.
The multi-GPU jitter pattern is reduced substantially with the frame-pacing prototype. There are still some spikes to around 40 ms in Fraps, but as we noted with the 7970, those aren't anything to be concerned about. The FCAT results still show something of a jitter pattern, but it's largely kept in check. As a result, peak frame times in FCAT are much lower overall.
The latency curve for the prototype driver tells the tale. Although the regular driver’s FCAT line hugs the lower axis for the first seven or eight percent of frames, the new driver virtually eliminates those runts. Better yet, the frame pacing driver achieves lower latencies for roughly 50% of the frames rendered. The 7990’s 99th percentile frame time, as measured by FCAT, drops a full 15 ms, from 49.7 ms to 34.7 ms with the prototype.
I think the difference is fairly easy to perceive on video. The 7990 with the regular Catalyst driver:
And with the frame-pacing prototype:
Much better. Let’s see what this driver does for Crysis 3.
Microstuttering is again vastly reduced. There's still some jitter present, especially in the Fraps results, as we saw with the GTX 690. However, with this driver, the 7990 looks to have less jitter than the GTX 690 does. Both the frame time plots and the latency curve for the 7990 with the prototype look similar to a single-card result, only with substantially lower latencies than the 7970's.
The 7990 with Catalyst 13.5 beta 2:
And with the frame-pacing prototype:
Again, I think we have a visible improvement in the fluidity of the in-game animation with the prototype.
The prototype driver:
The frame-pacing driver isn’t as much of an unequivocal win in our Sleeping Dogs test. The latency curve from FCAT has improved, but not the one from Fraps. I’m not seeing as much difference between the two videos, either.
Unfortunately, frame pacing does not appear to resolve the 7990’s issues with uneven animation in Far Cry 3 or the stuttering problems in BioShock Infinite. Still, these are early days for this still-in-the-works feature, and already, frame pacing appears to be effective in improving the 7990’s behavior in several games. Kudos to AMD’s driver guys for making this happen in software in such a short time window.
I would caution anyone looking to plunk down a grand for this video card not to consider the CrossFire microstuttering a solved problem on the basis of this handful of tests, though. The prototype driver we tested isn’t available to the public and likely won’t be in its current form. AMD still has work to do, and it has evidently been content to sell CrossFire-based solutions to its customers for years without addressing this problem until now.
One more thing. I understand AMD plans on making frame pacing an option in future Catalyst drivers but leaving it disabled by default. The rationale has to do with the fact that frame pacing injects a small amount of lag into the overall chain from user input to visual response. AMD says it wants to avoid adding even a few milliseconds to overall input lag. That seems to me like something of a pose, a way of avoiding the admission that Nvidia’s frame metering technology is preferable to AMD’s original approach. You’ve seen the results in the videos above. I think the vast majority of consumers would prefer to have frame pacing enabled, granting them perceptibly smoother animation. Disabling this feature should be an option reserved for extreme twitch gamers whose reflexes are borderline superhuman. Here’s hoping AMD does the right thing with this option when the time comes to release a public driver with frame pacing.
Oh, right. I also tested power, noise, and temperatures. The 7990’s numbers are new, but the rest of the results have been shamelessly poached from my GTX Titan review. They should suffice for a quick comparison. Have a look.
Noise levels and GPU temperatures
The 7990 comes out looking spiffy across all of these tests. Don’t get too hung up on its slightly higher noise levels at idle and with the display off. The card turns its fans completely off and makes zero noise when the display goes into power-save mode. What you’re seeing there is just a little fluctuation in the noise floor in Damage Labs.
Beyond that, everything about the 7990’s power consumption and acoustics is exemplary for a high-end card, especially the part where it registers lower on the decibel meter under load than the GeForce GTX 690. The 7990 is dissipating an additional 50+ watts of power versus the GTX 690 and is still quieter. This is huge progress for AMD, and it’s only fitting for a thousand-dollar graphics card to have a cooling solution this effective.
AMD has built a mighty fine piece of hardware in the Radeon HD 7990. Consistently high FPS averages attest to its potential as the single most powerful graphics card in the world. At the same time, the 7990 is exceptionally quiet under load. Throw in the fact that it ships with a ridiculous bundle packed with some of the most notable games of the past year, and it’s easy to see how the 7990 could grab the crown in the $1K graphics card market.
However, we’ve deployed some advanced tools and metrics to answer some very practical questions about the benefits of the 7990’s second GPU, and the answers haven’t turned out like one would hope. The card just doesn’t hold up well under the weight of really tough scenarios where smooth gameplay is threatened. The 7990 does perform a little bit better than its single-GPU counterpart, the Radeon HD 7970, in our Crysis 3 test, but not by a broad margin. The 7990 doesn’t offer an appreciable benefit over the 7970 in our Tomb Raider and Sleeping Dogs test scenarios. In each of these cases, the 7990’s FPS averages scale to nearly twice the 7970’s, but the uneven frame delivery caused by multi-GPU microstuttering blunts the impact of those additional frames. Worse still, the 7990 runs into some apparent CrossFire compatibility snafus in Far Cry 3 and BioShock Infinite, both AAA titles that AMD has co-marketed and bundled with the card itself. Yikes. In those two games, you’re literally better off playing with a Radeon HD 7970.
Sure, we’re only talking about five games, tested under specific conditions. But we created these conditions in order to answer a pressing question about the impact of multi-GPU microstuttering. We’ve had the tools to detect its presence for a little while, but does microstuttering really have a negative impact on gameplay? The answer appears to be yes. Also, microstuttering tends to grow worse as frame rates drop, calling into question the true value of multi-GPU schemes like CrossFire and SLI.
Nvidia has mitigated the effects of multi-GPU jitter via its frame metering capability, and that feature appears to work reasonably well most of the time. The GeForce GTX 690 is tangibly superior to the single-GPU GeForce GTX 680 in each of our test scenes, although the difference is pretty minor in Crysis 3. That case is a reminder that frame metering and pacing schemes aren’t perfect. The 690 has near-pristine frame delivery in Crysis 3, but the smoothness of the animation is compromised by a see-saw pattern of frame dispatch, as we measured with Fraps. Fortunately, such early-in-the-pipeline jitter is usually confined to small spans of time—just a handful of milliseconds—on the GTX 690 and similar frame-metered SLI solutions.
AMD is now following suit with the development of a frame-pacing feature in its drivers, and the early returns look promising. If the firm can follow through and deliver a production driver that includes this capability, the 7990 has the potential to become a more appealing product. The thing is, that driver isn’t here yet, and we don’t know when it’s coming. Radeon HD 7990 cards are due to hit store shelves a little more than a week from now. My advice to would-be buyers is to hold out until a final driver with frame pacing has been released, tested thoroughly, and found to be effective. In its current form, there’s no way the 7990 is deserving of its $1K price tag.
Maybe I’ll post this whole review on Twitter, bit by bit.