Here's the deal
Last time out, by using the FCAT tools that measure how frames are delivered to the display, we found some troubling problems with Radeon CrossFire multi-GPU configs.
We've known for a while about multi-GPU microstuttering, a timing problem related to the alternate frame rendering method of load-balancing employed by both CrossFire and SLI. Frames are doled out to one GPU and then the other in interleaved fashion, but sometimes the GPUs can get pretty far out of sync. The result is that frames are dispatched in an uneven manner, introducing a tight pattern of jitter into the animation. Here's an example from my original Inside the second article.
We can detect such problems with software tools like Fraps, which record the moment when the game engine signals to the DirectX API that it has handed off a completed frame for processing. That's relatively early in the frame rendering process, at the orange line marked "Present()" in the simplified diagram below.
We learned using frame capture tools that the microstuttering patterns in CrossFire solutions can become exaggerated by the time frames reach the display. Instead of something like the mild case of jitter in the plot above, the true pattern of frames arriving onscreen could look more like this:
In this example, the "short" frames in the sequence arrive only a fraction of a millisecond behind the "long" frames. With vsync disabled, those short or "runt" frames may only occupy a handful of horizontal lines across the screen, adding virtually no additional visual information to the picture. Here's a zoomed-in example from BF3 with the FCAT overlay on the left showing a different color for each individual frame rendered by the GPU:
Yes, a slice the height of that olive-colored bar is all you see of a fully rendered GPU frame. In other cases, we found that CrossFire simply dropped frames entirely, never showing even a portion of them onscreen. Yet those runt and dropped frames are counted by software benchmark tools as entirely valid, inflating FPS averages and the like. Nvidia's SLI solutions don't have this problem, interestingly enough, because they employ a frame-metering technique to even out the delivery of frames to the display.
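To make the inflation effect concrete, here is a minimal sketch of how a capture-based analysis might discount runt and dropped frames. It is not FCAT's actual implementation: the `Frame` type, the `fps_summary` function, the `RUNT_SCANLINES` cutoff, and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Assumed cutoff: a frame covering fewer scanlines than this counts as a runt.
# The value is illustrative, not taken from the article.
RUNT_SCANLINES = 21


@dataclass
class Frame:
    scanlines: int  # rows of the captured output this frame occupied (0 = dropped)


def fps_summary(frames, duration_s, runt_scanlines=RUNT_SCANLINES):
    """Return (raw_fps, effective_fps) for a capture lasting duration_s seconds.

    raw_fps counts every rendered frame, as a software tool would.
    effective_fps ignores runt and dropped frames.
    """
    total = len(frames)
    useful = sum(1 for f in frames if f.scanlines >= runt_scanlines)
    return total / duration_s, useful / duration_s


# Made-up capture: full-size frames interleaved with runts and one dropped frame.
capture = [Frame(700), Frame(8), Frame(720), Frame(0), Frame(710), Frame(12)]
raw, effective = fps_summary(capture, duration_s=0.125)
print(f"raw: {raw:.0f} FPS, effective: {effective:.0f} FPS")
```

In this contrived sample, half of the frames contribute almost nothing to the picture, so a tool that counts every frame reports twice the rate that actually reaches the viewer.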
All of that seemed like quite an indictment of CrossFire, but we had lingering questions about the practical impact of microstuttering on real-world performance. Does it impact the smoothness of in-game animation in a meaningful way? We couldn't tell conclusively from our first set of test results. In the example from Skyrim shown above, the "long" frame times are so quick—less than 15 milliseconds—that the display would be getting new frames even faster than its typical 60 Hz refresh cycle. In a situation like that, you're getting plenty of new information each time the screen is painted, so there's really not much of a practical issue. We needed more data.
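The reasoning about refresh cycles can be checked with a little arithmetic. The sketch below counts how many frames complete during each refresh interval of a 60 Hz display. The function name `frames_per_refresh` and the alternating 14 ms / 1 ms pattern are invented for illustration and are not measured data.

```python
def frames_per_refresh(frame_times_ms, refresh_hz=60):
    """Count how many frames complete during each display refresh interval."""
    counts = []
    elapsed_ms = 0.0
    for frame_time in frame_times_ms:
        elapsed_ms += frame_time
        # Index of the refresh interval in which this frame completes.
        slot = int(elapsed_ms * refresh_hz // 1000)
        while len(counts) <= slot:
            counts.append(0)
        counts[slot] += 1
    return counts


# Made-up microstutter pattern: 14 ms "long" frames alternating with 1 ms "short" ones.
pattern = [14.0, 1.0] * 20
counts = frames_per_refresh(pattern)
print(counts)
```

Because each long-plus-short pair takes 15 ms, less than the roughly 16.7 ms refresh interval, every refresh in this toy pattern receives at least one new frame despite the jitter.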
So, for this article, we set out to test the Radeon HD 7990 and friends with a very practical question in mind: in a truly performance-constrained scenario, where one GPU struggles to get the job done, does adding a second GPU help? If so, how much does it help?
To answer that question, we had to find scenarios where two of today's top GPUs would struggle to produce smooth animation—and we had to find them within the limits of our FCAT setup, which captures frames from a single monitor at up to 2560x1440 at 60Hz. (In theory, one could use the colored FCAT overlay with a multi-display config, but that complicates things quite a bit.) Fortunately, via a little creative tinkering with image quality settings, we were able to tune up five of the latest, most graphically advanced games to push the limits of these cards. All we need to do now is step through the results from each game and ask our very practical questions about the impact of adding a second GPU to the mix.
Testing with FCAT at 2560x1440 and 60Hz requires capturing uncompressed video at a constant rate of 422 MB/s. Your storage array can't miss a beat, or you'll get dropped frames and invalidate the test run. As before, our solution to this problem was our RAID 0 array of four Corsair Neutron 256GB SSDs, which holds nearly a terabyte of data and writes at nearly a gigabyte per second. This array is held together with my patented hillbilly rigging:
Hey, it works.
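For the curious, the 422 MB/s capture rate quoted above can be reproduced with simple arithmetic under two assumptions that the article does not state: that the capture stores 2 bytes per pixel (as a 4:2:2 chroma-subsampled format would), and that "MB" here means 2^20 bytes. The function name is hypothetical.

```python
def capture_rate_mib_s(width, height, refresh_hz, bytes_per_pixel):
    """Uncompressed video data rate in MiB/s (1 MiB = 2**20 bytes)."""
    return width * height * refresh_hz * bytes_per_pixel / 2**20


# Assumption: 2 bytes per pixel; the article does not give the pixel format.
rate = capture_rate_mib_s(2560, 1440, 60, bytes_per_pixel=2)
print(f"{rate:.3f} MiB/s")
```

Under those assumptions the result rounds to 422, which matches the figure in the text.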
Trouble is, the SSD array offers less than a terabyte of storage, and that just won't do. A single 60-second test session produces a 25GB video. For this article, we planned to test six different configs in five games, with three test sessions per card and game. When the reality of the storage requirements began to dawn on us, we reached out to Western Digital, who kindly agreed to provide the class of storage we needed in the form of two WD Black 7,200-RPM 4TB hard drives.
The Black is the fastest 4TB drive on the market, and thank goodness it exists. We put two of them into a RAID 1 array for additional data integrity, and we were able to store all of our video in one place. We're already contemplating a RAID 10 array with four of these drives in order to improve transfer speeds and total capacity.
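The storage math is easy to sketch. The snippet below multiplies out the test plan described above and compares it with the usable capacity of a few RAID layouts. The helper names are hypothetical, the capacity formulas are the standard ones for RAID 0, 1, and 10, and decimal units are used throughout for simplicity.

```python
def total_capture_gb(configs, games, runs_per_combo, gb_per_run):
    """Total video produced by the full test plan, in GB."""
    return configs * games * runs_per_combo * gb_per_run


def raid_usable_tb(level, drives, drive_tb):
    """Usable capacity in TB for a few common RAID levels."""
    if level == 0:
        return drives * drive_tb          # striping: all capacity is usable
    if level == 1:
        return drive_tb                   # mirroring: one drive's worth
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_tb   # striped mirrors: half the raw capacity
    raise ValueError(f"unsupported RAID level: {level}")


# Figures from the article: six configs, five games, three runs, 25GB per run.
needed_gb = total_capture_gb(6, 5, 3, 25)
print(needed_gb, "GB of video")
print(raid_usable_tb(0, 4, 0.256), "TB usable on the SSD array")
print(raid_usable_tb(1, 2, 4), "TB usable on the RAID 1 pair")
print(raid_usable_tb(10, 4, 4), "TB usable on a four-drive RAID 10")
```

Ninety sessions at 25GB apiece come to more than twice what the SSD array can hold, which is why the mirrored 4TB drives were needed.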
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Our test systems were configured like so:
|Chipset||Intel X79 Express|
|Memory size||16GB (4 DIMMs)|
|Memory type||DDR3 SDRAM at 1600MHz|
|Memory timings||9-9-9-24 1T|
|Chipset drivers||INF update, Rapid Storage Technology Enterprise 126.96.36.1999|
|Audio||Realtek 188.8.131.5262 drivers|
|Hard drive||OCZ Deneva 2 240GB SATA|
|Power supply||Corsair AX850|
|OS||Windows 7 Service Pack 1|

|Card||Driver revision||Base clock (MHz)||Boost clock (MHz)||Memory clock (MHz)||Memory size (MB)|
|GeForce GTX 680||GeForce 314.22 beta||1006||1059||1502||2048|
|GeForce GTX 690||GeForce 314.22 beta||915||1020||1502||2 x 2048|
|GeForce GTX Titan||GeForce 314.22 beta||837||876||1502||6144|
|Radeon HD 7970 GHz||Catalyst 13.5 beta 2||1000||1050||1500||3072|
|Radeon HD 7990||Catalyst 13.5 beta 2||950||1000||1500||2 x 3072|
Thanks to Intel, Corsair, and Gigabyte for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.
Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.
In addition to the games, we used the following test applications:
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.