A testing conundrum
As you might recall, we've been skeptical about the merits of multi-GPU solutions like the GeForce GTX 690 since we published this article last fall. That piece introduced some new ways to think about gaming performance, and the methods we proposed immediately highlighted some problems with SLI and CrossFire.
Multi-GPU schemes generally divide the work by asking a pair of GPUs to render frames in alternating fashion—frame 1 to GPU 0, frame 2 to GPU 1, frame 3 to GPU 0, and so on. The trouble is, the two GPUs aren't always in sync with one another. Instead of producing a series of relatively consistent frame delivery times, a pair of GPUs using alternate frame rendering will sometimes oscillate between low-latency frames and high-latency frames.
To illustrate, we can zoom in on a very small chunk of one of our test runs for this review. First, here's how the frame times look on a single-GPU solution:
Although frame times vary slightly on the single-GPU setup, the differences are pretty small during this short window of time. Meanwhile, look what happens on a CrossFire setup using two of the same GPU:
You can see that alternating pattern, with a short frame time followed by a long one. That's micro-stuttering, and it's a potentially serious performance issue. If you were simply to measure this solution's performance in average frames per second, of course, it would look pretty good. Lots of frames are being produced. However, our sense is that the smoothness of the game's animation will be limited by those longer frame times. In this short window, adding a second GPU appears to cut the longer frame times from about 29 ms to about 23 ms. Although the FPS average might be nearly doubled by the presence of all of those low-latency frames, the real, perceived impact of adding a second card would be much less than a doubling of performance.
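To put rough numbers on that gap, here is a minimal Python sketch. The 29 ms and 23 ms figures come from the window discussed above; the 6 ms short frame is a hypothetical value chosen only to make the arithmetic clean, not a measurement from our test runs.

```python
def average_fps(frame_times_ms):
    """Average frames per second over a run of frame times in milliseconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def longest_frame_ms(frame_times_ms):
    """The longest frame time in the run, which bounds perceived smoothness."""
    return max(frame_times_ms)


# Single GPU: steady frames of about 29 ms each.
single_gpu = [29.0] * 8
# Two GPUs with micro-stuttering: a short frame (hypothetical 6 ms),
# then a long frame of about 23 ms, repeating.
dual_gpu = [6.0, 23.0] * 4

fps_gain = average_fps(dual_gpu) / average_fps(single_gpu)
long_frame_gain = longest_frame_ms(single_gpu) / longest_frame_ms(dual_gpu)

print(f"FPS average improves by {fps_gain:.2f}x")              # 2.00x
print(f"Longest frame time improves by {long_frame_gain:.2f}x")  # 1.26x
```

Under these assumptions, the FPS average doubles while the longer frames improve by only about a quarter, which is the disparity described above.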
This problem affects both SLI and CrossFire, including multi-GPU graphics cards like the GTX 690. How much micro-stuttering you find can vary from one moment to the next. In this example, we can see a little bit of jitter from the GTX 690, but it's fairly minimal.
However, it appears that the degree of jitter tends to grow as multi-GPU solutions become more performance-constrained. That's bad news in our example for the older dual-GPU graphics cards:
Ouch. If this trend holds up, the more you need higher performance from a multi-GPU solution, the less likely it is to deliver. Kind of calls the value proposition into question, eh?
Things get even trickier from here, for several reasons. Both AMD and Nvidia acknowledged the multi-GPU micro-stuttering problem when we asked them about it, but Nvidia's Tom Petersen threw us for a loop by asserting that Nvidia's GPUs have had, since "at least" the G80, a built-in provision called frame metering that attempts to counteract the problem.
The diagram above shows the frame rendering pipeline, from the game engine through to the display. Frame metering attempts to smooth out the delivery of frames by monitoring frame times and, as necessary, adding a slight delay between a couple of points on the timeline above, T_render and T_display. In other words, the GPU may try to dampen the oscillating pattern characteristic of micro-stuttering by delaying the display of completed frames that come "early" in the sequence.
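We don't know the details of Nvidia's implementation, so the following Python sketch is only a toy model of the general idea: hold back frames that finish "early" so display intervals approach the recent average pace. The function name `meter_frames`, the `window` parameter, and the sample timestamps are all our own assumptions for illustration.

```python
def meter_frames(render_done_ms, window=4):
    """Toy frame metering: delay early frames toward the recent average pace.

    render_done_ms: timestamps (ms) at which each frame finished rendering.
    window: how many recent frame intervals to average (an assumed knob).
    Returns the timestamps at which each frame would be displayed.
    """
    display = [render_done_ms[0]]
    for i in range(1, len(render_done_ms)):
        recent = render_done_ms[max(0, i - window):i + 1]
        # Average spacing between recent render completions.
        target = (recent[-1] - recent[0]) / (len(recent) - 1)
        # A frame can never be shown before it is finished, but it can be
        # held until one target interval after the previous displayed frame.
        display.append(max(render_done_ms[i], display[-1] + target))
    return display


def intervals(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


# Hypothetical render completions alternating 6 ms and 23 ms apart.
render_done = [0.0, 6.0, 29.0, 35.0, 58.0, 64.0, 87.0, 93.0]
displayed = meter_frames(render_done)

print(intervals(render_done))
print(intervals(displayed))
```

In this toy run, the displayed intervals settle toward the average pace after a few frames, at the cost of holding some completed frames a little longer before they reach the screen.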
We think frame metering could work, in theory, with a couple of caveats. One obvious trade-off is the slight increase in input lag caused by delaying roughly half of the frames being rendered, although the impact of that should be relatively tiny. The other problem is the actual content of the delayed frames, which is timing-dependent. The question here is how a game engine decides what time is "now." When it dispatches a frame, the game engine will create the content of that image—the underlying geometry and such—based on its sense of time in the game world. If the game engine simply uses the present time, then delaying every other frame via metering will cause visual discontinuities, resulting in animation that is less smooth than it should be. However, Petersen tells us some game engines use a moving average of the last several frame times in order to determine the "current" time for each frame. If so, then it's possible frame metering at the other end of the graphics pipeline could work well.
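As a rough illustration of the moving-average approach Petersen describes, the Python sketch below advances the game clock by the mean of the last several frame times rather than by the most recent one. It is not taken from any particular game engine; the function name `smoothed_steps`, the window size, and the sample frame times are assumptions.

```python
from collections import deque


def smoothed_steps(frame_times_ms, window=4):
    """Per-frame game-clock steps based on a moving average of frame times.

    window is the number of recent frame times averaged (an assumed value).
    """
    recent = deque(maxlen=window)
    steps = []
    for dt in frame_times_ms:
        recent.append(dt)
        steps.append(sum(recent) / len(recent))
    return steps


# Hypothetical frame times showing the micro-stuttering pattern.
frame_times = [6.0, 23.0] * 4

raw_steps = frame_times                   # engine uses the latest frame time
avg_steps = smoothed_steps(frame_times)   # engine uses a moving average

print(raw_steps)
print(avg_steps)
```

With the averaged clock, the content of each frame advances by a nearly constant step once the window fills, so frames metered out at even intervals would show evenly spaced animation. With the raw clock, the content itself jumps in the same short-long pattern as the frame times.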
A further complication: we can't yet measure the impact of frame metering (or, really, of any multi-GPU solution) with any precision. The tool we use to capture our performance data, Fraps, writes a timestamp for each frame at a relatively early point in the pipeline, when the game hands off a frame to the Direct3D software layer (T_ready in the diagram above). A huge portion of the work, both in software and on the GPU, happens after that point.
We're comfortable with using Fraps for single-GPU solutions because it captures frame times at a fixed point in what is essentially a feedback loop. When one frame is dispatched, the system continues through the process and moves on to the next, stopping at the same point in the loop each time to record a timestamp.
That feedback loop loses its integrity when two GPUs handle the work in alternating fashion, and things become particularly tricky with other potential delays in play. Fraps has no way of knowing when a buffer flip has happened at the other end of the pipeline, especially if there's a variable metering wait involved—so frame delivery could be much smoother in reality than it looks in our Fraps data. By the same token, multi-GPU schemes tend to have some additional latency built into them. With alternate frame rendering, for instance, a frame completed on the secondary GPU must be transferred to the primary GPU before it can be displayed. As a result, it's possible that the disparity between frame display times could be much worse than our Fraps data show, as well.
So, what to do if you're us, and you have a multi-GPU video card to review? The best we can say for our Fraps data is that we believe it's accurate for what it measures, the point when the game engine presents a frame to Direct3D, and that we believe the frame times it captures are at least loosely correlated with the actual display times at the other end of the pipeline. We can also say with confidence that any analysis of multi-GPU performance based solely on FPS averages is at least as wrong as what we're about to show you. We had hoped to have some new tools at our disposal for this article, including a high-speed camera we ordered, but the camera didn't arrive in time for this review, unfortunately. We will have to follow up with it at a later date. For now, we'll have to march ahead with some big, hairy caveats attached to all of our performance results. Please keep those caveats in mind as you read the following pages.