Multi-GPU issues: Micro-stuttering, runt frames, and more
So what exactly was going on with those strangely "cloudy" frame time plots for the 7970 CrossFire, especially in Skyrim? Let's look more closely at a small snippet of time from each test run, starting with the Fraps results.
The plots for the two multi-GPU solutions above both show obvious evidence of multi-GPU micro-stuttering, with frame times oscillating in that familiar sawtooth pattern. This timing issue is caused by the preferred method of balancing the load between two GPUs, alternate frame rendering (AFR), in which one chip renders the even-numbered frames and the other renders the odd-numbered frames, in interleaved fashion. When the two GPUs aren't exactly in sync, frame delivery becomes uneven. I have a hard time getting excited about this problem in this particular instance simply because the frame times involved are very short, so the difference between them is small. Still, a multi-GPU config with this sort of jitter is effectively no quicker than the longer frame times in the pattern would indicate. In some cases, with too much jitter, multi-GPU solutions may not be much faster than a single GPU of the same type.
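To put rough numbers on that last point, here is a minimal Python sketch. The 5 ms and 15 ms frame times are made up for illustration and are not taken from our test data, and the function names are ours. It compares the FPS average of a sawtooth pattern with the rate implied by its longer frames alone.

```python
def average_fps(frame_times_ms):
    """FPS average: frames delivered divided by elapsed seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def fps_at_longest_frame(frame_times_ms):
    """Rate implied if every frame took as long as the longest one."""
    return 1000.0 / max(frame_times_ms)


# Hypothetical AFR sawtooth: short and long frames alternating.
sawtooth = [5.0, 15.0] * 50

print(f"FPS average:            {average_fps(sawtooth):.1f}")
print(f"Rate at the long frame: {fps_at_longest_frame(sawtooth):.1f}")
```

With these invented numbers, the FPS average credits the config with the short frames, while the pace the viewer actually experiences is closer to the one set by the long frames.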
So that's that, but now look what happens with the two multi-GPU configs in the FCAT results, where we're measuring frame delivery times, not frame dispatch times.
The jitter pattern is eliminated on the GeForce GTX 680 SLI setup—evidence that Nvidia's frame metering technology for SLI is doing its job. This technology tracks frame delivery times and inserts very small delays as needed in order to ensure even spacing of the frames that are displayed. The FCAT tools give us the ability to confirm that frame metering works as advertised. Remember how I said the FCAT release was a bit of enlightened self-interest? Yeah, here's where the self-interest comes into the picture. Nvidia gets to show off its frame metering tech.
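Nvidia hasn't published the details of its frame metering, so the following is only a toy Python model of the general idea described above, not the actual algorithm. It assumes a fixed target interval and made-up frame completion times, and it holds back any frame that arrives too soon after the previous one.

```python
def meter_frames(ready_times_ms, target_interval_ms):
    """Toy frame pacer: delay a frame until at least target_interval_ms
    has passed since the previously presented frame.

    This is an illustrative assumption, not Nvidia's implementation.
    """
    presented = []
    for ready in ready_times_ms:
        if presented:
            earliest = presented[-1] + target_interval_ms
            presented.append(max(ready, earliest))
        else:
            presented.append(ready)
    return presented


# Hypothetical completion times with a 5 ms / 15 ms sawtooth.
ready = [0.0, 5.0, 20.0, 25.0, 40.0, 45.0]
paced = meter_frames(ready, target_interval_ms=10.0)

intervals = [b - a for a, b in zip(paced, paced[1:])]
print(paced)
print(intervals)
```

In this toy case, every other frame is delayed by 5 ms, and the spacing between presented frames becomes uniform. A real implementation would have to pick its target interval dynamically, which this sketch does not attempt.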
Meanwhile, our FCAT results suggest the jitter on the Radeon HD 7970 CrossFire setup is much more severe than Fraps detected. The shorter frame times in the pattern are literally a fraction of a millisecond, while the longer frame times are effectively twice what an FPS average of this section might suggest. How does a 0.3 millisecond frame look onscreen? Something like this:
In that example, there are portions of five GPU frames onscreen at once, as the overlay indicates. The little aqua and silver snippets are tiny portions of what are, presumably, fully rendered frames from the GPU, but the timing is so off-kilter that only a few scan lines of them are shown onscreen. Here's a close-up of one of these "runt" frames (Nvidia's term for them) causing tearing in BF3.
The runt frame appears to have real content; it just isn't onscreen long enough to add any substantial new information to the picture.
Above is an illustration of another snag we encountered with the 7970 CrossFire config in multiple games. This is a sequence of three video frames from BF3 with the FCAT overlay enabled. The expected color sequence for the overlay here is red, teal, navy, green, and aqua. What you see displayed, though, is red immediately followed by navy and then aqua. The teal and green frames aren't even runts here; they're simply not displayed at all. They're just dropped.
Here's another funky anomaly we encountered intermittently with the CrossFire setup. The FCAT analysis script was reporting two-pixel-tall "frames" that were out of the expected sequence, which was a bit of a puzzle. If you page through the video sequence shown above, everything looks correct at first glance, with the proper sequence of fuchsia, yellow, orange, white, and lime. However, if you zoom in on the top-left corner of the last video frame of the sequence, you'll see this:
That's a two-pixel yellow overlay bar and, to its right, apparently the other content of an out-of-sequence frame. When this happens, the out-of-place imagery always shows up at the top of the screen like this. Based on lots of zooming and squinting, I believe the content of the two scanlines here matches the timing of the yellow-marked GPU frame from the first video frame in the sequence. Somehow, it's "leaking" into the top of this video frame. Not a huge problem, frankly, but it's an apparent bug in CrossFire frame delivery.
So what do we make of the problems of runt and dropped frames? They're troublesome for performance testing, because they get counted by benchmarking tools, helping to raise FPS averages and all the rest, but they have no tangible visual benefit to the end user.
Nvidia's FCAT scripts offer the option of filtering out runt and dropped frames, so that they aren't counted in the final performance results. That seems sensible to me, so long as it's done the right way. The results you've seen from us on the preceding pages were not filtered in this fashion, but we can apply the filters to show you how they affect things. By default, the script's definition of a "runt frame" is one that occupies 20 scan lines or less, or one that comprises less than 25% of the length of the prior frame. I think the 20-scan-line limit may be a reasonable rule of thumb, but I'm dubious about the 25% cutoff. What if the prior frame represented a big spike in frame rendering times?
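The default rule described above can be sketched in a few lines of Python. This is not the FCAT script itself. The function names and the sample scan-line counts are hypothetical, and the sketch assumes "prior frame" means the immediately preceding frame in the capture, measured in scan lines.

```python
def is_runt(scanlines, prior_scanlines, max_lines=20, min_fraction=0.25):
    """Apply the two default thresholds: 20 scan lines or less, or less
    than 25% of the prior frame's length."""
    if scanlines <= max_lines:
        return True
    if prior_scanlines is not None and scanlines < min_fraction * prior_scanlines:
        return True
    return False


def flag_runts(frame_scanlines, max_lines=20, min_fraction=0.25):
    """Return one True/False runt flag per frame, in order."""
    flags = []
    prior = None  # the first frame has no prior frame to compare against
    for lines in frame_scanlines:
        flags.append(is_runt(lines, prior, max_lines, min_fraction))
        prior = lines
    return flags


# Hypothetical scan-line counts, including one big spike (1400 lines).
frames = [500, 12, 1400, 300, 600]
print(flag_runts(frames))
```

The fourth frame in this made-up sequence shows the concern with the 25% cutoff: a 300-line frame is hardly a sliver, yet it gets flagged as a runt only because the frame before it was a spike.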
Fortunately, the filtering rules in the FCAT scripts are easily tweakable, so we can define our own thresholds for these things. I expect you'll see lots of results today and in the coming weeks that accept FCAT's default filtering rules, though, so let's take a look at how they affect some test data. Here are the Fraps and FCAT results for the Radeon HD 7970 CrossFire setup in Skyrim, followed by the filtered version from FCAT.
Filtered in this way, the CrossFire config loses lots of frames from its output. You can imagine what that does to its FPS average:
Interestingly, even the 99th percentile frame time is affected slightly by the removal of so many super-short-time frames, whose presence shifts the cutoff point for 99% of frames rendered.
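Here is a small Python sketch of why the cutoff moves. The frame times are invented, the percentile uses the simple nearest-rank method, and the filter is assumed to drop runt frames from the list outright, which may not match exactly how the FCAT scripts account for them.

```python
def percentile_frame_time(frame_times_ms, pct=99):
    """Nearest-rank percentile: the frame time at or below which
    pct percent of the frames fall."""
    ordered = sorted(frame_times_ms)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


# Hypothetical run: 100 runt frames, 95 ordinary frames, 5 slow frames.
runts = [0.3] * 100
ordinary = [16.0] * 95
slow = [30.0, 32.0, 34.0, 36.0, 38.0]

unfiltered = runts + ordinary + slow
filtered = ordinary + slow  # runts removed

print(percentile_frame_time(unfiltered))
print(percentile_frame_time(filtered))
```

With the runts included, the 1% of frames above the cutoff is two frames out of 200. With them removed, it is one frame out of 100, so the cutoff lands one step further into the slow tail.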
So, yeah, accounting for these frame delivery problems with filtering really alters the relative performance picture. By contrast, the SLI setup is barely touched by the filters in this case. We did see a few runt frames from the SLI rig in both Skyrim and Guild Wars 2, but they never amounted to much.