Multi-GPU micro-stuttering: Real... and really complicated
We didn't set out to hunt down multi-GPU micro-stuttering. We just wanted to try some new methods of measuring performance, but those methods helped us identify an interesting problem. I think that means we're on the right track, but the micro-stuttering issue complicates our task quite a bit.
Naturally, we contacted the major graphics chip vendors to see what they had to say about the issue. Somewhat to our surprise, representatives from both AMD and Nvidia quickly and forthrightly acknowledged that multi-GPU micro-stuttering is a real problem, is what we measured in our frame-time analysis, and is difficult to address. Both companies said they've been studying this problem for some time, too. That's intriguing, because neither firm saw fit to inform potential customers about the issue when introducing its most recent multi-GPU product, say the Radeon HD 6990 or the GeForce GTX 590. Hmm.
AMD's David Nalasco identified micro-stuttering as an issue with the rate at which frames are dispatched to GPUs, and he said the problem is not always an easy one to reproduce. Nalasco noted that jitter can come and go as one plays a game, because the relative timings between frames can vary.
We'd mostly agree with that assessment, but with several caveats based on our admittedly somewhat limited test data. For one, although jitter varies over time, multi-GPU setups that are prone to jitter in a given test scenario tend to return to it throughout each test run and from one run to the next. Second, the degree of jitter appears to be higher for systems that are more performance-constrained. For instance, when tested in the same game at the same settings, the mid-range Radeon HD 6870 CrossFireX config generally showed more frame-to-frame variance than the higher-end Radeon HD 6970 CrossFireX setup. The same is true of the GeForce GTX 560 Ti SLI setup versus dual GTX 580s. If this observation amounts to a trait of multi-GPU systems, it's a negative trait. Multi-GPU rigs would have the most jitter just when low frame times are most threatened. Third, in our test data, multi-GPU configs based on Radeons appear to exhibit somewhat more jitter than those based on GeForces. We can't yet say definitively that those observations will consistently hold true across different workloads, but that's where our data so far point.
|More intriguing is another possibility Nalasco mentioned: a "smarter" version of vsync that presumably controls frame flips with an eye toward ensuring a user perception of fluid motion.|
Nalasco told us there are several ideas for dealing with the jitter problem. As you probably know, vsync, or vertical refresh synchronization, prevents the GPU from flipping to a different source buffer (in order to show a new frame) while the display is being painted. Instead, frame buffer flips are delayed to happen between screen redraws. Many folks prefer to play games with vsync enabled to prevent the tearing artifacts caused by frame buffer flips during display updates. Nalasco noted that enabling vsync could "probably sometimes help" with micro-stuttering. However, we think the precise impact of vsync on jitter is tough to predict; it adds another layer of timing complexity on top of several other such layers. More intriguing is another possibility Nalasco mentioned: a "smarter" version of vsync that presumably controls frame flips with an eye toward ensuring a user perception of fluid motion. We think that approach has potential, but Nalasco was talking only of a future prospect, not a currently implemented technology. He admitted AMD can't say it has "a water-tight solution yet."
Nalasco did say AMD may be paying more attention to these issues going forward because of its focus on exotic multi-GPU configurations like the Dual Graphics feature attached to the Llano APU. Because such configs involve asymmetry between GPUs, they're potentially even more prone to jitter issues than symmetrical CrossFireX or SLI solutions.
Nvidia's Tom Petersen mapped things out for us with the help of a visual aid.
The slide above shows the frame production pipeline, from the game engine through to the display, and it's a useful refresher in the context of this discussion. Things begin with the game engine, which has its own internal timing and tracks a host of variables, from its internal physics simulation to graphics and user input. When a frame is ready for rendering, the graphics engine hands it off to the DirectX API. According to Petersen, it's at this point that Fraps records a timestamp for each frame. Next, DirectX translates high-level API calls and shader programs into lower-level DirectX instructions and sends those to the GPU driver. The graphics driver then compiles DirectX instructions into machine-level instructions for the GPU, and the GPU renders the frame. Finally, the completed frame is displayed onscreen.
Petersen defined several terms to describe the key issues. "Stutter" is variation between the game's internal timing for a frame (t_game) and the time at which the frame is displayed onscreen (t_display). "Lag" is a long delay between the game time and frame time, and "slide show" is a large total time for each frame, where the basic illusion of motion is threatened. These definitions are generally helpful, I think. You'll notice that we've been talking quite a bit about stutter (or jitter) and the slide-show problem (or long frame times) already.
Stutter is, in Petersen's view, "by far the most significant" of these three effects that people perceive in games.
In fact, in a bit of a shocking revelation, Petersen told us Nvidia has "lots of hardware" in its GPUs aimed at trying to fix multi-GPU stuttering. The basic technology, known as frame metering, dynamically tracks the average interval between frames. Those frames that show up "early" are delayed slightly—in other words, the GPU doesn't flip to a new buffer immediately—in order to ensure a more even pace of frames presented for display. The lengths of those delays are adapted depending on the frame rate at any particular time. Petersen told us this frame-metering capability has been present in Nvidia's GPUs since at least the G80 generation, if not earlier. (He offered to find out exactly when it was added, but we haven't heard back yet.)
Poof. Mind blown.
Now, take note of the implications here. Because the metering delay is presumably inserted between T_render and T_display, Fraps would miss it entirely. That means all of our SLI data on the preceding pages might not track with how frames are presented to the user. Rather than perceive an alternating series of long and short frame times, the user would see a more even flow of frames at an average latency between the two.
Frame metering sounds like a pretty cool technology, but there is a trade-off involved. To cushion jitter, Nvidia is increasing the amount of lag in the graphics subsystem as it inserts that delay between the completion of the rendered frame and its exposure to the display. In most cases, we're talking about tens of milliseconds or less; that sort of contribution to lag probably isn't perceptible. Still, this is an interesting and previously hidden trade-off in SLI systems that gamers will want to consider.
|Frame metering sounds like a pretty cool technology, but there is a trade-off involved. To cushion jitter, Nvidia is increasing the amount of lag in the graphics subsystem.|
So long as the lag isn't too great, metering frame output in this fashion has the potential to alleviate perceived jitter. It's not a perfect solution, though. With Fraps, we can measure the differences between presentation times, when frames are presented to the DirectX API. A crucial and related question is how the internal timing of the game engine works. If the game engine generally assumes the same amount of time has passed between one frame and the next, metering should work beautifully. If not, then frame metering is just moving the temporal discontinuity problem around—and potentially making it worse. After all, the frames have important content, reflecting the motion of the underlying geometry in the game world. If the game engine tracks time finely enough, inserting a delay for every other frame would only exacerbate the perceived stuttering. The effect would be strange, like having a video camera that captures frames in an odd sequence, 12--34--56--78, and a projector that displays them in an even 1-2-3-4-5-6-7-8 fashion. Motion would not be smooth.
When we asked Petersen about this issue, he admitted metering might face challenges with different game engines. We asked him to identify a major game engine whose internal timing works well in conjunction with GeForce frame metering, but he wasn't able to provide any specific examples just yet. Still, he asserted that "most games are happy if we present frames uniformly," while acknowledging there's more work to be done. In fact, he said, echoing Nalasco, there is a whole area of study in graphics about making frame delivery uniform.
|Apple's A9 impresses and the Nexus strikes back: The TR Podcast 188||28|
|Microsoft acquires Havok physics engine from Intel||80|
|AMD unleashes mobile Tonga with the FirePro W7170M||13|
|Deals of the week: Crucial's MX200 500GB SSD and more||10|
|Report: TSMC makes around 6 in 10 Apple A9 SoCs||19|
|Mobile Quadros bring Maxwell to 15" and 17" workstations||3|
|Report: Amazon to halt sales of Chromecast and Apple TV||41|
|The Tech Report Podcast is live on Twitch||2|
|A billion Android devices could be vulnerable to Stagefright 2.0 bug||50|