Inside the second: A new look at game benchmarking

Scott Wasson

I suppose it all started with a brief conversation. Last fall, I was having dinner with Ramsom Koay, the PR rep from Thermaltake. He’s an inquisitive guy, and he wanted to know the answer to what seemed like a simple question: why does anyone need a faster video card, so long as a relatively cheap one will produce 30 frames per second? And what’s the deal with more FPS, anyway? Who needs it?

I’m ostensibly the expert in such things, but honestly, I wasn’t prepared for such a question right at that moment. Caught off guard, I took a second to think it through and gave my best answer. I think it was a good one, as these things go, with some talk about avoiding slowdowns and maintaining a consistent illusion of motion. But I realized something jarring as I was giving it—that the results we provide our readers in our video card reviews don’t really address the issues I’d just identified very well.

That thought stuck with me and began, slowly, to grow. I was too busy to do much about it as the review season cranked up, but I did make one simple adjustment to my testing procedures: ticking the checkbox in Fraps—the utility we use to record in-game frame rates—that tells it to log individual frame times to disk. In every video card review that followed, I quietly collected data on how long each frame took to render.

Finally, last week, at the end of a quiet summer, I was able to take some time to slice and dice all of the data I’d collected. What the data showed proved to be really quite enlightening—and perhaps a bit scary, since it threatens to upend some of our conclusions in past reviews. Still, I think the results are very much worth sharing. In fact, they may change the way you think about video game benchmarking.

Why FPS fails
As you no doubt know, nearly all video game benchmarks are based on a single unit of measure, the ubiquitous FPS, or frames per second. FPS is a nice instant summary of performance, expressed in terms that are relatively easy to understand. After all, your average geek tends to know that movies happen at 24 FPS and television at 30 FPS, and any PC gamer who has done any tuning probably has a sense of how different frame rates “feel” in action.

Of course, there are always debates over benchmarking methods, and the usual average FPS score has come under fire repeatedly over the years for being too broad a measure. We’ve been persuaded by those arguments, so for quite a while now, we have provided average and low FPS rates from our benchmarking runs and, when possible, graphs of frame rates over time. We think that information gives folks a better sense of gaming performance than just an average FPS number.

Still, even that approach has some obvious weaknesses. We’ve noticed them at times when results from our Fraps-based testing didn’t seem to square with our seat-of-the-pants experience. The fundamental problem is that, in terms of both computer time and human visual perception, one second is a very long time. Averaging results over a single second can obscure some big and important performance differences between systems.

To illustrate, let’s look at an example. It’s contrived, but it’s based on some real experiences we’ve had in game testing over the years. The charts below show the times required, in milliseconds, to produce a series of frames over a span of one second on two different video cards.

GPU 1 is obviously the faster solution in most respects. Generally, its frame times are in the teens, and that would usually add up to an average of about 60 FPS. GPU 2 is slower, with frame times consistently around 30 milliseconds.

However, GPU 1 has a problem running this game. Let’s say it’s a texture upload problem caused by poor memory management in the video drivers, although it could be just about anything, including a hardware issue. The result of the problem is that GPU 1 gets stuck when attempting to render one of the frames—really stuck, to the tune of a nearly half-second delay. If you were playing a game on this card and ran into this issue, it would be a huge show-stopper. If it happened often, the game would be essentially unplayable.

The end result is that GPU 2 does a much better job of providing a consistent illusion of motion during the period of time in question. Yet look at how these two cards fare when we report these results in FPS:

Whoops. In traditional FPS terms, the performance of these two solutions during our span of time is nearly identical. The numbers tell us there’s virtually no difference between them. Averaging our results over the span of a second has caused us to absorb and obscure a pretty major flaw in GPU 1’s performance.
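To make the arithmetic concrete, here is a minimal sketch of that contrived scenario. The frame times are invented for illustration (they are not the values behind our charts), and the function name is just a placeholder.

```python
# Hypothetical frame times in milliseconds covering roughly one second.
# These numbers are made up to mimic the contrived example in the text.
gpu1_ms = [16.0] * 33 + [470.0]   # mostly quick frames, plus one long stall
gpu2_ms = [30.0] * 33             # slower, but perfectly steady


def frames_per_second(frame_times_ms):
    """Average FPS over the span: frames delivered divided by elapsed seconds."""
    elapsed_s = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / elapsed_s


for name, times in (("GPU 1", gpu1_ms), ("GPU 2", gpu2_ms)):
    fps = frames_per_second(times)
    print(f"{name}: {fps:.1f} FPS, worst frame {max(times):.0f} ms")
```

With these made-up inputs, both cards land at roughly 33 to 34 FPS, even though one of them froze for nearly half a second.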

Let’s say GPU 1 had similar but slightly smaller delays in other places during the full test run, but this one second was still its worst overall. If so, GPU 1’s average frame rate for the whole run could be upwards of 50 FPS, and its minimum frame rate would be 35 FPS—quite decent numbers, according to the conventional wisdom. Yet playing the game on this card might be almost completely unworkable.

If we saw these sorts of delays during our testing for a review, we’d likely have noted the occasional hitches in GPU 1’s performance, but some folks probably would have simply looked at the numbers and flipped to the next game without paying attention to our commentary. (Ahem.)

Frame time (ms)    FPS rate
8.3                120
10                 100
16.7               60
20                 50
25                 40
33                 30
42                 24
50                 20
60                 17
70                 14

By now, I suspect you see where we’re headed. FPS isn’t always bad as a summary of performance, but it has some obvious shortcomings due to the span of time involved. One way to overcome this weakness is to look inside the second, as we have just done, at the time it takes to produce individual frames. Doing so isn’t all that difficult. Heck, game developers have done it for years, tuning against individual frame times and also delving into how much GPU time each API call occupies when producing a single frame.

We will need to orient ourselves to a new way of thinking, though. The table on the right should help. It shows a range of frame times in milliseconds and their corresponding FPS rates, assuming those frame times were to remain constant over the course of a full second. Notice that in the world of individual frame times, lower is better, so a time of 30 ms is more desirable than a time of 60 ms.
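The conversion behind the table is simple division, assuming the frame time holds constant for a full second. A small sketch, with helper names of my own choosing:

```python
def fps_from_frame_time(frame_time_ms):
    """FPS if every frame took exactly frame_time_ms milliseconds."""
    return 1000.0 / frame_time_ms


def frame_time_from_fps(fps):
    """Milliseconds per frame at a perfectly steady frame rate."""
    return 1000.0 / fps


# Reproduce the rows of the table, rounded to whole frames per second.
for ms in (8.3, 10, 16.7, 20, 25, 33, 42, 50, 60, 70):
    print(f"{ms:>5} ms -> {fps_from_frame_time(ms):.0f} FPS")
```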

We’ve included several obvious thresholds on the table, among them the 16.7 ms frame time that corresponds to a steady rate of 60 frames per second. Most LCD monitors these days require input at 60 cycles per second, or 60Hz, so going below the 16.7-ms threshold may be of limited use for some folks.

With that said, I am not a believer in the popular myth that speeds above 60 FPS are pointless. Somehow, folks seem to have conflated the limits of current display technologies (which are fairly low) with the limits of the human visual system (which are much higher). If you don’t believe me, you need only to try this simple test. Put two computers side by side, one with a 60Hz display and the other with a 120Hz display. Go to the Windows desktop and drag a window around the screen on each. Wonder in amazement as the 120Hz display produces an easily observable increase in fluidity in the animation. In twitch games, steady frame rates of 90 FPS or higher are probably helpful to the quickest (and perhaps the youngest) among us.

At the other end of the scale, we have the intriguing question of what sorts of frame times are acceptable before the illusion of motion begins to break down. Movies in the theater are one of the slower examples we have these days, with a steady frame rate of just 24 FPS—or 42 ms per frame. For graphical applications like games that involve interaction, I don’t think we’d want frame times to go much higher than that. I’m mostly just winging it here, but my sense is that a frame time over 50 ms is probably worthy of note as a mark against a gaming system’s performance. Stay above that for long, and your frame rate will drop to 20 FPS or lower—and most folks will probably start questioning whether they need to upgrade their systems.

With those considerations in mind, let’s have a look at some frame time data from an actual game, to see what we can learn.

New methods: some examples
Our first set of example data comes from our GeForce GTX 560 Ti review. We published that review early this year, so the drivers we used in it are rather dated by now, but the results should still help us test out some new methods. We’ll be looking at results from Battlefield: Bad Company 2; the image quality settings we used for testing are shown below.

Interestingly enough, although it’s a boatload of data, we can plot the frame times from each video card fairly easily, just as we have plotted FPS over time in the past. One big difference is that lower frame times are more desirable, so we’ll read these plots differently. For example, the bright green line from the GeForce GTX 570 is the most desirable result of any of the GeForce cards.

As you can see, even though the data are scrunched together pretty tightly, outliers like especially high frame times can be picked out easily. Also, notice that faster cards tend to produce more frames, so there’s more data for the higher-performance solutions.

We can still line up a couple of direct competitors for a head-to-head comparison graph, too. Overall, I think this result illustrates nicely how very closely matched these two video cards are. The GTX 560 Ti is markedly faster only in a short span from frames 2150 to 2250 or so—well, except for that one outlier from the Radeon at around frame 500. Let’s put on our magnification glasses and take a closer look at it.

This real-world result nicely parallels our theoretical example from earlier, although the delay on the Radeon isn’t nearly as long. Obviously, an FPS average won’t catch the difference between the two cards here.

In fact, have a look at the two frame times following the 58 ms delay; they’re very low. That’s likely because the video card is using triple buffering, so the rendering of those two later frames wasn’t blocked by the wait for the one before them. Crazily enough, if you consider just those three frames together, the average frame time is 23 ms. Yet that 58 ms frame happened, and it potentially interrupted the flow of the game.
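As a quick sanity check on that three-frame average, the two figures quoted above (58 ms and 23 ms) pin down how little time the two trailing frames can have taken. The variable names below are mine; only the two input numbers come from the text.

```python
stall_ms = 58.0                 # the long frame quoted in the text
three_frame_average_ms = 23.0   # average across the stall and the two frames after it

# Total time for three frames, minus the stall, leaves the two trailing frames.
trailing_total_ms = 3 * three_frame_average_ms - stall_ms
trailing_each_ms = trailing_total_ms / 2

print(trailing_total_ms)  # 11.0
print(trailing_each_ms)   # 5.5
```

In other words, the two follow-up frames together took only about 11 ms, which is how a 58 ms hitch can vanish into a perfectly respectable average.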

Now, we don’t want to overstate the importance of a single incident like that, but with all of these frame time data at our disposal, we can easily ask whether it’s part of a larger pattern.

We’re counting through all five of our 60-second Fraps sessions for each card here. As you may have inferred by reading the plots at the top of the page, the Radeons aren’t plagued with a terrible problem, but they do run into a minor hiccup about once in each 60-second session—with the notable exception of the Radeon HD 6970. By contrast, the Nvidia GPUs deliver more consistent results. Not even the older GeForce GTX 260 produces a single hitch.

If you’re looking to do some multiplayer gaming where reaction times are paramount, you may want to ensure that your frame times are consistently low. By cranking our threshold down to 20 ms (or the equivalent of 50 FPS), we can separate the silky smooth solutions from the pretenders.

Only two cards, the GeForce GTX 570 and Radeon HD 6970, produce nearly all of their frames in under 20 ms. If you’re an aspiring pro gamer, you’ll need to pick up a relatively fast video card—or just do what they all do anyway: crank the display quality down as much as possible to ensure solid performance.
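The counting itself is about as simple as analysis gets. Here is a minimal sketch; the sample frame times are invented, and the function name is only illustrative.

```python
def count_frames_over(frame_times_ms, threshold_ms):
    """Number of frames that took longer than threshold_ms to produce."""
    return sum(1 for t in frame_times_ms if t > threshold_ms)


# A made-up snippet of frame times in milliseconds, with one obvious hitch.
sample_ms = [14.2, 15.1, 58.0, 5.0, 6.0, 16.3, 21.7, 15.8]

print(count_frames_over(sample_ms, 50))  # 1
print(count_frames_over(sample_ms, 20))  # 2
```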

Counting up frames over a 50-ms or whatever threshold is nice, but it doesn’t really capture everything we’d like. We do want to know about those outliers, but what we really need to understand is how well a video card maintains that steady illusion of motion.

One way to address that question is to rip a page from the world of server benchmarking. In that world, we often measure performance for systems processing lots of transactions. Oftentimes the absolute transaction rate is less important than delivering consistently low transaction latencies. For instance, here is an example where the cheaper Xeons average more requests per second, but the pricey “big iron” Xeons maintain lower response times under load. We can quantify that reality by looking at the 99th percentile response time, which sounds like a lot of big words but is a fundamentally simple concept: for each system, 99% of all requests were processed within X milliseconds. The lower that number is, the quicker the system is overall. (Ruling out that last 1% allows us to filter out any weird outliers.)

Oddly enough, we want to ask the same basic question about gaming performance. We want our systems to ensure consistently low frame times, and doing so is arguably more important than achieving the highest FPS rate.
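For the curious, a 99th percentile can be computed in a few lines. This sketch uses the nearest-rank definition, which is one of several common conventions and not necessarily the one behind our charts; the sample data are invented.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest value such that at least pct percent of values are <= it."""
    if not values:
        raise ValueError("no frame times supplied")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical run: 98 quick frames, one slightly slow frame, one huge stall.
frame_times_ms = [16.0] * 98 + [22.0, 470.0]

print(percentile_nearest_rank(frame_times_ms, 99))   # 22.0
print(percentile_nearest_rank(frame_times_ms, 100))  # 470.0
```

Note how the single 470 ms outlier is filtered out at the 99th percentile, just as described above, while the maximum is dominated by it.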

Happily, in this case, our 99th percentile frame time results pretty closely mirror the average FPS rates. The cards are ranked in the same order, with similar gaps between them. That fact tells us several things. Most notably, the cards with relatively high frame rates are also producing relatively low frame times with some consistency. The reverse is also true: the cards with lower FPS averages are delivering higher frame times. None of the cards are returning a bunch of strangely high or low frame times that would throw off the FPS average. As a result, we can say that these cards are doing a nice job of maintaining the illusion of motion at something close to their advertised FPS averages.

Also, I think this outcome somewhat validates our use of the 99th percentile frame times as a sort of complement to the usual FPS average. If all goes as it should, a video card delivering high frame rates ought to achieve predominantly low frame times, as well. Granted, this is a pretty geeky way to analyze the data, but you’ll see why it matters shortly.

Applying our methods to multi-GPU solutions
Our multi-GPU data comes from our review of the GeForce GTX 590. We’ll start off with results from Bad Company 2 again, this time tested at a higher resolution more appropriate to these higher-end solutions.

Forgive me, but I’m about to throw a load of info at you, visualized in various ways. We’ll begin with frame time plots, as we did on the last page. However, you’ll notice that a number of those plots look kind of strange.

The frame time info for a number of the setups looks more like a cloud than a line, oddly enough. What’s going on? Let’s slide on those magnification glasses again and take a closer look.

Zooming in on the problem

The three single-GPU solutions look fairly normal, but then they’re the ones that don’t look odd in the earlier plots.

A funny thing happens, though, when we get into the multi-GPU results.

The frame times oscillate between relatively long and short delays in an every-other-frame pattern. To one degree or another, all of the multi-GPU solutions follow this template. With apologies for the scrolling involved, let’s illustrate more fully:

Wow. We’ve heard whispers from time to time about micro-stuttering problems with multi-GPU solutions, but here it is in captivity.

To be clear, what we’re seeing here is quite likely an artifact of the way today’s multi-GPU solutions work. Both AMD and Nvidia prefer a technique for divvying up the workload between two GPUs known as alternate frame rendering (AFR). As the name indicates, AFR involves assigning the first GPU in a team to render the even-numbered frames, while the second GPU handles the odd-numbered ones, so frames are produced by the two GPUs in an interleaved fashion. (AFR can also be employed with three or four GPUs, with one frame being assigned to each GPU in sequence.) SLI and CrossFire support other load-balancing methods, such as split-frame rendering, but those methods aren’t favored, because they don’t scale as well in terms of FPS averages.

Although the fundamentals are fairly straightforward, load-balancing between multiple GPUs isn’t a simple job. Graphics workloads vary from frame to frame. Heck, graphics workloads vary down to the block and pixel levels; a frame is essentially a massively parallel problem in itself. The nature of that huge parallel workload can change substantially from one frame to the next, so keeping the timing synchronized between two GPUs doing AFR isn’t easy.
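A toy model may help show why timing matters so much in AFR. The sketch below assumes two GPUs that each take a fixed time per frame and render back to back, with the second GPU starting some offset after the first. That is a big simplification: real workloads vary from frame to frame, and this ignores frame copies and driver behavior entirely. The function and its parameters are hypothetical.

```python
def afr_frame_intervals(render_ms, offset_ms, frames_per_gpu):
    """Gaps between consecutive finished frames from two GPUs doing AFR.

    Toy model: each GPU needs render_ms per frame and works back to back;
    the second GPU starts offset_ms after the first.
    """
    completions = []
    for start_ms in (0.0, offset_ms):
        for k in range(frames_per_gpu):
            completions.append(start_ms + (k + 1) * render_ms)
    completions.sort()
    return [later - earlier for earlier, later in zip(completions, completions[1:])]


# Perfectly staggered GPUs deliver frames at an even pace.
print(afr_frame_intervals(70.0, 35.0, 3))  # [35.0, 35.0, 35.0, 35.0, 35.0]

# Poorly staggered GPUs produce the short/long oscillation.
print(afr_frame_intervals(70.0, 20.0, 3))  # [20.0, 50.0, 20.0, 50.0, 20.0]
```

Both cases deliver the same number of frames in the same total time, so the FPS average is identical; only the spacing differs.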

Then, because only the primary video card is connected to the display, the frames rendered by the second GPU in an AFR team must be copied over to the primary GPU for output. Those frame data are typically transferred between the GPUs via the proprietary SLI and CrossFire interfaces built into high-end video cards. Dual-GPU cards like the Radeon HD 6990 and GeForce GTX 590 include an onboard version of that same interface. (Some low-end multi-GPU solutions do without a custom data channel and transfer frame data via PCIe.) Copying those frames from the secondary GPU to the primary takes time. In addition to completed frames, the contents of other buffers must oftentimes be transferred between GPUs, especially when advanced rendering techniques (like render-to-texture) are in use. Such information is usually shared over PCIe, as I understand it, which has its own transfer latencies.

The charts above demonstrate that the synchronization for some of the multi-GPU solutions is far from perfect. This revelation opens up a huge, messy set of issues that will take some time to cover in full. For now, though, we can start by identifying several basic problems raised by the jitter inherent in multi-GPU systems.

Obviously, one implication is that using an FPS average to summarize multi-GPU performance is perilous. In cases of extreme jitter, an FPS average may give too much credit to a solution that’s suffering from relatively high latencies for every other frame. My sense is those longer frame times in the pattern would be the gating factor for the illusion of motion. That is, a solution producing frame times of roughly 20 ms and 50 ms in interleaved fashion would provide no better illusion of motion than a system with constant 50-ms frame times—at best.
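Running the numbers on that 20 ms and 50 ms example shows how much credit the average hands out. This is just a sketch of the arithmetic, with list and function names of my own.

```python
jittery_ms = [20.0, 50.0] * 10   # interleaved short and long frames
steady_ms = [50.0] * 20          # constant 50 ms frames


def average_fps(frame_times_ms):
    """Frames delivered per second of elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


print(round(average_fps(jittery_ms), 1))   # 28.6
print(round(average_fps(steady_ms), 1))    # 20.0
print(max(jittery_ms) == max(steady_ms))   # True
```

The jittery pattern scores about 43% higher in FPS terms, yet its longest frame times are exactly the same as the steady system's.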

The reality is probably somewhat worse than that, for reasons we’ve already discussed. The human visual system is very good at picking up visual irregularities, and too much jitter in the flow of frames could easily cause a form of visual interference that would be rather distracting or annoying. I can’t say I’ve personally experienced this sensation in any acute way—and I do play games on our multi-GPU test rigs with some regularity—but I know some folks with multi-GPU systems have complained about micro-stuttering sullying their gaming sessions. (I am one of those folks who can see the rainbow effect in older DLP displays, so I know such things can be disruptive.)

That’s just the tip of the iceberg, too. The complexity of this problem is even deeper than our frame time data alone can indicate. We’ll address more of the issues shortly, but first, I still think a closer look at our frame time data can prove instructive.

Slicing and dicing the numbers

Here’s one reason why I wanted to press on with our look at frame time data. I believe the issue we’re seeing here is largely independent of the micro-stuttering problem. If you look back at the individual frame time plots over the span of a whole test run, you can see the trend clearly. The multi-GPU solutions just tend to run into especially long frame latencies more often than the single-GPU offerings. There is some overhead associated with keeping two GPUs fed and synchronized in the same system, and it seems to lead to occasional trouble—in the form of frame times over our 50-ms threshold.

Having said that, if we lower the threshold to 20 ms, the multi-GPU solutions still look pretty good—especially the SLI pairs. That’s true in spite of the jitter we know to be present. Micro-stuttering may cause other forms of visual pain, but the raw performance implications of it in terms of frame latency aren’t always devastating, so long as the higher frame times in that oscillating pattern don’t grow too large. Multi-GPU systems can still outperform a comparable single-GPU setup in an impactful way, in many cases.

Here’s where I think my crazy metric imported from the world of server benchmarking begins to pay off. By asking how quickly the vast majority of frames were returned, we can sort the different GPU solutions in an order of merit that means something important, and we avoid the average-inflating effect of those low-latency frames in those multi-GPU jitter patterns.

Notice that with jitter present, the 99th percentile frame times now no longer neatly mirror the FPS averages. With our more latency-sensitive percentile metric, some of the rankings have changed. The Radeon HD 6970 CrossFireX config drops from second place in FPS to fourth place in 99th percentile latency, and the GeForce GTX 580 rises from ninth to sixth. Most dramatically, the Radeon HD 6870 CrossFireX team drops from a relatively strong FPS finish, well above the GeForce GTX 580, to last place, virtually tied with a single GeForce GTX 570. The 6870 CrossFireX team is paying the penalty for relatively high latency on every other frame caused by its fairly pronounced jitter.

I’m not sure our 99th-percentile rankings are the best way to quantify video card performance, but I am persuaded that, as an indicator of consistently low frame times, they are superior to traditional FPS averages. In other words, I think this metric is potentially more relevant to gamers, even if it is kind of eggheaded.

We should also pause to note that all of these solutions perform pretty well in this test. The worst 99th-percentile frame times are just over 25 ms, which translates to 40 FPS at a steady rate. A video card slinging out frames every 25 ms is no bad thing.

To give you a sense of how this crazy percentile-based method filters for jitter, allow me to pull out a breathtakingly indulgent bar chart with a range of different percentiles on it, much like we’d use for servers. I’m not proposing that we always produce these sorts of monsters in future GPU reviews, but it is somewhat educational.

You can see how some of the multi-GPU solutions look really good at the 50th percentile. They’re churning out half of their frames at really low latencies. However, as we ramp up the percentage of frames in our tally, we capture more of those long-latency frames that constitute the other half of the oscillating pattern. The frame latencies for those multi-GPU solutions rise dramatically, even at the 66th percentile.

Heck, we could probably even create a “jitter index” by looking at the difference between the 50th and 99th percentile frame times.
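Here is a rough sketch of what such an index might look like. It is only an illustration of the idea floated above, not a metric we have adopted; the percentile helper uses the nearest-rank convention, and the frame times are invented.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct percent of the data."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def jitter_index(frame_times_ms):
    """Hypothetical index: gap between the 99th and 50th percentile frame times."""
    return percentile(frame_times_ms, 99) - percentile(frame_times_ms, 50)


# Two made-up configs with the same 21 ms average frame time.
oscillating_ms = [12.0, 30.0] * 50   # every-other-frame pattern
steady_ms = [21.0] * 100

for pct in (50, 66, 99):
    print(pct, percentile(oscillating_ms, pct), percentile(steady_ms, pct))

print(jitter_index(oscillating_ms))  # 18.0
print(jitter_index(steady_ms))       # 0.0
```

The oscillating config looks great at the 50th percentile and then jumps by the 66th, while the steady one never moves.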

Multi-GPU solutions with Bulletstorm
We still need to address some additional issues related to multi-GPU stuttering, but before we do so, I’d like to get the results from a couple more games under our belts.

Next up is Bulletstorm, which we tested in a different fashion than Bad Company 2. Rather than attempting to duplicate all of the exact same motions in each test run, we simply played the same level of this game through five times, in 60-second sessions. That means we have more variance in our results for this game, but with five runs’ worth of data, we should be able to make some assessments anyhow.

Here’s a look at frame time data from a single test run for each config.

You’ll notice that in this game, most of the multi-GPU solutions’ results look more like lines than clouds. The biggest exception is the Radeon HD 6870 CrossFireX setup, which still appears to exhibit quite a bit of jitter.

The other big difference here is that there are quite a few more spikes up to 40 ms or more during the test runs. The worst offender is the GeForce GTX 560 Ti SLI config, which performs poorly throughout. Our guess is that these 1GB cards are running low on total memory, which is why they struggle so mightily. Yes, the 6870 cards also have 1GB, but AMD has long been better at avoiding memory size constraints.

Get your mouse wheel ready, and we’ll zoom in on a snippet of this test run from each config.

For whatever reason, multi-GPU jitter is much less pronounced here. In fact, it’s nearly banished entirely in the case of the GeForce GTX 580 SLI and Radeon HD 6970 CrossFireX configs. Remember, this is just a small selection of frames from a single test run, so it’s not the whole story. Still, from looking at the full data set, my sense is that these samples are fairly representative of the amount of jitter throughout each run and from one run to the next. Of course, frame times will vary depending on what’s happening in the game.

Slicing and dicing: Bulletstorm

The GeForce GTX 560 Ti SLI setup is a mess, with over 500 frames north of 50 ms; we’ve excluded it from the chart above so it doesn’t throw off the scale. Beyond that, the multi-GPU solutions do a reasonably decent job of avoiding the longest frame times, generally speaking. The obvious exception is the Radeon HD 6870 CrossFireX config, which may be feeling a bit of memory size pressure, though not as acutely as the GTX 560 Ti SLI pair.

Linger a little longer on these numbers, and you’ll see that the multi-GPU solutions still face some minor challenges. The GTX 570 SLI setup, for example, has the same number of over-50-ms frames as a single card. Upgrading to dual cards gets you squat for avoidance of those long-latency frames, it would seem. Also, the Radeon HD 6990 and the 6970 CrossFireX team both match a single GeForce GTX 580, although both Radeon configs are ostensibly faster.

I really don’t like the results in the chart above, but I’ve included them in order to demonstrate a potential problem with our frame time count. Look at how the GeForce GTX 580 produces substantially more frames above 20 ms than the GTX 570. That’s because nearly every frame produced by both cards is over 20 ms. The GTX 580’s count is higher because it’s faster and therefore produces more total frames during our test period. We really shouldn’t let that fact count against it.

The lesson: you have to be careful where you set your thresholds when you’re doing this sort of a count. I’m comfortable our 50-ms threshold will avoid such problems in the vast majority of cases, so long as we test games at reasonably playable quality settings. The 20-ms threshold, however, is potentially problematic.
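The pitfall is easy to reproduce with made-up numbers. In the sketch below, two hypothetical cards run for 60 seconds at constant frame times just above the 20-ms threshold. Reporting the share of frames over the threshold alongside the raw count is my own illustration of one way to spot the problem, not something we did in the charts.

```python
def frames_in_run(frame_time_ms, run_seconds=60):
    """Whole frames completed in the run at a constant frame time."""
    return int(run_seconds * 1000 // frame_time_ms)


def over_threshold_stats(frame_times_ms, threshold_ms):
    """Return (count, fraction) of frames slower than threshold_ms."""
    over = sum(1 for t in frame_times_ms if t > threshold_ms)
    return over, over / len(frame_times_ms)


# Hypothetical cards: the faster one produces more frames in the same minute.
faster_ms = [22.0] * frames_in_run(22.0)
slower_ms = [25.0] * frames_in_run(25.0)

print(over_threshold_stats(faster_ms, 20))  # (2727, 1.0)
print(over_threshold_stats(slower_ms, 20))  # (2400, 1.0)
```

The faster card racks up the bigger count purely because it delivers more frames, while the fraction shows that every frame from both cards is over the line.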

Without much jitter in the picture, our 99th-percentile rankings track pretty closely with the FPS averages for Bulletstorm. The big loser here is the Radeon HD 6870 CrossFireX rig, whose 55 FPS average is a mirage; its latency picture is no better than a single 6970’s. I’d say the 99th-percentile result for the GTX 560 Ti SLI better captures how much of a basket case that config is, too.

You can see the big gap between the 50th and 99th percentile frame times for the 6870 CrossFireX setup here. With the exception of the troubled GTX 560 Ti SLI, the rest of the multi-GPU solutions don’t look to have much jitter, just as our short samples had indicated.

Multi-GPU solutions with StarCraft II
SC2 is a little different than the other games. We recorded frame times with Fraps, but we played back a pre-recorded demo of a multiplayer match on each card rather than playing ourselves. We captured 33 minutes of play time for each demo, so we didn’t bother with five runs per card. Since 33 minutes of frame time data is nearly impossible to reproduce on a graph, we’ve included just 6500 frames in the plots below. Some cards produced as many as 140,000 frames, so be aware that the plots below show only a small snippet of the total data.

Well, that was unexpected. Look at the results for the fastest single-GPU cards, like the Radeon HD 6970 and the GeForce GTX 580. Why do those appear to have a bunch of jitter in them? Hmm.

Yep, both the GTX 580 and 6970 have quite a bit of variance over the span of three to four frames. The stair-step pattern on the GTX 580 is very regular. The pattern on the 6970 is a little different, but similar. I’m hesitant even to hazard a guess about what’s going on here. It could have something to do with triple-buffering, the way the game engine’s main timing loop works, or some interaction between the two. I suppose this is evidence that delivering consistent frame times isn’t just a challenge for SLI and CrossFireX setups.

The SLI configs don’t appear to be affected by the same brand of variance we saw in the single-GPU GTX 580 results. Instead, they show a small to moderate amount of garden-variety multi-GPU jitter. The CrossFireX configs, though, have that same regular three- to four-frame bounce we saw from a single 6970. I wish I could tell you more about what’s happening here, but it’s tough to say.

Slicing and dicing: StarCraft II

The GeForces are the champs at avoiding long frame latencies in SC2, in relative terms. However, we’re still seeing hundreds of frames over 50 ms during our 33-minute test session, even from the fastest solutions. There may be a CPU- or system-level performance limitation coming into play here. Still, a faster graphics subsystem will help quite a bit in avoiding slowdowns.

Even with all of the funkiness going on with frame time variance for some of the single- and multi-GPU solutions, our 99th-percentile frame latency results track pretty closely with the FPS averages. Several corrections come with the change of focus, though. The latency-sensitive metric drops the Radeon HD 6970 to the bottom of the heap, for one. Also, in spite of its funky stair-step pattern, the GTX 580 rises in the rankings, moving ahead of even the Radeon HD 6990, whose own funny pattern of variance includes more frames at higher latencies. Finally, the GTX 580 SLI rig retains its overall performance lead, but its margin of victory over the rest of the field narrows substantially.

Multi-GPU micro-stuttering: Real… and really complicated
We didn’t set out to hunt down multi-GPU micro-stuttering. We just wanted to try some new methods of measuring performance, but those methods helped us identify an interesting problem. I think that means we’re on the right track, but the micro-stuttering issue complicates our task quite a bit.

Naturally, we contacted the major graphics chip vendors to see what they had to say about the issue. Somewhat to our surprise, representatives from both AMD and Nvidia quickly and forthrightly acknowledged that multi-GPU micro-stuttering is a real problem, is what we measured in our frame-time analysis, and is difficult to address. Both companies said they’ve been studying this problem for some time, too. That’s intriguing, because neither firm saw fit to inform potential customers about the issue when introducing its most recent multi-GPU product, say the Radeon HD 6990 or the GeForce GTX 590. Hmm.

AMD’s David Nalasco identified micro-stuttering as an issue with the rate at which frames are dispatched to GPUs, and he said the problem is not always an easy one to reproduce. Nalasco noted that jitter can come and go as one plays a game, because the relative timings between frames can vary.

We’d mostly agree with that assessment, but with several caveats based on our admittedly somewhat limited test data. For one, although jitter varies over time, multi-GPU setups that are prone to jitter in a given test scenario tend to return to it throughout each test run and from one run to the next. Second, the degree of jitter appears to be higher for systems that are more performance-constrained. For instance, when tested in the same game at the same settings, the mid-range Radeon HD 6870 CrossFireX config generally showed more frame-to-frame variance than the higher-end Radeon HD 6970 CrossFireX setup. The same is true of the GeForce GTX 560 Ti SLI setup versus dual GTX 580s. If this observation amounts to a trait of multi-GPU systems, it’s a negative trait. Multi-GPU rigs would have the most jitter just when low frame times are most threatened. Third, in our test data, multi-GPU configs based on Radeons appear to exhibit somewhat more jitter than those based on GeForces. We can’t yet say definitively that those observations will consistently hold true across different workloads, but that’s where our data so far point.


Nalasco told us there are several ideas for dealing with the jitter problem. As you probably know, vsync, or vertical refresh synchronization, prevents the GPU from flipping to a different source buffer (in order to show a new frame) while the display is being painted. Instead, frame buffer flips are delayed to happen between screen redraws. Many folks prefer to play games with vsync enabled to prevent the tearing artifacts caused by frame buffer flips during display updates. Nalasco noted that enabling vsync could “probably sometimes help” with micro-stuttering. However, we think the precise impact of vsync on jitter is tough to predict; it adds another layer of timing complexity on top of several other such layers. More intriguing is another possibility Nalasco mentioned: a “smarter” version of vsync that presumably controls frame flips with an eye toward ensuring a user perception of fluid motion. We think that approach has potential, but Nalasco was talking only of a future prospect, not a currently implemented technology. He admitted AMD can’t say it has “a water-tight solution yet.”
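A toy model helps show why the effect of vsync on jitter is hard to call in advance. The sketch below assumes a 60Hz display (a refresh every 16,667 microseconds), assumes each finished frame flips at the first refresh boundary at or after its completion, and assumes no two frames share a boundary. It ignores buffering limits and any back-pressure on the game engine, so treat it as an illustration only. The function names and timings are hypothetical.

```python
def vsync_flip_times(done_us, refresh_us=16_667):
    """Toy model: a frame flips at the first refresh boundary at or after it
    finishes rendering, and never on the same boundary as the prior frame."""
    flips = []
    for t in done_us:
        boundary = -(-t // refresh_us) * refresh_us  # integer ceiling
        if flips and boundary <= flips[-1]:
            boundary = flips[-1] + refresh_us
        flips.append(boundary)
    return flips


def intervals(times):
    """Gaps between consecutive timestamps."""
    return [b - a for a, b in zip(times, times[1:])]


# Hypothetical render-completion times in microseconds, not measured data.
fast_pair = [6_000, 26_000, 32_000, 52_000, 58_000, 78_000]    # 6/20 ms pattern
slow_pair = [10_000, 40_000, 50_000, 80_000, 90_000, 120_000]  # 10/30 ms pattern

print(intervals(vsync_flip_times(fast_pair)))
print(intervals(vsync_flip_times(slow_pair)))
```

Under these assumptions, the 6/20 ms pattern comes out as a perfectly even series of refresh-length intervals, while the 10/30 ms pattern still alternates between one and two refresh periods, only in a different rhythm than before. Whether vsync helps depends on how the frame timings line up with the refresh cycle.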

Nalasco did say AMD may be paying more attention to these issues going forward because of its focus on exotic multi-GPU configurations like the Dual Graphics feature attached to the Llano APU. Because such configs involve asymmetry between GPUs, they’re potentially even more prone to jitter issues than symmetrical CrossFireX or SLI solutions.

Nvidia’s Tom Petersen mapped things out for us with the help of a visual aid.

The slide above shows the frame production pipeline, from the game engine through to the display, and it’s a useful refresher in the context of this discussion. Things begin with the game engine, which has its own internal timing and tracks a host of variables, from its internal physics simulation to graphics and user input. When a frame is ready for rendering, the graphics engine hands it off to the DirectX API. According to Petersen, it’s at this point that Fraps records a timestamp for each frame. Next, DirectX translates high-level API calls and shader programs into lower-level DirectX instructions and sends those to the GPU driver. The graphics driver then compiles DirectX instructions into machine-level instructions for the GPU, and the GPU renders the frame. Finally, the completed frame is displayed onscreen.

Petersen defined several terms to describe the key issues. “Stutter” is variation between the game’s internal timing for a frame (t_game) and the time at which the frame is displayed onscreen (t_display). “Lag” is a long delay between the game time and the display time, and “slide show” is a large total time for each frame, where the basic illusion of motion is threatened. These definitions are generally helpful, I think. You’ll notice that we’ve been talking quite a bit about stutter (or jitter) and the slide-show problem (or long frame times) already.

Stutter is, in Petersen’s view, “by far the most significant” of these three effects that people perceive in games.
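Petersen's three terms can be expressed as simple statistics over per-frame timestamps. The sketch below is our own reading of his definitions, not anything Nvidia supplied: we take lag as the average of t_display minus t_game, stutter as the spread (population standard deviation) of that same offset, and the slide-show risk as the longest gap between displayed frames. The function name and the sample timestamps are hypothetical.

```python
from statistics import mean, pstdev


def describe_frames(t_game_ms, t_display_ms):
    """Summarize lag, stutter, and worst display interval for a frame series."""
    offsets = [d - g for g, d in zip(t_game_ms, t_display_ms)]
    display_gaps = [b - a for a, b in zip(t_display_ms, t_display_ms[1:])]
    return {
        "lag_ms": mean(offsets),           # typical game-to-display delay
        "stutter_ms": pstdev(offsets),     # how much that delay varies
        "worst_gap_ms": max(display_gaps), # longest wait between frames
    }


# Hypothetical timestamps in milliseconds, not measured data.
t_game = [0.0, 20.0, 40.0, 60.0]
t_display = [30.0, 40.0, 70.0, 80.0]

print(describe_frames(t_game, t_display))
```

Here the game advances in even 20 ms steps, but the frames reach the screen 30 ms or 20 ms late in alternation. The average delay is the lag, and the variation in that delay is the stutter.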

In fact, in a bit of a shocking revelation, Petersen told us Nvidia has “lots of hardware” in its GPUs aimed at trying to fix multi-GPU stuttering. The basic technology, known as frame metering, dynamically tracks the average interval between frames. Those frames that show up “early” are delayed slightly—in other words, the GPU doesn’t flip to a new buffer immediately—in order to ensure a more even pace of frames presented for display. The lengths of those delays are adapted depending on the frame rate at any particular time. Petersen told us this frame-metering capability has been present in Nvidia’s GPUs since at least the G80 generation, if not earlier. (He offered to find out exactly when it was added, but we haven’t heard back yet.)

Poof. Mind blown.

Now, take note of the implications here. Because the metering delay is presumably inserted between t_render and t_display, Fraps would miss it entirely. That means all of our SLI data on the preceding pages might not track with how frames are presented to the user. Rather than perceive an alternating series of long and short frame times, the user would see a more even flow of frames at an average latency between the two.

Frame metering sounds like a pretty cool technology, but there is a trade-off involved. To cushion jitter, Nvidia is increasing the amount of lag in the graphics subsystem as it inserts that delay between the completion of the rendered frame and its exposure to the display. In most cases, we’re talking about tens of milliseconds or less; that sort of contribution to lag probably isn’t perceptible. Still, this is an interesting and previously hidden trade-off in SLI systems that gamers will want to consider.
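To illustrate the trade-off, here is a rough sketch of metering as Petersen described it: track the average interval between frames and hold back any frame that shows up early. Nvidia has not shared its actual algorithm with us, so the details here are assumptions on our part, including the two-frame averaging window (one interval per GPU in a dual-GPU setup), the function name, and the sample timings.

```python
from collections import deque


def meter_frames(ready_ms, window=2):
    """Return presentation times for frames that finish rendering at ready_ms.

    A frame is presented no earlier than the previous presentation time plus
    the average of the last `window` intervals between finished frames.
    """
    presented = [ready_ms[0]]
    recent = deque(maxlen=window)
    for prev_ready, ready in zip(ready_ms, ready_ms[1:]):
        recent.append(ready - prev_ready)
        target = presented[-1] + sum(recent) / len(recent)
        presented.append(max(ready, target))  # delay early frames only
    return presented


# Hypothetical completion times: intervals alternate between 20 ms and 6 ms.
ready = [0.0, 20.0, 26.0, 46.0, 52.0, 72.0, 78.0]
shown = meter_frames(ready)

print("presented at:", shown)
print("added lag:   ", [s - r for s, r in zip(shown, ready)])
```

With these made-up numbers, every frame after the first one arrives onscreen 13 ms after its predecessor, but every second frame is held for 7 ms to make that happen. A tool that timestamps frames before this stage would still record the uneven 20/6 ms pattern.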


So long as the lag isn’t too great, metering frame output in this fashion has the potential to alleviate perceived jitter. It’s not a perfect solution, though. With Fraps, we can measure the differences between presentation times, when frames are presented to the DirectX API. A crucial and related question is how the internal timing of the game engine works. If the game engine generally assumes the same amount of time has passed between one frame and the next, metering should work beautifully. If not, then frame metering is just moving the temporal discontinuity problem around—and potentially making it worse. After all, the frames have important content, reflecting the motion of the underlying geometry in the game world. If the game engine tracks time finely enough, inserting a delay for every other frame would only exacerbate the perceived stuttering. The effect would be strange, like having a video camera that captures frames in an odd sequence, 12–34–56–78, and a projector that displays them in an even 1-2-3-4-5-6-7-8 fashion. Motion would not be smooth.
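The camera-and-projector analogy can be worked through with a little arithmetic. The sketch below imagines an object moving at a constant one pixel per millisecond of game time and asks how fast it appears to move onscreen, given when each frame's content was sampled and when it was shown. Everything here is a hypothetical illustration of the reasoning above: the function name, the speed, and both timestamp series are our own assumptions.

```python
def apparent_speeds(t_game_ms, t_display_ms, speed_px_per_ms=1.0):
    """Onscreen speed between consecutive frames, in pixels per millisecond."""
    positions = [speed_px_per_ms * t for t in t_game_ms]
    speeds = []
    for i in range(1, len(positions)):
        moved = positions[i] - positions[i - 1]
        elapsed = t_display_ms[i] - t_display_ms[i - 1]
        speeds.append(moved / elapsed)
    return speeds


# Hypothetical timestamps in milliseconds.
uneven = [0.0, 20.0, 26.0, 46.0, 52.0]  # alternating 20 ms and 6 ms steps
even = [0.0, 13.0, 26.0, 39.0, 52.0]    # uniform 13 ms steps

# Engine assumes uniform time steps, display is metered to be even.
print(apparent_speeds(even, even))
# Engine tracks time finely (uneven steps), display is metered to be even.
print(apparent_speeds(uneven, even))
```

In the first case the object moves at a constant apparent speed, which is metering working as intended. In the second, the object lurches ahead and then crawls in alternation even though the frames arrive at a perfectly even pace, because the content of each frame no longer matches the moment it is shown.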

When we asked Petersen about this issue, he admitted metering might face challenges with different game engines. We asked him to identify a major game engine whose internal timing works well in conjunction with GeForce frame metering, but he wasn’t able to provide any specific examples just yet. Still, he asserted that “most games are happy if we present frames uniformly,” while acknowledging there’s more work to be done. In fact, he said, echoing Nalasco, there is a whole area of study in graphics about making frame delivery uniform.

So... what now?
We have several takeaways after considering our test data and talking with Nalasco and Petersen about these issues. One of the big ones: ensuring frame-rate smoothness is a new frontier in GPU “performance” that’s only partially related to raw rendering speeds. Multi-GPU solutions are challenged on this front, but single-GPU graphics cards aren’t entirely in the clear, either. New technologies and clever algorithms may be needed in order to conquer this next frontier. GPU makers have some work to do, especially if they wish to continue selling multi-GPU cards and configs as premium products.

Meanwhile, we have more work ahead of us in considering the impact of jitter on how we test and evaluate graphics hardware. Fundamentally, we need to measure what’s being presented to the user via the display. We may have options there. Petersen told us Nvidia is considering creating an API that would permit third-party applications like Fraps to read display times from the GPU. We hope they do, and we’ll lobby AMD to provide the same sort of hooks in its graphics drivers. Beyond that, high-speed cameras might prove useful in measuring what’s happening onscreen with some precision. (Ahem. There’s a statement that just cost us thousands of dollars and countless hours of work.)

Ultimately, though, the user experience should be paramount in any assessment of graphics solutions. For example, we still need to get a good read on a basic question: how much of a problem is micro-stuttering, really? (I’m thinking of the visual discontinuities caused by jitter, not the potential for longer frame times, which are easier to pinpoint.) The answer depends very much on user perception, and user perception will depend on the person involved, on his monitor type, and on the degree of the problem.


Presumably, a jitter pattern alternating between five- and 15-millisecond frame times would be less of an annoyance than a 15- and 45-millisecond pattern. The worst example we saw in our testing alternated between roughly six and 20 milliseconds, but it didn’t jump out at me as a problem during our original testing. Just now, I fired up Bad Company 2 on a pair of Radeon HD 6870s with the latest Catalyst 11.8 drivers. Fraps measures the same degree of jitter we saw initially, but try as I might, I can’t see the problem. We may need to spend more time with (ugh) faster TN panels, rather than our prettier and slower IPS displays, in order to get a better feel for the stuttering issue.

At the same time, we’re very interested in getting reader feedback on this matter. If you have a multi-GPU setup, have you run into micro-stuttering problems? If so, how often do you see them, and how perceptible are they? Please let us know in the comments.

Although they’ve been a little bit overshadowed by the issues they’ve helped uncover, we’re also cautiously optimistic about our proposed methods for measuring GPU performance in terms of frame times. Yes, Nvidia’s frame metering technology complicates our use of Fraps data with SLI setups. But for single-GPU solutions, at least, we think our new methods, with their focus on frame latencies, offer some potentially valuable insights into real-world performance that traditional FPS measurements tend to miss. We’ll probably have to change the way we review GPUs in the future as a result. These methods may be helpful in measuring CPU performance, as well. Again, we’re curious to get some reader feedback about which measures make sense to use and how they might fit alongside more traditional FPS averages. Our sense is that once you’ve gone inside the second, it may be difficult to look at things the same way once you zoom back out again.
