Inside the second: A new look at game benchmarking

I suppose it all started with a brief conversation. Last fall, I was having dinner with Ramsom Koay, the PR rep from Thermaltake. He’s an inquisitive guy, and he wanted to know the answer to what seemed like a simple question: why does anyone need a faster video card, so long as a relatively cheap one will produce 30 frames per second? And what’s the deal with more FPS, anyway? Who needs it?

I’m ostensibly the expert in such things, but honestly, I wasn’t prepared for such a question right at that moment. Caught off guard, I took a second to think it through and gave my best answer. I think it was a good one, as these things go, with some talk about avoiding slowdowns and maintaining a consistent illusion of motion. But I realized something jarring as I was giving it—that the results we provide our readers in our video card reviews don’t really address the issues I’d just identified very well.

That thought stuck with me and began, slowly, to grow. I was too busy to do much about it as the review season cranked up, but I did make one simple adjustment to my testing procedures: ticking the checkbox in Fraps—the utility we use to record in-game frame rates—that tells it to log individual frame times to disk. In every video card review that followed, I quietly collected data on how long each frame took to render.

Finally, last week, at the end of a quiet summer, I was able to take some time to slice and dice all of the data I’d collected. What the data showed proved to be really quite enlightening—and perhaps a bit scary, since it threatens to upend some of our conclusions in past reviews. Still, I think the results are very much worth sharing. In fact, they may change the way you think about video game benchmarking.

Why FPS fails

As you no doubt know, nearly all video game benchmarks are based on a single unit of measure, the ubiquitous FPS, or frames per second. FPS is a nice instant summary of performance, expressed in terms that are relatively easy to understand. After all, your average geek tends to know that movies happen at 24 FPS and television at 30 FPS, and any PC gamer who has done any tuning probably has a sense of how different frame rates “feel” in action.

Of course, there are always debates over benchmarking methods, and the usual average FPS score has come under fire repeatedly over the years for being too broad a measure. We’ve been persuaded by those arguments, so for quite a while now, we have provided average and low FPS rates from our benchmarking runs and, when possible, graphs of frame rates over time. We think that information gives folks a better sense of gaming performance than just an average FPS number.

Still, even that approach has some obvious weaknesses. We’ve noticed them at times when results from our Fraps-based testing didn’t seem to square with our seat-of-the-pants experience. The fundamental problem is that, in terms of both computer time and human visual perception, one second is a very long time. Averaging results over a single second can obscure some big and important performance differences between systems.

To illustrate, let’s look at an example. It’s contrived, but it’s based on some real experiences we’ve had in game testing over the years. The charts below show the times required, in milliseconds, to produce a series of frames over a span of one second on two different video cards.

GPU 1 is obviously the faster solution in most respects. Generally, its frame times are in the teens, and that would usually add up to an average of about 60 FPS. GPU 2 is slower, with frame times consistently around 30 milliseconds.

However, GPU 1 has a problem running this game. Let’s say it’s a texture upload problem caused by poor memory management in the video drivers, although it could be just about anything, including a hardware issue. The result of the problem is that GPU 1 gets stuck when attempting to render one of the frames—really stuck, to the tune of a nearly half-second delay. If you were playing a game on this card and ran into this issue, it would be a huge show-stopper. If it happened often, the game would be essentially unplayable.

The end result is that GPU 2 does a much better job of providing a consistent illusion of motion during the period of time in question. Yet look at how these two cards fare when we report these results in FPS:

Whoops. In traditional FPS terms, the performance of these two solutions during our span of time is nearly identical. The numbers tell us there’s virtually no difference between them. Averaging our results over the span of a second has caused us to absorb and obscure a pretty major flaw in GPU 1’s performance.

Let’s say GPU 1 had similar but slightly smaller delays in other places during the full test run, but this one second was still its worst overall. If so, GPU 1’s average frame rate for the whole run could be upwards of 50 FPS, and its minimum frame rate would be 35 FPS—quite decent numbers, according to the conventional wisdom. Yet playing the game on this card might be almost completely unworkable.
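
To make the arithmetic concrete, here’s a quick sketch in Python. The frame times are invented to match the contrived example above—they aren’t measurements from real hardware—but they show how a one-second average swallows a near half-second stall:

```python
# Hypothetical frame times (ms) over one second, mirroring the example above.
# GPU 1: brisk ~16 ms frames, except for one ugly ~456 ms stall.
gpu1 = [16.0] * 34 + [456.0]     # 35 frames in 1,000 ms
# GPU 2: a steady plod at 30 ms per frame.
gpu2 = [30.0] * 33               # 33 frames in 990 ms

def average_fps(frame_times_ms):
    """Frames rendered divided by the time they took, in seconds."""
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)

for name, times in (("GPU 1", gpu1), ("GPU 2", gpu2)):
    print(f"{name}: {average_fps(times):4.1f} FPS average, "
          f"worst frame {max(times):.0f} ms")
```

Run it, and GPU 1 comes out "ahead" at 35 FPS to GPU 2's 33, despite the stall.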

If we saw these sorts of delays during our testing for a review, we’d likely have noted the occasional hitches in GPU 1’s performance, but some folks probably would have simply looked at the numbers and flipped to the next game without paying attention to our commentary. (Ahem.)

Frame time (ms)   FPS rate
8.3               120
10                100
16.7              60
20                50
25                40
33                30
42                24
50                20
60                17
70                14

By now, I suspect you see where we’re headed. FPS isn’t always bad as a summary of performance, but it has some obvious shortcomings due to the span of time involved. One way to overcome this weakness is to look inside the second, as we have just done, at the time it takes to produce individual frames. Doing so isn’t all that difficult. Heck, game developers have done it for years, tuning against individual frame times and also delving into how much GPU time each API call occupies when producing a single frame.
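
If you’d like to follow along at home, extracting per-frame times from a Fraps log takes only a few lines. This is a minimal sketch, not our actual tooling; it assumes the two-column "frametimes" CSV layout the versions of Fraps we’ve used write out, with a cumulative timestamp in the second column:

```python
import csv

def load_frame_times(path):
    """Return per-frame render times in milliseconds from a Fraps frametimes log.

    Fraps records a cumulative timestamp for each frame, so the time spent on
    an individual frame is the difference between consecutive timestamps.
    """
    stamps = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                       # skip the header row
        for row in reader:
            stamps.append(float(row[1]))   # column 1: time since capture start (ms)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

# Hypothetical file name; Fraps names these after the game and capture time.
frame_times = load_frame_times("bc2_frametimes.csv")
print(f"{len(frame_times)} frames, worst: {max(frame_times):.1f} ms")
```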

We will need to orient ourselves to a new way of thinking, though. The table above should help. It shows a range of frame times in milliseconds and their corresponding FPS rates, assuming those frame times were to remain constant over the course of a full second. Notice that in the world of individual frame times, lower is better, so a time of 30 ms is more desirable than a time of 60 ms.

We’ve included several obvious thresholds on the table, among them the 16.7 ms frame time that corresponds to a steady rate of 60 frames per second. Most LCD monitors these days refresh at 60 cycles per second, or 60Hz, so pushing frame times below the 16.7-ms threshold may be of limited use for some folks.

With that said, I am not a believer in the popular myth that speeds above 60 FPS are pointless. Somehow, folks seem to have conflated the limits of current display technologies (which are fairly low) with the limits of the human visual system (which are much higher). If you don’t believe me, you need only to try this simple test. Put two computers side by side, one with a 60Hz display and the other with a 120Hz display. Go to the Windows desktop and drag a window around the screen on each. Wonder in amazement as the 120Hz display produces an easily observable higher fluidity in the animation. In twitch games, steady frame rates of 90Hz or higher are probably helpful to the quickest (and perhaps the youngest) among us.

At the other end of the scale, we have the intriguing question of what sorts of frame times are acceptable before the illusion of motion begins to break down. Movies in the theater are one of the slower examples we have these days, with a steady frame rate of just 24 FPS—or 42 ms per frame. For graphical applications like games that involve interaction, I don’t think we’d want frame times to go much higher than that. I’m mostly just winging it here, but my sense is that a frame time over 50 ms is probably worthy of note as a mark against a gaming system’s performance. Stay above that for long, and your frame rate will drop to 20 FPS or lower—and most folks will probably start questioning whether they need to upgrade their systems.

With those considerations in mind, let’s have a look at some frame time data from an actual game, to see what we can learn.

New methods: some examples

Our first set of example data comes from our GeForce GTX 560 Ti review. We published that review early this year, so the drivers we used in it are rather dated by now, but the results should serve to help us test out some new methods regardless. We’ll be looking at results from Battlefield: Bad Company 2; the image quality settings we used for testing are shown below.

Interestingly enough, although it’s a boatload of data, we can plot the frame times from each video card fairly easily, just as we have plotted FPS over time in the past. One big difference is that lower frame times are more desirable, so we’ll read these plots differently. For example, the bright green line from the GeForce GTX 570 is the most desirable result of any of the GeForce cards.

As you can see, even though the data are scrunched together pretty tightly, outliers like especially high frame times can be picked out easily. Also, notice that faster cards tend to produce more frames, so there’s more data for the higher-performance solutions.

We can still line up a couple of direct competitors for a head-to-head comparison graph, too. Overall, I think this result illustrates nicely how very closely matched these two video cards are. The GTX 560 Ti is markedly faster only in a short span from frames 2150 to 2250 or so—well, except for that one outlier from the Radeon at around frame 500. Let’s put on our magnification glasses and take a closer look at it.

This real-world result nicely parallels our theoretical example from earlier, although the frame time spike on the Radeon isn’t nearly as long. Obviously, an FPS average won’t catch the difference between the two cards here.

In fact, have a look at the two frame times following the 58 ms delay; they’re very low. That’s likely because the video card is using triple buffering, so the rendering of those two later frames wasn’t blocked by the wait for the one before them. Crazily enough, if you consider just those three frames together, the average frame time is 23 ms. Yet that 58 ms frame happened, and it potentially interrupted the flow of the game.

Now, we don’t want to overstate the importance of a single incident like that, but with all of these frame time data at our disposal, we can easily ask whether it’s part of a larger pattern.

We’re counting through all five of our 60-second Fraps sessions for each card here. As you may have inferred by reading the plots at the top of the page, the Radeons aren’t plagued with a terrible problem, but they do run into a minor hiccup about once in each 60-second session—with the notable exception of the Radeon HD 6970. By contrast, the Nvidia GPUs deliver more consistent results. Not even the older GeForce GTX 260 produces a single hitch.

If you’re looking to do some multiplayer gaming where reaction times are paramount, you may want to ensure that your frame times are consistently low. By cranking our threshold down to 20 ms (or the equivalent of 50 FPS), we can separate the silky smooth solutions from the pretenders.

Only two cards, the GeForce GTX 570 and Radeon HD 6970, produce nearly all of their frames in under 20 ms. If you’re an aspiring pro gamer, you’ll need to pick up a relatively fast video card—or just do what they all do anyway: crank the display quality down as much as possible to ensure solid performance.
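
Mechanically, this counting is about as simple as analysis gets; a sketch, with a handful of made-up frame times standing in for a real Fraps session:

```python
def frames_beyond(frame_times_ms, threshold_ms):
    """Count the frames that took longer than the threshold to render."""
    return sum(1 for t in frame_times_ms if t > threshold_ms)

# Made-up numbers standing in for a 60-second capture (times in ms).
session = [15.2, 16.0, 14.8, 55.1, 16.3, 21.7, 17.5, 16.1, 19.8, 52.4]

print("frames over 50 ms:", frames_beyond(session, 50))   # -> 2
print("frames over 20 ms:", frames_beyond(session, 20))   # -> 3
```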

Counting up frames beyond a 50-ms threshold (or whatever cutoff we choose) is nice, but it doesn’t really capture everything we’d like. We do want to know about those outliers, but what we really need to understand is how well a video card maintains that steady illusion of motion.

One way to address that question is to rip a page from the world of server benchmarking. In that world, we often measure performance for systems processing lots of transactions. Oftentimes the absolute transaction rate is less important than delivering consistently low transaction latencies. For instance, here is an example where the cheaper Xeons average more requests per second, but the pricey “big iron” Xeons maintain lower response times under load. We can quantify that reality by looking at the 99th percentile response time, which sounds like a lot of big words but is a fundamentally simple concept: for each system, 99% of all requests were processed within X milliseconds. The lower that number is, the quicker the system is overall. (Ruling out that last 1% allows us to filter out any weird outliers.)
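
Computing that figure from a pile of frame times is straightforward. Here’s a minimal sketch using the "nearest rank" method; with thousands of frames per run, the choice of percentile rule barely moves the result:

```python
import random

def percentile(values, pct):
    """Value below which pct percent of the samples fall (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, int(round(pct / 100.0 * len(ordered))))
    return ordered[rank - 1]

# Synthetic stand-in for one run: mostly ~16 ms frames plus a few nasty spikes.
random.seed(0)
frame_times = [random.gauss(16.0, 1.5) for _ in range(3000)] + [55.0, 140.0, 60.0]

print(f"average FPS:           {len(frame_times) / (sum(frame_times) / 1000.0):.1f}")
print(f"99th percentile frame: {percentile(frame_times, 99):.1f} ms")
```

Because the three big spikes fall within the worst 1% of frames, they don’t drag the percentile figure around; what it reports is how low the card keeps frame times the other 99% of the time.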

Oddly enough, we want to ask the same basic question about gaming performance. We want our systems to ensure consistently low frame times, and doing so is arguably more important than achieving the highest FPS rate.

Happily, in this case, our 99th percentile frame time results pretty closely mirror the average FPS rates. The cards are ranked in the same order, with similar gaps between them. That fact tells us several things. Most notably, the cards with relatively high frame rates are also producing relatively low frame times with some consistency. The reverse is also true: the cards with lower FPS averages are delivering higher frame times. None of the cards are returning a bunch of strangely high or low frame times that would throw off the FPS average. As a result, we can say that these cards are doing a nice job of maintaining the illusion of motion at something close to their advertised FPS averages.

Also, I think this outcome somewhat validates our use of the 99th percentile frame times as a sort of complement to the usual FPS average. If all goes as it should, a video card delivering high frame rates ought to achieve predominantly low frame times, as well. Granted, this is a pretty geeky way to analyze the data, but you’ll see why it matters shortly.

Applying our methods to multi-GPU solutions

Our multi-GPU data comes from our review of the GeForce GTX 590. We’ll start off with results from Bad Company 2 again, this time tested at a higher resolution more appropriate to these higher-end solutions.

Forgive me, but I’m about to throw a load of info at you, visualized in various ways. We’ll begin with frame time plots, as we did on the last page. However, you’ll notice that a number of those plots look kind of strange.

The frame time info for a number of the setups looks more like a cloud than a line, oddly enough. What’s going on? Let’s slide on those magnification glasses again and take a closer look.

Zooming in on the problem

The three single-GPU solutions look fairly normal, but then they’re the ones that don’t look odd in the earlier plots.

A funny thing happens, though, when we get into the multi-GPU results.

The frame times oscillate between relatively long and short delays in an every-other-frame pattern. To one degree or another, all of the multi-GPU solutions follow this template. With apologies for the scrolling involved, let’s illustrate more fully:

Wow. We’ve heard whispers from time to time about micro-stuttering problems with multi-GPU solutions, but here it is in captivity.

To be clear, what we’re seeing here is quite likely an artifact of the way today’s multi-GPU solutions work. Both AMD and Nvidia prefer a technique for divvying up the workload between two GPUs known as alternate frame rendering (AFR). As the name indicates, AFR involves assigning the first GPU in a team to render the even-numbered frames, while the second GPU handles the odd-numbered ones, so frames are produced by the two GPUs in an interleaved fashion. (AFR can also be employed with three or four GPUs, with one frame being assigned to each GPU in sequence.) SLI and CrossFire support other load-balancing methods, such as split-frame rendering, but those methods aren’t favored, because they don’t scale as well in terms of FPS averages.

Although the fundamentals are fairly straightforward, load-balancing between multiple GPUs isn’t a simple job. Graphics workloads vary from frame to frame. Heck, graphics workloads vary down to the block and pixel levels; a frame is essentially a massively parallel problem in itself. The nature of that huge parallel workload can change substantially from one frame to the next, so keeping the timing synchronized between two GPUs doing AFR isn’t easy.
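
Here’s a deliberately oversimplified model of that timing problem, just to show where the oscillation comes from. It ignores CPU submission, queuing, and frame copies entirely; both GPUs are perfectly steady, and only the phase between them is off:

```python
def afr_display_intervals(gpu_frame_ms, phase_offset_ms, frames=8):
    """Frame-to-frame intervals for a two-GPU AFR pair in a toy timing model.

    Each GPU needs gpu_frame_ms to render a frame and handles every other
    frame; the second GPU runs phase_offset_ms behind the first. Ideally the
    offset is half a GPU frame, so finished frames arrive evenly spaced.
    """
    finish = []
    for i in range(frames):
        start = (i // 2) * gpu_frame_ms + (phase_offset_ms if i % 2 else 0)
        finish.append(start + gpu_frame_ms)
    finish.sort()
    return [b - a for a, b in zip(finish, finish[1:])]

print("well spaced :", afr_display_intervals(60, 30))   # [30, 30, 30, ...]
print("badly skewed:", afr_display_intervals(60, 10))   # [10, 50, 10, 50, ...]
```

The skewed case produces exactly the short-long-short-long signature in the plots above, even though each GPU is delivering its frames like clockwork.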

Then, because only the primary video card is connected to the display, the frames rendered by the second GPU in an AFR team must be copied over to the primary GPU for output. Those frame data are typically transferred between the GPUs via the proprietary SLI and CrossFire interfaces built into high-end video cards. Dual-GPU cards like the Radeon HD 6990 and GeForce GTX 590 include an onboard version of that same interface. (Some low-end multi-GPU solutions do without a custom data channel and transfer frame data via PCIe.) Copying those frames from the secondary GPU to the primary takes time. In addition to completed frames, the contents of other buffers must oftentimes be transferred between GPUs, especially when advanced rendering techniques (like render-to-texture) are in use. Such information is usually shared over PCIe, as I understand it, which has its own transfer latencies.

The charts above demonstrate that the synchronization for some of the multi-GPU solutions is far from perfect. This revelation opens up a huge, messy set of issues that will take some time to cover in full. For now, though, we can start by identifying several basic problems raised by the jitter inherent in multi-GPU systems.

Obviously, one implication is that using an FPS average to summarize multi-GPU performance is perilous. In cases of extreme jitter, an FPS average may give too much credit to a solution that’s suffering from relatively high latencies for every other frame. My sense is that those longer frame times in the pattern would be the gating factor for the illusion of motion. That is, a solution producing frame times of roughly 20 ms and 50 ms in interleaved fashion would provide no better illusion of motion than a system with constant 50-ms frame times—at best.
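
The arithmetic behind that claim is simple enough to check:

```python
def average_fps(frame_times_ms):
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)

jittery  = [20.0, 50.0] * 30     # alternating 20 ms and 50 ms frames
constant = [50.0] * 30           # a steady 50 ms per frame

print(f"jittery : {average_fps(jittery):.1f} FPS, yet every other frame takes 50 ms")
print(f"constant: {average_fps(constant):.1f} FPS, same 50-ms worst-case frames")
```

FPS credits the jittery setup with roughly 43% more performance (28.6 versus 20 FPS), even though, by the reasoning above, its 50-ms frames gate the perceived fluidity just as the constant case’s do.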

The reality is probably somewhat worse than that, for reasons we’ve already discussed. The human visual system is very good at picking up visual irregularities, and too much jitter in the flow of frames could easily cause a form of visual interference that would be rather distracting or annoying. I can’t say I’ve personally experienced this sensation in any acute way—and I do play games on our multi-GPU test rigs with some regularity—but I know some folks with multi-GPU systems have complained about micro-stuttering sullying their gaming sessions. (I am one of those folks who can see the rainbow effect in older DLP displays, so I know such things can be disruptive.)

That’s just the tip of the iceberg, too. The complexity of this problem is even deeper than our frame time data alone can indicate. We’ll address more of the issues shortly, but first, I still think a closer look at our frame time data can prove instructive.

Slicing and dicing the numbers

Here’s one reason why I wanted to press on with our look at frame time data. I believe the issue we’re seeing here is largely independent of the micro-stuttering problem. If you look back at the individual frame time plots over the span of a whole test run, you can see the trend clearly. The multi-GPU solutions just tend to run into especially long frame latencies more often than the single-GPU offerings. There is some overhead associated with keeping two GPUs fed and synchronized in the same system, and it seems to lead to occasional trouble—in the form of frame times over our 50-ms threshold.

Having said that, if we lower the threshold to 20 ms, the multi-GPU solutions still look pretty good—especially the SLI pairs. That’s true in spite of the jitter we know to be present. Micro-stuttering may cause other forms of visual pain, but the raw performance implications of it in terms of frame latency aren’t always devastating, so long as the higher frame times in that oscillating pattern don’t grow too large. Multi-GPU systems can still outperform a comparable single-GPU setup in an impactful way, in many cases.

Here’s where I think my crazy metric imported from the world of server benchmarking begins to pay off. By asking how quickly the vast majority of frames were returned, we can sort the different GPU solutions in an order of merit that means something important, and we avoid the average-inflating effect of those low-latency frames in those multi-GPU jitter patterns.

Notice that with jitter present, the 99th percentile frame times now no longer neatly mirror the FPS averages. With our more latency-sensitive percentile metric, some of the rankings have changed. The Radeon HD 6970 CrossFireX config drops from second place in FPS to fourth place in 99th percentile latency, and the GeForce GTX 580 rises from ninth to sixth. Most dramatically, the Radeon HD 6870 CrossFireX team drops from a relatively strong FPS finish, well above the GeForce GTX 580, to last place, virtually tied with a single GeForce GTX 570. The 6870 CrossFireX team is paying the penalty for relatively high latency on every other frame caused by its fairly pronounced jitter.

I’m not sure our 99th-percentile rankings are the best way to quantify video card performance, but I am persuaded that, as an indicator of consistently low frame times, they are superior to traditional FPS averages. In other words, I think this metric is potentially more relevant to gamers, even if it is kind of eggheaded.

We should also pause to note that all of these solutions perform pretty well in this test. The lowest 99th-percentile frame times are just over 25 ms, which translates to 40 FPS at a steady rate. A video card slinging out frames every 25 ms is no bad thing.

To give you a sense of how this crazy percentile-based method filters for jitter, allow me to pull out a breathtakingly indulgent bar chart with a range of different percentiles on it, much like we’d use for servers. I’m not proposing that we always produce these sorts of monsters in future GPU reviews, but it is somewhat educational.

You can see how some of the multi-GPU solutions look really good at the 50th percentile. They’re churning out half of their frames at really low latencies. However, as we ramp up the percentage of frames in our tally, we capture more of those long-latency frames that constitute the other half of the oscillating pattern. The frame latencies for those multi-GPU solutions rise dramatically, even at the 66th percentile.

Heck, we could probably even create a “jitter index” by looking at the difference between the 50th and 99th percentile frame times.
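
If we ever formalize that, the computation would amount to something like the sketch below. The "jitter index" name and the choice of percentiles are just the off-the-cuff proposal above, not an established metric:

```python
def nearest_rank_percentile(values, pct):
    ordered = sorted(values)
    rank = max(1, int(round(pct / 100.0 * len(ordered))))
    return ordered[rank - 1]

def jitter_index(frame_times_ms):
    """Spread, in ms, between the median frame time and the 99th percentile."""
    return (nearest_rank_percentile(frame_times_ms, 99)
            - nearest_rank_percentile(frame_times_ms, 50))

steady_gpu = [30.0] * 200             # single GPU, evenly paced frames
afr_pair   = [12.0, 34.0] * 100       # oscillating multi-GPU pattern

print("steady GPU:", jitter_index(steady_gpu))   # -> 0.0
print("AFR pair  :", jitter_index(afr_pair))     # -> 22.0
```

A single GPU churning out evenly spaced frames would score near zero; an AFR pair with a pronounced oscillation scores roughly the size of its swing.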

Multi-GPU solutions with Bulletstorm

We still need to address some additional issues related to multi-GPU stuttering, but before we do so, I’d like to get the results from a couple more games under our belts.

Next up is Bulletstorm, which we tested in a different fashion than Bad Company 2. Rather than attempting to duplicate all of the exact same motions in each test run, we simply played the same level of this game through five times, in 60-second sessions. That means we have more variance in our results for this game, but with five runs worth of data, we should be able to make some assessments anyhow.

Here’s a look at frame time data from a single test run for each config.

You’ll notice that in this game, most of the multi-GPU solutions’ results look more like lines than clouds. The biggest exception is the Radeon HD 6870 CrossFireX setup, which still appears to exhibit quite a bit of jitter.

The other big difference here is that there are quite a few more spikes up to 40 ms or more during the test runs. The worst offender is the GeForce GTX 560 Ti SLI config, which performs poorly throughout. Our guess is that these 1GB cards are running low on total memory, which is why they struggle so mightily. Yes, the 6870 cards also have 1GB, but AMD has long been better at avoiding memory size constraints.

Get your mouse wheel ready, and we’ll zoom in on a snippet of this test run from each config.

For whatever reason, multi-GPU jitter is much less pronounced here. In fact, it’s nearly banished entirely in the case of the GeForce GTX 580 SLI and Radeon HD 6970 CrossFireX configs. Remember, this is just a small selection of frames from a single test run, so it’s not the whole story. Still, from looking at the full data set, my sense is that these samples are fairly representative of the amount of jitter throughout each run and from one run to the next. Of course, frame times will vary depending on what’s happening in the game.

Slicing and dicing: Bulletstorm

The GeForce GTX 560 Ti SLI setup is a mess, with over 500 frames north of 50 ms; we’ve excluded it from the chart above so it doesn’t throw off the scale. Beyond that, the multi-GPU solutions do a reasonably decent job of avoiding the longest frame times, generally speaking. The obvious exception is the Radeon HD 6870 CrossFireX config, which may be feeling a bit of memory size pressure, though not as acutely as the GTX 560 Ti SLI pair.

Linger a little longer on these numbers, and you’ll see that the multi-GPU solutions still face some minor challenges. The GTX 570 SLI setup, for example, has the same number of over-50-ms frames as a single card. Upgrading to dual cards gets you squat for avoidance of those long-latency frames, it would seem. Also, the Radeon HD 6990 and the 6970 CrossFireX team both match a single GeForce GTX 580, although both Radeon configs are ostensibly faster.

I really don’t like the results in the chart above, but I’ve included them in order to demonstrate a potential problem with our frame time count. Look at how the GeForce GTX 580 produces substantially more frames above 20 ms than the GTX 570. That’s because nearly every frame produced by both cards is over 20 ms. The GTX 580’s count is higher because it’s faster and therefore produces more total frames during our test period. We really shouldn’t let that fact count against it.

The lesson: you have to be careful where you set your thresholds when you’re doing this sort of a count. I’m comfortable our 50-ms threshold will avoid such problems in the vast majority of cases, so long as we test games at reasonably playable quality settings. The 20-ms threshold, however, is potentially problematic.
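
One obvious refinement, if we keep the lower threshold around, is to report the share of frames beyond the cutoff instead of a raw count, so a faster card isn’t punished simply for producing more frames. A sketch, with invented numbers shaped like the GTX 580/570 situation above:

```python
def share_beyond(frame_times_ms, threshold_ms):
    """Fraction of frames that exceeded the threshold, rather than a raw count."""
    return sum(1 for t in frame_times_ms if t > threshold_ms) / len(frame_times_ms)

faster = [21.0] * 2800 + [18.0] * 200    # more total frames in the same 60 seconds
slower = [24.0] * 2400

for name, times in (("faster card", faster), ("slower card", slower)):
    count = sum(1 for t in times if t > 20)
    print(f"{name}: {count} frames over 20 ms ({share_beyond(times, 20):.0%} of its frames)")
```

By raw count the faster card looks worse (2,800 slow frames to 2,400), but the share tells the truer story (93% versus 100%).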

Without much jitter in the picture, our 99th-percentile rankings track pretty closely with the FPS averages for Bulletstorm. The big loser here is the Radeon HD 6870 CrossFireX rig, whose 55 FPS average is a mirage; its latency picture is no better than a single 6970’s. I’d say the 99th-percentile result for the GTX 560 Ti SLI better captures how much of a basket case that config is, too.

You can see the big gap between the 50th and 99th percentile frame times for the 6870 CrossFireX setup here. With the exception of the troubled GTX 560 Ti SLI, the rest of the multi-GPU solutions don’t look to have much jitter, just as our short samples had indicated.

Multi-GPU solutions with StarCraft II

SC2 is a little different from the other games. We recorded frame times with Fraps, but we played back a pre-recorded demo of a multiplayer match on each card rather than playing ourselves. We captured 33 minutes of play time for each demo, so we didn’t bother with five runs per card. Since 33 minutes of frame time data is nearly impossible to reproduce on a graph, we’ve included just 6500 frames in the plots below. Some cards produced as many as 140,000 frames, so be aware that we’re looking at a small snippet of the total data plotted below.

Well, that was unexpected. Look at the results for the fastest single-GPU cards, like the Radeon HD 6970 and the GeForce GTX 580. Why do those appear to have a bunch of jitter in them? Hmm.

Yep, both the GTX 580 and 6970 have quite a bit of variance over the span of three to four frames. The stair-step pattern on the GTX 580 is very regular. The pattern on the 6970 is a little different, but similar. I’m hesitant even to hazard a guess about what’s going on here. It could have something to do with triple-buffering, the way the game engine’s main timing loop works, or some interaction between the two. I suppose this is evidence that delivering consistent frame times isn’t just a challenge for SLI and CrossFireX setups.

The SLI configs don’t appear to be affected by the same brand of variance we saw in the single-GPU GTX 580 results. Instead, they show a small to moderate amount of garden-variety multi-GPU jitter. The CrossFireX configs, though, have that same regular three- to four-frame bounce we saw from a single 6970. I wish I could tell you more about what’s happening here, but it’s tough to say.

Slicing and dicing: StarCraft II

The GeForces are the champs at avoiding long frame latencies in SC2, in relative terms. However, we’re still seeing hundreds of frames over 50 ms during our 33-minute test session, even from the fastest solutions. There may be a CPU- or system-level performance limitation coming into play here. Still, a faster graphics subsystem will help quite a bit in avoiding slowdowns.

Even with all of the funkiness going on with frame time variance for some of the single- and multi-GPU solutions, our 99th-percentile frame latency results track pretty closely with the FPS averages. Several corrections come with the change of focus, though. The latency-sensitive metric drops the Radeon HD 6970 to the bottom of the heap, for one. Also, in spite of its funky stair-step pattern, the GTX 580 rises in the rankings, moving ahead of even the Radeon HD 6990, whose own funny pattern of variance includes more frames at higher latencies. Finally, the GTX 580 SLI rig retains its overall performance lead, but its margin of victory over the rest of the field narrows substantially.

Multi-GPU micro-stuttering: Real… and really complicated

We didn’t set out to hunt down multi-GPU micro-stuttering. We just wanted to try some new methods of measuring performance, but those methods helped us identify an interesting problem. I think that means we’re on the right track, but the micro-stuttering issue complicates our task quite a bit.

Naturally, we contacted the major graphics chip vendors to see what they had to say about the issue. Somewhat to our surprise, representatives from both AMD and Nvidia quickly and forthrightly acknowledged that multi-GPU micro-stuttering is a real problem, is what we measured in our frame-time analysis, and is difficult to address. Both companies said they’ve been studying this problem for some time, too. That’s intriguing, because neither firm saw fit to inform potential customers about the issue when introducing its most recent multi-GPU product, say the Radeon HD 6990 or the GeForce GTX 590. Hmm.

AMD’s David Nalasco identified micro-stuttering as an issue with the rate at which frames are dispatched to GPUs, and he said the problem is not always an easy one to reproduce. Nalasco noted that jitter can come and go as one plays a game, because the relative timings between frames can vary.

We’d mostly agree with that assessment, but with several caveats based on our admittedly somewhat limited test data. For one, although jitter varies over time, multi-GPU setups that are prone to jitter in a given test scenario tend to return to it throughout each test run and from one run to the next. Second, the degree of jitter appears to be higher for systems that are more performance-constrained. For instance, when tested in the same game at the same settings, the mid-range Radeon HD 6870 CrossFireX config generally showed more frame-to-frame variance than the higher-end Radeon HD 6970 CrossFireX setup. The same is true of the GeForce GTX 560 Ti SLI setup versus dual GTX 580s. If this observation amounts to a trait of multi-GPU systems, it’s a negative trait. Multi-GPU rigs would have the most jitter just when low frame times are most threatened. Third, in our test data, multi-GPU configs based on Radeons appear to exhibit somewhat more jitter than those based on GeForces. We can’t yet say definitively that those observations will consistently hold true across different workloads, but that’s where our data so far point.

Nalasco told us there are several ideas for dealing with the jitter problem. As you probably know, vsync, or vertical refresh synchronization, prevents the GPU from flipping to a different source buffer (in order to show a new frame) while the display is being painted. Instead, frame buffer flips are delayed to happen between screen redraws. Many folks prefer to play games with vsync enabled to prevent the tearing artifacts caused by frame buffer flips during display updates. Nalasco noted that enabling vsync could “probably sometimes help” with micro-stuttering. However, we think the precise impact of vsync on jitter is tough to predict; it adds another layer of timing complexity on top of several other such layers. More intriguing is another possibility Nalasco mentioned: a “smarter” version of vsync that presumably controls frame flips with an eye toward ensuring a user perception of fluid motion. We think that approach has potential, but Nalasco was talking only of a future prospect, not a currently implemented technology. He admitted AMD can’t say it has “a water-tight solution yet.”

Nalasco did say AMD may be paying more attention to these issues going forward because of its focus on exotic multi-GPU configurations like the Dual Graphics feature attached to the Llano APU. Because such configs involve asymmetry between GPUs, they’re potentially even more prone to jitter issues than symmetrical CrossFireX or SLI solutions.

Nvidia’s Tom Petersen mapped things out for us with the help of a visual aid.

The slide above shows the frame production pipeline, from the game engine through to the display, and it’s a useful refresher in the context of this discussion. Things begin with the game engine, which has its own internal timing and tracks a host of variables, from its internal physics simulation to graphics and user input. When a frame is ready for rendering, the graphics engine hands it off to the DirectX API. According to Petersen, it’s at this point that Fraps records a timestamp for each frame. Next, DirectX translates high-level API calls and shader programs into lower-level DirectX instructions and sends those to the GPU driver. The graphics driver then compiles DirectX instructions into machine-level instructions for the GPU, and the GPU renders the frame. Finally, the completed frame is displayed onscreen.

Petersen defined several terms to describe the key issues. “Stutter” is variation between the game’s internal timing for a frame (t_game) and the time at which the frame is displayed onscreen (t_display). “Lag” is a long delay between the game time and frame time, and “slide show” is a large total time for each frame, where the basic illusion of motion is threatened. These definitions are generally helpful, I think. You’ll notice that we’ve been talking quite a bit about stutter (or jitter) and the slide-show problem (or long frame times) already.
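
Translated into the kind of bookkeeping we could do if we had both sets of timestamps, those three terms work out to something like the sketch below. It’s only a rough interpretation of Petersen’s slide, and the t_game and t_display numbers here are invented purely to exercise the formulas; Fraps can’t currently see this whole chain:

```python
def frame_metrics(t_game, t_display):
    """Rough reading of Petersen's terms, given per-frame timestamps in ms."""
    delays = [d - g for g, d in zip(t_game, t_display)]            # game time vs. screen time
    intervals = [b - a for a, b in zip(t_display, t_display[1:])]  # gaps between displayed frames
    return {
        "lag (worst delay, ms)":      max(delays),
        "stutter (delay swing, ms)":  max(delays) - min(delays),
        "slide show (worst gap, ms)": max(intervals),
    }

t_game    = [0, 33, 66, 99, 132]      # when the engine thinks each frame "happens"
t_display = [50, 95, 118, 170, 185]   # when each frame actually hits the screen
print(frame_metrics(t_game, t_display))
```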

Stutter is, in Petersen’s view, “by far the most significant” of these three effects that people perceive in games.

In fact, in a bit of a shocking revelation, Petersen told us Nvidia has “lots of hardware” in its GPUs aimed at trying to fix multi-GPU stuttering. The basic technology, known as frame metering, dynamically tracks the average interval between frames. Those frames that show up “early” are delayed slightly—in other words, the GPU doesn’t flip to a new buffer immediately—in order to ensure a more even pace of frames presented for display. The lengths of those delays are adapted depending on the frame rate at any particular time. Petersen told us this frame-metering capability has been present in Nvidia’s GPUs since at least the G80 generation, if not earlier. (He offered to find out exactly when it was added, but we haven’t heard back yet.)

Poof. Mind blown.

Now, take note of the implications here. Because the metering delay is presumably inserted between t_render and t_display, Fraps would miss it entirely. That means all of our SLI data on the preceding pages might not track with how frames are presented to the user. Rather than perceive an alternating series of long and short frame times, the user would see a more even flow of frames at an average latency between the two.
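
Based on Petersen’s description, and it is only a description since Nvidia hasn’t published the algorithm, a metering scheme might behave something like the toy model below: frames that finish early get held back toward a running average interval, at the cost of a little extra lag. Everything here is a guess meant to illustrate the idea, not Nvidia’s implementation:

```python
def meter_frames(render_done_ms, smoothing=0.5):
    """Toy frame-metering model: delay early frames so displayed frames keep
    an even pace, tracked with a running average of the display interval."""
    display = [render_done_ms[0]]
    avg_interval = render_done_ms[1] - render_done_ms[0]
    for done in render_done_ms[1:]:
        target = display[-1] + avg_interval     # when we'd like to flip buffers
        flip = max(done, target)                # can't show a frame before it's rendered
        avg_interval = (smoothing * avg_interval
                        + (1 - smoothing) * (flip - display[-1]))
        display.append(flip)
    return display

# Completion times from a jittery AFR pair: frames finish 10 ms, then 50 ms apart.
render_done = [0, 10, 60, 70, 120, 130, 180, 190]
metered = meter_frames(render_done)
print("raw intervals    :", [b - a for a, b in zip(render_done, render_done[1:])])
print("metered intervals:", [round(b - a, 1) for a, b in zip(metered, metered[1:])])
```

In this toy run the displayed pace settles at an even 30 ms while the raw completions keep bouncing between 10 and 50 ms, and that evenness is bought with up to 20 ms of added lag on the early frames. Since the delay sits downstream of where Fraps takes its timestamps, Fraps would still report the bouncy version.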

Frame metering sounds like a pretty cool technology, but there is a trade-off involved. To cushion jitter, Nvidia is increasing the amount of lag in the graphics subsystem as it inserts that delay between the completion of the rendered frame and its exposure to the display. In most cases, we’re talking about tens of milliseconds or less; that sort of contribution to lag probably isn’t perceptible. Still, this is an interesting and previously hidden trade-off in SLI systems that gamers will want to consider.

So long as the lag isn’t too great, metering frame output in this fashion has the potential to alleviate perceived jitter. It’s not a perfect solution, though. With Fraps, we can measure the differences between presentation times, when frames are presented to the DirectX API. A crucial and related question is how the internal timing of the game engine works. If the game engine generally assumes the same amount of time has passed between one frame and the next, metering should work beautifully. If not, then frame metering is just moving the temporal discontinuity problem around—and potentially making it worse. After all, the frames have important content, reflecting the motion of the underlying geometry in the game world. If the game engine tracks time finely enough, inserting a delay for every other frame would only exacerbate the perceived stuttering. The effect would be strange, like having a video camera that captures frames in clumped pairs (1-2, 3-4, 5-6, 7-8) and a projector that plays them back at even intervals (1, 2, 3, 4, 5, 6, 7, 8). Motion would not be smooth.

When we asked Petersen about this issue, he admitted metering might face challenges with different game engines. We asked him to identify a major game engine whose internal timing works well in conjunction with GeForce frame metering, but he wasn’t able to provide any specific examples just yet. Still, he asserted that “most games are happy if we present frames uniformly,” while acknowledging there’s more work to be done. In fact, he said, echoing Nalasco, there is a whole area of study in graphics about making frame delivery uniform.

So... what now?

We have several takeaways after considering our test data and talking with Nalasco and Petersen about these issues. One of the big ones: ensuring frame-rate smoothness is a new frontier in GPU “performance” that’s only partially related to raw rendering speeds. Multi-GPU solutions are challenged on this front, but single-GPU graphics cards aren’t entirely in the clear, either. New technologies and clever algorithms may be needed in order to conquer this next frontier. GPU makers have some work to do, especially if they wish to continue selling multi-GPU cards and configs as premium products.

Meanwhile, we have more work ahead of us in considering the impact of jitter on how we test and evaluate graphics hardware. Fundamentally, we need to measure what’s being presented to the user via the display. We may have options there. Petersen told us Nvidia is considering creating an API that would permit third-party applications like Fraps to read display times from the GPU. We hope they do, and we’ll lobby AMD to provide the same sort of hooks in its graphics drivers. Beyond that, high-speed cameras might prove useful in measuring what’s happening onscreen with some precision. (Ahem. There’s a statement that just cost us thousands of dollars and countless hours of work.)

Ultimately, though, the user experience should be paramount in any assessment of graphics solutions. For example, we still need to get a good read on a basic question: how much of a problem is micro-stuttering, really? (I’m thinking of the visual discontinuities caused by jitter, not the potential for longer frame times, which are easier to pinpoint.) The answer depends very much on user perception, and user perception will depend on the person involved, on his monitor type, and on the degree of the problem.

Presumably, a jitter pattern alternating between five- and 15-millisecond frame times would be less of an annoyance than a 15- and 45-millisecond pattern. The worst example we saw in our testing alternated between roughly six and twenty milliseconds, but it didn’t jump out at me as a problem during our original testing. Just now, I fired up Bad Company 2 on a pair of Radeon HD 6870s with the latest Catalyst 11.8 drivers. Fraps measures the same degree of jitter we saw initially, but try as I might, I can’t see the problem. We may need to spend more time with (ugh) faster TN panels, rather than our prettier and slower IPS displays, in order to get a better feel for the stuttering issue.

At the same time, we’re very interested in getting reader feedback on this matter. If you have a multi-GPU setup, have you run into micro-stuttering problems? If so, how often do you see it and how perceptible is it? Please let us know in the comments.

Although they’ve been a little bit overshadowed by the issues they’ve helped uncover, we’re also cautiously optimistic about our proposed methods for measuring GPU performance in terms of frame times. Yes, Nvidia’s frame metering technology complicates our use of Fraps data with SLI setups. But for single-GPU solutions, at least, we think our new methods, with their focus on frame latencies, offer some potentially valuable insights into real-world performance that traditional FPS measurements tend to miss. We’ll probably have to change the way we review GPUs in the future as a result. These methods may be helpful in measuring CPU performance, as well. Again, we’re curious to get some reader feedback about which measures make sense to use and how they might fit alongside more traditional FPS averages. Our sense is that once you’ve gone inside the second, it may be difficult to look at things the same way when you zoom back out again.

Comments closed
    • HTWingNut
    • 8 years ago

    A little late to the party here, but found this article very interesting. Of most importance to me is how AMD will manage this considering they have introduced asymmetrical dual graphics using the IGP and discrete cards. This could be hugely beneficial in the mobile platforms. So far Xfire in this configuration is hit or miss and in most cases results in horrific stutter or “jitter” as described in this article. I have done some extensive benchmarking, but will have to add this to the list and see how this “jitter” results in my benchmarking with the mobile Llano. Thanks for the article. Looking forward to see if/how AMD and nVidia manage this, and updated hook for programs like FRAPS. Perhaps the drivers and/or FRAPS could provide a jitter report as well.

    • Haldi
    • 8 years ago

    I did some tests like that too.

    But I prefer the frame times shown in another way, to see the whole spread!

    Example: Bulletstorm
    [url<]http://widescreengamingforum.com/f/u/ShippingPC.png[/url<]
    You can see how many frames have 25ms, 20ms. Then you can also see at first glance if something is off. Like here:
    [url<]http://widescreengamingforum.com/f/u/Dirt3.png[/url<]
    Which looks like this in a normal graph:
    [url<]http://widescreengamingforum.com/f/u/Dirt3graph.png[/url<]

    • JMccovery
    • 8 years ago

    I was just reading through this again, and it struck me: No matter how beastly of a system you have (modded dual/quad 3960X, 6 of the fastest SSDs EVER, 1TB 64GHz DDR), the sheer complexity of games will (more than likely) always produce this issue.

    Even something as simple as a miniscule stall in input, networking, AI can cause these spikes. Even the software/hardware relationship of DirectX and OpenGL can cause hitches.

    I wonder if anyone would be willing to test this with SFR?

    I’ll also agree on the love for the 99th percentile charts; they are real eye openers.

    • DEMONSLAYER
    • 8 years ago

    Scott,

    I currently run Nvidia GTX 460s in SLI mode and was planning to upgrade to the MSI GTX 560 Ti. However, your analysis of the micro stutter on the 560 SLI indicated that the 1GB cards may suffer from inadequate vram, possibly causing the stutter. Would the 2GB versions of the MSI GTX 560 Ti suffer the same stutter problem?

    My pc is a Alienware Area 51 ALX, 990X processor oc’d to 4.1, 12GB Corsair Vengeance ram, and 3.3TB (5 hard drives).

    • technogiant
    • 8 years ago

    Great article that has really blown the lid off a whole can of mutli gpu worms.

    There has been mention that PCIe 3 may reduce micro stutter… have you been able to test this on an X79 mobo?

    • vitalious
    • 8 years ago

    What an amazing article.
    This raises the bar in GPU performance analysis.

    Also, I think standard deviation can be a good measure for frame rendering consistency.

    • marraco
    • 8 years ago

    Excellent!!! this is my new standard on GPU reviews.

    This article puts TechReport a step before Tomshardware and Anandtech.

    I noticed since long time that higher FPS do not mean better experience, just because of this problem.

    • Creamy Goodness
    • 8 years ago

    great article, thanks for what must have been hours of research!!

    • kamikaziechameleon
    • 8 years ago

    So a 6970 is a poor card by this metric?

    • CBHvi7t
    • 8 years ago

    If frames are independent there is an obvious solution:
    skip/drop the frame if the other GPU is already done with the next.
    => no, frame longer than 2/’usual fps’

    • mi1stormilst
    • 8 years ago

    I wonder if the frame rendering times change when connected to DVI, HDMI, DISPLAYPORT, VGA? Do non-HDCP enabled monitors exhibit a different render time? Do you guys personally have any idea of the number of slower frames or any patterns or rendering times that can have a negative visual impact?
    As interested as I am about this topic I think you should be very careful in assuming that 50ms or even 200ms is a perceptible time for a frame to be rendered. In the event that you have multiple frames rendered long and short you might assume humans can pick up on that when the reality may be that they can’t. One thing I am sure of is that people are happy with their video cards and love to play games. I guess I am just trying to point out that you can buy a really nice TV for a lot of money, but if you watch your favorite show with your nose on the screen it is going to look like shit. Having said that I would really like to see someone get to the bottom of the perception question sooner rather than later. If the perception can be reasonably quantified it should be a primary consideration for consumers.

    • ColeLT1
    • 8 years ago

    I noticed the stutter on a 9800gx2 only at lower fps (40-30s?). When I played GTA4 with that card it was very noticeable, but I thought it was an SLI problem with that game, since there were obvious visual artifacts, like open-able doors would flicker like an old fluorescent light. I eventually dropped my resolution to 1680×1050 to get a steady 60fps and never had any more stutters.

    With my twin 460s the only time I noticed a stutter was on Crysis, maxed out and at 1920×1200 it would play fine, at 55-60fps. Then I had an issue where seemingly randomly it would drop to 20fps, then slowly rise back up to the 50s, but stutter badly even showing 55fps. I found in afterburner that I spiked over the 1024mb vram, like 1080, even though it had dropped to like 980, and the frames were high, the stutter was noticeable until I restarted the game. lowered the AA to 8x, the memory dropped to the 900s and the problem never came back.

    I probably am way off, but here is my experience: If you have a game at (not vsynced) 30fps on a single card, or 30 on dual, both setups are stressed to 100% gpu. Of course the card is going to have trouble syncing from one to another, its maxed out. But if you have a game played at 60 on a single or dual setup, I am presented with the same experience, no stutters, because the cards are not overloaded and struggling.

    SLI/CF isn’t perfect, I would rather have one card than 2. I only got it because I got a deal on cards, 2xMSI hawk 460s for 139 each. Since every game I play runs a steady 60fps (with the exception of crysis, which I only played 1 time through), I would guess I am not taxing my system enough to show a stutter, and I am in no rush to play metro2033 maxed out yet. But the fact that it took a game from 35fps, jumpy as hell, to a smooth 55, I am happy with my purchase.

    • aggies11
    • 8 years ago

    TLDR – Why not just calculate a “subsecond” framerate (1000/frametime). Measure the difference between these values and the “traditional FPS” value for the particular 1second of time. That difference represents how much the frametimes vary in a manner which we understand (FPS) vs something unfamiliar (milliseconds). You can then even calculate a min-diff, max-diff, avg-diff per second. To know how good/bad an individual seconds worth of frames were. Those values can then themselves be averaged/graphed out over the course of a bench run. (Eg. avg-diff of 3FPS would be pretty good, where the article decided threshold of 50ms would be a “max-diff of 20FPS”. The goal is to quantify what the end user experiences, sub-second changes in the framerate appearing as stuttering (if regular) or choppy (if irregular), when the traditionall “FPS” counter doesn’t move – TLDR

    Bit late on this one, just listening to the podcast and heard talk of the article. An absolutely fascinating issue, that has come up a few times but never in so much entertaining depth.

    One recommendation (out of the many you’ve gotten so far 🙂 ). Try not to get lost in the sea of numbers and stats (as you can see has already happened to some of the more statistically inclined people earlier on). Remember the stats are valid in so far as they can be linked back to people (gamers) perception/experience of the game. The first example in the article is the perfect example, how sub second variations in frame times can lead to a huge discrepancy from what the user experiences/perceives compared to what the numbers report(avgs fps). But if you can’t tie your data back to something tangible like that (a perception), then don’t be afraid to question it’s use/significance.

    This is actually something I”d pondered before (while certainly not in such awesome detail). When we traditionally report a frame-rate, there is the implicit assumption that this rate is constant for the entire second measured. As the article observes, this is often not the case, and can cause some serious discrepancy from what is experienced. Really, at it’s core, the issue is that of the “sub-second” framerate. Variations in the framerate, past the boundry of 1 second. This data comes to us in the form of frame times (and really that is what is important, as that should be what the end user/display “sees”). However this data really isn’t in an accessible form to really quantify/absorb, hence the whole subject the articles tackles of how best to present this info.

    What I’d always thought was that there is no need to actually abandon framerate. While FPS technically can’t penetrate the 1 second barrier that it’s name itself applies, the idea of what it represents still can. Basically if our problem is “How do we represent variations in the sub second frame rate?” why not still just use FPS? We have the frame times, and we know the rule that if a frame takes x milliseconds to render/display, then it’s framerate is simply 1000/x (assuming a constant frame rate over the entire second). So we can just use this relationship to convert back to “FPS” for each individual frame. At first glance it’s nonsensical, but if you think about it for a moment it holds up. The initial podcast examination, if we convert the frametimes, the 50ms that one long frame took, becomes 20FPS. So during that 1 second the sub-second-frame-rate(?) dropped to 20FPS for that one frame.

    With this in mind we can actually end up with “sub second” FPS measures (even min, max, and avg) for every second benchmarked.

    Now, considering that it’s variation that matters, we should actually be focusing on that. “How much did the individual frames differ from the value reported over the second?” We can then average that value out over the entire second. You can then have it’s own min-difference, avg-difference, and max-difference over the entire second. These values can then be averaged out over the entire run. Ignoring the “average of an average” part, you can collect these metrics and present them in interesting ways. (For example the “avg Max difference” can become an interesting metric, swayed both by how big the stutters are and how often they occur). Could even devise an avg-difference framerate display much like the current fraps, that shows how much variation between the traditional FPS and metric calculated.

    Anything that directly affects the end users experience and/or perception of the game, should hopefully be able to be captured and presented in a benchmark review. We have the tools to penetrate the “1 second barrier”, so all we need now are good ways of presenting/understanding that info. Keep up the excellent work!

    • novacatz
    • 8 years ago

    Considering that big issue is variability in the frame times – another metric you want to consider on top of the 99th percentile is an average of the times over the 99th percentile.

    Eg when using the 20ms threshold – even if the number of frames is large but the average over is just 30ms it might be ok but when the number of frames is low but the average over is 300ms that would be very noticeable

    • rootheday3
    • 8 years ago

    I would expect that the divergence between the FPS metric and the subjective experience due to micro stuttering would be most felt when the FPS of the 2 gpus was low to start with. For example, suppose you have a GPU that runs some game at 24fps at particular settings by itself – not smoothly playable. You pair it with an identical GPU – good scaling (70%+) would result in a Fraps FPS of ~41. But the subjective experience might be almost the same as the solo gpu – it might even feel worse.

    I see AMD touting that Llano can run in “dual graphics” mode with low-mid range discrete cards – exactly where we’d expect this scenario to arise. I’d be very interested to see to what extent micro stuttering is present in Llano+ discrete and what impact it has on the user experience. In other words, beyond the fact that “dual graphics” only works on DX10 and 11, does it actually help playability for the titles where it works? or is it a benchmarking ploy that actually makes little positive (or possibly even negative) user experience impact.

    • APWNH
    • 8 years ago

    I will echo many other commenters by mentioning that this article is yet another in a continuing tradition of excellence here at TR. Really, I’ve been around here for just under ten years, on and off, and I’m damn glad to say that the quality of the site hasn’t dropped a bit.

    I will mention that if you DO decide to use high-speed cameras (and I suspect a nice CRT capable of 200hz refresh may come in handy too) to do further investigation, it could only make the articles [b<]even better[/b<]. As an aspiring game programmer I applaud this shift in testing methodology towards the ms/f metric. It is much more useful for reasons that have already been gone over countless times. Yes, 1000fps+ cameras are still soul-crushingly expensive. But there was a point-and-shoot model that cost barely over $200 that I was thinking about getting, and this was several months ago at this point. I'm fairly sure it was capable of 400fps (or was it just 200fps?) high-speed at a pitiful resolution. It would still be enough to discern individual frames on any 60hz screen though!!!

    • Эльбрус
    • 8 years ago

    Hi,

    great article thanks!

    I now wonder if this testing method would be appropriate for CPU gaming tests, too.

    Some die-hard AMD fans always claim that their CPUs run games "smoother" because of the bigger L2 and no Hyper-Threading, or whatever – even if Intel's chips score (much) better FPS.

    I can't stand hearing that anymore, so I am looking for a method / tests to reject it.

    Thanks

    Elbrus

      • kn00tcn
      • 8 years ago

      I can't vouch for the AMD CPU stuff, but it's fairly common for people & even some game devs to say & show that HT causes either lower FPS or worse stutters compared to no HT.

      Calculations that just have to be as fast as possible without any consistency requirement, like video encoding, are perfect for HT, though.

    • clone
    • 8 years ago

    If The Tech Report switched to showing graphs similar to HardOCP's, strictly in apples-to-apples form, I'd be good with it. But as with this report, when too much information is given – way too much – it starts to get lost and ignored. I value TR's thoroughness, but I would feel remiss if I didn't mention a fear that the reviews may become too dry.

    This article was interesting and informative while easy to understand.

    The fear for me is that the reviews may require a link to this article just to be understood, especially if the graph count skyrockets across multiple formats, causing the articles to become bloated and dry.

    It's just a fear; I'm not saying it's going to happen.

    I've been following video card reviews since '96, and the old FPS count was OK back then; when sites started putting minimum frame rates alongside the max and average, things got a lot better.

    Micro-stuttering… I'll avoid multi-GPU configs from this point forward in order to err on the safe side, but aside from that, if it's not egregious I'm not too worried about it.

    • Coruscant
    • 8 years ago

    In a nutshell, the article exposes the pitfall of using an arbitrary timeframe to define an average rate. An implicit assumption in doing so was that the processing time variation was small between frames. The article did a great job disabusing the public of that notion.

    Moving forward

    Given that the following variables impact user-perceived performance:
    1. processing time per frame, previously measured as a minimum acceptable refresh rate
    2. variation in processing time

    Does it make sense to perform some basic statistical analysis of per frame processing times? Simple histograms with descriptive statistics would provide an objective measure from which comparisons between cards and driver revisions could be made.

    • Tamale
    • 8 years ago

    Scott, amazing work. I applaud your inherent gift for getting to the bottom of things and then thinking long and hard about the implications for the end-user and how you need to present the data carefully to prevent distorting the truth.

    I agree with the comments that are suggesting to find a place to store the raw data and allow many more people to help slice and dice the data in increasingly useful ways. Perhaps TR can kick-start a crowd-sourced data collection and analysis effort where anyone can submit their data in a normalized format along with their machine specs to help identify trends across the readerbase.

    I’m also thankful at how much this actually helps my upcoming video card purchase.. I think I’ll stay away from the tempting multi-GPU offerings for now and spring for a nicer single-board offering 🙂

    • bTx
    • 8 years ago

    It seems as though a rework of CFX and SLI may be in order… remove the added latency with some type of "Y"-split active display (or any digital video interface) so both cards could output directly to a monitor???

    • NeronetFi
    • 8 years ago

    Scott, excellent job on this article. Bravo to you sir 🙂

    • Dr. Zhivago
    • 8 years ago

    I too want to congratulate you, Scott, on the amazing article. TR has been my favorite site for tech news and product information for a long time. About as long as you've been around, I believe – 10 years? Anyway, great job as usual. It's articles like this that keep me recommending TR to people who ask me for the best site to visit to learn/read about computers and related things.

    Peace.

    • willyolio
    • 8 years ago

    Maybe we can add a "stutter" rating to video card reviews. It shouldn't be too much work in a spreadsheet.

    1. Get an FPS-over-time recording like you guys already do for some games.
    2. Calculate the average FPS.
    3. Calculate the RMS. Invert the value. The higher the value, the more stuttering you'll see (see the sketch below).

    Inverting the value, I think, just gives more weight to low FPS scores. People will notice a 30 FPS drop but not a 60 FPS gain if you're averaging 60 FPS already. So inverting would give low FPS more weight, which is really what we want to avoid with our high-end cards, don't we?
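    One possible reading of this recipe (my interpretation only; the comment could equally mean the RMS of per-second FPS samples): take the RMS of the individual frame times, which weights long frames more heavily than a plain mean does, and invert it into an FPS-like figure. A sketch:

    ```python
    import math

    def effective_fps(frame_times_ms):
        # Root-mean-square frame time, inverted into an FPS-like number
        # that gets dragged down by slow frames more than the plain average does.
        rms = math.sqrt(sum(t * t for t in frame_times_ms) / len(frame_times_ms))
        return 1000.0 / rms

    # Two runs with roughly the same average FPS: steady vs. alternating 8.3/25 ms.
    steady  = [16.7] * 120
    jittery = [8.3, 25.0] * 60
    print(effective_fps(steady), effective_fps(jittery))   # the jittery run scores lower
    ```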

    • XTF
    • 8 years ago

    Vsync is related to this issue. With vsync enabled and a 60 Hz display, the time between frames being displayed is a multiple of 1/60 s, so 16.7 ms (60 fps), 33.3 ms (30 fps), 50 ms (20 fps) or more. If each frame takes 17 ms to render, frame updates will be irregular if triple buffering is used. So, is triple buffering good or bad?
    The other important question is: why do LCD displays still have fixed refresh rates? Why can't the frame be sent to the display as soon as it's ready? It would make triple buffering unnecessary, reduce latency, and allow for more regular display updates.

      • kn00tcn
      • 8 years ago

      er, images ARE sent to the display as soon as they’re ready, how else does tearing exist? a piece of 1 image, a piece of another (or even more if the fps is > 2x faster than refresh)

    • FireGryphon
    • 8 years ago

    Excellent article. In fact, a gem. A masterpiece. The idea is kind of like what Hard|OCP did years ago with ‘playability’ testing to test for fluidity. This is much more scientific, though, and was a pleasure to read. It’s articles like this that keep me coming back for more TR.

    • kilkennycat
    • 8 years ago

    Scott,

    I noticed something odd in your analysis with regard to the GTX560 Ti and Bulletstorm In the Multi-GPU and Bulletstorm session (Page 7) , You had set up the display to 2560×1600 and mentioned the following:-

    [i<]The worst offender is the GeForce GTX 560 Ti SLI config, which performs poorly throughout. Our guess is that these 1GB cards are running low on total memory, which is why they struggle so mightily. Yes, the 6870 cards also have 1GB, but AMD has long been better at avoiding memory size constraints.[/i<] and later: [i<] The GeForce GTX 560 Ti SLI setup is a mess, with over 500 frames north of 50 ms; we've excluded it from the chart above so it doesn't throw off the scale. Beyond that, the multi-GPU solutions do a reasonably decent job of avoiding the longest frame times, generally speaking. The obvious exception is the Radeon HD 6870 CrossFireX config, which may be feeling a bit of memory size pressure, though not as acutely as the GTX 560 Ti SLI pair. [/i<] I would VERY respectfully suggest that you repeat all of the Multi-GPU Bulletstorm tests with ALL of the card combos at 1920x1080. You may get a big surprise when you come to analyse the results. Lowering the picture resolution to a level known to be well within the memory limit of ALL of the individual graphics cards in the sample set is vital to nullifying the effects of graphics memory shortage as a spurious contributor to graphics stutter. The effects of picture-resolution/graphics-memory size on stutter should certainly be the subject of a related but separate well-controlled analysis.

      • Chrispy_
      • 8 years ago

      What would that do, other than waste Scott’s time? We all know what the results would look like.

      If I were dumping around $500 on a pair of 560 Ti cards, I wouldn't want to be limited to the decidedly mainstream 1080p resolution. Especially not when so many other similarly priced options do so much better at 2560×1600.

      The point is that the 560 Ti SLI setup is a mess. We *KNOW* how it performs at lower resolutions. This article illustrates the problems you see when going to the limits with an SLI setup compared to a single card.

        • Bensam123
        • 8 years ago

        It would reduce the overall resolution that the cards have to render and produce smoother FPS, since the stutters would be less pronounced. Similar to running all the tests at 800×600.

        • kn00tcn
        • 8 years ago

        why would you clutter up the results by adding the variable of ‘some games use > 1gb at 2560×1600’

        this is investigating natural stutter, not induced stutter by running out of vram (which a single card will hit as well)

    • MrJP
    • 8 years ago

    I've been running 4850 CF for three years now, and I haven't noticed any massively significant issues with micro-stuttering or inconsistent framerates in general. However, I strongly suspect that different people may have different sensitivity to this, and I'm sure it's also affected by the other hardware you're using. I'm using a monitor with an MVA panel which isn't particularly fast, and this in itself has never particularly bothered me even in FPS games. I'm sure I'll be paying more attention looking for issues from now on, though, which is the only downside of a really great article!

    • MikkleThePickle
    • 8 years ago

    Excellent work as always, Scott. Thanks for the hard work.

    • JoeKiller
    • 8 years ago

    I have a GTX 295 and have stutter issues with TF2 all the time. I've learned to deal with it, but I'm about to try turning off SLI on the 295 to see if it helps.

    Running a 75 Hz display, getting a 120 Hz LG W2363D. Beware of the LG, though: I bought it new, the left side was busted, got a replacement, the right side was busted. LG is going to try to repair this brand-new display or replace it. Blah. But it's supposed to have the least input lag, with its thru mode, of anything I've seen.

    I support the high speed camera evaluation but what if you have to test the camera for its jitter as well, oh noooo!

    • ThorAxe
    • 8 years ago

    This may well be the most groundbreaking article that I have ever read at The Tech Report. It has particular significance to me since I have been using multi-GPU setups since the 8800 GTX series.

    • Pantsu
    • 8 years ago

    I did a short YouTube vid testing FC2 stuttering, if anyone wants to see what it should look like:
    http://www.youtube.com/watch?v=zb3MsENJ-fU

    I know YouTube re-encodes the video to a lower framerate, but it is still noticeable, at least for me. See if you can spot it!

      • odizzido
      • 8 years ago

      That ran like ass. The video on the left was better, but still pretty terrible.

      • JoeKiller
      • 8 years ago

      This is exactly what my GTX295 looks like in TF2, it is awful. Looks like GTX 570 all the way eh?

    • vexe
    • 8 years ago

    Registered to say – best article I’ve ever read here or any other enthusiast-consumer tech site…

    Pure data pron…absolutely graphtastic.

    Really appreciate the work you put into this.

    Bravo!

    • netkcid
    • 8 years ago

    The issue here isn’t with the hardware or the drivers, it is the games not being friendly to SLI or CFX. There’s a reason both companies have ‘profiles’, they are fixing stuff the games are doing.

    • PenGun
    • 8 years ago

    I will point out that most mobos drop to 8 lanes of PCIe when two cards are SLI'd. I bought my GTX 460s because they just fit inside 8 lanes' worth of bandwidth. I imagine some of these faster cards are tripping over their bandwidth.

    • Lianna
    • 8 years ago

    Great article – I'm happy someone tried to measure and analyze this problem. And this comes just after the great article on tessellation and the lack of common sense in implementing it (to say nothing of ill will, because it can't be proven).

    To be frank, I thought the "low" you posted in previous reviews was something like the median of the slowest percentile of frame times, just inverted to produce FPS, so I thought you were measuring this already; alas. In most reviews here and on other sites, I have always thought that "low" or a similar metric was much more useful as a playability metric than the average.
    First, please use the metric. It would be nice if it were used in a coherent manner, so in an FPS diagram you could chart with stacked colors the 99th (dark green), 95th (green), 75th (yellow) and 50th (orange) percentiles (just like you used average and "low", whatever it was, in previous reviews).

    Second, full frame-time data charts may be used instead of the current per-second-average charts. You may just invert the data to show FPS instead of frame time. This would enhance the data already served.

    Third, please show the average number of dips over 50 ms per, say, 10 seconds. If you count it per whole run, with a different number of frames (and time?) in each card's run, how do we know how often the lag manifests itself – is it once a minute or once per 3 seconds?

    I'm sorry in advance for the long post. Even with proper conclusions in this article, I guess you have some of your facts wrong. I did program drawing in DX a few years ago (and read quite a bit about game programming on the way), so while you may want to check on this, the process looks as follows:

    1) You get your current game time, thus checking how much time passed since the last game time.

    2) You prepare your game data according to the game time from (1), so if e.g. a unit has to move 100.0% of screen width in 1 second, and 20 ms passed since the last game frame, you move it 2.0% (if 50 ms passed, you would move it 5.0%). This way game time moves properly, unlike old games, even in the late 90s, that tied some relative speeds to framerate, so if your PC was faster, things moved faster. This is also a reason why some games have built-in framerate caps – there's a risk that updating much more often than, say, 60 fps (e.g. 200 fps) would make the simulation unstable, e.g. making different things move a lot faster/slower than they should due to the calculation of these effects; so they play it safe. The other way around, if the framerate falls below, say, 10 fps, you may want to calculate two or more steps of the simulation during one game frame to properly calculate effects. Some games calculate a set number of simulation steps per second (for consistency) – so if the framerate is higher, they just skip calculating for the next frame (or interpolate calculated results); if the framerate is lower, they do enough steps in one frame to be on time.

    3) When you have the simulation done, you draw objects into the DX back buffer and say "Flip", and this is where Fraps takes its timestamp.

    4) If vsync is on, when the screen is to be redrawn, the most recent ready screen is output. When vsync is off, the most recent ready screen is switched to view, even in the middle of a screen refresh.

    Now the funny part is that the timing heavily depends on buffering (double/triple; was there quadruple in Xfire?) and driver threading, because when you say "Flip", you are blocked from drawing again until the drivers tell you it's OK. If double buffering is selected, you have to wait until the last frame has actually been drawn to the back buffer AND until the currently displayed buffer finishes being shown. If triple buffering is selected, you wait just for an available back buffer to show up, which may mean waiting for the current drawing to the back buffer to actually finish (on a single GPU) or for either of the two GPUs' buffers to become free in Xfire/SLI.

    Anyway, whether you do game-time simulation in separate thread(s) or not, you cannot send the "Flip" command faster than the drivers allow you to, so your Fraps timestamp captures not the time of the buffer being shown but, depending on buffering, the timestamp of when the previous, or the one before the previous (triple buffering with Xfire/SLI), frame finished drawing – so it is a useful measure anyway, just one or two frames late. With vsync off, this is the time the new picture is being displayed, unless there is "frame metering". You may ask Nvidia whether frame metering is used just for delaying showing the buffer, or for delaying the return from Flip, too (in the latter case, it would be included in Fraps results). With vsync on, 0-16.6 ms passes before the next screen redraw, when the new picture appears. With vsync on, triple buffering AND a fast enough card, statistically less time passes between a fresh picture and the screen redraw. But the other effect is that alternating, say, 10 ms and 40 ms frame times may mean that, depending on buffering and game threading, you may see 10 ms of game time pass after 10 ms of display time and 40 ms after 40 ms, which may look quite OK (objects show up where they should after that time), OR you may see 10 ms of game time pass after 40 ms and 40 ms after 10 ms, which would look really awful – objects moving fast after a short time and moving slowly after a long time. This effect may be unimportant when dealing with frame times of less than 30 ms. For all of you experiencing unpleasant stuttering in games, please check whether toggling triple buffering on or off changes the effect.

    • truser1
    • 8 years ago

    I’ve seen this before
    http://forum.avsim.net/topic/329356-i5-2500k-vs-ram-frequency-and-latency/page__view__findpost__p__1942411
    http://forum.avsim.net/topic/329356-i5-2500k-vs-ram-frequency-and-latency/page__view__findpost__p__1942624

    • Bensam123
    • 8 years ago

    average FPS / variance = profit?

    Brought to you by the TL:DR Bensam123 visibility committee for a better tomorrow.

    • alpha754293
    • 8 years ago

    And people are only finding out about this NOW??? nVidia has LONG been known to drop frames that take "too long to render" wayyy back in the day.

    In fact, you used to be able to get a DVI/DE-15 dongle that actually counted the actual frames being sent to screen.

    And this is why testing games, especially OpenGL games with cards like former 3DLabs’ Realizm series can reveal a LOT of problems with the games themselves. And people thought that I was crazy when I was doing PRECISELY that back in like 2006 or whenever it was that I got both my Realizm 200 and 800. And those cards also support DX9 too, and even then games SUCKED on those. Some people (simple people) say that it is because the cards aren’t made for it. I contend (and still do) that it’s cuz the games weren’t coded properly to begin with. And the Realizm draws EVERYTHING EXACTLY as it is written. Unlike ATi/AMD/nVidia, the cards don’t compensate for coding errors/stupidity. Even non-games like Second Life popped up with issues. (Brown hair became an awesome shade of purple.)

    And considering that those cards were designed to handle massive engineering and scientific datasets, it ought to have more than enough power to run the games as-is.

    *sigh*…so yea. This doesn’t surprise me at all. I’m sure that you should be able to find my fps rant from 2006-2007 or so.

    • Sencapri
    • 8 years ago

    I want more articles!!! I can’t get enough! this review and the EPIC Intel i7 990x review! I love this site. 😀

    • odizzido
    • 8 years ago

    I don’t like multi-gpu setups because of the inconsistent frame rates. If I had the option to add another 5850 to my system for free, I wouldn’t take it.

    What I like to do is turn on v-sync and make sure my frame rate never drops below 60. I find that is the only way to get a smooth experience. For me, the best way to measure performance is to show average FPS as well as how much of the time the game is running below 60 FPS (frames that take longer than 16.7 ms).

    So it would say something like 90FPS average with 15% of those frames below 60FPS. Of course how much longer than 16.7ms the frame took is important as well so perhaps an average, or something better, of those 15% above 16.7ms would be handy to have as well.

    • anotherengineer
    • 8 years ago

    Great article. However the person that asked the question should probably get the real thanks.

    “I was having dinner with [u<][b<]Ramsom Koay[/b<][/u<], the PR rep from Thermaltake. He's an inquisitive guy, and he wanted to know the answer to what seemed like a simple question: why does anyone need a faster video card, so long as a relatively cheap one will produce 30 frames per second? And what's the deal with more FPS, anyway? Who needs it?" Edit - Scott maybe it's time to get a Samsung 2233rz 120hz panel for game testing, and run it at 120Hz in 2D and forget the glasses and 3D crap. [url<]http://www.tftcentral.co.uk/reviews/samsung_2233rz.htm[/url<]

    • user99
    • 8 years ago

    We need to focus on more than just FPS. This applies to both First Person Shooters as well as Frames Per Second

    • basket687
    • 8 years ago

    Simply one of the best articles I have ever read.

    • goshreport
    • 8 years ago

    First off – great start! We need to focus on more than just FPS.

    But as other posters have already mentioned, instead of focusing on arbitrary cutoffs like 20 ms or 50 ms, or even the 99th percentile, the data would be much better analyzed with measures of variance such as range, mean deviation, variance, and especially **standard deviation**. These metrics are perfect for highlighting how much individual data points differ from the mean, and would easily bring the spikes in the data (jitters) to the forefront.
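    A quick sketch of those dispersion measures over a frame-time log, using Python's statistics module (the input list is assumed to be a Fraps-style frame-time dump in milliseconds):

    ```python
    import statistics

    def dispersion(frame_times_ms):
        mean = statistics.mean(frame_times_ms)
        return {
            "range": max(frame_times_ms) - min(frame_times_ms),
            "mean deviation": statistics.mean(abs(t - mean) for t in frame_times_ms),
            "variance": statistics.pvariance(frame_times_ms),
            "std deviation": statistics.pstdev(frame_times_ms),
        }

    # A single 45 ms spike among ~16 ms frames inflates every one of these measures.
    print(dispersion([16, 17, 16, 45, 16, 18]))
    ```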

    • stan4
    • 8 years ago

    Awesome article, this is how real engineers analyze performance.

      • Captain Ned
      • 8 years ago

      Including Scott pointing out to the world the areas where his own analyses may be built on faulty premises.

      This article is falsifiable, which means it is science. There’s a bunch more work to be done, especially in exposing and extracting the true timings from the graphics subsystems, but this is the paper that will launch a complete re-think of GFX performance. It was and will be nice to be in on it from Day 1.

      If there is a peer-reviewed journal dealing with computer graphics, this article should be reformatted as an academic paper (you still remember how to do those, Scott?) and submitted.

    • End User
    • 8 years ago

    I’ll have to try a local only game. I play TF2 every day and I doubt I can pick out stutter over network lag.

    • teeks99
    • 8 years ago

    I’d really like to see a stat as to how many milliseconds over the 60 second test are spent more than 50ms after frame start.

    For example, with the following frames:
    20, 30, 40, 50, 60, 70, 60, 50, 40, 30, 20
    The cumulative amount over 50ms would be 40ms (60-50 + 70-50 + 60-50).

    This way, you have some indication of how long you are staring at stale data on the screen.
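    In code, that measure is just the sum of each frame's excess over the threshold; a small sketch reproducing the worked example above (threshold and units as in the comment):

    ```python
    def time_beyond_threshold(frame_times_ms, threshold_ms=50):
        # Total milliseconds spent waiting past the threshold, summed over all frames.
        return sum(max(0, t - threshold_ms) for t in frame_times_ms)

    frames = [20, 30, 40, 50, 60, 70, 60, 50, 40, 30, 20]
    print(time_beyond_threshold(frames))   # 40, i.e. (60-50) + (70-50) + (60-50)
    ```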

    • DPete27
    • 8 years ago

    First off, I would like to congratulate Scott on putting some thought into his work. I agree with many of the other posters that many websites abuse / put too much emphasis on average frame rates – the kind of mentality that "everyone else does it this way, so I will too." I don't care if my graphics card can output 100 FPS average; if it hiccups or jitters regularly, it's a dealbreaker. The problem with that is, you've already purchased said card by reading average-FPS reviews. Frustrating.

    With that said, keep in mind the end game: readers of graphics card reviews are trying to figure out which products to buy over others. While the 99th-percentile method is admittedly better than using average FPS, as mentioned at the end of the article, FRAPS does not represent the result displayed to the end user, which, as Petersen stated, is modified after passing through the point at which FRAPS takes its readings. I would imagine that the jitter variance discussed in the article would be lessened to some degree or another if the actual output to the monitor were recorded.

    Lastly, (and this is more up to the reader to remember) system performance and bottlenecks can have a significant impact on FPS results as well. The machines used for graphics card benchmarking on many websites including TR purposely attempt to eliminate any system bottlenecks except for the graphics card (ie top end processor, ram, SSD, etc) but many consumer systems may not have these luxuries due to a little thing called “budget.” So then the question comes, if you have a graphics card that makes up a third or more of your system cost (for example) could the rest of the system be introducing jitter or slide show-ing to your otherwise silky smooth graphics card?…I would say yes.

    • helix
    • 8 years ago

    It’s the outliers that are the nasty ones!

    The 99th-percentile frame time seems good for measuring the fluidity between the (macro) stutters. But the macro stutters (the spikes in frame time) are the stuff we don't want to see in games. At 60 frames per second there could be a macro stutter every two seconds (>100 frames) and it would not show up in the 99th percentile.

    Counting spikes as done in the article is important. But I think it could be good to use a proportionate cutoff level instead of a static 20 ms or 50 ms (a rough sketch follows below):
    1. Calculate the 99th percentile frame time. = baseline
    2. Count number of frames with frametimes greater than 1.3 x baseline. = spike count.

    Here the number 1.3 is an arbitrary guess at what would be a noticeable difference in frame rate.

    The 99th percentile is a good measure of the card's graphics power.
    To evaluate how well the card handles the workload, the spike count is just as important.
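    A sketch of that two-step recipe with numpy; the 1.3 multiplier is, as noted, an arbitrary guess rather than a validated threshold:

    ```python
    import numpy as np

    def spike_count(frame_times_ms, pct=99, factor=1.3):
        times = np.asarray(frame_times_ms, dtype=float)
        baseline = np.percentile(times, pct)            # step 1: 99th-percentile frame time
        return int(np.sum(times > factor * baseline))   # step 2: frames far beyond it

    # Ten 80 ms hitches in an otherwise ~20 ms run register as ten spikes,
    # without needing any fixed 20 ms or 50 ms cutoff.
    print(spike_count([20] * 990 + [80] * 10))   # 10
    ```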

      • Firestarter
      • 8 years ago

      Maybe the average or median of the top 1st percentile of frame times is a good compromise? As in, a card that has a lot of spikes in the frame time will show a pretty high number in that metric, while a card that has no spikes or very little will score a pretty low (good) number.

      Also, maybe it's better to not consider the 99th and 1st percentiles by frame count, but rather by total frame time. Consider, for example, a benchmark that runs for exactly 60 seconds. That benchmark will produce frames with frame times that should add up to ~60,000 ms. If you sort the frames by frame time and take the top X frames that add up to at least 600 ms of total frame time, one big outlier and a few slow frames, or a bunch of really weird 100 ms stutters, would show up pretty nicely, I guess. Of course, this method would degrade quickly with shorter benchmarks as the sample becomes too small.

      Or alternatively, if you're getting all sciency up in here anyway, why not plot the frame time sample population anyway? Big variance would show up pretty nicely I guess, and you could even fit several graphics cards in one graph 😎
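      A sketch of the "worst slice by total time" idea: sort frames slowest-first and keep adding them until they account for 1% of the run's total time, then report how many frames that took and how slow they were on average. The choice of summary values is mine, not Firestarter's:

      ```python
      def worst_slice_by_time(frame_times_ms, share=0.01):
          budget = share * sum(frame_times_ms)   # e.g. ~600 ms of a 60,000 ms run
          worst, accumulated = [], 0.0
          for t in sorted(frame_times_ms, reverse=True):
              worst.append(t)
              accumulated += t
              if accumulated >= budget:
                  break
          return len(worst), sum(worst) / len(worst)   # frame count and mean of the slice

      # Roughly a 60-second run at ~16 ms per frame with a single 500 ms hitch in it:
      # the hitch plus a handful of ordinary frames fills the whole 1% budget.
      print(worst_slice_by_time([16] * 3720 + [500]))
      ```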

      • truser1
      • 8 years ago

      Exactly. The percentile needs to be based on the average for each card & test (like standard deviation is), not an arbitrary value like 50ms, to be really significant. Of course a lesser card will have more frames above 50ms than a top tier GPU, but that doesn’t tell us too much about spikes really

      • Bensam123
      • 8 years ago

      How about calculating variance and standard deviation?

    • Rakhmaninov3
    • 8 years ago

    Scott is so smart! SMRT!

    • d0g_p00p
    • 8 years ago

    On page 4 of this article, the link to the GTX 590 is broken. It links back to the same page you are reading.

    • marvelous
    • 8 years ago

    Good write up. Finally someone realizes how flawed the system is.

    • gregsw
    • 8 years ago

    What is really important is whether the frames are displayed at the correct time. Frames that take too long to render can cause problems here, but it depends on the depth of the pipeline. You can still get a very smooth display, regardless of how long each individual frame took to render, if the pipeline is sufficiently deep. The thing you sacrifice is interactivity, because it takes longer for a frame corresponding to user input to make it to the display. The techniques the graphics card makers mentioned in the article use are intended to try to even out frame display times, regardless of how long the individual frames took to render.

    I think you need to measure stutter and interactivity separately.

    Stutter is best measured by comparing when each frame actually hits the display and when it should have hit the display. The higher the variance the more perceptible the stutter will be.

    Interactivity is easy to judge subjectively but harder to measure in a game situation. You really want to measure the time from user input till the time the display reflects that input. You can do this in contrived systems (switch the color of the display between black and white when a key is pressed) but I don’t know how to do it in a game.

      • yokem55
      • 8 years ago

      Carmack has written (or was it in an interview? – don't remember) about the effects of input latency in games. He pointed out how in some game engines it's horrible – on the order of 100 ms or so. And then there is the problem of display latency – this is why some TVs have a "Gaming Mode" which turns off a lot of the built-in video processing to reduce display latency. All in all, there is a heck of a lot more involved in making a fluid gaming experience than just getting the video card running fast enough…

        • gregsw
        • 8 years ago

        Absolutely, there is more to it than a graphics card. However, that is what the article is talking about measuring. If the input latency varies when running the game with different graphics cards, I think you can attribute it to the card and/or drivers.

    • name
    • 8 years ago

    Bah, where’s the all-in-one-page option? Anyhow.

    The sneaky thing with averages comes down to widespread misuse. That's right, just about everyone uses averages wrong. If you've had basic statistics in high school (and who hasn't?) then you'll also have been told about that strange little number called the standard deviation.

    That number should be given too whenever averages pop up, but most of the time it's not even calculated from the data. It should be given because it tells you something about how far the outliers lie from the average. That is, if there are many large spikes in otherwise commendably low frame times, the standard deviation will go up.

    Minimum and maximum are also useful of course, especially here, where uniform frame rates are actually more important than how high they are, as long as they're over some basic number like 25. If they're wide apart, that's reason enough to start digging in.

    • indeego
    • 8 years ago

    The stutter comes from spreading the article over 12 pages.

    I kid. Sorta.

    • LaChupacabra
    • 8 years ago

    Going to sound redundant now, but awesome article. Thanks.

    • gbcrush
    • 8 years ago

    Great article Scott!

    Just about everyone’s said it, but I’ll add my 2 cents behind it because this article deserves to have the weight of its effect felt.

    Not only was “PC Hardware Explored” deeply and thoroughly here, it was presented so that those of us who have based our understanding on FPS scores for years can comprehend the change in thinking this article suggests. And that’s really the kicker isn’t it. This article carries its weight because it argues we’ll have to adjust our thinking a bit.

    One more reason for me to like TR. I’m looking forward to seeing where this goes.

    • Dposcorp
    • 8 years ago

    Wow, just wow. What a great article. Just awesome, and just one more reason why TR is the cream of the crop for PC / tech nerds. Like it says under the main site logo, PC HARDWARE EXPLORED. Indeed.

    • dashbarron
    • 8 years ago

    "...but some folks probably would have simply looked at the numbers and flipped to the next game without paying attention to our commentary. (Ahem.)"

    Who, meeeeeeeee?

    Great article Scott, so this was the super secret big project you were working on; I hope you're starting a technology revolution. Maybe all that religious training is helping you to make prophecies – Scott the Prophet?

    • TurtlePerson2
    • 8 years ago

    You have really outdone yourself this time Scott. It’s good to see TR pioneering new performance metrics in benchmarking hardware. Micro-stutter is something that I was not even aware of, but it’s surprising to see how much can occur even on top-of-the-line cards.

    Congratulations on being Slashdotted.

    • Kharnellius
    • 8 years ago

    I haven’t read a full article in a while. This was a great read! Thanks for writing this up. 🙂

    • YellaChicken
    • 8 years ago

    To mirror what a lot of people have been saying Scott, that’s a really great article.

    It's got me thinking too… sorry if anyone's already mentioned this in the comments, I haven't read through all of them. I recall reading, a long time ago, about how overclocking can sometimes adversely affect minimum frame rates in some situations. While reading this article today and seeing your methods, I figured this could be a great way to see whether there's any obvious negative effect on the visual experience from overclocking, or any increase in micro-stutter.

      • kamikaziechameleon
      • 8 years ago

      Part 2 will be micro-stutter and visual quality degradation from OCing. I would start simply by testing some of the more questionable factory-overclocked cards; I've personally had issues with this, and I'm sure that sampling a couple of different vendors (start with PNY) will yield a similar result.

      • Bensam123
      • 8 years ago

      Interesting… I've always un-overclocked my hardware as soon as I get it because it sometimes makes things feel really weird. I had nothing to back it up and it seemed completely subjective, so I never thought anything more of it than a preference.

    • SuperSpy
    • 8 years ago

    This is the reason I read my reviews here. Excellent article, it opened my eyes to a new understanding of graphical performance, and I hope it becomes an industry standard measurement.

    I am also concerned with FRAPS’ measurement being somewhat skewed or limited, and will be very interested in seeing ‘real’ timing information at the display.

    • Chrispy_
    • 8 years ago

    Excellent article. This reinforces my experience of SLI not working as well as the average FPS benchmarks indicate.

    I bought a pair of GTX460’s to go in my main box when my 8800(512) died. They were cheap and an SLI solution seemed better on paper than the noisy, hot, and power-hungry 470 and 480 at the time.

    In practice, the SLI experience increased average framerate but didn’t really help with minimum framerate much. The areas where a single card struggled were also areas where the pair struggled. However, on a single card, the normal framerate of 50fps would drop to 30fps when drawing messy scenes. With SLI, the normal framerate of 85fps would drop to 35fps.

    Now, even though I was vaguely aware of a less ‘even’ framerate in terms of micro-stutter, it never seemed to bother me very much (I play with vsync off by habit, so I’m used to a little stuttering anyway). What did bother me was the night-and-day difference between high and low scenes with SLI. Even though a minimum framerate is higher on the SLI config, it is the difference between buttery fluidity and choppiness. A single card sets a lower normal result for your eyes to accept, so when the framerate dips to 30fps, it’s not as noticeable a change.

    I would describe this effect as analogous to how we feel temperature. If you move your hand from warm water to hotter water, the body barely notices, yet if you move your hand from cold water to the same hotter water, it feels like you're being scalded. The human brain detects changes better than it can measure absolutes. SLI introduces bigger variations in framerate, which makes motion less convincing because your brain has to "fill in more gaps" itself to maintain the illusion.

    • Rikki-Tikki-Tavi
    • 8 years ago

    Sublime article! Now what’s left is to intercept the video output of the cards and use an Arduino with a video shield to determine the actual frame times.

    • demalion
    • 8 years ago

    This reminds me of a discussion from some years ago on Beyond3D: http://forum.beyond3d.com/showthread.php?t=2231. Along with some solutions and thoughts on the basic issue, there was also some info provided by Dan Vogel on how the UT engine offered benchmarking figures in a way that facilitated looking at data in this way, which might be pertinent to current versions of the Unreal Engine and games that use it as well. I haven't kept track of whether this feature was maintained or might have evolved to give more useful information for investigating this kind of thing…?

    • xtremevarun
    • 8 years ago

    You just beamed us Techreport readers up to a whole new idea of evaluating GPUs, Scotty! Great read BTW.

    • Peldor
    • 8 years ago

    First, great article.

    "Because the metering delay is presumably inserted between T_render and T_display, Fraps would miss it entirely. That means all of our SLI data on the preceding pages might not track with how frames are presented to the user."

    Second, I'm not that surprised, and I was going to comment on checking your tools carefully even before I got this far in the article. When you start to slice things so finely, you have to be very sure your tool is not introducing its own artifacts. Is the steak serrated, or the knife?

    Last, I think you should consider converting your frame time charts back to FPS, because large differences get lost in the bottom of the graph. Take the fourth graph on page 8. Visually the difference between the first four cards looks quite small, but it's on the order of 20%. The difference between 16 ms and 20 ms will always appear small when your chart is scaled to 80 ms or more.

      • willmore
      • 8 years ago

      I'd like to see the frame times displayed as a normalized histogram. Label it with quintiles or deciles, and break down the upper decile or quintile a bit more so we can pick out the 90th, 99th, etc. percentiles easily. That's trivial to script and would show a lot of data in a meaningful way.
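      A rough sketch of that kind of normalized breakdown using numpy; the bin edges and the percentiles called out are arbitrary choices of mine:

      ```python
      import numpy as np

      def frame_time_histogram(frame_times_ms, edges=(0, 10, 20, 30, 40, 50, 100, 1000)):
          times = np.asarray(frame_times_ms, dtype=float)
          counts, _ = np.histogram(times, bins=edges)
          shares = counts / counts.sum()               # fraction of all frames per bin
          marks = {p: np.percentile(times, p) for p in (50, 90, 95, 99)}
          return dict(zip(zip(edges[:-1], edges[1:]), shares)), marks

      shares, marks = frame_time_histogram([14, 15, 16, 17, 18, 35, 16, 15, 70, 16])
      print(shares)   # share of frames landing in each (low, high) bin
      print(marks)    # 50th/90th/95th/99th percentile frame times
      ```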

      Like you, I was surprised at what FRAPS actually measured. Having been watching the 3D development on Linux for a long time, I was aware that a lot of drivers had callbacks for various render events so that you can precisely time when rendering and display tasks actually occur. I’m sort of surprised to find out that Windows drivers lack that ability. It would be nice to see nVidia and AMD quickly work to add that and for FRAPS (or some other tool) to report it.

      Plus, great article, Scott. Great detective work there!

        • SuperSpy
        • 8 years ago

        The only thing that worries me about using driver hooks is the inevitable messing with the location that hook is placed by AMD/nVidia in order to try and game the results ala the old quack3/3dmurk tricks. It would be nice for that hook to exist somehow in DirectX, so it’s controlled by at least a semi-neutral party.

          • willmore
          • 8 years ago

          If it’s lying, that would be pretty easy to verify with physical measurements, unless–like you suggest–that they do it based on executable name or other evil tricks. Then again, we know they do stuff like that and we will hold their feet to the fire. 🙂

      • jensend
      • 8 years ago

      I disagree about converting to FPS. It may be a large difference in the frame rate, but it's important to see that it's a small difference in frame times – and thus in perceptual quality and reaction time, which are what we really care about. The two measurements are inverses, and I think ms per frame is a much less misleading measure than its inverse.

      Another example where the same thing happens is fuel consumption: we keep talking about miles per gallon, but what we primarily care about is the fuel consumed in our driving, not the driving we can do on a given amount of fuel, so this is misleading. To use wikipedia’s example, people would be surprised to realize that the move from 15mpg to 19mpg (saving 1.4 gallons per 100 miles) has a much bigger environmental and economic impact than the move from 34mpg to 44mpg (saving 2/3 of a gallon per 100 miles).

      Similarly, moving from 24 fps to 32 fps has a bigger impact on the illusion of motion, fluidity, and response times than moving from 40 fps to 60 fps. I think everyone should have been using ms per frame all along.

    • Bensam123
    • 8 years ago

    I don’t know if I discussed this with you or Geoff years ago. I emailed one of you discussing something along these lines as I never thought average FPS covered everything, which it doesn’t. What I got at the time was that certain websites were looking into it by testing with v-sync on and it was left at there.

    I understand completely what you're doing; however, I don't think you accomplished completely what you intended to do. Essentially what you did was flip the FPS upside down: instead of graphing things based on frames per second, you graphed them based on how many seconds per frame. It's the same thing.

    The only real difference was that you broke it down into smaller increments, which does show off what you’re talking about better as you can see the peaks. You changed the scale and that’s why there is a difference in graphs.

    I think a more accurate approach to accomplishing what you're thinking of is variance, specifically variance over time. This can be graphed, given as a single number, plotted, or otherwise used to show how far things differ from the norm. Standard deviation could also be used (what you were looking for when talking about percentiles) to show how far cards fall outside of the normal frame rate for that specific card, using the card's average FPS as a baseline, which would be different from card to card. The variance or standard deviation could then be divided by the average FPS to identify the card with the highest average frame rate and the most fluid gaming experience… essentially the best pick.

    This is really no different than analyzing statistics for anything else; SPSS works wonders with this. The average, a lot of the time, does not show everything, especially when you start getting into more specialized circumstances.

    For instance:
    For instance:
    https://techreport.com/articles.x/21516/6

    The bottom graph, which looks ridiculously confusing, could be summed up in one number for each of the cards based on variance and, taken further, variance per average FPS. Curiously, you started talking about variance at the end, yet besides some general observations, you didn't make any sort of tables or analysis based on it…
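    A sketch of the "variance over time" idea: compute the frame-time standard deviation inside successive windows and normalize it by that window's own mean, which is essentially a windowed coefficient of variation. The window size and the normalization are my assumptions:

    ```python
    import numpy as np

    def windowed_variation(frame_times_ms, window=60):
        # Per-window standard deviation divided by that window's mean frame time.
        times = np.asarray(frame_times_ms, dtype=float)
        out = []
        for start in range(0, len(times) - window + 1, window):
            chunk = times[start:start + window]
            out.append(chunk.std() / chunk.mean())
        return np.array(out)

    # A jittery stretch in the middle of an otherwise smooth run stands out clearly.
    run = [16.7] * 120 + [10, 40] * 30 + [16.7] * 120
    print(windowed_variation(run).round(2))   # [0.  0.  0.6  0.  0. ]
    ```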

      • Bensam123
      • 8 years ago

      “More intriguing is another possibility Nalasco mentioned: a “smarter” version of vsync that presumably controls frame flips with an eye toward ensuring a user perception of fluid motion. We think that approach has potential, but Nalasco was talking only of a future prospect, not a currently implemented technology. He admitted AMD can’t say it has “a water-tight solution yet.””

      If it's a buffer, I don't think that will improve this issue. Adding another layer of latency on top of the mix won't cure anything; it'll just help hide the problem. Coincidentally, further down you mentioned something like this was already in place in a different manner… are they actually one and the same?

      More interesting, I didn't know you guys tested games with triple buffering on; if I had, I would've commented that it probably doesn't give the most accurate numbers, as it introduces a layer of latency to help smooth things out.

      “So long as the lag isn’t too great, metering frame output in this fashion has the potential to alleviate perceived jitter. It’s not a perfect solution, though. With Fraps, we can measure the differences between presentation times, when frames are presented to the DirectX API.”

      That’d actually be a pretty interesting thing to add to the data. That is another source of latency besides fluidity. If that can be measured it would be a great thing to add, I don’t know why you didn’t add it to the results already if you could…

      “Nalasco did say AMD may be paying more attention to these issues going forward because of its focus on exotic multi-GPU configurations like the Dual Graphics feature attached to the Llano APU. Because such configs involve asymmetry between GPUs, they’re potentially even more prone to jitter issues than symmetrical CrossFireX or SLI solutions.”

      That's why I'd never run a multi-GPU rig, especially one with a new fast card and an old slow one. They don't just magically pool their resources together and make everything better, unless one is being used to offload a different task… like physics.

      On a slightly off tangent that relates to this: I had a conversation with one of my friends about large-scale physics and how it eventually runs into a bottleneck, not in terms of being able to process it, but rather in terms of how much latency is involved between completing the calculation and gathering all the data before it becomes completely irrelevant. Essentially, when an object interacts with something else it starts a chain reaction, which is updated like a wave. Each wave of updates has to interact with updates that are already complete. You eventually start running into an issue where updates can't keep up with each other, and by the time they finally all update, the data is completely irrelevant.

      Looks like a good start either way, I look forward to seeing what you pursue on this.

      Concerning micro-stuttering, something else you may want to look into… I’ve been messing around with Hyper-Threading and from what I can tell, it causes it. Turning off hyper-threading makes a more fluid experience in game and just around the desktop. I don’t know if that’s a problem with windows or the processors, but it’s most definitely there. Unfortunately there is almost no way to measure it.

      • entropy13
      • 8 years ago

      What would become an issue in variance would be the arbitrariness of the baseline.

      For example, the variability of “democracy” rated in 5 levels, specifically in elections, of Asian “democratic republic” countries. 5 would be “most democratic” and 1 would be “least democratic.”

      Thus there are several controlled variables there.
      – Asian
      – officially “democratic”
      – officially a “republic”
      – elections regularly held
      – elections contested by candidates from two or more parties (which would exclude China, which satisfies the previous restrictions)

      You would end up, as much as possible, with as similar countries as you can get for your analysis

      Yet in terms of games you cannot establish such a standard. Could you only stick with games using the same engine (similar to the “Asian or not” limit earlier)? How about only comparing among the same type of games (similar to the “republic or not” limit earlier)? How about how efficient the game engine used is (similar to the ” elections regularly held or not” limit earlier)?

        • Bensam123
        • 8 years ago

        It sounds like you just picked a random example out of a textbook you read and pasted it here, which shows a pitfall of variance, but doesn’t apply to the topic in any way.

    • ApockofFork
    • 8 years ago

    NEERRRDDDDD!!!

    Like even among nerds you are clearly a nerd and er… Great, Excellent, Amazing Article.

    • Jigar
    • 8 years ago

    This is the only reason I have been here for 7 years. Thank you.

    • Pantsu
    • 8 years ago

    Thank you Scott, this is probably the best tech article I’ve read in a very long time, on par with some of the architecture talks Anand does. I had to just register here now to comment on this, and I’ll be following TR more closely from now on.

    As someone said earlier, Tom’s Hardware also did an article on micro-stutter, though it was much more limited and superficial. It did though touch on the issue that Triple SLI/CrossFire might alleviate the problem. If you ever do a followup, add that to the tests.

    I personally own a 6950 CF setup with Eyefinity and a 120 Hz monitor, so it's paramount for me that the performance is fluid. I see micro-stutter easily in many of the games I've played, considering 6000×1080 with max settings tends to drop the FPS to near 30 in many games. That's where the issue is easily noticeable in games like Crysis or FC2. It's not limited to those games, though; almost every game stutters more or less. And this is with a 4.5 GHz 2500K.

    While it's great to hear AMD/Nvidia fess up about it and say they're doing something about it, it doesn't help those dealing with the issue now. Vsync is a limited solution; I've tried it and it does help in some cases, a bit, like AMD's rep stated. A better solution is a max framerate cap, either done with a console command or a 3rd-party app like DXtory. Limiting the max framerate below the average will alleviate if not completely remove the stutter. This I've seen work in all of the games I've tested, and capping the framerate has provided a considerably smoother experience, even if the average FPS has decreased. The problem with this fix is that every time the FPS drops below the cap, stutter emerges, right when the FPS is at its lowest and you need the smoothness the most. In essence, while my CF setup can do Crysis at 30 fps @ 6000×1080 with Very High, it's unplayable due to stutter, and if I cap the framerate I need to do it at 20 to avoid dips and stutter, which is too low even if it's otherwise smoother.

    The above mentioned framerate cap is easy to test in many games, and you can easily see if putting a cap produces smoother gameplay for you. If it does, it means the game has a stutter problem.

    All in all this issue puts into question whether CF/SLI is a reasonable choice. I’d say it’s only for the most extreme enthusiasts who really need the performance that the fastest single GPU can’t offer, and those who want to dabble with their hardware. Budget CrossFire/SLI seems a really bad choice after reading the article.

    • redpriest
    • 8 years ago

    This is excellent data and a brilliant article – I only wish other websites were this thorough. My only question is how repeatable are the tests you’ve run? – I’m assuming you did a best of 3 run and picked one, or maybe just ran only one. You might have specified it in the article but I missed it if you did (feel free to flame me if I did 🙂 )

    I’m curious how variable the results are on these given workloads run to run. I’ve done studies much as you have done with fraps frame rate data in some games a couple years ago and saw lots of variability with actual gameplay runs which were hard to resolve, depending on the game. Some games were just off the chart variable and it was hard to make heads or tails out of them. I haven’t done any specific analysis using the games you’ve run so I’m just curious how these hold up 🙂

      • cygnus1
      • 8 years ago

      Yup, you missed it. The shooter games were 5 runs, and SC2 was a single 33-minute run.

    • neon
    • 8 years ago

    "I really don't like the results in the chart above, but I've included them in order to demonstrate a potential problem with our frame time count. Look at how the GeForce GTX 580 produces substantially more frames above 20 ms than the GTX 570. ... The GTX 580's count is higher because it's faster and therefore produces more total frames during our test period."

    If you divide the [number of frames above 20 ms] for each GPU by the [total number of frames produced] by each GPU, then you should get the fraction of frames above 20 ms, which should be a better comparison.
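    A sketch of that normalization, plus a properly time-based rate alongside it, assuming the usual Fraps-style list of frame times in milliseconds:

    ```python
    def long_frame_stats(frame_times_ms, threshold_ms=20):
        long_frames = sum(1 for t in frame_times_ms if t > threshold_ms)
        total_time_s = sum(frame_times_ms) / 1000.0
        return {
            "fraction over threshold": long_frames / len(frame_times_ms),
            "long frames per second": long_frames / total_time_s,
        }

    # A faster card produces more frames overall, but the fraction and the per-second
    # rate stay comparable across cards, unlike the raw count.
    print(long_frame_stats([15] * 950 + [25] * 50))
    ```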

      • wierdo
      • 8 years ago

      I guess you could then multiply by 1000 and turn it into a (how many)/sec measurement, which would look sexy.

      • Bensam123
      • 8 years ago

      I’m not sure why you would do that as it would skew the results and make them not comparable…?

    • PopcornMachine
    • 8 years ago

    Excellent article. I've thought for a while now that there must be a better way to measure graphics card performance than just using FPS. Minimum frame rates seem important, but which minimum do you use? The very lowest may be rare and not representative of the card's true performance.

    Using frame times gives a whole new perspective. It is making me think twice about multi gpu setups. But perhaps smaller jitters aren’t as bad?

    What it really seems to show is how each game affects the card's performance. The engine and programming involved can make a big difference.

    Really a lot to think about here.

    • Jason181
    • 8 years ago

    I have a 6970 CrossFire setup and a 120 Hz LCD (with a resolution of only 1680×1050). I really, really like high FPS, and when I got the CrossFire setup, it felt GREAT when I configured the game to never drop below 120 fps (vsync off with triple buffering). But if the game won't do 120 fps (Metro 2033 is a good example), it feels almost worse at 75 fps than it would at 60 fps. I suspect it's that very issue.

    The article showed how easily frame latency gets above 8.3 ms, and that’s why I didn’t want a higher resolution monitor. When framerates really are important, it feels 50% smoother at 120 fps than it does at 100 fps. I never knew why. Now I know thanks to your excellent analysis.

    I too feared that the new testing methodology would be to target a certain framerate and find the best settings to achieve that target framerate. That seems severely limiting because you’re assuming your audience has the same preferences you do, or even worse that they have the capability to run at the same settings, which many people don’t because 2560×1600 monitors are still not the norm (correct me if I’m wrong!). I’m very relieved and encouraged by your new avenue of measurement.

    Great job Scott; been reading since ye olden days.

    • StuG
    • 8 years ago

    I'd love to see you guys do a CPU version of this! Great article, though my initial worry when you started looking into this came with SC2. It worries me that the game or application can have such an important role in these measurements that it might be difficult to discern what is the game and what is the setup.

    Either way, I have an HD 5870 CrossFire setup, and the only game where I have experienced micro-stutter issues while CrossFire was actually working was Dead Island. Granted, this game is riddled with bugs and really needs an update, so take that with a grain of salt. However, I used to get micro-stutters when I had my CPU at 3.0 GHz compared to the now 3.8 GHz. Hence why I'd love the CPU article… but this once again brings me back to my original worry: is this something that can be reliably replicated and is relevant to all other systems? Or will each system with each configuration produce such different results that it's not relevant?

    • volnaiskra
    • 8 years ago

    Fantastic article.

    I have a single GPU, but I’ve always been sensitive to variance in framerates. Some games are virtually unplayable for me at 30fps, and I can easily tell when a game dips from 60fps to 55fps (and judging by the revelations in this article, I’m not surprised, since maybe what I’m noticing is not a simple decrease of 5 fps but a much more prominent spike in per-frame variance).

    For me, a more consistent framerate is more important than a high average one. For that reason, I actually forced Divinity 2: The Dragon Knight Saga (which provides a frame limiter option) to play at 45 fps max, even though I was previously getting an FPS of about 50 to 60 – it just felt much smoother that way (presumably, limiting the FPS went some way toward smoothing out some of the outliers and irregularities in the individual frame times). For the same reason, I'm reluctant to ever get a 120 Hz monitor, since I figure that my 60 Hz one helps force things to be steady.

    I’ll admit that though the article was very well written, it was a slog to wade through the various graphs and their explanations. It seems to me that working out some sort of “jitter index” that takes into account the various issues involved, and displaying it along side FPS in reviews, would be the best option, going forward.

    I really hope you guys figure out an effective and simple way to document this in future reviews, and that it catches on in the rest of the PC gaming world.

    • hans
    • 8 years ago

    You may want to consider releasing the raw dataset, possibly on one of the open source data sites. There may be some additional ways to analyze this that aren’t conducive to an article format.

      • lilbuddhaman
      • 8 years ago

      From a “business” standpoint, the data he collected is very valuable as article-writing material, and therefore valuable financially. Hopefully other sites link to this article and TR gets a little extra ad $$. It’s up to Scott and the rest of TR to decide if/when to release the data.

      Fortunately, since this is an option available in FRAPS, anyone can do these same tests on their own rig, and potentially identify issues they could only “feel” previously.

        • Bensam123
        • 8 years ago

        Yeah, my first thought was that it would be really cool, because I could plug it into SPSS and upload some nice graphs and statistics here to show what I meant in my post, but sadly, what you’re saying is true. This represents a significant investment of time; releasing it for free would basically be giving everyone in the same business all the ingredients to your secret sauce, which would just lead to a bunch of websites copying because they know what looks better, even if they have absolutely no idea how it works.

        BTW, whoever voted lil down is a dumbass. Just because you want everything to be open and free doesn’t make it an awesome idea. People do have to make a living; when you invent a utopia machine, we can all have everything for free.

    • wmgriffith
    • 8 years ago

    I’m curious about the Battlefield 2 frame times. It seems that the lag that happens at around frame 500 for the Radeon 6870 happens earlier for the slower cards (which, as you said, produce fewer frames of data) and later for the faster setups. If you did a cumulative distribution, so that you plotted frame time against total elapsed time, would you find that they all happen at the same time? Wouldn’t that mean it has more to do with the game than with the driver and/or hardware? It seems the GeForce SLI frame times might match up the same way. I suppose Bulletstorm might also be that way, but it’s hard for me to see. Finally, I’d be surprised if SC2 were NOT this way, given how many different and widely varying numbers of units can be on-screen at once.
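
    (One way to check that idea against the FRAPS logs would be to plot frame times against elapsed time rather than frame number, so content-driven slowdowns line up across cards that render different numbers of frames. A minimal sketch, assuming each card’s per-frame times are available as a plain list of milliseconds; the variable names are hypothetical.)

[code]
import numpy as np
import matplotlib.pyplot as plt

def plot_vs_elapsed(frame_times_ms, label):
    """Plot each frame's render time against elapsed time rather than
    frame number, so content-driven hitches line up across cards that
    produce different numbers of frames in the same test run."""
    elapsed_s = np.cumsum(frame_times_ms) / 1000.0
    plt.plot(elapsed_s, frame_times_ms, label=label)

# Hypothetical usage with two cards' frame-time logs:
# plot_vs_elapsed(radeon_6870_ms, "Radeon HD 6870")
# plot_vs_elapsed(geforce_sli_ms, "GeForce SLI")
# plt.xlabel("Elapsed time (s)"); plt.ylabel("Frame time (ms)")
# plt.legend(); plt.show()
[/code]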

    Also, jensend mentioned in a previous post that adding up squares of differences from 20ms might be useful. I’m not too sure about that, but something that might be illuminating is the calculation of moments for frame times within a window of, say, one or two hundred frames (or some amount of time, depending on what makes more sense). The 2nd central moment is just the variance (the square of the standard deviation), but the higher moments may also be interesting (I just learned from Wikipedia that the standardized 3rd and 4th are called “skewness” and “kurtosis”).
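
    (For the curious, those windowed moments are easy to compute from a frame-time log; a rough sketch using pandas follows. The window length and function name are illustrative, not anything used in the article.)

[code]
import pandas as pd

def rolling_moments(frame_times_ms, window=200):
    """Windowed higher moments of frame times: variance (2nd moment),
    skewness (standardized 3rd) and excess kurtosis (4th)."""
    s = pd.Series(frame_times_ms)
    return pd.DataFrame({
        "variance": s.rolling(window).var(),
        "skewness": s.rolling(window).skew(),
        "kurtosis": s.rolling(window).kurt(),
    })

# moments = rolling_moments(frame_times_ms)
# print(moments.dropna().describe())
[/code]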

    I don’t know if I’m making this needlessly complicated or furthering the conversation. In any case, I found your article very interesting. Thank you, Scott.

    • thefumigator
    • 8 years ago

    Article of the year. ‘nough said.

      • lilbuddhaman
      • 8 years ago

      always thought it was ’nuff

        • UberGerbil
        • 8 years ago

        American vs British dialects/orthography strikes again

      • FireGryphon
      • 8 years ago

      It’s a masterwork.

    • tviceman
    • 8 years ago

    Great article, Scott! Do you think Nvidia’s frame metering is hurting SLI FPS numbers when compared to CrossFire? Since the HD 6xxx series of cards, AMD’s scaling has surpassed Nvidia’s, and as soon as your article started talking about Nvidia’s frame metering, I immediately wondered if that was limiting the maximum scaling potential of SLI…

      • Damage
      • 8 years ago

      No, as I understand it, frame metering only slightly delays every other frame in order to smooth out frame delivery. The actual rate of frames delivered shouldn’t be capped. Metering should operate “invisibly” to the rest of the system.
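
      (For illustration only: a toy model of the idea, not NVIDIA’s actual, undisclosed algorithm. If alternate-frame rendering delivers frames in a short/long cadence, delaying each early frame toward the midpoint of its neighbors evens out the display intervals without changing how many frames arrive per second. All numbers below are made up.)

[code]
# Toy model: AFR frames arrive in a short/long pattern from two GPUs.
arrivals_ms = [0, 5, 33, 38, 66, 71, 99]
raw_gaps = [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]
print(raw_gaps)            # [5, 28, 5, 28, 5, 28] -- classic micro-stutter

# "Metering": delay each early frame to the midpoint of its neighbours.
metered = list(arrivals_ms)
for i in range(1, len(metered) - 1, 2):
    metered[i] = (arrivals_ms[i - 1] + arrivals_ms[i + 1]) / 2
metered_gaps = [b - a for a, b in zip(metered, metered[1:])]
print(metered_gaps)        # [16.5, 16.5, 16.5, 16.5, 16.5, 16.5]
# Same number of frames per second, just delivered evenly.
[/code]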

        • tviceman
        • 8 years ago

        Thank you for the response!

    • UberGerbil
    • 8 years ago

    You may not have to spend “thousands” to get a (sufficiently) high-speed camera. [url=http://www.casio.com/products/Digital_Cameras/High-Speed/]Casio has a line[/url] of still cameras that offer super-fast video (around 420 fps at reasonable resolutions, though up to 1000 fps is possible on some models at low resolutions). I haven’t used them myself, but I’ve seen the results of the older [url=http://www.amazon.com/Casio-EX-FH100-10-1MP-Digital-Stabilization/dp/B0032ANBXI/ref=sr_1_2?ie=UTF8&qid=1315537534&sr=8-2]FH-100[/url] and it’s pretty impressive. Searching through YouTube with the model names will give you plenty of examples.

      • prb123
      • 8 years ago

      It seems like a capture card would be better than a camera. This is not a recommendation…just 2 minutes of googling: [url]http://www.blackmagic-design.com/products/intensity/[/url]

        • UberGerbil
        • 8 years ago

        That could well be; the Casio was just the first thing I thought of because I’d seen the results and knew it didn’t cost “thousands.”

        Though now that I’ve checked with the owner, it turns out it was actually the [url=http://www.amazon.com/Casio-EX-FH25-10-1MP-Digital-Stabilization/dp/B002YPFKZ4/ref=sr_1_1?ie=UTF8&qid=1315537534&sr=8-1]EX-FH25[/url] model, which is just a dollar shy of $1,000. Which still isn’t thousand[b][u]s[/u][/b], at least.

      • yokem55
      • 8 years ago

      Curious, though: what video format does the Casio put the frames into? If it’s anything MPEG-4 related, getting a count of the number of actual frames displayed on the screen might be hard, since MPEG-4 takes a lot of shortcuts with the motion it’s compressing, which might confuse the frame counting on the display. A lossless video format or something like MJPEG would work much better for that kind of task…

    • Forge
    • 8 years ago

    This is the old TR, second coming.

    Mr. Wasson, please mark this down among your greatest accomplishments. You have quantified and measured something many people perceive, but are unable to describe. I can only hope that Nvidia and ATI are reading half as attentively as I am.

    I recently disabled the second GTX 260 in my aging SLI setup. I couldn’t articulate or properly understand why, but even though SLI gave higher FPS, it didn’t feel as smooth. With this article in hand, it makes perfect sense, and directly steers my upcoming GPU upgrade.

    Thank you, sir.

      • Forge
      • 8 years ago

      [i]At the same time, we're very interested in getting reader feedback on this matter. If you have multi-GPU setup, have you run into micro-stuttering problems? If so, how often do you see it and how perceptible is it? Please let us know in the comments.[/i]

      I ran into it, saw it nearly constantly, but never consciously. Just FWIW, since you asked. It just didn’t ‘feel’ as smooth.

    • Bauxite
    • 8 years ago

    I always avoided multi-GPU; it seemed like more trouble than it was worth, regardless of which team you want to be a fangirl for.

    This just makes it even more the case.

    Also, it seems that the best experience comes from capping FPS at the monitor’s refresh rate and using a card that would otherwise be considered overpowered for the game/detail level.

    Buy the fastest single GPU you can afford and don’t count on adding a second card down the line; save up for the next big process/architecture leap instead.

    • anotherengineer
    • 8 years ago

    Nice to see some issues finally exposed.

    I may have seen this micro-stutter first hand (although it may have been lag).

    I recently replaced my Radeon 4850 with a Radeon 6850 and fired up CS: Source to test it out, on an empty server (de_port) with a 25ms ping. While running around the map, I noticed a few occasions where lag or micro-stutter was noticeable. I do run a Samsung 2233RZ at 120Hz, also.

    Strange, though, how I didn’t really notice it in two years with the 4850, yet did in two minutes with the 6850; unless, like I said, it was something else.

    There is something to be said for consistency.

    • Ryu Connor
    • 8 years ago

    Amazing work, Scott.

    This deserves more exposure. Any suggestions on who we can pimp this to for maximum impact?

      • UberGerbil
      • 8 years ago

      Is Slashdot still around?

        • Xylker
        • 8 years ago

        +1, Funny

    • blitzy
    • 8 years ago

    High FPS is not much good if it’s choppy, so it’s nice to see an eloquent investigation into measuring smooth FPS. That’s very good information to have when picking a graphics card.

    • Goty
    • 8 years ago

    Very cool article. I think the most interesting secondary result here might be the extreme improvement in CFX rendering behavior when going from AMD’s 6800 series to the 6900 series.

    • KoolAidMan
    • 8 years ago

    Excellent article, thanks so much. You guys once again show how you’re better than almost every other hardware website out there.

    • Xylker
    • 8 years ago

    So, this is what you wouldn’t really discuss at the beach.

    Absolutely brilliant article. Will have to read it again and take notes the second time.

    • flip-mode
    • 8 years ago

    Very impressive investigation and reporting, Scott. My guess is that opportunities to present something this significant don’t come around too often. You hit it out of the park.

    • Thrashdog
    • 8 years ago

    Impressive work, Scott, and very interesting results. I’ve long noticed a “hitching” effect in certain game engines on certain cards, and there it is in the graphs. For the longest time I thought Fallout 3 had animated a cyclical change in speed through the walk cycle, until I realized it was just the 8600M GT in my laptop choking at roughly half-second intervals.

    If I could suggest a tweak to the methodology: it seems to me that comparing a given frame’s render time with a rolling average of the frames near it is a better way of separating a true spike from a generally slow framerate. That way the “threshold” can be a percentage of the card’s steady-state performance at a given point, rather than a hard time limit that’s going to give inconsistent results between differently-performing cards and different scenes.
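
    (A rough sketch of how such a relative threshold might be computed from a frame-time log; the window size and spike percentage below are illustrative placeholders, not values used in the article.)

[code]
import pandas as pd

def find_spikes(frame_times_ms, window=30, threshold_pct=50):
    """Flag frames that exceed the rolling average of the surrounding
    `window` frames by more than `threshold_pct`, so the cutoff scales
    with each card's steady-state speed instead of being a fixed time."""
    s = pd.Series(frame_times_ms)
    baseline = s.rolling(window, center=True, min_periods=1).mean()
    return s[s > baseline * (1 + threshold_pct / 100)]

# spikes = find_spikes(frame_times_ms)
# print(len(spikes), "spiky frames; worst:", spikes.max(), "ms")
[/code]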

    • Captain Ned
    • 8 years ago

    Scott:

    Articles like this are why some of us gerbils have been hanging around here for almost 10 years. You certainly got the attention of the GFX card head designers just by digging out and publicizing something they’ve known for some time.

    If there’s any decency in the world, whatever popularized measurement comes out of this research should be called the Wasson effect (the Damage effect is a bit melodramatic, methinks).

    Take one Miss Sakamoto out of petty cash.

      • Thrashdog
      • 8 years ago

      But if it’s the Damage effect, could we prefix the name to indicate the period of cyclical frame-time changes? i.e.:

      single-frame spike = Damage Effect
      two-frame cycle = double-Damage effect
      four-frame cycle = QUAD DAMAGE EFFECT!
      etc…

        • Captain Ned
        • 8 years ago

        I rest my case. 😉

      • Firestarter
      • 8 years ago

      I logged in to post exactly this sentiment. The depth to which you dug just to see what was happening is exactly the kind of depth most technology websites miss.

    • puppetworx
    • 8 years ago

    This was a real page-turner!

    It’s really only relevant to gamers, but this is must-know data for serious graphics card buyers. I look forward to seeing these charts in future reviews.

    • Krogoth
    • 8 years ago

    Excellent article, Damage.

    It shows what hardcore multi-GPU guys have known for years, but this article gives hard data on how much micro-stutter affects each platform. It is rather alarming that it is more pronounced in AMD’s implementations.

    I always figured that multi-card rendering is a trade-off: you can get higher FPS output than with a single card, but you get some micro-stuttering in return. The hope is that your FPS output is high enough that it effectively masks the micro-stuttering.

    • jensend
    • 8 years ago

    [quote]The 20-ms threshold, however, is potentially problematic.[/quote]

    Definitely agreed. I really don’t think any of the 20ms charts are helpful, and I think a handful of frames in a row coming in at <50fps isn’t likely to be noticeable, even by “pro gamers.” But 33ms might be a good place to look: it’s slow enough that it might actually affect reactions and such, and it’s fast enough that looking at it might provide information meaningfully different from what the 50ms picture gives.

    Another way to look at the data that might be helpful: instead of simply adding up the frames above a given threshold, weight them by how much they were over the threshold. For instance, if you really feel it’s important for twitch gamers to have their frame times close to 20ms, a better way to use the data would be to add up the squares of the amounts by which individual frame times exceeded 20ms. (Some other power or some other increasing function of the excess might give you a more accurate picture, but squared deviation would be a start.)
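
    (A minimal sketch of that weighted-excess idea, assuming per-frame times in milliseconds. The threshold and exponent are just the suggested starting points from the comment, not values endorsed by the article.)

[code]
import numpy as np

def excess_penalty(frame_times_ms, threshold_ms=20.0, power=2):
    """Sum of (frame time - threshold)**power over frames beyond the
    threshold, so one long hitch counts far more than many near-misses."""
    t = np.asarray(frame_times_ms, dtype=float)
    excess = np.clip(t - threshold_ms, 0.0, None)
    return float(np.sum(excess ** power))

# Hypothetical numbers: a single 55 ms frame dominates the score.
# excess_penalty([18, 21, 22, 55, 19])  # 1 + 4 + 1225 = 1230.0
[/code]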

      • Waco
      • 8 years ago

      If the higher frame times were averaged with the number of frames produced overall, it might be a good start toward getting some really useful statistics out of all of this.

      Overall this is a damn good article. I run a Quadfire setup and damned if I don’t see this exact problem running a lot of benchmarks (oddly enough, games almost *never* show the problem to the naked eye).

        • jensend
        • 8 years ago

        Averaging completely incommensurate quantities is not really the best step towards meaningful statistical analysis.

        Of course it’s a good article, and average ms/frame, 99th percentile ms/frame, and >50ms frames/rendering period measurements are quite informative. I find it funny that I was downvoted for agreeing with him that >20ms frames/rendering period is not really that informative or helpful and for suggesting alternatives.

    • phez
    • 8 years ago

    I came in expecting one of those mental breakdowns that went on at [H] a while back when they introduced their apples-to-oranges style of reviewing.

    Instead I was greeted with an unbelievably comprehensive report on the magical microstuttering effect and information about the video subsystem I would never have expected to even exist.

    Perhaps the single most informative article (for the layman) I have ever read regarding computer hardware. Bravo.

      • Bauxite
      • 8 years ago

      Would people please stop comparing this site to that one…the latter was never in the same league even before it broke the >1 opinion/snide remark per paragraph metric.

        • lilbuddhaman
        • 8 years ago

        Whoa, whoa, whoa, both sites are great, and you know you visit both! They are both far above most other sites in credibility, IMO…

          • StuG
          • 8 years ago

          I think you are comparing the credibility of [H] and SA, not that of [H] and TR. ;D
          And for the record, I do not visit [H]. I only venture there when linked by a friend, and I always keep a cup by the computer so I can spit out the little bit of throw-up that happens when I have to read their comparisons.

            • Dr. Zhivago
            • 8 years ago

            This is exactly how I feel about [H]. They write like 8th graders at best.

        • phez
        • 8 years ago

        Sorry, I wasn’t comparing them. I meant that the title of the article, “Inside the second: A new look at game benchmarking,” made me think it was going to be about TR radically changing the way they do reviews, like what [H] did a while back.

    • XXXFire
    • 8 years ago

    Absolutely comprehensive and scientific in method, committed to discovery while humble about the potentially “inabsolute” implications of your results. I pray this leads to a revolution in GPU evaluation and comparison; Nvidia and AMD are sure to peruse this article and work toward accelerating hardware that negates, hides, or dispatches with the problem entirely.

    Tom’s did a multi-GPU micro-stuttering report that was quite good (although, in all honesty, elementary by comparison) and demonstrated that triple-GPU rigs capably eliminated the frame-time latency spikes. Please make that your next detailed case study.

    I have also, in my experience with quad-CrossFire 6990s, noticed absolute elimination of micro-stutter with extremely high CPU clocks. Even when running single-mode CrossFire, I’ve felt (perhaps a placebo effect) that 5GHz clocks reduced the effect, too. Who knows, at this point?

    Fantastic, fellas. One of the single best, most informative, and most innovative articles I’ve read on any tech website.

    • jamsbong
    • 8 years ago

    Great article. I’m glad the ‘lag factor’ has come to reviewers’ attention. Having 200+ FPS in a benchmark is meaningless when you play games with vsync on. At the moment, I have an IPS monitor with a 60Hz refresh rate, so if my game runs at 60 FPS throughout the whole session, it’s a joyful experience. I don’t really care about the hundreds of FPS the card can theoretically do. That said, I would still purchase a mid-to-high-end card like a 560 Ti (in the $200+ range), because you want some performance headroom for upcoming games.

    Game developers know this, and some actually put in the effort to make it work. Racing games from Codemasters, id Software titles, and Torchlight are some examples of good games I’ve tried that deliver a smooth FPS experience.

    • juampa_valve_rde
    • 8 years ago

    Great article, Scott; you dug deep with this one. Anyway, the guys at HardOCP have tested cards somewhat along these lines for a long time. One guy on the forums had a great analogy about it: most sites just test how fast the thing goes from 0-60, but the real test is how fast it is all the way around the track.

      • TheEmrys
      • 8 years ago

      Not really. HardOCP only does per-second breakdowns, while this is in milliseconds. Vastly different look. They are relying on the old FPS measure, just over a certain number of seconds.

        • juampa_valve_rde
        • 8 years ago

        It looks more like this at HardOCP mostly because they give a lot more consideration to the smoothness factor, minimum and average framerates, and how the cards behave throughout the test, taking notes about the level of detail for each card. But yeah, it’s totally true that HardOCP measures the “feeling,” whereas here Scott is going really deep with scientific methodology. This is the real thing.

          • TheEmrys
          • 8 years ago

          Glad we agree that it’s totally different.

            • XXXFire
            • 8 years ago

            Totally? Not really. The difference is one of qualitative versus quantitative testing, though both sites readily acknowledge and report the fundamental problem, whereas nearly all other tech sites don’t.

    • TheEmrys
    • 8 years ago

    I’d also love to see how CPU choice affects this. Really a sort of revolutionary concept.

    • shank15217
    • 8 years ago

    This is a great metric and a good article, but I’m not sure how gamers would really perceive the problem. Sure, the brain can pick up on these things really well, but it can also learn to ignore them. I’m more interested in the medical effects of frame stuttering: would people have a higher chance of an epileptic seizure due to sudden changes in frame times with SLI cards? Maybe people who are prone to seizures shouldn’t buy SLI or multi-GPU configurations.

    • TheEmrys
    • 8 years ago

    Really fascinating. Does Hydra affect this differently? How about an AMD/Nvidia hybrid solution? I’m going to have to link to this for some of my fellow gamers.

    • codedivine
    • 8 years ago

    Great article! I think the 99th percentile metric is much better than “number of frames above X ms”.

    • blorbic5
    • 8 years ago

    Great article!

    • no51
    • 8 years ago

    FIRST POST!
