Inside the second with Nvidia’s frame capture tools

We’ve come a long way since our initial Inside the second article. That’s where we first advocated for testing real-time graphics and gaming performance by considering the time required to render each frame of animation, instead of looking at traditional FPS averages. Since then, we’ve applied new testing methods focused on frame latencies to a host of graphics card reviews and to CPUs, as well, with enlightening results.

The fundamental reality we’ve discovered is that a higher FPS average doesn’t necessarily correspond to smoother animation and gameplay. In fact, at times, FPS averages don’t seem to mean very much at all. The problem boils down to a weakness of averaging frame rates over the span of a whole second, as nearly all FPS-based tools tend to do. Allow me to dust off an old illustration, since it still serves our purposes well:

The fundamental problem is that, in terms of both computer time and human visual perception, one second is a very long time. Averaging results over a single second can obscure some big and important performance differences between systems.

To illustrate, let’s look at an example. It’s contrived, but it’s based on some real experiences we’ve had in game testing over the years. The charts below show the times required, in milliseconds, to produce a series of frames over a span of one second on two different video cards.

GPU 1 is obviously the faster solution in most respects. Generally, its frame times are in the teens, and that would usually add up to an average of about 60 FPS. GPU 2 is slower, with frame times consistently around 30 milliseconds.

However, GPU 1 has a problem running this game. Let’s say it’s a texture upload problem caused by poor memory management in the video drivers, although it could be just about anything, including a hardware issue. The result of the problem is that GPU 1 gets stuck when attempting to render one of the frames—really stuck, to the tune of a nearly half-second delay. If you were playing a game on this card and ran into this issue, it would be a huge show-stopper. If it happened often, the game would be essentially unplayable.

The end result is that GPU 2 does a much better job of providing a consistent illusion of motion during the period of time in question. Yet look at how these two cards fare when we report these results in FPS:

Whoops. In traditional FPS terms, the performance of these two solutions during our span of time is nearly identical. The numbers tell us there’s virtually no difference between them. Averaging our results over the span of a second has caused us to absorb and obscure a pretty major flaw in GPU 1’s performance.
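To put rough numbers on that contrived example, here is a minimal Python sketch. The frame times are invented to match the description above (GPU 1 in the teens with one near-half-second hitch, GPU 2 steady at 30 ms); they are not measured data.

```python
# Hypothetical frame times in milliseconds, invented to mirror the example above.
gpu1 = [15.0] * 34 + [490.0]   # fast frames plus one long hitch
gpu2 = [30.0] * 33             # slower but perfectly steady

def average_fps(frame_times_ms):
    """Frames delivered divided by elapsed time in seconds."""
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)

for name, times in (("GPU 1", gpu1), ("GPU 2", gpu2)):
    print(f"{name}: {average_fps(times):.1f} FPS average, "
          f"worst frame {max(times):.0f} ms")
```

With these made-up inputs, the two averages land within a couple of FPS of each other, while the worst-case frame times differ by more than a factor of ten.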

Since we published that first article, we’ve seen a number of real-world instances where FPS averages have glossed over noteworthy performance problems. Most prominent among those was the discovery of frame latency issues in last Christmas’ crop of new games with the Radeon HD 7950. When we demonstrated the nature of that problem with slow-motion video, which showed a sequence that had stuttering animation despite an average of 69 FPS, lots of folks seemed to grasp intuitively the story we’d been telling with numbers alone. As a result, AMD has incorporated latency-sensitive methods into its driver development process, and quite a few other websites have begun deploying frame-latency-based testing methods in their own reviews. We’re happy to see it.

There’s still much work to be done, though. We discovered a couple of problems in our initial investigation into these matters, and we haven’t been able to explore those issues in full. For instance, we encountered concrete evidence of a weakness of multi-GPU setups known as micro-stuttering. We believe it’s a real problem, but our ability to quantify its impact has been affected by another problem: the software tool that we’ve been using to capture frame times, Fraps, collects its samples at a relatively early stage in the frame rendering process. Both of the major GPU makers, AMD and Nvidia, have told us that the results from Fraps don’t tell the whole story—especially when it comes to multi-GPU solutions.

Happily, though, in a bit of enlightened self-interest, the folks at Nvidia have decided to enable reviewers—and eventually, perhaps, consumers—to look deeper into the question of frame rendering times and frame delivery. They have developed a new set of tools, dubbed “FCAT” for “Frame Capture and Analysis Tools,” that let us measure exactly how and when each rendered frame is being delivered to the display. The result is incredible new insight into what’s happening at the very end of the rendering-and-display pipeline, along with several surprising revelations about the true nature of the problems with some multi-GPU setups.

How stuff works
Before we move on, we should take a moment to establish how video game animations are produced. At the core of the process is a looping structure: most game engines do virtually all of their work in a big loop, iterating over and over to create the illusion of motion. During each cycle through the loop, the game evaluates inputs from various sources, advances its physical simulation of the world, initiates any sounds that need to be played, and creates a visual representation of that moment in time. The visual portion of the work is then handed off to a 3D graphics programming interface, such as OpenGL or DirectX, where it’s processed and eventually displayed onscreen.

The path each “frame” of animation takes to the display involves several stages of fairly serious computation, along with some timing complications. I’ve created a horribly oversimplified diagram of the process below.

As you can see, the game engine hands off the frame to DirectX, which does a lot of processing work and then sends commands to the graphics driver. The graphics driver must then translate these commands into GPU machine language, which it does with the aid of a real-time compiler. The GPU subsequently does its rendering work, eventually producing a final image of the scene, which it outputs into a frame buffer. This buffer is generally part of a queue of two to three frames, as in our illustration.

What happens next depends on the settings in your graphics card control panel and in-game menus. You see, although the rendering process produces frames at a certain rate—one that can vary from frame to frame—the display operates according to its own timing. In fact, today’s LCD panels still operate on assumptions dictated by Ye Olde CRT monitors, as if an electron gun were still scanning phosphors behind the screen and needed to touch each one of them at a regular interval in order to keep it lit. Pixels are updated from left to right across the screen in lines, and those lines are refreshed from the top to the bottom of the display. Most LCDs completely refresh themselves according to this pattern at the common CRT rate of 60 times per second, or 60 Hz.

If vsync, or vertical refresh synchronization, is enabled in your graphics settings, then the system will coordinate with the display to make sure updates happen in between refresh cycles. That is, the system won’t flip to a new frame buffer, with new information in it, while the display is being updated. Without vsync, the display will be updated whenever a new frame of animation becomes ready, even if it’s in the middle of painting the screen. Updates in the middle of the refresh cycle can produce an artifact known as tearing, where a seam is visible between successive animation frames shown onscreen at once.

An example of tearing from Borderlands 2

I sometimes like to play games with vsync enabled, in order to avoid tearing artifacts like the one shown above. However, vsync introduces several problems. It caps frame rates at 60 Hz, which can interfere with performance testing (especially FPS-average-driven tests). Also, vsync introduces additional delays before a frame of animation makes it to the display. If a frame isn’t ready for display at the start of the current refresh cycle, its contents won’t be shown until the next refresh cycle begins. In other words, vsync causes frame update rates to be quantized, which can hamper display updates at the very worst time, when GPU frame rates are especially slow. (Nvidia’s Adaptive Vsync feature attempts to work around this problem by disabling refresh sync when frame rates drop.)
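The quantization effect is easy to sketch. The snippet below is a simplified model, assuming a 60 Hz display and assuming each frame starts rendering right at a refresh boundary; real drivers and buffering schemes are messier than this.

```python
import math

REFRESH_HZ = 60                    # assumed display refresh rate
REFRESH_MS = 1000.0 / REFRESH_HZ   # about 16.7 ms per refresh cycle

def displayed_interval_ms(render_ms):
    """Round a render time up to a whole number of refresh intervals,
    since a frame that misses one refresh waits for the next."""
    intervals = max(1, math.ceil(render_ms / REFRESH_MS))
    return intervals * REFRESH_MS

for render_ms in (10, 17, 34):
    print(f"{render_ms} ms of rendering -> shown after "
          f"{displayed_interval_ms(render_ms):.1f} ms")
```

In this model, a frame that takes 17 ms costs as much onscreen time as one that takes 33 ms, which is why the penalty bites hardest when the GPU is just barely missing the refresh deadline.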

We have conducted the bulk of our performance testing so far, including this article, with vsync disabled. I think there’s room for some intriguing explorations of GPU performance with vsync enabled. I’m not entirely sure what we might learn from that, but it’s a different task for another day.

At any rate, you’re probably getting the impression that lots happens between the game engine handing off a frame to DirectX and the content of that frame eventually hitting the screen. That takes us back to the limitations of one of our tools, Fraps, which we use to capture frame times. Fraps grabs its samples from the spot in the diagram where the game presents a completed frame to DirectX by calling “present,” as denoted by the orange line. As you can see, that point lies fairly early in the rendering pipeline.

Since the frame production process is basically a loop, sampling at any point along the way ought to tell us how things are going. However, there are several potential complications to consider. One is the use of buffering later in the pipeline, which could help smooth out small rendering delays from one frame to the next. Another is the complicated case of multi-GPU rendering, where two GPUs alternate, one producing odd frames and the other churning out even frames. This very common load-balancing method can potentially cause delays when frames produced on the secondary GPU are transferred to the GPU connected to the display. Thornier still, Nvidia claims to have created a “frame metering” tech to smooth out frame delivery to the display on SLI configs—and that further complicates the timing. Finally, the issues we’ve noted with display refresh sync can play a part in how and when frames make it to the screen.

So... yeah, Fraps is busted, right? Not exactly. You see, it’s situated very close to the game engine in this whole process, and the internal simulation timing of the game engine determines the content of the frames being produced. Game animation is like a flipbook, and the contents of each page must advance uniformly in order to create the fluid illusion of motion. To the extent that Fraps’ timing matches the internal timing of the game engine, its samples may be our truest indication of animation smoothness. We don’t yet have a clear map of how today’s major game engines track and advance their internal timing, and that is a crucial question. Fortunately, we do now have one other piece of the puzzle: some new tools that let us explore these issues at the ultimate end of the rendering pipeline, the display output. Let’s have a look at them.

The FCAT tools
You may recall that we first talked to Nvidia’s Tom Petersen about frame latencies and multi-GPU micro-stuttering right when we first started looking at these things. To our surprise, Petersen had obviously been working on these matters before we spoke, because he very quickly produced a fairly robust presentation related to micro-stuttering and Fraps captures. That was about a year and a half ago. Turns out Petersen and his team have been working on FCAT tools for about two years. We’ve had a few hints along the way that something along these lines was in the works, and that some tools might be presented to the press when the time was right. A couple of weeks ago, Petersen and another Nvidia rep visited Damage Labs to help us get up and running with a frame capture setup and the FCAT suite of tools.

This setup requires a few bits of very specific hardware and a fairly capable host PC.

Pictured above is a Datapath VisionDVI-DL video capture card, which is capable of capturing uncompressed digital video over a dual-link DVI link at very high resolutions and refresh rates. For our purposes, it’s able to collect each and every frame of a video sequence at resolutions up to 2560×1440 at a refresh rate of 60 Hz—enough to stress a high-end GPU config running the latest games. (2560×1600 doesn’t seem to work, for what it’s worth.) During such a capture, the card is streaming data at a rate of 422 MB/s, which is… considerable.
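For a sense of where a number like that comes from, here is some back-of-the-envelope arithmetic. The capture pixel format is our assumption: at 16 bits per pixel, 2560×1440 at 60 Hz works out to roughly 422 when expressed in binary megabytes per second, which matches the figure above. We haven't confirmed that this is how the card or its software arrives at the number.

```python
WIDTH, HEIGHT, REFRESH_HZ = 2560, 1440, 60
BYTES_PER_PIXEL = 2  # assumption: 16 bits per pixel, not stated by the capture tools

bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * REFRESH_HZ
mib_per_second = bytes_per_second / 2**20  # binary megabytes (MiB)

print(f"{bytes_per_second / 1e6:.0f} MB/s (decimal), "
      f"{mib_per_second:.0f} MiB/s (binary)")
```

Either way you count it, that rate is sustained for as long as the capture runs, which explains the storage requirements described below.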

I can’t say the setup process for this card is easy. The thing didn’t want to work at all with our Intel X79 motherboard (although I’d rather not work with an Intel X79 motherboard myself, I must admit). We eventually got it going with an MSI Z77 board, but we had to disable a number of extra system devices, like USB 3.0 and auxiliary storage controllers, in order to get it working consistently.

The video output from the gaming system being tested connects to this Gefen dual-link DVI splitter, which feeds outputs to both the monitor and the DVI capture card. Nvidia told us its cards could avoid using a splitter by working in clone mode. However, clone mode is not always possible on Radeons in conjunction with CrossFire, so Nvidia chose to include a splitter in its FCAT config for reviewers.

We were advised that we’d need a storage subsystem capable of fast and truly sustained transfer rates, so we turned to the folks at Corsair, who kindly supplied four Neutron SSDs for our capture rig. At Nvidia’s suggestion, we attached them to an Intel storage controller and put them into a RAID 0 config. If you like round numbers, this array is almost a terabyte of storage capable of writing at almost one gigabyte per second.

Which ain’t bad. In fact, it’s shockingly good and consistent; we virtually never saw dropped frames once our capture setup was configured properly. I suspect this RAID could easily go faster if Intel storage controllers had more than two 6Gbps SATA ports available.

Once you have the ability to capture each and every frame of animation streaming out of a video card at will, you’re already well down the road to some interesting sorts of analysis. You can play back the sequence exactly as it looked the first time around, slow it down, speed it up, pause, and step through frame by frame. You can even correlate individual frames of animation to spikes recorded in Fraps and things like that. But what if you want to measure the timing of each and every GPU frame coming to the display?

For that purpose, Nvidia has developed an overlay program that inserts a colored bar along the left-hand side of each frame rendered by the GPU. These colors are inserted in a specific sequence of 16 distinct hues and serve as a sort of watermark, so each individual frame can be identified in sequence.

This gets complicated because, remember, with vsync disabled, the “frames” produced by the GPU don’t correspond directly to the video “frames” displayed onscreen. In the example above, six GPU frames are spread across four display frames. The GPU is producing frames slightly faster than 60 FPS, or 60 Hz, during this span of time. The GPU frame marked “green” spans two video frames, occupying the bottom half of one and a small slice of the top of the next one, before the GPU switches to a new buffer with the aqua frame. And so on.

We used VirtualDub for the captures. If you simply capture this sort of output with the overlay enabled, you can page through individual video frames to get a clear sense of how frame delivery is happening. Very, very cool stuff. The next bit, though, is kind of magic.

The FCAT extractor tool scans through any input video with the overlay present and produces a CSV file with information about how many scan lines of each color are present in each frame of video. This file contains the raw data needed for all sorts of post-processing, including figuring out which GPU frames span multiple video frames and the like.
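We don't have documentation of the extractor's internals or its exact CSV layout, so treat the following as a toy illustration of the idea rather than a description of Nvidia's tool. It assumes each captured video frame has already been reduced to a list holding the overlay color seen on each scan line; the column names and the tiny eight-line frames are hypothetical.

```python
import csv
import io
from itertools import groupby

def scanline_runs(overlay_column):
    """Collapse a column of per-scan-line overlay colors into
    (color, line_count) runs, top to bottom."""
    return [(color, sum(1 for _ in group))
            for color, group in groupby(overlay_column)]

def runs_to_csv(video_frames):
    """Write one CSV row per color run in each video frame."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["video_frame", "color", "scan_lines"])  # hypothetical header
    for index, column in enumerate(video_frames):
        for color, count in scanline_runs(column):
            writer.writerow([index, color, count])
    return out.getvalue()

# Two toy video frames, eight scan lines tall, with made-up overlay colors.
video_frames = [
    ["green"] * 3 + ["aqua"] * 5,
    ["aqua"] * 1 + ["pink"] * 7,
]
print(runs_to_csv(video_frames))
```

Note how the "aqua" GPU frame shows up in both video frames here, which is the sort of spanning that the post-processing step has to stitch back together.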

Interestingly enough, when we first tried the extractor tool with videos captured from a Radeon HD 7970, it didn’t work quite properly. We asked Petersen about the problem, and he eventually found that the extractor was having trouble because the overlay colors being displayed by the Radeon weren’t entirely correct. The extractor routine had to be adjusted for looser tolerances in order to account for the variance. That variance is mathematically very minor and not easily perceptible, but it is real. To the right is a pixel-doubled and heavily contrast-enhanced section of the (formerly) pink overlay output from the Radeon in Far Cry 3. You can probably see that there is some noise in it. The same pink overlay section from the GeForce GTX 680 is all the same exact color value. Not sure what that’s worth or what the cause might be, but it’s kind of intriguing.

After the overlay info has been extracted, the next step is to process it in various ways. Petersen has created a series of Perl scripts that handle that job. They can spit out all sorts of output, including a simple set of successive frame times that we can use just like Fraps data. The FCAT scripts include lots of options for processing and filtering the data, and since they’re written in Perl, they can be modified easily. One thing they’ll do is use Gnuplot to graph results. In fact, by default, the FCAT scripts produce two graphs that will look fairly familiar to TR readers: a frame time plot and a percentile curve.
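The real scripts are Perl, and we haven't reproduced their logic here. As a rough sketch of how scan-line counts could become frame times, the Python below merges consecutive runs of the same overlay color and converts lines into milliseconds. It assumes a 60 Hz capture, 1440 visible lines per video frame, and no blanking interval, all of which are simplifications on our part.

```python
REFRESH_MS = 1000.0 / 60   # assumed 60 Hz capture
LINES_PER_FRAME = 1440     # assumed visible lines; blanking is ignored

def frame_times_ms(runs, lines_per_frame=LINES_PER_FRAME):
    """Merge adjacent runs of the same color (one GPU frame spanning
    several video frames), then convert scan lines to milliseconds."""
    merged = []
    for color, lines in runs:
        if merged and merged[-1][0] == color:
            merged[-1][1] += lines
        else:
            merged.append([color, lines])
    ms_per_line = REFRESH_MS / lines_per_frame
    return [(color, lines * ms_per_line) for color, lines in merged]

# Hypothetical runs in display order: aqua spills from one video frame
# into the next, so its two runs belong to a single GPU frame.
runs = [("green", 720), ("aqua", 720), ("aqua", 360), ("pink", 1080)]
for color, ms in frame_times_ms(runs):
    print(f"{color}: {ms:.2f} ms onscreen")
```

In practice, the first and last runs of a capture are probably partial frames and would need to be discarded before computing any statistics.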

Pardon the extreme compression, but the default plot size is ginormous, and I’ve not yet sorted out how to modify it. One nice thing the FCAT frame time plot does is correlate each frame time distribution to the scene time, something you won’t see in our current Excel plots.

The percentile curves will look inverted if you’re used to ours, because FCAT converts them into FPS terms. I know that option will be popular with some folks who still find the concept of FPS more intuitive to understand.

We haven’t yet converted to using FCAT’s visualization tools in place of our usual Excel sheets, but there is potential for automation here that extends well beyond what we currently have in place. If these tools are to be widely used in the industry—or, heck, even consistently used in several places—then automation of this sort will no doubt be needed. Processing this type of data isn’t trivial; it’s a long way from throwing together a few FPS averages.

Speaking of which, I should say that my summary of FCAT capture and analysis boils down a much more complex process. Configuring everything to work properly is a tedious affair that involves synchronizing EDIDs for the display and capture card behind the splitter, doing just the right magic to ensure good video captures without dropped or inserted frames, and a whole host of other things.

With that said, it’s still extremely cool that Nvidia is enabling this sort of analysis of its products. The firm says its FCAT tools will be freely distributable and modifiable, and at least the Perl script portions will necessarily be open-source (since Perl is an interpreted language). Nvidia says it hopes portions of the FCAT suite, such as the colored overlay, will be incorporated into third-party applications. We’d like to see Fraps incorporate the overlay, since using it alongside the FCAT overlay is sometimes problematic.

Now, let’s see what we can learn by making use of these tools.

Test notes
We were only able to get the FCAT overlay working reliably alongside Fraps in four of the nine games from our latest graphics test suite. Using both tools was important to us, simply because we wanted to correlate Fraps and FCAT data in order to see how they compare. We were able to do so with some games, but not others. We burned quite a bit of time converting our GPU test rigs from Windows 8 to Windows 7 in order to improve compatibility between Fraps and the overlay, but doing so didn’t yield any real improvement.

Furthermore, the data you’ll see on the following pages typically comes from just a single test run for each graphics solution. Our usual practice has been to use five test runs per game per solution, but time constraints and additional workflow complications made that sort of sampling impractical for this article. Heck, the FCAT data sets for a single run from each config across five games were over 630GB, if you include the raw video. Such problems can be managed with moar hardware, but we haven’t built an external FCAT storage array quite yet. The single test runs we’ve included should suffice for the sort of analysis we want to do today. Just keep in mind that this is not one of our usual GPU reviews based on substantially more testing.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Our test systems were configured like so:

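Processor: Core i7-3820
Motherboard: Gigabyte
Chipset: Intel X79
Memory size: 16GB (4 DIMMs)
Memory type: Corsair Vengeance CMZ16GX3M4X1600C9 DDR3 SDRAM at 1600MHz
Memory timings: 9-9-9-24
Chipset drivers: INF update, Rapid Storage Technology Enterprise
Audio: Integrated with Realtek drivers
Hard drive: OCZ Deneva 2 240GB SATA
Power supply: Corsair
OS: Windows 7 Service Pack 1

GeForce GTX 680: driver 314.21 beta; core clock 1006 MHz; boost clock 1059 MHz; memory clock 1502 MHz; memory size 2048 MB
Dual GeForce GTX 680: driver 314.21 beta; core clock 1006 MHz; boost clock 1059 MHz; memory clock 1502 MHz; memory size 2 x 2048 MB
Radeon HD 7970 GHz: driver 13.3 beta 2; core clock 1000 MHz; boost clock 1050 MHz; memory clock 1500 MHz; memory size 3072 MB
Dual Radeon HD 7970 GHz: driver 13.3 beta 2; core clock 1000 MHz; boost clock 1050 MHz; memory clock 1500 MHz; memory size 2 x 3072 MB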

Thanks to Intel, Corsair, and Gigabyte for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

In addition to the games, we used the following test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Fraps vs. FCAT: Skyrim
We’ll start with our Skyrim test because it’s very repeatable and has proven to be almost impossible to complete without a few spikes in frame times. The numbers you’ll see below come from both Fraps and FCAT captures from the exact same test run. I simply ran Fraps and the FCAT overlay together and recorded video on the capture system while benchmarking with Fraps. After the fact, I was able to watch for the start and end points of the Fraps test in the video and correlate them more or less exactly with the frames we analyzed to produce the FCAT results.

The outcome should give us a sense of what’s happening at two points in the rendering process: when the game engine hands off a frame to DirectX (Fraps) and when the frame hits the display (FCAT).

The plots above show how closely correlated the Fraps and FCAT frame time distributions appear to be. Click through the buttons above to see the results for each config tested.

I expect some folks will be ready to give me the beating I so richly deserve for presenting the data in this way, by which I mean “in a really small image” and “without sufficient color contrast.” I apologize. I was limited by both time and ability. And by Microsoft Excel, which should not escape blame. I fully endorse the use of “Ctrl + Mouse-wheel up” to zoom in on the frame time plots for better visibility.

What you should be seeing is that three of the four configs have very close correlations between the Fraps and FCAT numbers, and that plots for the FCAT results are much tighter, with less frame-to-frame variance than the Fraps numbers have. That suggests there’s some natural variance in the dispatch of frames coming from the game engine (closer to where Fraps measures) that gets smoothed out by buffering later in the pipeline.

Now, that doesn’t mean one set of results is “correct” and the other “incorrect.” As far as we know, both are correct for what they measure, at different points in the pipeline. One thing we’ll want to investigate further is those spots where the Fraps plot shows latency spikes that the FCAT plot does not. Keep that in mind for later.

On another front, FCAT looks to be giving us some important additional insight about the Radeon HD 7970 CrossFire setup: its Fraps results look like the other solutions’ Fraps plots, but its FCAT output is much “fuzzier,” with larger frame-to-frame swings. Curious, no? That’s probably not a good outcome, but it does map well to our expectations that Fraps results may not capture the extent of the timing differences introduced by multi-GPU load-balancing.

Let’s see how these data look in our latency-focused performance metrics.

Using a traditional FPS average, the SLI and CrossFire setups would appear to perform nearly twice as well as the single-GPU solutions. However, when we switch to the latency-oriented 99th percentile frame time, the Radeon HD 7970 CrossFire config proves not to be so hot. The 99th percentile frame time is just the threshold below which 99% of all frames were rendered; we can look at the fuller latency curve for a better sense of what went wrong.
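For those who want to compute this metric themselves, here is a minimal sketch using the nearest-rank method. That choice of percentile method is ours for illustration (spreadsheets often interpolate instead), and the sample data is made up.

```python
import math

def percentile_frame_time(frame_times_ms, pct=99):
    """Nearest-rank percentile: the smallest frame time such that at
    least pct percent of all frames finished at or under it."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical run: 98 quick frames, one slower frame, one big spike.
sample = [10.0] * 98 + [20.0, 60.0]
print("99th percentile:", percentile_frame_time(sample), "ms")
print("worst frame:    ", percentile_frame_time(sample, 100), "ms")
```

Because the metric ignores the slowest 1% of frames, a single outlier like the 60-ms frame here doesn't define the result, but a config that is slow for more than 1% of its frames can't hide.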

The FCAT latency curve for the 7970 CrossFire config has that classic profile shown by multi-GPU micro-stuttering. About 50% of the frame times are inordinately low, and the other half are inordinately high. As we approach the 99th percentile on the FCAT latency curve, the 7970 CrossFire config’s frame times climb to within a few milliseconds of the single 7970’s. Uh oh.
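One crude way to spot that profile in a list of frame times is to compare the even-numbered and odd-numbered frames, since alternate-frame rendering hands them to different GPUs. The sketch below uses invented numbers, not our measured CrossFire data, and the even/odd split is just an illustrative heuristic.

```python
from statistics import median

def alternating_split(frame_times_ms):
    """Median frame time of even-indexed and odd-indexed frames."""
    even = median(frame_times_ms[0::2])
    odd = median(frame_times_ms[1::2])
    return even, odd

# Hypothetical micro-stuttering pattern: short frame, long frame, repeat.
stuttery = [5.0, 25.0] * 50
even, odd = alternating_split(stuttery)
average_fps = len(stuttery) / (sum(stuttery) / 1000.0)
print(f"even frames: {even} ms, odd frames: {odd} ms, "
      f"average: {average_fps:.1f} FPS")
```

In this made-up case the FPS average looks like that of a config delivering a frame every 15 ms, while half of the frames actually arrive 25 ms after their predecessors.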

Our measure of “badness” often acts as an anchor for us, preventing us from getting too bogged down in the weeds of other analysis. This metric adds up any time spent working on frames that take longer than a given threshold. Our primary threshold here, 50 milliseconds, equates to about 20 FPS. We figure any animation that dips below the 20 FPS mark is in danger of looking choppy. Also, 50 ms maps to three vertical refresh intervals on a 60 Hz display. The other two, 16.7 ms and 33.3 ms, map to 60 FPS with a single refresh interval and 30 FPS with two refresh intervals, respectively.
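Here is a small sketch of how such a metric can be computed. It assumes that only the portion of each frame time past the threshold gets counted; summing the entire duration of each slow frame would be another reading of the description above. The sample frame times are invented.

```python
REFRESH_MS = 1000.0 / 60  # one refresh interval on a 60 Hz display

# Thresholds at one, two, and three refresh intervals: ~16.7, ~33.3, 50 ms.
THRESHOLDS_MS = [n * REFRESH_MS for n in (1, 2, 3)]

def time_beyond_ms(frame_times_ms, threshold_ms):
    """Total time spent past the threshold, counting only the excess
    portion of each slow frame (an assumption about the metric)."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)

sample = [15.0, 18.0, 40.0, 65.0]  # hypothetical frame times in ms
for threshold in THRESHOLDS_MS:
    print(f"beyond {threshold:.1f} ms: "
          f"{time_beyond_ms(sample, threshold):.1f} ms "
          f"(threshold equals {1000.0 / threshold:.0f} FPS)")
```

Counting only the excess means a 51-ms frame barely registers against the 50-ms threshold, while a 150-ms hitch contributes a full 100 ms of badness.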

The truth is that all these solutions are incredibly quick in Skyrim, with only a few milliseconds spent above our main threshold by any of them. That jibes with our sense that all these cards ran this test pretty smoothly, with only an occasional hiccup in each case. You may also notice that the Fraps data tends to show more time spent beyond each threshold than FCAT does. That should be no surprise given the larger spikes visible in the Fraps plots. Let’s see what we can make of that fact.

Anatomy of a stutter
We have two directions we can go at this point: pursuing the multi-GPU micro-stuttering issue, and looking at cases where Fraps latency spikes aren’t reflected as strongly in the frame delivery measured by FCAT. Let’s start with the latter, and then we’ll circle back to the micro-stuttering problem.

We’re not trying to pick on the Radeon HD 7970 by using it as an example here. As you saw on the last page, each of the configs tested had one or more of these latency spikes over the course of the test run. The 7970 plot and video just give us a nice test case for what happens when Fraps and FCAT measurements don’t match.

We’ve zoomed in on the second of the two spikes in the 7970’s Fraps plot. As you can see, there’s a very small spike to 22 milliseconds in the FCAT plot, but a much larger spike, to nearly 50 ms, in the Fraps plot at the same spot. The question is: if frame delivery is still relatively even, as the FCAT results indicate, does it matter whether there was a spike in the Fraps data? As you might imagine, I was like a kid in a candy store when I got to pull up the video of the test run and see how it looked as I paged through the animation.

What I saw was… a quick but easily perceptible disruption in the animation, something much more on the order of the 50-ms delay indicated in the Fraps data. I’ve attempted to share this eureka moment with you by snipping out a brief video clip of the vaunted stutter in action. Since YouTube is gonna convert the video to 30 FPS no matter what, I took the liberty of slowing the source video to 15 FPS, in the hopes of keeping some semblance of each source frame intact. So hit play and buckle up. You should see a skip at about halfway through this six-second extravaganza.

Yeah, so YouTube pretty much adds its own stutter to the mix. Perhaps not the best tool for this job. Still, if you can get the video to play smoothly, the momentary stutter is real and perceptible. Although it’s still pretty minor in the grand scheme, it confirms for me what Andrew Lauritzen has argued about the value of Fraps data. He was discussing a different Skyrim video of ours at the time, but the principle remains the same:

Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the affect of the game adjusting how far it steps the simulation in time each frame. . . . A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. i.e. it is never okay to see spikey output in frame latency graphs.

Even with buffering smoothing out frame delivery as measured by FCAT, the spike in the Fraps plot indicates a disruption in timing that has an impact on the content of the frames being displayed and thus on the smoothness of the animation.
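A toy model may help show why. Suppose, and this is purely an assumption about how an engine might keep time, that the game advances its simulation by the wall-clock time between Present calls, while buffering evens out how long each frame stays onscreen. The numbers below are invented, loosely echoing the 50-ms and 22-ms spikes discussed above.

```python
def apparent_speed(sim_steps_ms, display_intervals_ms, speed=1.0):
    """On-screen speed of an object moving at constant in-game speed:
    distance advanced per frame divided by how long the frame is shown."""
    return [speed * sim / shown
            for sim, shown in zip(sim_steps_ms, display_intervals_ms)]

# Hypothetical Fraps-like simulation steps with a spike, followed by
# frames submitted close together to refill the queue.
sim_steps_ms = [16.0, 16.0, 48.0, 8.0, 8.0, 16.0]
# Hypothetical FCAT-like display intervals, smoothed by buffering.
display_ms = [16.0, 16.0, 22.0, 21.0, 21.0, 16.0]

for speed in apparent_speed(sim_steps_ms, display_ms):
    print(f"{speed:.2f}x")
```

Under these assumptions, the object lurches forward at more than twice its proper speed for one frame and then crawls for the next two, even though the display intervals never stray far from 16 to 22 ms.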

We’ve been hearing an argument out of AMD about the value of Fraps data that I should address in this context. Folks I’ve talked to there have insisted to me that they’ve seen cases where a spike in Fraps frame times doesn’t translate into an interruption in animation, seemingly casting aspersions on the value of Fraps data. After talking to AMD’s David Nalasco last night, I think I understand this position better.

I believe Nalasco would modify Andrew Lauritzen’s statement above to: “it is sometimes okay to see spiky output in frame latency graphs,” simply because not every little hiccup or spike translates into a flaw in the animation that one can perceive. There are a couple of possible reasons why that could be the case. One has to do with the tricky question of what constitutes a stutter and what the threshold for human perception of a problem might be. Small interruptions may not matter if no one will notice them, especially with the display refresh cycle complicating how frames are presented. A related technical question is how large an interruption has to be in Fraps (caused by back-pressure in the rendering queue preventing the submission of a new frame) before a perceptible stutter is created. This issue is complicated by the varying ways in which game engines keep and advance the timing for their simulations. Some engines may be more tolerant of small bubbles in the pipeline if they advance time in regular intervals from frame to frame, even when those frames are being submitted closely together to refill the queue after a hiccup.

I won’t argue with any of that. However, Nalasco also concedes that a sufficiently large frame time spike in Fraps will indeed translate into an interruption in the game’s animation. I think he just wants folks to avoid obsessing over small spikes in frame time charts and to keep in mind that perception is the final arbiter of animation smoothness.

Which, you know, is why we create silly little videos like this one, examining a single case of a 50-millisecond frame and the visual interruption it appears to create.

However—and this is a huge caveat—we have some trepidation about declaring even this one particular example a definitive triumph for Fraps-based measurements. You see, like most folks who test gaming performance, we’ve removed the built-in frame rate cap in Skyrim. We already know that doing so causes some funky timing quirks for things like the game’s AI, but it may also modify the game’s fundamental timekeeping method for all of its physical simulation work. (The variable we’ve modified in order to “uncap” Skyrim is called “iPresentInterval”, and we’ve changed it from “1” to “0.” You may recall that Fraps measures when the game calls Present(). Hmm.) If our uncapping effort has changed the way time is kept in the game, it may have created the possibility of frame-to-frame timing issues that one would usually not see with the game engine’s default timing method. This thought occurred to me on an airplane, on the way out to GDC, so I haven’t been able to dig deeper into this issue yet. I definitely think it merits further investigation, and the frame-by-frame playback and analysis possible with the FCAT tool set should be a big help when the time comes.

Borderlands 2
Let’s look at how Fraps and FCAT data compare in a few more games, while keeping an eye on the multi-GPU systems for evidence of micro-stuttering problems. Then, we’ll address micro-stuttering in a little more depth.

Borderlands 2 is noteworthy in this context not just because it’s a great game, but also because it’s based on the incredibly popular Unreal engine, like a whole ton of other titles. Interestingly enough, our results for this game show incredibly close correspondence between Fraps timing and FCAT frame delivery. Yes, that’s what you’re seeing in the plots above—not just a single distribution, but two that almost entirely overlap. If you look closely, you can see that even the spikes tend to overlap. The peaks are a little higher in Fraps in several cases, but usually not by much. We do see a little “fuzziness” at a few spots in the Radeon HD 7970 CrossFire plot from FCAT, which likely indicates some micro-stuttering, but it’s relatively minimal.

Every one of our metrics confirms that Fraps and FCAT are virtually in unison here. That’s a good thing, because it should mean that the content of frames being displayed will match the timing of their appearance onscreen quite closely. It also gives us quite a bit of confidence that we’re measuring the “true” performance of these graphics solutions in Borderlands 2 between these two tools.

Guild Wars 2

This one is a little different. According to Fraps, the frame-time distributions for all tested configs are pretty spiky. By FCAT’s account, the single-GPU solutions are nice and tight, with relatively little variance overall; however, both of the multi-GPU offerings are at least as spiky as in the Fraps data. The Radeon HD 7970 CrossFire config’s Fraps and FCAT plots match up very closely, while the GTX 680 SLI’s FCAT results show even more variance than the Fraps output.

Frankly, we need to spend a little more time looking into this one. Doing so should help us understand the impact of the divergence between the Fraps and FCAT results for the single-GPU cards. This game is especially tricky because it’s an MMO with a client-server relationship—and, remember, we have only a single test run for each card in this case. Subjectively, the animation didn’t seem entirely smooth to us on any of the cards, but I’d like to spend more time with the captured video before drawing any conclusions. For now, let’s move on and take a closer look at some of the multi-GPU issues we’ve encountered.

Multi-GPU issues: Micro-stuttering, runt frames, and more
So what exactly was going on with those strangely “cloudy” frame time plots for the 7970 CrossFire, especially in Skyrim? Let’s look more closely at a small snippet of time from each test run, starting with the Fraps results.

The plots for the two multi-GPU solutions above both show obvious evidence of multi-GPU micro-stuttering, with frame times oscillating in that familiar sawtooth pattern. This timing issue is caused by the preferred method of balancing the load between two GPUs, alternate frame rendering (AFR), in which one chip renders the even-numbered frames and the other renders odd-numbered frames, in interleaved fashion. When the two GPUs aren’t exactly in sync, frame delivery becomes uneven. I have a hard time getting excited about this problem in this particular instance simply because the frame times involved are very short, so the difference between them is small. Still, a multi-GPU config with this sort of jitter is really no quicker than those longer frame times in the pattern. In some cases, with too much jitter, multi-GPU solutions may not be much faster than a single GPU of the same type.
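To put rough numbers on that last point, here’s a minimal Python sketch. The frame times are made up (a hypothetical sawtooth alternating between 5 ms and 15 ms), and the function names are mine, not anything from Fraps or FCAT. It compares the ordinary FPS average with the rate implied by only the longer frame time in each pair, which is one simple way to read the claim that a jittery config is no quicker than its longer frame times.

```python
def fps_average(frame_times_ms):
    """Traditional FPS average: frames delivered divided by elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def fps_from_longer_frames(frame_times_ms):
    """Rate implied by the longer frame time in each consecutive pair.

    Assumption: with two GPUs in AFR, frames arrive in short/long pairs,
    and the long interval is what sets the pace you actually perceive.
    """
    pairs = zip(frame_times_ms[0::2], frame_times_ms[1::2])
    longer = [max(a, b) for a, b in pairs]
    return 1000.0 / (sum(longer) / len(longer))


# Hypothetical data: a 5 ms / 15 ms sawtooth versus evenly spaced 10 ms frames.
jittery = [5.0, 15.0] * 30
even = [10.0] * 60

print(fps_average(jittery), fps_average(even))
print(fps_from_longer_frames(jittery), fps_from_longer_frames(even))
```

Both sequences produce the same FPS average, but the sawtooth’s longer frames imply a noticeably lower rate. That gap is the part an FPS counter can’t see.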

So that’s that, but now look what happens with the two multi-GPU configs in the FCAT results, where we’re measuring frame delivery times, not frame dispatch times.

The jitter pattern is eliminated on the GeForce GTX 680 SLI setup—evidence that Nvidia’s frame metering technology for SLI is doing its job. This technology tracks frame delivery times and inserts very small delays as needed in order to ensure even spacing of the frames that are displayed. The FCAT tools give us the ability to confirm that frame metering works as advertised. Remember how I said the FCAT release was a bit of enlightened self-interest? Yeah, here’s where the self-interest comes into the picture. Nvidia gets to show off its frame metering tech.

Meanwhile, our FCAT results suggest the jitter on the Radeon HD 7970 CrossFire setup is much more severe than Fraps detected. The shorter frame times in the pattern are literally a fraction of a millisecond, while the longer frame times are effectively twice what an FPS average of this section might suggest. How does a 0.3 millisecond frame look onscreen? Something like this:

In that example, there are portions of five GPU frames onscreen at once, as the overlay indicates. The little aqua and silver snippets are tiny portions of what are, presumably, fully rendered frames from the GPU, but the timing is so off-kilter that only a few scan lines of them are shown onscreen. Here’s a close-up of one of these “runt” frames (Nvidia’s term for them) causing tearing in BF3.

The runt frame appears to have real content; it just isn’t onscreen long enough to add any substantial new information to the picture.

Above is an illustration of another snag we encountered with the 7970 CrossFire config in multiple games. This is a three-video frame sequence from BF3 with the FCAT overlay enabled. The expected color sequence for the overlay here is red, teal, navy, green, and aqua. What you see displayed, though, is red immediately followed by navy and then aqua. The teal and green frames aren’t even runts here—they’re simply not displayed at all. They’re just dropped.

Here’s another funky anomaly we encountered intermittently with the CrossFire setup. The FCAT analysis script was reporting two-pixel-tall “frames” that were out of the expected sequence, which was a bit of a puzzle. If you page through the video sequence shown above, everything looks correct at first glance, with the proper sequence of fuchsia, yellow, orange, white, and lime. However, if you zoom in on the top-left corner of the last video frame of the sequence, you’ll see this:

That’s a two-pixel yellow overlay bar and, to its right, apparently the other content of an out-of-sequence frame. When this happens, the out-of-place imagery always shows up at the top of the screen like this. Based on lots of zooming and squinting, I believe the content of the two scanlines here matches the timing of the yellow-marked GPU frame from the first video frame in the sequence. Somehow, it’s “leaking” into the top of this video frame. Not a huge problem, frankly, but it’s an apparent bug in CrossFire frame delivery.

So what do we make of the problems of runt and dropped frames? They’re troublesome for performance testing, because they get counted by benchmarking tools, helping to raise FPS averages and all the rest, but they have no tangible visual benefit to the end user.

Nvidia’s FCAT scripts offer the option of filtering out runt and dropped frames, so that they aren’t counted in the final performance results. That seems sensible to me, so long as it’s done the right way. The results you’ve seen from us on the preceding pages were not filtered in this fashion, but we can apply the filters to show you how they affect things. By default, the script’s definition of a “runt frame” is one that occupies 20 scan lines or less, or one that comprises less than 25% of the length of the prior frame. I think the 20-scan-line limit may be a reasonable rule of thumb, but I’m dubious about the 25% cutoff. What if the prior frame represented a big spike in frame rendering times?
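Here’s a small Python sketch of that default rule as I’ve described it, to show why the 25% cutoff makes me uneasy. This is not Nvidia’s script. The function names, the sample data, and the assumption that a frame’s “length” is measured in scan lines (and compared against the immediately prior frame, runt or not) are all mine.

```python
def is_runt(scanlines, prev_scanlines, max_lines=20, min_fraction=0.25):
    """Return True if a frame counts as a runt under the two thresholds.

    scanlines      -- scan lines this frame occupied in the captured video
    prev_scanlines -- scan lines the prior frame occupied (None for the first)
    """
    if scanlines <= max_lines:
        return True
    if prev_scanlines is not None and scanlines < min_fraction * prev_scanlines:
        return True
    return False


def filter_runts(scanline_counts, max_lines=20, min_fraction=0.25):
    """Drop runt frames from a sequence of per-frame scan line counts."""
    kept = []
    prev = None
    for count in scanline_counts:
        if not is_runt(count, prev, max_lines, min_fraction):
            kept.append(count)
        prev = count  # assumption: always compare against the prior frame
    return kept


# Hypothetical capture: a 15-line runt, then a 3000-line spike followed by
# a perfectly healthy 600-line frame.
counts = [540, 15, 1065, 3000, 600, 540]

print(filter_runts(counts))                    # default thresholds
print(filter_runts(counts, min_fraction=0.0))  # 20-line rule only
```

With the default thresholds, the 600-line frame gets thrown out along with the real runt, simply because it follows a spike. Disabling the percentage rule keeps it.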

Fortunately, the filtering rules in the FCAT scripts are easily tweakable, so we can define our own thresholds for these things. I expect you’ll see lots of results today and in the coming weeks that accept FCAT’s default filtering rules, though, so let’s take a look at how they affect some test data. Here are the Fraps and FCAT results for the Radeon HD 7970 CrossFire setup in Skyrim, followed by the filtered version from FCAT.

Filtered in this way, the CrossFire config loses lots of frames from its output. You can imagine what that does to its FPS average:

Interestingly, even the 99th percentile frame time is affected slightly by the removal of so many super-short-time frames, whose presence shifts the cutoff point for 99% of frames rendered.
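If that seems counterintuitive, a contrived example may help. The Python sketch below uses a simple nearest-rank percentile (my choice of method, not necessarily what any particular tool uses) on invented frame times. Padding the distribution with runts pushes the 99% cutoff down into the ordinary frames. Remove the runts, and the cutoff lands on one of the slow frames instead. The shift here is exaggerated on purpose.

```python
import math


def percentile_frame_time(frame_times_ms, pct=99.0):
    """Nearest-rank percentile: the frame time that pct% of frames fall at or under."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


# Hypothetical run: 99 runts at 0.3 ms, 99 ordinary 30 ms frames,
# and two slow 50 ms frames.
raw = [0.3] * 99 + [30.0] * 99 + [50.0] * 2
filtered = [t for t in raw if t > 1.0]  # crude stand-in for runt filtering

print(percentile_frame_time(raw))       # 30.0
print(percentile_frame_time(filtered))  # 50.0
```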

So, yeah, accounting for these frame delivery problems with filtering really alters the relative performance picture. By contrast, the SLI setup is barely touched by the filters in this case. We did see a few runt frames from the SLI rig in both Skyrim and Guild Wars 2, but they never amounted to much.

Battlefield 3 multi-GPU performance
Let’s take a look at another game where multi-GPU micro-stuttering comes into play. This time, we’ve left out the Fraps numbers so we can concentrate on both raw and filtered FCAT results.

Obviously, the Radeon HD 7970 CrossFire setup cranks out more frames than anything else in this test, but just as clearly, it has some pronounced jitter going on. Here’s an extreme close-up of some of the worst of it:

FCAT’s filtering removes those runt frames from the equation, and we’re left with substantially altered performance results for the CrossFire rig. As you can see, none of the other solutions’ results are affected at all. Their filtered and raw frame time plots and latency curves entirely overlap, and the rest of their scores are identical.

With the FCAT filtering applied, the Radeon HD 7970 CrossFire setup’s average FPS drops precipitously. Even without filtering, the longer frame times in its jitter pattern affect its latency curve negatively. Add in filtering, and the CrossFire rig’s 99th percentile frame time drops back to match a single Radeon’s almost exactly. Filtered or not, there really is no measurable benefit to having a second graphics card in the mix.

With that said, we have to point out that in this game at these settings, the 7970 CrossFire rig performs just fine, speaking both subjectively and going by the resulting numbers. In our measure of “badness,” time spent working on frames beyond 50 milliseconds, none of the cards register even a single blip. They spend no substantial time working on frames that take longer than 33 ms, either.
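For anyone who wants to compute that “badness” number from their own frame time logs, here’s a minimal Python sketch. It assumes the metric adds up only the portion of each frame time that exceeds the threshold. The function name and the sample data are invented for illustration.

```python
def time_beyond(frame_times_ms, threshold_ms=50.0):
    """Total milliseconds spent past the threshold, summed over all frames.

    Assumption: a 62 ms frame contributes 12 ms at a 50 ms threshold,
    not the full 62 ms.
    """
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# Hypothetical frame times in milliseconds.
sample = [16.7, 20.0, 62.0, 18.0, 120.0]

print(time_beyond(sample, 50.0))  # time spent beyond 50 ms
print(time_beyond(sample, 33.0))  # time spent beyond 33 ms
```

A result of zero, which is what every config managed at 50 ms in this test, means no frame ever crossed the threshold.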

So now what?
This first take on Nvidia’s FCAT tools and the things they can measure is just a beginning, so I don’t have many conclusions for you just yet. But this is an awfully good start to a new era of GPU benchmarking. We can now see into the early stages of the rendering pipeline with Fraps and then determine exactly what’s happening at the other end of the pipe when visuals are delivered to the display with FCAT. We can correlate the two and see how much leeway there is between them. And we also have videos that allow us to review the resulting animation with frame-by-frame precision, to show us the exact impact of any spikes or anomalies in the numbers we’re seeing. These are the best tools yet for understanding real-time graphics performance, and they offer the potential for lots of new insights.

The fact that Nvidia has decided to release analytical tools of this caliber to the general public is remarkable. Yes, the first results of those tools have detected some issues with its competition’s products, but who knows what other problems we might uncover with them down the road? Nvidia is taking a risk here, and the fact it’s willing to do so is incredibly cool.

Going forward, there’s still tons of work to be done. For starters, we need to spend quite a bit more time understanding the problems of multi-GPU micro-stuttering, runt frames, and the like. The presence of these things in our benchmark results may not be all that noteworthy if overall performance is high enough. The stakes are pretty low when the GPUs are constantly slinging out new frames in 20 milliseconds or less. I’ve not been able to perceive a problem with micro-stuttering in cases like that, and I suspect those who claim to are seeing extreme cases or perhaps other issues entirely. Our next order of business will be putting multi-GPU teams under more stress to see how micro-stuttering affects truly low-frame-rate situations where animation smoothness is threatened. We have a start on this task, but we need to collect lots more data before we are ready to draw any conclusions. Stay tuned for more on that front. I’m curious to see what other folks who have these tools in their hands have discovered, too.

The FCAT analysis has shown us that Nvidia’s frame metering tech for SLI does seem to work as advertised. Frame metering isn’t necessarily a perfect solution, because it does insert some tiny delays into the rendering-and-display pipeline. Those delays may create timing discontinuities between the game simulation time—and thus frame content—and the display time. They also add a minuscule bit to the lag between user input and visual response. But then there’s apparently a fair amount of low-stakes timing slop in PC graphics, as the gap between our Fraps and FCAT results (in everything but the Unreal-engine-based Borderlands 2) has demonstrated. The best thing we can say for frame metering is that it makes the Fraps and FCAT times for SLI solutions appear to correlate about like they do for single-GPU solutions. That’s a really high-concept way of saying that it appears to work pretty well.

We do want to be careful to note that frame delivery as measured by FCAT is just one part of a larger picture. Truly fluid animation requires the regular delivery of frames whose contents are advancing at the same rate. What happens at the beginning of the pipeline needs to match what happens at the end. Relying on FCAT numbers alone will not tell that whole story; we’d just be measuring the effectiveness of frame metering techniques. We’ve come too far in the past couple of years in how we measure gaming performance to commit that error now.

Ideally, we’d like to see two things happen next. First, although FCAT’s captures are nice to have and Nvidia’s scripts provide a measure of automation, using these tools is a lot of work and generates huge amounts of data. It would be very helpful to have an API from the major GPU makers that exposes the true timing of the frame-buffer flips that happen at the display. I don’t think we have anything like that now, or at least nothing that yields results as accurate as those produced by FCAT. With such an API, we could collect end-of-pipeline data much more easily and use frame captures sparingly, for sanity checks and deeper analysis of images. Second, in a perfect world, game developers would expose an API that reveals the internal simulation timing of the game engine for each frame of animation. That would allow us to do away with grabbing the Present() time via Fraps and end any debate about the accuracy of those numbers. We’d then have the data we need to correlate the beginning and end of the pipeline with precision and to analyze smoothness. Or, well, someone smarter than us about the tricky math of a rate-match problem and the perceptual thresholds for smooth animation would.

Follow me on Twitter for shorter ramblings.

0 responses to “Inside the second with Nvidia’s frame capture tools”

  1. I think that instead of the percentile curve you could reach a more meaningful result using a derived curve(of the frametime curve).
    Let’s say that the average is 60 fps.
    Now let’s say that 20 percent of the frames are 25 ms(40fps).
    The difference is how these 25 ms values are spread in the curve. If they are all together or if they are alternated to 17 ms ones, forming saw-like shape in the curve.
    You will not have the same feeling stutter-wise (and here i am not saying anything new)
    What I want to say is that the percentile graph is not appropriate for the kind of analysis that you are doing. You should use a derived curve, since deriving a function measures how quickly a curve grows (negatively or positively), and this is not measured by the percentile curve. After this you could measure the area of this curve, and you could also arrive at a single number to measure the amount of stutter. In fact, in this way you would take out of the equation the part of the frametime curve that is below the average but runs steadily (something that you can’t do with the percentile curve).
    Calculating the area of the derivation of a very saw-like frametime curve, you would obtain a high number, whereas calculating the area of the derivation of a smooth (even if varying) frametime curve, you would get a very low number. This would tell you how smooth the transitions are, not whether the GPU is powerful enough to make the game playable. For that you should check the average fps.
    So in the end, if you got decent fps and a very low value for the area of this function, you got a great experience.
    If you got decent fps but a high derived-function area value, then you got a stutterish experience.
    If you got low fps and low value you got a underdimensioned gpu but good smoothness.

  2. Actually, I misunderstood your original statement, and I agree with you- developing for the commercial side first, where the margins are, makes sense as they’ll want to get the specific desirable features worked in.

    But in production, it makes sense to release the consumer product first due to variability in the manufacturing process. Initially, they’ll likely have far more parts that are only fit for the consumer market than those that rate high enough for the commercial products.

  3. Anybody else bothered that the nvidia supplied tool of FCAT only seems to show their multi-gpu products in a good light? And to then go ahead and base an entire article on this vendor supplied tool? Seems incredibly unprofessional to me.

    The only saving grace is the inclusion of Fraps data, which counters the obviously biased nvidia tool. But then the article continues on to debunk the Fraps data and talk about what an inaccurate tool it is or may be. This may well be the case with Fraps not providing the most accurate data, but I think it is irresponsible to base conclusions on a tool provided by nvidia. Did anyone reach out to amd to see what kind of developer tools they have that could measure the same thing? Or better yet, how about a third-party supplied tool?

    Many of us have known for a long time that both SLI and Crossfire have not been really worth all of the trouble that they cause. Whether it is having to find profiles for games, multiple beta drivers, micro-stuttering, visual artifacts, or just general instability, it has largely not been worth the hassle. Until there are third-party tools that can prove otherwise, I will just go back to my rule of thumb that all multi-gpu setups are not worth it.

  4. I can see that, but commercial interests are more likely to have specific requirements rather than “moar pixels” from consumers. Again, I’m just thinking of how I would approach it. I suspect that Nvidia, AMD, and Intel all have a better understanding of the markets than I do.

  5. Worryingly it seems like they weren’t aware of the problem themselves, and in any case it’s not clear whether this is a recent problem or has always been the case with Crossfire.

    FWIW, I ran 4850s in Crossfire for four years before recently upgrading to a 7950 and never noticed a massive problem. However I typically used Vsync, at least partly because it always felt smoother as well as getting rid of the tearing. Perhaps with Vsync off I was seeing this frame metering problem all along?

  6. I have been reading about this new technique for frame metering the last couple of days from the various websites that received the video capture hardware along with the scripts from nvidia and I just found another tool that might be of interest.
    The tool is from intel and it’s called the Graphics Performance Analyzer. As seen from the corresponding video it seems a really good fit for frame metering.
    Apart from that, there is the GPUview tool from Microsoft, but I suppose you already know that

    Anyway here are the links to check it out:
    Intel GPA: [url<][/url<]
    Intel GPA Demo Video: [url<][/url<]
    GPUview: [url<][/url<]

    Oh and great work scott, you really started a new trend with your unique approach. The thing is that for almost 2 years you insisted on it, not giving up although you were the only one doing so. And voila the results are just coming to light, there are already 3 more sites investigating the issue at this level and the two main graphics companies now consider this a priority, well done!

  7. Yes that’s possible. Bottom line is the dual gpu setup becomes almost useless at this time. Also for a long time the fps numbers everyone relies on were not really exposing real world performance issues.
    Obviously this is fixable eventually, but I think in this case AMD owes buyers of dual GPU set ups an apology.

  8. Really, Consumer to Commercial makes more sense, especially when you’re binning. Consumers will take crippled-ish products that are stable and effective in their areas of interest, while Commercial users demand the absolute best in terms of reliability and accuracy.

    Releasing to Consumers first allows companies time to cull the best products for their Commercial customers.

  9. There’s a difference between rendered and displayed.

    This might seem an academic technicality, and I don’t disagree that frames that are reduced to just a few lines on the final display should be ignored in the final count. However the tragedy for AMD here is that the cards probably are fully rendering all these frames and it’s the lack of synchronisation at the display that is resulting in a lot of this effort going to waste, just as JAE suggests.

    That’s why in these cases the “corrected” framerate drops down close to the equivalent single card level, because a lot of the time we are only seeing the output from one of the cards. The second card is happily rendering away at the alternate frames at the same rate, but the highly uneven output rate means that we see almost nothing of these rendered frames.

    To me, the fact that Nvidia have bothered to sink time into supplying this tool to reviewers suggests that they believe this will not be an easy problem to fix i.e. it is somehow embedded in the Radeon hardware.

  10. “Probably” not. Vsync has other issues btw. Read the articles. The runt frames are NOT fully rendered.
    That is why Tech Report uses a filter to remove them which greatly alters the REAL fps number everyone relies on.

  11. Except for Fermi. They did end up releasing the consumer GPU first, but it was a server design.

  12. If you’re going to continue to define AMD fans as “anyone that vocally disagrees with me” and ‘the problem’ as “whatever I decide this week” (it was single card latency, now it’s Crossfire now that single cards have no issues remaining) then sure, you’re going to find a lot of people having trouble living in reality… yourself included.

  13. You’re fighting straw-men and pumping out nonsense arguments that aren’t backed up by fact. The only impressive thing about your post is its length and given the content that’s not a compliment.

  14. Check the Anandtech article on this, they had extensive input from AMD on the testing methods. nVidia may have supplied the kit, but AMD do not question the results it produces.

  15. While this is what I believe also based on this article and those I’ve read over the years, I guess it’s not completely obvious- thanks for spelling it out simply.

    It’s one of the reasons I’d like to see more game engines looking very closely at synchronizing with GPU output, and GPU output synchronizing with display frequencies, in such a way that input lag is not introduced as it can be with vertical sync today.

  16. You may be misinterpreting the data. The so-called “runt” frames probably are fully-rendered in the frame buffer. However, since they’re almost immediately obsoleted by a newer frame from the other GPU, they don’t last very many scan lines on the monitor before they are replaced. That’s what happens when you have VSync turned off.

  17. It’s because the Fraps counter for fps is triggered by pixel fragments. It does not differentiate fully rendered from runt.

  18. [quote=”Penut”<]It used to be amusing to read your posts, now it's just sad to me. I dunno how you don't see yourself behaving a lot like other people you describe.[/quote<]┐(‘~`;)┌

  19. Took me a while to find them, but I do have a few screenshots of bugs I’ve experienced with this card, although I should have taken more.
    [url<][/url<] [url<][/url<] Anytime I see someone spout nonsense about nvidia having better drivers, I just remember all the [url=<]issues[/url<] I've dealt with and laugh. It's complete nonsense. I've never experienced bugs like these with AMD. It's great that nvidia fixed the bugs, and I don't hold it against them, but the propaganda about better drivers is a complete lie. If nvidia has better drivers now, it's only because Fermi had so many problems *cough* TDRs *cough* and they've been working on fixing things for the last several years. AMD wasn't in that situation, so they're just now realizing their drivers need work. I think once AMD focuses on the issue, things will even out.

  20. He’s right- the ‘Quadros’ are usually 3-6 months behind in my observation. That most of Big-Keplers went to HPC before the consumer market was an aberration, and I’m still blaming AMD’s refocus on HPC with GCN on that one. They sacrificed game performance for compute, and in turn Nvidia was able to sell their half-size non-compute card that normally runs $250 for $500. Ain’t no point in selling a massive GPU like GK110 in a consumer product with margins like those!

  21. Thanks I don’t think i ever read that article cause I don’t care about BF3 but there is def some ugly stuttering on the second test. 13 fps lower 6850 with much less stutters than 560 ti.

  22. Yeah, that’s the excuse anyways, or TR just has a double standard, and marginalizes nvidia problems. TR did know about the problems when they were new cards, they just didn’t make a big deal of it, unlike what’s going on with AMD now. It got a sentence mention here or there, but perhaps because Fermi was sometimes faster in dx11 benchmarks, that outweighed/justified whatever problem existed at the time.

    Here are some older benchies. I just googled TR and BF3 for my original link, because I know TR’s mentioned the problems with that game. Quite frankly, if you all followed the news, you’d know this. Fermi wasn’t perfect.
    [url<][/url<] [quote<] Based on that data, we'd be tempted to recommend Radeons for use with Battlefield 3 on mid-range systems. Then again, the strongest message our data has to offer is that we need more of it. There are evidently stark differences between GeForces and Radeons in this game, and we wonder if, perhaps, other missions and levels would tip the odds the other way. Now, Nvidia did throw us a bit of a curve ball earlier today, releasing a beta GeForce driver with purported Battlefield 3 performance improvements for DirectX 10 cards. After some spot checking, we've concluded that the new driver doesn't improve either frame times or frame rates in our test scenarios. We still recorded astronomical numbers of high-latency frames in Rock and a hard place. Besides, the results you saw today were obtained with WHQL-certified drivers in a game that's been around in some form or other for about a month and a half.[/quote<] My brother has a 6950, and he's had no problems with it, compared to what I've dealt with. The ONLY redeeming factor of Fermi compared to the 6 series was how much faster it is using tessellation, and how much better it's gotten with driver updates. Plus I got my 470 on sale, cheap. The 6950 was the better card on release, and still is for certain scenarios. I think it has cleaner rendering and texture filtering, but it isn't optimal for new dx11 games. Since my brother primarily plays Starcraft2, CS, and Q-Live, the 6950 is the better card for his usage scenario, especially considering the power savings and stability. I play a variety of games, so the 470 suits me better for that. The driver updates have fixed a lot of problems, but I'm not going to pretend they never existed, because they did. The whole nvidia has better drivers spiel is a bunch of malarkey. There have been pros and cons with each side, and only recently with Kepler has nvidia really had any advantage.

  23. No I’m not. Fermi was notorious for stuttering and had all kinds of bugs early in its life, and certain games like BF3 have been broken permanently. Glad I don’t play it, via my boycott of Origin.

    [quote<]Darksiders ran flawlessly in 1080p on my 460[/quote<] NOW, and I'm not disputing that. It was fixed, but the game performed terribly on release. There was also all kinds of stability problems with flash video, that I NEVER experienced with my 4870, and several console port games like F.E.A.R. 3(?) would cause bluescreens, and the game Overlord had a transparency bug where tree leaves were grey boxes. Let me put it to you this way, I was seriously considering RMAing the card for several months after I first bought it. That's how BAD the drivers were. The bugs were INSANE, and the only reason why I didn't RMA it was because I wasn't the only one having issues, and not all games had problems, so I figured driver updates would resolve it, and for the most part it did. TR knows there have been all kinds of issues with nvidia in the past, like TDR's, and they've repeatedly marginalized these problems compared to anything that has happened with AMD. The double standard is ridiculous, and I'm not going to be swayed by it. My next card will likely be AMD, since nvidia is not offering any honest mid-range products. I'm not wasting $300+ on a bus crippled 660. $300 should buy a 256-bit card, and I won't take anything less for that kind of money.

  24. Maybe they didn’t make a big deal out of it because they were already 1-2 year old cards when they reviewed the 660ti. Are there any BF3 99th frame time tests for older Radeons to compare? Not trying to make an excuse for TR but I haven’t found anything that proves Nvidia’s 400/500 series had much worse stuttering issues than AMD’s 5000/6000 series OR if they did that TR knew about it when they were new cards and more relevant.

  25. Never thought I’d say this, but l33t-g4m3r makes a valid point in my eyes. The data clearly shows high frame-times for Fermi-based cards.

  26. [quote=”Sarge from Red vs. Blue”<]You're makin' that up.[/quote<] Darksiders ran flawlessly in 1080p on my 460, and again on 460 SLI. ♪~(´ε` )

  27. Quadros are usually out long after the respective Geforce they are based on. That was my point.

    Tesla is relatively new and an entirely different market. Usually it’s consumer first.

  28. [quote=”Bensam123″<]Disabling aero fixes this as well.[/quote<]Oh, I see. ヘ( ̄ー ̄ヘ) I don't use Aero on Win7.

  29. Hmmm for me the difference is pretty night and day. I guess this depends a lot on who’s using it.

    I also found out using a secondary graphics card stops the stutter thing (such as integrated graphics on a IB/SB). Otherwise disabling aero fixes it, but makes everything less pretty.

  30. Thanks for the heads up 🙂 I shall investigate. Was very happy with my Dell 2408WFP though, so maybe I’m just not that bothered by input lag etc. I don’t play many twitch-type games, about the worst is World of Tanks and that’s still a league below an actual FPS in terms of required response time – probably why I enjoy it! :).

  31. SiberX, I think it’s right to say that one goal for smooth animation is correspondence between the game simulation timing for a frame and when it hits the display. That was one point I was making. However, if we’re talking about the ideal, even spacing from one frame to the next is also important. So is making sure the interval from frame to frame isn’t too large.

    All of these ingredients are needed to generate animation that is “perfect” in theory. Some measure of each is needed to fool the eye well enough for any imperfections not to matter.

    SLI’s frame metering does adjust the timing of frame deliveries to the display some, but based on our data, it appears to function as a corrective, keeping the frame display times more closely correlated to the frame dispatch times measured in Fraps (and thus likely closer to the game engine’s internal timing when the frame’s content was generated). Without metering, Crossfire looks much worse on this front. In fact, the frame dispatch/display correlation for SLI frame metering looks a lot like the single-card correlation for the most part. (Note there is some timing “slop” between the single-card dispatch and display distributions, too, except in Borderlands 2, where triple-buffering is likely disabled.)

    So I think frame metering is probably a good thing.

    However, metering alone is not a magic solution, if the frame dispatch times are see-sawing all over the place. We need more data, especially from situations that challenge the GPU more, with low frame rates, in order to draw more conclusions. That is high on my agenda, believe me.

  32. WTF, Kool-Aid Drinkers. Fermi had horrible stuttering issues, and TR DID MENTION IT, but only several times in the sidelines. It was never brought to the front-page like the 7950 vs 660 Ti article.
    [quote<] A look at the broader latency curve further illuminates the problem. Frame times on the older GeForces stay happily below 40 milliseconds most of the time, but roughly three percent of the frames involve a much longer wait, spiking to 60 milliseconds or more, where you'd really notice the slowdown. We've seen this problem in certain levels of BF3 a number of times before on Fermi-class GeForces. Fortunately, the newer Kepler cards appear to have overcome it. In fact, Zotac's GTX 660 Ti AMP! hangs with the boost-enhanced Radeon HD 7950.[/quote<] This isn't the only time the stuttering in Fermi cards has been mentioned, but TR's always kept it on the down-low, instead of making a big deal of it. I have first-hand experience of how bad the stuttering was, as when I first played Darksiders with my 470, the game would stutter extremely badly in the floating tile sections, like every tile was individually being loaded into memory. Not a fun experience, although it was fixed down the road.

  33. Might want to check reviews on the Dells for input lag. Dell’s OSDs and scalers have traditionally come with a price. HP tends to put the same LG panels in their monitors without said features and the buffering that comes with them, which is what induces the input lag.

  34. I don’t think there’s any doubt that x2’s suck for microstutter, but AMD’s drivers are fine for single GPU use.
    Both Bioshock and Crysis 3 run beautifully on my lowly [email protected]

  35. Ah, thanks for posting this, I’d been annoyed by the 120hz/60hz issue too. Switched to using my laptop for my second screen to avoid it!

    That said, having used a BenQ XL2411T for 2 months now as my primary screen, I don’t think I’m sold on it, even for fast gaming. Think I might pickup another Dell 24″ IPS screen, my last one lasted ages before it went funny.

  36. Maybe I’m not paying enough attention, but from a development perspective, server to consumer makes more sense.

    I’m running a Nehalem processor at home, the first generation i7, and while the consumer version was released first, it really is a server influenced architecture, and–no surprise–the Xeon series released later blew everything else out of the water. I think they even beat AMD’s hex-core processors with quad-cores.

  37. It happened [i<]once[/i<] (8800 series). It usually is HPC/enterprise first with consumer cards getting benefits on the side. Why else do you think the GPUs always have a massive capability of DP calculations that get cut out or disabled in the driver for the consumer version?

  38. I moved from a 5970 (both GPUs at 5870+ clocks) to a [i<]single 7870[/i<] and my experience was drastically improved in Skyrim. CrossFire is something I'll probably never touch again till they get this all sorted out. This coming from someone who loved his 4870X2 QuadFire rig but could never really get it to feel "smooth" even at ridiculous framerates.

  39. I’m not convinced that’d be a bad thing to be honest if it means they’ll get picked up regardless. It’ll certainly highlight the problem just as before (runt frames and dropped frames are effectively the same to the end-user).

  40. “We have conducted the bulk of our performance testing so far, including this article, with vsync disabled. I think there’s room for some intriguing explorations of GPU performance with vsync enabled.”

    I would love to see some vsync-on tests. I know vsync off is great to see the power of the card running “flat out” to get a measure of total raw performance, but I think it would give us an insight into how each card, driver, game engine, API, etc, handles its job when under a partial load, not running at 100% all the time. If there is no difference, so be it, but I would be willing to bet there is a difference between single GPU, Crossfire, and SLI.

    In my case, I typically load up a game, and max it out with vsync off, and if the minimum fps I can produce is >60, then I enable vsync and enjoy a locked in a 60fps experience, the card runs cooler and at a lower GPU % load, which gives it breathing room to handle spikes and more complex scenes. Of course if I find a game that dips below 60, I set the game to minimum settings and test each setting one at a time (or groups) until I find something that is the largest performance hit like dynamic shadows “off” give 150fps, and “max” gives 50, but “high” gives 90, so I go with “high”, and so on.

    On a side note (my motivations), I typically buy a single gpu, then after 6-12 months I may run across a game that requires more performance, so I drop in a 2nd gpu. I have run crossfire (5770) and SLI (6600gt, 9800gx2, gtx260, gtx460), and crossfire is the only setup I have run where I notice a stutter even with a game that is capable of 150+fps, but is vsync capped at 60. Although I must admit I only have that crossfire example to go by. With SLI it has always been perceived by my eyes as a smooth experience. But the thing that almost always seems broken with multi gpu is that there is always some kind of texture bug, like all (open-able) doors in GTA4 would flicker like a bad fluorescent light, or water on Gamebryo engine games (Oblivion, Fallout3 and NV) would also flash, hard to explain, but it would look transparent, then hazy, then back as you walk past, etc. And this has been over 4 computers, all the way back to my NF4 SLI-DR Lanparty mobo.

    So I guess my point is, I can put up with occasional texture flukes, but am I also getting micro stutters? If so, maybe I need to rethink my strategy and ebay my single card, for a better single card, instead of doubling up.

    Edit: I did read the PCPer article that others linked elsewhere in the comments, but the issue I have with them is they picked a game (w/settings) that averaged below 60fps, so the whole time it was going from 30 to 60 back and forth, of course that is going to introduce hitches. What I am curious about is how to achieve “buttery smooth” 60fps, when you are limiting the game to 60fps (with a load that would have the minimum fps higher than 60 with vsync off).

  41. a 4-1 scaling factor might squish out the runt frames completely though, showing them as dropped frames instead

  42. Just wanted to post and say “thank you” for such a long, detailed, and immensely interesting article. It was a pleasure to read, and just as importantly, presented a lot of technical information in a clear manner. (I was going to say “clear and concise” manner, but it’s hard to judge conciseness with such a huge article!)

    Great job, Scott!

  43. [quote<]Second, in a perfect world, game developers would expose an API that reveals the internal simulation timing of the game engine for each frame of animation. That would allow us to do away with grabbing the Present() time via Fraps and end any debate about the accuracy of those numbers. [/quote<] Couldn't you use an open sourced engine like Doom 3 for this?

  44. Intrigued by the Guild Wars 2 results. I just can’t get that game to “feel” as smooth as the frame rate counter indicates. For what it’s worth, setting the Reflections to “Terrain and Sky” (I think it is, can’t check right now) instead of “All” really enhanced the subjective “smoothness” of the game in my eyes.

  45. Not really a source for this as it’s pretty much what’s going around the web and what I ended up googling to figure out what was wrong with my monitor. No one has done a real article on it. Disabling aero fixes this as well.

    Google 120hz 60hz monitor stutter

  46. Have a look at the [url=<]PC Perspective article[/url<]. Particularly fascinating Crossfire results with Vsync enabled. What’s missing from their work is that, despite talking a lot in the introduction about the importance of comparing the frame times at the start of the pipe (FRAPS) with the end of the pipe (FCAT) to ensure smooth animation, they then fail to use the FRAPS frame times in the rest of the article, just the uninformative FPS vs time plot which averages the data over one second intervals. I suspect the FCAT results in the plot of framerate against time are being processed in a similar way. Nowhere do they appear to be presenting the raw FRAPS frametime results, which is a shame.
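
    To illustrate why a per-second FPS plot can hide what raw frame times reveal, here is a minimal Python sketch. The two frame-time series are made up for the example (they are not data from either article), and the helper names `fps_per_second` and `percentile` are hypothetical, not part of FRAPS or FCAT.

```python
import math

def fps_per_second(frame_times_ms):
    """Count how many frames start inside each one-second bucket."""
    counts = {}
    start = 0
    for frame_time in frame_times_ms:
        bucket = start // 1000
        counts[bucket] = counts.get(bucket, 0) + 1
        start += frame_time
    return [counts.get(second, 0) for second in range(max(counts) + 1)]

def percentile(values, pct):
    """Nearest-rank percentile of a list of frame times."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up data: both series contain 50 frames spanning exactly one second.
smooth = [20] * 50           # steady 20 ms frames
spiky = [10] * 49 + [510]    # fast frames plus one half-second hitch

for name, series in (("smooth", smooth), ("spiky", spiky)):
    print(name, "FPS per second:", fps_per_second(series),
          "99th percentile frame time (ms):", percentile(series, 99))
```

    Both series report the same 50 FPS for the second, while the 99th percentile frame time is 20 ms for one and 510 ms for the other, which is the information the averaged plot throws away.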

  47. I hope so too even though I prefer Nvidia cards. Better benchmarking tools are better for everyone except for the diehard fanboys that wouldn’t switch brands no matter how much better the competition is.

  48. Did you even look at the graphs for Skyrim? It’s the least problematic of the games tested for Crossfire but it still has very odd frame timings that will result in less smooth gameplay.

  49. They might not legally be able to open source it due to some directx stuff in it but they said they plan to distance themselves from the project and want someone like the FRAP guys to take it over and make an open source version. The guys at PCP who have worked alongside Nvidia on it have seen the source though and said there’s nothing in it that would make it biased towards Nvidia.

    It’s not like it’s something new that Crossfire has had a ton of issues on AMD’s recent GPUs anyways so this would just be par for the course. It’s about time a tool like this was made.

  50. I’m about to go to bed so I could only skim, but is this open source? If not, why not open source it? Let us vet it, see for ourselves there are no shenanigans, and then we can all rest easy that a vendor supplied tool is actually something usable in the context of an unbiased review. Not only that, but then it can be developed and maintained further by the community.

    If NVIDIA is not willing to open source it, then I have to wonder why not? Is there some secret code in there that addresses their hardware in such a specific way that they don’t want us/AMD to know about it? If that’s the case then the tool can’t be trusted.

  51. [quote=”Bensam123″<]For instance there is a glaring bug in Windows 7 in which if you have two monitors hooked up, one running at 60hz and one faster then 60hz, it results in occasional stutters in whatever you're playing on this system. Strangely enough this happened with my i5-3570k, but when I switched to a FX8350 it went away. It occurs most noticeably if you have a video of some type playing on one monitor and a 3D game going in the foreground on another.[/quote<]Source? This never happens to me. I have one monitor at 120/144Hz and two more at 60Hz (and a fourth USB display at 60hz). Everything works fine.

  52. [quote=”l33t-g4m3r”<]No it hasn't. This all started with Fermi, as those cards had horrible stuttering problems (which only got a small mention by TR), and nvidia recognized this and has been working on it ever since.[/quote<]Unsubstantiated statements ahoy! Fermi has no problems. It is the best chip Nvidia has ever made; Kepler is a grotesque hack for the purpose of reducing power consumption by moving large portions of the chip to the driver software. Your mother was a hamster and your father smells of elderberries! (・`ェ´・)つ

  53. As someone who has Vsync enabled at all times, I’d be interested in seeing some benchmarks with it on.

  54. Not at all. I just think the buyer should have an accurate assessment of performance. These new metrics will be good for consumers going forward and inspire GPU companies to do better.

  55. It used to be amusing to read your posts, now it’s just sad to me. I dunno how you don’t see yourself behaving a lot like other people you describe.

  56. It’s not that I’m saying I’m against having multi-GPU options in Linux… it’s that given the issues even Windows has, and given the great expense of a worthwhile multi-GPU setup, I’m not really missing out on much. Given that even under Windows the SLI users are a *small* minority, it’s not like you’ll be required to use SLI to play games anytime soon.

  57. With Valve pushing Steam to Linux and Apple running a *nix-based OS, you’d think people would want more options for Linux, not less. Linux even makes sense when you want to make a customized/focused system, and you still need more than one physical GPU if you’re going to span a game across multiple higher-resolution displays (~4MP+ each). It’ll definitely be needed for 4K.

  58. Don’t sell yourself short- while AMD may have been able to improve the situation with drivers, if they’re really focusing on this issue, the next generation will be a much bigger jump in performance than what we saw in the last transition, and AMD might quite rightly have delayed releases for fine tuning in gaming experience. It’s not like their current products are lacking in the Professional and HPC spaces for applications that can make use of them, and Nvidia is quite happy selling Half-Kepler for more than they were getting for Full-Fermi!

  59. I can’t say you’re right or wrong on your core point, but I do think that it’s a valid question to ask, that will likely never be answered.

  60. [quote<]Frame delivery is 100% a software issue...[/quote<] ...really? Like, the hardware isn't a factor at all? [quote<]Anyway let the man gloat, he deserves it.[/quote<] Oh, yeah, totally agreed. That's why I suggested he treat himself to some Olive Garden, or some Bailey's Irish Creme.

  61. Oh, where to begin parsing the bullshit… let’s start here:

    [quote<]total and complete conspiracy theorists to the point of refusing to believe any wrong doing by AMD, while criticizing everything and everyone to "justify" AMD's faults and ignoring reality.[/quote<] How am I refusing to believe any wrongdoing by AMD anywhere in my post? Moreover, where does it appear that I'm sucking AMD's knob in my post? [quote<]It can't possibly be that AMD is facing problems...[/quote<] No, really, it can be -- every time an article about AMD pops up, 90% of commenters open their caves to bitch about how shitty AMD is because they're not keeping 100% performance and power efficiency parity with Intel and Nvidia. People who point out that AMD has about 1/20th the budget of the both of them get downvoted into oblivion, so what were you saying again? [quote<]For the AMD fanboys it couldn't possibly be true what the numbers glaringly showed and the uproar reached a point that Scott followed up the article with two other articles to prove what already had been proven...[/quote<] Jesus, it's like you think I care which company is winning. I do, because I like AMD's approach, but if they can't cut it, they can't cut it. People like YOU are the ones who insist that AMD's lackluster product performance MUST be because AMD is incompetent and because of internal management issues, and who simply refuse to believe that having considerably fewer resources demonstrably affects their ability to compete with a behemoth like Intel. But oh wait, yeah, I'm the fanboy for suggesting that Scott's article, which DID fundamentally change how graphics cards are reviewed, might've made an impact on AMD's product releases.

  62. I’m curious as to what a LucidMVP setup with integrated/discrete chips looks like to FCAT…

  63. [quote<]All this nvidia tool has done is verify what they've said is accurate.[/quote<] I hope AMD returns the favour when Nvidia screws up.

  64. [quote<]Those biased folks at Nvidia are really trying hard to make AMD look good![/quote<] By showing how bad the Crossfire setups are? In a year, nobody will remember that the single-GPU setups did alright and even better than expected, but many people will still remember that Crossfire is horrible and should be avoided at all costs (even if the issues have long since been fixed!). That is a blow to AMDs reputation, regardless of whether multi-GPU performance is relevant to you and me. As such, releasing this FCAT tool that clearly paints Crossfire in a bad picture (by multiple reputable tech sites!) is a pretty good PR move by Nvidia. We might see right through it, but that doesn't mean Joe PC Gamer will.

  65. nvidia needs to make the entire program open source not just a couple scripts if they want reviewers to use the tool in reviews. if it’s not open source i’m not sure how reviewers can justify using an nvidia tool to compare amd and nvidia cards and then make a recommendation off of the results. it will also make it very easy for people to say you’re biased and to question how valid your tests and results are. then you mention something about nvidia adjusting for “looser tolerances” with amd cards. how do we know those tolerances are correct? seems nvidia already screwed up when measuring amd’s frames. i don’t think nvidia is doing anything shady. i think they saw an opportunity with amd struggling on frame latency and took advantage of it. open source or ditch it.

    also in borderlands you should be playing as the gunzerker and either dual wield 2 guns with different elemental effects that have a fast rate of fire or use a rocket launcher like the Nukem. then get the skill so you throw 2 grenades instead of 1 and use something like the Bonus Package or Rolling Thunder and spam them when gunzerking and shooting. this would be much better than walking around as a sniper doing nothing.

  66. You mean it’s convenient that NVidia made a consumer product of one of their server components? Yeah, like THAT has never happened before…

  67. Thinking about it further, I believe you’re right- [H] was using one second averages, which means that they only saw the large stutters on their graphs- and that the stuttering they mentioned in subjective analysis was the only point where they could address high frequency stuttering.

    Still, they did rightly identify Crossfire as having this issue quite early on, and accounted for it in their ‘maximum playable settings’ data.

  68. I love internet misunderstandings, they’re so much fun!

    I’m equating instantaneous framerate, or the ‘framerate’ of one frame, to frametimes. It’s no different than the speedometer on your car.

  69. Correction/clarification:

    Instantaneous framerate, or the framerate of one frame, is the same as frametime.
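
    Put another way, the two quantities carry the same information and are related by a reciprocal. A minimal Python sketch of the conversion (the function names are made up for illustration):

```python
def instantaneous_fps(frame_time_ms):
    """Frame rate implied by a single frame that took frame_time_ms to render."""
    if frame_time_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_time_ms

def frame_time_ms(fps):
    """Frame time implied by a given instantaneous frame rate."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / fps

print(instantaneous_fps(20))    # a 20 ms frame corresponds to 50 FPS
print(instantaneous_fps(16.7))  # roughly 60 FPS
```

    The difference from a conventional FPS counter is only the window: this reading is taken per frame, like a speedometer, rather than averaged over a full second.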

  70. The results of FCAT and Fraps certainly don’t surprise me at all. I used to run a Crossfire setup (5870s), but went back to a single 5870 because the Crossfire setup felt slower than just a single card in many games, especially BF3. It’s good to have that feeling quantified.

  71. Pot calling the kettle black?

    There were some very valid issues brought against TR; it’s just lame that people always muddy the water to hide what they don’t want to look at (all sides being guilty).

    And it’s very easy to be biased, so I can see why people question review site testing choices.
    The right pick of games, the right pick of drivers, etc., and you can show a card being better than another. Change the games and testing methodology, and the tables turn.

    The 660ti is totally spanked by the 7950. It’s even slower than a more power efficient Pitcairn 7870. And even in the same game the balance can switch from one card to another. Testing at medium vs high quality totally changes the workload (mainly shaders), so if an architecture is compute heavy its advantages grow.

  72. [quote<]nVidia has been smoother for years.[/quote<] No it hasn't. This all started with Fermi, as those cards had horrible stuttering problems (which only got a small mention by TR), and nvidia recognized this and has been working on it ever since. For the most part, I'd say these issues were non-existent until we got the added complexities of dx10+, combined with outdated vsync technology. Video card output needs to move past CRT refresh tech, like spiked_mistborn said. Either way, this is something I believe can be easily fixed in the drivers, given current results.

  73. Except that, at least with current drivers, Radeons are not producing significantly worse frame latencies in single GPU configurations relative to their competing Keplers (for the most part). (Though this still has to be confirmed by TR). Games that were particularly problematic in this regard (eg, Hitman: Absolution and Far Cry 3) have been addressed in Cat 13.3b

  74. Folks, this post right here illustrates what I’ve been saying about AMD zealots: total and complete conspiracy theorists to the point of refusing to believe any wrong doing by AMD, while criticizing everything and everyone to “justify” AMD’s faults and ignoring reality.

    I’ll admit that I never read something quite like this (i.e. an article delaying the release of a product). It can’t possibly be that AMD is facing problems (which coincidently is what is assumed EVERYTIME for anything by NVIDIA) or that AMD simply decided to change their roadmap for cost effectiveness. NO! It must be the article!!!!

    The last time I saw this behavior was right here and regarding the same issue with AMD cards, where Scott followed the article that shows the 660 Ti spanking the 7950, which was received by AMD fanboys as an offense, with more articles! For the AMD fanboys it couldn’t possibly be true what the numbers glaringly showed and the uproar reached a point that Scott followed up the article with two other articles to prove what already had been proven…

  75. You’re implying active dishonesty when it’s pretty clear that AMD would not want this to be happening, and by all accounts seem to have been totally unaware that it was. Though, in some people’s eyes, that might actually be worse…

  76. Yeah. And wow I was on a tear yesterday.

    Originally the world was simpler and was in a way more deterministic.

    It goes vaguely like this for modern drivers ‘vsync off’ (sorry long post again, and keep in mind that the driver ‘render’ CPU thread is actually running in parallel not lockstep (on dual+ core) and that the GPU is also in parallel, just trying to illustrate):


    1) engine api calls made
    2) driver setup for its CPU side “render thread”
    1) engine api calls made
    2) driver setup for its CPU side “render thread”
    3) render thread executes pushing commands to hardware (pipe 1)
    1) engine api calls made
    3) render thread executes pushing commands to hardware (pipe 1)
    2) driver setup for its CPU side “render thread”
    3) render thread executes pushing commands to hardware (pipe 1)
    1) engine api calls made
    2) driver setup for its CPU side “render thread”

    4) engine calls blocking “scene complete” of some kind (FRAPS measures here)
    3) render thread executes pushing commands to hardware (pipe 1)
    3) render thread executes pushing commands to hardware (pipe 1)
    5) engine thinks it waits til pipe flushed and ‘flip’
    6) engine measures time (2ms elapsed total from last #6)
    7) other stuff
    8) engine runs animation using some kind of time step formula


    1) engine api calls made (frame 2)
    2) driver setup for its CPU side “render thread” (pipe 2)
    3) render thread executes pushing commands to hardware (pipe 1)
    1) engine api calls made (frame 2)
    2) driver setup for its CPU side “render thread” (pipe 2)
    3) render thread executes pushing commands to hardware (pipe 1)
    1) engine api calls made (frame 2)
    2) driver setup for its CPU side “render thread” (pipe 2)
    3) render thread executes pushing commands to hardware (pipe 2) ***
    1) engine api calls made (frame 2)
    2) driver setup for its CPU side “render thread” (pipe 2)
    3) render thread executes pushing commands to hardware (pipe 2)

    [GPU FRAME 1]

    9) GPU hardware reaches ‘end’ of (pipe 1) marked earlier by “scene complete” and flips output to (buffer 1) (FCAT presumably here)

    1) engine api calls made (frame 2)
    2) driver setup for its CPU side “render thread” (pipe 2)
    3) render thread executes pushing commands to hardware (pipe 2)
    10) **** call happens to saturate GPU ****

    4) engine calls blocking “scene complete” of some kind (FRAPS measures here)
    5) engine thinks it waits til pipe flushed and ‘flip’
    6) engine measures time (5ms elapsed total from last #6)
    7) other stuff
    8) engine runs animation using some kind of time step formula


    1) engine api calls made (frame 3)
    2) driver setup for its CPU side “render thread” (pipe 1)
    10) **** call saturates Render thread ****
    3) render thread executes pushing commands to hardware (pipe 2)
    1) engine api calls made (frame 3)
    11) **** engine actually blocks here ****
    2) driver setup for its CPU side “render thread” (pipe 1)
    3) render thread executes pushing commands to hardware (pipe 2)

    4) engine calls blocking “scene complete” of some kind (FRAPS measures here)
    5) engine thinks it waits til pipe flushed and ‘flip’
    6) engine measures time (50ms elapsed total from last #6)
    7) other stuff
    8) engine runs animation using some kind of time step formula


    [GPU FRAME 2]

    9) GPU hardware reaches ‘end’ of (pipe 1) marked earlier by “scene complete” and flips output to (buffer 1) 15ms since last (FCAT presumably here)

    Now theoretically this evens out and the idea is that you are trading for parallelism and speed. The problem is that this is real time animation.

  77. I think something like that would only work for full-screen gaming. The display would have to switch back to a standard refresh rate in desktop mode. Or, I suppose if the implementation was done correctly, you could run a game in window mode and only update the portion of the screen that the game occupies. This goes back to my idea of the display having its own frame buffer, and leaving it up to the display to handle updating the screen. Just have the driver send data from the region of the gpu FB that the game occupies to the same region on the display FB.

  78. Well done Scott! Kudos once again for starting all of this and now seeing it rewarded with better tools for the job!

  79. Cheers! I’ve been saying that for quite a long time and I just got a whole bunch of thumbs down 🙂

  80. Don’t be ridiculous. Is reality so hard that AMD fans don’t like to live in it? The info before FCAT (using FRAPS) already proved the problems that AMD had here. Why do you ignore those? It’s a rhetorical question…I know very well why you ignore them.

    FCAT is just a better tool to judge what was already provable with FRAPS.

  81. I absolutely loved this article, and it’s uplifting to *finally* see a usable set of tools becoming available to actually analyze the frame cadence of a game’s output at multiple points along the pathway (including one very close to the actual user).

    While Scott touched upon it, I think that even more emphasis needs to be placed on the relationship between FRAPS (game) frame times and FCAT (display output) frame times, especially in relation to any trickery a graphics card might be doing to “even out” frame delivery at the tail end of the pipe. I understand (and can even see, from the excellent graphs in this article) that nVidia is doing some work on their cards to smooth out frame delivery (especially with multi-card setups). This is commendable, I’m glad they’re making efforts to address the evenness problems.

    There’s a *very important point* perhaps being overlooked here; what a GPU’s goal should be is *not* delivering perfectly evenly spaced frames; it should be delivering frames *with the exact same timing/cadence as the input it receives from the game engine*.

    This is a very important point, because if a game engine has systemic problems that cause it to deliver uneven frames (say, it alternates between 5ms and 20ms) then a GPU trying to even out that frame delivery will introduce *worse* stutter as its frame contents will be presented at times that differ from the game’s internal timing. A GPU should make the assumption that a game is delivering frames at valid simulation times vis-a-vis the frame presentation times (ie: if it receives a frame 5ms later than the last one it must necessarily assume that frame’s simulation time is 5ms later, and the same for a 20ms delay) because it has no information to prove otherwise (it can’t see inside the game engine, and a properly working game engine should be delivering even simulation time anyways).

    It is not a GPU’s responsibility to “correct” for unevenly spaced frames fed to it from the game engine (and it doesn’t have the capability to do so anyways, without somehow magically altering the content of the frames) – but if it is *exaggerating* the timing issues with the frames being fed to it (or simply introducing them) then it should be striving to correct for any jitter *that the GPU* (and its associated bits like DirectX, drivers, etc…) is adding to the pipeline. It should not be trying to evenly deliver a frame every average time interval; depending on how bad the game’s frame output jitter is, this will more negatively impact perceived experience than doing nothing at all.
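
    The 5ms/20ms example above can be put into numbers with a small Python sketch. It assumes, as the argument does, that each frame's simulation time equals the time the engine handed it over; the fixed 10 ms pipeline latency and the helper names are invented for illustration and do not describe how any real driver meters frames.

```python
from itertools import accumulate

def timestamps(intervals_ms):
    """Turn frame-to-frame intervals into absolute timestamps."""
    return list(accumulate(intervals_ms))

def timing_error(sim_times, display_times):
    """Drift of each frame's display time against its simulation time,
    measured relative to the first frame's offset (a constant latency is harmless)."""
    base = display_times[0] - sim_times[0]
    return [(shown - simulated) - base
            for simulated, shown in zip(sim_times, display_times)]

# Engine alternates between 5 ms and 20 ms frames (average 12.5 ms).
game_intervals = [5, 20, 5, 20, 5, 20]
sim_times = timestamps(game_intervals)

# Option A: pass frames through with a constant (assumed) 10 ms latency.
passthrough = [t + 10 for t in sim_times]

# Option B: force perfectly even delivery at the average interval.
evened = timestamps([12.5] * len(game_intervals))

print("passthrough error (ms):", timing_error(sim_times, passthrough))
print("evened-out error (ms):", timing_error(sim_times, evened))
```

    With pass-through delivery every frame is shown at a constant offset from its simulation time, so the error is zero throughout. Forcing even spacing makes every second frame appear 7.5 ms off from where the simulation placed it, even though the output cadence looks perfect.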

  82. I hardly think AMD have done that deliberately.

    Would be interesting to see how many frame lines are output per second.

    Ideally you’d see 60fps x 1080 lines = 64,800. I wonder how many you actually get from different cards.

  83. One other thing: I think the Perl scripts should be moved to Python. Using NumPy/SciPy and also Matplotlib would give much better visualization tools.

  84. Any method that identifies runt frames from fully rendered frames is good for the consumer and should be a standard metric. Including a sliver of pixels to pump up fps numbers is not honest.
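
    As a rough sketch of what such a metric could look like, the Python below sorts captured frames into full, runt, and dropped by how many scanlines each one occupied on screen, then compares the raw frame count with the count of full frames only. The 21-scanline cutoff, the sample capture, and the function names are all assumptions made for illustration; they are not taken from FCAT's scripts.

```python
RUNT_THRESHOLD_LINES = 21  # hypothetical cutoff, not FCAT's documented value

def classify_frame(scanlines):
    """Label a frame by the number of scanlines it occupied on screen."""
    if scanlines == 0:
        return "dropped"
    if scanlines < RUNT_THRESHOLD_LINES:
        return "runt"
    return "full"

def count_frames(scanlines_per_frame):
    """Tally full, runt, and dropped frames in a capture."""
    counts = {"full": 0, "runt": 0, "dropped": 0}
    for lines in scanlines_per_frame:
        counts[classify_frame(lines)] += 1
    return counts

# Made-up capture: scanlines occupied by each rendered frame.
capture = [540, 8, 532, 0, 536, 12, 528, 544]
counts = count_frames(capture)

print("frames rendered:", len(capture))
print("frames that count:", counts["full"])
print("breakdown:", counts)
```

    Here eight frames were rendered but only five took up a meaningful share of the screen, so an FPS figure built from the raw count would overstate what the player saw.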

  85. Well I’m glad you’ve got to the bottom of micro-stuttering TR. The micro-stuttering issue and the seemingly mysterious and elusive reasons for it happening had turned me off of multiple GPU setups these last few years. This looks like the information that the public needed in order to have both companies seriously address the issue.

    Truly enlightened journalism, this was one article I put other things off for in order to read it. You’re the best TR.

  86. Wow. Amazing the difference an article from someone comfortable with this type of testing reads vs what’s coming from others just getting their feet wet. Absolutely stellar stuff.

  87. Anandtech can’t hear you with the pile of money they swim in. They are a MUCH larger site than TR. Their mobile reviews are vastly more popular than Graphics card reviews.

  88. Ugh. Not this again. This is an absurd notion. I’ve already gone through the logic of why it is absurd:

  89. Correct, but I was only talking in terms of jitter and timing before the signal reaches the display panel.

    It should be quite possible to have a 500Hz LCD, and while I may not get 500 fully-done-transitioning frames, I can still output 60fps to it if desired. What the 500Hz gets me is pretty much identical output whether 60fps or 59fps is rendered by the game, since a de-sync no longer results in waiting another full 16.7ms frame time, but only 2ms. This would have other benefits like reducing input lag and reducing RTC overshoots (potentially allowing more aggressive compensation without artifacts).

  90. Battlefield 3
    Guide: How to Fix Low FPS – Stuttering – Lag
    There is a well-documented stuttering fix for both Nvidia and AMD users on multiple forums. I've tried this on my HD 4870 Crossfire setup and it works. The user who posted the guide has an Nvidia GTX 470.

    5. Open Notepad, paste this into it, and save it as "user.cfg" inside your "program files/origingames/battlefield3" folder:

        RenderDevice.TripleBufferingEnable 0
        RenderDevice.ForceRenderAheadLimit 1
        WorldRender.SpotLightShadowmapResolution 256
        WorldRender.SpotlightShadowmapEnable 0
        Render.DrawFps 1

    With this applied to the game, are there any differences? Render Ahead seems to really affect these results and it would be nice if it were tested with FCAT. The default render ahead limit is 3. Thanks

  91. The problem isn’t just display connectors and bandwidth, but the actual display itself.

    Response times on monitors just aren’t that fast; the times that manufacturers quote are really quite misleading in terms of what the user actually sees.

  92. How would multiple programs trying to update the display work though? Imagine having a game trying to update at 90fps in windowed mode, and a video playing back at 24fps. Should the display update with every game frame or every video frame? There’s not enough bandwidth to do both if the interval between a game frame and video frame becomes too small.

    A possible alternative would be to just build a display updating at something like 500Hz, then push a new frame to it whenever one is complete. Even with vsync you’d never have to wait more than 2ms for the next refresh, and any jitter from a mismatch of the program update rate and display update would be limited to ~2ms. Of course you’d need display connectors and electronics capable of much higher bandwidth than are available now, but considering how little change there has been over the past decade we’re due for some new technology anyway.
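
The 2ms figure follows directly from the refresh rate. A minimal sketch of the arithmetic, assuming a finished frame simply waits for the next refresh (the function name is made up for illustration):

```python
def refresh_interval_ms(refresh_hz):
    """Time between two refreshes, which is also the longest a finished
    frame can wait for the next one when vsync is on."""
    return 1000.0 / refresh_hz


for hz in (60, 120, 144, 500):
    print(f"{hz:>3} Hz: up to {refresh_interval_ms(hz):.2f} ms wait")
```

At 60Hz the worst case is a full 16.67 ms; at a hypothetical 500Hz it shrinks to 2 ms, which is where the jitter bound above comes from.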

  93. Interestingly sometimes FRAPS does seem to ignore inflated fps numbers (in some games). Bit of a mystery why this is though.


  94. “framerate over time” is nothing like “frame time”

    Frame time in a 60 second run at 80 fps in Skyrim = roughly 4,800 points of data

    Framerate over time in a 60 second run at 80 fps in Skyrim = 60 points of data

    Which one tells you more? FPS simply leaves out way too much information, even when you show it over a period of time.
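
A small sketch of what gets lost, using made-up frame times rather than real Skyrim data. Both runs below fit exactly 50 frames into one second, so a per-second counter reports the same number for each, while the per-frame view shows one of them stalling for over 400 ms. The function name and the sample values are assumptions for illustration only.

```python
from collections import Counter


def fps_by_second(frame_times_ms):
    """Count the frames finishing in each one-second bucket (integer ms input).

    This mimics a per-second FPS readout: one data point per second.
    """
    counts = Counter()
    elapsed = 0
    for t in frame_times_ms:
        elapsed += t
        # A frame ending at exactly 1000 ms still belongs to second 0.
        counts[(elapsed - 1) // 1000] += 1
    if not counts:
        return []
    return [counts[s] for s in range(max(counts) + 1)]


smooth = [20] * 50          # 50 frames, 20 ms each
spiky = [12] * 49 + [412]   # 49 quick frames, then one long stall

for name, run in (("smooth", smooth), ("spiky", spiky)):
    print(name, "FPS per second:", fps_by_second(run), "worst frame:", max(run), "ms")
```

The FPS readout is identical for both runs; only the frame-time data shows the stall.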

  95. Scott, the capture card seems capable of some kind of scaling. Even with 4-to-1 scaling, you would be able to do frame time analysis with an accuracy of four scan lines (~0.05 ms) instead of one (~0.01ms), which is a marginal drop. Playback would be in lowly Symbian resolution but it would still show the stuttering and tearing artifacts.

    Have you considered using scaling to reduce the amount of data and data rate?

  96. Quote from the article's conclusions: "The presence of these things in our benchmark results may not be all that noteworthy if overall performance is high enough. The stakes are pretty low when the GPUs are constantly slinging out new frames in 20 milliseconds or less."

    I don't agree at all. In fact, I couldn't agree less. I understand that you're giving a nod to AMD and to what they told you guys of the press two days ago, because someone might think this all looks almost like a Crossfire conspiracy from Nvidia and you both, but the conclusion above is absolutely wrong.

    OF COURSE it's noteworthy even when performance is high enough. How could it not be, when an $800 dual-GPU solution cannot do any better than a single $400 one? Aren't conclusions like this some of the biggest achievements of the new testing method?

    Truth is, Crossfire finally revealed itself as a complete and utter waste of money, with AMD well deserving to come out of this bashed to the ground. Forgive my bluntness, but managing to worsen frame presentation during the rest of the rendering pipeline is something not even their biggest detractor would have expected. It's so much of a fuck up that I almost understand your intention to soften the blow. They need to lighten up.

    Luckily the same cannot be said about single Radeons, which I'm happy to see manage to improve a bit at display level. But man, those CF graphs are embarrassing.

  97. I think that’s pretty far-fetched. Many of the original inside the second articles flagged up issues on cards from both vendors in various different games. It was really only the 7950/660Ti review where the bulk of the problems fell at AMD’s feet. Subsequently they’ve managed to turn things around pretty quickly through driver tweaks, bearing fruit in the latest round of reviews. That suggests that much of the core problem is software rather than hardware related, at least in single GPU performance.

    Having said that, the Crossfire issues flagged here could still turn out to be hardware-based: it will be interesting to see if a quick fix is possible here as well. My suspicion is that Nvidia would not have bothered to put these tools in the hands of reviewers unless they knew it was going to cause a significant problem for AMD that would be difficult to fix. It will be interesting to see what happens next.

    Getting back to the point, I think the delay to the real 8000 series is much more likely to be simply a reflection of the current market situation.

  98. Frame delivery is 100% a software issue; the underlying hardware has very little to do with it at this point. Only Nvidia has additional metering tools that help them sync multiple GPUs, but that’s driver-driven anyway.
    That’s not to say GPU manufacturers might not develop hardware means to help smooth framerate presentation in the future, but on PC so much of the issue is game engine/Direct3D related that only the drivers can properly manage it. A hardware aid would definitely be nicer, but I doubt high-end SI has been delayed for such a minor thing.

    Anyway let the man gloat, he deserves it. Others would have slapped everyone in the face with much more vigour.

  99. Absolutely not, they’re two entirely different measurements, and the fact that at this time, after more than one year of TR’s new testing, you still confuse the two pretty much tells the whole story about what you understand of GPU benchmarking as a whole.

  100. No, their FPS vs time plot is part of the standard FRAPS output and averages the framerate over one second intervals. It is not the same as the inverse of the frametime plot and will not pick up the high frequency stuttering seen in the frametime analysis.

    Having said that, I always thought it was useful to have their subjective opinion on card performance as well as the objective results from everywhere else, so hats off to them for ploughing their own furrow. Just a shame the editor has somewhat challenging tendencies.

  101. I haven’t really been seriously following the developments in the frame time latency issues that were noticed last year with AMD cards. Recently, TR’s take on the 7790 (or was it the 650 Ti Boost?) seems to point to a role reversal between AMD and Nvidia. So are AMD’s frame time issues fixed? Nvidia also seemed to have this sort of problem a few generations ago, IIRC. Hopefully frame time latency will be a thing of the past soon.

  102. No, because Fraps counts runt frames as fully rendered, inflating the fps number, as TR has shown.

  103. Superb bit of digging!

    If you guys ever wind up with way too much time on your hands, it would be interesting to run older ATI/AMD card + driver combinations. One wonders how long the CF stutter has been around. The bigger question is, “Why has it never been addressed?” Certainly, there must (?) be software and hardware engineers who have been aware of these issues since forever. My inner cynic strongly suspects that some Pointy Haired Manager decided it wasn’t a “significant problem.”

  104. Fraps does not tell the whole story. Not sure why top of thread was down voted.

    Fraps is a better measure in my opinion if you had to pick one, as it is probably what the engine is using to time its animation. But you need to see the results of something like frame captures for it to have any real meaning. The swap request, as things stand now, is not when the frames are updated.

    "It would be very helpful to have an API from the major GPU makers that exposes the true timing of the frame-buffer flips that happen at the display. I don't think we have anything like that now, or at least nothing that yields results as accurate as those produced by FCAT."

    So I'd say this is true and FCAT is another valid data point. FCAT-type setups showing the page flip, minus the latency variances of the actual display that don't get captured, plus any latency variances or inaccuracies of the capture rig, are as good as it gets at the moment. If there was a ubiquitous API exposed, that would be nice for *everyone*.

    In modern terms (the last few years of drivers for sure), the actual flip happens whenever the hardware gets done. It just flips, so it doesn't have to round trip again, and it does this on a different thread from the one that called "flip". So there's no way to know except waiting for an interrupt back that says "hey, I did your s*** at this time". I am unaware of any such 'real' callback that is supported and publicly available. It was supposed to happen when blocking on flush() and friends, which was the point: I block until the pipe is clean, assume the swap happens then or block until vsync, measure, run calcs, repeat.

    "With such an API, we could collect end-of-pipeline data much easier and use frame captures sparingly, for sanity checks and deeper analysis of images."

    Well, one problem is that you can have it, or measure it. Hooking the 'flush/flip' API on the engine's render thread is one thing (if that's what FRAPS is doing). Calling out of band on another thread, or getting called back on another thread in some time slice of whatever, will probably gum up the results.

    "Second, in a perfect world, game developers would expose an API that reveals the internal simulation timing of the game engine for each frame of animation. That would allow us to do away with grabbing the Present() time via Fraps and end any debate about the accuracy of those numbers."

    You have the same problem we all do on this side of the fence. It exists for the reasons above. It's what you see: if FCAT says 18ms, 20ms, 25ms, and FRAPS says 50ms, 10ms, 10ms, then my sample is 50ms worth of time (with a straight lerp) for the next frame, not knowing 10 and 10 are coming and not knowing only 18ms passed in reality for the last frame. 🙂
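
The closing example can be put into numbers. This is only a sketch under a simplifying assumption that is not stated in the comment: the engine advances its animation clock by each Present-to-Present interval (the FRAPS numbers), while the viewer experiences the display intervals (the FCAT numbers). The function name is hypothetical.

```python
from itertools import accumulate


def timeline_drift_ms(present_intervals_ms, display_intervals_ms):
    """How far the animation clock is ahead of real display time after each frame.

    Assumes the engine steps its simulation by the Present-to-Present
    interval, which is an illustrative simplification.
    """
    game_clock = accumulate(present_intervals_ms)      # what the engine thinks elapsed
    display_clock = accumulate(display_intervals_ms)   # what the viewer actually saw
    return [g - d for g, d in zip(game_clock, display_clock)]


print(timeline_drift_ms([50, 10, 10], [18, 20, 25]))
```

With the figures from the comment, the animation is 32 ms ahead of the display after the first frame, 22 ms after the second, and 7 ms after the third, so on-screen motion jumps forward and then appears to slow down even though the displayed frames are fairly evenly spaced.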

  105. Crazy notion:

    Scott, I can’t help but notice that you’re awfully liberal in patting yourself on the back for being the first site to test GPU gaming performance via frame render times in addition to FPS. Don’t get me wrong, it’s a solid testing metric, and you certainly seem to have started a thing that other reviewers across the interwebs are starting to really warm up to.

    But honestly? What if it goes even further than “other reviewers?”

    Your watershed article, “Inside the second,” was published on 8 September 2011 (at least, that’s what the date on the article itself says). I think we can effectively say that, before that, both AMD and Nvidia had probably been focusing on improving FPS, since that’s where journalists and gamers were heavily focused. However, even AFTER the article, it took a while before others really warmed up to the idea, and it wasn’t long after September 2011 that the GCN architecture debuted.

    Given my limited knowledge of the process of engineering and then manufacturing and then distributing semiconductors, I don’t think it’s a stretch to suggest that GCN was developed *without* frame times in mind, but rather, with FPS in mind. The “old way,” so to speak.

    I think it’s fair to say that the Radeon 8000-series cards were on track to be released in the same yearly release cycle that today’s immense competitive pressures require. I mean, seriously, who releases MOBILE versions of their next generation graphics cards, and then suddenly balks and says “Oh, you know, we just figured the 7000-series was good enough.” Right about that time, the 7000-series, which were VERY competitive upon their initial release, were suddenly showing lackluster performance in frame render times.

    Call me crazy, but what if your review, and the subsequent adoption of its testing metrics by other websites AND by GPU buyers, influenced AMD to delay the 8000-series cards so as to ensure that there wasn’t any sort of a *hardware* drawback to frame render time performance? What if, all things said and done, your article was the reason the 8000-series has been delayed?

    I have no way of verifying this, as I only have my “Armchair Expert” and “Professional Internet Bloviator” certificates hanging over my desk, but I dunno. It’s just something that struck me. AMD is under intense competitive pressure from all sides. I don’t think they can risk a failed product launch, under any circumstances.

    If this is the case, and we may never know, you should probably do more than pat yourself on the back. Treat yourself to some Olive Garden, or maybe some Bailey’s on the rocks.

    Addendum: Shit like this is why I come to the Tech Report. You guys are, hands-down, the best tech review site on this spiral arm of the Milky Way.

  106. Fraps tells you exactly and accurately how many frames are rendered and displayed.
    The issue is that it doesn’t tell you the exact display time of a given frame.

    For example, if you were to play a movie tracked with Fraps, it would report 24fps.
    But because your monitor is running at 60Hz, the frame cadence will be 50ms, 33ms, …
    On a TR graph this would look like some horrible stuttering.
    But it’s actually correct.

    The perfect benchmark would be for the game to timestamp its game clock and see how it’s displayed with the capture tool. The frames overall might be displayed at a smoother interval, but that interval might not match the game clock interval. Result: stutter.

    It might be counterintuitive, but if you render game frames at irregular intervals, you want to display the frames at irregular intervals. What matters is not when a frame updates, but what time the data in the frame represents.
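
The 50ms/33ms cadence falls out of simple arithmetic. A minimal sketch, assuming each content frame first appears on the first refresh at or after the moment it is due (the function name is invented for illustration):

```python
def display_cadence_ms(content_fps, refresh_hz, n_frames):
    """On-screen duration of each content frame on a fixed-refresh display."""
    # Index of the first refresh at or after each frame's due time
    # (integer ceiling division of i * refresh_hz / content_fps).
    first_refresh = [-(-i * refresh_hz // content_fps) for i in range(n_frames + 1)]
    # A frame stays up until the next frame's first refresh.
    return [(b - a) * 1000.0 / refresh_hz
            for a, b in zip(first_refresh, first_refresh[1:])]


print([round(d, 1) for d in display_cadence_ms(24, 60, 6)])
```

For 24fps content on a 60Hz display this alternates between three refreshes (50 ms) and two refreshes (33.3 ms), even though the source frames are perfectly evenly paced.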

  107. Anandtech was poo-poo’ing this thing up until the moment AMD actually said they had a problem.

    Even in that article, they must have written some paraphrase of the quote, “FRAPS is not indicative of real issues, BUUUUT…” in about 25 different ways through the course of their explanation of why they’d not done anything sooner.

    Honestly, my impression is that their initial statement that Scott had found something new for the first time since Dr. Pabst was an admission of outright jealousy at Scott’s discovery. The rest of the article is spent explaining why FRAPS was a bad tool to use and it shouldn’t have worked, but… ….somehow it did.

    Those poor guys. Perhaps if they weren’t lost doing mobile device and Apple reviews they wouldn’t have gotten lost in trying to understand how Scott was right and they were wrong. But hey, the truth won out. TR showed them all how it was done.

    Review sites like that and others who dismissed the whole thing are the reason AMD could wear the blinders on how jerky their cards have been for years and get away with it.

    Hell, even now, you have people in these very comments trying to make this about nVidia slandering AMD instead of rightly blaming AMD for failing to do their job properly.

  108. The overlay does look like some sort of image compression artifact. Reminds me of stuff run through low-quality jpeg compression.

  109. Micro stuttering is actually pretty common in US broadcast because so much went through 3:2 pulldown, where one frame’s duration is encoded at 50ms and the next one at 33ms.
    And the content of the frame will not even match its display time.

    On another note, I think vsync got a bad rap because of buggy drivers.

    But my biggest issue is with games loading assets synchronously during game play.
    It kills me (virtually literally) to move the mouse and be faced with a 500ms computer brain freeze.

  110. Convenient how Titan shows up just in time for a statement like this to be made in regards to an issue as long in the making as this. Pretty slick, nVidia.

  111. How could AMD work together with nVidia? nVidia’s been working on this for years. AMD only just recently even figured out this was an issue.

    Do you really think if nVidia showed up and said, “Hey, AMD, your products suck and here’s why,” that they’d have listened? No. In fact, do you think AMD would have heard any of this had an independent reviewer not shown up and said, “Hey, something crazy is going on here!”?

    Nope. AMD would have just shrugged and said what they always said to reports of micro-stutter. “Working as intended.”

    Now they’ve changed their tune, but only because the problem got so bad finally that large numbers of people finally saw what’s been problematic for years.

    It’s a case of AMD didn’t want to admit to a problem they wish they didn’t have. Especially when they were always making new drivers for a new set of cards just on the horizon. Conveniently, they now have plenty of time to work on drivers because they’ve delayed the cards that should have come out now-ish to the end of the year and subsequently delayed whatever was coming after the now-ish cards to a year from the end of this year.

    Odd how they just freed up their driver teams to refocus on these issues now, right? What a coincidence! Or maybe… no. Well, maybe… just maybe… these two things are related.

    I love how people read how nVidia knew about this all along, took the time to build a way to measure it, and it’s, “Ohhhh, nVidia’s nefarious! nVidia’s out to get AMD!”

    Sure, nVidia’s out to get AMD. nVidia wants to sell its cards and not AMD’s. But do you know who you should blame for AMD’s stuttering? AMD. They’re the ones caught completely flatfooted by a longterm issue.

    And I suppose an AMD user who didn’t see the issue and discounted it for years because they never had a point of reference to see it, well those saps may be partially at fault, too. Next time an nVidia user tells you that the AMD cards may have higher frame rates, but they don’t seem as smooth…

    …listen to them. nVidia has been smoother for years. Now you know that’s a factually true statement. Even AMD admits it.

    This begs the question. Why are you doubting nVidia when AMD’s the one that dropped the ball so completely?

  112. Are you kidding me? The FCAT plots show AMD in a much better light than FRAPS.

    Look at page 5, FRAPS vs FCAT in Skyrim, time spent beyond 50/33.3/16.7ms
    FCAT makes the 7970 and 7970CF look excellent

    Those biased folks at Nvidia are really trying hard to make AMD look good!

  113. Great to see TR exposing the real performance of GPUs, as Fraps alone clearly does not tell the whole story.

  114. From what’s been posted on PCPer, it sounds like Nvidia and Ryan have been working closely together on getting all this to work for a long while now.

  115. I agree that measuring input lag is something that really needs to be looked into. You could maybe measure it by having a device (any Raspberry Pi people reading this?) that would simulate mouse movements or button inputs in specific, measurable patterns at 60Hz, and also use a sensor that monitors the display output somehow.

    You also make a good point about the multi-gpu setups as far as input latency is concerned. It seems like the current AFR method would actually make input latency worse. Maybe they need to have both cards working at the same time and do actual scanline interleaving like the 3Dfx cards used to.

  116. Sorry for the wall of text. lol

    Yeah letting the display do the flip itself in its own way would be neat.

  117. "Nvidia's FCAT scripts offer the option of filtering out runt and dropped frames, so that they aren't counted in the final performance results....By default, the script's definition of a 'runt frame' is one that occupies 20 scan lines or less, or one that comprises less than 25% of the length of the prior frame. I think the 20-scan-line limit may be a reasonable rule of thumb, but I'm dubious about the 25% cutoff. What if the prior frame represented a big spike in frame rendering times?"

    This is an important point. Sure, 25% of an average frame is probably not a problem, but typically what happens on harsh stutters is that the recovery is jarring. Filtering that out would reduce the impact of a runt frame when it is most relevant.

    Since FCAT represents a new toolbox for analyzing frames, any ideas on how the metrics might change? As touched on with the filtering mechanism, you definitely want to penalize runt frames when they are part of a stutter recovery, and I could see dropped/runt frames as a problem with tearing if things get particularly bad.
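
To make the concern about the 25% cutoff concrete, here is a sketch of the rule as the article describes it. It is not Nvidia's actual script; the function name, the parameter names, and the choice to measure frame length in scan lines are all assumptions for illustration.

```python
def is_runt(scanlines, prev_scanlines, max_lines=20, min_fraction=0.25):
    """Apply the runt rule as quoted above (illustrative only, not FCAT's code).

    A frame is a runt if it covers max_lines scan lines or fewer, or if it
    is shorter than min_fraction of the previous frame's length.
    """
    return scanlines <= max_lines or scanlines < min_fraction * prev_scanlines


print(is_runt(15, 500))     # a sliver of pixels: runt by the 20-line rule
print(is_runt(600, 1000))   # a normal frame after a normal frame: not a runt
print(is_runt(600, 3000))   # the same 600-line frame right after a long spike
```

The last case is the worrying one: a frame covering more than half of a 1080-line screen is discarded purely because the frame before it was a big spike, which is exactly when the recovery frame matters most.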

  118. I know it would be difficult to implement something like this due to the wide range of different players in the display industry, but I think this is where we need to go. It seems like all of these current methods that are used to try to smooth out animation, so that it comes out correctly at the display’s refresh rate, are like trying to fit a round peg into a square hole. We’re trying to fix the problem the wrong way, and it’s creating problems of its own. If we let the game drive the refresh rate the benefits would be:

    less latency from user input to a displayed frame
    no tearing
    no vsync quantization
    no “runt” frames
    smoother animation
    better experience on lower-end hardware as long as it can deliver movie like frame rate (24Hz?)

    On the display interface side, it could use something like lossless compression similar to how mpeg works, where only areas that change need to be sent. That way it could use very little power if nothing is going on.

  119. Patience…

    Focus on the direct drive thing when you read. Baby steps..

    -------

    Something similar to what you are saying could have been done going way back, even in the CRT days: the idea of "let the device do it the best way it can, and please save me from caring about its timing issues". The reason it wasn't is cost.

    As an analogy from long ago, this is sort of like the emergence of IDE drives, where the drive package itself basically becomes a block-level device, with all the expense and problems, but eventually much lower variance and much better economies. Unlike the IDE case, display types and requirements are wildly different and arguably much more price sensitive. The usual external nature, cabling, bandwidth, power requirements, and size are similar kinds of problems but far more extreme, on more of a thunderport scale than an 8-bit-wide SCSI scale.

    To do what you are asking right now would require coherency and compositing smarts: coordinate and bit-blit processing, truly discrete and well-known pixel formats and translation on both ends, and all the modern problems of a client-server setup, like "how do I know if I missed a block?" versus just repainting. Sadly, in the most intensive scenarios that might benefit the most from a smarter, more playful display, like movies and 3D games, all the pixels could change every refresh. Many displays also don't have a "framebuffer" as we think of one on a video card; it's just states set in the "matrix", so to speak, by following a linear signal as it comes, so the serial, whole-screen-each-frame nature sort of works. There's the thunder / lightning port thing and USB 3.0 video, which try to push parts of the problem closer to the display device, and at USB 3.0 speeds you get lossy video.

    This is basically why, without going into a long history 🙂

    "Double buffer the work of the GPU, and as soon as a frame is completed: 1) start sending it to a buffer in the display as quickly as the display link will allow, 2) the GPU can start working on the next frame in its other buffer while the transfer is taking place,"

    This is basically what happens now, and most setups default to just double, not triple. The only difference is that the buffer is sent on an interval rather than in a burst. There is no way to copy faster than 1/60th of a second on a device made to not do more than that. The best case is that you can do them back to back, which is still 60. In the best case with the "when I want to send" way, you could delay the start of the send to remove the "too slow with vsync off" tearing, and that would be a very expensive feature compared with 'known rate' one-way communication.

    Intrinsically, so long as there is a max "rate" of update for a device, this problem never goes away. And you still have to calculate the animation update factor for the whole frame. I promise you don't want to see half-rendered frames, which for modern cards and engines are mostly z-buffer and overdraw based. The next step is bit-blit-like block update, which is a long way off, consumer hardware wise.

    The real problem is more in line with one of the main issues presented in this article. And truly it matters less that the frames are "updated" at regular intervals if the engine gets bad info and provides imprecise animation time steps anyway. The truth is that the newer drivers lie about the swap, let the engine run thinking it has a timestep, then render on another thread. So you get divergent, completely jacked steps. They are more like a double pipeline than a double buffer, and when you say "go", that state just finishes rendering into the frame buffer on another thread. It looks better in reviews, theoretically makes games intrinsically multithreaded on dual+ cores, and does give naturally single-threaded games seemingly more breathing room, but it produces bad measurements, creating variance like a slinky. Eventually either you get completely backed up and have to block, or you are way ahead, and for 3-4 frames either way it's just off. Like having a quad buffer you can't turn off.

  120. Scott, looking at the runt frames that always start at the top of the screen (I think that was one common case you pointed out): is there any chance that they were really displayed sooner, but it was during the VBI? I don’t know what timing you’re using for your display. Is it GTF or CVT-RB? If the former, a frame could be ‘being displayed’, but you’d never know, as useless blanks were being scanned out instead of real frame data.

    Regardless, great article, and I hope that it serves not only to educate customers, but to ‘hold to the fire’ the feet of the vendors (both of games and of GPUs). We can probably assume that nVidia has rigs like your setup for their testing and that AMD will have such, if they don’t already, in the very damn near future. And that game houses will be following along promptly.

  121. Great article Scott!

    I was wondering if we know whether FRAPS being active has any impact on frame latency or FPS. It seems like you could look at a setup where FRAPS and FCAT are highly correlated and use FCAT to see if performance is different when FRAPS is active versus inactive. You could also do the opposite to see if FCAT being active has any impact on performance. I have always wondered if FRAPS is a neutral sensor.

  122. Oh I’m with you there. They’re all trying to catch up now, and Ryan at Anandtech really seems to be trying not to be seen as having been caught with his pants down (which I think he really was).

  123. Quote from Bensam123: "That aside, these results still don't measure the usage of buffers or actual input latency unless I'm mistaken. There could be substantially longer buffers on either the AMD or the Nvidia cards and none of these results would reflect it. The only way you'd be able to tell is based on your input to the system."

    This is the last real benchmark goal: to see how long it takes the entire system to react to user input. It's the one aspect of Crossfire/SLI that bothers me, in that they may very well be increasing the input lag over a single-card setup such that the extra performance isn't really adding to the user experience, or even detracts from it.

  124. It’s really not complex- each frame output to the screen should represent the result of user and game engine inputs that have occurred since the frame before it.

    John Carmack went to great lengths to address this on the game engine side with Rage/Tech5, and Nvidia has gone to great lengths to address it on the rendering side with their efforts to provide a smoother experience to the user and to provide the tools to the industry to verify their work.

    And this is something we’ve been aware of largely since the dawn of 3D gaming hardware as it was spawned by 3Dfx- that computers are very complex, and that this complexity makes it very difficult for game engines to generate consistent output. Rendered frame metering is a great way to explain the problem so that it can be more fully addressed by the industry, but we’ve known all along that we needed faster hardware and unhinged software to make sure that games can run as smoothly as possible, reducing the impact that those ‘slowdowns’ can have to the user experience and the frequency at which they occur.

  125. Definitely like the depth in the article and the direction you’re going, Scott. Kudos for continued exploration into this topic. It’d be great if there were a better way to put all these results into context after zooming in on them. You constantly stipulate how little one spike can matter at times, but there are never any charts that focus on this or results that concentrate on an overall picture. I know that’s a rather ambiguous thing to express, but all we can look at for this is average FPS.

    “We have conducted the bulk of our performance testing so far, including this article, with vsync disabled. I think there’s room for some intriguing explorations of GPU performance with vsync enabled.”

    As well as Lucid’s hyperformance… Definitely sounds like an educational endeavor. Especially if you start throwing some new 120 or 144Hz monitors into the mix.

    It sorta looks like Nvidia chose to come out with these tools after pcper came out with its own almost complete system for frame rating. Like, if the cat’s out of the bag, they’d rather get the publicity and good PR for it. I mean, this system is almost identical to pcper’s, only it’s more mature. I wonder how long they’ve been working with these tools and on the same issues, making Nvidia cards better and just thinking people would find their cards overall better and more fluid (which they do, although there are a lot of other biases involved here).

    I would be very interested, and I’m sure a lot of gamers would be, in results at 120 or 144Hz. I think that may change things quite a bit. Currently there is almost no one focusing on higher than 60Hz results, either for benchmarking, frame time, or frame rating (FCAT). For that matter, I’m not even entirely sure video game developers, AMD or Nvidia, or even MS are taking these newer monitors seriously.

    For instance, there is a glaring bug in Windows 7 in which, if you have two monitors hooked up, one running at 60Hz and one faster than 60Hz, you get occasional stutters in whatever you’re playing on the system. Strangely enough this happened with my i5-3570k, but when I switched to an FX8350 it went away. It occurs most noticeably if you have a video of some type playing on one monitor and a 3D game going in the foreground on another.

    After using a 144hz monitor there really is no going back for me and I’m sure more gamers would be sold on it the instant they try a good one.

    That aside, these results still don’t measure the usage of buffers or actual input latency unless I’m mistaken. There could be substantially longer buffers on either the AMD or the Nvidia cards and none of these results would reflect it. The only way you’d be able to tell is based on your input to the system.

    I should add after doing a bit more streaming, the stutter bug is still there with my 8350.

  126. It might just be me, but I feel like everyone else, even Anand, is simply running to catch up with this thing TR is onto. A lot of the articles on other sites I’ve seen seem to use flawed methods or presentations; some appear to be “me too” sort of reviews.

  127. Quick question on the treatment of runt/dropped frames: could you use a weighted average? Weight the time to deliver each of the various components of each display frame by their contribution to the image. For the sake of clarity rather than explaining a weighted average, suppose you have frames generated as below (numbers mostly picked out of the air).

    Frame 1:
    Generation times (ms): 1 10 14
    Proportions (%): 4 40 56
    Weighted average: ((4*1) + (40*10) + (56*14)) / 100 = 11.88ms weighted

    Frame 2:
    Generation times (ms): 1 100 14
    Proportions (%): 10 20 70
    Weighted average: ((10*1) + (20*100) + (70*14)) / 100 = 29.9ms weighted.

    Frame 1 contains three frames, but they are generated at roughly the same rate for the area covered, so the weighted frame latency is about where you’d expect.

    Frame 2 contains a massively delayed frame for not much gain (20% of the screen), and so its weighted average is much higher.

    There are probably better ways you could weight this; it’s 1:37 AM here at the moment (support call) and I’m probably not thinking completely straight here somehow.
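
The weighting proposed in this comment can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, not anything FCAT or FRAPS is known to do; the function name `weighted_frame_time` and its parameters are made up for the example, and the inputs are the commenter's own illustrative numbers.

```python
def weighted_frame_time(times_ms, proportions_pct):
    """Area-weighted average generation time for one display frame.

    times_ms:        generation time of each rendered frame that shows up
                     in this display frame (hypothetical input)
    proportions_pct: share of the screen each of those frames covers
    """
    if len(times_ms) != len(proportions_pct):
        raise ValueError("need one proportion per generation time")
    total = sum(proportions_pct)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Dividing by the actual total keeps the result sane even if the
    # proportions do not add up to exactly 100.
    return sum(t * p for t, p in zip(times_ms, proportions_pct)) / total


# The two display frames from the comment above.
frame1 = weighted_frame_time([1, 10, 14], [4, 40, 56])
frame2 = weighted_frame_time([1, 100, 14], [10, 20, 70])
print(f"frame 1: {frame1:.2f} ms, frame 2: {frame2:.2f} ms")
```

Whether screen area is the right weight is an open question, as the commenter says; a runt that covers 4% of the screen still contributes its 1 ms here rather than being dropped outright.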

  128. Yes! That’s why I used the analogy.

    I want to hear about higher-end stuff that I can’t afford, be it computers, cars, guns, what have you. I’m a geek; I learn about these things as a matter of enjoyment. Sometimes I can afford them.

    And they’re good for the industry. They push the limits of what can be done with what we have, and improve the limits of the human experience (obviously speaking in a broad sense of technological innovation) even if everyone can’t afford one of their own.

    Though I guess that while the majority of your point is ‘on point,’ I’m not agreeing with your original second paragraph about spending resources to tune SLi (and Crossfire, when AMD gets around to it), because the very article your comment is posted under refutes the implication that the effort is a waste to a large degree, as does a vast amount of researchable articles and user experiences posted in forums.

    Nvidia has done the industry a favor by taking a long, hard look at how they can improve the user experience of their products, and it’s paid dividends.

  129. Sure, if you want it to be. If you’ve been following along with all this you’d know AMD hasn’t really begun to address their xfire issues. Their focus is single gpu configs first. In a few months they’ll be tackling xfire, probably coinciding with the reference 7990 launch. All this nvidia tool has done is verify what they’ve said is accurate.

    I really get sick of AMD owners yelling bias every time something doesn’t end well for them. It’s to the extent where some owners won’t trust anything out of sites like Anandtech and TR because they’re so certain AMD products are better than they actually are.

  130. Conclusion: glad I bought my Titan instead of wasting money in a multigpu setup again.

  131. Those high-end cars aren’t making the bulk of the revenue though. It is the mainstream stuff through sheer volume. The same high-end cars are typically test-bed platforms for new gimmicks and devices that later get distilled into the mainstream stuff.

    Sound familiar?

  132. I think the best way to fix the animation “smoothness” problem is to develop a new industry standard for displaying images on a digital display system — we can call it GSYNC. Get rid of the 60Hz scanning line by line output — it’s not a CRT and there is no longer a need for this precise timing to get the electron beam to line up correctly. Get rid of the 3 frame queue in the rendering pipeline. Double buffer the work of the GPU, and as soon as a frame is completed: 1) start sending it to a buffer in the display as quickly as the display link will allow, 2) the GPU can start working on the next frame in its other buffer while the transfer is taking place, 3) as soon as the display has a full buffer leave it up to the display’s controller as to the best way to display the image (use tiles or something and work on the whole screen at once). So, the screen would only update when it gets a completed frame, and the update interval would be set directly by the simulation timer. This would take a lot of the guesswork out of updating the simulation timer and result in much smoother animation from frame to frame. The simulation timer could always limit itself to 60Hz if it is consistently able to deliver frames quickly enough (also to avoid bandwidth limits of the display link). Or detail levels could be dynamically scaled to get close to 60Hz. By basing the screen refresh solely on the arrival of completed frames this would also get rid of the runt frame problem.

    I doubt the “industry” would be willing to collaborate on a new display standard just for us gamers, but… “industry people,” think about what this could do for power saving on mobile devices. Just refresh parts of the screen that change instead of sending the same mostly static image to the display 60 times per second. As I’m typing this right now, the only thing on my screen that is changing is the ad on the right side of the screen.

    Any thoughts on this?

  133. For years under Linux I was disappointed that it wasn’t possible to do CF at all, and SLI only works in theory. Now that we’ve seen the issues that these setups have even under Windows, I’m going to give them an official: meh.

  134. Specific games aside, it’s definitely a shared responsibility, and what I think is good about these articles, especially this one, is the exposure of what’s going on where, and the peeling away of the layers of the onion for the public.

    For the engines, it’s necessarily done usually the same way: time what happened to know how much to advance the animation. Engines hope to assume that ‘swap’ means ‘swap’ and that output will follow with equal or scaling latencies, otherwise the animation step factor is meaningless and only exacerbates the stutter problem. The only ‘thing’ that knows exactly when the display is being driven with which buffer is the hardware, and if it varies dissimilarly from the perceived swap on the API side, then you will have a big problem.

    You have to vote via purchase, and those that care shouldn’t have to live with unnecessary crap, which is the core of what these sites used to be about. Doing the weird, hard, painstaking stuff, employing more resources or time than most have to spend, so we could know what we were getting before making mistakes. Smells like the old days a little.

  135. It’s a tiny niche, and yet, it’s like GM building Corvettes/Cadillacs, BMW’s Ms, AMGs, what have you: very few people can afford those cars, yet most ‘car people’ are pretty interested in them.

    I was gaming on computers, and building them, for over fifteen years before I built a dual-GPU setup, and even then it was after I’d saved up for a 30″ monitor that would give those cards a reasonable workout. But I’d been reading about them since ‘SLi’ meant ‘Scan-line Interleave’ on Voodoo2 cards.

    And from the results in this article, it appears that AMD deserves the criticism they’ve garnered for Crossfire, but isn’t SLi being shown as quite effective at providing an increase in quality to the user’s experience?

  136. I see where you’re coming from, but (in regards to thought #1) I’m referring to my assumption that if one card was running a higher clock speed, would that not mean that it has the capability of rendering a frame quicker than the card with the lower clock speed?

    And again, sort of the same view for my #2 thought: Those of us who tend to overclock components sometimes experience that they tend to “feel” (or “respond” is probably a better word to use) quicker with a given voltage vs. clock speed. Sometimes (and I know I’ve experienced this, and would assume others have as well) I’ve noticed that a higher vcore with a lower clock speed “responds” seemingly quicker than a lower vcore and higher clock speed, or even a higher vcore and higher clock, for that matter.
    Usually, I’ve found that I can make an OC feel more “responsive” by cooling the component better (especially in the case when I find myself with a higher OC with a higher vcore, and it just doesn’t seem any more responsive than the previous, and lower, OC/vcore. Unless I’ve just hit the wall with that component’s OC potential, short of LN2 or the like, in which case it’s just diminishing returns at that point, but I’m straying from my original point….)
    Basically, in too many words, I’m attempting to say that heat obviously has negative effects on performance.

    But how this actually affects statistical performance numbers (if at all) is beyond me. Heck, I don’t know if it even applies to Scott’s study at all at this point.

  137. Really nice article Scott.

    I skimmed it towards the end, but basically you have confirmed that FRAPS isn’t too bad a tool, and that Crossfire really does suck, in every way possible.

  138. Haters gonna hate.

    [H] has included a graph of ‘framerates over time,’ which is essentially frametimes, as instantaneous framerate requires the frametime to be calculated. So yeah, they’ve been looking at the same problem without really digging into it other than to inform their readers what impact the problem presented to the experience perceived.
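
The relationship this comment leans on, that an instantaneous frame rate is just the reciprocal of a single frame time, can be sketched as follows. The function names and the sample frame times are hypothetical and are not taken from [H]'s or TR's data.

```python
def instantaneous_fps(frame_times_ms):
    """Per-frame 'instantaneous' rate: 1000 ms divided by each frame time."""
    return [1000.0 / t for t in frame_times_ms]


def average_fps(frame_times_ms):
    """Traditional average: frames rendered divided by elapsed seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


# Hypothetical capture: mostly quick frames with one long hitch.
frame_times_ms = [16.0, 17.0, 15.0, 120.0, 16.0]

print([round(fps, 1) for fps in instantaneous_fps(frame_times_ms)])
print(round(average_fps(frame_times_ms), 1))
```

Plotting the first list over time is the same information as a frame-time plot, just inverted; the single averaged number is what hides the hitch.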

  139. Ryan over at Anandtech already indicated that this cannot be open sourced. I would imagine that the issue may be that this is a DirectX wrapper/injector and the DirectX EULA isn’t friendly to open source software.

  140. It has been going on for years on both platforms. It is just nobody gave a hoot until now with the new focus on frame-rate latency.

  141. These tools provide even more evidence to prove that multi-GPU setups have always been used as a marketing gimmick that exaggerates FPS scores. The epenis crowd sucks it up like candy and marketing people like to use it as leverage for their brand.

    SLI and CF are both the same thing; the only difference is that Nvidia is willing to spend additional resources to fine-tune the software to mitigate the micro-stuttering and other issues with multi-GPU rendering solutions. IMO, it is kinda silly to spend extra resources that only benefit the top 2% of the market, but that’s just me.

    I have personally used a CF setup to get first-hand experience and see what the deal is. I went back to single-card solutions. I have heard countless stories from ex-SLI users. That speaks volumes on why SLI/CF have never gone beyond their tiny niche.

  142. That was simply a great article.

    I suspect a few other sites out there will happily dance on FRAPS’ grave and only present FCAT results, but as you rightly say it’s imperative to make sure both ends of the pipe are consistent.

    For this reason I wonder whether Nvidia’s SLI frame metering will always be a good thing. The final frame rate may be smooth, but won’t the time steps in the simulation still be uneven, and, worse, then not match up with the display rate?

    On the AMD side, the Crossfire problem does look serious. For there to be such a big discrepancy between the start and end of the pipe, does that suggest that the fault is in the way the final frames are interleaved and combined for output, rather than in the actual rendering process?

    In both cases, the real test will be in cases where the minimum frame time drops further into unplayable levels. It will be fascinating to see the next few articles.

    P.S. The problem with the overlay colour matching on the Radeons: is that evidence of some form of internal compression?

  143. Unless these findings are unique to the games tested, I like how they seem to have put those who made such a big stink about using FRAPs to assess frame latency in their place. Damage already more or less concluded this, but FRAPs did match up pretty darn closely to the FCAT data. And in cases where it didn’t so much (ie, Skyrim), FRAPs was more conservative, at least in the sense that it registered spikes more sensitively than FCAT. As far as the consumer is concerned (if not the fanboi), erring on the side of being more sensitive to these potential faults or smoothness interruptions serves them best (though I agree that the ultimate objective is to have a measure that reflects perceived smoothness - if this is even possible given how this could vary by observer). The important thing is that, by and large, pretty much all the spikes in frame times that registered with FRAPs were coincident with those from FCAT. Sure, more tools and data from different points in the graphics pipeline are always nice, and I like reading these articles, but in the end, the idea is to find one or two measures using standardized tools that best capture the latency issue if possible. And, as it happens, FRAPs frame times don't seem to be so bad in this regard after all.

  144. How is this guy getting upped? The [H] is a no bullshit benchmarking site with subjective and quantitative data. And it looks like their subjective data was backed up nicely with the frame time data.

  145. I think that one issue with Starcraft 2 is that patch 1.5 changed the game engine in significant ways. There is no way to launch a retail copy of the game without this latest patch and there are frequent patches that impact performance.

    Another thing to note in this game is that it streams textures as they are first used. There is a unit pre-loader in the ‘custom games’ section that you can run first each time you launch the game to get the multi-player units into the VRAM.

    Finally, Starcraft 2 isn’t 100% GPU bound. Setting the ‘physics’ and ‘model’ settings to low alleviates some of the stutter when rendering large armies of units. This especially helps when huge groups of units are dying – when physics are set to high the game uses the CPU to ragdoll all of the flying body parts/debris, which can be hundreds of objects when dozens of units are destroyed at once.

  146. When did you leave?!?

    TR and Anandtech have largely been in sync with their conclusions even while using different tests and coming about those conclusions from different perspectives on occasion. I throw HardOCP in the mix for their more ‘down to earth’ subjective measurements after getting the best of the objective perspectives from TR and Anandtech, but TR comes first (especially since Anand hasn’t heavily involved himself in the PC consumer side of things much lately, though the dude still commands tremendous respect).

  147. Not speaking for Scott, but:

    1. While you could certainly balance voltages and possibly force the cards to run at the same speed, you’d be putting them at a disadvantage since they’re designed to boost to increase performance. It may be possible to force them to only boost as high as either of them can handle at any one point in time, but even that would hinder the cards’ performance.

    2. I believe that there was only one monitor being used, and that TR tests on an open bench with plenty of space between the cards, largely negating the heat issue, though this would be a good configuration variable to explore when this kind of testing invariably moves on to multi-monitor scenarios!

  148. Somehow, StarCraft 2 is one odd duck. Remember the overheating issue?

    But you’d think that the game isn’t really limited by anything. Modern CPUs are an order of magnitude or more faster than the 486’s and Pentiums that we played the first StarCraft on, and the game hasn’t grown nearly as much in scale or complexity; and hell, it’s an RTS with moderate graphics, so it shouldn’t be GPU limited either.

    I’d bet that making StarCraft 2 ‘smoother’ is largely in the hands of Blizzard, and though I haven’t played it much, I will say that I didn’t have any problem playing it on my HD3000-equipped half-HD laptop.

  149. Yup!

    And I know it’d be difficult, but I’d certainly like to see how a game like Rage deals with this sort of thing. Developers are really being called out here for not making sure that their games provide smooth output, which is something that John Carmack has been working on solving for what appears to be half a decade!

  150. Wow… for what it’s worth, I think Scott did a very good job, especially in illustrating the difference between smooth frame delivery and accurate animation timing.

    Generally, engines not designed around fixed rates will have to guess and preemptively ‘time’ their advancement of animation based on whatever the current rate appears to be. What you mostly hope for is 1) that the latency perceived via api will actually coincide each frame with the real visible output latency, so that the guess isn’t in vain, and 2) that you get a run of enough frames at reasonably similar rate to minimize stutter artifacts to keep the player properly immersed. Just because the display output delivery rate seems smoother like shown with FCAT, doesn’t mean it’s the same as the animation advancement or FRAPS “block til rendering is caught up for next swap” rate. The essential and unavoidable problem is that you have to calculate ahead of time some sort of ‘factor’ that can be applied to the world and animation advancement. So what you want is pipe determinism, which is what I think all the inside the second is really starting to hone down and call out.

    This is probably the first time I’ve seen someone publicly really call that out with real evidence, and in such an unbiased way, that wasn’t in a graphics book or in trade literature. Usually there is a huge disparity between what really goes on, on the industry side and what the lore becomes on the public side. So good job.
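
The "guess ahead of time" problem described in this comment can be illustrated with a small Python sketch. It assumes, purely for illustration, that an engine advances the animation for each frame by the interval it measured at the API for the previous frame; real engines may smooth or predict differently. The function name and the sample numbers are hypothetical.

```python
def animation_errors(measured_ms, displayed_ms):
    """Per-frame mismatch between the engine's guessed step and reality.

    measured_ms:  frame-to-frame intervals seen at the API side
                  (roughly what a FRAPS-style tool observes)
    displayed_ms: intervals at which those frames reach the display
                  (roughly what an FCAT-style capture observes)

    Assumption: the step applied to frame i is the interval measured for
    frame i-1. The error is that step minus the time frame i really
    takes to reach the screen.
    """
    if len(measured_ms) != len(displayed_ms):
        raise ValueError("need one displayed interval per measured interval")
    errors = []
    for i in range(1, len(measured_ms)):
        guessed_step = measured_ms[i - 1]
        errors.append(guessed_step - displayed_ms[i])
    return errors


# Hypothetical run: the API side sees one 40 ms spike, while the display
# side shows the delay spread over two frames.
measured = [16, 16, 40, 16]
displayed = [16, 16, 20, 36]
print(animation_errors(measured, displayed))
```

A run of zeros would be the "pipe determinism" the comment asks for; nonzero entries are frames whose on-screen motion does not match the time that actually elapsed between them.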

  151. From what I can see from this article and what I’ve read in the past here and on other benchmarking sites, I think I’ll stay away from SLI and CrossFire until the frame quality becomes more balanced between the two cards or until a revolutionary new way to render frames is introduced into the fold.

    Still, I would like to know if there are improvements with StarCraft 2 framerate. This is a very popular game. In your first Inside the Second article, you included StarCraft 2 in the benchmarks. Recalling what I read in the article, there wasn't a single GPU solution that could keep the longer frames under 50ms. However, you did mention that SLI helped to alleviate the stuttering issue. I would like to see more research on this, as to why StarCraft 2 performed better with SLI than other games. Hopefully, we might find some answers, and not just skip over it just because it doesn't fall in line with your more "benchmarkable" games.

  152. TR,

    While reading this, I had a couple of thoughts that may help rule out some variance in your test results (assuming you haven’t already thought of them, and assuming my understanding of the frame rendering processes is anywhere near correct) …

    Those thoughts were as follows:

    1. Were the SLI setups balanced? Meaning, were all voltages and clock speeds matched during testing?

    2. Has it been considered running the tests with all monitors connected to one card vs. the other (for SLI setups) ?

    I mention these two things, because I’ve noticed on my own GTX 680 SLI setup, that the upper card (closest to the top of the case) tends to run hotter. One, because obviously heat rises, and Two, (which may or may not have much effect) the upper card boosts to a slightly higher clock speed than the lower card (as measured by eVGA’s Precision).

    I’ve since moved all *EDIT* of my monitors’ connectors (2x DVI, 1x HDMI) *END EDIT* to the lower card, which has at least balanced out the temperatures to a variance of ~1-2°C in nearly all situations (GPU Compute, 3DMark, BL2, Idle, to name a few). Which is all I was really trying to do anyhow, but I can't help but wonder if the two thoughts above have any potential to contribute to any variance in the test results, even if minor. If I'm off in left field, let me know - as I'm always learning. (Great article btw!)

  153. I said before that “I have a sinking feeling that in order to fix these issues AMD would have to take back the ‘optimizations’ they found of late.” I think those words have proven true. I think a comparison and analysis between the current Radeon driver set and the pre-optimized drivers in the last few months may reveal otherwise identical performance when filtered….possibly implying that the current AMD driver had these issues added to inflate performance scores.

  154. Yeah. I find it suspect, at the least. It’s interesting, but FCAT needs to be open-sourced, along with the hardware, so that the community at large can ensure that there’s no funny-business.

  155. Very nice work, Scott.

    You mixed up the colors in the Guild Wars “badness” graph, though.

    In the end I’m glad you’ve decided to keep Fraps and add FCAT instead of replacing the first by the second. It will undoubtedly increase your workload, but it’s the best choice and as a reader I really appreciate the effort.

    The only downside is that understanding benchmarks will be a lot harder for some people. :p

  156. It’d be pretty easy to point a high speed camera at the display to see if it can capture the ‘runt’ frames the same way as the Datapath VisionDVI-DL card can.

  157. HardOCP was right to always include their subjective impressions in the reviews. But, subjective results are just that, subjective, and as such it’s hard to reproduce or compare them. These tools offer a way to actually measure that which up until now could only be observed.

  158. MAN this is complex. I knew after reading PCPer and Tom’s articles that they weren’t getting the whole story – and even in their reviews it looked like FRAPS is more accurate than they give it credit for. Very nice to see tech sites taking an objective look at the video output in addition to frametimes.

  159. I want to hope for the best, but due to Nvidia supplying everything, I do take it with a grain of salt.
    Not saying it’s rigged, but just being wary since we have been down this path before.

    I’d feel much better if AMD and Nvidia had worked together on something like this, and maybe brought in Intel as well, since they have some skin in the game too.

  160. It’s a bit smelly that Nvidia provided the tools that point out a rather glaring problem in AMD’s Crossfire implementation; nonetheless I think it’s commendable that they released them. It’s a Good Thing(tm) that multiple tech sites can now more accurately (and for that matter, more precisely) benchmark video cards and identify issues that flew right by the traditional benchmarks. The whole industry will be better for it, and we as gamers/tech enthusiasts stand to benefit most from it. Also, I like how Nvidia probably wanted to prove that all AMD setups are crappy, but in the end only succeeded in showing that SLI is better than Crossfire.

  161. HardOCP was doing it right by their readers, calling it like they saw it, though they do cater to a slightly different crowd than TR and Anandtech.

  162. BF3 was rough as hell on a pair of 2GB HD6950’s at 2560×1600 compared to a single GTX670, even though they were technically faster, so at least that far back :).

  163. What methods, subjective smoothness? I don’t recall too many people saying they were a bad thing to do so there isn’t any vindication needed, but they were subjective rather than objective.

  164. Looks like a whole generation of FPS based Crossfire benchmarks for the 7000 series are deeply misleading as compared against 600 series SLI.

    Who knows how many generations back that statement may remain true.

  165. Can anyone opine on whether or not the whole dive into frame times vindicates HardOCP’s methods?