So now what?
This first take on Nvidia's FCAT tools and the things they can measure is just a beginning, so I don't have many conclusions for you just yet. But this is an awfully good start to a new era of GPU benchmarking. We can now see into the early stages of the rendering pipeline with Fraps and then determine exactly what's happening at the other end of the pipe when visuals are delivered to the display with FCAT. We can correlate the two and see how much leeway there is between them. And we also have videos that allow us to review the resulting animation with frame-by-frame precision, to show us the exact impact of any spikes or anomalies in the numbers we're seeing. These are the best tools yet for understanding real-time graphics performance, and they offer the potential for lots of new insights.
The fact that Nvidia has decided to release analytical tools of this caliber to the general public is remarkable. Yes, the first results of those tools have detected some issues with its competition's products, but who knows what other problems we might uncover with them down the road? Nvidia is taking a risk here, and the fact it's willing to do so is incredibly cool.
Going forward, there's still tons of work to be done. For starters, we need to spend quite a bit more time understanding the problems of multi-GPU micro-stuttering, runt frames, and the like. The presence of these things in our benchmark results may not be all that noteworthy if overall performance is high enough. The stakes are pretty low when the GPUs are constantly slinging out new frames in 20 milliseconds or less. I've not been able to perceive a problem with micro-stuttering in cases like that, and I suspect those who claim to are seeing extreme cases or perhaps other issues entirely. Our next order of business will be putting multi-GPU teams under more stress to see how micro-stuttering affects truly low-frame-rate situations where animation smoothness is threatened. We have a start on this task, but we need to collect lots more data before we are ready to draw any conclusions. Stay tuned for more on that front. I'm curious to see what other folks who have these tools in their hands have discovered, too.
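To make the 20-millisecond point concrete, here is a minimal Python sketch of how one might summarize a run of frame times. It is only an illustration, not part of FCAT or Fraps: the function name `stutter_summary`, the sample data, and the choice of "mean frame-to-frame swing" as a rough stand-in for micro-stuttering are all assumptions of mine.

```python
from statistics import mean

def stutter_summary(frame_times_ms, threshold_ms=20.0):
    """Summarize a list of frame times, given in milliseconds per frame.

    threshold_ms defaults to the 20 ms figure discussed in the text.
    """
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two frame times")
    # Frames that took longer than the threshold to arrive.
    slow = [t for t in frame_times_ms if t > threshold_ms]
    # Absolute change between each frame time and the next one; a
    # long-short-long-short pattern produces large swings.
    swings = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return {
        "slow_fraction": len(slow) / len(frame_times_ms),
        "mean_swing_ms": mean(swings),
        "worst_ms": max(frame_times_ms),
    }

# Hypothetical data: both runs average 16.5 ms per frame.
steady = [16.0, 17.0, 16.0, 17.0, 16.0, 17.0]
jittery = [5.0, 28.0, 5.0, 28.0, 5.0, 28.0]

print(stutter_summary(steady))
print(stutter_summary(jittery))
```

The two made-up runs have the same average frame time, so an FPS average would rate them identically. Only the swing and the share of frames past the threshold separate them.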
The FCAT analysis has shown us that Nvidia's frame metering tech for SLI does seem to work as advertised. Frame metering isn't necessarily a perfect solution, because it does insert some tiny delays into the rendering-and-display pipeline. Those delays may create timing discontinuities between the game simulation time—and thus frame content—and the display time. They also add a minuscule bit to the lag between user input and visual response. But then there's apparently a fair amount of low-stakes timing slop in PC graphics, as the gap between our Fraps and FCAT results (in everything but the Unreal-engine-based Borderlands 2) has demonstrated. The best thing we can say for frame metering is that it makes the Fraps and FCAT times for SLI solutions appear to correlate about like they do for single-GPU solutions. That's a really high-concept way of saying that it appears to work pretty well.
We do want to be careful to note that frame delivery as measured by FCAT is just one part of a larger picture. Truly fluid animation requires the regular delivery of frames whose contents are advancing at the same rate. What happens at the beginning of the pipeline needs to match what happens at the end. Relying on FCAT numbers alone will not tell that whole story; we'd just be measuring the effectiveness of frame metering techniques. We've come too far in the past couple of years in how we measure gaming performance to commit that error now.
Ideally, we'd like to see two things happen next. First, although FCAT's captures are nice to have and Nvidia's scripts provide a measure of automation, using these tools is a lot of work and generates huge amounts of data. It would be very helpful to have an API from the major GPU makers that exposes the true timing of the frame-buffer flips that happen at the display. I don't think we have anything like that now, or at least nothing that yields results as accurate as those produced by FCAT. With such an API, we could collect end-of-pipeline data much more easily and use frame captures sparingly, for sanity checks and deeper analysis of images. Second, in a perfect world, game developers would expose an API that reveals the internal simulation timing of the game engine for each frame of animation. That would allow us to do away with grabbing the Present() time via Fraps and end any debate about the accuracy of those numbers. We'd then have the data we need to correlate the beginning and end of the pipeline with precision and to analyze smoothness. Or, well, someone who's smarter than us about the tricky math of a rate-match problem and the perceptual thresholds for smooth animation would have the data to do so.
Follow me on Twitter for shorter ramblings.