So now what?
This first take on Nvidia's FCAT tools and the things they can measure is just a beginning, so I don't have many conclusions for you just yet. But this is an awfully good start to a new era of GPU benchmarking. We can now see into the early stages of the rendering pipeline with Fraps and then determine exactly what's happening at the other end of the pipe when visuals are delivered to the display with FCAT. We can correlate the two and see how much leeway there is between them. And we also have videos that allow us to review the resulting animation with frame-by-frame precision, to show us the exact impact of any spikes or anomalies in the numbers we're seeing. These are the best tools yet for understanding real-time graphics performance, and they offer the potential for lots of new insights.
The fact that Nvidia has decided to release analytical tools of this caliber to the general public is remarkable. Yes, the first results of those tools have detected some issues with its competition's products, but who knows what other problems we might uncover with them down the road? Nvidia is taking a risk here, and the fact it's willing to do so is incredibly cool.
Going forward, there's still tons of work to be done. For starters, we need to spend quite a bit more time understanding the problems of multi-GPU micro-stuttering, runt frames, and the like. The presence of these things in our benchmark results may not be all that noteworthy if overall performance is high enough. The stakes are pretty low when the GPUs are constantly slinging out new frames in 20 milliseconds or less. I've not been able to perceive a problem with micro-stuttering in cases like that, and I suspect those who claim to are seeing extreme cases or perhaps other issues entirely. Our next order of business will be putting multi-GPU teams under more stress to see how micro-stuttering affects truly low-frame-rate situations where animation smoothness is threatened. We have a start on this task, but we need to collect lots more data before we are ready to draw any conclusions. Stay tuned for more on that front. I'm curious to see what other folks who have these tools in their hands have discovered, too.
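To make the "low stakes" argument concrete, here is a minimal Python sketch of how one might score a run of frame times for micro-stuttering. It is not part of FCAT or Fraps. The function names and the sample numbers are hypothetical, and the only figure taken from the discussion above is the 20-millisecond budget.

```python
from statistics import mean

def alternation_ratio(frame_times_ms):
    """Average frame-to-frame change, relative to the average frame time.

    0.0 means perfectly even pacing; values near or above 1.0 mean each
    frame time differs from its neighbor by about a whole average frame.
    """
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two frame times")
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return mean(deltas) / mean(frame_times_ms)

def all_within_budget(frame_times_ms, budget_ms=20.0):
    """True if every frame arrived within the budget (20 ms, per the text)."""
    return all(t <= budget_ms for t in frame_times_ms)

# Made-up sample data, for illustration only.
even_pacing = [16.7] * 6      # steady single-GPU-style delivery
jittery_afr = [5.0, 28.0] * 3  # short/long pattern typical of micro-stutter

for name, times in (("even", even_pacing), ("jittery", jittery_afr)):
    print(name, round(alternation_ratio(times), 2), all_within_budget(times))
```

The two samples have nearly the same average frame time, which is why an FPS average alone hides the difference. The jittery pattern only starts to matter for smoothness once its long frames exceed whatever budget you consider safe.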
The FCAT analysis has shown us that Nvidia's frame metering tech for SLI does seem to work as advertised. Frame metering isn't necessarily a perfect solution, because it does insert some tiny delays into the rendering-and-display pipeline. Those delays may create timing discontinuities between the game simulation time—and thus frame content—and the display time. They also add a minuscule bit to the lag between user input and visual response. But then there's apparently a fair amount of low-stakes timing slop in PC graphics, as the gap between our Fraps and FCAT results (in everything but the Unreal-engine-based Borderlands 2) has demonstrated. The best thing we can say for frame metering is that it makes the Fraps and FCAT times for SLI solutions appear to correlate about like they do for single-GPU solutions. That's a really high-concept way of saying that it appears to work pretty well.
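The trade-off described above can be illustrated with a toy model. The following Python sketch is emphatically not Nvidia's algorithm, whose details aren't described here. It simply assumes a metering rule that holds each finished frame until at least one recent-average interval has passed since the previous flip. The function name, the window size, and the timestamps are all hypothetical.

```python
def meter_frames(ready_ms, window=4):
    """Return display times for frames that finished rendering at ready_ms.

    Assumed rule: a frame is shown no earlier than it is ready, and no
    sooner than one recent-average interval after the previous frame.
    """
    presented = [ready_ms[0]]
    for i in range(1, len(ready_ms)):
        lo = max(0, i - window)
        # Average spacing of the most recent completions.
        target = (ready_ms[i] - ready_ms[lo]) / (i - lo)
        presented.append(max(ready_ms[i], presented[-1] + target))
    return presented

# Made-up completion times with a 5 ms / 28 ms alternating pattern.
ready = [0, 5, 33, 38, 66, 71, 99, 104]
shown = meter_frames(ready)

intervals = [b - a for a, b in zip(shown, shown[1:])]
delays = [s - r for s, r in zip(shown, ready)]
print("display intervals:", [round(x, 1) for x in intervals])
print("added delay per frame:", [round(x, 1) for x in delays])
```

In this toy run the display intervals settle toward an even cadence, while every other frame is held back by several milliseconds. Those held frames are where the content-versus-display timing discontinuity and the small bit of extra input lag come from.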
We do want to be careful to note that frame delivery as measured by FCAT is just one part of a larger picture. Truly fluid animation requires the regular delivery of frames whose contents are advancing at the same rate. What happens at the beginning of the pipeline needs to match what happens at the end. Relying on FCAT numbers alone will not tell that whole story; we'd just be measuring the effectiveness of frame metering techniques. We've come too far in the past couple of years in how we measure gaming performance to commit that error now.
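Here is a minimal sketch of that point in Python, assuming we have a start-of-pipeline timestamp (such as a Fraps Present() time) and an end-of-pipeline timestamp (such as an FCAT display time) for each frame. The function name and the numbers are hypothetical and only illustrate the comparison.

```python
def interval_mismatch(start_ms, end_ms):
    """Per-frame difference between display-side and start-side intervals.

    Values near zero mean the content is advancing at about the same rate
    as the frames are being shown; large swings mean it is not.
    """
    if len(start_ms) != len(end_ms):
        raise ValueError("need one start and one end timestamp per frame")
    start_steps = [b - a for a, b in zip(start_ms, start_ms[1:])]
    end_steps = [b - a for a, b in zip(end_ms, end_ms[1:])]
    return [e - s for s, e in zip(start_steps, end_steps)]

# Made-up data: uneven start-of-pipeline times, perfectly even delivery.
start = [0.0, 5.0, 33.0, 38.0, 66.0]
end = [40.0, 56.5, 73.0, 89.5, 106.0]

print(interval_mismatch(start, end))
```

In this example the end-of-pipeline numbers alone would look flawless, with a frame every 16.5 ms, yet each frame's content has advanced by either 5 or 28 ms. The mismatch values swing between positive and negative, which is the part of the story that display-side numbers can't show.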
Ideally, we'd like to see two things happen next. First, although FCAT's captures are nice to have and Nvidia's scripts provide a measure of automation, using these tools is a lot of work and generates huge amounts of data. It would be very helpful to have an API from the major GPU makers that exposes the true timing of the frame-buffer flips that happen at the display. I don't think we have anything like that now, or at least nothing that yields results as accurate as those produced by FCAT. With such an API, we could collect end-of-pipeline data much more easily and use frame captures sparingly, for sanity checks and deeper analysis of images. Second, in a perfect world, game developers would expose an API that reveals the internal simulation timing of the game engine for each frame of animation. That would allow us to do away with grabbing the Present() time via Fraps and end any debate about the accuracy of those numbers. We'd then have the data we need to correlate with precision the beginning and ending of the pipeline and to analyze smoothness (or, well, for someone who's smarter than us about the tricky math of a rate-match problem and the perceptual thresholds for smooth animation to do so).