On the plane home, I started to write up a few impressions of the new G-Sync display technology that Nvidia introduced on Friday. However, that attempt pretty quickly turned into a detailed explanation of refresh rates and display technology. Realistically, I'll have to finish that at a later date, because I have another big graphics-related project hogging my time this week.
For now, I'll just say that whatever its drawbacks—which are mainly related to its proprietary nature—the core G-Sync technology itself is simply The Right Thing to Do. That's why Nvidia was able to coax several big names into appearing to endorse it. Because G-Sync alters conventional display tech by introducing a variable refresh rate, there's no easy way to demonstrate the impact in a web-based video. This is one of those things you'll have to see in person in order to appreciate fully.
I've seen it, and it's excellent.
You may remember that I touched on the possibility of a smarter vsync on this page of my original Inside the Second article. In fact, AMD's David Nalasco was the one who floated the idea. We've known that such a thing was possible for quite some time. Still, seeing Nvidia's G-Sync implementation in action is a revelatory experience. The tangible reality is way better than the theoretical prospect. The effect may seem subtle for some folks in some cases, depending on what's happening onscreen, but I'll bet that most experienced PC gamers who have been haunted by tearing and vsync quantization for years will appreciate the improvement pretty readily. Not long after that, you'll be hooked.
In order to make G-Sync happen, Nvidia had to build a new chip to replace the one that goes inside of most monitors to do image scaling and such. You may have noticed that the first version of Nvidia's solution uses a pretty big chip. That's because it employs an FPGA that's been programmed to do this job. The pictures show that the FPGA is paired with a trio of 2Gb DDR3 DRAMs, giving it 768MB of memory for image processing and buffering. The solution looks to add about $100 to the price of a display. You can imagine Nvidia could cut costs pretty dramatically, though, by moving the G-Sync control logic into a dedicated chip.
The first monitors with G-Sync are gaming-oriented, and most are capable of fairly high refresh rates. That generally means we're talking about TN panels, with the compromises that come along with them in terms of color fidelity and viewing angles. However, the G-Sync module should be compatible with IPS panels, as well. As the folks who are overclocking their 27" Korean IPS monitors have found, even nice IPS panels sold with 60Hz limits are actually capable of much higher update rates.
G-Sync varies the screen update speed between the upper bound of the display's peak refresh rate and the lower bound of 30Hz—or every 33 ms. Beyond 33 ms, the prior frame is painted again. Understand that we're really talking about frame-to-frame updates that happen between 8.3 ms and 33 ms, not traditional refresh rates between 120Hz and 30Hz. G-Sync varies the timing on a per-frame basis. I'd expect many of today's IPS panels could range down to 8.3 ms, but even ranging between, say 11 ms (equivalent to 90Hz) and 33 ms could be sufficient to make a nice impact on fluidity.
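To make the numbers above concrete, here is a minimal sketch of the timing window as I've described it. It is a toy model, not Nvidia's implementation: the function names are made up for illustration, and the 120Hz peak is just the example figure used above.

```python
def hz_to_ms(hz):
    """Convert a refresh rate in Hz to the time between updates in ms."""
    return 1000.0 / hz


def update_interval(frame_time_ms, peak_hz=120.0, floor_hz=30.0):
    """Toy model of a variable-refresh window (names are hypothetical).

    Returns (interval_ms, repaint), where repaint is True when the new
    frame took longer than the floor interval, so the prior frame would
    be painted again.
    """
    shortest = hz_to_ms(peak_hz)   # about 8.3 ms at 120Hz
    longest = hz_to_ms(floor_hz)   # about 33.3 ms at 30Hz
    repaint = frame_time_ms > longest
    interval = min(max(frame_time_ms, shortest), longest)
    return interval, repaint


if __name__ == "__main__":
    for t in (5.0, 11.1, 20.0, 50.0):
        print(t, update_interval(t))
```

A 20 ms frame is simply shown after 20 ms, which no fixed refresh rate can do. Only the frames outside the window get clamped.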
G-Sync means Nvidia has entered into the display ASIC business, and I expect them to remain in that business as long as they have some measure of success. Although they could choose to license this technology to other firms, G-Sync is just a first step in a long line of possible improvements in GPU-display interactions. Having a graphics company in this space driving the technology makes a lot of sense. Going forward, we could see deeper color formats, true high-dynamic range displays enabled by better and smarter LED backlights, new compression schemes to deliver more and deeper pixels, and the elimination of CRT-oriented artifacts like painting the screen from left to right and top to bottom. Nvidia's Tom Petersen, who was instrumental in making G-Sync happen, mentioned a number of these possibilities when we chatted on Friday. He even floated the truly interesting idea of doing pixel updates across the panel in random fashion, altering the pattern from one frame to the next. Such stochastic updates could work around the human eye's very strong pattern recognition capability, improving the sense of solidity and fluid motion in on-screen animations. When pressed, Petersen admitted that idea came from one Mr. Carmack.
AMD will need to counter with its own version of this tech, of course. The obvious path would be to work with partners who make display ASICs and perhaps to drive the creation of an open VESA standard to compete with G-Sync. That would be a typical AMD move—and a good one. There's something to be said for AMD entering the display ASIC business itself, though, given where things may be headed. I'm curious to see what path they take.
Upon learning about G-Sync, some folks have wondered about whether GPU performance will continue to matter now that we have some flexibility in display update times. The answer is yes; the GPU must still render frames in a timely fashion in order to create smooth animations. G-Sync simply cleans up the mess at the very end of the process, when frames are output to the display. Since the G-Sync minimum update rate is 30Hz, we'll probably be paying a lot of attention to frames that take longer than 33.3 ms to produce going forward. You'll find "time beyond 33 ms" graphs in our graphics reviews for the past year or so, so yeah. We're ready.
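For anyone who hasn't seen the "time beyond 33 ms" metric, here is a rough sketch of how a number like that can be computed from a list of frame times. I'm assuming the metric sums only the portion of each frame time that exceeds the threshold; the function name and sample data are invented for illustration.

```python
def time_beyond(frame_times_ms, threshold_ms=33.3):
    """Sum the time each frame spends past the threshold, in ms.

    Assumption: only the excess over the threshold counts, so a 40 ms
    frame contributes 6.7 ms against a 33.3 ms threshold.
    """
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


if __name__ == "__main__":
    sample = [16.7, 40.0, 33.3, 50.0]  # made-up frame times
    print(round(time_beyond(sample), 1))
```

The appeal of a measure like this is that fast frames can't paper over slow ones, which is exactly what happens with an FPS average.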
G-Sync panels will not work with FCAT, of course, since FCAT relies on a standards-based video capture card. We can use FCAT-derived performance data to predict whether one would have a good experience with G-Sync, but ultimately, we need better benchmarking tools that are robust enough to make the transition to new tech like 4K and G-Sync displays without breaking. I've been pushing both Nvidia and AMD to expose the exact time when the GPU flips to a new frame via an API. With that tool, we could capture FCAT-style end-of-pipeline frame times without the aid of video captures. We'd want to verify the numbers with video capture tools whenever possible, but having a display-independent way to do this work would be helpful. I think game engine developers want the same sort of thing in order to make sure their in-game timing can match the display times, too. Here's hoping we can persuade AMD, Nvidia, and Intel to do the right thing here sooner rather than later.
Here's why the CrossFire Eyefinity/4K story matters
Earlier this week, we posted a news item about an article written by Ryan Shrout over at PC Perspective. In the article, Ryan revealed some problems with using Radeon CrossFire multi-GPU setups and multiple displays.
Those problems look superficially similar to the ones we explored in our Radeon HD 7990 review. They were partially resolved—for single displays with resolutions of 2560x1600 and below, and for DirectX 10/11 games—by AMD's frame pacing beta driver. AMD has been forthright that it has more work to do in order to make CrossFire work properly with multiple displays, higher resolutions, and DirectX 9 games.
I noticed that many folks reacted to our news item by asking why this story matters, given the known issues with CrossFire that have persisted literally for years. I have been talking with Ryan and looking into these things for myself, and I think I can explain.
Let's start with the obvious: this story is news because nobody has ever looked at frame delivery with multi-display configs using these tools before. We first published results using Nvidia's FCAT tools back in March, and we've used them quite a bit since. However, getting meaningful results from multi-display setups is tricky when you can only capture one video output at a time, and, rah rah other excuses—the bottom line is, I never took the time to try capturing, say, the left-most display with the colored FCAT overlay and analyzing the output. Ryan did so and published the first public results.
That's interesting because, technically speaking, multi-display CrossFire setups work differently than single-monitor ones. We noted this fact way back in our six-way Eyefinity write-up: the card-to-card link over a CrossFire bridge can only transfer images up to four megapixels in size. Thus, a CrossFire team connected to multiple displays must pass data from the secondary card to the primary card over PCI Express. The method of compositing frames for Eyefinity is simply different. That's presumably why AMD's current frame-pacing driver can't work its magic on anything beyond a single, four-megapixel monitor.
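A little arithmetic shows why multi-display and 4K configs fall outside that limit. This sketch assumes the "four megapixel" ceiling corresponds to a single 2560x1600 image, which is my reading of the numbers rather than a published spec, and the helper name is invented.

```python
# Assumption: treat one 2560x1600 image as the bridge's ceiling.
BRIDGE_LIMIT_PIXELS = 2560 * 1600  # 4,096,000 pixels


def fits_bridge(width, height, displays=1):
    """Return True if the combined image fits under the assumed limit."""
    return width * height * displays <= BRIDGE_LIMIT_PIXELS


if __name__ == "__main__":
    print(fits_bridge(2560, 1600))      # single 30" monitor
    print(fits_bridge(1920, 1080, 3))   # triple 1080p Eyefinity
    print(fits_bridge(3840, 2160))      # 4K display
```

Three 1080p panels add up to roughly 6.2 megapixels and a 4K panel to about 8.3, so both have to take the PCI Express route.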
We already know that non-frame-paced CrossFire solutions on a single display are kind of a mess. Turns out that the problems are a bit different, and even worse, with multiple monitors.
I've been doing some frame captures myself this week, and I can tell you what I've seen. The vast majority of the time, CrossFire with Eyefinity drops every other frame with alarming consistency. About half of the frames just don't make it to the display at all, even though they're counted in software benchmarking tools like Fraps. I've seen dropped frames with single-display CrossFire, but nothing nearly this extreme.
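Here is a back-of-the-envelope sketch of what that means for the numbers a software tool reports. The drop pattern is idealized as exactly every other frame and the function names are hypothetical; real captures are messier.

```python
def software_fps(rendered_frames, seconds):
    """FPS as a software counter would report it: every rendered frame counts."""
    return rendered_frames / seconds


def displayed_fps(rendered_frames, seconds, drop_every=2):
    """FPS that reaches the screen if every `drop_every`-th frame is dropped.

    Idealized assumption: the drops follow a perfectly regular pattern.
    """
    dropped = rendered_frames // drop_every
    return (rendered_frames - dropped) / seconds


if __name__ == "__main__":
    print(software_fps(120, 2.0))   # what the benchmark tool says
    print(displayed_fps(120, 2.0))  # what the display actually shows
```

In this idealized case, the software-reported figure is double what the user sees.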
Also, Ryan found a problem in some games where scan lines from two different frames become intermixed, causing multiple horizontal tearing artifacts on screen at once. (That's his screenshot above.) I've not seen this problem in my testing yet, but it looks to be a little worse and different from the slight "leakage" of an old frame into a newer one that we observed with CrossFire and one monitor. I need to do more testing in order to get a sense of how frequently this issue pops up.
The bottom line is that Eyefinity and CrossFire together appear to be a uniquely bad combination. Worse, these problems could be tough to overcome with a driver update because of the hardware bandwidth limitations involved.
This story is a bit of a powder keg for several reasons.
For one, the new marketing frontier for high-end PC graphics is 4K displays. As you may know, current 4K monitors are essentially the same as multi-monitor setups in their operation. Since today's display ASICs can't support 4K resolutions natively, monitors like the Asus PQ321Q use tiling. One input drives the left "tile" of the monitor, and a second feeds the right tile. AMD's drivers handle the PQ321Q just like a dual-monitor Eyefinity setup. That means the compositing problems we've explored happen to CrossFire configs connected to 4K displays—not the regular microstuttering troubles, but the amped-up versions.
Ryan tells me he was working on this story behind the scenes for a while, talking to both AMD and Nvidia about problems they each had with 4K monitors. You can imagine what happened when these two fierce competitors caught wind of the CrossFire problems.
For its part, Nvidia called together several of us in the press last week, got us set up to use FCAT with 4K monitors, and pointed us toward some specific issues with their competition. One of the big issues Nvidia emphasized in this context is how Radeons using dual HDMI outputs to drive a 4K display can exhibit vertical tearing right smack in the middle of the screen, where the two tiles meet, because they're not being refreshed in sync. This problem is easy to spot in operation.
GeForces don't do this. Fortunately, you can avoid this problem on Radeons simply by using a single DisplayPort cable and putting the monitor into DisplayPort MST mode. The display is still treated as two tiles, but the two DP streams use the same timing source, and this vertical tearing effect is eliminated.
I figure if you drop thousands of dollars on a 4K gaming setup, you can spring for the best cable config. So one of Nvidia's main points just doesn't resonate with me.
And you've gotta say, it's quite the aggressive move, working to highlight problems with 4K displays just days ahead of your rival's big launch event for a next-gen GPU. I had to take some time to confirm that the Eyefinity/4K issues were truly different from the known issues with CrossFire on a single monitor before deciding to post anything.
That said, Nvidia deserves some credit for making sure its products work properly. My experience with dual GeForce GTX 770s and a 4K display has been nearly seamless. Plug in two HDMI inputs or a single DisplayPort connection with MST, and the GeForce drivers identify the display and configure it silently without resorting to the Surround setup UI. There's no vertical tearing if you choose to use dual HDMI inputs. You're going to want to use multiple graphics cards in order to get fluid gameplay at 4K resolutions, and Nvidia's frame metering tech allows our dual-GTX 770 SLI setup to deliver. It's noticeably better than dual Radeon HD 7970s, and not in a subtle way. Nvidia has engineered a solution that overcomes a lot of obstacles in order to make that happen. Give them props for that.
As for AMD, well, one can imagine the collective groan that went up in their halls when word of these problems surfaced on the eve of their big announcement. The timing isn't great for them. I received some appeals to my better nature, asking me not to write about these things yet, telling me I'd hear all about AMD's 4K plans next week. I expect AMD to commit to fixing the problems with its existing products, as well as unveiling a newer and more capable high-end GPU. I'm looking forward to it.
But I'm less sympathetic when I think about how AMD has marketed multi-GPU solutions like the Radeon HD 7990 as the best solution for 4K graphics. We're talking about very expensive products that simply don't work like they should. I figure folks should know about these issues today, not later.
My hope is that we'll be adding another chapter to this story soon, one that tells the tale of AMD correcting these problems in both current and upcoming Radeons.
Those next-gen games? Yeah, one just arrived
We've been swept away by a wave of talk about next-gen consoles since Sony unveiled the specs for the PlayStation 4, and we're due for another round when Microsoft reveals the next Xbox. The reception for the PS4 specs has largely been positive, even among PC gamers, because of what it means for future games. The PS4 looks to match the graphics horsepower of today's mid-range GPUs, something like a Radeon HD 7850. Making that sort of hardware the baseline for the next generation of consoles is probably a good thing for gaming, the argument goes.
Much of this talk is about potential, about the future possibilities for games as a medium, about fluidity and visual fidelity that your rods and cones will soak up like a sponge, crying out for more.
And I'm all for it.
But what if somebody had released a game that already realized that potential, that used the very best of today's graphics and CPU power to advance the state of the art in plainly revolutionary fashion, and nobody noticed?
Seems to me, that's pretty much what has happened with Crysis 3. I installed the game earlier this week, aware of the hype around it and expecting, heck, I dunno what—a bit of an improvement over Crysis 2, I suppose, that would probably run sluggishly even on high-end hardware. (And yes, I'm using high-end hardware, of course: dual Radeon HD 7970s on one rig and a GeForce GTX Titan on the other, both attached to a four-megapixel 30" monitor. Job perk, you know.)
Nothing had prepared me for what I encountered when the game got underway.
I've seen Far Cry 3 and Assassin's Creed 3 and other big-name games with "three" in their titles that pump out the eye candy, some of them very decent and impressive and such, but what Crytek has accomplished with Crysis 3 moves well beyond anything else out there. The experience they're creating in real time simply hasn't been seen before, not all in one place. You can break it down to a host of component parts: an advanced lighting model, high-res textures, complex environments and models, a convincing physics simulation, expressive facial animation, great artwork, and what have you. Each one of those components in Crysis 3 is probably the best I've ever seen in an interactive medium.
And yes, the jaw-dropping cinematics are all created in real time in the game engine, not pre-rendered to video.
But that's boring. What's exciting is how all of those things come together to make the world you're seeing projected in front of your face seem real, alive, and dangerous. To me, this game is a milestone; it advances the frontiers of the medium and illustrates how much better games can be. This is one of those "a-ha!" moments in tech, where expectations are reset with a tingly, positive feeling. Progress has happened, and it's not hard to see.
Once I realized that fact, I popped open a browser tab and started looking at reviews of Crysis 3, to find out what others had to say about the game. I suppose that was the wrong place to go, since game reviewing has long since moved into fancy-pants criticism that worries about whether the title in question successfully spans genres or does other things that sound vaguely French in origin. Yeah, I like games that push those sorts of boundaries, too, but sometimes you have to stop and see the forest full of impeccably lit, perfectly rendered trees.
Seems to me like, particularly in North America, gamers have somehow lost sight of the value of high-quality visuals and how they contribute to the sense of immersion and, yes, fun in gaming. Perhaps we've scanned through too many low-IQ forum arguments about visual quality versus gameplay, as if the two things were somehow part of an engineering tradeoff, where more of one equals less of the other. Perhaps the makers of big-budget games have provided too many examples of games that seem to bear out that logic. I think we could include Crytek in that mix, with the way Crysis 2 wrapped an infuriatingly mediocre game in ridiculously high-zoot clothing.
Whatever our malfunction is, we ought to get past it. Visuals aren't everything, but these visuals sure are something. A game this gorgeous is inherently more compelling than a sad, Xboxy-looking console port where all surfaces appear to be the same brand of shiny, blurry plastic, where the people talking look like eerily animated mannequins. Even if Crytek has too closely answered, you know, the call of duty when it comes to gameplay and storylines, they have undoubtedly achieved something momentous in Crysis 3. They've made grass look real, bridged the uncanny valley with incredible-looking human characters, and packed more detail into each square inch of this game than you'll find anywhere else. Crysis 3's core conceit, that you're stealthily hunting down bad guys while navigating through this incredibly rich environment, works well because of the stellar visuals, sound, and physics.
My sense is that Crysis 3 should run pretty well at "high" settings on most decent gaming PCs, too. If not on yours, well, it may be time to upgrade. Doing so will buy you a ticket to a whole new generation of visual fidelity in real-time graphics. I'd say that's worth it. To give you a sense of what you'd be getting, have a look at the images in the gallery below. Click "view full size" to see them in their full four-megapixel glory. About half the shots were taken in Crysis 3's "high" image quality mode, since "very high" was a little sluggish, so yes, it can get even better as PC graphics continues marching forward.
I've been buried in my own work while preparing our Titan review, but lots has happened in the past few weeks, as many in the industry have moved toward adopting some form of game performance testing based on frame rendering times rather than traditional FPS. Feels like we've crossed a threshold, really. There's work to be done figuring out how to capture, analyze, and present the data, but folks seem to have embraced the basic approach of focusing on frame times rather than FPS averages. I'm happy to see it.
I've committed to writing about developments in frame-latency-based testing as they happen, and since so much has been going on, some of you have written to ask about various things.
Today, I'd like to address the work Ryan Shrout has been doing over at PC Perspective, which we've discussed briefly in the past. Ryan has been helping a very big industry player to test a toolset that can capture every frame coming out of a graphics card over a DVI connection and then analyze frame delivery times. The basic innovation here is a colored overlay that varies from one frame to the next, a sort of per-frame watermark.
The resulting video can be analyzed to see all sorts of things. Of course, one can extract basic frame times like we get from Fraps but at the ultimate end of the rendering pipeline. These tools also let you see what portion of the screen is occupied by which frames when vsync is disabled. You could also detect when frames aren't delivered in the order they were rendered. All in all, very useful stuff.
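To give a feel for that kind of analysis, here is a small sketch that takes one captured screen refresh, represented as a list of overlay colors from top to bottom, and works out which rendered frames occupy how many scanlines. The input format and color names are my own invention for illustration; the actual tool's data format isn't described here.

```python
from itertools import groupby


def slices_in_capture(scanline_colors):
    """Run-length encode overlay colors for one captured refresh.

    Each run of identical colors is one rendered frame's slice of the
    screen. Returns a list of (color, scanline_count) pairs, top to bottom.
    """
    return [(color, sum(1 for _ in run)) for color, run in groupby(scanline_colors)]


if __name__ == "__main__":
    # Hypothetical 1080-line capture with vsync off: three frames share the screen.
    capture = ["red"] * 600 + ["lime"] * 8 + ["blue"] * 472
    print(slices_in_capture(capture))
```

Because the overlay color sequence is known in advance, the same data could also flag frames that arrive out of order or never appear at all.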
Interestingly, in this page of Ryan's Titan review, he reproduces images that suggest a potentially serious problem with AMD's CrossFire multi-GPU scheme. Presumably due to sync issues between the two GPUs, only tiny slices of some frames, a few pixels tall, are displayed on screen. The value of ever having rendered these frames that aren't really shown to the user is extremely questionable, yet they show up in benchmark results, inflating FPS averages and the like.
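A quick sketch shows how much those slivers can inflate the numbers. It assumes we already have the on-screen height, in scanlines, of each frame over a capture, and it uses an arbitrary 20-scanline cutoff for what counts as a sliver. Both the cutoff and the function names are hypothetical.

```python
def split_runts(slice_heights, min_scanlines=20):
    """Separate frames into those with meaningful screen area and slivers.

    The 20-scanline cutoff is an arbitrary value chosen for illustration.
    """
    full = [h for h in slice_heights if h >= min_scanlines]
    runts = [h for h in slice_heights if h < min_scanlines]
    return full, runts


def fps_counts(slice_heights, seconds, min_scanlines=20):
    """Return (counted_fps, effective_fps) for a capture of given length."""
    full, _ = split_runts(slice_heights, min_scanlines)
    return len(slice_heights) / seconds, len(full) / seconds


if __name__ == "__main__":
    # Made-up one-second capture: every other frame is only a few lines tall.
    heights = [540, 4, 536, 6] * 30
    print(fps_counts(heights, 1.0))
```

In this contrived case, the counted frame rate is twice the rate of frames a user could actually perceive.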
That's, you know, not good.
As Ryan points out, problems of this sort won't necessarily show up in Fraps frame time data, since Fraps writes its timestamp much earlier in the rendering pipeline. We've been cautious about multi-GPU testing with Fraps for this very same reason. The question left lingering out there by Ryan's revelation is the extent of the frame delivery problems with CrossFire. Further investigation is needed.
I'm very excited by the prospects for tools of this sort, and I expect we'll be using something similar before long. With that said, I do want to put in a good word for Fraps in this context.
I hesitate to do this, since I don't want to be known as the "Fraps guy." Fraps is just a tool, and maybe not the best one for the long term. I'm not that wedded to it.
But Ryan has some strongly worded boldface statements in his article about Fraps being "inaccurate in many cases" and not properly reflecting "the real-world gaming experience the user has." His big industry partner has been saying similar things about Fraps not being "entirely accurate" to review site editors behind the scenes for some time now.
True, Fraps doesn't measure frame delivery to the display. But I really dislike that "inaccurate" wording, because I've seen no evidence to suggest that Fraps is inaccurate for what it measures, which is the time when the game engine presents a new frame to the DirectX API.
Taking things a step further, it's important to note that frame delivery timing itself is not the be-all, end-all solution that one might think, just because it monitors the very end of the pipeline. The truth is, the content of the frames matters just as much to the smoothness of the resulting animation. A constant, evenly spaced stream of frames that is out of sync with the game engine's simulation timing could depict a confusing, stuttery mess. That's why solutions like Nvidia's purported frame metering technology for SLI aren't necessarily a magic-bullet solution to the trouble with multi-GPU schemes that use alternate frame rendering.
In fact, as Intel's Andrew Lauritzen has argued, interruptions in game engine simulation timing are the most critical contributor to less-than-smooth animation. Thus, to the extent that Fraps timestamps correspond to the game engine's internal timing, the Fraps result is just as important as the timing indicated by those colored overlays in the frame captures. The question of how closely Fraps timestamps match up with a game's internal engine timing is a complex one that apparently will vary depending on the game engine in question. Mark at ABT has demonstrated that Fraps data looks very much like the timing info exposed by several popular game engines, but we probably need to dig into this question further with top-flight game developers.
Peel back this onion another layer or two, and things can become confusing and difficult in a hurry. The game engine has its timing, which determines the content of the frames, and the display has its own independent refresh loop that never changes. Matching up the two necessarily involves some slop. If you force the graphics card to wait for a display refresh before flipping to a new frame, that's vsync. Partial frames aren't displayed, so you won't see tearing, but frame output rates are quantized to the display refresh rate or a subset of it. Without vsync, the display refresh constraint doesn't entirely disappear. Frames still aren't delivered when ready, exactly—fragments of them are, if the screen is being painted at the time.
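The quantization effect with vsync is easy to see with a toy model. This sketch assumes the simplest case, where the GPU waits for the next refresh before it can flip, so each frame occupies a whole number of refresh intervals. Real drivers with deeper buffering behave differently, and the function names are mine.

```python
import math


def refreshes_per_frame(render_ms, refresh_hz=60.0):
    """Whole refresh intervals a frame occupies in a simple vsync model."""
    interval_ms = 1000.0 / refresh_hz
    # The small epsilon guards against float rounding right at a boundary.
    return max(1, math.ceil(render_ms / interval_ms - 1e-9))


def effective_fps(render_ms, refresh_hz=60.0):
    """Frame rate after quantization to the refresh rate or a subset of it."""
    return refresh_hz / refreshes_per_frame(render_ms, refresh_hz)


if __name__ == "__main__":
    for render_ms in (10.0, 16.0, 20.0, 35.0):
        print(render_ms, effective_fps(render_ms))
```

On a 60Hz display in this model, a GPU that needs 20 ms per frame delivers 30 FPS rather than 50, because each frame has to wait for the second refresh.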
What we should make of this reality isn't clear.
That's why I said last time that we're not likely to have a single, perfect number to summarize smooth gaming performance any time soon. That doesn't mean we're not offering much better results than FPS averages have in the past. In fact, I think we're light years beyond where we were two years ago. But we'll probably continue to need tools that sample from multiple points in the rendering pipeline, at least unless and until display technology changes. I think Fraps, or something like it, fits into that picture as well as frame capture tools.
I also continue to think that the sheer complexity of the timing issues in real-time graphics rendering and displays means that our choice to focus on high-latency frames as the primary problem was the right one. Doing so orders our priorities nicely, because any problems that don't involve high-latency frames necessarily involve relatively small amounts of time and are inescapably "filtered" to some extent by the display refresh cycle. There's no reason to get into the weeds by chasing minor variance between frame times, at least not yet. Real-time graphics has tolerated small amounts of variance from various sources for years while enjoying wild success.
As the second turns: further developments
I told myself I'd try to keep pace with any developments across the web related to our frame-latency-based game benchmarking methods, but I've once again fallen behind. That's good, in a way, because there's lots going on. Let me try to catch you up on the latest with a series of hard-hitting bullet points, not necessarily in the order of importance.
Also, a word on words. Although I'm reading a Google translation, I can see that they used the word "microstuttering" to describe the frame latency issues on the Radeon. For what it's worth, I prefer to reserve the term "microstuttering" for the peculiar sort of problem often encountered in multi-GPU setups where frame times oscillate in a tight, alternating pattern. That, to me, is "jitter," too. Intermittent latency spikes are problematic, of course, but aren't necessarily microstuttering. I expect to fail in enforcing this preference anywhere beyond TR, of course.
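The distinction between a tight alternating pattern and intermittent spikes can be expressed as a crude heuristic. This is only a sketch of one way to tell the two apart, with an invented function name and an arbitrary 2 ms sensitivity; it is not a method anyone has standardized.

```python
def alternation_ratio(frame_times_ms, min_delta_ms=2.0):
    """Fraction of consecutive frame-time changes that flip direction.

    A value near 1.0 suggests a long-short-long-short pattern; a value
    near 0 suggests steady delivery or isolated spikes. The 2 ms
    sensitivity is an arbitrary choice for illustration.
    """
    deltas = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    flips = sum(
        1
        for d1, d2 in zip(deltas, deltas[1:])
        if d1 * d2 < 0 and min(abs(d1), abs(d2)) >= min_delta_ms
    )
    return flips / max(1, len(deltas) - 1)


if __name__ == "__main__":
    print(alternation_ratio([10, 25, 10, 25, 10, 25]))      # oscillating
    print(alternation_ratio([16, 16, 16, 60, 16, 16, 16]))  # single spike
```

The oscillating series scores 1.0, while the series with one spike scores 0.2, even though the spike is the larger single disruption.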
The colored overlays that track frame delivery are nifty, but I'm pleased to see Ryan looking at frame contents rather than just frame delivery, because what matters to animation isn't just the regularity with which frames arrive at the display. The content of those frames is vital, too. As Andrew Lauritzen noted in his B3D post, disrupted timing in the game engine can interrupt animation fluidity even if buffering manages to keep frames arriving at the display at regular intervals.
Those folks who are still wary of using Fraps because it writes a timestamp at a single point in the process will want to chew on the implications of that statement for a while. Another implication: we'll perhaps always need to supplement any quantitative results with qualitative analysis in order to paint the whole picture. So... this changes nothing!
Although it may be confusing to some folks, we will probably keep talking about frame rendering in terms of latency, just as we do with input lag. That's because I continue to believe game performance is fundamentally a frame-latency-based problem. We just need to remember which type of latency is which—and that frame latency is just a subset of the overall input-response chain.
That's all for now, folks. More when it happens.
As the second turns: the web digests our game testing methods
A funny thing happened over the holidays. We went into the break right after our Radeon vs. GeForce rematch and follow-up articles had caused a bit of a stir. Also, our high-speed video had helped to illustrate the problems we'd identified with smooth animation, particularly on the Radeon HD 7950. All of this activity brought new attention to the frame latency-focused game benchmark methods we proposed in my "Inside the second" article over a year ago and have been refining since.
As we were busy engaging in the holiday rituals of overeating and profound regret, a number of folks across the web were spending their spare time thinking about latency-focused game testing, believe it or not. We're happy to see folks seriously considering this issue, and as you might expect, we're learning from their contributions. I'd like to highlight several of them here.
Perhaps the most notable of these contributions comes from Andrew Lauritzen, a Tech Lead at Intel. According to his home page, Andrew works "with game developers and researchers to improve the algorithms, APIs and hardware used for real-time rendering." He also occasionally chides me on Twitter. Andrew wrote up a post at Beyond3D titled "On TechReport's frame latency measurement and why gamers should care." The main thrust of his argument is to support our latency-focused testing methods and to explain the need for them in his own words. I think he makes that case well.
Uniquely, though, he also addresses one of the trickier aspects of latency-focused benchmarking: how the graphics pipeline works and how the tool that we've been using to measure latencies, Fraps, fits into it.
As we noted here, Fraps simply writes a timestamp at a certain point in the frame production pipeline, multiple stages before that frame is output to the display. Many things, both good and bad, can happen between the hand-off of the frame from the game engine and the final display of the image on the monitor. For this reason, we've been skittish about using Fraps-based frame-time measurements with multi-GPU solutions, especially those that claim to include frame metering, as we explained in our GTX 690 review. We've proceeded to use Fraps in our single-GPU testing because, although its measurements may not be a perfect reflection of what happens at the display output, we think they are a better, more precise indication of in-game animation smoothness than averaging FPS over time.
Andrew addresses this question in some depth. I won't reproduce his explanation here, which is worth reading in its entirety and covers the issues of pipelining, buffering, and CPU/driver-GPU interactions. Interestingly, Andrew believes that in the case of latency spikes, buffered solutions may produce smooth frame delivery to the display. However, even if that's the case, the timing of the underlying animation is disrupted, which is just as bad:
This sort of "jump ahead, then slow down" jitter is extremely visible to our eyes, and demonstrated well by Scott's follow-up video using a high speed camera. Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the affect of the game adjusting how far it steps the simulation in time each frame. . . . A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. i.e. it is never okay to see spikey output in frame latency graphs.
Disruptions in the timing of the game simulation, he argues, are precisely what we want to avoid in order to ensure smooth gameplay—and Fraps writes its timestamps at a critical point in the process:
Games measure the throughput of the pipeline via timing the back-pressure on the submission queue. The number they use to update their simulations is effectively what FRAPS measures as well.
In other words, if Fraps captures a latency spike, the game's simulation engine likely sees the same thing, with the result being disrupted timing and less-than-smooth animation.
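To make that concrete, here is a minimal Python sketch of a game that moves an object by speed multiplied by the measured frame time. It is a simplified model for illustration only: the function name, the speed value, and the frame times are all hypothetical, and real engines may smooth or clamp the timestep.

```python
def simulate_steps(frame_times_ms, speed_units_per_s=100.0):
    """Return how far the object moves in each frame.

    Assumption: the game steps its simulation by exactly the measured
    frame time, with no smoothing or clamping.
    """
    return [speed_units_per_s * dt_ms / 1000.0 for dt_ms in frame_times_ms]


# Two hypothetical sequences covering the same 120 ms of wall-clock time.
steady_steps = simulate_steps([20, 20, 20, 20, 20, 20])
spiky_steps = simulate_steps([20, 20, 50, 10, 10, 10])

print("steady:", steady_steps)
print("spiky: ", spiky_steps)
```

Both sequences cover the same total distance, but the spiky one jumps ahead in the long frame and then creeps forward in the short ones, which is the "jump ahead, then slow down" pattern Andrew describes.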
There's more to Andrew's argument, but his insights about the way game engines interact with the DirectX API, right at the point where Fraps captures its data points, are very welcome. I hope they'll help persuade folks who might have been unsure about latency-focused testing methods to give them a try. Andrew concludes that "If what we ultimately care about is smooth gameplay, gamers should be demanding frame latency measurements instead of throughput from all benchmarking sites."
With impeccable timing, then, Mark at AlienBabelTech has just published an article that asks the question: "Is Fraps a good tool?" He attempts to answer the question by comparing the frame times recorded by Fraps to those recorded by the tools embedded in several game engines. You can see Mark's plots of the results for yourself, but the essence of his findings is that the game engine and Fraps output are "so similar as to convey approximately the same information." He also finds that the results capture a sense of the fluidity of the animation. The frame time plot "fits very well in with the experience of watching this benchmark – a small chug at the beginning, then it settles down until the scene changes and lighting comes into play – smoothness alternates with slight jitter until we reach the last scene that settles down nicely."
With the usefulness of Fraps and frame-time measurements established, Mark says his next step will be to test a GeForce GTX 680 and a Radeon HD 7970 against each other, complete with high-speed video comparisons. We look forward to his follow-up article.
Speaking of follow-up, I know many of you are wondering how AMD plans to address the frame latency issues we've identified in several newer games. We have been working with AMD, most recently running another quick set of tests right before Christmas with the latest Catalyst 12.11 beta and CAP update, just to ensure the problems we saw weren't already resolved in a newer driver build. We haven't heard much back yet, but we noticed in the B3D thread that AMD's David Baumann says the causes of latency spikes are many—and he offers word of an impending fix for Borderlands 2:
There is no one single thing for, its all over the place - the app, the driver, allocations of memory, CPU thread priorities, etc., etc. I believe some of the latency with BL2 was, in fact, simply due to the size of one of the buffers; a tweak to is has improved it significantly (a CAP is in the works).
This news bolsters our sense that the 7950's performance issues were due to software optimization shortfalls. We saw spiky frame time plots with BL2 both in our desktop testing and in Cyril's look at the Radeon HD 8790M, so we're pleased to see that a fix could be here soon via a simple CAP update.
Meanwhile, if you'd like to try your hand at latency-focused game testing, you may want to know about an open-source tool inspired by our work and created by Lindsay Bigelow. FRAFS Benchmark Viewer parses and graphs the frame time data output by Fraps. I have to admit, I haven't tried it myself yet since our own internal tools are comfortingly familiar, but this program may be helpful to those whose Excel-fu is a little weak.
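If you'd rather roll your own, the first step looks something like the Python sketch below. It assumes the log is a two-column CSV with a header row and a cumulative timestamp in milliseconds for each frame, so check your own files before trusting it. The sample data and the function name are made up for illustration, and this is not how FRAFS itself works internally.

```python
import csv
import io


def frame_times_from_timestamps(csv_text):
    """Convert cumulative per-frame timestamps (ms) into frame times (ms).

    Assumed layout: a header row, then rows of "frame number, timestamp".
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    stamps = [float(row[1]) for row in reader if len(row) >= 2]
    # Each frame time is the gap between consecutive timestamps.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical sample in the assumed layout.
SAMPLE = """Frame, Time (ms)
1, 0.0
2, 16.5
3, 33.0
4, 91.0
5, 107.5
"""

frame_times = frame_times_from_timestamps(SAMPLE)
print(frame_times)
```

The 58 ms gap between the third and fourth timestamps is the sort of spike a frame time plot makes obvious.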
Finally, we have a bit of a debate to share with you. James Prior from Rage3D was making some noises on Twitter about a "problem" with our latency-focused testing methods, and he eventually found the time to write me an email with his thoughts. I replied, he replied, and we had a nice discussion. James has kindly agreed to the publication of our exchange, so I thought I'd share it with you. It's a bit lengthy and incredibly nerdy, so do what you will with it.
Here is James's initial email:
Alrighty, had some time to play with it and get some thoughts together. First of all, not knocking what you're doing - I think it's a good thing. When I said 'theres a big flaw' here's what I'm thinking.
When I look at inside the second, the data presentation doesn't lend itself to supporting some of the conclusions. This is not because you're wrong but because I'm not sure of the connection between the two. Having played around with looking at 99% time, I think that it's not a meaningful metric in gauging smoothness of itself, it shows uneven render time but not the impact of that on game experience, which was the whole point. It's another way of doing 'X number is better than Y number'.
I agree with you that a smoothness metric is needed. I concur with your thoughts about FPS rates not being the be-all end-all, and 60fps vsync isn't the holy grail. The problem is the perception of smoothness, and quantifying that. If you have a 25% framerate variation at 45fps you're going to notice it more than a 25% framerate variation at 90fps. 99% time shows when you have a long time away from the average frame rate but not that the workload changes, so is naturally very dependent on the benchmark data, time period and settings.
What I would (and am, but it took me 2 weeks to write this email, I'm so time limited) aim for is to find a way to identify a standard deviation and look for ways to show that. So when you get a line of 20-22ms frames interrupted by a 2x longer frame time and possibly a few half as long frame times (the 22, 22, 58, 12, 12, 22, 22ms pattern) you can identify it, and perhaps count the number of times it happens inside the dataset.
Next up would be 'why' and that can start with game settings - changing MSAA, AO, resolution, looking for game engine bottlenecks and then looking at drivers and CPU config. People have reported stuttering frame rates from different mice, having HT enabled, having the NV AO code running on the AMD card (or vice versa).
In summary - I think the presentation of the data doesn't show the problem at the extent it's an issue for gamers. I think it's too simplistic to say 'more 99% time on card a, it's no good'. But that's an editorial decision for you, not me.
The videos of skyrim were interesting but of no value to me, it's a great way to show people how to idenitify the problem but unless you frame sync the camera to the display and can find a way to reduce the losses of encoding to show it, it's not scientific. Great idea though, help people understand what you're describing.
Thanks for being willing to listen, and have a Merry Christmas :)
My response follows:
Hey, thanks for finally taking time to write. Glad to see you've considered these things somewhat.
I have several thoughts in response to what you've written, but the first and most important one is simply to note that you've agreed with the basic premise that FPS averages are problematic. Once we reach that point and are talking instead about data presentation and such, we have agreed fundamentally and are simply squabbling over details. And I'm happy to give a lot of ground on details in order to find the best means of analyzing and presenting the data to the reader in a useful format.
With that said, it seems to me you've concentrated on a single part of our data presentation, the 99th percentile frame time, and are arguing that the 99th percentile frame time doesn't adequately communicate the "smoothness" of in-game animation.
I'd say, if you look at our work over the last year in total, you'd find that we're not really asking the 99th percentile frame time to serve that role exclusively or even primarily.
Before we get to why, though, let's establish another fundamental. That fundamental reality is that animation involves flipping through a series of frames in sequence (with timing that's complicated somewhat by its presentation on a display with a fixed refresh rate.) The single biggest threat to smooth animation in that context is delays or high-latency frames. When you wait too long for the next flip, the illusion of motion is threatened.
I'm much more concerned with high-latency frames than I am with variance from a mean, especially if that variance is on the low side of the mean. Although a series of, say, 33 ms frames might be the essence of "smoothness," I don't consider variations that dip down to 8 ms from within that stream to be especially problematic. As long as the next update comes quickly, the illusion of motion will persist and be relatively unharmed. (There are complicated timing issues here involving the position of underlying geometry at render time and display refresh intervals that pull in different directions, but as long as the chunks of time involved are small enough, I don't think they get much chance to matter.) Variations *above* the mean, especially big ones, are the real problem.
At its root, then, real-time graphics performance is a latency-sensitive problem. Our attempts to quantify in-game smoothness take that belief as fundamental.
Given that, we've borrowed the 99th percentile latency metric from the server world, where things like database transaction latencies are measured in such terms. As we've constantly noted, the 99th percentile is just one point on a curve. As long as we've collected enough data, though, it can serve as a reliable point of comparison between systems that are serving latency-sensitive data. It's a single sample point from a large data set that offers a quick summary of relative performance.
With that in mind, we've proposed the 99th percentile frame time as a potential replacement for the (mostly pointless) traditional FPS average. The 99th percentile frame time has also functioned for us as a companion to the FPS average, a sort of canary in the coal mine. When the two metrics agree, generally that means that frame rates are both good *and* consistent. When they disagree, there's usually a problem with consistent frame delivery.
So the 99th percentile does some summary work for us that we find useful.
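A quick Python sketch shows how the two numbers can disagree. The frame times here are invented, and the nearest-rank percentile is just one common definition, chosen as an assumption rather than a description of our exact tooling.

```python
import math


def average_fps(frame_times_ms):
    """Frames rendered divided by total time, in frames per second."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct=99.0):
    """Nearest-rank percentile: the frame time that pct% of frames meet or beat."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


# Two hypothetical runs with the same total render time for 100 frames.
smooth = [20.0] * 100
spiky = [15.0] * 95 + [115.0] * 5

for name, run in (("smooth", smooth), ("spiky", spiky)):
    print(name, average_fps(run), "FPS,", percentile_frame_time(run), "ms at the 99th percentile")
```

Both runs average 50 FPS, yet the 99th percentile frame time is 20 ms for one and 115 ms for the other.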
But it is a summary, and it rules out the last 1% of slow frames, so I agree that it's not terribly helpful as a presentation of animation smoothness. That's why our data presentation includes:
1) a raw plot of frame times from a single benchmark run,
2) the full latency curve from 50-100% of frames rendered,
3) the "time spent beyond 50 ms" metric, and
4) sometimes zoomed-in chunks of the raw frame time plots.
*Those* tools, not the 99th percentile summary, attempt to convey more useful info about smoothness.
My favorite among them as a pure metric of smoothness is "time spent beyond 50 ms."
50 milliseconds is our threshold because at a steady state it equates to 20 FPS, which is pretty slow animation, where the illusion of motion is starting to be compromised. (The slowest widespread visual systems we have, in traditional cinema, run at 24 FPS.) Also, if you wait more than 50 ms for the next frame on a 60Hz display with vsync, you're waiting through *four* display refresh cycles. Bottom line: frame times over 50 ms are problematic. (We could argue over the exact threshold, but it has to be somewhere in this neighborhood, I think.)
At first, to quantify interruptions in smooth animation, we tried just counting the number of frames that take over 50 ms to render. The trouble with that is that a 51 ms frame counts the same as a 108 ms frame, and faster solutions can sometimes end up producing *more* frames over 50 ms than slower ones.
To avoid those problems, we later decided to account for how far the frame times are over our threshold. So what we do is add up all of the time spent rendering beyond our threshold. For instance, a 51 ms frame adds 1 ms to the count, while an 80 ms frame adds 30 ms to our count. The more total time spent beyond the threshold, the more the smoothness of the animation has been compromised.
It's not perfect, but I think that's a pretty darned good way to account for interruptions in smoothness. Of course, the results from these "outlier" high-latency frames can vary from run to run, so we take the "time beyond X" for each of the five test runs we do for each card and report the median result.
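In code, the idea might look like the Python sketch below. The function names and the five sets of frame times are hypothetical, and this is only an illustration of the arithmetic described above.

```python
from statistics import median


def count_beyond(frame_times_ms, threshold_ms=50.0):
    """The earlier approach: count frames slower than the threshold."""
    return sum(1 for t in frame_times_ms if t > threshold_ms)


def time_beyond(frame_times_ms, threshold_ms=50.0):
    """Add up only the portion of each frame time past the threshold."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# A 51 ms frame and a 108 ms frame count the same, but weigh very differently.
print(count_beyond([51.0]), count_beyond([108.0]))
print(time_beyond([51.0]), time_beyond([80.0]), time_beyond([108.0]))

# Five made-up test runs for one card; report the median across runs.
runs = [
    [20.0, 51.0, 22.0],
    [20.0, 80.0, 22.0],
    [20.0, 22.0, 21.0],
    [108.0, 20.0, 20.0],
    [55.0, 60.0, 20.0],
]
per_run = [time_beyond(run) for run in runs]
reported = median(per_run)
print(per_run, reported)
```

Taking the median keeps one unusually good or bad run from dominating the reported figure.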
In short, I don't disagree entirely with your notion that the 99th percentile frame time doesn't tell you everything you might need to know. That's why our data presentation is much more robust than just a single number, and why we've devised a different metric that attempts to convey "smoothness"--or the lack of it.
I'd be happy to hear your thoughts on alternative means of analyzing and presenting frame time data. Once we agree that FPS averages hide important info about slowdowns, we're all in the same boat, trying to figure out what comes next. Presenting latency-sensitive metrics is a tough thing to do well for a broad audience that is accustomed to much simpler metrics, and we're open to trying new things that might better convey a sense of the realities involved.
And here is James's reply:
First up, yes I absolutely agree that FPS averages aren't the complete picture. Your cogent and comprehensive response details the thinking behind your methodology very nicely. You are correct, I did choose to highlight 99% time as my first point, and your clarification regarding the additional data you review and present is well taken.
I agree with you about the 50ms/20fps ‘line in the sand’, for watching animated pictures. My personal threshold for smoothness in movies is about 17-18, my wife’s is 23.8. For gaming however, I find around 35fps / 29ms per frame is where I get pissed off and call it an unplayable slideshow unless it is an RTS - I was prepared to hate C&C locked at 30fps but found it quite pleasant. This was based on not only animation smoothness but smoothness of response to input. Human perception is a funny thing, it changes with familiarity and temperament.
So on that basis I concur that dipping from 22ms to 50ms is perceptible in ‘palm of the hand’ and 99% plus 50ms statistics address identify that nicely. Where I disagree with you is the moving from 22ms to say 11ms isn't noticeable, especially if it is an experientially significant amount of time for the latency consumer - the player. Running along at 22ms and switching to 11ms probably won’t be perceived badly, but the regression back to 22ms might be, especially if it happens frequently. I experienced this first hand when I benched Crossfire 7870’s in Eyefinity, with VECC added SSAA. The fraps average was high, in the 60’s, the min was around 38. The problem was the feel, it looked smooth, but the response was input was terrible. The perceived average FPS was closer to the minimum and wasn’t smooth and so despite being capable of stutter free animation, the playability was ruined due to frame rate variation from 38fps to ~90fps. The problem ended up being memory bandwidth, as increasing clocks improved the feel and reduced the variation; this was reinforced by moving from SSAA to AAA and standard MSAA; the less intensive modes were silky smooth, AAA being in the same general performance range.
This can be observed on the raw frame rate graph, a saw tooth pattern will be seen if the plot resolution is right, but when examining a plot covering perhaps minutes of data showing tens of frame render times per second then you need a systemic approach for consistency and time cost of the analyst.
The obvious answer is to restrict your input data, find a benchmark session that doesn’t do that but then you end up with a question of usefulness to your latency processor again - the player. Does the section of testing represent the game fairly? Is the provided data enough for someone to know that the card will cope with the worst case scenarios of the game, is there enough data for each category of consumer - casual player, IQ/feature enthusiast, game enthusiast, performance enthusiast, competitive gamer, system builder, family advisor, mom upgrading little Jonny’s gateway - to understand the experience?
Servers talk to servers, games talk to people. We can base analysis methodology on what comes from the server world, and then move on to finding a way to consistently quantify the experience so that the different experience levels show through.
I'll confess I still owe him a response to this message. We seem to have ended up agreeing on the most important matters at hand, though, and the issues he raises in his reply are a bit of a departure from our initial exchange. It seems to me James is thinking in the right terms, and I look forward to seeing how he implements some of these ideas in his own game testing in the future.
You can follow me on Twitter for more nerdy exchanges.

Freshening up a home network can yield big bandwidth benefits
One of the funny things about being a PC enthusiast, for me, is how there's a constant ebb and flow of little projects that I end up tackling. At one point, I may be busy updating and tuning my HTPC, and shortly after that's finished, I'm on to something else. One way or another, it seems I'm almost always trying to fix or improve something.
My project lately has been optimizing my home network. By nature, my hardware testing work requires me to move lots of data around, whether it's deploying images to test rigs, downloading new games from Steam, or uploading videos to YouTube. I've noticed that I spend quite a bit of time waiting on various data transfer operations. Within certain limits, that's probably an indicator that some money could be well spent on an upgrade.
The first step in the process was getting my cable modem service upgraded. I'm too far out in the 'burbs to partake of the goodness of Google Fiber happening in downtown Kansas City, so I'm stuck with Time Warner Cable.
For a while, I'd been paying about 60 bucks a month for Road Runner "Turbo" cable modem service with a 15Mbps downstream and a 1Mbps upstream. We use a host of 'net based services like Netflix and Vonage, along with the aforementioned work traffic and hosting a Minecraft server for my kids, so both the upstream and downstream were feeling sluggish at times.
Time Warner Cable's website told me I could get 20Mbps downstream and 2Mbps upstream for $49.99 a month here in my area. There's also an option for 30Mbps down and 5Mbps up for $59.99. I was vaguely aware that my old-ish cable modem would have to be replaced with a newer model to enable the higher speed service, so I disconnected the modem and headed to the local Time Warner store, hoping to exchange it and upgrade my service.
When I got there, the salesperson informed me I could upgrade, but insisted that I'd need to pay an additional $15 per month above my current rate in order to get 20Mbps/2Mbps service. I asked if she was sure about that and whether there were any better pricing options, but she insisted. As she typed away, beginning the service change, I pulled up the Time Warner website on my phone, attempting to get that pricing info—which was conveniently hidden on the mobile site. I fumbled for a while as she kept typing, because apparently service tier changes require a 25-page written report. Only after my third inquiry, some bluster from me, and a whole lot more typing did she decide that she could give me the $49.99 price for 20Mbps/2Mbps service.
I later talked another rep into switching me to the 30Mbps/5Mbps service for $59.99, instead. Heh.
Anyhow, I eventually came home with a rather gigantic new cable modem and, for about the same price I'd been paying before, started enjoying double the downstream bandwidth and 5X the upstream. The difference is very much noticeable in certain cases, such as Steam downloads and YouTube uploads.
I suppose the morals of this story are: 1) if you have an older cable modem, you may be able to get faster service by swapping it out for a newer one, thanks to newer DOCSIS tech, and 2) you may also be eligible for better pricing if you do some research and prod your service provider sufficiently. Don't just take what they're giving you now or even the newer options they're offering to existing customers. Look into the offers they're making to new customers, instead, and insist on the best price.
Only days after I'd posted my shiny new Speedtest.net results on Twitter, I turned my attention to our internal home network. Although I really like my Netgear WNDR3700 router, we've never used it to its full potential. The 5GHz band is practically empty, either due to lack of device support or range issues. Signals in that band just won't reach reliably into most of the bedrooms, so it's a no-go for anything mobile.
The range is great on our 2.4GHz network, but transfer rates are kind of pokey. There are many reasons for that. At the top of the list is a ridiculous number of devices connected at any given time. Between phones, tablets, PCs, and other devices, I can count 12 off the top of my head right now. There may be more.
You may be in the same boat. I didn't plan for this; it just happened.
Also, we have a silly number of other devices throwing off interference in the 2.4GHz range, including wireless mice, game controllers, Bluetooth headsets, the baby monitor, apparently our microwave oven, and probably a can opener or something, too.
One particular client system, my wife's kitchen PC, really needed some help. We store all of our family photos and videos on my PC, and my wife accesses them over the Wi-Fi network. As the megapixel counts for digital cameras have grown, so has her frustration. The process of pulling up thumbnails in a file viewer was excruciating.
Her system had a 2.4GHz 802.11g Wi-Fi adapter in it, which caused several problems. One was its own inherent limit of 54Mbps peak transfer rates. The other was the fact that, in order to best accommodate it and other older Wi-Fi clients, I had switched my router's 2.4GHz Wi-Fi mode from its "Up to 130Mbps" default mode to "Up to 54Mbps"—that setting seemed to help the Kitchen PC, but at the cost of lower peak network speeds for wireless-n clients.
This problem should have been solved ages ago, but it had momentum on its side. The Kitchen PC's motherboard had a built-in Wi-Fi adapter with a nice integrated antenna poking out of the port cluster, and I was reluctant to change it. However, a quick audit of the devices on our network revealed something important: the Kitchen PC's 802.11g adapter was the only 802.11b/g client left on our network. Replacing its Wi-Fi adapter wouldn't just speed up its connection; it would also allow me to experiment with the higher-bandwidth 2.4GHz modes on my router.
Once I resolved to make a change, it was like the girls from Jersey Shore: stupidly cheap and easy. I decided to measure the impact of various options by noting the speed of Windows file copy to the Kitchen PC. With its built-in 802.11g adapter, which has a stubby antenna attached, file copies averaged 2MB/s.
I then disabled the internal adapter and switched to an insanely tiny USB-based 802.11n adapter that I happened to have on hand. These things cost ten bucks and have zero room for an antenna, but they seem to work. I also switched the router to "Up to 130Mbps" mode on the 2.4GHz band, since the last legacy device was gone. The changes didn't help much; copies averaged 1.88MB/s, practically the same. However, when I flipped the router into its 20/40MHz mode ("Up to 300Mbps"), transfer rates more than doubled, to 5MB/s.
Better, but not great.
To really improve, I needed to make use of that practically empty 5GHz bandwidth. As a stationary system not far from the router, the Kitchen PC was a perfect candidate. I ordered up a Netgear dual-band USB Wi-Fi adapter (20 bucks for a refurb) to make it happen. This adapter is large enough to have a decent-sized internal antenna, in addition to the dual-band capability. Once it was installed, Windows file copy speeds on the 5GHz band (in 20/40MHz mode) were a steady 14MB/s, fully seven times what they were initially. And that's with just four out of five bars of signal strength.
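For the curious, here is the arithmetic behind those comparisons as a short Python sketch. It assumes 1MB/s equals 8Mbps and takes the file copy averages above at face value; the labels and helper name are my own shorthand.

```python
def mbytes_to_mbits(mb_per_s):
    """Convert megabytes per second to megabits per second (assumes 8 bits per byte)."""
    return mb_per_s * 8


# Measured Windows file copy averages from the tests above, in MB/s.
measured = {
    "802.11g built-in adapter": 2.0,
    "802.11n USB, 130Mbps mode": 1.88,
    "802.11n USB, 300Mbps mode": 5.0,
    "dual-band USB, 5GHz": 14.0,
}

baseline = measured["802.11g built-in adapter"]
speedups = {label: rate / baseline for label, rate in measured.items()}

for label, rate in measured.items():
    print(f"{label}: {mbytes_to_mbits(rate):.1f} Mbps, {speedups[label]:.2f}x the original")
```

Even the best result works out to about 112Mbps, well short of the "Up to 300Mbps" label, which is a reminder that those advertised link rates are peak figures and not real-world file copy speeds.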
There are a couple of lessons here, too, I think. First, wireless-b and -g devices are really stinkin' old, and moving to better adapter hardware is worth the modest cost involved. Getting rid of those old clients may even help speed up your whole network. Second, if you have a dual-band router with lots of clients, make use of that 5GHz bandwidth where possible, especially on stationary systems that are in range of the base station.
Of course, the big takeaway for this entire episode was this: devoting some attention to your home network can yield some nice benefits, especially if you've neglected it a bit. And heck, I haven't even started down the path to 802.11ac. Yet.