Earlier this week, we posted a news item about an article written by Ryan Shrout over at PC Perspective. In the article, Ryan revealed some problems with using Radeon CrossFire multi-GPU setups with multiple displays.
Those problems look superficially similar to the ones we explored in our Radeon HD 7990 review. They were partially resolved—for single displays with resolutions of 2560x1600 and below, and for DirectX 10/11 games—by AMD's frame pacing beta driver. AMD has been forthright that it has more work to do in order to make CrossFire work properly with multiple displays, higher resolutions, and DirectX 9 games.
I noticed that many folks reacted to our news item by asking why this story matters, given the known issues with CrossFire that have persisted literally for years. I have been talking with Ryan and looking into these things for myself, and I think I can explain.
Let's start with the obvious: this story is news because nobody has ever looked at frame delivery with multi-display configs using these tools before. We first published results using Nvidia's FCAT tools back in March, and we've used them quite a bit since. However, getting meaningful results from multi-display setups is tricky when you can only capture one video output at a time, and, rah rah other excuses—the bottom line is, I never took the time to try capturing, say, the left-most display with the colored FCAT overlay and analyzing the output. Ryan did so and published the first public results.
That's interesting because, technically speaking, multi-display CrossFire setups work differently than single-monitor ones. We noted this fact way back in our six-way Eyefinity write-up: the card-to-card link over a CrossFire bridge can only transfer images up to four megapixels in size. Thus, a CrossFire team connected to multiple displays must pass data from the secondary card to the primary card over PCI Express. The method of compositing frames for Eyefinity is simply different. That's presumably why AMD's current frame-pacing driver can't work its magic on anything beyond a single, four-megapixel monitor.
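The four-megapixel ceiling is easier to see with a little arithmetic. This is a back-of-the-envelope sketch of our own, not anything from AMD: it assumes the bridge limit equals one 2560x1600 frame (4,096,000 pixels) and simply checks whether a single composited frame spanning several identical displays would fit under it. The names `BRIDGE_LIMIT_PIXELS` and `needs_pcie_path` are hypothetical.

```python
# Hypothetical sketch: does one composited frame exceed an (assumed)
# four-megapixel CrossFire bridge limit?
BRIDGE_LIMIT_PIXELS = 2560 * 1600  # assumption: ~4 MP, i.e. one 2560x1600 frame


def total_pixels(width, height, displays=1):
    """Pixels in one frame spread across `displays` identical panels."""
    return width * height * displays


def needs_pcie_path(width, height, displays=1):
    """True if the frame is too big for the assumed bridge limit."""
    return total_pixels(width, height, displays) > BRIDGE_LIMIT_PIXELS


configs = {
    "single 2560x1600": (2560, 1600, 1),
    "single 1920x1080": (1920, 1080, 1),
    "3x 1920x1080 Eyefinity": (1920, 1080, 3),
    "6x 1920x1080 Eyefinity": (1920, 1080, 6),
}

for name, (w, h, n) in configs.items():
    megapixels = total_pixels(w, h, n) / 1e6
    route = "PCI Express" if needs_pcie_path(w, h, n) else "bridge"
    print(f"{name}: {megapixels:.2f} MP -> {route}")
```

Even a modest three-panel 1080p Eyefinity group works out to about 6.2 megapixels per frame, well past the assumed limit, which is why the secondary card's output has to take the PCI Express route.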
We already know that non-frame-paced CrossFire solutions on a single display are kind of a mess. Turns out that the problems are a bit different, and even worse, with multiple monitors.
I've been doing some frame captures myself this week, and I can tell you what I've seen. The vast majority of the time, CrossFire with Eyefinity drops every other frame with alarming consistency. About half of the frames just don't make it to the display at all, even though they're counted in software benchmarking tools like Fraps. I've seen dropped frames with single-display CrossFire, but nothing nearly this extreme.
Also, Ryan found a problem in some games where scan lines from two different frames become intermixed, causing multiple horizontal tearing artifacts on screen at once. (That's his screenshot above.) I've not seen this problem in my testing yet, but it looks to be a little worse than, and different from, the slight "leakage" of an old frame into a newer one that we observed with CrossFire and one monitor. I need to do more testing in order to get a sense of how frequently this issue pops up.
The bottom line is that Eyefinity and CrossFire together appear to be a uniquely bad combination. Worse, these problems could be tough to overcome with a driver update because of the hardware bandwidth limitations involved.
This story is a bit of a powder keg for several reasons.
For one, the new marketing frontier for high-end PC graphics is 4K displays. As you may know, current 4K monitors are essentially the same as multi-monitor setups in their operation. Since today's display ASICs can't support 4K resolutions natively, monitors like the Asus PQ321Q use tiling. One input drives the left "tile" of the monitor, and a second feeds the right tile. AMD's drivers handle the PQ321Q just like a dual-monitor Eyefinity setup. That means the compositing problems we've explored happen to CrossFire configs connected to 4K displays—not the regular microstuttering troubles, but the amped-up versions.
Ryan tells me he was working on this story behind the scenes for a while, talking to both AMD and Nvidia about problems they each had with 4K monitors. You can imagine what happened when these two fierce competitors caught wind of the CrossFire problems.
For its part, Nvidia called together several of us in the press last week, got us set up to use FCAT with 4K monitors, and pointed us toward some specific issues with their competition. One of the big issues Nvidia emphasized in this context is how Radeons using dual HDMI outputs to drive a 4K display can exhibit vertical tearing right smack in the middle of the screen, where the two tiles meet, because they're not being refreshed in sync. This problem is easy to spot in operation.
GeForces don't do this. Fortunately, you can avoid this problem on Radeons simply by using a single DisplayPort cable and putting the monitor into DisplayPort MST mode. The display is still treated as two tiles, but the two DP streams use the same timing source, and this vertical tearing effect is eliminated.
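Why a shared timing source matters can be shown with a toy model. In this sketch, each tile starts a new refresh on its own clock; if the two clocks differ even slightly, the refresh start times drift apart, and the seam ends up showing two halves scanned out at different moments. The 59.95 Hz figure for the second tile is purely a hypothetical mismatch chosen for illustration, not a measured value for any Radeon or monitor, and real timing generators may drift in either direction.

```python
def seam_offsets_ms(left_hz, right_hz, refreshes):
    """Phase offset (ms) between the two tiles' refresh start times,
    for each of the first `refreshes` refreshes of the left tile."""
    left_period = 1.0 / left_hz
    right_period = 1.0 / right_hz
    offsets = []
    for i in range(refreshes):
        drift = i * right_period - i * left_period
        # Wrap into one refresh period: only the phase difference matters.
        offsets.append((drift % left_period) * 1000.0)
    return offsets


# One shared timing source (as with a single DisplayPort MST link):
shared = seam_offsets_ms(60.0, 60.0, 601)
# Two independent sources, one running at a hypothetical 59.95 Hz:
independent = seam_offsets_ms(60.0, 59.95, 601)

print(f"shared clock after ~10 s: {shared[600]:.2f} ms apart")
print(f"independent clocks after ~10 s: {independent[600]:.2f} ms apart")
```

With a common clock the offset stays at zero, so both halves always show the same frame; with independent clocks it grows by a small amount every refresh.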
I figure if you drop thousands of dollars on a 4K gaming setup, you can spring for the best cable config. So one of Nvidia's main points just doesn't resonate with me.
And you've gotta say, it's quite the aggressive move, working to highlight problems with 4K displays just days ahead of your rival's big launch event for a next-gen GPU. I had to take some time to confirm that the Eyefinity/4K issues were truly different from the known issues with CrossFire on a single monitor before deciding to post anything.
That said, Nvidia deserves some credit for making sure its products work properly. My experience with dual GeForce GTX 770s and a 4K display has been nearly seamless. Plug in two HDMI inputs or a single DisplayPort connection with MST, and the GeForce drivers identify the display and configure it silently without resorting to the Surround setup UI. There's no vertical tearing if you choose to use dual HDMI inputs. You're going to want to use multiple graphics cards in order to get fluid gameplay at 4K resolutions, and Nvidia's frame metering tech allows our dual-GTX 770 SLI setup to deliver. It's noticeably better than dual Radeon HD 7970s, and not in a subtle way. Nvidia has engineered a solution that overcomes a lot of obstacles in order to make that happen. Give them props for that.
As for AMD, well, one can imagine the collective groan that went up in their halls when word of these problems surfaced on the eve of their big announcement. The timing isn't great for them. I received some appeals to my better nature, asking me not to write about these things yet, telling me I'd hear all about AMD's 4K plans next week. I expect AMD to commit to fixing the problems with its existing products, as well as unveiling a newer and more capable high-end GPU. I'm looking forward to it.
But I'm less sympathetic when I think about how AMD has marketed multi-GPU solutions like the Radeon HD 7990 as the best solution for 4K graphics. We're talking about very expensive products that simply don't work like they should. I figure folks should know about these issues today, not later.
My hope is that we'll be adding another chapter to this story soon, one that tells the tale of AMD correcting these problems in both current and upcoming Radeons.
Those next-gen games? Yeah, one just arrived
We've been swept away by a wave of talk about next-gen consoles since Sony unveiled the specs for the PlayStation 4, and we're due for another round when Microsoft reveals the next Xbox. The reception for the PS4 specs has largely been positive, even among PC gamers, because of what it means for future games. The PS4 looks to match the graphics horsepower of today's mid-range GPUs, something like a Radeon HD 7850. Making that sort of hardware the baseline for the next generation of consoles is probably a good thing for gaming, the argument goes.
Much of this talk is about potential, about the future possibilities for games as a medium, about fluidity and visual fidelity that your rods and cones will soak up like a sponge, crying out for more.
And I'm all for it.
But what if somebody had released a game that already realized that potential, that used the very best of today's graphics and CPU power to advance the state of the art in plainly revolutionary fashion, and nobody noticed?
Seems to me, that's pretty much what has happened with Crysis 3. I installed the game earlier this week, aware of the hype around it and expecting, heck, I dunno what—a bit of an improvement over Crysis 2, I suppose, that would probably run sluggishly even on high-end hardware. (And yes, I'm using high-end hardware, of course: dual Radeon HD 7970s on one rig and a GeForce GTX Titan on the other, both attached to a four-megapixel 30" monitor. Job perk, you know.)
Nothing had prepared me for what I encountered when the game got underway.
I've seen Far Cry 3 and Assassin's Creed 3 and other big-name games with "three" in their titles that pump out the eye candy, some of them very decent and impressive and such, but what Crytek has accomplished with Crysis 3 moves well beyond anything else out there. The experience they're creating in real time simply hasn't been seen before, not all in one place. You can break it down to a host of component parts: an advanced lighting model, high-res textures, complex environments and models, a convincing physics simulation, expressive facial animation, great artwork, and what have you. Each one of those components in Crysis 3 is probably the best I've ever seen in an interactive medium.
And yes, the jaw-dropping cinematics are all created in real time in the game engine, not pre-rendered to video.
But that's boring. What's exciting is how all of those things come together to make the world you're seeing projected in front of your face seem real, alive, and dangerous. To me, this game is a milestone; it advances the frontiers of the medium and illustrates how much better games can be. This is one of those "a-ha!" moments in tech, where expectations are reset with a tingly, positive feeling. Progress has happened, and it's not hard to see.
Once I realized that fact, I popped open a browser tab and started looking at reviews of Crysis 3, to find out what others had to say about the game. I suppose that was the wrong place to go, since game reviewing has long since moved into fancy-pants criticism that worries about whether the title in question successfully spans genres or does other things that sound vaguely French in origin. Yeah, I like games that push those sorts of boundaries, too, but sometimes you have to stop and see the forest full of impeccably lit, perfectly rendered trees.
Seems to me like, particularly in North America, gamers have somehow lost sight of the value of high-quality visuals and how they contribute to the sense of immersion and, yes, fun in gaming. Perhaps we've scanned through too many low-IQ forum arguments about visual quality versus gameplay, as if the two things were somehow part of an engineering tradeoff, where more of one equals less of the other. Perhaps the makers of big-budget games have provided too many examples of games that seem to bear out that logic. I think we could include Crytek in that mix, with the way Crysis 2 wrapped an infuriatingly mediocre game in ridiculously high-zoot clothing.
Whatever our malfunction is, we ought to get past it. Visuals aren't everything, but these visuals sure are something. A game this gorgeous is inherently more compelling than a sad, Xboxy-looking console port where all surfaces appear to be the same brand of shiny, blurry plastic, where the people talking look like eerily animated mannequins. Even if Crytek has too closely answered, you know, the call of duty when it comes to gameplay and storylines, they have undoubtedly achieved something momentous in Crysis 3. They've made grass look real, bridged the uncanny valley with incredible-looking human characters, and packed more detail into each square inch of this game than you'll find anywhere else. Crysis 3's core conceit, that you're stealthily hunting down bad guys while navigating through this incredibly rich environment, works well because of the stellar visuals, sound, and physics.
My sense is that Crysis 3 should run pretty well at "high" settings on most decent gaming PCs, too. If not on yours, well, it may be time to upgrade. Doing so will buy you a ticket to a whole new generation of visual fidelity in real-time graphics. I'd say that's worth it. To give you a sense of what you'd be getting, have a look at the images in the gallery below, captured at the full four-megapixel resolution of my monitor. About half the shots were taken in Crysis 3's "high" image quality mode, since "very high" was a little sluggish, so yes, it can get even better as PC graphics continues marching forward.
I've been buried in my own work while preparing our Titan review, but lots has happened in the past few weeks, as many in the industry have moved toward adopting some form of game performance testing based on frame rendering times rather than traditional FPS. Feels like we've crossed a threshold, really. There's work to be done figuring out how to capture, analyze, and present the data, but folks seem to have embraced the basic approach of focusing on frame times rather than FPS averages. I'm happy to see it.
I've committed to writing about developments in frame-latency-based testing as they happen, and since so much has been going on, some of you have written to ask about various things.
Today, I'd like to address the work Ryan Shrout has been doing over at PC Perspective, which we've discussed briefly in the past. Ryan has been helping a very big industry player to test a toolset that can capture every frame coming out of a graphics card over a DVI connection and then analyze frame delivery times. The basic innovation here is a colored overlay that varies from one frame to the next, a sort of per-frame watermark.
The resulting video can be analyzed to see all sorts of things. Of course, one can extract basic frame times like we get from Fraps but at the ultimate end of the rendering pipeline. These tools also let you see what portion of the screen is occupied by which frames when vsync is disabled. You could also detect when frames aren't delivered in the order they were rendered. All in all, very useful stuff.
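Here is a minimal sketch of how a per-frame color watermark can be turned back into per-frame data. It assumes each captured refresh has already been reduced to a list of overlay colors, one per scanline, and that the overlay steps through a known, repeating palette in render order. The eight-color `PALETTE` is a made-up stand-in; the real FCAT overlay uses its own color sequence, and pulling colors out of captured video pixels is left out entirely.

```python
from itertools import groupby

# Hypothetical palette: the overlay color advances one step per rendered frame.
PALETTE = ["white", "lime", "blue", "red", "teal", "navy", "green", "aqua"]


def slices(scanline_colors):
    """Collapse one captured refresh into (color, scanline_count) runs."""
    return [(color, sum(1 for _ in run)) for color, run in groupby(scanline_colors)]


def frame_heights(refreshes, palette=PALETTE):
    """Return [(palette_index, scanlines_on_screen)] in display order, plus
    how many palette steps were skipped (frames that never reached the
    display). This is a naive count: out-of-order delivery also shows up as
    a large skip, and a frame exactly one full palette cycle later would be
    mistaken for a continuation of the previous one."""
    heights = []
    skipped = 0
    for refresh in refreshes:
        for color, count in slices(refresh):
            idx = palette.index(color)
            if heights and heights[-1][0] == idx:
                # Same frame continues across a refresh boundary.
                heights[-1] = (idx, heights[-1][1] + count)
                continue
            if heights:
                step = (idx - heights[-1][0]) % len(palette)
                skipped += step - 1  # zero when frames arrive in sequence
            heights.append((idx, count))
    return heights, skipped


# Two tiny 10-line "refreshes"; the blue frame never makes it to the screen.
captured = [["white"] * 6 + ["lime"] * 4, ["lime"] * 3 + ["red"] * 7]
print(frame_heights(captured))
```

The per-frame scanline counts are the raw material for everything else: how long each frame was visible, which frames were only slivers, and which were never shown.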
Interestingly, on this page of Ryan's Titan review, he reproduces images that suggest a potentially serious problem with AMD's CrossFire multi-GPU scheme. Presumably due to sync issues between the two GPUs, only tiny slices of some frames, a few pixels tall, are displayed on screen. The value of ever having rendered these frames that aren't really shown to the user is extremely questionable, yet they show up in benchmark results, inflating FPS averages and the like.
That's, you know, not good.
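To see how few-pixel slivers inflate the numbers, here is a minimal sketch that sorts frames by how many scanlines each one occupied on screen. The 20-scanline runt cutoff is an arbitrary assumption of ours, not an industry standard, and the input is invented: a one-second capture of a 1600-line display at 60 Hz in which full frames alternate with 10-line slivers.

```python
RUNT_THRESHOLD = 20  # scanlines; hypothetical cutoff for a "runt" frame


def classify(heights, threshold=RUNT_THRESHOLD):
    """Split per-frame on-screen scanline counts into full frames,
    runts (visible, but only as a thin slice), and dropped frames."""
    full = [h for h in heights if h >= threshold]
    runts = [h for h in heights if 0 < h < threshold]
    dropped = [h for h in heights if h == 0]
    return full, runts, dropped


def fps(frame_count, seconds):
    return frame_count / seconds


# Invented capture: 60 refreshes x 1600 lines, split into big frames and slivers.
heights = [1590, 10] * 60
full, runts, dropped = classify(heights)

print("frames counted in software:", fps(len(heights), 1.0), "FPS")
print("frames you could actually see:", fps(len(full), 1.0), "FPS")
print("runts:", len(runts), "dropped:", len(dropped))
```

A software counter would credit this setup with 120 FPS, while only 60 of those frames ever filled a meaningful part of the screen.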
As Ryan points out, problems of this sort won't necessarily show up in Fraps frame time data, since Fraps writes its timestamp much earlier in the rendering pipeline. We've been cautious about multi-GPU testing with Fraps for this very same reason. The question left lingering out there by Ryan's revelation is the extent of the frame delivery problems with CrossFire. Further investigation is needed.
I'm very excited by the prospects for tools of this sort, and I expect we'll be using something similar before long. With that said, I do want to put in a good word for Fraps in this context.
I hesitate to do this, since I don't want to be known as the "Fraps guy." Fraps is just a tool, and maybe not the best one for the long term. I'm not that wedded to it.
But Ryan has some strongly worded boldface statements in his article about Fraps being "inaccurate in many cases" and not properly reflecting "the real-world gaming experience the user has." His big industry partner has been saying similar things about Fraps not being "entirely accurate" to review site editors behind the scenes for some time now.
True, Fraps doesn't measure frame delivery to the display. But I really dislike that "inaccurate" wording, because I've seen no evidence to suggest that Fraps is inaccurate for what it measures, which is the time when the game engine presents a new frame to the DirectX API.
Taking things a step further, it's important to note that frame delivery timing itself is not the be-all, end-all solution that one might think, just because it monitors the very end of the pipeline. The truth is, the content of the frames matters just as much to the smoothness of the resulting animation. A constant, evenly spaced stream of frames that is out of sync with the game engine's simulation timing could depict a confusing, stuttery mess. That's why solutions like Nvidia's purported frame metering technology for SLI aren't necessarily a magic-bullet solution to the trouble with multi-GPU schemes that use alternate frame rendering.
In fact, as Intel's Andrew Lauritzen has argued, interruptions in game engine simulation timing are the most critical contributor to less-than-smooth animation. Thus, to the extent that Fraps timestamps correspond to the game engine's internal timing, the Fraps result is just as important as the timing indicated by those colored overlays in the frame captures. The question of how closely Fraps timestamps match up with a game's internal engine timing is a complex one that apparently will vary depending on the game engine in question. Mark at ABT has demonstrated that Fraps data looks very much like the timing info exposed by several popular game engines, but we probably need to dig into this question further with top-flight game developers.
Peel back this onion another layer or two, and things can become confusing and difficult in a hurry. The game engine has its timing, which determines the content of the frames, and the display has its own independent refresh loop that never changes. Matching up the two necessarily involves some slop. If you force the graphics card to wait for a display refresh before flipping to a new frame, that's vsync. Partial frames aren't displayed, so you won't see tearing, but frame output rates are quantized to the display refresh rate or an integer fraction of it (on a 60 Hz display: 60, 30, 20, 15 FPS, and so on). Without vsync, the display refresh constraint doesn't entirely disappear. Frames still aren't delivered exactly when ready; fragments of them are, if the screen is being painted at the time.
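The vsync quantization can be sketched in a few lines. This assumes the simplest case, double-buffered vsync where rendering of the next frame begins right after a flip; it ignores triple buffering and driver render-ahead queues, both of which change the picture. The function names are ours.

```python
import math


def vsync_on_screen_ms(render_ms, refresh_hz=60.0):
    """How long a frame stays up under simple double-buffered vsync:
    the flip waits for the next refresh, so the time is rounded up to a
    whole number of refresh intervals."""
    interval_ms = 1000.0 / refresh_hz
    refreshes = max(1, math.ceil(render_ms / interval_ms))
    return refreshes * interval_ms


def vsync_rate_hz(render_ms, refresh_hz=60.0):
    return 1000.0 / vsync_on_screen_ms(render_ms, refresh_hz)


for render_ms in (10.0, 17.0, 25.0, 40.0):
    print(f"{render_ms:4.1f} ms render -> {vsync_rate_hz(render_ms):.0f} FPS")
```

This prints rates of 60, 30, 30, and 20 FPS. Note the cliff: a frame that takes 17 ms instead of 16 ms costs a whole extra refresh interval.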
What we should make of this reality isn't clear.
That's why I said last time that we're not likely to have a single, perfect number to summarize smooth gaming performance any time soon. That doesn't mean we're not offering much better results than FPS averages have in the past. In fact, I think we're light years beyond where we were two years ago. But we'll probably continue to need tools that sample from multiple points in the rendering pipeline, at least unless and until display technology changes. I think Fraps, or something like it, fits into that picture as well as frame capture tools.
I also continue to think that the sheer complexity of the timing issues in real-time graphics rendering and displays means that our choice to focus on high-latency frames as the primary problem was the right one. Doing so orders our priorities nicely, because any problems that don't involve high-latency frames necessarily involve relatively small amounts of time and are inescapably "filtered" to some extent by the display refresh cycle. There's no reason to get into the weeds by chasing minor variance between frame times, at least not yet. Real-time graphics has tolerated small amounts of variance from various sources for years while enjoying wild success.
As the second turns: further developments
I told myself I'd try to keep pace with any developments across the web related to our frame-latency-based game benchmarking methods, but I've once again fallen behind. That's good, in a way, because there's lots going on. Let me try to catch you up on the latest with a series of hard-hitting bullet points, not necessarily in the order of importance.
Also, a word on words. Although I'm reading a Google translation, I can see that they used the word "microstuttering" to describe the frame latency issues on the Radeon. For what it's worth, I prefer to reserve the term "microstuttering" for the peculiar sort of problem often encountered in multi-GPU setups where frame times oscillate in a tight, alternating pattern. That, to me, is "jitter," too. Intermittent latency spikes are problematic, of course, but aren't necessarily microstuttering. I expect to fail in enforcing this preference anywhere beyond TR, of course.
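One rough way to tell a tight, alternating oscillation apart from an intermittent latency spike is to check how often successive frame-time changes flip direction. The scoring below is a hypothetical heuristic for illustration, not an established metric, and both input series are made up.

```python
from statistics import mean


def alternation_score(frame_times_ms):
    """Fraction of consecutive frame-time changes that reverse direction.
    A long/short/long/short pattern scores 1.0; a lone spike in an
    otherwise steady run scores close to 0."""
    diffs = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    if len(diffs) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)
    return flips / (len(diffs) - 1)


def pair_ratio(frame_times_ms):
    """Average ratio of the longer to the shorter frame in adjacent pairs."""
    pairs = zip(frame_times_ms[0::2], frame_times_ms[1::2])
    return mean(max(a, b) / min(a, b) for a, b in pairs)


afr_jitter = [8, 25] * 10                  # made-up alternating pattern
lone_spike = [16] * 9 + [60] + [16] * 10   # made-up intermittent spike

print(alternation_score(afr_jitter), pair_ratio(afr_jitter))
print(round(alternation_score(lone_spike), 2))
```

The alternating series scores 1.0 with long frames about three times the length of short ones, while the single spike barely registers, matching the distinction between microstuttering and an occasional hitch.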
The colored overlays that track frame delivery are nifty, but I'm pleased to see Ryan looking at frame contents rather than just frame delivery, because what matters to animation isn't just the regularity with which frames arrive at the display. The content of those frames is vital, too. As Andrew Lauritzen noted in his B3D post, disrupted timing in the game engine can interrupt animation fluidity even if buffering manages to keep frames arriving at the display at regular intervals.
Those folks who are still wary of using Fraps because it writes a timestamp at a single point in the process will want to chew on the implications of that statement for a while. Another implication: we'll perhaps always need to supplement any quantitative results with qualitative analysis in order to paint the whole picture. So... this changes nothing!
Although it may be confusing to some folks, we will probably keep talking about frame rendering in terms of latency, just as we do with input lag. That's because I continue to believe game performance is fundamentally a frame-latency-based problem. We just need to remember which type of latency is which—and that frame latency is just a subset of the overall input-response chain.
That's all for now, folks. More when it happens.
As the second turns: the web digests our game testing methods
A funny thing happened over the holidays. We went into the break right after our Radeon vs. GeForce rematch and follow-up articles had caused a bit of a stir. Also, our high-speed video had helped to illustrate the problems we'd identified with smooth animation, particularly on the Radeon HD 7950. All of this activity brought new attention to the frame latency-focused game benchmark methods we proposed in my "Inside the second" article over a year ago and have been refining since.
As we were busy engaging in the holiday rituals of overeating and profound regret, a number of folks across the web were spending their spare time thinking about latency-focused game testing, believe it or not. We're happy to see folks seriously considering this issue, and as you might expect, we're learning from their contributions. I'd like to highlight several of them here.
Perhaps the most notable of these contributions comes from Andrew Lauritzen, a Tech Lead at Intel. According to his home page, Andrew works "with game developers and researchers to improve the algorithms, APIs and hardware used for real-time rendering." He also occasionally chides me on Twitter. Andrew wrote up a post at Beyond3D titled "On TechReport's frame latency measurement and why gamers should care." The main thrust of his argument is to support our latency-focused testing methods and to explain the need for them in his own words. I think he makes that case well.
Uniquely, though, he also addresses one of the trickier aspects of latency-focused benchmarking: how the graphics pipeline works and how the tool that we've been using to measure latencies, Fraps, fits into it.
As we noted here, Fraps simply writes a timestamp at a certain point in the frame production pipeline, multiple stages before that frame is output to the display. Many things, both good and bad, can happen between the hand-off of the frame from the game engine and the final display of the image on the monitor. For this reason, we've been skittish about using Fraps-based frame-time measurements with multi-GPU solutions, especially those that claim to include frame metering, as we explained in our GTX 690 review. We've proceeded to use Fraps in our single-GPU testing because, although its measurements may not be a perfect reflection of what happens at the display output, we think they are a better, more precise indication of in-game animation smoothness than averaging FPS over time.
Andrew addresses this question in some depth. I won't reproduce his explanation here, which is worth reading in its entirety and covers the issues of pipelining, buffering, and CPU/driver-GPU interactions. Interestingly, Andrew believes that in the case of latency spikes, buffered solutions may produce smooth frame delivery to the display. However, even if that's the case, the timing of the underlying animation is disrupted, which is just as bad:
This sort of "jump ahead, then slow down" jitter is extremely visible to our eyes, and demonstrated well by Scott's follow-up video using a high speed camera. Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the affect of the game adjusting how far it steps the simulation in time each frame. . . . A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. i.e. it is never okay to see spikey output in frame latency graphs.
Disruptions in the timing of the game simulation, he argues, are precisely what we want to avoid in order to ensure smooth gameplay—and Fraps writes its timestamps at a critical point in the process:
Games measure the throughput of the pipeline via timing the back-pressure on the submission queue. The number they use to update their simulations is effectively what FRAPS measures as well.
In other words, if Fraps captures a latency spike, the game's simulation engine likely sees the same thing, with the result being disrupted timing and less-than-smooth animation.
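A toy version of Andrew's point, assuming a variable-timestep game loop that advances its simulation by the measured interval between successive present calls (a simplification of the back-pressure timing he describes). Even if every frame then reaches the display at a steady cadence, the object's on-screen step in the frame after a spike is much larger. The movement speed and interval values are invented.

```python
def simulate(present_intervals_ms, speed_px_per_ms=0.5):
    """Advance an object using each measured frame interval as the timestep.
    Returns the object's position in each rendered frame."""
    x = 0.0
    positions = []
    for dt in present_intervals_ms:
        x += speed_px_per_ms * dt
        positions.append(x)
    return positions


def on_screen_steps(positions):
    """How far the object appears to jump between consecutive frames."""
    return [b - a for a, b in zip(positions, positions[1:])]


# A single 50 ms hiccup in the intervals the game measures:
intervals = [16, 16, 16, 50, 16, 16]
print(on_screen_steps(simulate(intervals)))
```

Shown one per refresh, those frames move the object 8 pixels, 8 pixels, then 25 pixels, then 8 again: the "jump ahead, then slow down" jitter, even with perfectly even delivery to the display.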
There's more to Andrew's argument, but his insights about the way game engines interact with the DirectX API, right at the point where Fraps captures its data points, are very welcome. I hope they'll help persuade folks who might have been unsure about latency-focused testing methods to give them a try. Andrew concludes that "If what we ultimately care about is smooth gameplay, gamers should be demanding frame latency measurements instead of throughput from all benchmarking sites."
With impeccable timing, then, Mark at AlienBabelTech has just published an article that asks the question: "Is Fraps a good tool?" He attempts to answer the question by comparing the frame times recorded by Fraps to those recorded by the tools embedded in several game engines. You can see Mark's plots of the results for yourself, but the essence of his findings is that the game engine and Fraps output are "so similar as to convey approximately the same information." He also finds that the results capture a sense of the fluidity of the animation. The frame time plot "fits very well in with the experience of watching this benchmark – a small chug at the beginning, then it settles down until the scene changes and lighting comes into play – smoothness alternates with slight jitter until we reach the last scene that settles down nicely."
With the usefulness of Fraps and frame-time measurements established, Mark says his next step will be to test a GeForce GTX 680 and a Radeon HD 7970 against each other, complete with high-speed video comparisons. We look forward to his follow-up article.
Speaking of follow-up, I know many of you are wondering how AMD plans to address the frame latency issues we've identified in several newer games. We have been working with AMD, most recently running another quick set of tests right before Christmas with the latest Catalyst 12.11 beta and CAP update, just to ensure the problems we saw weren't already resolved in a newer driver build. We haven't heard much back yet, but we noticed in the B3D thread that AMD's David Baumann says the causes of latency spikes are many—and he offers word of an impending fix for Borderlands 2:
There is no one single thing for, its all over the place - the app, the driver, allocations of memory, CPU thread priorities, etc., etc. I believe some of the latency with BL2 was, in fact, simply due to the size of one of the buffers; a tweak to is has improved it significantly (a CAP is in the works).
This news bolsters our sense that the 7950's performance issues were due to software optimization shortfalls. We saw spiky frame time plots with BL2 both in our desktop testing and in Cyril's look at the Radeon HD 8790M, so we're pleased to see that a fix could be here soon via a simple CAP update.
Meanwhile, if you'd like to try your hand at latency-focused game testing, you may want to know about an open-source tool inspired by our work and created by Lindsay Bigelow. FRAFS Benchmark Viewer parses and graphs the frame time data output by Fraps. I have to admit, I haven't tried it myself yet since our own internal tools are comfortingly familiar, but this program may be helpful to those whose Excel-fu is a little weak.
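If you'd rather roll your own, the first step is turning Fraps output into per-frame intervals. This sketch is not FRAFS's code; it assumes a frame-times log that is a two-column CSV (frame number, cumulative time in milliseconds) with one header row, so check your own log's layout before relying on it.

```python
import csv
import io


def frame_times_from_fraps_csv(text):
    """Convert cumulative timestamps (ms) into per-frame intervals (ms).
    Assumed layout: header row, then 'frame, cumulative_ms' rows."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    stamps = [float(row[1]) for row in reader if row]
    return [b - a for a, b in zip(stamps, stamps[1:])]


def average_fps(frame_times_ms):
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


sample = "Frame, Time (ms)\n1, 0.000\n2, 16.000\n3, 32.000\n4, 82.000\n"
times = frame_times_from_fraps_csv(sample)
print(times, round(average_fps(times), 1))
```

The made-up sample averages out to roughly 37 FPS, a figure that says nothing about the 50 ms frame buried in it; that is the whole case for looking at the individual intervals.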
Finally, we have a bit of a debate to share with you. James Prior from Rage3D was making some noises on Twitter about a "problem" with our latency-focused testing methods, and he eventually found the time to write me an email with his thoughts. I replied, he replied, and we had a nice discussion. James has kindly agreed to the publication of our exchange, so I thought I'd share it with you. It's a bit lengthy and incredibly nerdy, so do what you will with it.
Here is James's initial email:
Alrighty, had some time to play with it and get some thoughts together. First of all, not knocking what you're doing - I think it's a good thing. When I said 'theres a big flaw' here's what I'm thinking.
When I look at inside the second, the data presentation doesn't lend itself to supporting some of the conclusions. This is not because you're wrong but because I'm not sure of the connection between the two. Having played around with looking at 99% time, I think that it's not a meaningful metric in gauging smoothness of itself, it shows uneven render time but not the impact of that on game experience, which was the whole point. It's another way of doing 'X number is better than Y number'.
I agree with you that a smoothness metric is needed. I concur with your thoughts about FPS rates not being the be-all end-all, and 60fps vsync isn't the holy grail. The problem is the perception of smoothness, and quantifying that. If you have a 25% framerate variation at 45fps you're going to notice it more than a 25% framerate variation at 90fps. 99% time shows when you have a long time away from the average frame rate but not that the workload changes, so is naturally very dependent on the benchmark data, time period and settings.
What I would (and am, but it took me 2 weeks to write this email, I'm so time limited) aim for is to find a way to identify a standard deviation and look for ways to show that. So when you get a line of 20-22ms frames interrupted by a 2x longer frame time and possibly a few half as long frame times (the 22, 22, 58, 12, 12, 22, 22ms pattern) you can identify it, and perhaps count the number of times it happens inside the dataset.
Next up would be 'why', and that can start with game settings (changing MSAA, AO, resolution), looking for game engine bottlenecks, and then looking at drivers and CPU config. People have reported stuttering frame rates from different mice, from having HT enabled, and from having the NV AO code running on the AMD card (or vice versa).
In summary, I think the presentation of the data doesn't show the problem to the extent it's an issue for gamers. I think it's too simplistic to say 'more 99% time on card A, so it's no good'. But that's an editorial decision for you, not me.
The videos of Skyrim were interesting but of no value to me. They're a great way to show people how to identify the problem, but unless you frame-sync the camera to the display and can find a way to reduce the encoding losses, it's not scientific. Great idea, though; it helps people understand what you're describing.
Thanks for being willing to listen, and have a Merry Christmas :)
My response follows:
Hey, thanks for finally taking time to write. Glad to see you've considered these things somewhat.
I have several thoughts in response to what you've written, but the first and most important one is simply to note that you've agreed with the basic premise that FPS averages are problematic. Once we reach that point and are talking instead about data presentation and such, we have agreed fundamentally and are simply squabbling over details. And I'm happy to give a lot of ground on details in order to find the best means of analyzing and presenting the data to the reader in a useful format.
With that said, it seems to me you've concentrated on a single part of our data presentation, the 99th percentile frame time, and are arguing that the 99th percentile frame time doesn't adequately communicate the "smoothness" of in-game animation.
I'd say, if you look at our work over the last year in total, you'd find that we're not really asking the 99th percentile frame time to serve that role exclusively or even primarily.
Before we get to why, though, let's establish another fundamental. That fundamental reality is that animation involves flipping through a series of frames in sequence (with timing that's complicated somewhat by its presentation on a display with a fixed refresh rate.) The single biggest threat to smooth animation in that context is delays or high-latency frames. When you wait too long for the next flip, the illusion of motion is threatened.
I'm much more concerned with high-latency frames than I am with variance from a mean, especially if that variance is on the low side of the mean. Although a series of, say, 33 ms frames might be the essence of "smoothness," I don't consider variations that dip down to 8 ms from within that stream to be especially problematic. As long as the next update comes quickly, the illusion of motion will persist and be relatively unharmed. (There are complicated timing issues here involving the position of underlying geometry at render time and display refresh intervals that pull in different directions, but as long as the chunks of time involved are small enough, I don't think they get much chance to matter.) Variations *above* the mean, especially big ones, are the real problem.
At its root, then, real-time graphics performance is a latency-sensitive problem. Our attempts to quantify in-game smoothness take that belief as fundamental.
Given that, we've borrowed the 99th percentile latency metric from the server world, where things like database transaction latencies are measured in such terms. As we've constantly noted, the 99th percentile is just one point on a curve. As long as we've collected enough data, though, it can serve as a reliable point of comparison between systems that are serving latency-sensitive data. It's a single sample point from a large data set that offers a quick summary of relative performance.
With that in mind, we've proposed the 99th percentile frame time as a potential replacement for the (mostly pointless) traditional FPS average. The 99th percentile frame time has also functioned for us as a companion to the FPS average, a sort of canary in the coal mine. When the two metrics agree, generally that means that frame rates are both good *and* consistent. When they disagree, there's usually a problem with consistent frame delivery.
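To make the canary idea concrete, here is a minimal sketch that computes both metrics for two made-up runs with the same total duration. It assumes the nearest-rank definition of a percentile; TR's exact percentile method isn't specified here, and the function names and sample data are hypothetical.

```python
import math


def average_fps(frame_times_ms):
    """Traditional FPS average: frames rendered divided by elapsed seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct=99):
    """Nearest-rank percentile: the smallest frame time such that at least
    `pct` percent of all frames were rendered at or below it."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Two illustrative 100-frame runs that each take 2,000 ms in total.
steady = [20.0] * 100             # every frame takes 20 ms
spiky = [17.0] * 95 + [77.0] * 5  # mostly quick, with five long hitches

for name, run in (("steady", steady), ("spiky", spiky)):
    print(name, average_fps(run), percentile_frame_time(run))
# steady 50.0 20.0
# spiky 50.0 77.0
```

The averages agree while the 99th percentile diverges, which is the kind of disagreement that signals inconsistent frame delivery. Note that with only one slow frame in 100, the nearest-rank 99th percentile would ignore it entirely; it is a summary, not the whole picture.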
So the 99th percentile does some summary work for us that we find useful.
But it is a summary, and it rules out the last 1% of slow frames, so I agree that it's not terribly helpful as a presentation of animation smoothness. That's why our data presentation includes:
1) a raw plot of frame times from a single benchmark run,
2) the full latency curve from 50-100% of frames rendered,
3) the "time spent beyond 50 ms" metric, and
4) sometimes zoomed-in chunks of the raw frame time plots.
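A latency curve like the one in item 2 can be generated by evaluating the percentile at every point from 50% to 100%. This sketch again assumes nearest-rank percentiles and uses a hypothetical function name; plotting is left out.

```python
import math


def latency_curve(frame_times_ms, start=50, stop=100, step=1):
    """Return (percentile, frame time in ms) pairs from `start` to `stop`
    inclusive, using nearest-rank percentiles on one sorted copy."""
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    curve = []
    for pct in range(start, stop + 1, step):
        rank = max(1, math.ceil(pct * n / 100))  # 1-based rank
        curve.append((pct, ordered[rank - 1]))
    return curve


# Illustrative data: 100 frames taking 1 ms, 2 ms, ..., 100 ms.
curve = latency_curve(list(range(1, 101)))
print(curve[0], curve[-2], curve[-1])  # (50, 50) (99, 99) (100, 100)
```

Sorting once and indexing is enough here; the right-hand end of the curve is where high-latency frames show up.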
*Those* tools, not the 99th percentile summary, attempt to convey more useful info about smoothness.
My favorite among them as a pure metric of smoothness is "time spent beyond 50 ms."
50 milliseconds is our threshold because at a steady state it equates to 20 FPS, which is pretty slow animation, where the illusion of motion is starting to be compromised. (The slowest widespread visual systems we have, in traditional cinema, run at 24 FPS.) Also, if you wait more than 50 ms for the next frame on a 60Hz display with vsync, you're waiting through *four* display refresh cycles. Bottom line: frame times over 50 ms are problematic. (We could argue over the exact threshold, but it has to be somewhere in this neighborhood, I think.)
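The arithmetic behind the 50 ms threshold is easy to check. This sketch assumes a 60Hz display with vsync and, for simplicity, that rendering starts exactly on a refresh, so a finished frame waits for the next refresh boundary; the helper names are hypothetical.

```python
import math


def fps_for_frame_time(frame_time_ms):
    """Steady-state frame rate implied by a constant per-frame time."""
    return 1000.0 / frame_time_ms


def refreshes_until_shown(frame_time_ms, refresh_hz=60):
    """Refresh intervals that elapse before a frame can be displayed with
    vsync, assuming rendering began exactly on a refresh boundary.
    Multiplying before dividing avoids the rounding noise of 1000/60."""
    return math.ceil(frame_time_ms * refresh_hz / 1000)


print(fps_for_frame_time(50))     # 20.0
print(refreshes_until_shown(50))  # 3
print(refreshes_until_shown(51))  # 4
```

A frame that takes exactly 50 ms can still make the third refresh, but anything longer spills into a fourth.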
At first, to quantify interruptions in smooth animation, we tried just counting the number of frames that take over 50 ms to render. The trouble with that is that a 51 ms frame counts the same as a 108 ms frame, and faster solutions can sometimes end up producing *more* frames over 50 ms than slower ones.
To avoid those problems, we later decided to account for how far the frame times are over our threshold. So what we do is add up all of the time spent rendering beyond our threshold. For instance, a 51 ms frame adds 1 ms to the count, while an 80 ms frame adds 30 ms to our count. The more total time spent beyond the threshold, the more the smoothness of the animation has been compromised.
It's not perfect, but I think that's a pretty darned good way to account for interruptions in smoothness. Of course, the results from these "outlier" high-latency frames can vary from run to run, so we take the "time beyond X" for each of the five test runs we do for each card and report the median result.
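The two approaches described above (counting frames over the threshold versus summing the time spent beyond it) and the median across runs fit in a few lines. This is a sketch of the metric as described, not TR's actual tooling; the function names and the five sample runs are hypothetical.

```python
from statistics import median


def frames_beyond(frame_times_ms, threshold_ms=50.0):
    """Older approach: count frames slower than the threshold."""
    return sum(1 for t in frame_times_ms if t > threshold_ms)


def time_beyond(frame_times_ms, threshold_ms=50.0):
    """Sum only the portion of each frame time that exceeds the threshold."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# A 51 ms frame and a 108 ms frame count the same under the old method...
print(frames_beyond([51]), frames_beyond([108]))  # 1 1
# ...but not here: a 51 ms frame adds 1 ms, an 80 ms frame adds 30 ms.
print(time_beyond([51]), time_beyond([80]))       # 1.0 30.0

# Five illustrative runs; the median keeps one outlier run from dominating.
runs = [
    [16] * 10 + [51],
    [16] * 10 + [80],
    [16] * 10 + [60, 70],
    [16] * 10,
    [16] * 10 + [108],
]
print(median(time_beyond(run) for run in runs))  # 30.0
```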
In short, I don't disagree entirely with your notion that the 99th percentile frame time doesn't tell you everything you might need to know. That's why our data presentation is much more robust than just a single number, and why we've devised a different metric that attempts to convey "smoothness"--or the lack of it.
I'd be happy to hear your thoughts on alternative means of analyzing and presenting frame time data. Once we agree that FPS averages hide important info about slowdowns, we're all in the same boat, trying to figure out what comes next. Presenting latency-sensitive metrics is a tough thing to do well for a broad audience that is accustomed to much simpler metrics, and we're open to trying new things that might better convey a sense of the realities involved.
And here is James's reply:
First up, yes I absolutely agree that FPS averages aren't the complete picture. Your cogent and comprehensive response details the thinking behind your methodology very nicely. You are correct, I did choose to highlight 99% time as my first point, and your clarification regarding the additional data you review and present is well taken.
I agree with you about the 50ms/20fps 'line in the sand' for watching animated pictures. My personal threshold for smoothness in movies is about 17-18fps; my wife's is 23.8fps. For gaming, however, I find around 35fps / 29ms per frame is where I get pissed off and call it an unplayable slideshow, unless it is an RTS: I was prepared to hate C&C locked at 30fps but found it quite pleasant. This was based not only on animation smoothness but on smoothness of response to input. Human perception is a funny thing; it changes with familiarity and temperament.
So on that basis I concur that dipping from 22ms to 50ms is perceptible in the 'palm of the hand', and the 99% and 50ms statistics identify that nicely. Where I disagree with you is the idea that moving from 22ms to, say, 11ms isn't noticeable, especially if it lasts an experientially significant amount of time for the latency consumer: the player. Running along at 22ms and switching to 11ms probably won't be perceived badly, but the regression back to 22ms might be, especially if it happens frequently. I experienced this first hand when I benched CrossFire 7870s in Eyefinity with SSAA added through VECC. The Fraps average was high, in the 60s, and the minimum was around 38. The problem was the feel: it looked smooth, but the response to input was terrible. The perceived average FPS was closer to the minimum and didn't feel smooth, so despite the setup being capable of stutter-free animation, playability was ruined by frame rate variation from 38fps to ~90fps. The problem turned out to be memory bandwidth, as increasing clocks improved the feel and reduced the variation; this was reinforced by moving from SSAA to AAA and standard MSAA. The less intensive modes were silky smooth, with AAA in the same general performance range.
This can be observed on the raw frame rate graph, where a sawtooth pattern will be visible if the plot resolution is right. But when you're examining a plot covering perhaps minutes of data, with tens of frame render times per second, you need a systematic approach, both for consistency and to limit the analyst's time.
The obvious answer is to restrict your input data and find a benchmark session that doesn't do that, but then you end up with a question of usefulness to your latency processor again: the player. Does the section of testing represent the game fairly? Is the provided data enough for someone to know that the card will cope with the worst-case scenarios of the game? Is there enough data for each category of consumer (casual player, IQ/feature enthusiast, game enthusiast, performance enthusiast, competitive gamer, system builder, family advisor, mom upgrading little Jonny's Gateway) to understand the experience?
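One systematic way to catch the sawtooth James describes, offered here as a hypothetical sketch rather than anything from TR's or James's tools, is to count frame-to-frame swings: frames that take at least some multiple (1.5x here, an arbitrary choice) as long as the frame before, normalized per second of run time so runs of different lengths compare fairly. The sample values loosely mirror his ~90fps-to-38fps oscillation (roughly 11 ms and 26 ms frames) but are not measured data.

```python
def count_swings(frame_times_ms, ratio=1.5):
    """Count frames that take at least `ratio` times as long as the previous one."""
    return sum(
        1
        for prev, cur in zip(frame_times_ms, frame_times_ms[1:])
        if cur >= ratio * prev
    )


def swings_per_second(frame_times_ms, ratio=1.5):
    """Normalize the swing count by the run's total duration in seconds."""
    return count_swings(frame_times_ms, ratio) * 1000.0 / sum(frame_times_ms)


sawtooth = [11, 11, 26] * 3  # illustrative 11 ms / 26 ms oscillation
steady = [18] * 9
print(count_swings(sawtooth), count_swings(steady))  # 3 0
```

Only upward swings are counted, in keeping with the emphasis on the regression back to slower frames rather than the dip to faster ones.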
Servers talk to servers, games talk to people. We can base analysis methodology on what comes from the server world, and then move on to finding a way to consistently quantify the experience so that the different experience levels show through.
I'll confess I still owe him a response to this message. We seem to have ended up agreeing on the most important matters at hand, though, and the issues he raises in his reply are a bit of a departure from our initial exchange. It seems to me James is thinking in the right terms, and I look forward to seeing how he implements some of these ideas in his own game testing in the future.
You can follow me on Twitter for more nerdy exchanges.
Freshening up a home network can yield big bandwidth benefits
One of the funny things about being a PC enthusiast, for me, is how there's a constant ebb and flow of little projects that I end up tackling. At one point, I may be busy updating and tuning my HTPC, and shortly after that's finished, I'm on to something else. One way or another, it seems I'm almost always trying to fix or improve something.
My project lately has been optimizing my home network. By nature, my hardware testing work requires me to move lots of data around, whether it's deploying images to test rigs, downloading new games from Steam, or uploading videos to YouTube. I've noticed that I spend quite a bit of time waiting on various data transfer operations. Within certain limits, that's probably an indicator that some money could be well spent on an upgrade.
The first step in the process was getting my cable modem service upgraded. I'm too far out in the 'burbs to partake of the goodness of Google Fiber happening in downtown Kansas City, so I'm stuck with Time Warner Cable.
For a while, I'd been paying about 60 bucks a month for Road Runner "Turbo" cable modem service with a 15Mbps downstream and a 1Mbps upstream. We use a host of 'net based services like Netflix and Vonage, along with the aforementioned work traffic and hosting a Minecraft server for my kids, so both the upstream and downstream were feeling sluggish at times.
Time Warner Cable's website told me I could get 20Mbps downstream and 2Mbps upstream for $49.99 a month here in my area. There's also an option for 30Mbps down and 5Mbps up for $59.99. I was vaguely aware that my old-ish cable modem would have to be replaced with a newer model to enable the higher speed service, so I disconnected the modem and headed to the local Time Warner store, hoping to exchange it and upgrade my service.
When I got there, the salesperson informed me I could upgrade, but insisted that I'd need to pay an additional $15 per month above my current rate in order to get 20Mbps/2Mbps service. I asked if she was sure about that and whether there were any better pricing options, but she insisted. As she typed away, beginning the service change, I pulled up the Time Warner website on my phone, attempting to get that pricing info—which was conveniently hidden on the mobile site. I fumbled for a while as she kept typing, because apparently service tier changes require a 25-page written report. Only after my third inquiry, some bluster from me, and a whole lot more typing did she decide that she could give me the $49.99 price for 20Mbps/2Mbps service.
I later talked another rep into switching me to the 30Mbps/5Mbps service for $59.99, instead. Heh.
Anyhow, I eventually came home with a rather gigantic new cable modem and, for about the same price I'd been paying before, started enjoying double the downstream bandwidth and 5X the upstream. The difference is very much noticeable in certain cases, such as Steam downloads and YouTube uploads.
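The back-of-the-envelope math is easy to check. This sketch uses the advertised line rates only; real throughput will be lower because of protocol overhead and congestion, and the 1 GB file size is a hypothetical example, not a figure from this post.

```python
def transfer_seconds(size_bytes, rate_mbps):
    """Ideal transfer time at a given line rate in megabits per second."""
    return size_bytes * 8 / (rate_mbps * 1_000_000)


OLD = {"down": 15, "up": 1}  # Mbps, Road Runner "Turbo"
NEW = {"down": 30, "up": 5}  # Mbps, upgraded tier

print(NEW["down"] / OLD["down"], NEW["up"] / OLD["up"])  # 2.0 5.0

one_gb = 1_000_000_000  # hypothetical 1 GB video upload
print(transfer_seconds(one_gb, OLD["up"]))  # 8000.0 (about 2.2 hours)
print(transfer_seconds(one_gb, NEW["up"]))  # 1600.0 (about 27 minutes)
```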
I suppose the morals of this story are: 1) if you have an older cable modem, you may be able to get faster service by swapping it out for a newer one, thanks to newer DOCSIS tech, and 2) you may also be eligible for better pricing if you do some research and prod your service provider sufficiently. Don't just take what they're giving you now or even the newer options they're offering to existing customers. Look into the offers they're making to new customers, instead, and insist on the best price.
Only days after I'd posted my shiny new Speedtest.net results on Twitter, I turned my attention to our internal home network. Although I really like my Netgear WNDR3700 router, we've never used it to its full potential. The 5GHz band is practically empty, either due to lack of device support or range issues. Signals in that band just won't reach reliably into most of the bedrooms, so it's a no-go for anything mobile.
The range is great on our 2.4GHz network, but transfer rates are kind of pokey. There are many reasons for that. At the top of the list is a ridiculous number of devices connected at any given time. Between phones, tablets, PCs, and other devices, I can count 12 off the top of my head right now. There may be more.
You may be in the same boat. I didn't plan for this; it just happened.
Also, we have a silly number of other devices throwing off interference in the 2.4GHz range, including wireless mice, game controllers, Bluetooth headsets, the baby monitor, apparently our microwave oven, and probably a can opener or something, too.
One particular client system, my wife's kitchen PC, really needed some help. We store all of our family photos and videos on my PC, and my wife accesses them over the Wi-Fi network. As the megapixel counts for digital cameras have grown, so has her frustration. The process of pulling up thumbnails in a file viewer was excruciating.
Her system had a 2.4GHz 802.11g Wi-Fi adapter in it, which caused two problems. One was its own inherent limit of 54Mbps peak transfer rates. The other was that, in order to best accommodate it and other older Wi-Fi clients, I had switched my router's 2.4GHz Wi-Fi mode from its "Up to 130Mbps" default to "Up to 54Mbps". That setting seemed to help the Kitchen PC, but at the cost of lower peak network speeds for wireless-n clients.
This problem should have been solved ages ago, but it had momentum on its side. The Kitchen PC's motherboard had a built-in Wi-Fi adapter with a nice integrated antenna poking out of the port cluster, and I was reluctant to change it. However, a quick audit of the devices on our network revealed something important: the Kitchen PC's 802.11g adapter was the only 802.11b/g client left on our network. Replacing its Wi-Fi adapter wouldn't just speed up its connection; it would also allow me to experiment with the higher-bandwidth 2.4GHz modes on my router.
Once I resolved to make a change, it was like the girls from Jersey Shore: stupidly cheap and easy. I decided to measure the impact of various options by noting the speed of Windows file copy to the Kitchen PC. With its built-in 802.11g adapter, which has a stubby antenna attached, file copies averaged 2MB/s.
I then disabled the internal adapter and switched to an insanely tiny USB-based 802.11n adapter that I happened to have on hand. These things cost ten bucks and have zero room for an antenna, but they seem to work. I also switched the router to "Up to 130Mbps" mode on the 2.4GHz band, since the last legacy device was gone. The changes didn't help much; copies averaged 1.88MB/s, practically the same. However, when I flipped the router into its 20/40MHz mode ("Up to 300Mbps"), transfer rates more than doubled, to 5MB/s.
Better, but not great.
To really improve, I needed to make use of that practically empty 5GHz bandwidth. As a stationary system not far from the router, the Kitchen PC was a perfect candidate. I ordered up a Netgear dual-band USB Wi-Fi adapter (20 bucks for a refurb) to make it happen. This adapter is large enough to have a decent-sized internal antenna, in addition to the dual-band capability. Once it was installed, Windows file copy speeds on the 5GHz band (in 20/40MHz mode) were a steady 14MB/s, fully seven times what they were initially. And that's with just four out of five bars of signal strength.
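Putting the measured copy rates side by side shows the progression and how far below the nominal link rates they sit. This sketch assumes 1 MB = 10^6 bytes when converting to megabits (Windows actually reports binary megabytes, so the real figures are about 5% higher); the dictionary keys are just labels.

```python
BASELINE_MB_PER_S = 2.0  # built-in 802.11g adapter

# Windows file copy speeds reported above, in MB/s.
copy_rates = {
    "802.11g built-in": 2.0,
    "tiny USB 802.11n, 130Mbps mode": 1.88,
    "tiny USB 802.11n, 20/40MHz mode": 5.0,
    "dual-band USB, 5GHz, 20/40MHz": 14.0,
}


def to_megabits(mb_per_s):
    """Convert megabytes per second to megabits per second (1 MB = 10**6 B)."""
    return mb_per_s * 8


for label, rate in copy_rates.items():
    speedup = rate / BASELINE_MB_PER_S
    print(f"{label}: {rate} MB/s = {to_megabits(rate):.0f} Mbps, {speedup:.2f}x")
```

Even the 2MB/s baseline works out to only 16Mbps of payload against 802.11g's 54Mbps nominal peak, while the final 14MB/s is about 112Mbps.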
There are a couple of lessons here, too, I think. First, wireless-b and -g devices are really stinkin' old, and moving to better adapter hardware is worth the modest cost involved. Getting rid of those old clients may even help speed up your whole network. Second, if you have a dual-band router with lots of clients, make use of that 5GHz bandwidth where possible, especially on stationary systems that are in range of the base station.
Of course, the big takeaway for this entire episode was this: devoting some attention to your home network can yield some nice benefits, especially if you've neglected it a bit. And heck, I haven't even started down the path to 802.11ac. Yet.
AMD attempts to shape review content with staged release of info
Review sites like TR are a tricky business, let me say up front. We work constantly with the largest makers of PC hardware in order to bring you timely reviews of the latest products. Making that happen, and keeping our evaluations fair and thorough, isn't easy in the context of large companies engaging in cutthroat competition over increasingly complex technologies.
I know for a fact that many folks who happen across TR's reviews are deeply skeptical about the whole enterprise, and given everything that goes on in the shadier corners of the web, they have a right to be. That said, we have worked very hard over the years to maintain our independence and to keep our readers' interests first among our priorities, and I think our regular audience will attest to that fact.
At its heart, the basic arrangement that we have with the largest PC chip companies is simple. In exchange for early access to product samples and information, we agree to one constraint: timing. That is, we agree not to post the product information or our test results until the product's official release.
That's it, really.
There are a few other nuances, such as the fact that we're released from that obligation if the information becomes public otherwise, but they only serve to limit the extent of the agreement.
In other words, we don't consent to any other constraint that would compromise our editorial independence. We don't guarantee a positive review; we don't agree to mention certain product features; and we certainly don't offer any power over the words we write or the results we choose to publish. In fact, by policy, these companies only get to see our reviews of their products when you do, not before.
If you're familiar with us, we may be covering well-trodden ground here, but bear with me. Our status as an independent entity is key to what we do. Most of the PR types we work with tend to understand that fact, so we usually get along pretty well. There's ample room for dialog and persuasion about the merits of a particular product, but ultimately, we offer our own opinions. In fact, the basic arrangement we have with these firms has been the same for most of the 13 years of our existence, even during the darkest days of Intel's Pentium 4 fiasco.
You can imagine my shock, then, upon receiving an e-mail message last week that attempted to re-write those rules in a way that grants a measure of editorial control to a company whose product we're reviewing. What AMD is doing, in quasi-clever fashion, is attempting to shape the content of reviews by dictating a two-stage plan for the release of information. In doing so, they grant themselves a measure of editorial control over any publication that agrees to the plan.
In this case, the product in question is the desktop version of AMD's Trinity APUs. We received review samples of these products last week, with a product launch date set for early October. However, late last week, the following e-mail from Peter Amos, who works in AMD's New Product Review Program, hit our inbox:
We are allowing limited previews of the embargoed information to generate additional traffic for your site, and give you an opportunity to put additional emphasis on topics of interest to your readers. If you wish to post a preview article as a teaser for your main review, you may do so on September 27th, 2012 at 12:01AM EDT.
The topics which you are free to discuss in your preview articles starting September 27th, 2012 at 12:01AM EDT are any combination of:
- Gaming benchmarks (A10, A8)
- Speeds, feeds, cores, SIMDs and branding
- Experiential testing of applications vs Intel (A10 Virgo will be priced in the range of the i3 2120 or i3 3220)
- Power testing
We believe there are an infinite number of interesting angles available for these preview articles within this framework.
We are also aware that your readers expect performance numbers in your articles. In order to allow you to have something for the preview, while maintaining enough content for your review, we are allowing the inclusion of gaming benchmarks.
By allowing the publication of speeds, feeds, cores, SIMDs and branding during the preview period, you have the opportunity to discuss the innovations that AMD is making with AMD A-Series APUs and how these are relevant to today’s compute environment and workloads.
In previewing x86 applications, without providing hard numbers until October [redacted], we are hoping that you will be able to convey what is most important to the end-user which is what the experience of using the system is like. As one of the foremost evaluators of technology, you are in a unique position to draw educated comparisons and conclusions based on real-world experience with the platform.
The idea here is for AMD to allow a "preview" of the product that contains a vast swath of the total information that one might expect to see in a full review, with a few notable exceptions. Although "experiential testing" is allowed, sites may not publish the results of non-gaming CPU benchmarks.
The email goes on to highlight a few other features of the Socket FM2 platform before explaining what information may be published in early October:
The topics which you must be held for the October [redacted] embargo lift are:
- Non game benchmarks
The email then highlights each of these topic areas briefly. Here's what it says about the temporarily verboten non-gaming benchmarks:
Non game benchmarks
- Traditional benchmarks are designed to highlight differences in different architectures and how they perform. We understand that this is a useful tool for you and that your readers expect to see this data. The importance of these results is in your evaluation, as the leading experts, of what these performance numbers mean. We encourage you to use your analysis if you choose to publish a preview article and if you find that to be appropriate to your approach to that article. The numbers themselves must be held until the October [redacted] embargo lift. This is in an effort to allow consumers to fully comprehend your analysis without prejudging based on graphs which do not necessarily represent the experiential difference and to help ensure you have sufficient content for the creation of a launch day article.
Now, we appreciate that AMD is introducing this product in an incredibly difficult competitive environment. We're even sympathetic to the idea that the mix of resources included in its new APU may be more optimal for some usage patterns, as our largely positive review of the mobile version of Trinity will attest. We understand why they might wish to see "experiential testing" results and IGP-focused gaming benchmarks in the initial review that grabs the headlines, while burying the CPU-focused benchmarks on a later date. By doing so, they'd be leading with the product's strengths and playing down its biggest weakness.
And it's likely to work, I can tell you from long experience, since the first article about a subject tends to capture the buzz and draw the largest audience. A second article a week later? Not so much. Heck, even if we hold back and publish our full review later (which indeed is our plan), it's not likely to attract as broad a readership as it would have on day one, given the presence of extensive "previews" elsewhere.
Yes, AMD and other firms have done limited "preview" releases in the past, where select publications are allowed to publish a few pictures and perhaps a handful of benchmark numbers ahead of time. There is some slight precedent there.
But none of that changes the fact that this plan is absolutely, bat-guano crazy. It crosses a line that should not be crossed.
Companies like AMD don't get to decide what gets highlighted in reviews and what doesn't. Using the review press's general willingness to agree on one thing—timing—to get additional control may seem clever, but we've thought it over, and no. We'll keep our independence, thanks.
The email goes on to conclude by, apparently, anticipating such a reaction and offering a chance for feedback:
We are aware that this is a unique approach to product launches. We are always looking at ways that we can work with you to help drive additional traffic to your articles and effectively convey the AMD message. We strive to provide the best products in their price points, bringing a great product for a great price. Please feel free to provide feedback on what you find, both with the product and with your experience in the AMD New Product Review Program. We try to ensure that we are providing you what you need and appreciate any feedback you have to offer on how we can do better.
I picked up the phone almost immediately after reading this paragraph and attempted to persuade both Mr. Amos and, later, his boss that this plan was not a good one. I was told that this decision was made not just in PR but at higher levels in the company and that my objections had been widely noted in internal emails. Unfortunately, although fully aware of my objections and of the very important basic principle at stake, AMD decided to go through with its plan.
Shame on them for that.
It's possible you may see desktop Trinity "previews" at other websites today that conform precisely to AMD's dictates. I'm not sure. I hope most folks have decided to refrain from participation in this farce, but I really don't know what will happen. I also hope that any who did participate will reconsider their positions after reading this post and thinking about what they're giving up.
And I hope, most of all, that the broader public understands what's at stake here and insists on a change in policy from AMD.
If this level of control from companies over the content of reviews becomes the norm, we will be forced to change the way we work with the firms whose products we review. We will not compromise our independence. We believe you demand and deserve nothing less.
Update: AMD has issued a statement on this matter.