
As the second turns: the web digests our game testing methods

A funny thing happened over the holidays. We went into the break right after our Radeon vs. GeForce rematch and follow-up articles had caused a bit of a stir. Also, our high-speed video had helped to illustrate the problems we’d identified with smooth animation, particularly on the Radeon HD 7950. All of this activity brought new attention to the frame latency-focused game benchmark methods we proposed in my "Inside the second" article over a year ago and have been refining since.

As we were busy engaging in the holiday rituals of overeating and profound regret, a number of folks across the web were spending their spare time thinking about latency-focused game testing, believe it or not. We’re happy to see folks seriously considering this issue, and as you might expect, we’re learning from their contributions. I’d like to highlight several of them here.

Perhaps the most notable of these contributions comes from Andrew Lauritzen, a Tech Lead at Intel. According to his home page, Andrew works "with game developers and researchers to improve the algorithms, APIs and hardware used for real-time rendering." He also occasionally chides me on Twitter. Andrew wrote up a post at Beyond3D titled "On TechReport’s frame latency measurement and why gamers should care."  The main thrust of his argument is to support our latency-focused testing methods and to explain the need for them in his own words. I think he makes that case well.

Uniquely, though, he also addresses one of the trickier aspects of latency-focused benchmarking: how the graphics pipeline works and how the tool that we’ve been using to measure latencies, Fraps, fits into it.

As we noted here, Fraps simply writes a timestamp at a certain point in the frame production pipeline, multiple stages before that frame is output to the display. Many things, both good and bad, can happen between the hand-off of the frame from the game engine and the final display of the image on the monitor. For this reason, we’ve been skittish about using Fraps-based frame-time measurements with multi-GPU solutions, especially those that claim to include frame metering, as we explained in our GTX 690 review. We’ve proceeded to use Fraps in our single-GPU testing because, although its measurements may not be a perfect reflection of what happens at the display output, we think they are a better, more precise indication of in-game animation smoothness than averaging FPS over time.

Andrew addresses this question in some depth. I won’t reproduce his explanation here, but it is worth reading in its entirety; it covers pipelining, buffering, and CPU/driver-GPU interactions. Interestingly, Andrew believes that in the case of latency spikes, buffered solutions may produce smooth frame delivery to the display. However, even if that’s the case, the timing of the underlying animation is disrupted, which is just as bad:

This sort of "jump ahead, then slow down" jitter is extremely visible to our eyes, and demonstrated well by Scott’s follow-up video using a high speed camera. Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the affect of the game adjusting how far it steps the simulation in time each frame. . . . A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. i.e. it is never okay to see spikey output in frame latency graphs.

Disruptions in the timing of the game simulation, he argues, are precisely what we want to avoid in order to ensure smooth gameplay—and Fraps writes its timestamps at a critical point in the process:

Games measure the throughput of the pipeline via timing the back-pressure on the submission queue. The number they use to update their simulations is effectively what FRAPS measures as well.

In other words, if Fraps captures a latency spike, the game’s simulation engine likely sees the same thing, with the result being disrupted timing and less-than-smooth animation.

There’s more to Andrew’s argument, but his insights about the way game engines interact with the DirectX API, right at the point where Fraps captures its data points, are very welcome. I hope they’ll help persuade folks who might have been unsure about latency-focused testing methods to give them a try. Andrew concludes that "If what we ultimately care about is smooth gameplay, gamers should be demanding frame latency measurements instead of throughput from all benchmarking sites."

With impeccable timing, then, Mark at AlienBabelTech has just published an article that asks the question: "Is Fraps a good tool?" He attempts to answer the question by comparing the frame times recorded by Fraps to those recorded by the tools embedded in several game engines. You can see Mark’s plots of the results for yourself, but the essence of his findings is that the game engine and Fraps output are "so similar as to convey approximately the same information." He also finds that the results capture a sense of the fluidity of the animation. The frame time plot "fits very well in with the experience of watching this benchmark – a small chug at the beginning, then it settles down until the scene changes and lighting comes into play – smoothness alternates with slight jitter until we reach the last scene that settles down nicely."

With the usefulness of Fraps and frame-time measurements established, Mark says his next step will be to test a GeForce GTX 680 and a Radeon HD 7970 against each other, complete with high-speed video comparisons. We look forward to his follow-up article.

Speaking of follow-up, I know many of you are wondering how AMD plans to address the frame latency issues we’ve identified in several newer games. We have been working with AMD, most recently running another quick set of tests right before Christmas with the latest Catalyst 12.11 beta and CAP update, just to ensure the problems we saw weren’t already resolved in a newer driver build. We haven’t heard much back yet, but we noticed in the B3D thread that AMD’s David Baumann says the causes of latency spikes are many—and he offers word of an impending fix for Borderlands 2:

There is no one single thing for, its all over the place – the app, the driver, allocations of memory, CPU thread priorities, etc., etc. I believe some of the latency with BL2 was, in fact, simply due to the size of one of the buffers; a tweak to is has improved it significantly (a CAP is in the works).

This news bolsters our sense that the 7950’s performance issues were due to software optimization shortfalls. We saw spiky frame time plots with BL2 both in our desktop testing and in Cyril’s look at the Radeon HD 8790M, so we’re pleased to see that a fix could be here soon via a simple CAP update.

Meanwhile, if you’d like to try your hand at latency-focused game testing, you may want to know about an open-source tool inspired by our work and created by Lindsay Bigelow.  FRAFS Benchmark Viewer parses and graphs the frame time data output by Fraps. I have to admit, I haven’t tried it myself yet since our own internal tools are comfortingly familiar, but this program may be helpful to those whose Excel-fu is a little weak.
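If you would rather script it yourself, the first step is small. The sketch below is not how FRAFS or our internal tools work; it simply assumes a Fraps-style frame time log with one header row and a second column holding a cumulative timestamp in milliseconds, and it turns those timestamps into per-frame times by differencing. The sample data and the function name are invented for illustration.

```python
import csv
import io


def frame_times_from_csv(text):
    """Convert cumulative timestamps (ms) into per-frame times (ms).

    Assumes one header row and the timestamp in the second column.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    stamps = [float(row[1]) for row in reader if len(row) >= 2]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Made-up sample in the assumed layout.
sample = """Frame, Time (ms)
1, 0.000
2, 16.500
3, 33.000
4, 91.000
5, 107.500
"""

frame_times = frame_times_from_csv(sample)
print(frame_times)
```

Five timestamps yield four frame times, and the 58 ms gap between the third and fourth timestamps is the kind of spike a frame time plot makes obvious.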

Finally, we have a bit of a debate to share with you. James Prior from Rage3D was making some noises on Twitter about a "problem" with our latency-focused testing methods, and he eventually found the time to write me an email with his thoughts. I replied, he replied, and we had a nice discussion. James has kindly agreed to the publication of our exchange, so I thought I’d share it with you. It’s a bit lengthy and incredibly nerdy, so do what you will with it.

Here’s James’s initial email:

Alrighty, had some time to play with it and get some thoughts together. First of all, not knocking what you’re doing – I think it’s a good thing. When I said ‘theres a big flaw’ here’s what I’m thinking.

When I look at inside the second, the data presentation doesn’t lend itself to supporting some of the conclusions. This is not because you’re wrong but because I’m not sure of the connection between the two. Having played around with looking at 99% time, I think that it’s not a meaningful metric in gauging smoothness of itself, it shows uneven render time but not the impact of that on game experience, which was the whole point. It’s another way of doing ‘X number is better than Y number’.

I agree with you that a smoothness metric is needed. I concur with your thoughts about FPS rates not being the be-all end-all, and 60fps vsync isn’t the holy grail. The problem is the perception of smoothness, and quantifying that. If you have a 25% framerate variation at 45fps you’re going to notice it more than a 25% framerate variation at 90fps. 99% time shows when you have a long time away from the average frame rate but not that the workload changes, so is naturally very dependent on the benchmark data, time period and settings.

What I would (and am, but it took me 2 weeks to write this email, I’m so time limited) aim for is to find a way to identify a standard deviation and look for ways to show that. So when you get a line of 20-22ms frames interrupted by a 2x longer frame time and possibly a few half as long frame times (the 22, 22, 58, 12, 12, 22, 22ms pattern) you can identify it, and perhaps count the number of times it happens inside the dataset.

Next up would be ‘why’ and that can start with game settings – changing MSAA, AO, resolution, looking for game engine bottlenecks and then looking at drivers and CPU config. People have reported stuttering frame rates from different mice, having HT enabled, having the NV AO code running on the AMD card (or vice versa).

In summary – I think the presentation of the data doesn’t show the problem at the extent it’s an issue for gamers. I think it’s too simplistic to say ‘more 99% time on card a, it’s no good’. But that’s an editorial decision for you, not me.

The videos of skyrim were interesting but of no value to me, it’s a great way to show people how to idenitify the problem but unless you frame sync the camera to the display and can find a way to reduce the losses of encoding to show it, it’s not scientific. Great idea though, help people understand what you’re describing.

Thanks for being willing to listen, and have a Merry Christmas 🙂
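As an aside, the kind of analysis James proposes above is easy to prototype. The following is only a rough sketch of one possible reading of his idea, not anything he sent: it reports the standard deviation of a run and counts frames that take at least twice the run's median frame time. The 2x factor and the function name are assumptions, and the sample values are the pattern from his email.

```python
from statistics import median, pstdev


def count_spikes(frame_times_ms, factor=2.0):
    """Count frames taking at least `factor` times the median frame time."""
    typical = median(frame_times_ms)
    return sum(1 for t in frame_times_ms if t >= factor * typical)


# The 22, 22, 58, 12, 12, 22, 22 ms pattern from the email.
pattern = [22.0, 22.0, 58.0, 12.0, 12.0, 22.0, 22.0]
steady = [22.0] * 7

print("spikes in pattern:", count_spikes(pattern))
print("spikes in steady: ", count_spikes(steady))
print("std dev of pattern:", round(pstdev(pattern), 2))
```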

My response follows:

Hey, thanks for finally taking time to write. Glad to see you’ve considered these things somewhat.

I have several thoughts in response to what you’ve written, but the first and most important one is simply to note that you’ve agreed with the basic premise that FPS averages are problematic. Once we reach that point and are talking instead about data presentation and such, we have agreed fundamentally and are simply squabbling over details. And I’m happy to give a lot of ground on details in order to find the best means of analyzing and presenting the data to the reader in a useful format.

With that said, it seems to me you’ve concentrated on a single part of our data presentation, the 99th percentile frame time, and are arguing that the 99th percentile frame time doesn’t adequately communicate the "smoothness" of in-game animation.

I’d say, if you look at our work over the last year in total, you’d find that we’re not really asking the 99th percentile frame time to serve that role exclusively or even primarily.

Before we get to why, though, let’s establish another fundamental. That fundamental reality is that animation involves flipping through a series of frames in sequence (with timing that’s complicated somewhat by its presentation on a display with a fixed refresh rate.) The single biggest threat to smooth animation in that context is delays or high-latency frames. When you wait too long for the next flip, the illusion of motion is threatened.

I’m much more concerned with high-latency frames than I am with variance from a mean, especially if that variance is on the low side of the mean. Although a series of, say, 33 ms frames might be the essence of "smoothness," I don’t consider variations that dip down to 8 ms from within that stream to be especially problematic. As long as the next update comes quickly, the illusion of motion will persist and be relatively unharmed. (There are complicated timing issues here involving the position of underlying geometry at render time and display refresh intervals that pull in different directions, but as long as the chunks of time involved are small enough, I don’t think they get much chance to matter.) Variations *above* the mean, especially big ones, are the real problem.

At its root, then, real-time graphics performance is a latency-sensitive problem. Our attempts to quantify in-game smoothness take that belief as fundamental.

Given that, we’ve borrowed the 99th percentile latency metric from the server world, where things like database transaction latencies are measured in such terms. As we’ve constantly noted, the 99th percentile is just one point on a curve. As long as we’ve collected enough data, though, it can serve as a reliable point of comparison between systems that are serving latency-sensitive data. It’s a single sample point from a large data set that offers a quick summary of relative performance.

With that in mind, we’ve proposed the 99th percentile frame time as a potential replacement for the (mostly pointless) traditional FPS average. The 99th percentile frame time has also functioned for us as a companion to the FPS average, a sort of canary in the coal mine. When the two metrics agree, generally that means that frame rates are both good *and* consistent. When they disagree, there’s usually a problem with consistent frame delivery.

So the 99th percentile does some summary work for us that we find useful.
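Here is a small illustration of the canary effect, using invented numbers rather than real benchmark data. The percentile below uses the simple nearest-rank definition, which is one common choice and not necessarily the exact calculation behind our published figures.

```python
import math


def average_fps(frame_times_ms):
    """Traditional FPS average: frames rendered divided by total seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct=99.0):
    """Nearest-rank percentile: pct% of frames finish in this time or less."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


# Two hypothetical 100-frame runs with the same total render time.
steady = [20.0] * 100
spiky = [15.0] * 95 + [115.0] * 5

for name, run in (("steady", steady), ("spiky", spiky)):
    print(name, average_fps(run), "FPS,",
          percentile_frame_time(run), "ms at the 99th percentile")
```

Both runs average 50 FPS, yet the 99th percentile frame time is 20 ms for one and 115 ms for the other. The FPS average cannot tell them apart; the percentile can.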

But it is a summary, and it rules out the last 1% of slow frames, so I agree that it’s not terribly helpful as a presentation of animation smoothness. That’s why our data presentation includes:

1) a raw plot of frame times from a single benchmark run,
2) the full latency curve from 50-100% of frames rendered,
3) the "time spent beyond 50 ms" metric, and
4) sometimes zoomed-in chunks of the raw frame time plots.

*Those* tools, not the 99th percentile summary, attempt to convey more useful info about smoothness.
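For reference, the latency curve in item 2 is nothing exotic. A minimal sketch, assuming the same nearest-rank percentile convention and using invented frame times, is to compute the percentile frame time at every whole percentage from 50 to 100:

```python
import math


def latency_curve(frame_times_ms, start_pct=50, end_pct=100):
    """Return (percentile, frame time in ms) pairs from start_pct to end_pct."""
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    curve = []
    for pct in range(start_pct, end_pct + 1):
        rank = max(1, math.ceil(pct * n / 100))
        curve.append((pct, ordered[rank - 1]))
    return curve


# Hypothetical run: mostly 16 ms frames, some 30 ms, a couple of 70 ms.
frame_times = [16.0] * 90 + [30.0] * 8 + [70.0] * 2
curve = latency_curve(frame_times)

for pct, ms in curve[-12:]:
    print(f"{pct:3d}%  {ms:5.1f} ms")
```

The curve stays flat at 16 ms through the 90th percentile, steps up to 30 ms, and only reaches 70 ms at the 99th percentile and beyond. The shape of that tail is what the plot is meant to show.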

My favorite among them as a pure metric of smoothness is "time spent beyond 50 ms."

50 milliseconds is our threshold because at a steady state it equates to 20 FPS, which is pretty slow animation, where the illusion of motion is starting to be compromised. (The slowest widespread visual systems we have, in traditional cinema, run at 24 FPS.) Also, if you wait more than 50 ms for the next frame on a 60Hz display with vsync, you’re waiting through *four* display refresh cycles. Bottom line: frame times over 50 ms are problematic. (We could argue over the exact threshold, but it has to be somewhere in this neighborhood, I think.)
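The arithmetic behind those two numbers can be checked in a few lines. This is a simplified model that assumes the frame starts rendering exactly at a refresh boundary and that a finished frame is shown at the next refresh; real pipelines are messier.

```python
import math


def steady_state_fps(frame_time_ms):
    """Frame rate if every frame took this long."""
    return 1000.0 / frame_time_ms


def refreshes_waited(frame_time_ms, refresh_hz=60.0):
    """Refresh intervals that elapse before the frame can be shown with vsync."""
    return math.ceil(frame_time_ms * refresh_hz / 1000.0)


print(steady_state_fps(50.0))   # frame rate at a constant 50 ms per frame
print(refreshes_waited(50.0))   # exactly 50 ms
print(refreshes_waited(51.0))   # just over 50 ms
```

A 50 ms frame lands exactly on the third refresh of a 60Hz display, so anything slower has to wait for the fourth.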

At first, to quantify interruptions in smooth animation, we tried just counting the number of frames that take over 50 ms to render. The trouble with that is that a 51 ms frame counts the same as a 108 ms frame, and faster solutions can sometimes end up producing *more* frames over 50 ms than slower ones.

To avoid those problems, we later decided to account for how far the frame times are over our threshold. So what we do is add up all of the time spent rendering beyond our threshold. For instance, a 51 ms frame adds 1 ms to the count, while an 80 ms frame adds 30 ms to our count. The more total time spent beyond the threshold, the more the smoothness of the animation has been compromised.
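The difference between the two approaches is easy to show with a sketch. The frame times and the "card" labels below are invented for illustration:

```python
def frames_beyond(frame_times_ms, threshold_ms=50.0):
    """Count frames that took longer than the threshold."""
    return sum(1 for t in frame_times_ms if t > threshold_ms)


def time_beyond(frame_times_ms, threshold_ms=50.0):
    """Total milliseconds spent past the threshold, summed over slow frames."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# Hypothetical runs: card A barely crosses the line twice,
# card B has one long hitch.
card_a = [20.0, 51.0, 20.0, 52.0, 20.0]
card_b = [30.0, 30.0, 108.0, 30.0, 30.0]

for name, run in (("A", card_a), ("B", card_b)):
    print(name, frames_beyond(run), "frames over 50 ms,",
          time_beyond(run), "ms beyond 50 ms")
```

By frame count, card A looks worse with two slow frames to card B's one. By time spent beyond 50 ms, card B's single 108 ms hitch (58 ms over) clearly outweighs card A's total of 3 ms, which matches what you would feel while playing.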

It’s not perfect, but I think that’s a pretty darned good way to account for interruptions in smoothness. Of course, the results from these "outlier" high-latency frames can vary from run to run, so we take the "time beyond X" for each of the five test runs we do for each card and report the median result.
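To show why the median is a sensible choice here, consider five hypothetical "time beyond 50 ms" results from five runs of the same card, where one run happened to catch a big hitch. The values are made up.

```python
from statistics import mean, median

# Hypothetical "time beyond 50 ms" totals (ms) for five runs of one card.
time_beyond_per_run = [12.0, 0.0, 47.0, 9.0, 15.0]

reported = median(time_beyond_per_run)
print("median:", reported)
print("mean:  ", mean(time_beyond_per_run))
```

The one unlucky run pulls the mean well above what most runs showed, while the median stays at the middle result.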

In short, I don’t disagree entirely with your notion that the 99th percentile frame time doesn’t tell you everything you might need to know. That’s why our data presentation is much more robust than just a single number, and why we’ve devised a different metric that attempts to convey "smoothness"–or the lack of it.

I’d be happy to hear your thoughts on alternative means of analyzing and presenting frame time data. Once we agree that FPS averages hide important info about slowdowns, we’re all in the same boat, trying to figure out what comes next. Presenting latency-sensitive metrics is a tough thing to do well for a broad audience that is accustomed to much simpler metrics, and we’re open to trying new things that might better convey a sense of the realities involved.


And here is James’s reply:

First up, yes I absolutely agree that FPS averages aren’t the complete picture. Your cogent and comprehensive response details the thinking behind your methodology very nicely. You are correct, I did choose to highlight 99% time as my first point, and your clarification regarding the additional data you review and present is well taken.

I agree with you about the 50ms/20fps ‘line in the sand’, for watching animated pictures. My personal threshold for smoothness in movies is about 17-18, my wife’s is 23.8. For gaming however, I find around 35fps / 29ms per frame is where I get pissed off and call it an unplayable slideshow unless it is an RTS – I was prepared to hate C&C locked at 30fps but found it quite pleasant. This was based on not only animation smoothness but smoothness of response to input. Human perception is a funny thing, it changes with familiarity and temperament.

So on that basis I concur that dipping from 22ms to 50ms is perceptible in ‘palm of the hand’ and 99% plus 50ms statistics address identify that nicely. Where I disagree with you is the moving from 22ms to say 11ms isn’t noticeable, especially if it is an experientially significant amount of time for the latency consumer – the player. Running along at 22ms and switching to 11ms probably won’t be perceived badly, but the regression back to 22ms might be, especially if it happens frequently. I experienced this first hand when I benched Crossfire 7870’s in Eyefinity, with VECC added SSAA. The fraps average was high, in the 60’s, the min was around 38. The problem was the feel, it looked smooth, but the response was input was terrible. The perceived average FPS was closer to the minimum and wasn’t smooth and so despite being capable of stutter free animation, the playability was ruined due to frame rate variation from 38fps to ~90fps. The problem ended up being memory bandwidth, as increasing clocks improved the feel and reduced the variation; this was reinforced by moving from SSAA to AAA and standard MSAA; the less intensive modes were silky smooth, AAA being in the same general performance range.

This can be observed on the raw frame rate graph, a saw tooth pattern will be seen if the plot resolution is right, but when examining a plot covering perhaps minutes of data showing tens of frame render times per second then you need a systemic approach for consistency and time cost of the analyst.

The obvious answer is to restrict your input data, find a benchmark session that doesn’t do that but then you end up with a question of usefulness to your latency processor again – the player. Does the section of testing represent the game fairly? Is the provided data enough for someone to know that the card will cope with the worst case scenarios of the game, is there enough data for each category of consumer – casual player, IQ/feature enthusiast, game enthusiast, performance enthusiast, competitive gamer, system builder, family advisor, mom upgrading little Jonny’s gateway – to understand the experience?

Servers talk to servers, games talk to people. We can base analysis methodology on what comes from the server world, and then move on to finding a way to consistently quantify the experience so that the different experience levels show through.

I’ll confess I still owe him a response to this message. We seem to have ended up agreeing on the most important matters at hand, though, and the issues he raises in his reply are a bit of a departure from our initial exchange. It seems to me James is thinking in the right terms, and I look forward to seeing how he implements some of these ideas in his own game testing in the future.

You can follow me on Twitter for more nerdy exchanges.

Responses to “As the second turns: the web digests our game testing methods”

  1. hardocp uses the fps values produced by fraps, not frame times. so you were wrong. just take it and move on, this is a sign of maturity and confidence.

    Dmitry Sofronov did it exactly right, and might be the original source some of us took inspiration from.

  2. No little teen, I’m not wrong. And as a matter of fact, this methodology is used before in this review from 2006

    pages 1 to 3, I'm not going to tell you which page, see if you can be a good boy and figure it out yourself...

  3. that is a common response… possibly because hardware reviewers are university drop-outs who failed high school math. Just a guess.

    We all need to be honest with each other and admit: the work Scott is doing is difficult and not obvious to normal people who are not scientists and statisticians who collect and process data for a living.

    We also need to admit that the hardware reviewing profession has not tended to attract people with those skills. So it will take some online university courses to catch the gearheads up on what is happening and what can be done with this (these) data.

  4. you were completely wrong, but that is okay. Just admit it and move on! You’ve learned something

  5. So I did a little more research into LCD technology and it would appear that holding a single frame for too long can be a problem for the most common types of LCDs. However, I was unable to find out exactly how long that is so I don’t know whether it would prevent this from working or not.

    Fortunately however, there are other types of LCDs for which this isn’t a problem such as Bi-Stable LCDs or Sharp’s new IGZO technology. I would also think that this wouldn’t be a problem at all for LED technologies. So hopefully we might see this in the near future.

  6. i’ve found this


  7. Potentially yes, although as Andrew Lauritzen explains:

    “A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. i.e. it is never okay to see spikey output in frame latency graphs.”

    Also any spikes regarding input to the command buffer implies spikes for the frames that are being output. UNLESS the frames inbetween are co-incidentally spiking by the inverse amount, this is unlikely.

    Not only do the frames have to be output in a consistent manner, but the game engine and drivers need to be able to predict frame times in advance.

  8. “The driver guys did flag this up to me last night; evidently FRAPS starts the capture at the point the application calls Present, not the time that the GPU renders it. For applications that are sufficiently GPU bound, given that DX can allow up to 3 frames of command buffer to be gathered, the higher latency plots captured from FRAPS don't necessarily translate into an uneven render time and display to the end user. From our analysis this appears to be the case Sleeping Dogs where you see a high latency frame quickly followed by a low latency one; this is due to how we batch some things in the driver and FRAPS records that, but as the app (in these settings) is GPU bound this isn't representative of the render output as we still have sufficient GPU workload in the queue.”

  9. Hold on, we don’t know how much the individual frames are fluctuating between fraps’ readings!

    Yes it is just a matter of enabling it, but HardOCP are simply not showing the same data that techreport are. They COULD, but they AREN’T. So saying HardOCP have been doing this kind of testing for years is just plain wrong.

  10. For the sake of the argument, let’s say those aren’t the individual frame times and instead the fps recorded each second or whatever. Now that game is Hitman so below 60 fps doesn’t mean much, so let’s take BF3 in multiplayer instead. I can still clearly see that the 670gtx is struggling and is below the 7970 which seldom get below 40fps. And that’s all there is to this “new testing”. It’s nothing but a way too inefficient and complicated way of communicating what the graph shows you. Now you might say “hold on, we don’t know how much the individual frames are fluctuating between fraps’ readings!”. And yes that’s true but like I said in the other post, it’s just a matter of enabling it in fraps.

  11. It’s the frame rate over time for each single frame so yes. Of course if hardocp have been lazy and not selected it to write frametimes then you are right, but that’s another matter. The point still stands..a visual image is way more telling than this way.

  12. Hardocp have been doing this for ages with their graphs and probably other sites too.
    Just showing the minimum, average and maximum fps is like said here, just half the story because we want smooth gameplay. So by looking at the pictures you can clearly see how much fps drop we have and how often. And I actually prefer that way to just saying what the ms threshold was for producing 99% of the frames, because you have no idea how low your fps went. All you know is that we had drops.

  13. Sleeping Dogs was part of the test suite and two other AMD Gaming Evolved titles.
    Surely it is fairest to benchmark a mix of titles like they have done, not to mention games that are likely to be played during the holiday season.

  14. I read one forum post by a site admin (overclockers net I believe) that basically said gamers were stupid and latency testing was too hard to understand and would never be implemented on his site.

  15. Compared to the 7970, since inception.
    “When did the 7950 become a "lesser card"?”

  16. This thing so deserves more thumbs up.

    Some AMD fanboys were incredibly obnoxious and yet *they* will reap the rewards of TR's hard work. Will any of them write even a simple apology? Maybe a very brief *"thank you, TR!"*? What about the fact that AMD acknowledged these issues long ago - again, thanks to TR - and yet some of them screamed like mad banshees *after* that? And, before I forget: thank you, TR.

  17. I can’t tell you how excited and grateful I am that you are working on this, and developing test procedures that all review sites can adapt. I’ve been complaining about the FPS tests as a measurement of video card quality for years, and it is sooo good to see someone in a position to do something about it actually doing something about it!

    Humans are very adaptable creatures, and we get used to lower quality as a standard; we think “it’s just fine the way it is”. Well, yeah, it’s fine the way it is – but that doesn’t mean we can’t or shouldn’t strive for something better. People seeing 120Hz monitors suddenly understand what many of us have been talking about, for example. We can do better, and we can tell the industry what we want instead of letting it define what we get.

    Having good tools to quantify what we want can only help the process of getting better video cards, monitors, and games. Thanks!

  18. I don’t believe dropped frames are a part of the normal operation of a video card. They happen in encoding when frames can’t be output fast enough (such as if you’re maxing out a connection while streaming), but you never have such a limitation while playing a game on a normal computer. Monitors (if that’s what you’re referring to by screen) never request frames either, they simply output what is being sent to them.

    Unless you’re talking about how fast the game loop wants to draw a frame and then comparing it to how fast the render loop actually draws. But you’d actually need to have access to the code in a game as I’ve never seen a game output the game loop and the render loop as two independent measures.

  19. None of what I mentioned in the post above talked about FPS or latency. I was talking about motion estimation using an encoder.

  20. “Stop mentioning the 7970 people, even the techreport has reviews of them with no problems, it's mainly the lesser cards like the 7950 with latency/stutter issues.” When did the 7950 become a "lesser card"?

  21. Time to finish:
    MP3 ~ 10hrs
    Sleeping Dogs ~12 hrs
    Borderlands 2 ~30 hrs
    Skyrim ~200 hrs

    Metacritic scores:
    Skyrim: 94
    Borderlands 2: 89
    Max Payne 3: 87
    Sleeping Dogs: 80

    What conspiracy?

  22. Congrats Scott! I’m sure that having your work acknowledged and praised is always a good thing, even more so if it becomes a standard across a multitude of hardware review sites 🙂

  23. Well, what I was thinking was capturing the number of UNIQUE frames drawn to a screen. I was thinking of just connecting the DVI/HDMI connector to some device that reads the bit pattern, and does a compare to the previous pattern. If it matches, then no new frame was sent for that screen refresh. In this way, you could capture how latency affects the game by being able to see how many frames are actually being dropped.

    For instance, look at the following example (yes, I know things are MUCH more complicated then this):

    Frame 0 is drawn to screen
    Screen requests new Frame (16ms total)
    Screen requests new Frame (32ms total)
    Frame 1 is created (6ms) (38ms total)
    Frame 2 is created (4ms) (42ms total)
    Frame 3 is created (4 ms) (46ms total)
    Screen requests new Frame (48 ms total)
    Frame 3 is drawn to screen

    Is obviously bad; you get one frame being drawn over three screen refreshes, with two frames dropped in between. But is this also a bad thing:

    Frame 0 is drawn to screen
    Frame 1 created (6ms)
    Frame 2 created (4 ms) (10ms total)
    Frame 3 Created (4 ms) (14ms total)
    Screen requests new Frame (16ms total)
    Frame 3 is drawn to screen (most recently created frame)

    Still have two frames being dropped, but over the course of a single screen refresh. How jarring is this if it happens consistently? (Note FPS in this case would be 180 FPS). Or if sometimes only two frames are created, and you jump between two and three frames being skipped (150 FPS)?

  24. Remember to, cable specifications. HDMI 1.3 can’t handle anything more then 60Hz @ 1080p. And the circuity to handle arbitrary refresh rates is significant.

  25. I think I figured it out. Use a splitter to split the HDMI or DVI signal. Then capture the signal with a second computer. Have that computer either encode it in lossless or at very high quality settings (10). Use a very light encoding preset for the processor.

    Then take that encode and graph the bit rate over time. That is essentially fluidity quantified. All else being equal a higher bitrate is better, when overlaid on the FPS graph. A computer that can deliver more action will result in a higher bit rate in low FPS situations.

    An obvious caveat to this is you have no way of actually knowing if that action is gibberish or not. But there is no way of figuring that out without having the original computer reference frames against the frames present on the second computer and then looking for error or noise.

    I’m still trying to figure this out entirely, but this is most definitely more than simply displaying FPS, as the computer that is capturing has absolutely no idea what is happening on the original computer. Encoders function in a similar fashion to human cognition, as their sole purpose is making content that humans can perceive in motion scenes at low bit rates. So they’re highly dependent upon maintaining quality in action scenes. What we’re after, though, is how the codec deals with motion and perceives it in a quantifiable state (that being the eventual bit rate if it’s variable, while encoding in real time).

    Wouldn’t that just be inverted FPS? No. It’s entirely possible to have high or low FPS and have the bit rate fluctuate based on the action on the screen. The bitrate would be dependent exclusively on the amount of action on the screen. A good example of this would be setting the host with Vsync on. It would deliver a steady 60fps, but the bit rate would not be a steady rate with all else being equal.

    Another example of this is low-action scenes, such as in an RTS, where a high or low FPS may be present but would not influence the bit rate, as the bit rate would depend on the amount of action on the screen. RTSes have a low level of action compared to an FPS. Bit rate is based off human cognition and the ability to maintain a fluid picture at certain settings.

    Possibly a better way of doing this, if you had access to an encoder’s code, is pulling the reference bit rate out without it needing to encode. Basically what sort of bit rate it would need to maintain a picture at a certain quality level.

    Ideally you’d want to overlay the bit rate on top of the FPS at the highest resolution possible (millisecond level). There will most definitely be a correlation between bit rate and FPS. All else being equal a lower bit rate between two different test computers would quantify qualitative data as it would mean there is less action happening and less to encode. If the two scenarios are exactly the same, the better performing computer should have a higher bit rate. High or low fps wouldn’t be a determining factor, just how the encoder perceives the action and how the host computers render the action.

    The encoding computer would serve as a human eye.


    Y-splitter off host
    Second computer encodes
    Resolution would match the host
    FPS would be set to something higher (like 120fps)
    Encoder would be set to the lightest possible encoding to avoid the encoding computer being a bottleneck
    There wouldn’t be a buffer as we want to see the spikes and valleys
    Quality would be maxed or a lossless encoder would be used
    This would be done in real time to maintain comparability between the host and the encoder (as opposed to re-encoding or transcoding)
    Bit rate would be used to determine the level of action present on the host computer.
    Bit rate would be variable.

  26. Stop mentioning the 7970 people, even the techreport has reviews of them with no problems, it’s mainly the lesser cards like the 7950 with latency/stutter issues.

  27. I own a 7970 running 1200mhz and have experienced nothing but buttery smooth gameplay at 1080p with 12.11 drivers. If the card does stutter in certain instances I have yet to notice it. My two 6870’s would stutter all the time, which is why I went to the 7970; it’s the best decision I have ever made. The average gamer would need an electron microscope to notice the micro stutter that they are talking about on a single card. I know I am going to get flamed for this post, have at it.

  28. This year will be an interesting one for video card reviews, since the whole internet is likely to be having a different conversation about performance. This site has been doing this for about a year and a half. I wonder if we won’t be seeing significant improvements in this area from NVIDIA, AMD, and Intel this year. If not, this will likely force them to give their attention to this matter and we’ll see something over the next few years. At any rate, at least we’ll see some more honest reviews this year if nothing else, thanks to TR!

  29. I noticed that PC Perspective are looking at expanding on the current method used by TR so well done to Scott for getting the ball rolling.


  30. Of course money is an issue, but gamers have shown themselves willing to pay quite a lot for a video card so I would think that if it added less than $10 to the monitor cost there would be a ton of gamers who would want it. Furthermore, if my understanding of LCD technology is correct then there should be little problem with simply holding a frame longer. Even a slower refresh rate would be much better if it made the animation smoother. After all, few people have a problem with the 24Hz of a film movie.

    Just to make something clear, this technology would effectively do away with monitor refresh rates entirely. The screen would simply hold the current frame until the card told it to display a new one. This way slightly slow calculations don’t result in an entire refresh time of delay and the display time should more closely match the simulation time. I apologize that my original post wasn’t more clear on that point.

  31. What about simply ranking the largest change in time difference between one pair of frames and the next over the duration of the benchmark or, alternatively or in addition, the largest time between consecutive pairs of frames between increments on a continuous time scale (i.e., per second or tenth of a second)? In a simplistic way, this would roughly capture maximum variance between frames and be easy to graph or chart.

    Not only would this give a very simple way of comparing relative “smoothness”, but its use in conjunction with the slow-motion camera should, after a bit of trial and error help determine at which point the lag duration between frames generally contributes to noticeable stutter.

    To those worrying about the monitor response time/refresh rate contribution to this issue, it’s practically moot, as long as the same monitor is used for each test.
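
    One possible reading of the two rankings suggested above, sketched in a few lines. The helper names (`largest_jump`, `worst_frame_per_interval`) and the sample numbers are invented for illustration, and the input is assumed to be a plain list of frame times in milliseconds with at least two entries.

```python
def largest_jump(frame_times_ms):
    """Largest absolute change between one frame time and the next."""
    return max(abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:]))

def worst_frame_per_interval(frame_times_ms, interval_ms=1000.0):
    """Longest frame time in each interval of the run, bucketed by frame start time."""
    worst = {}
    elapsed = 0.0
    for frame_time in frame_times_ms:
        bucket = int(elapsed // interval_ms)
        worst[bucket] = max(worst.get(bucket, 0.0), frame_time)
        elapsed += frame_time
    # Intervals in which no frame started are simply absent from the result.
    return [worst[b] for b in sorted(worst)]

# Made-up frame times in milliseconds, with one obvious spike.
sample = [16, 17, 50, 16, 16]
print(largest_jump(sample))
print(worst_frame_per_interval(sample, interval_ms=50.0))
```

    Either list is easy to chart, and the per-interval version shows when in the run the worst frames landed.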

  32. Thanks for your reaction Damage.

    “Now, in terms of perception, a quicker display subsystem will likely let your eye pick out smaller differences more readily.”

    I don't think so, I rather think that as the framerate goes up, the differences would (in absolute terms) have to be bigger to be perceived. I mean, I think it's far more obvious on a 120hz display if a game that usually does a consistent 60fps suddenly drops to 30fps, than it is when a game usually does 120fps and drops to 60fps.

    And it heavily depends on the game. For FPS games where looking around with the mouse is essential, large swings between 120fps and 60fps would probably be just as noticeable and irritating as drops from 60fps to 30fps, but for many less demanding games the effect would be much less pronounced. I mean, when playing something like BF3, CoD or Quake Live, I'm annoyed when the framerate drops to ~60fps as it feels sluggish compared to 100+fps. But for every BF3, there's a game where I'd be hard pressed to notice anything higher than 60fps.

    Anyway, I don't think it's an issue worthy of an in-depth article. The benefits of a higher refresh rate are pretty obvious for those who care, and simply not important for those who don't. The only thing I'd personally make a point of in a short article (and video) is tearing. While playing Planetside 2, I played at 60hz for a while instead of the usual 120hz, which resulted in rather painful and obvious tearing at roughly 70fps.

  33. Congrats Scott and TR, other websites are taking notice and adding to the frame “movement”

    PCper has an interesting take on frame latency testing

    Looking forward to the collaborative results between the different testing methods. It’s not often the tech industry undergoes this type of testing shift. Cheers!

  34. It’s ok to be angry if you’re frustrated because you don’t understand something you would like to. It’s not ok to call “it” lies because it doesn’t make sense to you. Give it time and I’m certain a proper way to describe it in layman’s terms will be made available.

  35. Lies, damned lies, and statistics!

    I still tend to go by FPS when selecting a vid card. I assume the latency problems are mainly driver issues, and common to the cards in that family. Typically, I deal with jerky animation by turning off vsync or fiddling with antialiasing settings, rather than swapping out the card. I sometimes wonder why reviews use such high AA settings, when it’s the first thing I turn off when I want smoother framerates. Although, I suppose years of software mode Quake and its non-AA’d successors have made me less bothered by jaggies than the average person.

    Frame times still do not mean a whole lot to me (how fast is a ms?), but it is my favorite graph in the reviews. The lower, thinner plot is the better card, of course. I would just prefer the y-axis be converted to FPS. Kind of an “instantaneous FPS”. I’ve never been a fan of “lower is better”.

    When I see graphs of percentiles, they don’t mean anything to me unless I read the corresponding 2 paragraphs of text, where they are translated into FPS terms. Statistics are more objective, but FPS are what everyone understands.
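
    For readers who would rather think in FPS, the conversion from a single frame time is just a reciprocal. A minimal sketch, with `instantaneous_fps` as a made-up helper name:

```python
def instantaneous_fps(frame_time_ms):
    """Convert one frame time in milliseconds to the equivalent frames per second."""
    if frame_time_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_time_ms

# Frame times that come up in this discussion, in milliseconds.
for ms in (8.3, 16.7, 33.3, 50.0):
    print(f"{ms:5.1f} ms -> {instantaneous_fps(ms):6.1f} FPS")
```

    A 50 ms frame works out to 20 FPS and a 16.7 ms frame to roughly 60 FPS, so a frame-time plot and an “instantaneous FPS” plot carry the same information on different axes.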

  36. What you guys are doing with the percentiles is just group up data in a simple statistic manner. That is not a metric for “how much time difference there is for the frames”.

    First, you could mix up things and not use strict statistical methods (create your own metric) with the percentiles. For instance; in a graph where you actually see 60FPS avg, that means ~16ms to draw the frame. You could go, frame by frame, measuring the distance/difference to that average and the bigger the number, the more awkward the frame delivery is. Smooth out the curve and then calculate the raw data against it, point by point. That will spit out a number that can be directly translated as “this thing doesn’t deliver frames in a smooth way to the screen”; the bigger the number, the worse it should be.

    Had to post this, because it’s something I’ve been wanting to say since the very beginning of the “inside the second” measurements 😛

    First time posting here, by the way, haha.
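
    The “smooth the curve, then compare the raw data against it” idea from this comment could look something like the sketch below. The centered moving average and the mean absolute deviation are choices made here for illustration, not something the comment specifies, and `moving_average` and `roughness` are made-up names.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def roughness(frame_times_ms, window=5):
    """Mean absolute distance between each frame time and the smoothed curve."""
    smoothed = moving_average(frame_times_ms, window)
    total = sum(abs(raw - smooth) for raw, smooth in zip(frame_times_ms, smoothed))
    return total / len(frame_times_ms)

# Two made-up runs with the same 16 ms average frame time.
steady = [16] * 10
alternating = [8, 24] * 5

print(roughness(steady))
print(roughness(alternating))
```

    Both runs would report the same average FPS, but only the alternating one scores above zero, which is the kind of single number the comment is asking for.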


  37. Guys, the short answer is that although they are interrelated in the total operation of the system, frame production rates and the display refresh loop are generally treated as independent variables. That is just how things have worked, with buffering and vsync helping to smooth out the differences.

    Since we test with vsync disabled, there would literally be no difference in our gaming test results captured with Fraps if our display were connected to a 120Hz monitor. So there’s very little for us to address in terms of “adjusting” for a 120Hz display.

    Now, in terms of perception, a quicker display subsystem will likely let your eye pick out smaller differences more readily. The ideal for a 120Hz monitor is constant frame delivery at 8.3 ms, so we could choose to be pickier about frame times if we were targeting faster displays. There’s no question in my mind that some animations at 120Hz are perceptibly smoother than at 60Hz. I’ve seen it.

    However, when you get into the difference between generally consistent ~16 ms frame delivery and ~8 ms frame delivery inside of a modern 3D game… well, heh, I remember running Quake at 90Hz on my CRT and being amazed. But I think it is very much a question of “great vs. amazing” and not even “good vs. great.” Maybe we can target an article at this issue at some point, but honestly, it would have to be more about human perception and differences in game tuning than anything else.

  38. A CAP profile just adds or alters an application profile for things that we can “tweak” in the driver without adding new code for. For the most part they pertain to Crossfire profiles, but there are cases where we will release an update that will affect single GPU profiles as well. The latest CAP updates are all rolled in to the next driver release, but CAPs can be released between drivers much quicker and outside of a full-scale QA / release cycle.

  39. Mark at least a third as curious. I feel like 120 Hz might aid cards with periodic high latency spikes, since several fast frames tend to follow slow ones, making the recovery a bit smoother. It would also cure some visual errors since the frame will be displayed closer to instantly, as far as the human eye is concerned.

    At some point, doubling the refresh rate will be unnoticeable to the human eye because it will be too fast, but I would love to just see 60, 120, 240, and 480 Hz comparisons to figure out how the returns diminish.

  40. Nice article Scott. It was well written and informative.

    There is no doubt that the newly coined term “Excel – fu” will be a part of the legacy of your work at TR.

  41. Something else that is important to note is that trying to quantify camera data in real life would result in confounded results due to the monitor. The only way to rectify this is with a completely lag free/instant monitor. A CRT would most definitely fit this role or maybe a plasma. CRTs unfortunately aren’t that high res anymore (except for some oddities, someone mentioned one that IBM made at one time) and have a refresh rate around LCDs (although some hit 1600×[email protected]). Plasmas are almost exclusively stuck at 1080p and they’re big, although that shouldn’t matter as long as they deliver accurate testing data.

  42. That’s not as easy as it might sound. FRAPS seems to do okay, but there’s still a concern that it might affect the test results. You can’t measure anything without somehow altering that which is tested, and adding something between the video card and monitor could easily affect how a game feels.

    High-speed video, as long as it’s fast enough (240 FPS would work), will be able to capture those individual frames. The real issue is not in synchronization, but how exactly to analyze the data, which is why FRAPS (paired with Excel-fu) is the tool of choice. High-speed video is, however, the only way to capture a real frame without adding a testing block.

    But you can improve analysis without adding a testing block. For starters, lilbuddhaman pointed out that you can take the differential between frames (previous frame time – current frame time) and calculate a “choppiness” with a few Excel clicks. But since there is some variance in the refresh frequencies of monitors (even like monitors), you can still calculate a “displayed” or “real” frame by virtually taking a frame every 16.7 ms. Yes, this isn’t quite the same, but you would be able to penalize cards for having too many “non-displayed” frames AND penalize every time a frame is shown more than once.

    My main goal would be to improve the analysis without requiring additional hardware, software, and data captures.
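
    A rough sketch of the “virtual refresh” idea from this comment, working only from Fraps-style frame times. It assumes each refresh shows the newest frame that has finished by that instant; the function name `virtual_display` is invented here.

```python
import math
from itertools import accumulate

def virtual_display(frame_times_ms, refresh_ms=1000.0 / 60.0):
    """Sample a frame stream at a fixed virtual refresh interval.

    Returns (never_shown, repeated_refreshes).
    """
    finish_times = list(accumulate(frame_times_ms))
    refresh_count = math.ceil(finish_times[-1] / refresh_ms)
    shown = []
    done = 0  # number of frames finished so far
    for k in range(1, refresh_count + 1):
        refresh_time = k * refresh_ms
        while done < len(finish_times) and finish_times[done] <= refresh_time:
            done += 1
        # -1 stands for whatever was on screen before the capture began.
        shown.append(done - 1)
    unique = {frame for frame in shown if frame >= 0}
    never_shown = len(frame_times_ms) - len(unique)
    repeated = sum(1 for a, b in zip(shown, shown[1:]) if a == b)
    return never_shown, repeated

# Three quick frames followed by a long one, sampled every 16 ms for easy arithmetic.
print(virtual_display([6, 4, 4, 34], refresh_ms=16.0))
# Perfectly paced frames, one per refresh.
print(virtual_display([16, 16, 16, 16], refresh_ms=16.0))
```

    The first count penalizes frames that were rendered but never sampled, the second penalizes refreshes that had nothing new to show, and no extra hardware or capture pass is needed.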

  43. …A RTS game for instance can be streamed very easily at [email protected] with a 2000 bit rate or even a MOBA, but a FPS has trouble running at [email protected] with 3000 bit rate (I do stream).

    The amount of action in scenes and required bandwidth for them is quite a bit different, time being held equal. The encoding preset also affects the amount of compression your CPU tries to apply to the scene (which correlates to worse in game performance, but reduces bandwidth), but that’s different from quality.

    This still wouldn’t offer a metric for latency; this would simply be a metric for gauging levels of motion. Pretty much any of the above variables or combinations of them could be set up as independent and the others used as dependent variables. So to recap you have:

    Quality (sacrifices quality at the expense of fluidity when the encoder hits the bit rate cap)
    Buffer (quality can eat this if it goes over before you start dropping frames and it can be replenished if the scene no longer requires as much bandwidth)
    Bit rate
    FPS (for the stream)
    Encoder preset (deals with bandwidth and CPU usage)
    In game performance (FPS or your other latency focused metric)

    There are definitely a lot of confounding variables. This particularly would cover real life jitter/fluidity (at a certain threshold, not taking into account subjective differences among people those that can see +/- actions in a scene), but a tool for computer latency would have to be developed. So you’d have three different areas playing around with each other when you zoom out to the benchmark level.

    Props if you actually read all of this.

    “There is no one single thing for, its all over the place – the app, the driver, allocations of memory, CPU thread priorities, etc., etc.”

    *Insert a comment about Core Parking and Powertune*, Eat a dick Arag0n.

  44. Insightful writeup Scott.

    I agree with James and it’s interesting how he’s saying a lot of the same things I’ve also said. I also agree average FPS is a poor measure of fluidity past initial impressions. That’s where Std. D and Variance come in, but those aren’t absolute measures either. They’re given in tow of average. You can even use them with error bars, which make a graph very easy to read. Perception is all about variance, not necessarily about low FPS or even really high FPS (relatively speaking).

    You guys have put a lot of emphasis on 99th percentile frame time, it’s even used as the big bad graph at the end on your conclusion page. At one point it completely replaced avg fps. I don’t think it’s a good measure of fluidity or even how ‘bad’ one card is from another. You’d be better off taking the average of 10 percent of the worst frames and then displaying them after the normal average, although Beyond X already sorta does this. Beyond X doesn’t give you a sense of how often they’re happening (that’s what a percent does). Sorta like an inverse of what you’re already doing. This isn’t the same as standard deviation though. Standard deviation would show variance, where -1percentile frame time would show the nasty outliers. Neither you’d want a lot of.
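
    To make the comparison in the last two paragraphs concrete, here is a small sketch that computes the mean, the standard deviation, a 99th percentile frame time and the average of the worst 10 percent of frames for the same run. The nearest-rank percentile is an assumption made here, not necessarily how the site computes its figure, and the function names and sample data are invented.

```python
import statistics

def percentile_99(frame_times_ms):
    """Nearest-rank 99th percentile: 99% of frames are at or below this time."""
    ordered = sorted(frame_times_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]

def mean_of_worst(frame_times_ms, worst_percent=10):
    """Average frame time of the slowest worst_percent of frames."""
    ordered = sorted(frame_times_ms, reverse=True)
    count = max(1, (worst_percent * len(ordered) + 99) // 100)
    return sum(ordered[:count]) / count

def summarize(frame_times_ms):
    return {
        "mean": statistics.fmean(frame_times_ms),
        "stdev": statistics.pstdev(frame_times_ms),
        "p99": percentile_99(frame_times_ms),
        "worst_10pct_mean": mean_of_worst(frame_times_ms),
    }

# Made-up run: 95 frames at 16 ms and 5 spikes at 40 ms.
sample = [16] * 95 + [40] * 5
print(summarize(sample))
```

    Seeing the figures side by side for one run makes it easier to judge which of them reacts to occasional spikes and which to steady oscillation.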

    Yup, James goes on to talk about oscillations and variance, which aren’t good. I described such a pattern in the second look on the 7950 problems comments section. Std. Dev plays a vital role in diagnosing those even if 99th percentile and Beyond X show up peachy.

    Discusses generalizability at the end too, also agree on that. For video games I do agree that 20fps is too low and 35fps is the bare minimum. This is also increasing as games and graphics get more and more complex. There are less solid shapes and more to process all at the same time.

    A really good example of this is the new Hobbit movie. How many people have seen it in 48fps? Did you see it in 24fps?

    I saw it in 24fps and I can safely say the movie was completely designed around 48fps. Big battle scenes have so much happening on them that you can’t make out all that is happening because you’re trying to piece together bits and pieces. Two examples I can think of off the top of my head was the Rabbit Sled chase scene and the Goblin battle on the catwalks. Both are ridiculously blurry messes and hard to keep track of because the frame rate simply isn’t keeping up with our perception.

    I described this in a different thread as well. Fluidity and needed framerate is basically defined by the changes per second, if there was such a metric. That is another metric that currently doesn’t exist. Basically the amount of ground whatever is in the scene covers in comparison to the view point of the camera. So in the Catwalk scene, you have the camera panning across the catwalk while they’re fighting their way across it, new people walk into the scene the complete opposite direction of the camera panning and consequently skip ahead twice as fast as the people walking along with the direction the camera is moving. So you essentially have reinforcements that ‘hop’ into the scene and completely disrupt the flow.

    Adding to this is the extra needed FPS to process higher quality. So you have a pan, then you have a fight going one direction, reinforcements coming from the complete opposite direction, THEN on top of it you have another fight on the Catwalk on top of that catwalk moving 1.5x faster in the direction the original group is going. So any variance between the first group, the second group, the reinforcements, and the pan is extraordinarily noticeable. This is why you never see scenes like this used or with this much action.

    Resolution compounds this issue even more. The higher the resolution, the more likely you are to notice things that are off, or to use it as a cue to pick up on where an object is if you lose track of it. So if it’s really high resolution you need a higher FPS to complement it. I should clarify that resolution doesn’t just apply to screen resolution, but rather how detailed stuff is in the game. BF3 is a very good example of this. It has so much detail in it that it’s really hard to keep track of things, even at high FPS with a good monitor.

    So you’d need a metric that analyzes actions per second on the screen relative to the viewer (essentially what your high speed camera does, only that’s not quantitative). Adding another problem on top of this, simply moving imposes an artificial sensory cap that the gamer can’t overcome. If you move too fast the screen simply turns into a blurry mess. This is especially apparent when you spin the camera, which arguably is the most noticeable change you can do, because it’s immediate, it makes the card render a bunch of stuff at the same time, and the user is highly sensitive to the input from their high DPI mouse and how it correlates to a spin on the screen.

    Movies nowadays and video games are extremely different from the way movies used to be. Beyond the threshold of things looking fluid, such as no longer noticing frames switching in a traditional reel film, you start noticing frame switching through detail. A ‘talkie’ from the good ol’ days where someone simply walked on stage and said lines is immensely different from action packed films like the Hobbit and Transformers, which is another good one that this all applies to.

    Monitor refreshes simply compound this issue as they add another unsynchronized layer on top of everything, unless you use Vsync which throws in its own set of problems.

    Another thing to consider that James touched on is that even if you discover a way of quantifying actions per second on the screen, you still need to take into account that the threshold a human can process is different for each and every one. So going back to the 20ms baseline, it doesn’t cut it for me and I’m sure a lot of other users. We all have different levels of recognition. So ‘errors’ that are generated visually are easier to overcome for some people compared to others. Like putting together that blurry box and turning it into a person; basic shape recognition.

    This is definitely all good news. From the sound of it, AMD will be focusing more heavily on latency, so we may see a completely new way of designing graphics cards, where professional cards are designed around throughput and gamer/consumer cards are designed around latency.

    It’s important to note Frame Time IS NOT latency. Frame time is simply FPS measured with the X and Y axis flipped. It’s exactly the same thing. What changed in the frame time benchmarks was how often frames were measured (seconds per frame, only in milliseconds), which increased the resolution and allowed us to spot issues with variance.

    Latency would be something focused around the time it takes from when you input something with your mouse to when it changes on your screen, not how many frames a computer can render.

    Honestly if I was to throw you guys a bone I would say a new way of quantifying motion in frames would be found in encoding. Variable bit rate encoders change their bit rate based on the amount of motion in a scene. So you can accurately gauge the amount of action in the scene based on the bit rate required to sustain it at a certain FPS, at a certain quality standard, at a certain compression ratio. If you completely exclude compression you end up with a benchmark based on a quality threshold at a certain bit rate. If you hold quality and compression equal, you end up with a bit rate that is an accurate measure of how much action is happening.

    This is what Streaming does. Quality is equal, so bit rate and performance changes based on the scene. In a lot of scenarios FPS and bit rate is held constant (due to bandwidth caps), so you end up eating the performance difference on your computer and consequently on the stream. The amount of bandwidth a user needs changes drastically based on how much action is happening on the screen. A RTS game for instance can be streamed very easily at [email protected] with a 2000 bit rate or even a MOBA, but a FPS has trouble running at [email protected] with 3000 bit rate

  45. I’ve never downloaded CAP profiles… I always thought they were a Crossfire thing… I’m sure I’m not alone in that boat. If they were part of the driver, why aren’t they just included by default?

  46. What’s wrong with what he said is that the videos which used Skyrim weren’t intended to test Skyrim per se, they were intended to show the validity of the testing method and that the data collected reflects what is actually experienced when viewing the screen. He’s also wrong in the implication that only games which favor nVidia were tested, but Scott (“Damage”) has replied to that sufficiently and it’s not true.

  47. Rage is tricky because the game engine works to maintain smooth 60Hz frame delivery, dynamically adjusting image quality on the fly. We’ve never used it for comparative benchmarking because we couldn’t be confident it was operating the same on one video card as it does on another.

    However, the short answer to your question is that 16-25 ms frame delivery is very quick, and 33 ms isn’t bad, either. An occasional frame at this rate, or even a group of them, isn’t likely to feel slow. Our default threshold for “slow” frame delivery is 50 ms, and even then, we worry about lots of time spent beyond that threshold, not small amounts.

    Remember that a lot of the work we’re doing is comparative, too, so we get picky about how one card compares to the other, because that is the nature of the task at hand. But look at what we said about the Radeon HD 7950 vs. GeForce GTX 660 Ti in Borderlands 2, where the Radeon clearly has more frequent and larger latency spikes:

    “These results are somewhat heartening. Although the Radeon does spend twice as long above our threshold as the GeForce, neither card wastes much time at all working on especially long-latency frames. In other words, both cards offer pretty good playability in this test scenario. Subjectively, I prefer the smoother gameplay produced by the GeForce, but the Radeon doesn’t struggle too mightily.”

    These tools let us detect minute differences, but we must keep them in perspective. If the game is playing well for you, enjoy it and don’t fret!
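
    The “time spent beyond 50 ms” idea can be sketched in a few lines. This assumes only the portion of each frame time past the threshold is counted, which is an assumption here rather than something this reply spells out; `time_beyond` is a made-up name and the frame times are invented.

```python
def time_beyond(frame_times_ms, threshold_ms=50.0):
    """Total milliseconds spent past the threshold, summed over all slow frames."""
    return sum(ft - threshold_ms for ft in frame_times_ms if ft > threshold_ms)

# Made-up frame times in milliseconds: two frames cross the 50 ms line.
sample = [16, 33, 70, 55]
print(time_beyond(sample))
# A stricter threshold aimed at 60 Hz smoothness.
print(time_beyond(sample, threshold_ms=16.7))
```

    Because quick frames contribute nothing, a run full of 16-33 ms frames scores zero at the 50 ms threshold, which matches the point above that such frames are not likely to feel slow.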

  48. I agree that the number of UNIQUE frames displayed to the monitor is the be all end all. I don’t care how many frames the GPU can spit out, I care about how many are displayed to the screen. [Note this disregards the jumping due to too MANY frames being created; let’s put that aside for a second].

    The *easy* solution would be to come up with a way to read the raw DVI/HDMI image data as it comes across every 16.7ms, and look for changes. If the data is the same, then no new image is sent across. Every 60 frames = 1 second. Easier than using a high speed camera and trying to sync it to the monitor refresh rate…
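
    Assuming some capture device could hand over the raw bytes of each refresh (no particular hardware or API is implied here), counting unique frames comes down to comparing each captured image with the one before it. A minimal sketch with the invented name `count_unique_frames`:

```python
import hashlib

def count_unique_frames(captured_frames):
    """Count refreshes whose image differs from the previous refresh.

    captured_frames is an iterable of bytes objects, one per refresh.
    """
    unique = 0
    previous_digest = None
    for frame_bytes in captured_frames:
        digest = hashlib.sha256(frame_bytes).digest()
        if digest != previous_digest:
            unique += 1
        previous_digest = digest
    return unique

# Stand-in data: six refreshes carrying three distinct images.
capture = [b"A", b"A", b"B", b"B", b"B", b"C"]
print(count_unique_frames(capture))
```

    Run over 60 consecutive refreshes, the count is the number of unique frames shown in that second. With vsync off a single refresh can contain pieces of several frames, so a byte-for-byte comparison like this would treat any torn refresh as new.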

  49. Money. Circuitry is a lot simpler if you only support a few refresh rates (less circuitry/timing mechanisms). For LCD’s, you also have to factor in the time it takes for the liquid crystals to actually change (Response Time, which is NOT a flat curve by any means), which limits you to refresh rates of about 60 for most mid-tier displays.

  50. I tried to run msi afterburner together with Rage (fully patched) but it made the game go to a blackscreen (it didn’t crash though). I think it’s more of a problem between Rage, afterburner and possibly video driver.

  51. Great article, Scott – so glad to see the wider PC gaming community beginning to see the light.

    I have one question. I recently bought RAGE because of your recommendation ($5 Steam Sale FTW), and it has been running very well on my PC. i3-2100, 8GB RAM, HD 6870, @1440×900. I have it set to the highest settings allowed, with Vsync and 4xMSAA enabled, and it’s been an incredible experience playing this beautiful game at a consistent 60fps (with no screen tearing and very little texture pop-in). However, since MSI recently inserted a frametime metric in Afterburner, I’ve been keeping an eye on those when I play games, and I do see some instances in RAGE where it gets moderately spiky (16-25ms) – during more intensive scenes. BUT I do not see any slowdowns or stutters on-screen. Everything appears to stay at 60fps fluidity. I’m grateful but I can’t help but wonder why – shouldn’t the display output show intermittent 33ms frames, causing a visible stutter, even if it’s not visible on the frametime metric? Maybe RAGE automatically uses triple-buffering when Vsync is set to ON?

    This also makes me wonder if the HD 7950 would be perfectly smooth with Vsync on in those games you tested against the GTX 660 Ti where they’re both averaging well over 60fps. (Believe me, I am not defending AMD – I’m upgrading to either a GTX 660 or 760 later this year. Just wondering.)

  52. I don’t see how you could benchmark PS2, unless you put 2 systems head to head with 2 accounts logged in at the same time in the same party bus (or galaxy), and use a 3rd account to drive said bus to an engagement. Very convoluted, not reproducible and only useful in a strict head to head comparison of 2 complete systems. And that’s hoping that the player culling doesn’t mess things up!

  53. I too am curious to see if a 120hz monitor has any effect on the benchmark results. I have a hunch that the answer would be ‘no’ though, unless trickery like vsync is involved.

  54. AFAIK Skyrim is still broken at framerates > 60fps. As such, the tests don’t really make sense for any real playing of the game. It would be interesting to see if the problem of high latency frames still exists when the minimum latency is 16.6ms.

  55. The main reason I was looking at cameras is cost. There are consumer cameras out there that can do high-speed recording on a budget, like the Casio ZR100. I figured that there might be cheaper cameras of lower quality out there, cheap enough so I could just go buy one and start experimenting with it.

  56. Skyrim and BL2 were 2 of my favourite games of the year and play wonderfully well on my 6950 driving an Ultrasharp 3011 at native 2560×1600 with settings cranked up except for PhysX in BL2 which is definitely an NVIDIA feature. If anything my aging Q6600 at stock 2.4GHz (overclocking seems to stuff the RAID system) is the bottleneck but I’ve been spending my desktop upgrade money on multiple U3011’s and a pretty mean storage subsystem 🙂 Or maybe I’m just not too fussy about this frame rate perception thing.

    I also tend to play the big blockbuster FPS’s that seem commonly tested. I think the benchmark mix is pretty relevant to my gaming tastes.

  57. So is there anything factually wrong with what he said? Is Skyrim actually broken with vsync off?

  58. Wow. Really? So you want more GE and less TWIMTBP games? You don’t like that TWIMTBP game plays better on nvidia, but you are happy with GE game which plays better on AMD. Last time I heard that AMD is pushing GE program not Nvidia.

    So how about games without GE/TWIMTBP for fair reviews?

  59. Could you guys test Planetside 2? I know there isn’t a benchmark yet and the results can’t be reproduced due to the random nature of the load put on the system depending where you are in-game, but even so I’d like to see an article about it. Just make 3 or 4 systems the same except the video card, have a person for each system to play, form a squad, jump in a Sunderer and see what gives.

  60. “Why would games such as borderlands 2 and skyrim be tested? Those have favored nvidia since their inception, I won’t even go into the obviousness of borderlands 2. Gearbox received assistance from NV during development. Running better on NV is hardly surprising.”

    I’ve always found this line of argument a bit silly. Games like Borderlands 2 and Skyrim are tested because PEOPLE PLAY THEM. Why the hell would you NOT benchmark them? May I remind you that we buy videocards to play games and not to win benchmark dick-waving contests? If the GPUs of one manufacturer provide better or more consistent performance then I want to know about it. That’s what I read reviews for. So if nVidia expends significant resources on collaborating with game developers to make sure that the games run well on nVidia videocards then that’s something the reviews should bloody damn well reflect.

  61. Hi Scott

    I had a read of your linked “Inside the Second” article. I don’t believe it discusses where the problem really is. In that article you describe how the game engine hands the frame to the DirectX (or OpenGL) driver then “stuff happens” before the image finally goes out the DVI connector.

    Actually, how it works is that the game engine signals the frame END to the driver using glSwapBuffers() (for OGL). At this point the graphics card driver flushes the pipeline, commits any outstanding commands and then signals that the frame buffer can be swapped. This doesn’t take much time as most of the work has already been done, and any significant delays would be vsync or other necessary delays.

    Where most of the work is done and time taken is during the “game engine” part which I think is labelled T_game in the diagram. After glSwapBuffers() the graphics driver is ready to receive a bunch of new commands to start rendering the next frame. And this takes time and lag is happening here = T_game. So, the variation and unhappiness is happening here and not so much T_ready to T_display.

    This can be due to bad tuning and interaction between game and driver. If the driver doesn’t know about the game in advance – the patterns of data and command streams – then it may sub-optimally allocate buffers, schedule commits or other things which may affect real time performance. Realtime is hard to do when you’ve got heaps and other difficult (in sense of getting predictable performance) data structures.

    P.S. I don’t even know why there’s a T_render in that picture. Rendering is happening all the time during T_game. It is only the unflushed commands at the time of glSwapBuffers() being rendered there, which as I’ve said is probably a very small proportion of the total commands to draw the scene.

    Pancake (graphics driver programmer years ago)

  62. Scott,

    Have you taken a look at graphing the first differential of the latency as a measure of smoothness? Highly positive values would seem to capture the “stretched out” frame presentation (where the pipeline is momentarily clogging or whatever other reason) that is indicative of a poor gameplay experience.

    Great discussion!

  63. I actually looked into it, and there’s not really a need for a camera, IMO. A simple capture card like a Blackmagic Decklink could be used to capture HDMI frames and look at a specific part of the image, where the info would be encoded as an RGB value. If you have a good way of adding an overlay with a valid (FRAPS-like) timestamp for each frame, you could figure out when it is displayed, modulo display processing delay.

    Issue with something like the Blackmagic: no support for 120 Hz, AFAIK.

  64. Jesus christ. You can’t play skyrim with vsync turned off, unless of course you want flying inanimate objects everywhere and the day/night cycle screwed up. The game physics engine is tied to vsync – if you turn it off for THIS GAME AND THIS GAME ONLY it completely FUBARS the game. ANYONE who has played skyrim for any length of time (you know, people who actually play games, not just benchmarkers) will tell you this. You can google it as well. Apparently, techreport are benchmarkers – they don’t play games.

    The game is broken with vsync turned off. You can easily google this. Which is why benchmarking THIS game in particular with vsync off is completely disingenuous.

    So here you have nvidia marketers and focus group members begging websites to do skyrim tests with vsync turned off. Nevermind that vsync off screws the game up and breaks it, test it anyway.

    How about testing some games that aren’t known to run substantially better on nvidia anyway? Excuse me while I roll my eyes at the borderlands 2 tests. A TWIMTBP game plays better on nvidia. SURPRISING NEWS.

  65. Well Scott, it is good to see TR’s work being acknowledged.

    I have a samsung 2233rz true 120hz monitor, and I find it quite a bit smoother than my 60hz monitor, even for 2D desktop use.

    So the next question would be, is a 120hz monitor required or would it be more useful for these tests you run than a 60hz monitor would be, or are there other factors at play????

  66. For the record: tearing is much reduced with a 120hz refresh rate. I switched back to 60hz yesterday for one gaming session (computer -> hdmi -> tv connection was acting up so I switched my monitor to hdmi to make sure it wasn’t the cable or graphics card). I play Planetside2 which won’t run at very high framerates because of some engine inefficiencies, but even at 50 to 70 fps, the difference between 60hz and 120hz is huge. With 60hz, tearing is all over the place, with 120hz the game feels normal again. It’s not quite as smooth as an absolutely constant 60fps with vsync though.

  67. Two points. It isn’t a secret that nvidia’s focus marketing group is making a big push to get TWIMTBP games tested for these purposes.
    They publicly state that they’re purposely emailing sites like yours and targeting specific games. What I would like to see is other games such as Sleeping dogs, Max Payne 3, etc etc instead of just games that are known to favor nvidia. Why would games such as borderlands 2 and skyrim be tested? Those have favored nvidia since their inception, I won’t even go into the obviousness of borderlands 2. Gearbox received assistance from NV during development. Running better on NV is hardly surprising.

    On that note, you have a lot of conflicting data on this issue. Prior benchmarks show much better frametime results in the same games, yet oddly enough since nvidia focus group has been emailing you, this has changed. What accounts for the difference in results? Look at your own results from the release reviews of the GTX 680 and 7970.

    Second point. This isn’t clear in your writeup, but many websites are testing skyrim with vsync disabled. This makes sense in most games for benchmarking, but skyrim is a special case: disabling vsync there is known to cause anomalies, physics errors and stuttering. Why would anyone intentionally benchmark a game that is broken when vsync is disabled? It seems as if some specialize in benchmarking games but don’t actually play them. Anyone that isn’t completely oblivious would realize that disabling vsync in skyrim breaks the physics engine in particular and alters the day/night cycles.

    Long story short, disabling vsync in skyrim breaks the game (as anyone who has played through it and witnessed flying inanimate objects can attest) so benchmarking skyrim in this fashion makes no sense. The game is broken when you apply the ini change to disable vsync.

  68. Yes, you need to use the CAP’s to get the latest game profile updates. This includes stability and bug fixes as well as performance improvements.

  69. Not a surprise that James Prior’s “huge problems!!!!!” were, in fact, mountains made from mole hills. Took him weeks to articulate a fairly simple disagreement.

  70. Hi Scott, the ultimate solution is to have FRAPS paint the frame number on the screen, record the screen output at the end of the HDMI connector using HDfury or some such device, and correlate the watermark from FRAPS to the frame number in the video.

  71. So, all the AMD fanboys furious that their beloved was exposed can be thankful to Tech Report for getting AMD to look into and fix their issues – and in turn giving the same AMD fanboys better frame times / user experience in their games.

  72. Sounds like you two are close to meeting in the middle;

    He’s looking for a way to measure perceived input/feedback consistency, of which frame latencies are a good indicator – being a critical part of input/feedback loop

    You’re looking for a way to measure perceived animation/motion consistency, which is a good indicator of how quickly the user input/feedback loop is performing.

    It harks back to the early days of VSync in competitive gaming. For me that was the move from dial-up to broadband/LAN gaming in Quake2 and Q3A:

    - VSync *ON* was smooth, but introduced cyclical animation inconsistencies because you could force the game engine to render internally at different speeds, the competitive advantage being at 125fps to correspond with an 8ms game engine update.
    - VSync *OFF* was not smooth to look at, with tearing everywhere, but those of us with CRTs that could run at 120Hz were blessed with an input/feedback *feel* which borders on perfection.

    No matter how much I understand about the rendering pipeline, that is only half the battle, since the consistency at which the internal game engine calculations are running plays an equally significant role. I am just glad that more people like you guys are addressing the most important issue, which is to reduce inconsistency. Latency is ideally nonexistent, but even higher latencies can be tolerated as long as they are consistent. Keep asking the questions and keep us updated, no matter how nerdy or technical they get 🙂

  73. “You probably couldn’t tell exactly what frame is shown on the monitor, but you would still measure freezes and jumps in animation.”

    If Fraps could add an overlay in which the current frame number (as measured by Fraps) is shown (multiple times along the left or right edge of the screen), you could use a high speed camera and measure the actual time at which a given frame is shown, as well as see if any frames are being skipped or shown twice.

    Edit: I was looking online for cheap cameras with which you could capture this and process it. I figure that if you add the frame number as a big barcode, you could use any low resolution camera (640x480 or something like that) as long as it gets 240+ fps. It’ll probably take quite a lot of processing power to read the barcodes out of the individual frames, especially if you use something like OpenCV, but it doesn’t sound too far-fetched.
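
The barcode idea can be sketched without a camera or OpenCV. What follows is a minimal illustration, not something Fraps offers: `encode_barcode` and `decode_barcode` are hypothetical helpers, and the 16-bit width and 8-pixel stripes are arbitrary assumptions. It writes a frame number as black and white stripes in one row of pixels, then reads it back by sampling the middle of each stripe.

```python
def encode_barcode(frame_number, bits=16, stripe_px=8):
    """Return one row of grayscale pixels (0 or 255), most significant bit first."""
    if not 0 <= frame_number < 2 ** bits:
        raise ValueError("frame number does not fit in the barcode")
    row = []
    for i in reversed(range(bits)):
        value = 255 if (frame_number >> i) & 1 else 0
        row.extend([value] * stripe_px)
    return row


def decode_barcode(row, bits=16):
    """Recover the frame number by thresholding the centre pixel of each stripe."""
    stripe_px = len(row) // bits
    number = 0
    for i in range(bits):
        sample = row[i * stripe_px + stripe_px // 2]
        number = (number << 1) | (1 if sample >= 128 else 0)
    return number


row = encode_barcode(1234)
half_resolution = row[::2]                    # crude stand-in for a low-res camera
washed_out = [p // 2 + 40 for p in row]       # crude stand-in for poor contrast
print(decode_barcode(row), decode_barcode(half_resolution), decode_barcode(washed_out))
```

Sampling only the centre of each stripe keeps the decoding cheap, which suggests the per-frame processing cost may be modest if the barcode sits at a known position on screen.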

  74. I can’t manage to link this, but here’s something from the comments on the “revisited” article.

    Damage wrote: “I’ve not been persuaded that mean-plus-deviation results are entirely helpful, either, for several reasons. For one, we are addressing a broad audience and want to make the info we present easy to interpret for all. Also, back to the real-time system issue, the problem we wish to highlight is long frame times, not short ones, and we are interacting with a display subsystem that has certain limits. Placing a negative value on ‘variance’ itself without reference to our goal of consistent frame delivery within certain time limits doesn’t necessarily accomplish what we need. Variance on the low side of the mean isn’t a bad thing, for one.”

    Yes, I think looking into variance might catch variability, but I think Scott is right to focus on penalizing higher frame times, as fast frames don’t generally cause any issues.

  75. Maybe you could introduce variance as yet another measure? I think both average FPS, 99th percentile, and frames over 50ms are relevant. The FPS/latency variability James discusses would be covered by variance.
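
For readers who want to see these measures side by side, here is a rough sketch that computes all four from a list of frame times in milliseconds. The function name `frame_time_summary`, the nearest-rank percentile method and the two sample traces are assumptions made for illustration; they are not taken from TR's own tooling.

```python
import statistics


def frame_time_summary(frame_times_ms, threshold_ms=50.0):
    """Summarize a non-empty list of per-frame times given in milliseconds."""
    n = len(frame_times_ms)
    ordered = sorted(frame_times_ms)
    # Nearest-rank 99th percentile: 99% of frames finished at or below this time.
    rank = (99 * n + 99) // 100          # integer ceiling of 0.99 * n
    return {
        "average_fps": 1000.0 * n / sum(frame_times_ms),
        "p99_ms": ordered[rank - 1],
        "frames_over_threshold": sum(1 for t in frame_times_ms if t > threshold_ms),
        "variance_ms2": statistics.pvariance(frame_times_ms),
    }


# Two made-up traces that both take 2000 ms to deliver 100 frames.
steady = [20.0] * 100
spiky = [16.0] * 95 + [96.0] * 5

for name, trace in (("steady", steady), ("spiky", spiky)):
    print(name, frame_time_summary(trace))
```

Both traces report the same average FPS, while the 99th percentile, the count of frames over 50 ms and the variance all separate them. Variance alone does not say whether the spread comes from slow frames or fast ones, which is the objection raised in the comment quoted above.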

  76. It’s not the same. ‘Time over X ms’ just adds up the total of frame times over the threshold. What CampinCarl suggests is a delta (change) measurement *between* frames. It wouldn’t replace ‘time over X ms’ which is still important for absolute numbers, but it would more clearly show smoothness because it would deal specifically with instances where frametimes change greatly from one frame to the next.
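
The difference between the two measurements is easy to show with a small sketch. It assumes that "time over X ms" sums only the portion of each frame time that exceeds the threshold, and it borrows the 15 ms delta cut-off floated elsewhere in this thread; the helper names and sample traces are made up for illustration.

```python
def time_beyond(frame_times_ms, threshold_ms=50.0):
    """Total time spent past the threshold, summed over all slow frames."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


def frame_deltas(frame_times_ms):
    """Change in frame time from each frame to the next (the first difference)."""
    return [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]


def jumps_over(frame_times_ms, delta_ms=15.0):
    """Count frame-to-frame changes larger than delta_ms in either direction."""
    return sum(1 for d in frame_deltas(frame_times_ms) if abs(d) > delta_ms)


slow_but_even = [40.0] * 8          # steady 25 FPS
alternating = [10.0, 40.0] * 4      # fast frame, slow frame, repeat

for name, trace in (("slow_but_even", slow_but_even), ("alternating", alternating)):
    print(name, time_beyond(trace), jumps_over(trace))
```

Neither trace registers any time beyond 50 ms, but the alternating one trips the delta count on every transition, which is the kind of unevenness a threshold-only metric cannot see.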

  77. Isn’t that essentially what time over 33.3ms or 50ms shows? The long frames that cause a pause/stutter.

    I like how, in the email, he backtracked quickly, then brought up: well, is it fair to only benchmark X number of seconds in a certain portion of the game? Well, you could make that argument about any video card benchmark. Then the best line of the whole article: “mom upgrading little Jonny’s gateway – to understand the experience”. Haha, you know he lost right there. It’s clear to me TR does the best video card reviews.

  78. One thing I’m unclear on – is there a suggestion (or suspicion or whatever) that the act of injecting the stamp might influence downstream performance?

  79. Why don’t modern monitors and video cards support arbitrary refresh times? That way you could refresh the screen exactly when the frame is finished. Even if your card can finish one frame per monitor refresh time you should still get more smooth animation if the display time exactly matches the simulation time. And if it does take just slightly longer you don’t have to wait an entire monitor refresh time for it to be displayed. For the videophiles out there this would also make it possible to get those odd frame rates, such as NTSC’s 29.97 FPS.

    There would be two limitations of course. The first would be the maximum and minimum refresh rates of the monitor. The second would be that it would only be useful for full screen display. Otherwise you might have multiple windows demanding different refresh rates.
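
To put rough numbers on the idea, here is a toy model comparing how long a finished frame waits for the display under a fixed refresh versus a hypothetical refresh-on-demand display. It is a sketch under simplifying assumptions: every finished frame is eventually displayed, the fixed case shows a frame at the first refresh at or after it completes, and the on-demand case honours only a minimum interval between refreshes (the monitor's maximum refresh rate), ignoring the minimum-rate limit mentioned above.

```python
import math
from itertools import accumulate


def fixed_refresh_waits(frame_times_ms, refresh_ms=16.7):
    """Wait between each frame finishing and the next fixed refresh tick."""
    waits = []
    for done in accumulate(frame_times_ms):
        next_refresh = math.ceil(done / refresh_ms) * refresh_ms
        waits.append(next_refresh - done)
    return waits


def variable_refresh_waits(frame_times_ms, min_interval_ms=0.0):
    """Wait when the display refreshes as soon as a frame is ready,
    but never sooner than min_interval_ms after the previous refresh."""
    waits = []
    last_shown = -math.inf
    for done in accumulate(frame_times_ms):
        shown_at = max(done, last_shown + min_interval_ms)
        waits.append(shown_at - done)
        last_shown = shown_at
    return waits


trace = [20.0, 20.0, 20.0]          # a card that just misses 60 FPS
print(fixed_refresh_waits(trace))
print(variable_refresh_waits(trace, min_interval_ms=1000.0 / 144.0))
```

In this model the 20 ms frames each sit idle for several milliseconds waiting on a 60 Hz tick, and the wait changes from frame to frame, while the on-demand display shows them immediately.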

  80. Actually, I was saying something similar, but this would be far easier to implement (it’s just taking the differential). I would still penalize “Time spent beyond 50 ms” because long waits would be worse for animation, but this would still capture stuttering, and would (rightly) double penalize when really long frames are followed or preceded by very short ones.

    I would only be careful with this because the only thing that really matters is what actually makes it to the monitor. Frames shown several times are going to break the flow much worse than getting four or five in the 16.7 ms it takes to refresh.

  81. Seems like the best thing might be to quantify the amount of frames that go from a lower frame time to a higher frametime; this seems to be the definition of ‘non-smooth’ gameplay. From there, you’d have to define a percentage of those that would actually give something non-smooth; i.e. from 11ms to 13ms is probably not going to really bother most people, but 11ms to 17ms might be. I think at this point we’d end up with some sort of 2D spectral density plot that might be difficult to digest.

  82. The overall approach is undoubtedly right, and the presentation of the results is now pretty much spot on. The really interesting point from Andrew is that we can rely on FRAPS to reveal latency problems (though as he also points out, the absence of a problem in the FRAPS data doesn’t guarantee the absence of a problem at the display).

    The interesting point that James arrives at in his second email is whether a result from a particular benchmark is necessarily representative. This is a thorny point and likely doesn’t have an easy solution. I think it’s the behaviour of each graphics card (and driver combination) as they reach their limits of smooth performance that is of most interest. This may or may not be captured in a specific benchmark condition chosen ahead of time.

    Clearly it’s impractical to test every card under every possible operating condition to find the tipping point, but it’s not clear what the alternative might be. This almost suggests something along the lines of the HardOCP test technique, but driven by hard frametime statistical analysis rather than being driven entirely subjectively.

    The first step in this direction would be to define a cut-off result in the statistical analysis that delineates “smooth” from “not smooth”. Using your favourite measure, this could be a limiting amount of total frametime above 50ms within each 100s of benchmark time (for example). Once you hit this pre-determined limit, you can define that condition on that card as “not smooth”.

    Whichever way you slice it, this ends up looking like a lot of work, though perhaps not that much more than a “traditional” review if you can automate the frametime capture and analysis process sufficiently. Good luck!
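
The cut-off suggested here can be sketched as a windowed check. The 50 ms threshold and 100 s window come from the comment; the 500 ms limit, the function name `smoothness_verdicts` and the handling of a partial final window (judged against the full limit) are placeholder assumptions, not a proposed standard.

```python
def smoothness_verdicts(frame_times_ms, threshold_ms=50.0,
                        window_ms=100_000.0, limit_ms=500.0):
    """Label each window of benchmark time 'smooth' or 'not smooth'.

    A window is 'not smooth' when the time spent beyond threshold_ms
    inside it adds up to more than limit_ms.
    """
    verdicts = []
    elapsed = 0.0
    beyond = 0.0
    for t in frame_times_ms:
        elapsed += t
        beyond += max(0.0, t - threshold_ms)
        if elapsed >= window_ms:
            verdicts.append("smooth" if beyond <= limit_ms else "not smooth")
            elapsed = 0.0
            beyond = 0.0
    if elapsed > 0.0:
        # Trailing partial window, judged against the same limit.
        verdicts.append("smooth" if beyond <= limit_ms else "not smooth")
    return verdicts


# Shortened windows (1 s) and a tight limit (20 ms) keep the example small.
trace = [20.0] * 50 + [20.0] * 45 + [100.0]
print(smoothness_verdicts(trace, window_ms=1000.0, limit_ms=20.0))
```

Sweeping a quality setting or resolution until a card first produces a "not smooth" window would be one way to automate the search for the tipping point described above.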

  83. Thought: Wouldn’t plotting the difference between frametimes frame to frame give you a pretty good visual indicator of potential stutter points? A second scoring such as “Frame Difference over 15ms” would be a great partner to the 99th percentile metric, IMO.

    This seems like the direction the conversation was going…

  84. Does anyone here download the catalyst application profiles (CAP) with a single, desktop GPU?

    I’ve heard you “should” to get the most out of your card but I never have. Thoughts?

  85. I do think Mr. Prior has a point regarding fast frame times and how changes in frame timing can be jarring. I think something like this happened in the 7950’s high speed video. When the worst “jumps” in animation happened, the player seemed to jump forward significantly, as if several frames were rendered but never shown on the monitor.

    Ultimately, the frames actually shown on the monitor are what matter most.

    I think the best solution would be very high speed camera footage of the monitor output (the real result), but you might be able to simulate that effect by choosing a refresh time (16.7 ms for a 60 Hz monitor) and determining the “real” frames shown. This extends the “Time beyond 50 ms” metric because you could count how many frames are “lost” while the animation is waiting (and the same frame is shown more than once on the monitor), but you could also start counting how many frames are rendered but never shown because they happen inside the refresh rate.

    You probably couldn’t tell exactly what frame is shown on the monitor, but you would still measure freezes and jumps in animation.
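
The simulation described here can be sketched in a few lines. This is a simplified model resting on several assumptions: frame times are treated as back-to-back completion intervals, the monitor refreshes every 16.7 ms starting from time zero, each refresh shows the newest finished frame, and tearing is ignored. The function name `simulate_display` is made up for illustration.

```python
from itertools import accumulate


def simulate_display(frame_times_ms, refresh_ms=16.7):
    """Return (shown, repeats, dropped) for a non-empty frame time trace.

    shown   -- index of the frame on screen at each refresh (-1 = none yet)
    repeats -- refreshes that showed the same frame as the previous refresh
    dropped -- frames that finished but were never on screen at a refresh
    """
    done_at = list(accumulate(frame_times_ms))
    refresh_count = int(done_at[-1] // refresh_ms) + 1
    shown = []
    latest = -1
    for k in range(1, refresh_count + 1):
        tick = k * refresh_ms
        # Advance to the newest frame that finished by this refresh.
        while latest + 1 < len(done_at) and done_at[latest + 1] <= tick:
            latest += 1
        shown.append(latest)
    repeats = sum(1 for prev, cur in zip(shown, shown[1:])
                  if cur == prev and cur >= 0)
    dropped = len(done_at) - len({i for i in shown if i >= 0})
    return shown, repeats, dropped


# Three quick frames, one 50 ms hitch, then a quick frame again.
print(simulate_display([5.0, 5.0, 5.0, 50.0, 5.0]))
```

In this trace the first two frames are never displayed and the third is held for two extra refreshes during the hitch, so both kinds of disruption show up as separate counts.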

  86. This little tool beats ’em all:

    A perfect addition to FRAPS.

  87. Because it’s 2:24am right now, this will be short. Nice post (yes I’ve read the whole thing) and it seems that TR have really started something…wonderful in benchmarking/testing.

  88. When what happens inside the second does matter! SSK’s wife however has other things to say about what happens inside their second. 🙂