Etc.

Howdy, all. I know you’ve not seen many articles coming out of Damage Labs lately, but I assure you I am hard at work. In fact, I probably shouldn’t be stopping to write this post, but I couldn’t resist sharing a little tweak I’ve been considering making to our game performance analysis.

You may know that we’ve changed the way we test games in order to better focus on frame latencies rather than FPS averages. One of the struggles we’ve had in this process is finding the best way to convey a sense of frame latency to the reader. We’ve made the case for looking at the 99th percentile frame latency—that is, the frame time within which 99% of all frames in the test run were rendered—and we think that’s a helpful metric. However, that number is only a snapshot at a single point. I’ve been wanting to capture the overall latency picture more fully. Here’s my latest attempt. See what you think:

I think it tells the story of frame latencies a little more completely, hopefully without information overload. (And yes, obviously, the GTX 560 Ti has a problem here: it’s running out of video RAM capacity at the resolution and settings we tested.)
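
(For the curious: a curve like this takes only a few lines to generate from a raw frame-time log. Here’s a rough sketch in Python; the file name and one-frame-time-per-line format are placeholders, not our actual tooling.)

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical input: one frame time in milliseconds per line, as logged by Fraps.
frame_times = np.loadtxt("skyrim_frametimes_ms.txt")

# Frame time at each percentile from the median out to 99.5%.
percentiles = np.linspace(50, 99.5, 200)
curve = np.percentile(frame_times, percentiles)

plt.plot(percentiles, curve)
plt.xlabel("Percentile of frames")
plt.ylabel("Frame time (ms)")
plt.show()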

This is one of several new ideas I’m cooking up. We should have them all implemented in a tasty new review soon.

Comments closed
    • Damage
    • 8 years ago

    Ok, here we go, guys. Frame latency plotted as a continuous function.

    [url<]http://bit.ly/GHsKl9[/url<]

      • willmore
      • 8 years ago

      *PERFECT*! I think you can safely say that you’re using the most advanced methods in the tech reporting industry.

      • Bensam123
      • 8 years ago

      I just splooged all over my keyboard. Will TR go inside the % to offer a better look at the last 5% too?

      • jensend
      • 8 years ago

      Nicely done. I can see at a glance that the new version, with its many additional data points and consistent x axis, does a [b<]much[/b<] better job of helping us understand the contrast between the 5870 and the 560 Ti 448 than the old diagram did.

      I said before "if you want to concentrate on the right tail of the frame distribution you can just change the scale of the probability axis." I'll elaborate. Your first version above, with the seven points, did have one advantage over your new one: it reflected the fact that we're concerned enough about the rarer high frame times that we want to be able to closely distinguish the details of what goes on in that last few percent.

      You could just produce a zoomed-in graph showing only the last 5% or so, but I think your original decision to have the median frame time as the starting point of the graph was a good one: it provides an important sense of perspective, and we don't want to lose it. You could produce two graphs, one zoomed and one exactly like the one you just did, but we've already got a lot of graphs.

      One solution would be to transform the x axis. There are infinitely many alternative ways to do that, but the most common choice is a "lin-log" semilogarithmic plot. If you've not heard of this before, you'll get the basic idea if I say equally spaced tick marks on the x axis might be labeled 1/1000, 1/100, 1/10, 1, 10... or 2, 4, 8, 16, 32... Here we'd have to do kind of the reverse: for example, equally spaced tick marks might be labeled 50%, 75%, 87.5%, 93.8%, 96.9%, 98.4%, 99.2%, 99.6%.

      [url=http://minus.com/mKK4Ggvh1#1<]Here's a quick example using Octave to show the quantile function of a standard normal distribution.[/url<] linscale.png is the normal quantile plot starting from the median; revlogscale.png allows us to better see the detail at the right tail.
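
      If it helps, here's roughly what that reverse-log transform does in code. This is just a sketch in Python (rather than Octave) with made-up gamma-distributed frame times standing in for a real log, since the raw data isn't public:

      import numpy as np
      import matplotlib.pyplot as plt

      rng = np.random.default_rng(0)
      # Stand-in data: a skewed frame-time distribution in ms (an assumption, not real TR data).
      frame_times = rng.gamma(shape=9.0, scale=2.0, size=4000)

      p = np.linspace(0.50, 0.998, 500)        # percentiles from the median out to 99.8%
      q = np.quantile(frame_times, p)          # empirical quantile function

      # Reverse log scale: each equal step halves the remaining tail (50%, 75%, 87.5%, ...).
      x = -np.log2(1.0 - p)
      ticks = 1.0 - 0.5 ** np.arange(1, 9)     # 50%, 75%, 87.5%, ..., 99.6%

      plt.plot(x, q)
      plt.xticks(-np.log2(1.0 - ticks), [f"{t:.1%}" for t in ticks])
      plt.xlabel("Proportion of frames rendered within the given time")
      plt.ylabel("Frame time (ms)")
      plt.show()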

        • willmore
        • 8 years ago

        Oops, sorry, I was so psyched to see the X axis continuous that I missed the switch away from being a semi-log plot. Gotta have that.

          • jensend
          • 8 years ago

          Well, the original wasn’t a semi-log plot either, it just happened that his choice of seven points kinda-sorta-approximated one.

          If you remove the point at 66% and insert one at ~97.7% it becomes quite close to the right log scale for seven points from 50% to 99.5% (50%, 76.8%, 89.2%, 95%, 97.7%, 98.9%, 99.5%).

          Also, 50% and 99.5% were reasonable choices for endpoints when he was just doing seven. As I already said, I like the choice of starting with the median to give us perspective. How do we choose the right endpoint? The further you go the less data you’re basing your plot points on. Since Damage is collecting 2000-8000 frames of data on each run, there’s data enough to plot points out to at least 99.95%, like I did on my above demonstration graph, but the specifics of the last few outliers may vary too much from run to run to give us reproducible results beyond ~99.8%.

          (Worth mentioning in connection with the right endpoint- with the most natural ways of doing the log scale transformation, 100% moves off to infinity.)
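
          (For anyone who wants to check those percentages, they're just log-spaced tail probabilities; a quick sketch:)

          import numpy as np

          # Seven points whose tail probabilities (1 - p) are log-spaced from 50% down to 0.5%.
          tails = np.geomspace(0.5, 0.005, num=7)
          print([f"{1.0 - t:.1%}" for t in tails])
          # ['50.0%', '76.8%', '89.2%', '95.0%', '97.7%', '98.9%', '99.5%']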

      • jensend
      • 8 years ago

      BTW have you ever considered releasing one of your raw datasets? I can see plenty of reasons why you might not want to do that all the time, but if you had one example where the raw frametime sequences behind a plot like this were available for download, I bet I’m not the only one who would be interested in taking a look at it and tinkering around with it.

    • Tamale
    • 8 years ago

    This is awesome stuff, but I do agree that the selected points are a potential pitfall.. much better to plot the whole function on a logarithmic Y axis from 50% to 100%.

    • Bensam123
    • 8 years ago

    I still find the use of the term ‘frame time’ confusing and rather unintuitive. Why not just use latency in this particular instance? You wouldn’t need to look up the definition, and the units would even be listed on the chart instead of %s.

    I understand TR is trying to push frametime as a buzzword, but still. :l

    • Chrispy_
    • 8 years ago

    I’ve referred people to TR’s Inside the Second review since I first read it, notably a couple of bloggers in the game developer business.

    It’s by far the best way to represent smoothness to date.

    I have no problems interpreting the ms graphs, but would it be possible to put two lines on the graph, or to add a band highlighting the desirable range from say 17ms to 33ms? Less than a consistent 30fps is undesirable, and for most people the vsync is 60fps, meaning that extra performance is wasted.

    • Applecrusher
    • 8 years ago

    Love the new data you are all giving out for the tests, almost feel spoilt getting to see it.

    The only issue I ever have, though, and it’s an issue most sites have, is that I am colour blind and quite often I am unable to tell one line from another without reading the text and referring back several times.

    E.g. when I first glanced at the graph I could not tell the 560 Ti from the 7870, or the 6970 from the 560 Ti.

    Although it’s not a real issue - as I said, a quick read of the text and just knowing where a card should sit in the graphs is enough to get over it - I was wondering if the graphs could be made interactive?

    Maybe when hovering over the 5870 in the key it could highlight it on the graph? Or click items in the key to turn them off?

    I understand this would be a time-consuming undertaking, so I don’t really expect anything… But I would love it if it, or something similar, were implemented.

      • BIF
      • 8 years ago

      I am not color blind, but I support this request.

      Maybe implement a feature that makes a line “wiggle” when you float your cursor over it?

      That way you wouldn’t need to provide multiple color palettes.

      • yogibbear
      • 8 years ago

      Another suggestion to fix colourblind issues is to overlay the lines with X’s, O’s, diamonds, and whatnot in addition to the different colours… Yes, it’s a PITA, but if there are enough people out there who have this problem then I guess you could consider it. (At my work my reports are read by a lot of people, so I have had to modify graphics for colour blind people in the past, which pissed me off at the time and made the graphics look more fugly, but now more people can read them.)

      • internetsandman
      • 8 years ago

      I actually love this idea. I’m not color blind, but in some of the tests the lines are so thin (and my monitor has such a high DPI) that telling different shades of green or red apart for the latency graphs is near impossible. Having a way to mouse over a line and have that line highlighted on the graph and in the key, while the rest of the graph fades out for clarity, would be amazing

    • tygrus
    • 8 years ago

    Suggestions:
    Since the 50% mark is the median, could you add the mean (average) to the left and the minimum frame rate (the point where 100% of frames were rendered faster than # ms) to the right on the x-axis? It’s going to get hard to see separation between the lines when you try to show more cards, or when a really slow card re-scales the y-axis. The data could also be expressed as the % of frames that beat a given FPS rate, e.g. 10% were faster than 100 fps, 25% were faster than 80 fps, 50% were faster than 60 fps, 90% were faster than 40 fps.
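
    Something like this rough sketch could generate those numbers (the file name and thresholds are only examples):

    import numpy as np

    frame_times = np.loadtxt("frametimes_ms.txt")  # hypothetical per-frame log in ms

    for fps in (100, 80, 60, 40, 30):
        budget = 1000.0 / fps                      # frame-time budget for that rate
        share = np.mean(frame_times < budget)      # fraction of frames under budget
        print(f"{share:.0%} of frames were faster than {fps} fps ({budget:.1f} ms)")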

    • tbone8ty
    • 8 years ago

    [quote<]"We should have them all implemented in a tasty new review soon."[/quote<] GTX 680 here we come!

    • jamsbong
    • 8 years ago

    This is a very good way to highlight latencies. As mentioned before, latency is the bit that affects the gaming experience, while high fps values simply give you bragging rights.

    Hopefully, this method will highlight strange framerates in xfire/sli setups.

    You could also use a log scale on the y-axis (instead of linear) to separate the faster cards from the slower ones more clearly, but I’m not sure the general audience would understand log scale charts.

    Good work! Do you guys know if Nvidia/ATI or game devs use latencies as an indicator to optimise their games?

    • marraco
    • 8 years ago

    Tech Report keeps getting better and better, and gets the edge over so many websites reporting all the same information.

    This concept avoids forcing an arbitrary limit on the latencies that matter.

    I only have a concern over the percentile: it should not lose the vital information about the worst frame latencies by hiding it beyond the chosen percentile, nor the total number of frames. Let’s say a card produces 10 perceivable freezes, but they sit out at the 99.9th percentile because it renders so many frames. Meanwhile, another card is much slower but produces only one such freeze, just because it is slow. That information may be lost behind a 99th percentile snapshot. The worst frames should be sorted and compared directly, apples to apples.

    Also, the concept of a percentile may be too complex for some people, alienating some readers. That’s why I would show all the frames. That would show the worst frames, and also the number of frames rendered. A zoom may be necessary on the low-latency side to show the information in the key part of the chart.

    Information is better when it is rich, but also simple.

      • eofpi
      • 8 years ago

      [quote<]Let’s say a card produces 10 perceivable freezes, but they sit out at the 99.9th percentile because it renders so many frames. Meanwhile, another card is much slower but produces only one such freeze, just because it is slow. That information may be lost behind a 99th percentile snapshot.[/quote<]

      That would require the faster card to be [b<]ten times faster[/b<] than the slow card, on average. If a review is comparing cards that people are likely to be cross-shopping, that will never happen. And even if a review had such a wide range of cards (e.g. old vs. new upgrade prospects), other graphs would show it clearly, particularly the plot of each frame time for each card.

        • marraco
        • 8 years ago

        I made up an extreme example just for illustration.

          • eofpi
          • 8 years ago

          And my second point still stands in less extreme examples: the frame time histogram will still show approximately how many latency spikes there were per card, as well as the number of total frames, and you can ballpark the frame counts from those graphs to use as a mental scaling factor on the percentiles.

          As I understand it, this new proposed percentile graph is supposed to be in addition to the other graphs (except maybe the percentile bar graphs), so the percentile line graphs won’t actually remove any current data presentations from future reviews.

      • BobbinThreadbare
      • 8 years ago

      The vertical scale is how long it takes to render a frame, not some arbitrary latency rating. If a card is slow, it’s going to have a higher bar the entire length of the graph.

    • MrJP
    • 8 years ago

    I very much like the new plot. You’re rapidly heading back towards this kind of thing: [url<]https://techreport.com/r.x/core-i7/cm3d-phenom-x4-9950.gif[/url<]

    P.S. I concur with all the calls for frame rate axes as well as or (preferably) instead of frame times.

    • Vulk
    • 8 years ago

    I really like this. This is a lot more useful than some of your earlier attempts to show this data.

    As to some of the criticism on the thread:

    I understand why people are calling for frame rates: they’re familiar, and no one else is really releasing information like this, so it’s hard to compare reviews if this is all you have. But this clearly shows useful information that FPS doesn’t. I’m sure the 560 looks great in FPS, yet this is highlighting its memory limitations and how they may affect gameplay.

    I have no problem reading this as is. I’m not sure inverting it so that the top performers are on top really brings anything to the table.

    My concerns:

    It’s difficult to spot when 2 cards are nearly neck and neck, like the 580 and 7870 appear to be.

    I have a fairly good handle on my primary monitor’s latency (18ms start to finish, which is bad for gaming), so I can roughly calculate when a given card would saturate the screen refresh rate. It might be nice to get some rough baselines for 60Hz TN, 120Hz TN, and IPS panels. You couldn’t get too exact, I’m assuming, given the spread you’d expect across monitors from various manufacturers, but a general statement of what to expect would help. If I just do the quick math and divide 1000/60, once we dip below 16.7 ms we’re past the ability of most LCD monitors to display new information, at which point I’m guessing it’s time to turn up the detail settings, get a faster monitor, or get more pixels to drive, because the fancy new card isn’t doing much to improve the visual experience, correct? I’m assuming my math isn’t strictly true either, since that is just the rate at which the controller sends information to the LCD lattice, which then takes additional time to twist into the correct state to display it…

    Hence my request: a rough baseline for a standard monitor of a given type, if that’s possible. If you could then display those as broken lines across the graph to give some sort of visible reference, you might not need to worry about individual FPS numbers for each card. Even a static line showing the frame time needed for 60Hz might make things clearer to most viewers, because it would indicate 60fps if nothing else.

    Just a thought: I’ve been wondering at what point we stop worrying about high-end performance because we’re just discarding most frames anyway, since they can’t be displayed by the monitor. Frame render times make that somewhat moot now, but I’m wondering how much longer that lasts, because the display will eventually become the ultimate constraint on the illusion of movement. At a certain point you’re just buying bragging rights, future-proofing, or the ability to drive ridiculously sized monitor(s), and this might give some indication of when that is.

    • Mourmain
    • 8 years ago

    I like it, but I have a suggestion for something that might be better. The thing that seems awkward to me is the way we have the worse values for the larger percentile numbers.

    Maybe a histogram with frame duration on the horizontal and number of frames on the vertical would be easier to read: we’d be able to see how the frame durations spread out. The problem might be that small numbers of frames would be hidden beneath the bulk of the distribution, so the vertical axis would need to be logarithmic to bring the tail end of the histogram up into view.

      • BobbinThreadbare
      • 8 years ago

      I don’t see how you show multiple cards at once with a histogram.

    • jensend
    • 8 years ago

    What you’re trying to do here is basically just plotting the quantile function (inverse cumulative distribution function) of the frame distributions, but with only seven arbitrarily selected, unevenly spaced probability points, and with misleading straight lines drawn between those points.

    Why only use these seven points? Why not do a full quantile function plot? It’s easy to do, and it presents a lot more information in a less-misleading way. If you want to concentrate on the right tail of the frame distribution you can just change the scale of the probability axis.

      • jensend
      • 8 years ago

      See also Tufte’s “The Visual Display of Quantitative Information” re: graphical integrity, “data-ink,” and data density. (Since you’re in the business of journalism- reporting and interpreting data- if you don’t already have a well-thumbed copy it will be the most important $25 you will ever spend.)

        • JustAnEngineer
        • 8 years ago

        I have a couple of Tufte’s books. They should be mandatory reading for reviewers.

        I believe that what Damage really wants here is a histogram.

          • jensend
          • 8 years ago

          I agree that a histogram or a [url=http://en.wikipedia.org/wiki/Kernel_density_estimation<]kernel density estimator[/url<] (an empirical approximation of the [url=http://en.wikipedia.org/wiki/Probability_density_function<]PDF[/url<]) would definitely be a good way to show more of the information from a single frame time distribution. You could do pairwise comparisons this way too. However, as you increase the number of histograms/KDEs in the same plot beyond two or three I think you'd quickly get incomprehensible graphs.

          Whether a card has more/fewer frames in a single bin than its rivals doesn't tell you anything about whether it's worse/better in isolation; you have to look at more of the distribution-- but that would be really hard, because there will be tons of overlapping/crossing points. Integrating (resulting in the empirical CDF) gets rid of many of the crossings and also allows for a pointwise comparison to be meaningful; inverting/flipping axes means that we have more readily-interpretable "scores" on the y-axis. So I think the quantile function (which is what Damage is grasping at here but not reaching yet) is a reasonable choice here for both showing a lot of information about the distribution and being able to compare a group of cards.
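
          As a rough illustration, a KDE comparison of two cards takes only a few lines; this sketch uses synthetic data standing in for real frame-time logs, which aren't published:

          import numpy as np
          import matplotlib.pyplot as plt
          from scipy.stats import gaussian_kde

          rng = np.random.default_rng(1)
          # Synthetic stand-ins for two cards' frame-time logs in ms (assumptions, not real data).
          card_a = rng.gamma(8.0, 2.0, 5000)                   # consistently smooth card
          card_b = np.concatenate([rng.gamma(6.0, 2.0, 4800),  # mostly faster card...
                                   rng.uniform(40, 70, 200)])  # ...with occasional spikes

          grid = np.linspace(0, 80, 400)
          for name, data in (("Card A", card_a), ("Card B", card_b)):
              plt.plot(grid, gaussian_kde(data)(grid), label=name)

          plt.xlabel("Frame time (ms)")
          plt.ylabel("Estimated density")
          plt.legend()
          plt.show()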

            • jensend
            • 8 years ago

            The following idea may help with intuition about why integrating the PMF helps. Seeing exactly how it applies to the above discussion about arriving at the quantile function is more of a mess, and to some extent I’m just bringing it up to try to increase visibility of this idea, which I’ve brought up before and which I don’t think the TR eds really noticed.

            A good way to think about how frame time distributions give us meaningful comparisons between cards is that the “badness” of a card should be the integral of some weighting function b(x) (how perceptually bad a delay of x between frames is) with respect to the distribution of frame times.

            (Our empirical distribution is discrete rather than continuous, so an integral with respect to it is just a sum over all frame times of b(x)*p(x) where p(x) is the probability mass function i.e. the proportion of frames which took exactly x ms.)

            The metrics we’ve been using so far [i<]all fall into this pattern - it's just that they use relatively simplistic and unrealistic choices of b(x)[/i<]. Mean frame time is b(x)=x (avg fps just unnecessarily inverts the result). The "frames above n ms" score uses a step function: b(x)=0 for 0 < x <= n, b(x)=1 for x > n. "Time spent above n ms" uses b(x)=x for x > n instead.

            Ultimately, the best results would come from using a b(x) that was derived from careful perceptual testing. However, we don't need any testing to know that it definitely should be monotonically increasing, strictly increasing once it's > 0, and continuous. It should have a more than linear growth rate, i.e. b(x+y) > b(x)+b(y) for all x,y > 0 - it's always better to output more than one frame in a given amount of time - and that would automatically take care of problems comparing cards that produced different numbers of frames in their test periods.
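
            To make that concrete, here's a sketch that writes the familiar metrics as b(x)-weighted averages over a run. The 50 ms cutoff and the file name are just examples:

            import numpy as np

            def badness(frame_times_ms, b):
                """Average badness of a run: mean of b(x) over all frames."""
                x = np.asarray(frame_times_ms, dtype=float)
                return b(x).sum() / x.size

            cutoff = 50.0  # example threshold in ms

            metrics = {
                "mean frame time (b(x)=x)":            lambda x: x,
                "frames beyond cutoff (step b(x))":    lambda x: (x > cutoff).astype(float),
                "time beyond cutoff (b(x)=x, x>n)":    lambda x: np.where(x > cutoff, x, 0.0),
                "mean squared frame time (b(x)=x^2)":  lambda x: x ** 2,
            }

            frame_times = np.loadtxt("frametimes_ms.txt")  # hypothetical per-frame log
            for name, b in metrics.items():
                print(f"{name}: {badness(frame_times, b):.2f}")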

            • ImSpartacus
            • 8 years ago

            Are you like a Stats student or something? I’m an Actuarial undergrad, so I’m learning basic probability in my third year, but I’ve never heard of a kernel density estimator (or anything practical for that matter…).

            • jensend
            • 8 years ago

            I’m in the early stages of a math grad program, trying to figure out what kind of direction I want to go. I’ve had a couple grad-level measure theory based probability theory classes, and I learned some more practical stuff incl. basic Bayesian analysis, the Metropolis and Gibbs sampling algorithms, etc in an undergrad research project group. While in that group they had us do a fair bit of stuff in Matlab, and so I was using Matlab’s ksdensity function for about a month before I wondered “what is the magic behind that, anyway?” and looked up more information about kernel density estimators on the web.

            • jensend
            • 8 years ago

            BTW the simplest b(x) which would fulfill the above requirements would just be b(x)=x^2. So just square the frame times, add them all up, and divide by the number of frames produced. This should produce a better metric for comparing cards than anything we’ve used so far.

            Some others (esp. Bensam123) have suggested using the variance, which is similar (squaring the difference between the frame time and the mean rather than the frame time). While that would be a useful statistic, especially if it came with a histogram/KDE so we could see other characteristics of the distribution, it’s like having b(x) actually decrease until it’s zero at the mean frame time, which is clearly not going to give us the kind of single metric I’m talking about. Frame times x ms above the mean are much much worse than frame times x ms below the mean, and frame distributions are all skewed differently.

          • Bensam123
          • 8 years ago

          I second this.

          I talked quite a bit about using distributions and reporting the variance when frame time was first introduced, but I think the overall goal of the whole ‘frame time’ thing is to generate buzz. A lot of what Scott is doing has already been done in statistics, as Jensend pointed out and as you are pointing out too.

      • ImSpartacus
      • 8 years ago

      You took the words out of my Probability-failing mouth. And by ‘fail,’ I mean a C+ and by mouth, I mean my keyboard.

      But seriously, the entire function would be very nice.

      • cynan
      • 8 years ago

      Theoretically, you are correct: it makes much more sense to just plot the full quantile function on some continuous scale (rather than just at arbitrary percentage points). However, I think such plots would be rather informationally dense for the average reader to digest.

      I think in order for such plots to be meaningful to a wide audience, there would have to be some sort of validation testing to determine the proportion of frames (not) rendered within a given unit of time that is detectable (to most people). To do this, the x-axis scale might reflect an ordinal scale comprised of the proportion of people that notice delayed or dropped frames within a given unit of time. If you were plotting a single GPU at a time, you could include confidence intervals to represent the variance of this metric over a number of repeated trials using the same GPU and configuration. But this would require a lot more resources than tech journalism sites could muster.

      In summary, I think what Scott has presented is an improvement over the status quo. While what you suggest is theoretically superior, I don’t know if it will equip the average reader any better, given the overall arbitrariness of these metrics to begin with, without some sort of validated metric that actually places these measures in context and relates them to the number of dropped frames that is actually perceptible (to a given proportion of people).

        • jensend
        • 8 years ago

        Have a look at his revised plot which shows a continuous quantile function. I really don’t think it’s too dense for people– indeed, I think the continuous curve is quite obviously easier, not harder, to interpret than the version with seven data points, an abruptly and arbitrarily changing scale, and confusing (non-informative and slightly misleading) straight lines between the data points.

        People usually get overwhelmed when information is badly presented, when too much of the graphical detail doesn’t connect to meaningful information, etc. Much more common than “information overload” is an overload of [i<]non-information[/i<], of things that don't convey any data, present irrelevant data, or misrepresent the important data. However, I totally agree that we haven't hit on a perfect metric yet and that we need to pay attention to perceptual issues. Did you read my above comment (reply to myself in replying to JustAnEngineer) about metrics and a realistic "badness function"?

          • cynan
          • 8 years ago

          I saw the revised plot just after posting. It is an improvement (though in Scott’s particular example I would choose a scale that better depicts the variation between 95 and 100% of frames rendered as this is where all the variation occurs). This metric is better than anything I’ve seen or could think of to compare graphics cards’ ability to render fluid motion. But this comparison is only relative, and without further validation, largely esoteric. I don’t know what Skyrim rendered with a 20 ms per frame peak rate and 1.5% dropped frames versus a 15 ms per frame peak rate and 2% dropped frames looks like. Do you?

          I’ll admit that I did not have a good grasp of what you meant when I first skimmed your post about your idea for a “badness” function. However, in order to validly (accurately within your target audience) relate these sorts of metrics to human perception, I don’t see how you could avoid the need for repeated trials within a large enough sample to first estimate when people can perceive disruption in fluid motion, and to estimate the amount of variation or confidence about these estimates. The reason is that it is unlikely that the human perception threshold of non-fluid 3D-rendered motion is attributable to a single physical quantity (i.e., the number of frames per second the average human eye can perceive, even adjusted for age or visual acuity). Furthermore, one rendered scene on a given platform may look perfectly fluid with a 15 ms per frame peak rate and 2% dropped frames, while another scene may not, due to differences in motion blur or brightness/darkness contrast between frames…

          Hence, you would need to decide on a gold standard rendered sequence and play it to a large enough sample (probably with repetition among subjects to account for intra-individual variation) at a series of peak frame times and percentages of dropped frames in order to actually determine what is meaningful to most people. Once you have these data, you would then pick or derive the function that best describes the relationship between these quantities. But as I said in my previous meandering post, I don’t think this is actually feasible in the realm of tech journalism, though it may make for an interesting experiment/analysis.

            • jensend
            • 8 years ago

            On the scale question, see my reply to Damage’s post, where I propose a log scale and show what the basic graphical effect would be.

            I don’t know what you mean by “dropped frames.” This isn’t like streaming media over an unreliable connection. It’s not as though the game engine says “well, it’s been 24ms and that frame hasn’t finished, let’s just trash it and move on to the next one” either.

            There’s a good reason you “don’t see how (I) could avoid the need for repeated trials within a large enough sample” of observers rating the perceived smoothness/responsiveness of gameplay- I specifically acknowledged that using such testing to help establish an empirically-based “badness function” would be necessary to get the best results. Perceived quality is ultimately the only thing that counts; rigorous and conclusive blind tests are extremely costly and time-consuming, so to make it possible to give meaningful and timely hardware reviews without employing huge armies of full-time testers we need metrics that correlate well with perceptual testing.

            You say that you don’t think such an empirically-based metric is feasible in tech journalism. I heartily disagree. Many other areas where technology interfaces with human perception have seen tons of effort go into developing metrics and improving their correlation with perceptual testing. See for instance PEAQ (general audio quality), PESQ (speech quality), SSIM and PEVQ (video quality), etc. In fact, I find it startling that an effort like that hasn’t already been made for gameplay/variable framerate 3d.

            Nevertheless, you don’t need absolutely perfect correlation with perceptual testing to have a useful metric– or a metric that improves dramatically on the somewhat-useful ones in current use. I really think the simple b(x)=x^2 is a solid improvement on what has been tried so far.

            Although a perfect metric might need to take many other factors outside of the frame time distribution into account (you mentioned motion blur and lighting), I think a very high correlation with perceived quality can be achieved just looking at the distribution. If we find a need to go further than that my guess is that the next step would be to try to figure how different frame time sequences/orderings affect the perceived quality (e.g. how does the impact of a cluster of high frame times compare to having one long frame time every five seconds). Motion blur effects and lighting will impact all cards somewhat similarly so I don’t think they’re at the top of the list of factors.

    • mcnabney
    • 8 years ago

    Nice chart. Since you identified the problem with the 560ti as a RAM capacity issue, could you identify what the resolution and settings being tested were?

    • Peldor
    • 8 years ago

    This level of detail, while interesting in some respects, has jack and squat to do with the overall picture of “Which card is better and which card am I going to buy” (or at least drool over).

    For an optimization guide for a single game, this might potentially be useful (if you are willing to make several dozen runs under various conditions). For an overall comparison on cards across games, I think you are creating very large datasets that provide little practical advantage over the average and minimum (or 99th percentile) framerate. There’s only one place those lines cross in any way that might be considered meaningful, and then only with a +5 girdle of pedantry.

    Put another way, if you have to dig this hard to find a difference, the difference is not actually meaningful.

      • mattthemuppet
      • 8 years ago

      I don’t know why you got down voted for this. Sure, more information is nice, but the original point of showing frame render times was to illustrate which cards produced noticeable drops in rendering “smoothness”. If this new way of presenting data further helps differentiate between cards in terms of visible/ noticeable slowdowns, that’s great. Otherwise, it’s just noise.

    • Stargazer
    • 8 years ago

    I think I like this graph.

    It might be nice if you had finer granularity though. (Oddly enough, it *looks* like many of the curves are “only” using values for 50%, 75%, 95%, 99% and 99.5%. Just the way things work out, I guess…)

    Since you have the raw data, wouldn’t it be possible to automatically generate continuous values instead of limiting yourself to a certain number of points (or at least use a much finer distribution)?

    • BobbinThreadbare
    • 8 years ago

    This seems perfect.

    Once again the solution is more graphs 🙂

    • jjj
    • 8 years ago

    I’m so not gonna have time to study every graph….

      • derFunkenstein
      • 8 years ago

      yes they’re giving too much info. How dare they! I mean, look at this big pile of data! Why can’t they do my thinking for me?

      Or just check out the graphs that you find pertinent.

        • Farting Bob
        • 8 years ago

        They should just give everything a rating out of 10 (starting at 7, naturally).

        [quote<]So here it is you guys, our review for the nvidia 680: 9/10. Buy.[/quote<]

          • Stargazer
          • 8 years ago

          I think it should go to 11.

          • dpaus
          • 8 years ago

          In some cases, it could be simplified further:

          “Our review of New iPad: Rating: irrelevant, it’s an Apple product; buy it or be scorned by hipsters.”

      • marraco
      • 8 years ago

      That’s why I say that charts should show sorted instantaneous FPS instead of percentiles of latency. Migrating readers should not be confused and scared.

    • ste_mark
    • 8 years ago

    I always think that better performance should be on top. It is more intuitive. If you translate the peak frame times into equivalent fps the best cards go on top…

      • cobalt
      • 8 years ago

      Agreed.

      Furthermore, I know intuitively what it means when you get a framerate dip into the 13 FPS range, but I don’t intuitively know what it means when you get a frame latency spike into the 70s. I have to pull out a calculator and actually do the division (1000/latency) to change the numbers on the chart into a number I can wrap my head around.

        • EtherealN
        • 8 years ago

        While I agree that it is (currently) a bit unwieldy to wrap my head around the frametimes, I don’t feel there’s anything intrinsic to the metric that makes it hard to understand at the intuitive level – it’s only the fact that it’s a new way of doing things.

        Compare to the whole mph/kmh thing for automobiles, or (for that matter) knots/kmh in aircraft. IRL I get confused whenever I have to deal with something that isn’t kmh, be it in an automobile or an aircraft – but that doesn’t mean mph or knots or feet/minute etcetera are bad or confusing intrinsically, it just means I’m not used to them. If I were to move to the US and live there for a couple years I’m sure it would “fix itself”, so to speak. Same goes for the frametimes vs FPS thing, except that here we can distinctly show a specific advantage to using one over the other.

        Ensuring that we don’t jump around too abruptly between “higher is better” and “lower is better” graphs definitely has some merit though, but I’m not sure how I’d go about doing that.

          • cobalt
          • 8 years ago

          [quote<]I don't feel there's anything intrinsic to the metric that makes it hard to understand at the intuitive level - it's only the fact that it's a new way of doing things.[/quote<] Of course! I totally agree: I could, eventually, get used to frame times instead of frame rates. At least for the kilometers/miles thing, there's some justification (namely the metric system) for why one is a better metric. In the case of frame rates versus frame times, though, everyone generally uses frame rates, including other charts here at TR, so this is one of the few -- or only -- times we see frame times used. That makes it just a bit harder to internalize.

            • EtherealN
            • 8 years ago

            True, but I wouldn’t mind seeing frametimes become the norm.

            What I mean is that the difference between the FPS/frametime comparison and the mph/kmh one is that in the latter, there is no intrinsic advantage to using either unit. The metric system is of course superior in scientific applications, but as a “regular joe” there’s no difference. Switching from FPS to frametimes, on the other hand, does actually give new and valuable information. That’s basically what I was getting at. 🙂

            I think the idea someone put up of having a simpler marking of FPS equivalents is a good idea though, at least until Frametimes become standard. For example, the graph up in the original piece here could have the frametime bars at the left, and on the right side where there is nothing they could put in the FPS equivalents. Might work I think, and does so without cluttering the graph itself too much.

            • jensend
            • 8 years ago

            But there *is* plenty of justification here. I’m just going to quote myself here: since what we really care about for game performance is whether frames are rendered quickly enough to give satisfactory reaction times etc, using frames per second is completely misleading. We need the inverse measure.

            Another example where the same “inverted measure” thing happens is fuel consumption: we keep talking about miles per gallon, but what we primarily care about is the fuel consumed in our driving, not the driving we can do on a given amount of fuel, so this is misleading. To use wikipedia’s example, people would be surprised to realize that the 4mpg move from 15mpg to 19mpg (saving 1.4 gallons per 100 miles) has [i<]more than twice the environmental and economic impact[/i<] of the 10mpg move from 34mpg to 44mpg (saving 2/3 of a gallon per 100 miles). In metric nations they've made this adjustment: they use liters per 100 km instead of km/L.

            Similarly, moving from 24 fps to 32 fps has a considerably bigger impact on the illusion of motion, fluidity, and response times than moving from 40 fps to 60 fps or from 60 fps to 120 fps (a 10.4 ms difference vs 8.3 ms differences in frame times). Seeing the first as an 8fps difference and the latter as 20 and 60 fps differences is completely misleading.
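
            The arithmetic, for anyone who wants to verify it (a trivial sketch):

            # Fuel: gallons used per 100 miles is just 100 / mpg.
            for lo, hi in [(15, 19), (34, 44)]:
                print(f"{lo} -> {hi} mpg saves {100 / lo - 100 / hi:.2f} gallons per 100 miles")
            # 15 -> 19 mpg saves 1.40 gallons per 100 miles
            # 34 -> 44 mpg saves 0.67 gallons per 100 miles

            # Frames: milliseconds shaved off each frame is 1000/fps_low - 1000/fps_high.
            for lo, hi in [(24, 32), (40, 60), (60, 120)]:
                print(f"{lo} -> {hi} fps shaves {1000 / lo - 1000 / hi:.1f} ms off each frame")
            # 24 -> 32 fps shaves 10.4 ms off each frame
            # 40 -> 60 fps shaves 8.3 ms off each frame
            # 60 -> 120 fps shaves 8.3 ms off each frame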

            • cobalt
            • 8 years ago

            I like the example of why L/100km is better than MPG (or km/L). Obviously I’ve not visited that wikipedia page, but it’s pretty convincing. However, the reason it works there is because of the assumption that you generally drive a fixed distance, and by having that in the denominator, a direct subtraction (of how many liters of gas you use for a given instance) gives you a meaningful result.

            I’m not sure it’s quite as applicable here, because many thresholds (like the oft-quoted 30FPS and 60FPS numbers, for example, as well as a monitor’s refresh rate if vsync is enabled) are fixed values. For example, I often hear from reviewers that going from 20FPS to 30FPS (a 16.7ms latency difference) is much more important than going from 30FPS to 60FPS (also a 16.7ms latency difference).

            Having said that, I’ll repeat again: I’m fine with using frame times instead of frame rates. And you’ve actually done a good job convincing me we should.

            However, all the arguments I’ve heard that this one particular type of measurement is somehow special have been specious. This particular measurement is not different than other measurements; just because we use percentiles other than 50% (the median frame rate, or an average frame rate) and we use the 99.5th percentile instead of the 100th percentile (the min frame rate) doesn’t make this the one place deserving of the exact opposite metric than every other chart in the reviews.

            In other words, if we’re going to agree that frame times are better, let’s go ahead and use them throughout!

      • superjawes
      • 8 years ago

      You have a point…you know…the conversion is easy. Just do a 1/(Frame Time) and you will get frames per second…I would have to get into Excel and make sure that it didn’t do anything wonky with that, but assuming it goes smoothly, you could directly extract that information from frame times.

      • Palek
      • 8 years ago

      [quote<]I always think that better performance should be on top. It is more intuitive. If you translate the peak frame times into equivalent fps the best cards go on top...[/quote<] No need to translate anything. Just flip the Y axis upside down.

    • guruMarkB
    • 8 years ago

    Graphing the performance this way does make it easier to see which GPU gets bogged down at HD resolution. I understand that time limitations prevent you from showing more GPUs, but you have chosen a 5870, 6970, 7870, and 7970 to test the Radeon performance in Skyrim. Since the 6870 is outperforming the 7770 (shame on you AMD for crippling it with a 128-bit memory interface) and is in the sweet spot in the $160 - $200 price range for mid-level performance, it would be nice to have it represented. For example, I own an MSI Hawk R6870 and at a 930 MHz GPU clock speed it is a fantastic choice to represent the 6870 GPU. You probably will need to use the MSI Twin Frozr II to have a stock 6870 clock (or you can throttle the Hawk down to match the baseline GPU speed). Your performance testing methodology and results were a major factor in deciding which GPU would replace my aging 5770, so thank you very much for all the hard work. Just remember to get enough sunlight daily so the hardware testing doesn’t drive you to insanity.

    • xeridea
    • 8 years ago

    I Heart.

    • ALiLPinkMonster
    • 8 years ago

    Scott, you’re a damn prophet in the world of PC hardware reviews.

    • Farting Bob
    • 8 years ago

    I very much like it. Also, relatedly: today I preordered a 7850, and Skyrim is coming in the post (yes, I didn’t jump on the bandwagon earlier because my current GPU would be a slideshow at any decent res).

    In situations where all the cards are very similar (like if you took out the 560 here), you could zoom in more on the 99%+ points to really see the difference (so maybe start the chart at 95% rather than 50).

    • TravelMug
    • 8 years ago

    That’s pretty sweet. Are these graphs always going to start from 50% for the sake of consistency?

    • sweatshopking
    • 8 years ago

    It’s a solid improvement. Implement it, and I’d second flip’s request about fps as well.

    • Silus
    • 8 years ago

    Tasty review being the GTX 680 (and probably GTX 670).

    • flip-mode
    • 8 years ago

    That’s excellent!

    I’d still ask you to consider putting the corresponding frame rate in parentheses next to the frame time on the graph labels.

    And I also want to point out that the thickness of the lines that you used for the graph legend above is ONE MILLION times better than the extremely thin lines that are typically used in your graph legends. Sometimes I just cannot make out which color is which in the legend because the lines are so thin and they are probably only using one sub-pixel on my monitor.

      • dpaus
      • 8 years ago

      Who down-voted you for that?!?

      Fixed.

      • SomeOtherGeek
      • 8 years ago

      I totally concur on both points.

      Scott and Team, keep up the awesome work!

      • Alchemist07
      • 8 years ago

      Yeah, simple enough.

      1000/frame time.

      i.e. 1000/20 = 50 fps.

      Of course this assumes that you maintain that frame time for the whole second… what would that be… frames per second per second lol (50 fps/s or fpss)? Could add another Y-axis on the right for it.

        • derFunkenstein
        • 8 years ago

        For that last paragraph: it would not be FPS/S (which I assume is why someone downvoted you). It would be FPS average over the course of time X.

          • Alchemist07
          • 8 years ago

          If you divide 1000/frame time, that is frames per second, per second.

          That wouldn’t change regardless of time. For example 2 seconds would be 2000/20 = 100 frames per 2 seconds = 50 frames per second, per second……

          maybe I’m being dense but that makes sense to me…

            • Alchemist07
            • 8 years ago

            why the down votes lol?

            • flip-mode
            • 8 years ago

            Dunno. While mathematically I think you are correct, I think FPS/S doesn’t help conceptualize the metric in a human way. I think it’s more helpful to conceptualize it with the term “instantaneous FPS”, which basically means that if all frames took the exact same amount of time to render as the frame in question, then there would be X many FPS rendered.

            • jensend
            • 8 years ago

            No, he’s absolutely not correct. Inverting ms/frame gives you frames/ms. Multiplying by 1000 gives you frames per second. The question about what kind of granularity you’re measuring this with has nothing to do with what units you’re measuring.

            Whether you measure 3456000 frames in a day or one frame in 25ms you’re still measuring 40fps; it’s just that the granularity of your measurement is different.

            Whether you find ten people in one acre or 64,000 people in ten square miles you’re measuring 6400 people per square mile either way. You’re not measuring “people per square mile per acre.”

          • marraco
          • 8 years ago

          I would call it “instant FPS”. A mathematician would die in anger, ranting about Heisenberg, but it is an intuitive, simple concept, and the nerds who go that deep don’t need the world simplified for them.

          The common man understands that an instantaneous 20 FPS means an annoying temporary freeze. The rest don’t need that kind of hand-holding.

            • jensend
            • 8 years ago

            Resident mathematician here. It is definitely FPS, and while calling it “instantaneous FPS” isn’t “exactly correct” it’s perfectly fine - nobody’s going to be misled by that and it conveys what you mean.

            The reason it isn’t “exactly correct” has nothing to do with Heisenberg. It’s just that “instantaneous FPS” should mean the derivative of the function f(t)=number of frames produced before time t, and since the number of frames jumps from one natural number to the next (0,1,2… we can’t have produced pi frames or even 3.5 frames, we’ve either produced 3 or 4) it’s not continuous, therefore not differentiable.

            A derivative is the limit of difference quotients as they become infinitely fine-grained, and when you invert each frame’s frame time that’s the finest-grained difference quotient that makes any sense for our discontinuous function, so “instantaneous fps” isn’t misleading.

            (BTW our time axis is also only discrete times rather than a continuous variable- the frame times we measure come from system timers with a particular precision limit, probably around a ten-thousandth of a millisecond, which is many orders of magnitude above the point where we would start worrying about Heisenberg.)

        • stdRaichu
        • 8 years ago

        In short, no.

        Just because something is generating 1000fps doesn’t necessarily make it appear smooth to the eye; if a graphics card renders 100 frames in the first tenth of a second but then locks up for 0.9s on frame 101, it’ll appear as a stutter to the user but the average will still keep reporting 100fps. Prospective customers see “Wahey! 100fps in FPS Du Jour” and equate that to “smooth as warm butter from a duck’s arse” but if every Nth frame takes over 1/100th of a second to render, then it isn’t *really* 100fps.

        100fps is one thing, 100fps with each frame being rendered in 1/100th of a second is another. TR’s argument is that merely listing average frame rates doesn’t tell the whole story, and frame rates over time are a more realistic (and critical) benchmark. Listing games/cards/drivers where the time to render a frame frequently exceeds a certain value is a) a great guide for users and b) more of an impetus to get devs/card manufacturers to fix their gubbins.

          • Alchemist07
          • 8 years ago

          Ugh…I thought the idea was to give the user an IDEA of what a 20 ms frame time means in terms of fps.

          converting 20 ms to fps, will do that.

          Of course if you have bumps along the way, that will create lag in the video and that is why we look at the frame times, which don’t gloss over the “bumps”.

          For example it’s useful to know that if you average a frame time of 50 ms, that is equal to about 20 fps - that might be smooth but it would be like playing in slow motion.

          (1000/50 = 20 fps)

          • MrJP
          • 8 years ago

          No-one’s talking about average frame rates. Everyone agrees that that’s a simplistic measure that smooths out the important variations.

          What flip-mode is suggesting (and others have tried to suggest in previous article comments without much success) is that the frame times should also be reported as equivalent instantaneous frame rates since that is an easier measure to understand. There is no averaging required over any period of time, in exactly the same way that your car tells you instantaneous mph without having to average over an hour.

          In your example, your notional frame 101 has an instantaneous fps of 1.1, which clearly is bad.

      • tfp
      • 8 years ago

      Agreed

      • helix
      • 8 years ago

      Or just have a dotted reference line marked TFT60 at 1000 ms / 60 = 16.7 ms.
      Perhaps TFT120 at 8.3 ms and Cinema24 at 41.7 ms as well.
      At least for a transition period, so we get a chance to wrap our heads around the new way.
      I really appreciate that TR is on the frontline of this.
      Having your results readily comparable to those of other review sites is important for transparency. You want a balance between promoting this clearly superior way of measuring game performance and being readily comparable to the scales used by others.

      What this graph does not capture is the frequency of “occasional lockups”. Even 1000 fluid frames do not make for a good experience if there is one 500 ms lockup in between them.
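
      If the charts were drawn with something like matplotlib (I don't know TR's actual tooling), the reference lines plus an equivalent-FPS axis on the right might look roughly like this sketch (the file name is a placeholder):

      import numpy as np
      import matplotlib.pyplot as plt

      percentiles = np.linspace(50, 99.5, 200)
      curve = np.percentile(np.loadtxt("frametimes_ms.txt"), percentiles)  # hypothetical log

      fig, ax = plt.subplots()
      ax.plot(percentiles, curve, label="Card under test")
      ax.set_ylim(bottom=5)  # keep the fps conversion below away from a divide-by-zero

      # Dotted reference lines for common refresh/cinema rates.
      for label, ms in [("120 Hz", 1000 / 120), ("60 Hz", 1000 / 60), ("24 fps cinema", 1000 / 24)]:
          ax.axhline(ms, linestyle=":", color="gray")
          ax.annotate(label, xy=(50.5, ms), va="bottom", fontsize=8)

      # Secondary axis showing the equivalent instantaneous FPS for each frame time.
      secax = ax.secondary_yaxis("right", functions=(lambda t: 1000 / t, lambda f: 1000 / f))
      secax.set_ylabel("Equivalent FPS")

      ax.set_xlabel("Percentile of frames")
      ax.set_ylabel("Frame time (ms)")
      ax.legend()
      plt.show()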

        • derFunkenstein
        • 8 years ago

        Assuming they’re keeping the histogram that has been around since the beginning, you’ll see the “occasional lockups” covered as well. If they’re dumping that in favor of this, then you’re right.

        • Stargazer
        • 8 years ago

        Lines at 60, 30 and 20 FPS might be nice.

        If they went to the 99.9th percentile, you’d be able to pick out one “bad” frame amongst a thousand. The problem is that as you go to higher percentiles, the “worst” values will often end up being “large” compared to the rest of the values, and this will mess up the scale.

        edit: ok, skip 20.

          • flip-mode
          • 8 years ago

          I guess lines would be OK but it seems more straightforward and less subjective to just make the graph labels like so:

          ms(fps)
          40 (25)
          30 (33)
          20 (50)

          [spoiler<]did I do those conversions correctly?[/spoiler<]

            • superjawes
            • 8 years ago

            [spoiler<]Yes, you got those conversions correct.[/spoiler<]

            • Stargazer
            • 8 years ago

            Some frame rates are more interesting than others though, and the two arguably most interesting ones (60 and 30) are not equivalent to nice and even millisecond values.

          • Vulk
          • 8 years ago

          This! I love the line idea.

          • marraco
          • 8 years ago

          I think that the chart should show instantaneous FPS, with latency in parentheses.

        • marraco
        • 8 years ago

        “Having your results readily comparable to those of other review sites is important for transparency”

        That’s very important. Readers need to be able to relate this advanced methodology to the information on other web sites, so they can understand this site right away.

      • marraco
      • 8 years ago

      That’s a great idea. It should be mandatory.

      Maybe looking like this

      [url<]http://i44.tinypic.com/339ttox.png[/url<]

      It would zoom in on the worst instantaneous FPS.

      • willmore
      • 8 years ago

      Thick lines help a bit, but for colorblind folk like me, I still can’t reliably tell them apart. I keep hoping that I can mouse over the line and have the associated legend entry get highlighted - and vice versa. That may be asking too much.
