AMD shows off DirectX 12 performance with new 3DMark benchmark

Low-overhead graphics APIs like Vulkan (née Mantle) and DirectX 12 are probably the biggest news in game programming in some time, and 3DMark producer Futuremark has developed a benchmarking routine to demonstrate the potential performance gains from these new APIs: the 3DMark API Overhead benchmark. AMD has put the new benchmark to the test, and the company shared its results in a blog post today.

Before we discuss AMD's numbers, let's have a look at how Futuremark summarizes the API Overhead benchmark:

Games make thousands of draw calls per frame, but each one creates performance-limiting overhead for the CPU. APIs with less overhead can handle more draw calls and produce richer visuals. The 3DMark API Overhead feature test is the world's first independent test for comparing the performance of DirectX 12, Mantle, and DirectX 11.
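
To make that concrete, here is a minimal sketch (our illustration, not Futuremark's code) of where those per-object draw calls come from in a D3D11 render loop. The Mesh struct and vertex stride are hypothetical:

```cpp
// Sketch: each object in a scene costs at least one draw call, and under
// D3D11 the driver validates and translates every call on the CPU.
#include <d3d11.h>

struct Mesh {
    ID3D11Buffer* vertexBuffer;
    ID3D11Buffer* indexBuffer;
    UINT          indexCount;
};

void DrawScene(ID3D11DeviceContext* ctx, const Mesh* meshes, size_t count)
{
    const UINT stride = sizeof(float) * 8;  // hypothetical vertex layout
    const UINT offset = 0;

    for (size_t i = 0; i < count; ++i) {
        // One iteration = one draw call; thousands of these per frame
        // add up to the CPU-side overhead the API Overhead test measures.
        ctx->IASetVertexBuffers(0, 1, &meshes[i].vertexBuffer, &stride, &offset);
        ctx->IASetIndexBuffer(meshes[i].indexBuffer, DXGI_FORMAT_R32_UINT, 0);
        ctx->DrawIndexed(meshes[i].indexCount, 0, 0);
    }
}
```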

Now, on to AMD's results. These graphs are obviously designed to present AMD's products in a favorable light, but they're still pretty interesting.

The first test result shows raw draw call throughput on the R9 290X and R7 260X. With DirectX 12, the R9 290X handles about 16 times as many draw calls as it does under DirectX 11, while the R7 260X turns in about 9.5 times as many.

DirectX 12 also has implications for performance scaling on multi-threaded CPUs. On an eight-core AMD FX-8350 CPU, the number of draw calls that can be issued under DirectX 12 scales with the number of cores on tap (up to six cores), while DirectX 11 doesn't take advantage of more than two cores. AMD says DirectX 12's multi-threaded command buffer recording capability is to thank for this performance scaling on the CPU.
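
For the curious, the pattern AMD is describing looks roughly like the hedged sketch below: each worker thread records its own command list, and the main thread submits them all at once. This is our illustration, not AMD's or Futuremark's code; it assumes the device, queue, and pipeline state were created elsewhere, and it omits root-signature setup, fence synchronization, and error handling:

```cpp
// Sketch: multi-threaded command-list recording under D3D12.
#include <d3d12.h>
#include <thread>
#include <vector>

void RecordChunk(ID3D12Device* device, ID3D12PipelineState* pso,
                 ID3D12GraphicsCommandList** outList)
{
    // Each thread gets its own allocator and command list, so recording
    // requires no locks between threads.
    ID3D12CommandAllocator* alloc = nullptr;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                   IID_PPV_ARGS(&alloc));
    ID3D12GraphicsCommandList* list = nullptr;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                              alloc, pso, IID_PPV_ARGS(&list));

    // ...set root signature and other state, then record many draws...
    list->DrawInstanced(3, 1, 0, 0);  // placeholder draw call
    list->Close();
    *outList = list;
}

void SubmitFrame(ID3D12Device* device, ID3D12CommandQueue* queue,
                 ID3D12PipelineState* pso, unsigned numThreads)
{
    std::vector<ID3D12GraphicsCommandList*> lists(numThreads);
    std::vector<std::thread> workers;

    // Under DX11 this work serializes inside the driver; under DX12 it
    // scales with cores, which is what AMD's FX-8350 graph shows.
    for (unsigned t = 0; t < numThreads; ++t)
        workers.emplace_back(RecordChunk, device, pso, &lists[t]);
    for (std::thread& w : workers) w.join();

    // One cheap submission hands all the recorded work to the GPU.
    queue->ExecuteCommandLists(numThreads,
        reinterpret_cast<ID3D12CommandList* const*>(lists.data()));
}
```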

Some of the primary goals of low-overhead APIs like DirectX 12 are to increase graphics throughput while reducing CPU overhead, so it's nice to see those expectations play out in synthetic benchmarks. It remains to be seen how these numbers will translate into real-world performance gains, however.

Comments closed
    • ptsant
    • 4 years ago

    https://pbs.twimg.com/media/CBBu9COWwAAPzZB.jpg:large

      • Ninjitsu
      • 4 years ago

      Are lists and buffers the same thing?

    • BobbinThreadbare
    • 4 years ago

    How does this compare with the number of draw calls AMD can push with Mantle?

      • Ninjitsu
      • 4 years ago

      Read the articles on PCPer and AnandTech. Basically, Mantle does better with more than 4 cores; otherwise DX12 is better.

      EDIT: For now, of course. Mantle's development isn't going to continue, nor will its drivers change much, I suppose. DX12 drivers will only improve, and then there's Vulkan too, which is based on Mantle but will still be different enough in the end.

        • BobbinThreadbare
        • 4 years ago

        Thanks for the response.

        I don't see why Mantle and Mantle drivers wouldn't improve, unless you mean it will just be replaced with Vulkan. In which case that's more a matter of semantics than anything.

          • Ninjitsu
          • 4 years ago

          I mean that Mantle as an AMD branded API won’t be developed much further, and I doubt Mantle drivers will either (hopefully game bugs will be fixed by devs).

          And no, Khronos will develop Vulkan, which will run on Nvidia, AMD, Intel, ARM, PowerVR and Qualcomm hardware with inputs from everyone – definitely not just a different name to the same API.

          I think the whole “Vulkan is Mantle” thing needs to calm down a bit going forwards.

          • nanoflower
          • 4 years ago

          Well, AMD has already had a long time to work on improving the Mantle drivers so even if they were to keep working on them I wouldn’t expect any significant improvements. The same isn’t true for DX12 drivers since DX12 is still being worked on, as are the drivers. I wouldn’t be shocked at big improvements in both AMD and Nvidia’s DX12 performance before the final release of Windows 10 and DX12.

    • maxxcool
    • 4 years ago

    DAMAGE! JEFF! Pay the $25… (I'm too cheap.) Love to see a throwdown!

      • Ninjitsu
      • 4 years ago

      TR already has 3DMark (I remember seeing Fire Strike results some time earlier), so they probably have something in the works…

        • nanoflower
        • 4 years ago

        Given how they typically do things, I expect more thorough coverage from The Tech Report in the coming weeks (probably i7/i5 with AMD/Nvidia, and possibly even i3/G3258 and APU/IGP results) to give a clearer picture of how the results scale. It all depends on what else is already planned out.

    • Glix
    • 4 years ago

    I'd be interested in knowing what the draw call numbers were for DX9 and 10. Of course, that would only raise more questions if the results varied rather than showing a gain with each revision.

      • LostCat
      • 4 years ago

      They might add that in a later update. I don’t imagine it was a priority.

      • Klimax
      • 4 years ago

      DX10 might be quite close, but DX9 will have giant overhead because it managed both programmable and fixed-pipeline GPUs, which requires fairly massive code. (Only bare-bones multithreading, lots of state-management code, and strong segmentation by feature.)

    • Ninjitsu
    • 4 years ago

    My results:
    Core 2 Quad Q8400 @ 2.76 GHz
    GTX 560 @ stock clocks, driver v347.88
    Windows 7 x64

    720p
    DX11 Single Thread: 739,554 draw calls per second
    DX11 Multi Thread: 1,000,657 dc/s

    1080p
    DX11 ST: 741,370 dc/s
    DX11 MT: 989,915 dc/s

      • chuckula
      • 4 years ago

      You are pushing almost a million draw calls a second with obsolete hardware at 1080p?

        • Ninjitsu
        • 4 years ago

        Yes, thanks to Nvidia drivers. 😀

        (Seriously, I was reading AnandTech’s article, they’re getting only 1.1M dc/s with a 290X and a 4960X in DX11 MT. lol)

        EDIT: it was a 4960X not a 5960X.

          • xeridea
          • 4 years ago

          Sweet, you can stick with DX11, clinging to the Nvidia driver gods; the rest of us will enjoy smoother gameplay.

            • Ninjitsu
            • 4 years ago

            Wot? Where did this come from?

            • xeridea
            • 4 years ago

            Many people don't really feel the next-gen APIs are needed, saying everyone should just use Nvidia since they have more optimized (but still far from perfect) drivers. It is irritating. Perhaps I was a bit harsh, though.

            • Ninjitsu
            • 4 years ago

            That wasn't even my point. The main reason I'm seeing a million draw calls on DX11 is Nvidia's driver optimisations. That's all I'm saying.

            I'd love to get a few million more draw calls per second with DX12/Vulkan, but my existing games will need to run well too.

            • nanoflower
            • 4 years ago

            Which is why people talk about sticking with Nvidia. That way they get great DX11 support AND will get great DX12 support. Sure, they won’t get all of the DX12 goodies supported in hardware but neither will Intel or AMD fans. Everyone is going to need new hardware to get full hardware support of DX 12 but I’m sure we will see drivers from all three that do a good job of supporting DX12 features with their existing hardware.

            • Klimax
            • 4 years ago

            Well, it means a far smaller workload for programmers. And it's more future-proof.

            And it shows that with proper drivers, there is no real need to eliminate abstraction. (Since the appearance of DirectX, nothing has changed to abolish the need for it.)

        • BobbinThreadbare
        • 4 years ago

        Doesn’t the chart above show scaling from ~1 million DC/s to ~15 million?

        Why are we surprised his numbers match what AMD is reporting?

      • maxxcool
      • 4 years ago

      GRRR, it requires the paid Advanced or Professional copy…

        • Ninjitsu
        • 4 years ago

        It was only $12 or so…

          • maxxcool
          • 4 years ago

          ? The web page lists $25? Is it on special somewhere else? I'll pay $12 and post scores in an hour…

            • Ninjitsu
            • 4 years ago

            Steam sales 😛

            • maxxcool
            • 4 years ago

            Missed it by 12 hours… 🙁

      • Klimax
      • 4 years ago

      Core i7-5960X @ 3.9 GHz (OC): 2.2M dc/s
      http://www.3dmark.com/3dm/6399839

      • Lans
      • 4 years ago

      Not sure what you are trying to say with these results.

      What I can see is that AMD's DX11 driver is probably single-threaded (the 2-core DX11 result looks like about 500k dc/s vs. your DX11 MT result), which is still less than your DX11 ST result.

      Assuming you are really just using 2 threads/cores, and estimating the FX-8350 is 1.91x faster than your Q8400 (stock), 6M/1.91 ≈ 3.1M dc/s, which is still a little more than 3x your ~1M dc/s result @ 720p. And that is ignoring the scaling up to 6 cores.

      So I say that is very welcome news. Maybe Nvidia does better, but derFunkenstein indicates PCPer is showing the two camps about the same now. Maybe Nvidia just hasn't been able to optimize the heck out of their drivers for DX12 yet; only time will tell…

      EDIT: reference for the 1.91x CPU perf diff: http://www.cpu-world.com/Compare/563/AMD_FX-Series_FX-8350_vs_Intel_Core_2_Quad_Q8400.html

    • Klimax
    • 4 years ago

    Looks like it's time I brought out the heavy-duty tools and checked their DX11 code path. Since StarSwarm I don't trust anybody. Not yet.

    And I am curious how things will look when Pascal and later archs arrive. Which code path will handle things better, DX11 or 12? (Reminder: the cost of a low-level API is severely increased inflexibility.)

      • xeridea
      • 4 years ago

      A low-overhead API gives you significantly more flexibility, because you aren't limited nearly as much by abstraction.

        • Klimax
        • 4 years ago

        Flexibility for the current arch, which becomes an immediate loss upon the introduction of a new arch or an alternative. Abstraction, however, allows the driver to adapt and thus get the best out of the HW without the game engine having to know a thing about it.

        So-called "limitations" are in reality no limitations. They are necessary to allow different implementations. Remove abstraction and massive homogenization occurs (one of two possibilities), almost by definition. And the weaker players will likely lose completely…

        People apparently do not realize where this is going and how bad it can be. Either a repeat of the 90s, or Intel-like dominance by a single company. Nearly no other option exists, because graphics is by nature so parallel that OOO-like technology, which could work around inefficiencies in the instruction stream, can't really exist (too big).

        tl;dr: Low-level APIs are good only on consoles (fixed HW), never on computers. They expose too much of the HW implementation.

        ETA2: Note: the problem is not low-overhead APIs themselves, but that the new APIs are not merely low-overhead, they are outright low-level. And there lies a giant problem.

      • Klimax
      • 4 years ago

      Preliminary results: although I don't have some basic statistics for a full assessment, they might have exposed some anomalies in the DX11 runtime. (Unusually much time spent in the finalizer of a Batch.)

    • Jigar
    • 4 years ago

    Need to analyze the results. If there is a gain from shifting to HT on the CPU, then I would sell my i5-4670K and jump on an i7-4790K…

    • TopHatKiller
    • 4 years ago

    Ignored, naturally, by all comments: thread contention on AMD family 15h appears to have more or less vanished. If only they could have produced Mantle and its co-dependents, DX12 and Vulkan, at the beginning… And now, of course, AMD moves away from that architecture to Zen & K12. Typical brilliant AMD timing there. Waiting for everyone to minus me like it's going out of fashion, but the truth hurts.

      • LostCat
      • 4 years ago

      I only minused you because you asked so nicely.

    • vargis14
    • 4 years ago

    I hate that PCPer does not show an Intel 4C with HT, and a direct comparison against AMD (if they could, that is).

    • p3ngwin
    • 4 years ago

    “On an eight-core AMD FX-8350 CPU, the number of draw calls that can be issued under DirectX 12 scales with the number of cores on tap (up to six cores)”

    So which is it: does it "scale with the number of cores on tap" on an 8-core CPU, or does it only scale up to 6 cores?

      • BaronMatrix
      • 4 years ago

      Perhaps MS has a finite limit to lower overhead…

      • xeridea
      • 4 years ago

      Perhaps it scales to 6, but with the significant overhead reduction this is good enough, since at that point the CPU bottleneck is basically nonexistent; plus, extra cores can be used for non-graphical things (like AI).

      • Sam125
      • 4 years ago

      If I were to guess, I'd say it's a limit coded into the 3DMark test, rather than a limitation of DX12 itself.

      Either that, or the scheduler for whatever operating system they tested on isn't capable of handling 8 cores at once.

      • Zizy
      • 4 years ago

      Another possibility for that limitation:
      DX 11.x was probably developed with 6 cores for games in mind (2 are reserved for OS & stuff), so DX12 inherited some limitation because of that. No hardcoded wall or anything, it just scales poorly beyond that point. There are no CPUs with >8 cores (for gaming) anyway, making this a non-issue.

        • xeridea
        • 4 years ago

        DX11 can use about 1.5 cores, and OS overhead is negligible.

    • derFunkenstein
    • 4 years ago

    PCPer apparently got an early go-around with the benchmark as well. Fortunately their findings with GeForces and Radeons were similar. Seems like the draw call thing is an issue long behind us…for now anyway.

    http://www.pcper.com/reviews/Graphics-Cards/3DMark-API-Overhead-Feature-Test-Early-DX12-Performance

      • Ninjitsu
      • 4 years ago

      And AnandTech:
      http://www.anandtech.com/show/9112/exploring-dx12-3dmark-api-overhead-feature-test

      EDIT: Wow, AMD's DX11 drivers are bad. They actually have pretty good GPUs by contrast.

    • JAMF
    • 4 years ago

    Just imagine how lush a world could be drawn in a flight sim. All those objects. Trees everywhere. And where an increased number of dials in a complex cockpit used to mean lower frame rates, maybe stable and smooth frame rates are just on the horizon?

      • NTMBK
      • 4 years ago

      And the actual physics simulation will still run in a single thread like it has since 1995 🙂

        • derFunkenstein
        • 4 years ago

        Yeah, that's what's sad. This accelerates just a tiny piece of the puzzle. But any bottleneck erased is a good one.

          • Klimax
          • 4 years ago

          StarSwarm is almost the definition of it. The poor multithreading of the simulation itself was brutal. (Out of a 360-second run, it spent less than 1 second with more than 4 threads active at once.)

        • JAMF
        • 4 years ago

        But that’s up to the devs. Take it up with them if you’ve found their flight models lacking.

        I wonder what response you’ll get from Eagle Dynamics/DCS, or the devs from Rise of Flight, Battle of Stalingrad or X-Plane.

        But then they also have to have AI running on a thread, plus ballistics and physics; that's 3 threads. Makes you wonder how they made it work on single and dual cores all that time.

      • Meadows
      • 4 years ago

      Don’t forget the rendered cockpit lock if you want to crash.

        • sweatshopking
        • 4 years ago

        Too soon.

    • tfp
    • 4 years ago

    Just imagine the numbers on a good CPU…

      • geekl33tgamer
      • 4 years ago

      Uhh, the figures from the 1st slide were done on an i7, apparently. The test rig notes are at the bottom. 😉

        • tfp
        • 4 years ago

        You want me to click a link and leave TR? Nice try…

          • geekl33tgamer
          • 4 years ago

          Oh I forgot, AMD’s site is a phishing scam or malware. Silly me.

            • lilbuddhaman
            • 4 years ago

            I mean, with that credit rating, you should be wary.

            • geekl33tgamer
            • 4 years ago

            Touché. 😉

      • RdVi
      • 4 years ago

      A proportionately lower increase?

      Isn’t that the point, to lessen the bottleneck at the CPU? So a faster CPU would see less of an increase.

      • WaltC
      • 4 years ago

      Yea, if Intel CPUs are good for anything at all, they excel at providing pleasing benchmark scores… ;) And it's quite true that if you're willing to spend 10x as much ($1,000) you can buy an i7 xxxx CPU that's almost 2x as fast as an FX-6xxx series CPU (~$100). I certainly won't argue that… ;)

        • travbrad
        • 4 years ago

        Yes if only Intel sold some CPUs for less than $1000 that were still fast in games. No one is using Intel CPUs for gaming these days since they are just too expensive!

        Since their $1,000 CPU is called an i7-5960X, maybe they could introduce some new, cheaper CPUs with lower numbers on them. Maybe something like i7-4790K or i5-4690K? Maybe even some really cheap CPUs with "i3" in the name? I know it sounds crazy, but it could happen one day.

          • Welch
          • 4 years ago

          Clearly a troll that likes to make up fake CPU SKUs, shame on you!

      • Klimax
      • 4 years ago

      And even better, imagine if AMD supported Driver Command Lists under DX11…

    • balanarahul
    • 4 years ago

    The multi-core scaling graph implies that there is no performance gain beyond six cores. Does this mean that a 3930K/4930K/etc. would perform better without Hyper-Threading?

      • orik
      • 4 years ago

      That's actually a really great question; I'd love to see TR do some benchmarks on this, especially with a single card vs. a multi-GPU setup.

      I wonder if DX12 is gonna carve out demand for a 6-core processor w/o HT from Intel.

      What socket would this go on? What sort of naming convention would it go with?

      • Waco
      • 4 years ago

      If all you're going to do is run on an AMD CPU, maybe… and only issue as many draw calls as possible.

      Most games don't do this. 😛

      • moose17145
      • 4 years ago

      It would be interesting to see how it scales on, let's say, a dual-core with HT vs. a quad-core without, or how a quad with HT would fare against a 6- or 8-core without HT. Say, a 4790K with HT enabled against a 5960X with HT disabled.

      Likewise, it will be interesting to see if DX12 helps close the performance gap between Intel and AMD (at least as far as gaming goes; obviously this does nothing to help CPU performance in non-gaming areas).

        • tsk
        • 4 years ago

        My thoughts exactly! I’d really like to see how all those processors compare. Does a 4C/HT equal an 8C non HT?

      • NTMBK
      • 4 years ago

      Nah, you still want the spare Hyperthreads for torrenting your pr0n in the background.

      • HisDivineOrder
      • 4 years ago

      Given that the AMD "octa-core" chips are in fact quad-cores with a heavier emphasis on making hyperthreading-style performance more core-like, I'd say you shouldn't count these as real 8-core results.

      Wait for the Intel octa-core results to get a genuine CPU with 8 full cores, and then we'll know what's what.

        • xeridea
        • 4 years ago

        Wrong: there are 8 full integer cores. The FPU is shared, but 95% of the time this doesn't really matter in everyday work or gaming. Hyper-Threading is completely different in that it is just feeding 1 core with 2 threads. HT gets you about a 20% boost; a module gets you a ~95% boost.

          • Jason181
          • 4 years ago

          In everyday work, I'd agree, but gaming is definitely FPU-bound a lot of the time. Note the abysmal performance of AMD in certain games.

            • xeridea
            • 4 years ago

            I would say it is more an issue of the lower IPC of AMD CPUs than of the shared FPU. I have only done limited OGL programming and haven't done any intense analysis, but there is still a lot of integer math and logic. Lower IPC is a big deal under DX11, since much of it is single-threaded. DX12/OGLN/Mantle will be hugely welcome for the AMD chips with high core counts and mediocre IPC. I am just speculating, though.

          • maxxcool
          • 4 years ago

          If that were true, we'd see AMD CPUs at the top end of the Cinebench CPU chart on their webpage. We don't.

            • BobbinThreadbare
            • 4 years ago

            That doesn’t logically follow. If each AMD core is still slow enough that 8 of them can’t overcome Intel’s IPC advantage they wouldn’t be at the top.

          • BobbinThreadbare
          • 4 years ago

          It's not true that FPU sharing doesn't affect gaming. The Tech Report showed a ~10% boost in speed when the new Windows scheduler came out that was aware of the architecture.

            • xeridea
            • 4 years ago

            Ok so… there are still enough modules to do all the FPU work so long as they are scheduled properly. And 10% is significantly lower than the 40% hit per thread for HT, so calling a module a core is wrong.

            • BobbinThreadbare
            • 4 years ago

            That's true, but there is a performance hit from sharing an FPU, despite AMD's claims. Not a big one, but a hit nonetheless.

            The "a module is a core" meme won't die, though; you're fighting a lost battle there. People are going to parrot what they parrot.

            • TO11MTM
            • 4 years ago

            The hit is there, but I can't help but wonder about the longer-term decoupling implications…

            By that I mean they've got two integer cores sharing FPU resources. That means the FPU resources are already a little further decoupled from the rest of the pipeline (because now they have to wait for resources to become available).

            I think this could become more important to AMD later, if they can start to share GPU+FPU resources by having the modules simply schedule the work on the GPU cores, potentially leaving much less redundant functionality in the design between the CPU's FPU and what's available via the GPU.

            That was always my imagined endgame for 'Fusion.' You can't help but look at AMD right now and feel bad about their margins on parts; the die size is killing them, and what I'm describing could potentially help a lot with that.

      • f0d
      • 4 years ago

      The rendering might only take advantage of up to 6 cores, but games can already spin up threads for stuff other than rendering (like AI and destruction), so having more than 6 threads won't go to waste most of the time.

      • ozzuneoj
      • 4 years ago

      It's kind of strange: the graph actually shows a slight decrease from 6 to 8 cores… why would they say this?

        • xeridea
        • 4 years ago

        Testing variance.

        • wierdo
        • 4 years ago

        I would guess error margins, and perhaps increased multi-threading overhead.

      • VincentHanna
      • 4 years ago

      You can get more overclocking headroom out of a 3930K if you disable HT, and yes, there is generally a law of diminishing returns when multiplying threads and cores out, because the code just ends up waiting: some calculations will always be performed sequentially.

      That said, most of the time it's more of a "do you need it or not" question.
      6 cores no HT at 100% vs. 12 threads with HT at 70%: the 12 threads win hands down.
      6 cores at 20/20/14/6/2/1 vs. 12 threads with HT at 20/20/14/6/2/1/1/0/0/0/0/0: you will generally not see any performance variance between the two chips, all else being equal.

      The same logic also applies to dual cores vs quads.
