Nvidia touts GameWorks and performance boosts for DX12 plus FCAT VR

Hardware enthusiasts tuning into Nvidia's live-streamed GDC presentation last night were probably most interested in Jen-Hsun Huang's announcement of the almost-a-Pascal-Titan-X GeForce GTX 1080 Ti. Nvidia's announcements regarding the DirectX 12 version of the company's GameWorks software development kit and the FCAT VR performance-monitoring tool were likely more relevant to most of the show's attendees, though. The green team also promised substantial performance increases in DirectX 12 titles in upcoming Game Ready driver updates, something that's sure to please developers and gamers alike.

Nvidia says it has invested over 500 engineering-years into the development of the DirectX 12 version of GameWorks. The kit includes tools for physics simulation, error report analysis, profiling, and debugging. Nvidia claims that GameWorks DX12 takes advantage of asynchronous compute, potentially helping developers tap otherwise-idle hardware resources. The company's love-it-or-hate-it HairWorks library for hair and fur simulation has been updated for DirectX 12, too.
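For context, asynchronous compute in DirectX 12 hangs off the API's explicit queue model: an application creates a dedicated compute queue alongside its direct (graphics) queue and can submit work to both, letting compute overlap with graphics. The sketch below shows only that plain D3D12 setup; it is not GameWorks code, and error handling is omitted for brevity.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch of the D3D12 queue setup that async compute builds on:
// a direct (graphics) queue plus a separate compute queue on the same device.
// Work submitted to the compute queue may overlap with graphics work.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC graphicsDesc = {};
    graphicsDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&graphicsDesc, IID_PPV_ARGS(&graphicsQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // compute + copy only
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}
```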

The best news for Nvidia graphics card owners is probably the promise of increased performance in DirectX 12 games in upcoming Game Ready drivers. The green team claims its engineers have "worked with game developers to deliver performance increases of up to 16%" in many popular DirectX 12 games, including Ashes of the Singularity, Gears of War 4, Hitman, Rise of the Tomb Raider, and Tom Clancy's The Division.

It's safe to say that a poor VR experience can be identified by feel alone. As we have discovered in the TR labs, though, precisely quantifying the goodness or badness of VR performance is far trickier. Nvidia says its FCAT VR tool can take some of the sting out of VR performance analysis. The tool is based on Nvidia's existing FCAT frame-capture analysis tool, and it can measure frame times, dropped frames, warp misses, and synthesized frames. Nvidia says the tool should be available in mid-March.
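Nvidia hasn't detailed FCAT VR's internals here, but the kind of accounting it reports can be illustrated with a toy classifier: compare each frame's render time against the headset's refresh interval (11.1 ms at 90 Hz, an assumption on my part) and count the frames that missed it, which the VR runtime must then cover by re-presenting or synthesizing a frame. This is an illustrative sketch, not FCAT VR code, and the frame times are made up.

```cpp
#include <cstdio>
#include <vector>

// Toy illustration (not FCAT VR itself): classify per-frame render times
// against a 90 Hz HMD refresh interval. A frame that takes longer than one
// interval misses vsync, and the runtime must re-present the previous frame
// or synthesize one (e.g., via reprojection) in its place.
int main()
{
    const double refreshMs = 1000.0 / 90.0;                        // ~11.1 ms
    const std::vector<double> frameTimesMs = { 9.8, 10.5, 12.7, 11.0, 23.4, 10.2 };

    int missed = 0;
    for (double t : frameTimesMs)
        if (t > refreshMs)
            ++missed;

    std::printf("frames: %zu, missed refresh: %d (%.1f%% of frames)\n",
                frameTimesMs.size(), missed,
                100.0 * missed / frameTimesMs.size());
    return 0;
}
```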

Comments closed
    • Mat3
    • 3 years ago

    It’s nothing more than Nvidia’s plan to sabotage performance in DX12 titles. Just like now: get a developer to throw in some crap like hair works with wasteful, useless default settings to cause performance to plummet across all cards… but less so on the latest high end Geforces!

    • DoomGuy64
    • 3 years ago

    Calling it now. Nvidia will ruin dx12 worse than they did dx11, since AMD won’t be able to fix dx12 gameworks problems as easily as dx11. Not like they ever did a good job of it in the first place.

    If this gets too out of hand, I could see Microsoft pulling a Vista/Creative on Nvidia. Total lockout of 3rd party extensions. Normally I would consider something like this bad, but at some point the sabotaging has to be stopped. Gameworks hasn’t been a positive thing for anyone except Nvidia shareholders, as it ruins performance even on Nvidia’s own hardware, making the Titan cards the only feasible hardware available to actually run the effects.

      • gamerk2
      • 3 years ago

      Uhhh…that’s the point, isn’t it? Enhanced effects require stronger GPUs. If you don’t have the power, leave them off. I fail to see the problem.

      Also, you fail to understand what MSFT did with audio drivers in Vista. All MSFT did was make Directsound a software level API that does not interact with the hardware directly. Nothing prevented Creative from re-implementing EAX effects in any other supported audio API (OpenAL or XAudio). If MSFT did that for GPUs, you’d be limited to 640×480 at best, or essentially what you get today when you don’t have a driver installed.

        • DoomGuy64
        • 3 years ago

        Are you a shareholder or gamer? If you are a gamer, do you buy Nvidia’s Titan cards?

        No? Then it affects you, especially in games where these extensions are not well implemented and don’t give you the option to disable or limit them. People with mid-range cards should have the option to use lower settings, and AMD users shouldn’t be forced to run games built on PhysX, or forced GameWorks effects. Since that’s not how it’s done, it affects everyone to some degree. Some games give you options, but the ones that don’t are the problem.

        Nvidia can choose to implement its own API as well; nothing is stopping them. The problem is that they are perverting an existing standard with performance-reducing extensions. If support for those extensions is removed, GameWorks will go the way of EAX.

          • ChicagoDave
          • 3 years ago

          I’m an occasional shareholder (of both AMD and NVDA), gamer and never buy a graphics card over $350.

          I think the majority of your complaints should be directed at game studios rather than Nvidia. Gameworks is simply a middleware that allows a studio to quickly and easily add NV-specific enhancements to their game. No one is putting a gun to their head and forcing them to use Gameworks, and using Gameworks doesn’t preclude the studio from developing a separate rendering path that doesn’t utilize GW code.

          It’s up to the developer to:
          A) Make sure their game runs properly on AMD hardware
          B) Make sure their game runs properly on older NV hardware
          C) Have scalable graphics settings to dial up or down the effects so the game runs smoothly on virtually any computer

          It’s not NV’s fault that some game developers don’t code their games to run well on different hardware setups. And it’s certainly not NV’s fault if the game doesn’t have quality sliders. However I question if that is really even a problem that exists – what PC games have advanced features requiring Gameworks, but don’t have adjustable graphics settings? I mostly play games that are 5+ years old so maybe I’ve missed some new trend, but every game I play has a ton of visual settings options that can be manually changed, or automatically adjusted by GeForce Experience if you have that installed (I don’t). If you don’t want Hairworks or whatever, just turn it off. If your game cannot turn it off, then I’d take that up with the studio.

          I’m fully aware that my 1060 can’t run the latest AAA with all of the Gameworks settings maxed, and I don’t expect it to. However I also don’t think that these things should be removed just because I cannot use them. In a year or two I’ll have a card that can run today’s games with everything turned up (which is why I stick to older games on Steam sales).

            • xeridea
            • 3 years ago

            It is not possible to make GameWorks run well on any hardware, even new Nvidia hardware, due to its garbage coding and the brute-force nature of its tessellation. It being a black box doesn’t help either, since that makes it near impossible to optimize. In general the effects look poor and fake while killing performance. AMD alternatives, such as TressFX, run far better, even on Nvidia hardware, while also looking better. So while it is up to the developer to optimize the game, it just isn’t possible to do that well with GameWorks.

            • Laykun
            • 3 years ago

            I think the fear here isn’t that the added features are just there to trade performance for increased visuals, but that a chance has been lost for those same effects to be implemented in a cross-vendor manner, so that both AMD and Nvidia users get the same visuals with a comparable hit to performance. Now, being a developer, I know that time is in limited supply, and while it’s easy to say “oh well, they should just implement two rendering pipelines,” that just isn’t the reality of the situation. If Nvidia GW turns out to be super easy to integrate and there is no good AMD equivalent, you’ll start to find that game content is designed around those GW features being in the engine. AMD users, for whom it won’t run optimally, will have to turn off the features and miss out on content designed to benefit from GW, when it could have been designed around a more cross-vendor-focused set of tools instead. By virtue of it not being cross-vendor, and time and resources being limited, it starts to look like there is an unfair bias toward Nvidia users in terms of features.

            AMD is already trying to tackle this problem, though, by releasing a set of tools similar to GW called GPUOpen that provides similar features; whether it’s actually any good or ever gets used is another question altogether.

            In reality developers should just use pre-built engines that do all this leg work for you, like Unreal Engine 4.

      • DancinJack
      • 3 years ago

      Calling it now, DoomGuy64 continually trolling anything not specifically pro-AMD.

      Surprise!

        • Pancake
        • 3 years ago

        Not sure if it’s trolling or cold-war level paranoia. He probably checks under his bed before sleeping in case Jen Hsun Huang is hiding under there.

    • jts888
    • 3 years ago

    Can somebody explain in brief how DX12 GameWorks will function?

    There’s not as big a place anymore for middleware to sit between an engine and the hardware/driver, so I’m curious how exactly Nvidia designed this.

      • PixelArmy
      • 3 years ago

      Judging by the comments here, I think there is a misconception about what this is (perhaps because “middleware” is not an exact term). Think of this as a library used by games and/or engines that will use DX12 API calls in lieu of DX11 (or whatever) API calls to generate effects.
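To make that model concrete, the shape of such a library from the engine's side might look something like the snippet below. The interface is entirely hypothetical (it is not GameWorks' actual API); the point is only that the engine hands the library a command list and the library records its own D3D12 work onto it.

```cpp
#include <d3d12.h>

// Hypothetical effects-library interface, for illustration only; these names
// are not real GameWorks API. The engine passes in its command list, and the
// library records whatever DX12 draw/dispatch calls the effect needs.
class IFurEffect
{
public:
    virtual ~IFurEffect() = default;
    virtual void Simulate(float deltaSeconds) = 0;
    virtual void Render(ID3D12GraphicsCommandList* cmdList) = 0;
};
```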

      • psuedonymous
      • 3 years ago

      “Can somebody explain in brief how DX12 GameWorks will function?”

      The brief version:

      Developer wants to make a Shiny Thing. They need a Shiny Thing shader to make the Thing shiny: reflect the environment, have settings for lustre, relight the surrounding environment depending on the light falling on it, etc.
      Developer has a choice:
      1) Write a brand new Shiny Thing shader, and then do the work of optimising that shader for each specific GPU architecture (and DX12 here makes that [b<]harder[/b<], because you no longer have driver-level optimisation to rely on; all the work the vendor used to do there is now [b<]your[/b<] job).

      2) Write a brand new Shiny Thing shader and do no optimisation, because that takes 30,000 man-hours of work and you have 500 man-hours available to do it in, because CRUNCH TIME ALL THE TIME GOTTA HIT THAT PUBLISHER DEADLINE OR WE WORK FOR NO PAY (AKA welcome to modern 'AAA' contracted development).

      3) Drop in the Shiny Thing code that the vendor provides, which is pre-optimised for their stuff and works passably on other cards.

      Big developers that can control delivery dates and have good institutional knowledge can go for option 1. Most developers can only really choose between option 2 and option 3 due to time/cost constraints. GameWorks is one implementation of option 3: Nvidia provides chunks of pre-optimised code that a developer can drop in, same as it did for GameWorks in DX11 and earlier.

    • Voldenuit
    • 3 years ago

    PCPer has a nice preview of FCAT VR. They also specifically mention that the tool will work with both nvidia and amd cards (although they said the nv cards provide more metadata).

    [url<]https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-FCAT-VR-Preview-New-Performance-Tool[/url<]

    • chuckula
    • 3 years ago

    [quote<]Nvidia claims that GameWorks DX12 takes advantage of asynchronous compute, potentially helping developers tap otherwise-idle hardware resources.[/quote<]

    See, this is why AMD is in no hurry to release Vega. With Nvidia pushing asynchronous compute, even older AMD cards will soon beat the GTX 1080 Ti.

      • Chrispy_
      • 3 years ago

      Does Nvidia have any hardware that includes working async compute yet?

      I thought it was broken/kludged in Maxwell and Pascal was still not working correctly.

        • jts888
        • 3 years ago

        To be fair though, Maxwell and Pascal do [i<]technically[/i<] support async compute shaders, just not with the granularity or low overhead of GCN, since the term's just supposed to mean shaders that can be allowed potentially parallel execution if the developer specifies so in a dependency graph. "Real" Pascal (GP100) is believed to have it, while the consumer line (i.e., slightly enhanced Maxwell shrinks) is still arguably faking it at the host driver level, since it's really just doing context switches, with latency improved by not limiting them to draw call boundaries.

        What GCN allows is for warps/wavefronts from both graphics and compute shaders to be queued up and executed within CU blocks, so that graphics shaders don't needlessly monopolize ALUs when their execution turns more heavily to fixed-function units (geometry, rasterization, ROP). The fundamental challenge in getting this to work as well as it sounds like it should is that each CU has finite data-holding capacity, and register/local data store pressure and L1D cache thrashing are huge constraints.

        In practice, most of what both AMD and Nvidia try to accomplish is to overlap the screen-space post-processing stages of a game renderer with the geometry setup stages (i.e., Z-buffer generation and shadow map updating) for the next frame. This can be done through any combination of manual engine/driver developer work and hardware-assisted juggling, with costs and benefits associated with either approach.
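In D3D12 terms, the overlap described above is expressed by putting frame N's post-processing on a compute queue while frame N+1's depth and shadow work goes to the direct queue, with a fence ordering access to whatever resources the two share. A rough sketch, assuming the command lists and fence were created and recorded elsewhere:

```cpp
#include <d3d12.h>

// Rough sketch of the overlap described above: frame N's screen-space
// post-processing runs on the compute queue while frame N+1's Z-prepass and
// shadow-map work runs on the direct queue. Command lists and the fence are
// assumed to exist already.
void SubmitOverlappedWork(ID3D12CommandQueue* directQueue,
                          ID3D12CommandQueue* computeQueue,
                          ID3D12CommandList*  postProcessFrameN,
                          ID3D12CommandList*  geometrySetupFrameN1,
                          ID3D12Fence*        fence,
                          UINT64              fenceValue)
{
    // Frame N's post-processing goes to the async compute queue and signals
    // the fence when it finishes.
    computeQueue->ExecuteCommandLists(1, &postProcessFrameN);
    computeQueue->Signal(fence, fenceValue);

    // Frame N+1's geometry setup starts immediately on the direct queue and
    // can overlap with the compute work above.
    directQueue->ExecuteCommandLists(1, &geometrySetupFrameN1);

    // Anything submitted to the direct queue after this Wait (e.g., the pass
    // that consumes the post-processed output) won't start until the compute
    // queue has signaled the fence.
    directQueue->Wait(fence, fenceValue);
}
```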

        • Kougar
        • 3 years ago

        “Not working correctly” as in not implemented because the uArch was never designed for it. Not sure Volta will change that either but will see.

        • Ryu Connor
        • 3 years ago

        Yes, Pascal has it.

        Its implementation is not the same as AMD's async.

        Microsoft didn't define how async had to be done.

        [url<]https://techreport.com/forums/viewtopic.php?f=3&t=118620&hilit=async#p1325207[/url<]
        [url<]https://hardforum.com/threads/demystifying-asynchronous-compute-v1-0.1909504/[/url<]

        AMD's implementation works as follows:

        [quote<]So on GCN each of the ACEs can dispatch work to each of the CUs, and they enable very fast context switching thanks to a dedicated cache. Now, operating under the assumption that context-switch latency is negligible and that the 0.25ms of stall time from task A is contiguous, this is what happens:

        Task A is dispatched to all 10 CUs.
        Task A is in execution for 0.5ms.
        Task A is dispatched to fixed-function unit(s) using the intermediate result from each CU.
        ACEs assign parts of task B to each CU.
        Task A's context is swapped to a dedicated cache within an ACE.
        Task B is dispatched.
        Task B executes on all 10 CUs for 0.3ms.
        Task B is finished.
        Task A's context is swapped back into each CU.
        Task A executes for 0.5ms.
        Task A is complete.

        Total time = 1.3ms vs 1.55ms without exploiting multi-engine[/quote<]

        NVIDIA's works as follows:

        [quote<]On Maxwell what would happen is Task A is assigned to 8 SMs such that execution time is 1.25ms and the FFU does not stall the SMs at all. Simple, right? However we now have 20% of our SMs going unused. So we assign task B to those 2 SMs, which will complete it in 1.5ms, in parallel with Task A's execution on the other 8 SMs. Here is the problem: when Task A completes, Task B will still have 0.25ms to go, and on Maxwell there's no way of reassigning those 8 SMs before Task B completes. Partitioning of resources is static (unchanging) and happens at the draw call boundary, controlled by the driver. So if the driver estimates the execution times of Tasks A and B incorrectly, the partitioning of execution units between them will lead to idle time as outlined above. Pascal solves this problem with 'dynamic load balancing'; the 8 SMs assigned to A can be reassigned to other tasks while Task B is still running, thus saturating the SMs and improving utilization.[/quote<]

        The huge difference in async shouldn't be surprising. The design of the execution units is completely different between AMD and NVIDIA.

        [quote<]GCN has one geometry engine and one rasterizer in each Shader Engine (usually 9 CUs). NVIDIA employs a geometry engine per SM and rasterizers shared by all SMs in a GPC (4 or 5). The balance of resources is radically different.[/quote<]

        This is the problem with DX12. By asking developers to get deeper to the metal, you've basically asked them to do per-vendor optimization. Ain't nobody got time for that.

        [url<]https://developer.nvidia.com/dx12-dos-and-donts[/url<]

        [quote<]Do [...] Expect to maintain separate render paths for each IHV minimum[/quote<]

        GDC - [url<]http://unspacy.com/ryu/tr/doublethework.png[/url<]
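As a sanity check on the arithmetic in the quoted GCN example, and under the same idealized assumption of free context switches that the quote makes, the two totals fall out of a few additions:

```cpp
#include <algorithm>
#include <cstdio>

// Back-of-the-envelope check of the quoted GCN example: task A computes for
// 0.5 ms, waits 0.25 ms on fixed-function hardware, then computes 0.5 ms more;
// task B needs 0.3 ms of compute. Context switches are assumed to be free,
// as in the quote.
int main()
{
    const double aCompute1 = 0.5, ffuStall = 0.25, aCompute2 = 0.5, bCompute = 0.3;

    // Without multi-engine, everything runs back to back, stall included.
    const double serial = aCompute1 + ffuStall + aCompute2 + bCompute;              // 1.55 ms

    // With async compute, task B fills the stall; only the amount by which B
    // outlasts the stall adds to the total.
    const double overlapped = aCompute1 + std::max(ffuStall, bCompute) + aCompute2; // 1.30 ms

    std::printf("serial: %.2f ms, overlapped: %.2f ms\n", serial, overlapped);
    return 0;
}
```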

          • Voldenuit
          • 3 years ago

          Thank you.

          It’s not as if there is one canonical way to achieve async compute in DX12. Microsoft merely set out the specification, and it was up to vendors to implement it with their own architectural approaches, factoring in their existing and planned architectures, silicon budget, performance tradeoffs etc.

          Low-level APIs will always benefit most from vendor-specific solutions, as each architecture has its own layout, capabilities, and idiosyncrasies. It’s not (necessarily) a conspiracy. And if a developer doesn’t understand that, the onus is on them, not the IHV.

          • jts888
          • 3 years ago

          The analysis you’re quoting is rather poor.

          The dynamic behavior of Pascal is mediated at the driver level, which has microsecond-level latencies in responding to transient on-GPU conditions and requires interrupt servicing by the host.

          GCN also doesn’t operate on the basis of “context switches” (think more like HyperThreading), nor are ACEs some sort of context caches.

          • psuedonymous
          • 3 years ago

          To expand on this: that SM partitioning for multiple tasks can be (and has been) done on DX11 also. This is why AMD gets such speedups from async compute being explicitly enabled in DX12, but Nvidia does not: the SM partitioning can occur implicitly (and has been since Kepler), but the pre-emption method GCN uses must be explicitly invoked. GCN under DX11 has large amounts of CU time sitting idle because of this. It’s also why GCN cards consistently show very high theoretical GPU compute performance compared to an Nvidia card that performs similarly in practical gaming.

      • erwendigo
      • 3 years ago

      Yes, like the 7970 or 270X… oops, sorry, I forgot: AMD doesn’t support async on those deprecated products now (zero driver support for async for the masses who have a GCN 1.0 card).

      Don’t be a moron. The last DX12 game to use async compute (a Gaming Evolved title, Sniper Elite 4) showed very similar gains from async on both Nvidia and AMD cards. If Nvidia is supporting async in its GameWorks platform, it’s so that when developers use async, they do it in the right way for Nvidia’s architectures. Async programming requires an architecture-specific approach to the problem; Sniper Elite 4 got it right on both sides, so they probably developed their async path both ways so it runs well on both platforms.

      Async support in GameWorks will clearly be oriented toward Nvidia’s architectures. Sorry to destroy your dreams, but with you in that delusional state of mind, well, it’s like taking candy from a kid!

        • AnotherReader
        • 3 years ago

        You have been trolled by Chuckula. He wins!

          • chuckula
          • 3 years ago

          [url=https://www.youtube.com/watch?v=J6KOc_vVmSM<]THE DAY IS MINE![/url<]
