More on how Mantle helps performance
Mantle's closer-to-the-metal development model, coupled with a more lightweight driver, seems to pay some very real performance dividends. The game developers in attendance at APU13 were reluctant to quote actual performance figures from their games, partly because their work still isn't quite finished. However, some figures were quoted that shed light on Mantle's performance benefits.
For starters, Nixxes' Katsman revealed that "very early figures from Thief" (which is "not fully running on Mantle yet") showed a big reduction in draw call overhead. "Before, we would often see about 40% of the CPU time stuck in the driver, in D3D, or in various threads," he said. "The early measurements we did, right now we have that down to about a fifth of that."
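To put those early numbers in concrete terms, here is a quick back-of-the-envelope calculation in Python. It takes Katsman's "about 40%" and "about a fifth of that" at face value. The function name and the choice to express the result as a share of total CPU time are illustrative assumptions, and the real figures will vary by game and by scene.

```python
def driver_share_after(before_share: float, remaining_fraction: float) -> float:
    """Return the share of CPU time still spent in the driver/API layer."""
    return before_share * remaining_fraction


before = 0.40                               # "about 40% of the CPU time" (quoted figure)
after = driver_share_after(before, 1 / 5)   # "about a fifth of that" (quoted figure)
freed = before - after                      # CPU time handed back to the game

print(f"driver share after Mantle: {after:.0%}")
print(f"CPU time freed up:         {freed:.0%}")
```

Read that way, the driver's share would drop to roughly 8% of CPU time, leaving about 32 percentage points for the game itself.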
The guys from Oxide offered a more visual representation of Mantle's CPU overhead in their talk. Mantle is the yellow rectangle, the game engine is the blue one, and unused CPU time is shown in green:
DICE's Andersson extrapolated upon that same notion in his keynote, saying that, with Mantle, the CPU "should never really be a bottleneck for the GPU anymore." In a separate demonstration, Oxide showed their Mantle-enabled space game suffering no frame rate hit when the FX-8350 processor on which it ran was underclocked to 2GHz, or half its base speed. (Graphics processing in that demo was handled by a Radeon R9 290X.)
The reduction in draw call overhead also means more draw calls can be issued per frame. Riguer said Mantle raises the draw call limit by an order of magnitude to "at least" 100,000 draw calls per frame "at reasonable frame rates." This isn't just theoretical—Oxide showed their space game demo actually hitting 100,000 draw calls per frame. Andersson, who was in the audience for that presentation, was impressed enough to tweet about the demo.
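For a sense of scale, the sketch below works out how much time each draw call gets, on average, when 100,000 of them have to fit in one frame. Riguer didn't say which frame rates count as "reasonable," so the 30 and 60 FPS values here are assumptions, as is the helper's name.

```python
def per_call_budget_us(draw_calls_per_frame: int, fps: float) -> float:
    """Average time available per draw call, in microseconds."""
    frame_time_us = 1_000_000 / fps
    return frame_time_us / draw_calls_per_frame


for fps in (30, 60):  # hypothetical frame rates, not figures from the talks
    budget = per_call_budget_us(100_000, fps)
    print(f"{fps} FPS: {budget:.3f} microseconds per draw call")
```

The budget is an average over the whole frame. If draw calls are recorded on several cores at once, each core has proportionally more time per call.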
Mantle will allow game developers to use more CPU cores, too, as these two slides from Andersson's presentation show. According to Andersson, the Mantle model outlined in the second slide is "the exact model that we're using on all of the consoles" (both current and next-gen ones). In his talk, Katsman explained that, if a system has eight cores, Mantle allows developers to use all of those cores for their game. "So, we can have four to do rendering, a few more to do physics and some other things. We can make games that are far more complicated. We can increase the draw distance to significant distances, have far denser worlds."
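The general shape of that multi-core model can be sketched in a few lines of Python. This is a conceptual illustration only, not Mantle code: `record_commands`, `build_frame`, and the tuple-based "commands" are hypothetical stand-ins, and the four workers simply echo Katsman's "four to do rendering" example. Python threads also don't execute in parallel the way native engine threads would, so the sketch shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(objects):
    """Record one command list for a slice of the scene (stand-in for real API calls)."""
    return [("draw", obj) for obj in objects]


def build_frame(scene, workers=4):
    """Split the scene across workers, record in parallel, then submit in order."""
    chunk = max(1, -(-len(scene) // workers))  # ceiling division, at least 1
    slices = [scene[i:i + chunk] for i in range(0, len(scene), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input slices.
        command_lists = list(pool.map(record_commands, slices))
    # A single submission step hands the recorded lists to the (imaginary) GPU queue.
    return [cmd for cmds in command_lists for cmd in cmds]


scene = [f"object_{i}" for i in range(10)]  # hypothetical scene contents
frame = build_frame(scene)
print(len(frame), "commands recorded")
```

The point of the pattern is that recording happens independently on each worker, and only the final submission is ordered.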
According to Katsman, "The density of everything in the world is something that's being held back, and I think Mantle will help alleviate that." That said, "Just because we can draw more things doesn't mean we have the CPU resources to simulate them all." For example, while Mantle might make it possible to draw many more characters in a given scene, developers will have to consider the cost of running AI simulations for all of those characters.
In addition to making more effective and efficient use of the CPU, Mantle will allow GPU resources to be used more efficiently. Katsman brought up the Radeon R9 290X, which has 5.6 teraflops of compute power, and said that an "awful lot" of that compute power is "lying there dormant." With current APIs, some of the compute power might be used for some parts of a frame, but other parts "will be bottlenecked by something else," such as "getting things from memory, by fetching textures through the texture fetch units, [and] the rasterization units." He went on:
The APIs we have right now, they just allow us to queue synchronous workloads. We say, "draw some triangles," and then, "do some compute," and the driver can try to be a little smart, and maybe it'll overlap some of that. But for the most part, it's serial, and where we're doing one thing, it's not doing other things.
With Mantle . . . we can schedule compute work in parallel with the normal graphics work. That allows for some really interesting optimizations that will really help your overall frame rate and how . . . with less power, you can achieve higher frame rates.
What we'd see, for example—say we're rendering shadow maps. There's really not much compute going on. . . . Compute units are basically sitting there being idle. If, at the same time, we are able to do post-processing effects—say maybe even the post-processing from a previous frame, or what we could do in Tomb Raider, [where] we have TressFX hair simulations, which can be quite expensive—we can do that in parallel, in compute, with these other graphics tasks, and effectively, they can become close to zero cost.
If we guessed that maybe only 50% of that compute power was utilized, the theoretical number—and we won't reach that, but in theory, we might be able to get up to 50% better GPU performance from overlapping compute work, if you would be able to find enough compute work to really fill it up.
The 50% figure is a theoretical best-case scenario, but Katsman added, "It seems quite realistic that you would get maybe 20% additional GPU performance out of optimizations like that."
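A toy model helps show where gains like that could come from. The Python sketch below compares a frame where graphics and compute run back to back with an idealized one where the compute work hides entirely behind the graphics work. The 10 ms and 2 ms timings are hypothetical, chosen only to illustrate the arithmetic. Real hardware won't overlap perfectly, since both kinds of work compete for memory bandwidth and other shared resources.

```python
def frame_time_serial(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute run one after the other."""
    return graphics_ms + compute_ms


def frame_time_overlapped(graphics_ms: float, compute_ms: float) -> float:
    """Idealized overlap: the shorter workload hides behind the longer one."""
    return max(graphics_ms, compute_ms)


graphics_ms = 10.0  # hypothetical graphics work per frame
compute_ms = 2.0    # hypothetical compute work per frame (post-processing, hair, etc.)

serial = frame_time_serial(graphics_ms, compute_ms)
overlapped = frame_time_overlapped(graphics_ms, compute_ms)
gain = serial / overlapped - 1  # relative frame rate improvement

print(f"serial: {serial:.1f} ms, overlapped: {overlapped:.1f} ms, gain: {gain:.0%}")
```

With these made-up inputs, the compute work becomes "close to zero cost," in Katsman's words, and the frame rate rises by about 20%. Different inputs would give different results.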
Also, because Mantle lets developers use GPU memory more efficiently, the new API could allow for the use of higher-resolution textures in a given game, according to Katsman.
Mantle's advantages are many, but a few downsides were mentioned in the various presentations at APU13.
One of those is that, unsurprisingly, supporting an additional API incurs added development time and cost. Mantle currently works only on GCN-based Radeon graphics processors, which means that developers who adopt it must also use either Direct3D or OpenGL to support other graphics hardware. Andersson said DICE spent about two months porting Battlefield 4's Frostbite 3 game engine to Mantle. Asked for a ballpark cost figure, Katsman told me that, for a simple PC project like Nixxes' Thief port, adding Mantle support might amount to roughly a 10% increase in development cost. He was quick to add, however, that such an increase is a drop in the bucket compared to the total development cost of the entire game for all platforms, which might add up to something like $50 million.
The lack of multi-vendor and multi-platform support is another one of Mantle's notable downsides. Microsoft and Sony use different APIs for the Xbox One and PlayStation 4, and Mantle doesn't yet support Linux, OS X, or Valve's upcoming SteamOS. There are some mitigating factors here, though. Katsman noted that Mantle optimizations are "conceptually similar" to the ones developers write for next-gen consoles. That tells us developers won't be starting from scratch when adding Mantle support to their games. Also, Katsman believes Mantle's performance improvements make its implementation worthwhile even if only a fraction of users benefit. As he pointed out, developers already spend time writing support for features like Eyefinity and HD3D into their games, and those features have even smaller user bases.
Finally, adding Mantle support to current game engines, as Nixxes did with the version of Unreal Engine 3 used by Thief, can be a challenge. "Native D3D ports will not magically get much higher performance," explained Katsman. "If you emulate the same system on top of Mantle, you will not get much better performance." Fully optimizing an existing engine for Mantle seems to involve breaking and rewriting some chunks of that engine to take advantage of the new development model. But here again, Katsman believes the performance improvements make the effort worthwhile.