What Mantle does
Mantle takes a number of steps to alleviate the issues outlined on the previous page. By giving developers more direct control of the GPU and putting them, in Riguer's words, in the "driver developer's seat," Mantle can cut overhead and allow for more efficient use of both the graphics hardware and the CPU.
Mantle's most fundamental and innovative feature, according to AMD's Brian Bennett, is its execution model. Here's how he described it:
These days, a modern GPU typically has a number of engines that execute commands in parallel to do work for you. You have a graphics or compute engine, DMA, multimedia . . . whatever. The basic building block for work for those engines is a command buffer. In [the diagram above], a command buffer is a colored rectangle. A driver builds commands targeting one of the engines; it puts the command buffer into an execution queue, and the engine, when it's ready to do some work, goes to the execution queue, grabs the work, and performs it.
[That's] as opposed to a context-based execution model, where it's up to the driver to choose which engine we want to target; it's up to the driver to figure out where we break command buffers apart, and manage the synchronization between those engines. Mantle exposes all that, abstracted, to you. So, you have the ability to build a command buffer, insert commands into it, submit it to the queue, and then synchronize the work between it. This lets you take full advantage of the entire GPU.
More fundamentally to Mantle's goals is the fact that you can create these command buffers from multiple application threads in parallel. . . . That is the key to opening up the potential of our multi-core CPUs these days. There is no synchronization at the API level in Mantle; there is no state that persists between command buffers. It is up to you to do the synchronization of your command building and of your command submission; and if you want to do work on multiple engines, we give you constructs to synchronize work between those engines. You have all the power.
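To make the execution model concrete, here is a minimal toy sketch in Python. It is not Mantle code: `CommandBuffer`, `build_buffer`, `build_frame`, and `drain` are hypothetical names invented for illustration, and the "engine" is just a loop that drains a queue. The point is only the shape of the idea: several threads each record into their own command buffer with no shared state, and the application decides when and in what order to submit them.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


class CommandBuffer:
    """Toy stand-in for a command buffer: a private list of commands for one engine."""

    def __init__(self, engine):
        self.engine = engine
        self.commands = []

    def record(self, command):
        self.commands.append(command)


def build_buffer(chunk_id, draws_per_chunk):
    # Each worker records into its own buffer, so no lock is needed while building.
    buf = CommandBuffer("graphics")
    for item in range(draws_per_chunk):
        buf.record(f"draw chunk={chunk_id} item={item}")
    return buf


def build_frame(num_chunks=4, draws_per_chunk=3):
    # Build all command buffers in parallel on separate application threads.
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        buffers = list(
            pool.map(build_buffer, range(num_chunks), [draws_per_chunk] * num_chunks)
        )
    # Submission order is the application's responsibility; here a single
    # thread submits the finished buffers in chunk order.
    graphics_queue = Queue()
    for buf in buffers:
        graphics_queue.put(buf)
    return graphics_queue


def drain(queue):
    # Pretend engine: pull buffers off the execution queue and "execute" them.
    executed = []
    while not queue.empty():
        executed.extend(queue.get().commands)
    return executed


executed = drain(build_frame(num_chunks=4, draws_per_chunk=3))
print(len(executed), "commands executed")
```

Because no state persists between buffers in this sketch, the order in which the worker threads finish does not matter; only the submission order does.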
Mantle's execution model extends to multiple GPUs. Developers have access to all of the engines on all of a system's Mantle-compatible GPUs, and they can control those GPUs and handle synchronization themselves. "Synchronization between the GPUs," Riguer explained, "becomes a natural extension to the mechanism we exposed . . . on synchronization between multiple queues. In fact, we make [the] multi-GPU model exactly like a single-GPU model scaled up to multiple devices."
As a result, developers have much more flexibility in the way they split up workloads between GPUs, and they can "try to make [their games] scale a lot better" than what's possible with CrossFire right now. Techniques superior to today's alternate frame rendering (AFR), whereby each GPU renders a different frame in the animation, can be developed, and asymmetric configurations—such as those with slow integrated graphics and fast discrete graphics—can be more readily exploited.
Moving beyond AFR is particularly important. While that technique works reasonably well with current games, Riguer said future titles will run more workloads with lots of frame-to-frame dependencies, such as compute-based effects. To handle those, "You would need to either duplicate the workload across GPUs or serialize across the GPUs. In either case, your scaling suffers."
Mantle manages memory very differently from Direct3D, too. Here is Bennett's explanation of that feature:
In traditional APIs, when you create an object like an image or a buffer, the driver implicitly allocates memory for you. [That] seems okay, but it has a number of problems. It's difficult to efficiently recycle memory; you're going to have bigger memory footprints because of that; creating the object itself is more expensive, because you have to go to the OS to get the GPU memory; and the driver becomes inefficient, because it spends a lot of time managing these OS video memory handles to work with the display driver model.
In Mantle, API objects are simple CPU-side info that have no memory explicitly attached. Instead, you as the app developer allocate GPU memory explicitly and bind it to the object.
Again, higher efficiency and flexibility are the name of the game.
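A rough way to picture the explicit memory model is the toy Python sketch below. Every name in it (`GpuMemory`, `Image`, `bind_memory`) is a hypothetical stand-in, not part of the Mantle API, and the sizes are arbitrary. It assumes the application makes one large allocation up front and then binds lightweight objects to ranges inside it, which is what makes recycling cheap.

```python
class GpuMemory:
    """Hypothetical block of GPU memory that the application allocated itself."""

    def __init__(self, size):
        self.size = size


class Image:
    """Lightweight CPU-side object; it owns no memory until a range is bound."""

    def __init__(self, name, required_size):
        self.name = name
        self.required_size = required_size
        self.memory = None
        self.offset = None

    def bind_memory(self, memory, offset):
        # The application, not a driver, checks that the range fits.
        if offset + self.required_size > memory.size:
            raise ValueError("binding exceeds allocation")
        self.memory = memory
        self.offset = offset


# One allocation, made once, shared by several objects.
heap = GpuMemory(size=1024)

shadow_map = Image("shadow_map", 512)
bloom = Image("bloom", 512)
shadow_map.bind_memory(heap, 0)
bloom.bind_memory(heap, 512)

# Later, once the shadow map is no longer needed, its range is recycled
# for another object without going back to the OS for more memory.
depth_of_field = Image("depth_of_field", 512)
depth_of_field.bind_memory(heap, 0)
```

In this sketch, creating an `Image` costs almost nothing, and the expensive step (the allocation) happens once rather than per object.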
That brings us to monolithic pipelines. To paraphrase Johan Andersson, Mantle rolls all of the various shader stages that make up the graphics pipeline into a single object. Above, I've added the slide from Andersson's keynote, since it's somewhat more enlightening than the one used by Riguer and Bennett in their presentation.
In short, monolithic pipelines help avoid draw-time shader compilation—a problem that, as I mentioned earlier, can make games stutter. Here's how Bennett sums it up:
In the current implementations, draw-time validation that the driver does is super expensive. Since you can vary all your shaders in state independently, we spend a lot of time at draw deciding what hardware commands we should write. By compiling the pipeline up front, binding the pipeline is lightning fast in comparison.
Second, by compiling this up front, you give us the opportunity to spend some cycles to improve the GPU performance. If we know everything you're doing in the whole pipeline, we can optimize that. And . . . with the draw-time validation models, sometimes you'll bind a new state, call draw, and that draw will have an inexplicably high CPU cost. Maybe the driver had to kick off a shader compile in the background, and that's going to impact you. [There are] no surprises with Mantle.
Mantle doesn't just help prevent shader compilation from occurring mid-game. It can also prevent shaders from being recompiled each time the game is launched. According to Riguer, recompilation can account for a "lot of the startup time," but with Mantle, "the shader compilation is a lot more predictable, and we give you the ability to save and load very quickly and easily a complete compiled shader pipeline, which should virtually eliminate all the loading time that stems from shader compilation."
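The two benefits described above, compiling the whole pipeline up front and reloading it on the next launch, can be sketched with a small cache. This is a toy Python model under stated assumptions: `PipelineCache` and its methods are hypothetical names, the "compile" step just fabricates a placeholder string, and JSON stands in for whatever binary format a real implementation would use.

```python
import hashlib
import json


class PipelineCache:
    """Toy cache of 'compiled' pipelines, keyed by all shader stages plus state."""

    def __init__(self):
        self.compiled = {}
        self.compile_count = 0

    def _key(self, stages, state):
        # The key covers the whole pipeline, so nothing is left to decide at draw time.
        blob = json.dumps({"stages": stages, "state": state}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get_or_compile(self, stages, state):
        key = self._key(stages, state)
        if key not in self.compiled:
            # Placeholder for the expensive up-front compile.
            self.compile_count += 1
            self.compiled[key] = "binary:" + key[:8]
        return self.compiled[key]

    def save(self):
        return json.dumps(self.compiled)

    @classmethod
    def load(cls, data):
        cache = cls()
        cache.compiled = json.loads(data)
        return cache


stages = {"vertex": "vs_main", "pixel": "ps_main"}
state = {"blend": "opaque", "depth_test": True}

# First launch: the pipeline is compiled once, at load time.
first_run = PipelineCache()
pipeline = first_run.get_or_compile(stages, state)
saved = first_run.save()

# Second launch: the saved pipelines are loaded and nothing is recompiled.
second_run = PipelineCache.load(saved)
same_pipeline = second_run.get_or_compile(stages, state)
print(second_run.compile_count)
```

Binding then reduces to a dictionary lookup in this model, which is the sense in which a precompiled pipeline avoids draw-time surprises.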
Incidentally, Bennett said he expects pipelines to look "different in the future." He suggested that Mantle's graphics pipeline abstraction will help the API adapt to these future changes—enabling "some stuff that we can't do in real time now."
Mantle introduces a new way to bind resources to the graphics pipeline, as well. According to Bennett, the traditional binding technique is a "pretty big performance hog," and the currently popular alternative, which he calls "bindless," has downsides of its own, including higher shader complexity, reduced stability, and being "less GPU cache friendly."
Mantle's binding model involves simplified resource semantics compared to Direct3D, and it works like so:
In Mantle, when you create your pipeline, you define a layout for how the resources will be accessed from the pipeline, and you bind that descriptor set. The descriptor set is an array of slots that you bind resources to. Notably, you can bind another descriptor set to a slot—and this lets you set hierarchical descriptions of your resources.
If your eyes just glazed over, that's okay—mine did a little, too. In any event, Bennett said that the ability to build descriptor sets and to generate command buffers in parallel is "very good for CPU performance." During his presentation, Johan Andersson brought up a descriptor set use case that reduced both CPU overhead and memory usage.
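For readers who prefer code to prose, the hierarchical part of that description can be modeled in a few lines of Python. This is an illustrative sketch only: `DescriptorSet`, `bind`, and `resolve` are hypothetical names, resources are plain strings, and the slot layout is made up. The only idea taken from Bennett's description is that a slot can hold either a resource or another descriptor set.

```python
class DescriptorSet:
    """Toy descriptor set: a fixed array of slots holding resources or nested sets."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def bind(self, slot, item):
        self.slots[slot] = item

    def resolve(self, path):
        """Follow a sequence of slot indices down through nested sets."""
        item = self
        for index in path:
            if not isinstance(item, DescriptorSet):
                raise TypeError("path descends into a non-set resource")
            item = item.slots[index]
        return item


# A per-material set that can be built once and reused.
material = DescriptorSet(2)
material.bind(0, "albedo_texture")
material.bind(1, "normal_texture")

# A per-frame set with data that changes every frame.
per_frame = DescriptorSet(1)
per_frame.bind(0, "camera_constants")

# The root set nests both, giving a two-level hierarchy.
root = DescriptorSet(2)
root.bind(0, per_frame)
root.bind(1, material)

print(root.resolve([1, 0]))
```

Swapping materials in this model means rebinding a single slot in the root set rather than rebinding every texture individually.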
Bennett went over one more way in which Mantle can reduce CPU overhead: resource tracking. Right now, drivers spend a "lot of time" keeping track of resources. With Mantle, tracking resources is up to the application. Bennett said he expects apps to do a better job of it than the graphics drivers, and he hinted that developers won't have to do much extra work to make that happen: "Your game engine is probably doing that sort of tracking already, because you're supporting consoles that require it."
Last, but not least, Mantle has some debugging and validation tools built into the API and the accompanying driver. AMD didn't share a ton of specifics about those, but there was mention of "lots of extra controls for stress testing applications and forcing very specific debug scenarios." Riguer added, "In fact, I would say that writing [debug] tools on top of Mantle, in many cases, would not be much harder than slapping on a fancy UI on top of capabilities we are putting right into Mantle." Both Johan Andersson of DICE and Jurjen Katsman of Nixxes called Mantle's debugging and validation tools "really powerful."