Taking up the . . .
I think it’s safe to say Mantle has incited an unusual amount of interest for a programming interface. PC gamers and game developers alike are intrigued to see progress on this front, probably because we all have a sense that glassy smooth animation in PC games seems to require beefier hardware than it should—and that PC gaming performance hasn’t improved as rapidly as PC hardware has.
Naturally, then, we set out to test Mantle performance as soon as DICE released a patch late last week for Battlefield 4 that adds support for the new API. We had to wait a little longer for AMD to release the first Mantle-enabled graphics drivers, but the Catalyst 14.1 betas are now available to anyone who wants to try them.
Because these are beta drivers for an all-new API, they come with a host of caveats. AMD tells us the best performance is limited to some of its newest GPUs, particularly the Radeon R9 290 and 290X. The firm expects similar performance gains from any Mantle-capable GPU eventually—that is, anything based on the Graphics Core Next architecture, dating back to the Radeon HD 7000 series—but we’re not quite there yet. Also, although Mantle generally performs well in BF4, there is a known issue with occasional stuttering and potential instability. Furthermore, a number of hardware configurations aren’t supported with Mantle just yet, including the Enduro and Dual Graphics configs for laptops and Eyefinity multi-display setups with monitors in portrait mode. In other words, AMD has a lot of work left to do, and this first beta driver is just an early preview of what’s to come.
Mantle isn’t the only new capability in this driver, either. Catalyst 14.1 also includes frame pacing for multi-GPU solutions based on older GPUs, like the Radeon HD 7000 series and the R9 280X, when connected to Eyefinity multi-monitor setups and 4K displays. That feature remains limited to DirectX 10 and 11, not DX9, but we’re happy to see it. We’ll have to test it separately.
You may have gathered from the list of caveats for this driver that Mantle is an all-new thing. A lot of things we take for granted from Direct3D and OpenGL won’t yet work with Mantle. AMD says it’s been rebuilding all sorts of functionality, like multi-GPU and multi-display support, from scratch. One casualty of Mantle’s novelty is our usual suite of graphics performance tools. Programs like Fraps and the FCAT overlay are built for Direct3D, so when we switch BF4 into Mantle mode, we lose the ability to monitor performance using those tools.
Happily, the folks at DICE have built some good instrumentation into the latest version of BF4, as Johan Andersson explains in this blog post. The screenshot above shows the plot the game can display in the corner of the screen depicting frame rendering times as they happen. Frame production times for the CPU and GPU are tracked separately in different colors, so it’s easy to tell whether performance is mainly CPU-bound or GPU-bound. For instance, in the few seconds represented above, the CPU and GPU are pretty evenly matched, but the CPU is responsible for a few slower frames.
You can tell the game to log performance to a file, in which case it writes out a series of values for each frame: the CPU time, the GPU time, and the overall frame time. That’s it. We’re getting exactly the right raw data, and there’s not an FPS average in sight. This is progress, folks. When I commended Andersson for these tools on Twitter, he said FCAT support will be coming in a later update. That should allow us to do proper multi-GPU testing. For now, we have what we need to do frame-time-based testing with single GPUs and Mantle right out of the gate.
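To make the per-frame data concrete, here is a minimal Python sketch of how such a log could be read and summarized. The on-disk layout is an assumption made for illustration: the comma-separated column order (CPU time, GPU time, frame time, all in milliseconds), the optional header row, and the function names `parse_frame_log` and `cpu_bound_fraction` are all hypothetical, and BF4's actual file format may well differ.

```python
import csv
import io


def parse_frame_log(text):
    """Parse rows of 'cpu_ms,gpu_ms,frame_ms' into tuples of floats.

    The three-column CSV layout is an assumption for this sketch,
    not the documented BF4 log format.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        try:
            rows.append(tuple(float(value) for value in record))
        except ValueError:
            continue  # skip a header row or other non-numeric line
    return rows


def cpu_bound_fraction(rows):
    """Return the fraction of frames whose CPU time exceeded their GPU time."""
    if not rows:
        return 0.0
    cpu_bound = sum(1 for cpu_ms, gpu_ms, _ in rows if cpu_ms > gpu_ms)
    return cpu_bound / len(rows)


# Made-up sample data, purely for illustration.
sample = "cpu,gpu,frame\n10.0,8.0,10.5\n6.0,9.0,9.2\n12.0,7.0,12.1\n"
frames = parse_frame_log(sample)
print(len(frames), cpu_bound_fraction(frames))
```

Comparing the CPU and GPU columns frame by frame gives roughly the same information as the two-color plot in the game: which side of the system is holding up each frame.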
Enabling Mantle is as simple as setting an option in the BF4 video menu and then restarting the game. Once it’s up and running, Mantle seems to be almost entirely seamless. We were able to switch between windowed and full-screen modes without issue, so the API appears to play well with the rest of Windows, even as it’s bypassing the Direct3D rendering path.
Although we haven’t yet done any sort of detailed comparison, at least superficially, image quality in BF4 isn’t altered substantially by switching between the Direct3D and Mantle renderers. Everything looks solid, well filtered, and properly rendered with Mantle, just as it does in D3D. This isn’t a bad start for a new API, especially with an application as complex as Battlefield 4.
We’re beginning our coverage of Mantle by focusing on CPU performance rather than GPU performance because we expect the biggest gains on the CPU side of things. AMD says most of the benefits should come in CPU-limited scenarios, mostly thanks to a reduction in the number of draw calls needed to render a scene. (Direct3D has become rather infamous for the number of draw calls it requires, and those generally translate into additional CPU overhead.) We do anticipate some more modest GPU performance gains from Mantle, as well, and we plan to explore those in another article.
We tested Mantle performance versus Direct3D in Windows 8.1 using a couple of different processors, a Kaveri-based AMD A10-7850K APU and an Intel Haswell-based Core i7-4770K. The idea was to test in a CPU-constrained performance scenario using two processors with different levels of performance. In fact, I had hoped to show a lower level of CPU performance by including another AMD APU, a 65W Richland-based A10-6700. However, its performance turned out to be almost identical to that of the 95W Kaveri 7850K, so I held it out of our final results in order to keep things simple.
The main video card we used was a Radeon R9 290X card from XFX. This 290X comes with a custom cooler and sustains its peak Turbo clock almost constantly, even under the heaviest of loads. It essentially eliminates the clock speed and thermal variance issues we’ve seen with stock-cooled 290X cards. (I’ll be writing more about this card soon.) To ensure the GPU wasn’t the performance constraint, we tested BF4 at 1920×1080 on the “high” image quality presets, which is fairly easy work for a video card of this power. We also tested at these same settings using a GeForce GTX 780 Ti, in order to see how Nvidia’s Direct3D driver fares compared to AMD’s D3D and Mantle implementations.
We captured performance info while playing through a two-minute-long section of BF4 three times on each config. The frame-time plots show one of the test runs for each config we tested.
Even the raw plots readily show Mantle producing lower frame times and more total frames than Direct3D does with the same R9 290X card.
The known issue with occasional stuttering rears its head in one plot, for the 4770K with Mantle. You can’t see the full size of the frame time spike on the plot, but it’s 295 milliseconds—nearly a third of a second. We didn’t see this sort of hiccup all that often, but it did happen during some test runs, including the one we plotted for the 4770K.
AMD has made some big claims for performance improvements from Direct3D to Mantle, and the numbers from the A10-7850K appear to back them up. The leap from an average of 69 FPS to 110 FPS is considerable by any standard, particularly for an API change that apparently produces the same visuals. Even better, our latency-focused metric, the 99th percentile frame time, tends to agree that Mantle is substantially faster than D3D in this case. Mantle also outperforms Direct3D in combination with the Core i7-4770K, but the differences aren’t quite as dramatic.
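For readers who want to see how these two summary numbers relate to raw frame times, here is a small Python sketch. It assumes the input is simply a list of per-frame times in milliseconds, and it uses the nearest-rank definition of a percentile, which is one common choice and not necessarily the exact method behind our charts. The function names are illustrative.

```python
import math


def average_fps(frame_times_ms):
    """Average frames per second: frames rendered divided by elapsed seconds."""
    total_ms = sum(frame_times_ms)
    return 1000.0 * len(frame_times_ms) / total_ms


def percentile_frame_time(frame_times_ms, pct=99.0):
    """Nearest-rank percentile: the frame time that pct percent of frames
    come in at or under. (Assumed definition for this sketch.)"""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


# Made-up data: 99 quick frames and one 50-ms hitch.
times = [10.0] * 99 + [50.0]
print(round(average_fps(times), 1))
print(percentile_frame_time(times, 99.0))
```

The FPS average folds every frame into a single rate, while the 99th percentile reports the frame time that all but the slowest 1% of frames stay under, which is why we treat it as the latency-focused number.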
One thing we didn’t expect to see was Nvidia’s Direct3D driver performing so much better than AMD’s. We don’t often test different GPU brands in CPU-constrained scenarios, but perhaps we should. Looks like Nvidia has done quite a bit of work polishing its D3D driver for low CPU overhead.
Of course, Nvidia has known for months, like the rest of us, that a Mantle-enabled version of BF4 was on the way. You can imagine that this game became a pretty important target of optimization for them during that span. Looks like their work has paid off handsomely. Heck, on the 4770K, the GTX 780 Ti with D3D outperforms the R9 290X with Mantle. (For what it’s worth, although frame times are very low generally for the 4770K/780 Ti setup, the BF4 data says it’s still mainly CPU-limited.)
The “time spent beyond X” graphs are our indicator of “badness,” of how long frame production times exceed several key thresholds. Those intermittent stuttering episodes with the early Mantle driver show up in the beyond-50-ms results for the A10-7850K, even though we didn’t see a hiccup of this size in every run. Since we’re showing the median result from three runs, the spike we plotted for the 4770K doesn’t show up at all here. (There were no such spikes in the other two test sessions.)
The big takeaway here comes from the “time spent beyond 16.7 ms” plot. You need to produce a frame every 16.7 milliseconds to achieve a smooth 60-FPS rate of animation. Mantle moves the A10-7850K much, much closer to that goal, even with that one big latency spike in the picture. If AMD can eliminate those hiccups, then slower CPUs like the 7850K should be capable of delivering a much smoother gaming experience than they can with Direct3D.
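The “time spent beyond X” idea and the median-of-three-runs reporting can be sketched in a few lines of Python. This version assumes the metric accumulates only the portion of each frame time that lies above the threshold, and that the median is taken over the per-run totals; the function names and the sample numbers are made up for illustration.

```python
from statistics import median


def time_beyond(frame_times_ms, threshold_ms):
    """Sum the milliseconds each frame spends past the threshold.

    Frames at or under the threshold contribute nothing.
    """
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


def median_time_beyond(runs, threshold_ms):
    """Median of the per-run totals across several test runs."""
    return median(time_beyond(run, threshold_ms) for run in runs)


# Three made-up runs; only the first contains a large spike.
runs = [
    [10.0, 20.0, 60.0],
    [10.0, 12.0, 14.0],
    [10.0, 18.0, 16.0],
]
print(median_time_beyond(runs, 50.0))  # the lone spike drops out of the median
print(median_time_beyond(runs, 16.0))
```

Under these assumptions, a spike that appears in only one of three runs vanishes from the median at the 50-ms threshold, which matches what happened with the 4770K result described above.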
These are still early days for Mantle, but we can already see its ability to reduce CPU overhead rather dramatically compared to Direct3D. That’s exactly the sort of innovation folks have wanted to see in PC gaming, and AMD and DICE are already delivering. One would hope this demonstration of a more modern approach to graphics programming would spur others (ahem, Redmond) to innovate in a way that can benefit the entire PC ecosystem.
There’s lots of work yet to be done on Mantle. AMD needs to refine its drivers, add some key features, and improve performance scaling for its older GCN-based graphics chips. Meanwhile, in order for Mantle to really gain traction, EA and DICE will have to follow through on their promise to bring the Mantle rendering path to a host of other games based on the Frostbite 3 engine.
Based on these first results, the big beneficiaries of Mantle’s proliferation will probably be folks who, for one reason or another, have a PC that isn’t built to perform especially well in many of today’s games. PCs with slower processors stand to gain the most.
That said, there are already some well-worn paths to very good gaming experiences on the PC today. The Haswell-based Core i7-4770K is faster than the A10-7850K regardless of the graphics API. Switching from AMD’s Direct3D driver to Nvidia’s will get you more than halfway to Mantle’s performance on an A10-7850K, too. AMD would do well to work on improving its Direct3D drivers and CPUs, as well as pursuing Mantle development—but I’m sure they already know that. I’m happy to see AMD pushing innovation in graphics APIs at the same time.
We’ll surely test Mantle’s performance on a broader range of CPUs as it matures. I’m curious to play around with different core counts and to see whether low-power chips like Kabini can provide good gaming experiences with Mantle. Our next task, though, will be to see what performance benefits Mantle can deliver in GPU-limited scenarios. Stay tuned for that.