Three years ago, AMD released Zacate, a new APU architected from the ground up for low-power mobile systems. Every year since then, the company has rolled out an updated lineup of chips with that very same mission. The new arrivals this year are called Mullins and Beema, and they promise to deliver the best blend of performance and power efficiency seen so far in low-power AMD APUs.
We already learned a little about these products at AMD’s APU13 event back in November. A couple of weeks ago, AMD flew us down to its Austin, Texas campus for a presentation that revealed the missing pieces. We found out all about what makes these systems-on-a-chip different from (and better than) their predecessors, Temash and Kabini, which were introduced last May.
AMD also gave us some hands-on time with a real, live Windows 8.1 tablet based on Mullins. We were able to run some of our productivity and gaming benchmarks on the system, and we were left with a good sense of how AMD’s next-gen tablet SoCs may compare to the competition. We’re still missing data on battery life, but the performance picture, at least, is fairly clear.
Without further ado, let’s delve into our first look at AMD’s Mullins APU—and its slightly more power-hungry sibling, Beema.
Introducing Mullins and Beema
Mullins and Beema are really two sides of the same coin. That is, the code names refer to the same silicon. AMD uses binning to differentiate Beema, which is aimed at thin-and-light notebooks, from Mullins, which fits inside the tighter power envelopes required for tablet designs. This is the same kind of two-for-one scheme we saw with Temash and Kabini last year—and with previous iterations of AMD’s low-power mobile APUs.
So, what’s changed since the previous generation? From a CPU and GPU architecture standpoint, not much. The diagram above refers to “Puma+” CPU cores, but Mullins and Beema are actually based on the same Jaguar CPU microarchitecture as their predecessors. No architectural changes have been made to boost instructions per clock, we’re told. AMD has nevertheless taken a number of steps to raise clock speeds when appropriate and to boost power efficiency. Some of those improvements extend to the integrated Graphics Core Next GPU. In all, the transistor count has risen from 914 million to 930 million due to “low level changes.” AMD has also made enhancements to the memory and display interfaces, and it has enabled an ARM Cortex-A5 core that was present but disabled in Temash and Kabini. This ARM core underpins Mullins and Beema’s Platform Security Processor (PSP), which sets up an isolated execution environment for secure applications.
We’ll look at the PSP more closely in a minute. For now, let’s talk a little more about those performance and power-efficiency tweaks.
Thanks to a mix of circuit-power optimizations and process-scaling improvements, Mullins and Beema leak substantially less power than the previous generation. AMD claims to have achieved a 19% leakage reduction across the CPU cores and a 38% leakage reduction in the integrated GPU. Less leakage means less energy wasted as heat, which in turn means AMD was able to raise clock speeds within the same thermal envelopes. The fastest 15W chip from last year’s Kabini lineup ran at 1.55GHz with a 500MHz GPU speed, but the fastest 15W Beema chip runs at 2GHz, can hit 2.4GHz with Turbo Core, and clocks its GPU at 800MHz. That’s quite impressive, considering both processors are manufactured on the same 28-nm process.
Speaking of Turbo Core, Mullins and Beema make more extensive—and more intelligent—use of that feature than their predecessors. While Temash and Kabini supported Turbo Core in the silicon, only one member of the lineup actually implemented it: the A6-1450, which used Turbo to push its CPU clock speed from 1GHz up to 1.4GHz. In that implementation, Turbo didn’t exploit the chip’s temperature monitoring capabilities. AMD instead estimated power consumption using digital activity counters.
Mullins and Beema do things a little differently. Turbo Core is enabled in four of the seven initial models, and it allows for much greater gains than before. For example, Turbo can push the fastest Mullins variant from its base speed of 1.2GHz to a maximum of 2.2GHz. Temperature monitoring comes into play this time, but in a way that differs somewhat from what we’ve seen on other AMD processors. Mullins and Beema’s temperature-based clock speed and voltage regulation even has its own name: STAPM, short for Skin Temperature Aware Power Management.
The theory behind STAPM is simple. The silicon is capable of running at up to 100°C without any reliability problems, but sustaining that temperature won’t do in a tablet, where the heat will propagate from the chip to the chassis and will eventually burn the user’s fingers. Rather than lower the maximum operating temperature, AMD lets the chip run at up to 100°C—but it uses an algorithm to estimate the temperature of the tablet’s “skin,” or outer chassis, and it ramps down the clock speed and voltage to prevent the skin temperature from getting too toasty.
STAPM involves a “sophisticated . . . silicon temperature tracking algorithm,” AMD tells us, but there’s no actual sensor to keep track of skin temperatures directly. Skin temperatures are estimated indirectly using “extensive modeling of different system parameters,” from screen brightness to fan speed. Through that modeling, AMD says, “We can predict what the skin temp would be based on everything we already know is going on in the system.”
The payoff of STAPM, according to AMD, is that the processor can operate at a higher clock frequency for anywhere from four to 20 minutes while the skin temperature slowly creeps up. Since many CPU-intensive productivity tasks involve bursts of activity with long stretches of idle time, STAPM leads to “higher performance most of the time,” AMD says. The company claims a performance increase of as much as 63% in “key workloads.”
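To make the boost-window idea concrete, here is a minimal sketch of a skin-temperature-aware governor. It is not AMD's algorithm, whose details the company did not disclose. The first-order thermal model, the comfort limit, the time constant, and every name in the code (`simulate_boost_window`, `SKIN_LIMIT_C`, and so on) are our own assumptions for illustration. Only the 1.2GHz base and 2.2GHz Turbo clocks come from AMD's figures.

```python
# Toy model of skin-temperature-aware boosting. Everything here is a
# hypothetical illustration, not AMD's STAPM implementation.

BASE_GHZ = 1.2        # base clock quoted for the fastest Mullins part
BOOST_GHZ = 2.2       # maximum Turbo clock quoted for the same part
AMBIENT_C = 25.0      # assumed starting (room) temperature of the chassis
SKIN_LIMIT_C = 40.0   # assumed comfort limit for the chassis surface
TAU_S = 600.0         # assumed thermal time constant of the chassis, in seconds

# Assumed steady-state skin temperature the chassis would settle at per clock.
STEADY_SKIN_C = {BASE_GHZ: 36.0, BOOST_GHZ: 46.0}


def step_skin_temp(skin_c, clock_ghz, dt_s):
    """Advance a first-order skin-temperature estimate by dt_s seconds."""
    target_c = STEADY_SKIN_C[clock_ghz]
    return skin_c + (target_c - skin_c) * (dt_s / TAU_S)


def simulate_boost_window(duration_s=3600.0, dt_s=1.0):
    """Return how many seconds the chip boosts before the first throttle."""
    skin_c = AMBIENT_C
    boosted_s = 0.0
    elapsed_s = 0.0
    while elapsed_s < duration_s:
        # Boost only while the *estimated* skin temperature has headroom.
        if skin_c >= SKIN_LIMIT_C:
            break  # the governor would now fall back toward BASE_GHZ
        boosted_s += dt_s
        skin_c = step_skin_temp(skin_c, BOOST_GHZ, dt_s)
        elapsed_s += dt_s
    return boosted_s


if __name__ == "__main__":
    print(f"Boost window: {simulate_boost_window() / 60:.1f} minutes")
```

With these made-up constants, the boost window works out to roughly twelve and a half minutes, which happens to sit inside the four-to-20-minute range AMD quotes. A chassis with more thermal mass (a larger `TAU_S`) would stretch that window, while a lower comfort limit would shorten it.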
That gain need not come at the cost of battery life, either. On the contrary, a faster-running processor can complete tasks more quickly and thus spend more time at idle, which “actually saves energy in many common use-cases,” the company tells us.
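The "race to idle" argument is easy to check with back-of-the-envelope arithmetic. The sketch below compares the energy used over a fixed one-minute window when the same task runs at a lower or a higher clock. All of the power and timing figures are hypothetical numbers we picked for illustration, not measurements of Mullins.

```python
def task_energy_joules(active_power_w, active_time_s, idle_power_w, window_s):
    """Energy over a fixed window: one active burst, then idle for the rest."""
    if active_time_s > window_s:
        raise ValueError("task does not finish inside the window")
    idle_time_s = window_s - active_time_s
    return active_power_w * active_time_s + idle_power_w * idle_time_s


# Hypothetical whole-platform power figures (screen, memory, and SoC together).
WINDOW_S = 60.0
slow = task_energy_joules(active_power_w=6.0, active_time_s=20.0,
                          idle_power_w=2.0, window_s=WINDOW_S)
fast = task_energy_joules(active_power_w=8.0, active_time_s=12.0,
                          idle_power_w=2.0, window_s=WINDOW_S)

print(f"slow clock: {slow:.0f} J, fast clock: {fast:.0f} J")
```

In this example the faster configuration uses 192 J against 200 J for the slower one, because the extra active power is more than offset by the shorter burst. The result flips if boosting costs too much: at 10W of active power, the fast case would consume more energy than the slow one. That is presumably why AMD's claim is limited to "many common use-cases" rather than all of them.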
STAPM works in combination with another scheme, Intelligent Boost, to maximize power efficiency. As part of Intelligent Boost, the power management micro-controller “tracks application behavior [in] real-time to determine frequency sensitivity,” and the clock-boosting behavior is “adjusted accordingly.” Translation: clock speeds only go up for applications that stand to benefit. In situations where a higher clock speed wouldn’t lead to a substantial performance gain, clock speeds stay low, and less power is consumed.
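AMD didn't say how Intelligent Boost measures "frequency sensitivity," but the concept can be sketched as the share of a clock-speed increase that actually turns into extra performance. The function names, the 0.5 threshold, and the sample performance figures below are all our assumptions. Only the 1.2GHz and 2.2GHz clocks are AMD's.

```python
def frequency_sensitivity(perf_low, perf_high, freq_low_ghz, freq_high_ghz):
    """Fraction of a clock increase that shows up as a performance increase.

    Near 1.0: performance scales with frequency (compute-bound work).
    Near 0.0: the higher clock barely helps (memory- or I/O-bound work).
    """
    perf_gain = perf_high / perf_low - 1.0
    freq_gain = freq_high_ghz / freq_low_ghz - 1.0
    return perf_gain / freq_gain


def should_boost(sensitivity, threshold=0.5):
    """Boost only when the workload stands to benefit (threshold is assumed)."""
    return sensitivity >= threshold


# Hypothetical throughput samples taken at 1.2GHz and 2.2GHz.
compute_bound = frequency_sensitivity(100.0, 175.0, 1.2, 2.2)
memory_bound = frequency_sensitivity(100.0, 108.0, 1.2, 2.2)

print(should_boost(compute_bound), should_boost(memory_bound))
```

Under these assumptions, the compute-bound workload earns a boost and the memory-bound one does not, which matches the behavior AMD describes. A real power-management controller would have to infer sensitivity from hardware counters on the fly rather than from two clean measurements.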
The PSP and other improvements
Mullins and Beema’s Platform Security Processor, or PSP, was already inside last year’s Temash and Kabini silicon, but AMD waited until this generation to enable it, purportedly out of a desire to “more closely align with partner timelines and industry readiness.” The PSP is made up of a 32-bit ARM Cortex-A5 processor core, a cryptographic co-processor, dedicated ROM and SRAM, some extra logic to enable secure booting, and a local memory interface that allows access to system memory and resources.
In a nutshell, the PSP establishes an autonomous (and programmable) execution environment for secure applications. The nitty-gritty of it eludes us somewhat, but AMD provided this helpful summation:
We have and are working with a partner to enable the security kernel, which we then built into an enabled reference platform. That reference platform was then provided to [software vendors] with support from the security kernel partner and AMD to enable [software vendors] to adapt their software to use the hooks required to enter the secure environment. X86 applications dive through [the] Trusted Application Environment to get access to the ARM core.
The diagram below provides a higher-level overview of how the PSP comes into play:
Non-secure applications run on the x86 cores, while secure applications like anti-virus software, online banking, and biometric authentication can execute code on the PSP. Any malware running on the system should be constrained to the x86 environment, and it should therefore be unable to tamper with or hijack whatever data the PSP is processing.
AMD says this design enables “enterprise class security” and leverages an industry standard, ARM TrustZone, that also works on ARM-based smartphones. AMD expects at least four or five applications to leverage the PSP by the end of 2014. We’re told the PSP will eventually become “pervasive” across the company’s product line, as well. That should give developers an added incentive to support the feature in their applications.
Mullins and Beema have other tricks up their sleeves, too, in addition to the PSP and the new Turbo mojo.
On the memory controller front, faster DDR3-1866 RAM is supported on Beema variants aimed at mainstream notebooks, and AMD has added a low-power mode that’s “optimized for [the] lowest power DDR3-1333.” That mode is supposed to cut power draw by 500 mW. Unlike Intel’s Bay Trail SoC, Mullins and Beema don’t support LPDDR3, but AMD claims it’s “getting most of the gains” of LPDDR3 with its new low-power mode.
Also, the company has used voltage-mode logic to reduce the power consumed by the display interface. AMD estimates the savings from that tweak at 200 mW for high-resolution panels.
Finally, there’s Windows’ connected standby mode—a critical feature for tablets that went untapped by Temash and Kabini. AMD tells us Mullins and Beema “can support connected standby.” However, the company “has not seen high demand from customers,” and it expects “the additional cost associated with the feature to impact how many total systems ultimately implement the feature.” That cost stems from the various platform requirements associated with connected standby.
Our sense is that, while there are no technical obstacles to developing a connected standby driver for Mullins and Beema, AMD isn’t feeling much pressure to do so right now. That could be because the company hasn’t been able to secure tablet design wins for Mullins yet. Connected standby allows tablets to provide always-on connectivity in the same way that smartphones do, so one would imagine tablet makers would be eager to implement the feature, even if there is an added cost. Earlier this year, a CNet News story attributed the lack of 64-bit Bay Trail slates to the prioritization of 32-bit connected standby drivers by Intel and Microsoft.
The Mullins and Beema lineup
As we mentioned earlier, Mullins and Beema are based on the same silicon and differentiated through binning. AMD tells us the chips are “very carefully harvested and separated to fit their target market.” The launch lineup consists of seven offerings in all: three that fall under the Mullins umbrella and four higher-wattage models that bear the Beema code name.
Note the new branding. AMD has tacked “Micro” between the series indicator and the model number to demarcate Mullins. That’s a little clearer than what we saw in the previous generation. Temash and Kabini were only differentiated by the first digits of their model numbers.
Interestingly, the lowest-wattage Mullins model actually has a higher TDP than the lowest-wattage chip from last year’s Temash series. However, Mullins has a lower SDP, or scenario design power. In practice, AMD says, Mullins will be able to power high-performance fanless tablets, which Temash could not.
Mullins can also run at much higher clock speeds than Temash, thanks to Turbo and STAPM. Temash’s processor cores maxed out at a 1GHz base speed and a 1.4GHz Turbo speed, but Mullins can range up to 2.2GHz. Perhaps not coincidentally, the slowest member of the Mullins series has the same CPU clock speeds as the fastest Temash chip.
Turbo Core isn’t as prevalent among these Beema processors—only the fastest of them supports it. Nevertheless, clock speeds have gone up, as has power efficiency. The fastest member of the mobile Kabini family runs at 2GHz, but it’s saddled with a 25W thermal envelope. The new A6-6310 reaches the same base speed and runs even faster thanks to Turbo, all within a tighter 15W TDP.
We didn’t get a chance to benchmark Beema, but we do have Mullins results on the next page. Press on to see them.
The Discovery tablet and the test
At AMD’s campus in Austin, we were given a few hours to test the Discovery Project tablet, an 11.6″ slate powered by the fastest Mullins variant. The device featured a 1920×1080 display resolution, 2GB of memory, a 64GB solid-state drive, Bluetooth, Wi-Fi, GPS, and all the other bits and pieces one would expect to find in a Windows 8.1 tablet. AMD described the Discovery tablet as a “fully featured product with everything Mullins has to offer.” This was no product, of course. It was an “internally developed” reference machine intended as a showcase for the press and AMD’s partners.
Still, it worked well, and it didn’t look half bad.
The system was set up with Windows 8.1 and a number of applications intended for us to test. We didn’t use the AMD-provided benchmarks, though. Rather, we came armed with a USB 3.0 solid-state drive containing our own test apps and games. We brought the latest versions of 7-Zip, LuxMark, Musemage, TrueCrypt, the x264 encoder, 3DMark, BioShock Infinite, DiRT Showdown, and Tomb Raider. We also ran the latest SunSpider and Kraken web benchmarks.
We didn’t have time to install a clean version of the operating system, but we did check the “Uninstall” and “Power Options” control panels, among other things, to make sure the test conditions were as clean as our schedule would permit. We also configured the operating system the way we usually do, disabling things like Windows Defender and System Protection, to ensure fair and comparable results.
For comparison, we ran the same tests on Asus’ Transformer Book T100 convertible tablet, which is based on Intel’s Bay Trail processor, and the Kabini whitebook AMD sent us last year. (A small caveat: while the AMD systems both ran Windows 8.1 x64, the Transformer was stuck with Windows 8.1 x86, since it lacks 64-bit support.) For our web and 3DMark tests, we also threw in data from Google’s second-gen Nexus 7 slate and from Nvidia’s Shield handheld, which both ran Android 4.4 KitKat. The Shield may not be a tablet, but it provides a glimpse at the unbounded performance of Nvidia’s Tegra 4 processor, which is helpful as a point of reference.
Memory subsystem performance
The Mullins tablet had the least memory bandwidth of the three systems we tested. That’s not too surprising, because the A10 Micro-6700T is limited to a single channel of 1333MHz memory. By contrast, the A4-5000’s single-channel controller supports faster 1600MHz RAM. The Atom Z3740 is limited to a 1066MHz memory speed, but its second memory channel more than makes up the deficit.
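The theoretical peak figures behind that ranking are simple to work out. The sketch below assumes 64-bit memory channels for all three chips and treats the quoted "MHz" speeds as effective transfer rates in MT/s. The helper name `peak_bandwidth_gbs` is ours. These are upper bounds, and measured bandwidth will be lower.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in decimal GB/s for a DDR memory setup."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Memory configurations as described in the text; 64-bit channels are assumed.
configs = {
    "A10 Micro-6700T": peak_bandwidth_gbs(1333, channels=1),
    "A4-5000": peak_bandwidth_gbs(1600, channels=1),
    "Atom Z3740": peak_bandwidth_gbs(1066, channels=2),
}

for chip, gbs in configs.items():
    print(f"{chip}: {gbs:.1f} GB/s")
```

That works out to about 10.7 GB/s for the Mullins chip, 12.8 GB/s for the A4-5000, and 17.1 GB/s for the Atom, so the Atom's second channel gives it roughly 60% more theoretical bandwidth than Mullins despite the slower memory clock.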
Low memory bandwidth or not, Mullins performs pretty nicely in these productivity tasks. The A10 Micro-6700T actually matches or outperforms the A4-5000 despite its lower base clock speed and tighter TDP. Credit for that performance should probably go to the new Turbo mechanism, which can push the Micro-6700T from its base frequency of 1.2GHz to as much as 2.2GHz—well above the A4-5000’s 1.5GHz maximum.
Mullins fares quite nicely against the Atom, too. It’s faster in x264 and TrueCrypt, and it’s not very much slower in 7-Zip. Of course, the Atom does have a lower SDP: 2W, compared to 2.8W for the A10 Micro-6700T.
Thanks to its integrated GCN graphics processor, Mullins comes prepared for GPU computing workloads. We put those capabilities to the test in LuxMark, a ray-tracing benchmark, and in Musemage, a GPU-accelerated photo editing application.
Mullins is far more competent than Bay Trail in these GPU-computing tests. However, this is one scenario where it fails to catch up to the 15W A4-5000—despite the fact that the two chips have the same maximum GPU speed. In all likelihood, Mullins’ more limited memory bandwidth is to blame.
Graphics and gaming
We’ll start our graphics testing with 3DMark, which has the advantage of running on not just Windows devices, but also Android ones. That provides some very helpful context. Note that, while some of the devices below have different screen resolutions, 3DMark is designed to compensate. It renders scenes offscreen at a fixed resolution (1280×720, in the case of the “Ice Storm” test) and then scales the image to match the display.
No doubt about it, Mullins has a lot of graphics horsepower for a tablet SoC. It’s faster than Bay Trail, and it’s even speedier than the Tegra 4 chip in Nvidia’s Shield handheld, which runs virtually unrestrained thanks to the Shield’s large chassis, active cooling system, and beefy battery.
Again, though, Mullins doesn’t quite match the performance of the 15W A4-5000.
Next up: a few games from our Steam library. We ran BioShock Infinite, DiRT Showdown, and Tomb Raider at 1280×720 using the lowest detail settings available.
No real surprises here. The numbers echo what we saw above: Mullins is obviously more capable than Bay Trail, but it’s slower than Kabini, notably so in some cases.
More to the point, Mullins isn’t quite fast enough to run these recent PC games at playable frame rates, even with the detail and resolution turned down. DiRT Showdown is the smoothest of the bunch, but 30 FPS is still a little sluggish for a racing game. On a Mullins tablet, you’d probably want to stick with older releases, casual or indie titles, and whatever tablet-friendly games are available on the Windows Store.
Spending time with the Discovery tablet has answered many of our questions—but it’s also left us with many new ones.
We learned that, for a tablet processor, Mullins is quite fast. It performs comparably in many tests to last year’s A4-5000, a 15W part that landed in mainstream notebooks, and its graphics performance outpaces that of the competition from Intel and Nvidia. (To be fair, the Atom Z3740 we tested isn’t the fastest Bay Trail incarnation—but it does have the same GPU speeds as the flagship Z3770.)
Mullins’ performance would be a great asset for any convertible tablet designed to double as a productivity machine. The performance we saw may not be sustainable for prolonged workloads, since AMD’s STAPM mechanism raises clock speeds only for a relatively short window of time. Then again, as AMD pointed out, many CPU-intensive productivity applications involve short bursts of activity with long idle periods. Mullins should do very well in those cases.
Of course, to be fit for convertibles, Mullins also needs to allow for good battery life. We still have no clear sense of whether it will do so. AMD has told us that Mullins will be “competitive” in that respect, which suggests some level of parity with Bay Trail. But in the absence of hard numbers, it’s difficult to make any predictions.
We also don’t know how successful AMD will be at scoring tablet design wins this year. Last year’s Temash chip was a no-show in Windows 8 and 8.1 slates from major vendors, and AMD wouldn’t say how widely it expects Mullins to be adopted. The company did, however, concede that it will be a “challenge” for it to “be highly visible in a larger number of SKUs.” Connected standby support could factor in there, but one thing is certain: AMD is facing extremely aggressive competition. It’s no secret that Intel has been carving out market share for its Atom chips using so-called “contra revenue” subsidies. Those subsidies might well be the biggest impediment to Mullins’ success.
Happily, Beema may have an easier time working its way into the marketplace. Lenovo has already announced several laptops based on it: the Flex 2 and the B and G series. Those systems are coming in June, and they’ll feature a special, A8-branded version of Beema that AMD offers “for select opportunities.” (That model isn’t part of the standard Beema lineup.)
Additional design wins, if they come, will likely be announced around the time of the Computex trade show in early June, with availability to follow in the summer.