
A first look at AMD's Mullins mobile APU

And a primer on Beema

Three years ago, AMD released Zacate, a new APU architected from the ground up for low-power mobile systems. Every year since then, the company has rolled out an updated lineup of chips with that very same mission. The new arrivals this year are called Mullins and Beema, and they promise to deliver the best blend of performance and power efficiency seen so far in low-power AMD APUs.

We already learned a little about these products at AMD's APU13 event back in November. A couple of weeks ago, AMD flew us down to its Austin, Texas, campus for a presentation that revealed the missing pieces. We found out all about what makes these systems-on-a-chip different from (and better than) their predecessors, Temash and Kabini, which were introduced last May.

AMD also gave us some hands-on time with a real, live Windows 8.1 tablet based on Mullins. We were able to run some of our productivity and gaming benchmarks on the system, and we were left with a good sense of how AMD's next-gen tablet SoCs may compare to the competition. We're still missing data on battery life, but the performance picture, at least, is fairly clear.

Without further ado, let's delve into our first look at AMD's Mullins APU—and its slightly more power-hungry sibling, Beema.

Introducing Mullins and Beema
Mullins and Beema are really two sides of the same coin. That is, the code names refer to the same silicon. AMD uses binning to differentiate Beema, which is aimed at thin-and-light notebooks, from Mullins, which fits inside the tighter power envelopes required for tablet designs. This is the same kind of two-for-one scheme we saw with Temash and Kabini last year—and with previous iterations of AMD's low-power mobile APUs.

So, what's changed since the previous generation? From a CPU and GPU architecture standpoint, not much. AMD's own diagram of the chip refers to "Puma+" CPU cores, but Mullins and Beema are actually based on the same Jaguar CPU microarchitecture as their predecessors. No architectural changes have been made to boost instructions per clock, we're told. AMD has nevertheless taken a number of steps to raise clock speeds when appropriate and to boost power efficiency. Some of those improvements extend to the integrated Graphics Core Next GPU. In all, the transistor count has risen from 914 million to 930 million due to "low level changes." AMD has also made enhancements to the memory and display interfaces, and it has enabled an ARM Cortex-A5 core that was present but disabled in Temash and Kabini. This ARM core underpins Mullins and Beema's Platform Security Processor (PSP), which sets up an isolated execution environment for secure applications.

We'll look at the PSP more closely in a minute. For now, let's talk a little more about those performance and power-efficiency tweaks.

Thanks to a mix of circuit-power optimizations and process-scaling improvements, Mullins and Beema leak substantially less power than the previous generation. AMD claims to have achieved a 19% leakage reduction across the CPU cores and a 38% leakage reduction in the integrated GPU. Less leakage means less energy wasted as heat, which in turn means AMD was able to raise clock speeds within the same thermal envelopes. The fastest 15W chip from last year's Kabini lineup ran at 1.55GHz with a 500MHz GPU speed, but the fastest 15W Beema chip runs at 2GHz, can hit 2.4GHz with Turbo Core, and clocks its GPU at 800MHz. That's quite impressive, considering both processors are manufactured on the same 28-nm process.
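To put those clock-speed figures in proportion, here is a quick percent-change calculation. It uses only the clock speeds quoted in the paragraph above; the helper function and variable names are just for illustration.

```python
# Percent clock-speed gains, fastest 15W Kabini vs. fastest 15W Beema.
# Clock figures are the ones quoted in the article; the rest is arithmetic.

def percent_gain(old: float, new: float) -> float:
    """Return the percentage increase going from old to new."""
    return (new / old - 1.0) * 100.0

KABINI_CPU_GHZ = 1.55   # fastest 15W Kabini, CPU clock
BEEMA_CPU_GHZ = 2.0     # fastest 15W Beema, base CPU clock
BEEMA_TURBO_GHZ = 2.4   # fastest 15W Beema, Turbo Core peak
KABINI_GPU_MHZ = 500
BEEMA_GPU_MHZ = 800

gains = {
    "CPU base": percent_gain(KABINI_CPU_GHZ, BEEMA_CPU_GHZ),
    "CPU turbo vs. old base": percent_gain(KABINI_CPU_GHZ, BEEMA_TURBO_GHZ),
    "GPU": percent_gain(KABINI_GPU_MHZ, BEEMA_GPU_MHZ),
}

for name, pct in gains.items():
    print(f"{name}: +{pct:.1f}%")
# CPU base: +29.0%
# CPU turbo vs. old base: +54.8%
# GPU: +60.0%
```

In other words, the base CPU clock rises by roughly 29% and the GPU clock by 60%, all within the same 15W envelope and on the same 28-nm process.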

Speaking of Turbo Core, Mullins and Beema make more extensive—and more intelligent—use of that feature than their predecessors. While Temash and Kabini supported Turbo Core in the silicon, only one member of the lineup actually implemented it: the A6-1450, which used Turbo to push its CPU clock speed from 1GHz up to 1.4GHz. In that implementation, Turbo didn't exploit the chip's temperature monitoring capabilities. AMD instead estimated power consumption using digital activity counters.
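Power estimation from digital activity counters generally works by counting on-chip events, weighting each count by an estimated energy cost, adding a static baseline, and checking the result against a power budget before allowing a boost. The sketch below illustrates that general technique only. The counter names, per-event energies, static power, and budget are all hypothetical values, not AMD's firmware logic or real A6-1450 parameters.

```python
# Hypothetical sketch: activity-counter-based power estimation for a turbo decision.
# Counter names, energy weights, and the budget are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ActivitySample:
    """Event counts gathered over one sampling interval (hypothetical counters)."""
    instructions_retired: int
    cache_accesses: int
    fp_ops: int
    interval_s: float

# Assumed energy cost per event, in nanojoules.
ENERGY_PER_EVENT_NJ = {
    "instructions_retired": 0.25,
    "cache_accesses": 0.40,
    "fp_ops": 0.60,
}
STATIC_POWER_W = 1.5   # assumed leakage/uncore baseline
POWER_BUDGET_W = 8.0   # assumed package power budget

def estimate_power_w(s: ActivitySample) -> float:
    """Static power plus the weighted event energy averaged over the interval."""
    dynamic_j = (
        s.instructions_retired * ENERGY_PER_EVENT_NJ["instructions_retired"]
        + s.cache_accesses * ENERGY_PER_EVENT_NJ["cache_accesses"]
        + s.fp_ops * ENERGY_PER_EVENT_NJ["fp_ops"]
    ) * 1e-9
    return STATIC_POWER_W + dynamic_j / s.interval_s

def allow_turbo(s: ActivitySample, headroom_w: float = 1.0) -> bool:
    """Permit a boost only if the estimate leaves headroom under the budget."""
    return estimate_power_w(s) + headroom_w <= POWER_BUDGET_W

light = ActivitySample(1_000_000_000, 500_000_000, 100_000_000, 1.0)
heavy = ActivitySample(10_000_000_000, 6_000_000_000, 4_000_000_000, 1.0)
print(round(estimate_power_w(light), 2), allow_turbo(light))  # 2.01 True
print(round(estimate_power_w(heavy), 2), allow_turbo(heavy))  # 8.8 False
```

Note that nothing in this estimate looks at temperature, which matches the limitation described above: the decision rests entirely on how busy the chip appears to be.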

Mullins and Beema do things a little differently. Turbo Core is enabled in four of the seven initial models, and it allows for much greater gains than before. For example, Turbo can push the fastest Mullins variant from its base speed of 1.2GHz to a maximum of 2.2GHz. Temperature monitoring comes into play this time, but in a way that differs somewhat from what we've seen on other AMD processors. Mullins and Beema's temperature-based clock speed and voltage regulation even has its own name: STAPM, short for Skin Temperature Aware Power Management.

The theory behind STAPM is simple. The silicon is capable of running at up to 100°C without any reliability problems, but sustaining that temperature won't do in a tablet, where the heat will propagate from the chip to the chassis and will eventually burn the user's fingers. Rather than lower the maximum operating temperature, AMD lets the chip run at up to 100°C—but it uses an algorithm to estimate the temperature of the tablet's "skin," or outer chassis, and it ramps down the clock speed and voltage to prevent the skin temperature from getting too toasty.

STAPM involves a "sophisticated . . . silicon temperature tracking algorithm," AMD tells us, but there's no actual sensor to keep track of skin temperatures directly. Skin temperatures are estimated indirectly using "extensive modeling of different system parameters," from screen brightness to fan speed. Through that modeling, AMD says, "We can predict what the skin temp would be based on everything we already know is going on in the system."
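AMD hasn't published its skin-temperature model, so the following is only a minimal sketch of the general idea under a loud assumption: the chassis is treated as a single first-order thermal node driven by chip power, and the boost is cut once the *estimated* skin temperature reaches a limit. The real model reportedly folds in many more inputs, such as screen brightness and fan speed. Every constant and function name here is hypothetical.

```python
# Hypothetical sketch of a skin-temperature-aware boost controller.
# Assumption: skin temperature follows a first-order RC response to chip power.
# All constants are illustrative, not AMD's STAPM parameters.

AMBIENT_C = 25.0
R_SKIN_C_PER_W = 4.0   # assumed thermal resistance, chip power -> skin
TAU_S = 300.0          # assumed skin thermal time constant (5 minutes)
SKIN_LIMIT_C = 43.0    # assumed comfort limit for the chassis

def step_skin_temp(t_skin: float, power_w: float, dt_s: float) -> float:
    """Advance the estimated skin temperature one step (forward Euler)."""
    t_steady = AMBIENT_C + R_SKIN_C_PER_W * power_w
    return t_skin + (dt_s / TAU_S) * (t_steady - t_skin)

def simulate(boost_w: float, base_w: float, duration_s: float, dt_s: float = 1.0):
    """Boost until the estimated skin temperature hits the limit, then drop to base power.

    Returns (seconds spent boosting, final estimated skin temperature).
    """
    t_skin = AMBIENT_C
    boosted_s = 0.0
    boosting = True
    for _ in range(int(duration_s / dt_s)):
        if boosting and t_skin >= SKIN_LIMIT_C:
            boosting = False  # lower clocks and voltage; no skin sensor is consulted
        if boosting:
            boosted_s += dt_s
        power = boost_w if boosting else base_w
        t_skin = step_skin_temp(t_skin, power, dt_s)
    return boosted_s, t_skin

boosted_s, final_c = simulate(boost_w=6.0, base_w=3.0, duration_s=1200.0)
print(f"boosted for {boosted_s / 60:.1f} min, skin settles near {final_c:.1f} C")
```

With these made-up constants, a 6W boost would eventually push the skin to 49°C if left running, so the controller allows it for roughly seven minutes and then lets the chassis settle toward the 37°C that 3W sustains.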

The payoff of STAPM, according to AMD, is that the processor can operate at a higher clock frequency for anywhere from four to 20 minutes while the skin temperature slowly creeps up. Since many CPU-intensive productivity tasks involve bursts of activity with long stretches of idle time, STAPM leads to "higher performance most of the time," AMD says. The company claims a performance increase of as much as 63% in "key workloads."
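Why a boost window lasts minutes rather than seconds becomes clearer with a simple closed-form estimate. Assume, purely for illustration, that skin temperature responds to a constant boost power P_boost as a first-order system with thermal resistance R and time constant τ, starting from ambient temperature T_amb. Solving for the time at which the skin reaches its limit T_limit gives:

```latex
T_{\text{skin}}(t) = T_{\text{amb}} + R\,P_{\text{boost}}\left(1 - e^{-t/\tau}\right)

t_{\text{boost}} = -\tau \ln\left(1 - \frac{T_{\text{limit}} - T_{\text{amb}}}{R\,P_{\text{boost}}}\right),
\qquad \text{valid when } R\,P_{\text{boost}} > T_{\text{limit}} - T_{\text{amb}}
```

With hypothetical values of τ = 300 s, R = 4°C/W, P_boost = 6W, T_amb = 25°C, and T_limit = 43°C, this gives t_boost = 300 ln 4 ≈ 416 s, or about seven minutes. Because the chassis heats slowly, the window scales with τ. If R·P_boost stays below the allowed temperature rise, the boost never has to end.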

That gain need not come at the cost of battery life, either. On the contrary, a faster-running processor can complete tasks more quickly and thus spend more time at idle, which "actually saves energy in many common use-cases," the company tells us.
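The "race to idle" argument comes down to comparing energy over a fixed time window: active power times busy time, plus idle power times the rest. The numbers below (workload size, power draws, window length) are hypothetical and chosen only to show the arithmetic. They are not Mullins measurements.

```python
# Hypothetical race-to-idle comparison over a fixed window.
# Workload size, power draws, and window length are illustrative assumptions.

def task_energy_j(cycles: float, freq_hz: float, active_w: float,
                  idle_w: float, window_s: float) -> float:
    """Energy to run a task at freq_hz, then idle for the rest of the window."""
    busy_s = cycles / freq_hz
    if busy_s > window_s:
        raise ValueError("task does not finish inside the window")
    return active_w * busy_s + idle_w * (window_s - busy_s)

CYCLES = 2.4e9   # assumed size of one bursty task
WINDOW_S = 5.0   # assumed time until the next burst
IDLE_W = 0.3     # assumed package power while idle

slow = task_energy_j(CYCLES, 1.2e9, active_w=2.0, idle_w=IDLE_W, window_s=WINDOW_S)
fast = task_energy_j(CYCLES, 2.2e9, active_w=3.0, idle_w=IDLE_W, window_s=WINDOW_S)
print(f"1.2GHz: {slow:.2f} J, 2.2GHz: {fast:.2f} J")  # 1.2GHz: 4.90 J, 2.2GHz: 4.45 J
```

The result depends on how steeply active power rises with clock speed. In this same sketch, if the 2.2GHz run drew 4W instead of 3W, it would consume about 5.54 J and lose to the slower run. That is consistent with AMD's hedged wording that boosting saves energy in "many" common cases, not all of them.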

STAPM works in combination with another scheme, Intelligent Boost, to maximize power efficiency. As part of Intelligent Boost, the power management micro-controller "tracks application behavior [in] real-time to determine frequency sensitivity," and the clock-boosting behavior is "adjusted accordingly." Translation: clock speeds only go up for applications that stand to benefit. In situations where a higher clock speed wouldn't lead to a substantial performance gain, clock speeds stay low, and less power is consumed.
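One common way to express "frequency sensitivity" is to divide the fractional performance gain by the fractional clock gain. Compute-bound code scores near 1, and memory- or I/O-bound code scores near 0. AMD hasn't disclosed how Intelligent Boost measures this, so the metric, the threshold, and the sample throughput numbers below are all assumptions for illustration.

```python
# Hypothetical frequency-sensitivity check in the spirit of Intelligent Boost.
# The metric, threshold, and sample numbers are illustrative assumptions.

def frequency_sensitivity(perf_low: float, perf_high: float,
                          f_low_ghz: float, f_high_ghz: float) -> float:
    """Fractional performance gain divided by fractional clock gain.

    ~1.0: performance scales with clock speed (compute-bound).
    ~0.0: extra clock speed barely helps (e.g., memory- or I/O-bound).
    """
    perf_gain = perf_high / perf_low - 1.0
    freq_gain = f_high_ghz / f_low_ghz - 1.0
    return perf_gain / freq_gain

SENSITIVITY_THRESHOLD = 0.5  # assumed cutoff for "worth boosting"

def should_boost(perf_low: float, perf_high: float,
                 f_low_ghz: float, f_high_ghz: float) -> bool:
    """Boost only when the workload gains enough from a higher clock."""
    s = frequency_sensitivity(perf_low, perf_high, f_low_ghz, f_high_ghz)
    return s >= SENSITIVITY_THRESHOLD

# Throughput samples (e.g., instructions per second) at 1.2GHz and 2.2GHz.
compute_bound = should_boost(1.00e9, 1.75e9, 1.2, 2.2)  # sensitivity 0.9 -> boost
memory_bound = should_boost(1.00e9, 1.10e9, 1.2, 2.2)   # sensitivity 0.12 -> stay low
print(compute_bound, memory_bound)  # True False
```

A real controller would have to gather these samples without disturbing the workload, for example by briefly probing a higher P-state. This sketch assumes the two throughput readings are already available.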