
AMD previews Carrizo APU, offers insights into power savings

Excavator cores and other innovations to help improve efficiency

"Carrizo" is the code name of AMD's next-generation APU for notebooks and convertible PCs. This chip has been on AMD's roadmap for some time now as the successor to the Kaveri chip that powers the firm's current lineup of A-series APU products. We even got an early look at the first working Carrizo silicon at CES in January.

Still, many of the details about Carrizo and its next-generation "Excavator" CPU cores have been shrouded in mystery to date. Fortunately, that's changing. In conjunction with the International Solid-State Circuits Conference (ISSCC), AMD has begun to tell the story of how it has achieved major improvements in power efficiency and performance with Carrizo, even though the chip is built on a 28-nm fabrication process like Kaveri before it. AMD Corporate Fellow Sam Naffziger and Senior Director of Client Products Kevin Lensing briefed us ahead of ISSCC and shared some fascinating information about Carrizo's new technology.

Introducing Carrizo
For the uninitiated, Carrizo is AMD's answer to Intel's Broadwell chips, and it's expected to arrive in consumer systems around the middle of this year. Like AMD's other "Accelerated Processing Units," or APUs, Carrizo combines CPU cores and graphics on the same piece of silicon. In fact, Carrizo is almost a complete PC system on a single chip, and nearly every major component onboard has been updated compared to the prior generation.

We don't yet have all of the details, but Carrizo combines an evolved version of the Bulldozer CPU core known as Excavator, "next generation" Radeon graphics based on the GCN architecture, and an updated UVD accelerator block capable of handling H.265 video. Carrizo is also the first "big" AMD APU to integrate the traditional south bridge I/O functions (like USB and SATA), making it a true system on a chip.

Thanks to this change, Carrizo is able to share the same pinout and motherboard infrastructure as AMD's low-cost, low-power product, known as Carrizo-L. (Carrizo-L is similar to Beema and Mullins and will likely compete with Intel's Bay Trail and Cherry Trail products.) Lensing told us AMD hopes the shared infrastructure between Carrizo and Carrizo-L will allow the company to capture more of the available market. PC makers should be able to offer systems across a broad range of price and performance levels based on the same basic chassis and motherboard.

AMD claims Carrizo is "the first processor in the world with HSA 1.0 support," referring to its Heterogeneous System Architecture effort to enable converged CPU-and-GPU computing. That claim is a bit confusing since AMD said something similar about Kaveri, which it touted as the first architecturally complete HSA development platform. In this case, the mention of the HSA 1.0 spec is important. That spec has long been a work in progress and is only just being finalized. Perhaps it's no surprise that only Carrizo meets its full demands. More concretely, Carrizo adds at least one relevant HSA feature that Kaveri lacks: GPU context switching for multiple processes. When it arrives, Carrizo will surely become the reference platform of choice for HSA development. (Whether or not HSA will gain any great traction with software developers, of course, is another question.)

Carrizo's real magic isn't listed in its spec sheet, though. Instead, it has to do with how AMD tackled a daunting engineering challenge: delivering meaningful improvements in chip density, performance, and power efficiency over Kaveri without the benefit of a die shrink. After all, this chip has to compete with Intel's Broadwell, which is fabricated on a much more advanced 14-nm process with second-gen tri-gate transistors. Carrizo is built on a 28-nm process using traditional planar transistors.

Yet AMD claims it has managed to squeeze out some substantial improvements over Kaveri. Overall, Carrizo weighs in at roughly 3.1 billion transistors, or 29% more than Kaveri, with "approximately the same" die area. Power use and performance, two sides of the same coin, are also apparently much improved in this new chip. The firm has achieved these gains using careful tuning for laptop-class power envelopes bolstered by various innovative techniques—and that's what AMD is sharing this week at ISSCC.
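As a quick back-of-the-envelope check on those figures, the short sketch below works out the Kaveri transistor count implied by AMD's two numbers and the density gain that follows. It assumes the two dies have exactly equal area, which AMD describes only as "approximately the same," and the function and variable names are ours, chosen for illustration.

```python
def implied_baseline(new_count, pct_increase):
    """Return the baseline implied by new_count being pct_increase percent larger."""
    return new_count / (1 + pct_increase / 100)

CARRIZO_TRANSISTORS = 3.1e9  # AMD's stated (rounded) figure for Carrizo
INCREASE_PCT = 29            # AMD's stated increase over Kaveri

kaveri_implied = implied_baseline(CARRIZO_TRANSISTORS, INCREASE_PCT)

# Assumption: equal die area, so density scales directly with transistor count.
density_ratio = CARRIZO_TRANSISTORS / kaveri_implied

print(f"Implied Kaveri transistors: {kaveri_implied / 1e9:.2f} billion")  # 2.40 billion
print(f"Density ratio at equal area: {density_ratio:.2f}x")               # 1.29x
```

Since both inputs are rounded, the results are only good to about two significant figures.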

Excavator: heavy equipment gets streamlined
The Excavator CPU cores in Carrizo are the fourth generation of cores based on the initial Bulldozer microarchitecture. Each generation has improved per-clock instruction throughput and power efficiency over the last one, and Excavator is no exception. AMD estimates a 5% overall gain in per-clock instruction throughput over the prior-gen Steamroller core thanks to various changes.

We don't know what all of those tweaks are yet, but Naffziger did mention one change in particular: the L1 data cache has doubled in size while maintaining the same access latency. He also alluded to support for new instructions in Excavator, but without offering any further details. Excavator is rumored to add support for AVX2, which would boost performance in specific vectorized code paths, the sort that use SSE or the like today, but the new instructions wouldn't contribute to a general performance increase. At any rate, we don't expect dramatic changes on the CPU architecture front from this generation of AMD tech. Those are likely reserved for the upcoming Zen microarchitecture, an all-new, x86-compatible core expected to supersede the Bulldozer family next year.

The most notable changes in Carrizo come not in architecture, but design. Naffziger said the Excavator team "stole some plays from the GPU playbook" by adopting a high-density design library traditionally used for GPUs. This library packs quite a bit more logic into a given amount of chip area. The examples below show some important parts of the Excavator core when laid out using a high-performance library a la Steamroller and a high-density library a la Carrizo.

The overall improvement in density is even more dramatic than one might expect from casual inspection of the examples above. The dual-core CPU modules on Carrizo occupy 23% less area than Kaveri's, even with the added features and the doubling of the L1 data cache's capacity—all on a 28-nm process.
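To see what a 23% area reduction means in density terms, here is a minimal sketch of the arithmetic. It assumes the Excavator module holds at least as much logic as the Steamroller module it replaces, which seems plausible given the added features and larger cache but is not a figure AMD supplied. The function name is ours.

```python
def density_gain_from_area_reduction(area_reduction_pct):
    """Density multiplier if the same logic fits in the reduced area."""
    remaining_fraction = 1 - area_reduction_pct / 100
    return 1 / remaining_fraction

# AMD's stated figure: Carrizo's dual-core modules are 23% smaller than Kaveri's.
gain = density_gain_from_area_reduction(23)

print(f"Density gain: at least {gain:.2f}x")  # at least 1.30x
```

Under that assumption, the result is a floor rather than an estimate, because the smaller module also carries extra features and a doubled L1 data cache.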

That said, the two approaches produce very different-looking chips. The images below illustrate the layers used in a CPU-focused metal stack versus those used in a GPU-focused stack.

The Excavator team made a trade-off here, choosing the higher logic density of a GPU-style design over the clock frequency headroom afforded by a CPU-style design.

As the plot above attests, that trade-off makes sense for Carrizo because the chip is targeted at laptop-class power envelopes of about 15W. Not only does the high-density library reduce the chip area required for each CPU core (thus saving on costs), but it also yields some nice reductions in power use during low-wattage operation.

Notice that the crossover point where Excavator no longer beats Steamroller is at about 20W per dual-core module. That fact may help explain why AMD hasn't articulated plans to produce a socketed version of Carrizo for desktop systems. The chip's tuning probably doesn't translate well into desktop-class power envelopes of 65W or higher. Carrizo's benefits over Kaveri may be questionable in such scenarios.