The Carrizo processor is AMD’s follow-on to Kaveri and a direct competitor to Intel’s Broadwell CPUs. After a lengthy prelude, AMD is officially taking the wraps off of Carrizo today at the Computex trade show in Taipei. The firm expects laptops based on Carrizo to be available near the end of this month, and now that the chip is official, we know a number of juicy details about it that had previously been murky.
The long and short of it is that Carrizo is a quad-core processor with integrated Radeon graphics. Unlike prior “big-core” CPUs from AMD, this one is a true system on a chip (SoC), with an integrated south bridge I/O section and no need for a companion chipset. Chips like this one, which AMD calls “APUs” or accelerated processing units, pack a ton of complexity into a single die. To give you some idea, have a look at Carrizo’s basics, neatly baked into a stat sheet from AMD.
Carrizo’s highlights include a next-gen x86-compatible CPU core code-named Excavator and revised Radeon graphics based on the third revision of the GCN architecture.
Unlike Kaveri before it, Carrizo will not be making its way into socketed desktop form, largely because it’s tuned for optimal operation in a frugal 15W power envelope. This new APU’s benefits likely wouldn’t translate well into full-sized desktop systems. Instead, AMD has chosen to target laptops, all-in-one desktops, and small-form-factor systems with power envelopes of 35W and lower. Smartly, the firm has also decided to consolidate the infrastructure for its high-performance and low-power APU lines into a single setup. Upcoming processors known as Carrizo-L, similar to the low-power Beema APU, will be drop-in compatible with Carrizo motherboards, so system makers can mix and match AMD processors as they wish.
You may have noticed in the specs table above that Carrizo is manufactured on a 28-nm fabrication process. That’s the same basic generation of process tech as Kaveri before it, and it’s a far cry from the 14-nm tech Intel is now using for Broadwell. Fortunately, AMD tells us it has managed to squeeze some formidable improvements into this new APU regardless. In fact, I’ve already covered Carrizo’s advances in power efficiency and transistor density in some depth, so go read that article if you’d like to know more. I’ll try not to repeat too many of those claims here. Instead, I’ll focus on new information, including filling in some details about Carrizo’s various components and the overall performance of the chip.
Excavator digs in
The newest member of AMD’s Bulldozer lineup of x86-compatible CPU cores is Excavator, which debuts exclusively aboard Carrizo. Like the past “heavy equipment” cores from AMD, Excavator includes a number of targeted tweaks intended to improve its performance and power efficiency. AMD’s architects estimate Excavator’s performance has improved between 4% and 15% on a clock-for-clock basis, and the firm is quick to point out that this new core occupies no more silicon die area than Steamroller did before it.
To achieve those gains and save on die space, the Excavator team slashed the size of the L2 cache from 2MB per dual-core module on Steamroller to 1MB per module here. They then doubled the size of each core’s L1 data cache to 32KB while keeping access latencies the same. That tradeoff between L1 and L2 cache capacities was apparently a worthwhile one. They also found ways to reduce the caches’ power consumption, such as clock gating, that cumulatively produced a 50% power savings. Here’s a look at the dynamic power use of the L1 caches in Excavator versus, presumably, Steamroller, straight from AMD’s presentation:
Beyond the cache tuning, the team improved the core’s branch prediction accuracy by growing the size of the branch target buffer from 512 to 768 entries. They also implemented a fast-flush capability in the floating-point unit that, I presume, allows the pipelines to more quickly recover from a branch misprediction.
Furthermore, Excavator adds support for some new x86 instructions, including the AVX2 suite and MOVBE, SMEP, and BMI1/2. Applications that employ those instructions could see some nice increases in performance and efficiency.
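For the curious, one rough way to see whether a given processor advertises these extensions on a Linux box is to look at the flags line in /proc/cpuinfo. The sketch below assumes the kernel's usual lowercase flag names (avx2, movbe, smep, bmi1, bmi2). The helper name `missing_flags` and the sample text are made up for illustration and were not captured from Carrizo hardware.

```python
# Flag names as the Linux kernel typically reports them in /proc/cpuinfo (assumed).
EXCAVATOR_ADDITIONS = {"avx2", "movbe", "smep", "bmi1", "bmi2"}


def missing_flags(cpuinfo_text, wanted=EXCAVATOR_ADDITIONS):
    """Return the wanted flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(wanted) - set(value.split())
    # No flags line found: treat every wanted flag as missing.
    return set(wanted)


# Hypothetical excerpt, not real output from any particular chip.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse2 avx avx2 bmi1 bmi2 movbe smep\n"
print(sorted(missing_flags(sample)))
```

On a real machine, you would pass in the contents of /proc/cpuinfo rather than the sample string. An empty result means all five additions are reported as present.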
The final bit of goodness AMD built into Excavator is something that has been sorely missing from these big APUs: support for low-power standby modes. These deep sleep states enable features like Windows 8’s Connected Standby and Windows 10’s InstantGo, although those are apparently two names for the same basic thing. The idea is to bring smartphone-style sleep modes with periodic wakeups for notification checking to Windows-based systems. Intel’s CPUs have supported this capability for several years at least. Interestingly enough, AMD tells us that the InstantGo capability on Carrizo systems will make use of the ARM Cortex-A5 CPU built into the security processor portion of the chip in order to control wake and sleep behavior with minimal power use.
Carrizo’s combination of better per-clock performance, improved power efficiency, and higher clock speeds translates into some measurable gains in CPU performance. Here’s a look at some results AMD supplied from Cinebench, an FPU-intensive image rendering benchmark.
True to its billing, Carrizo shines brightest in the 15W power envelope, where it’s up to 55% faster than Kaveri. The lion’s share of that gain comes from higher clock frequencies, while the remaining 10-15% is attributable to increased per-clock throughput.
A beefier GPU based on a better GCN
The Radeon graphics units in this new APU are based on the third generation of the GCN architecture, and they’re most closely related to the discrete Tonga GPU that powers the Radeon R9 285. The core graphics technology isn’t terribly different from what AMD first shipped aboard the Radeon HD 7000 series, but the firm has made some incremental improvements over time in areas like tessellation performance.
Carrizo inherits the delta-based color compression scheme first used in Tonga. This logic conserves memory bandwidth by storing color data in a losslessly compressed format. AMD claims this feature alone can improve performance in games by five to seven percent at the cost of a “modest” amount of additional silicon area.
This SoC’s graphics block consists of eight GCN compute units tied to two render back-ends and 512KB of dedicated L2 cache for graphics. All told, that amounts to 512 stream processors, 32 texels per clock of texture filtering capacity, and eight pixels per clock of ROP throughput. AMD cites a peak arithmetic rate of 819 gigaflops, presumably for the fastest 35W model of Carrizo.
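Those totals fall out of simple multiplication, and the quick sketch below shows the arithmetic. The per-unit figures (64 stream processors and four texture units per compute unit, four pixels per clock per render back-end) are the usual GCN numbers rather than anything quoted in this article, and the 800MHz clock is my assumption, based on the GPU boost speed AMD gave us for the fastest 35W part.

```python
CUS = 8                 # GCN compute units (from the article)
LANES_PER_CU = 64       # stream processors per compute unit (usual GCN figure)
TEX_PER_CU = 4          # texture filtering units per compute unit (usual GCN figure)
RENDER_BACKENDS = 2     # from the article
PIXELS_PER_RB = 4       # pixels per clock per render back-end (usual GCN figure)
GPU_CLOCK_MHZ = 800     # assumed: GPU boost clock of the fastest 35W part

stream_processors = CUS * LANES_PER_CU
texels_per_clock = CUS * TEX_PER_CU
pixels_per_clock = RENDER_BACKENDS * PIXELS_PER_RB
# A fused multiply-add counts as two floating-point operations per lane per clock.
peak_gflops = stream_processors * 2 * GPU_CLOCK_MHZ / 1000

print(stream_processors, texels_per_clock, pixels_per_clock, peak_gflops)
# 512 32 8 819.2
```

The result lines up with AMD's 819-gigaflop figure, which suggests the peak rate is indeed quoted at the 800MHz boost clock.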
Notably, the graphics portion of Carrizo occupies its own separate voltage plane on the chip. Thus, it can operate more efficiently by selecting the optimal voltage for its own needs. As a result, this chip can squeeze into smaller power envelopes while keeping all eight of its GCN compute units active. Those units can run at higher peak frequencies, as well.
These changes add up to concrete gains in performance over the prior generation, again especially in the 15W power envelope.
AMD bills Carrizo as the first HSA 1.0 compliant APU, a claim that may seem strange to those familiar with HSA, since AMD has been talking about it for years. However, the 1.0 spec is a new thing, and Kaveri didn’t implement a few features needed to meet its requirements, such as GPU-based context-switching. Carrizo adds that capability and thus earns its billing.
This APU’s support for HSA should position it as an obvious development platform for HSA-based applications, but I don’t believe Carrizo has the necessary plumbing to deliver on HSA’s performance promises. We’ll likely have to wait another generation or two, until APUs get a cache-coherent common interconnect fabric and shared last-level caching, before the hardware is ready to match AMD’s vision.
Better video processing hardware
The UVD block on AMD’s chips hosts dedicated logic meant to accelerate the decoding of popular video compression formats. Carrizo’s UVD unit looks to be the very latest revision of AMD’s technology in this space. It adds the ability to decode the HEVC/H.265 format expected to be widely used for streaming in the future, especially for 4K content.
AMD has quadrupled the throughput of the UVD block, too, in order to support 4K video properly. The firm demonstrated flawless 4K video playback with low CPU utilization back at CES in January on early Carrizo silicon. This is one area where Carrizo’s feature set clearly trumps that of Intel’s Broadwell. For 1080p video playback, the APU is now able to spend about 25% of its time doing the actual decode work and the rest dropping into a power-gated sleep state, saving energy.
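The numbers here hang together neatly, since a 4K frame has exactly four times the pixels of a 1080p frame. Here is a back-of-the-envelope sketch of that relationship. It assumes, as a simplification of mine and not something AMD stated, that decode time scales with pixel count and that the older UVD block was fully occupied by real-time 1080p decode. The function names are illustrative.

```python
def pixels(width, height):
    """Pixel count of a single frame."""
    return width * height


def busy_fraction(workload_ratio, throughput_multiplier):
    """Share of time the decoder is active, relative to a baseline that is
    assumed to be fully busy at a workload ratio of 1.0 (1080p here)."""
    return min(1.0, workload_ratio / throughput_multiplier)


ratio_4k_to_1080p = pixels(3840, 2160) / pixels(1920, 1080)

print(ratio_4k_to_1080p)                       # 4.0
print(busy_fraction(1.0, 4.0))                 # 0.25, the 1080p case
print(busy_fraction(ratio_4k_to_1080p, 4.0))   # 1.0, the 4K case
```

Under those assumptions, a fourfold throughput increase is just enough for 4K at the same frame rate, and it leaves the block idle for three quarters of the time at 1080p.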
Another video-focused enhancement in this SoC is the addition of a dedicated, high-quality image scaling unit in the display pipeline. Older AMD APUs have used the GPU’s shader units to scale images to fit the target display’s resolution, but doing so burns power. By switching to custom scaler hardware, the firm reckons it saves about half a watt compared to GPU-based scaling.
All in all, AMD estimates that Carrizo cuts power consumption during 1080p video playback roughly in half, taking it down under five watts.
Here is a terrifying slide from the AMD presentation that shows older APUs lasting only 3.3 hours while playing 1080p videos on a 50Whr battery. I’m not sure what to make of that.
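For what it's worth, the slide's numbers imply an average power draw that is easy to work out. The sketch below is just battery capacity divided by run time. It assumes the full 50Whr is usable and that the figure covers the whole laptop (display included), which is my reading of the slide rather than something AMD spelled out.

```python
def average_draw_watts(battery_wh, runtime_hours):
    """Average power draw implied by draining a battery over a given time."""
    return battery_wh / runtime_hours


older_apu_system = average_draw_watts(50, 3.3)
print(round(older_apu_system, 1))  # 15.2
```

That works out to roughly 15W for the older systems. If it is a whole-system figure, it isn't directly comparable to the playback power numbers quoted above.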
A bit of the performance picture
AMD’s presentations are rife with performance results from various scenarios, but I’m naturally skeptical of performance claims in the absence of actual hardware to test—and AMD tells us we won’t likely see that until perhaps July. I’ll offer a couple of slides, though, in order to give you a sense of what the firm expects users to see from Carrizo-equipped laptops.
Yes, it can play DOTA 2 and LoL. And yes, it’s faster than Haswell-based Core i5 and i7 processors (at least in 3DMark).
I think the question on everyone’s minds is whether it’s faster than Intel’s just-introduced Broadwell Core i5/i7 processors. Although AMD has traditionally led the market in integrated graphics performance, those Broadwell chips look to be pretty formidable, thanks in part to 128MB of eDRAM serving as a graphics cache. Then again, the Iris Pro Broadwell processors have 47W power envelopes, above the 35W peak for the first round of Carrizo parts.
The speeds and feeds, and where we’ll find these APUs
This would be the part of the article where I’d show you a list of Carrizo models, specs, and prices. Unfortunately, despite several requests, we don’t yet have that info from AMD.
I can tell you that Carrizo products will range from 12W to 35W. The peak CPU boost clock on the fastest 35W Carrizo variant will be 3.4GHz, and the GPU boost clock will be 800MHz. DDR3 memory speeds will range from 1600MHz to 2133MHz, depending on the power envelope. The 15W parts will use 1600MHz memory, for instance. We’ll try to add specific speeds and feeds for the first Carrizo-based products when we receive that info.
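To put those memory speeds in perspective, here is a rough calculation of theoretical peak bandwidth. It assumes a dual-channel configuration with standard 64-bit DDR3 channels, which AMD's numbers above don't specify, and it treats the quoted "MHz" figures as millions of transfers per second. The function name is illustrative.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mega_transfers_per_s, channels=2):
    """Theoretical peak bandwidth in GB/s; dual-channel is an assumption."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


for speed in (1600, 2133):
    print(speed, round(peak_bandwidth_gbs(speed), 1))
# 1600 25.6
# 2133 34.1
```

Real-world throughput will be lower than these peaks, and the CPU cores and integrated graphics share whatever bandwidth is available.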
One dose of reality that we got from our discussion with AMD about Carrizo is where we can expect to see these chips deployed in consumer systems in the coming months. The company is targeting “mainstream clamshell laptops,” likely those in the $400-700 range. PC makers sell those things by the boatload, but they’re not exactly the sexy, image-defining systems that tend to hog most of the attention these days. Intel seems to have locked up the lion’s share of the design wins for convertible systems like the Microsoft Surface or the Transformer Chi T300. Mainstream laptops have grown thinner, but they’re not the thin-and-light class of systems that are so often labeled Ultrabooks (an Intel brand).
AMD expects the first Carrizo notebooks to arrive near the end of June. Those systems will likely be basic clamshells. A little later, in early July in North America, we should start to see some thinner form factors and eventually a couple of ultrathins. At least one PC maker has a Carrizo-driven convertible system in the works, slated to hit the shelves at Best Buy in mid-July. After that will come systems that team the APU’s built-in graphics with a matching discrete GPU for added gaming power, along with all-in-one desktops and small-form-factor systems.
AMD’s challenge is to capture a chunk of the PC market with Carrizo going into the crucial back-to-school and holiday seasons. That first season should officially kick off with the Windows 10 launch on July 29. Assuming it doesn’t see any delays, Carrizo should at least be present in consumer systems at the right time. We’ll have to see whether the systems built around it will prove compelling to the folks doing the buying.