Brand-new x86-compatible CPU architectures don’t come along very often, but we’ve recently seen the first extensive details about two essentially new, clean-sheet designs from AMD: the high-end Bulldozer architecture and its smaller, low-power sibling, Bobcat. We’ve already covered Bulldozer at some length, and now we’re turning our attention to its pint-sized relative.
Brad Burgess, an AMD Fellow and the Chief Architect of the Bobcat core, offered the public a first glance at his team’s creation last week at the Hot Chips conference, and AMD has since released the slides from that talk to the media. We didn’t attend Burgess’ presentation, but we did speak with Dina McKinney, Corporate Vice-President of Design Engineering at AMD, about both of the new architectures. We’ll have a look at some of AMD’s slides below and attempt to point out some of what you’ll want to know about this promising new microprocessor.
Let’s start by situating Bobcat in the context of existing x86 processors. For most intents and purposes, this CPU will be AMD’s answer to the Intel Atom, a PC-compatible processor tailored toward keeping two key things in check: power consumption and manufacturing costs. As PCs become more mobile and range further into commodity territory, the need for capable, low-cost, power-efficient processors has become more evident, and the surprising success of netbooks as consumer products has strongly validated Intel’s approach with the Atom, even if that success hasn’t come in exactly the form Intel originally anticipated.
Like the Atom, Bobcat has been designed largely with synthesized logic, with what AMD calls a “small number of custom arrays.” This choice involves a fairly straightforward engineering trade-off. Larger CPUs like the Phenom II (and Bulldozer) use lots of custom-designed logic because it can be more efficient, yielding smaller die areas and superior power efficiency. GPUs and other application-specific chips tend to rely more heavily on synthesized logic because it can shorten design cycles and allow the chip to be ported to a different manufacturing process with relative ease. The extensive use of computer-generated logic should allow the Bobcat core to be refreshed with regularity and remixed into a range of different products for various markets, much as Intel has done with its various Atom platforms.
Bobcat’s portability is also crucial for how it will initially be deployed: as a part of the Ontario “APU” or accelerated processing unit, the first of AMD’s “fusion” processors that combine a CPU with a GPU on a single piece of silicon. Ontario will be manufactured by TSMC on the same 40-nm fabrication process used to produce current Radeon graphics chips. Thereafter, we’d expect Bobcat-based APUs to make the transition to new fabrication processes at a cadence similar to the traditional refresh rate for low-end graphics chips.
Don’t let the “fusion” label or talk of the GPU as a “SIMD engine array” confuse you: Ontario will not be a true hybrid processor that fluidly combines traditional serial-style CPU processing with data-parallel-style GPU processing into a novel programming model that achieves previously unseen performance heights. Much like current Pine Trail Atoms, Ontario will simply combine two low-power CPU cores and a modest GPU on the same chip in order to save on power, die area, and costs—not that there’s anything wrong with that.
In fact, Ontario has the potential to be substantially more interesting to computing enthusiasts than all of this Atom talk might seem to suggest. Architecturally, Bobcat employs a more aggressive out-of-order approach to instruction execution, which could allow it to retire quite a few more instructions per clock than Atom, on average. In other words, Bobcat could be a much faster processor than Atom.
AMD claims Bobcat will achieve an “estimated 90% of today’s mainstream performance in less than half the silicon area.” That’s a big hint, and we should unpack it a little bit to get a sense of what it tells us. The comparison being made here is between a dual-core Bobcat and the current Athlon II/Turion dual-core CPUs.
For a sense of the Athlon II’s performance, you might want to check out one of our recent CPU roundups, but the bottom line is that it’s pretty decent overall—an Athlon II X2 255 is similar to a Penryn-based Pentium E6500 and well over twice the speed of a Pentium 4 670. The X2 255 is more than up to the task of running modern games, too. If Bobcat reaches 90% of that performance—and that’s still a big “if” since we don’t know exactly how AMD is estimating performance or what clock speeds it will reach—then it should be plenty adequate for the vast majority of everyday computing tasks. We’re talking about performance similar to, or better than, Intel’s consumer ultra low-voltage processors, which are our current favorites for ultraportable laptops.
As for silicon area, today’s Athlon IIs are based on the 45-nm “Regor” chip, which has a die size of 118 square millimeters. A dual-core Bobcat implementation should weigh in at under half that, which is pretty small indeed. However, AMD is careful to point out that the “90% performance/under 50% size” estimate is not a statement about the whole of the Ontario chip, since that chip will include a GPU, too. (For reference, Intel lists the dual-core Pine Trail Atom D510, made on its 45-nm process, at 87 mm². That also includes a GPU.)
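To make the die-area claim concrete, here is a quick back-of-the-envelope sketch using only the figures quoted above. The function name and the idea of treating "less than half" as a hard 50% ceiling are our own assumptions for illustration; AMD hasn't published an actual Bobcat die size, and the Atom figure includes a GPU while the Bobcat ceiling does not, so this is not a like-for-like comparison.

```python
# Die sizes quoted in the article, in square millimeters.
REGOR_DIE_MM2 = 118.0      # 45-nm dual-core Athlon II ("Regor")
ATOM_D510_DIE_MM2 = 87.0   # 45-nm dual-core Pine Trail Atom, GPU included


def area_ceiling(reference_mm2: float, fraction: float = 0.5) -> float:
    """Upper bound implied by 'less than `fraction` of the reference area'.

    Treating AMD's "less than half" as exactly 50% is an assumption made
    here purely for illustration.
    """
    return reference_mm2 * fraction


# Ceiling for the dual-core Bobcat CPU portion only (no GPU).
bobcat_cpu_ceiling = area_ceiling(REGOR_DIE_MM2)
share_of_d510 = bobcat_cpu_ceiling / ATOM_D510_DIE_MM2

print(f"Dual-core Bobcat CPU area: under {bobcat_cpu_ceiling:.0f} mm^2")
print(f"That ceiling is {share_of_d510:.0%} of the Atom D510 die")
```

In other words, the CPU portion of Ontario should come in under 59 mm², which leaves the size of the GPU and video block as the main unknown in the total die area.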
We don’t know yet exactly how Ontario’s GPU will look or what portion of the total die area it will comprise. We do expect it to be a true, DirectX 11-class Radeon with robust hardware acceleration of video playback for contemporary compression formats. Our guess is that on both the graphics and video playback fronts, we can probably expect Ontario to be markedly better than Pine Trail Atoms and potentially superior to Intel’s CULV offerings, as well.
The part of this picture that’s not yet complete is power consumption. AMD has only said that Bobcat will use “a fraction of the power” consumed by today’s mainstream CPUs and that the Bobcat core will be “sub one-watt capable.” After consulting with AMD, we take that statement to mean that a single Bobcat core can draw less than one watt while actively doing work, in the right configuration. That should be a nice starting point, considering that Intel’s first Silverthorne Atom products, which were 45-nm parts with a single CPU core by itself on a chip, ranged from 0.65W to 2.4W max, depending on the model.
For the fastest Bobcat variants, the upper limit on power could be much higher than that, depending on how the curves for clock frequencies and voltage work out. Of course, Ontario will have two cores, a GPU, and a video decode block onboard, too. The dual-core Atom D510 has a 13W max TDP, and some variants of Ontario might land in that neighborhood. Even a relatively poor outcome, with much higher power draw than Pine Trail, would still be a fraction of the 65W TDP of the Athlon II X2 255.
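The "fraction of the power" argument is simple arithmetic, sketched below with the TDP figures cited above. The Ontario scenarios are hypothetical placeholders of our own, not AMD numbers, and TDP is a thermal design ceiling rather than a measure of typical power draw.

```python
# Max TDP figures quoted in the article, in watts.
ATHLON_II_X2_255_TDP_W = 65.0
ATOM_D510_TDP_W = 13.0


def tdp_share(part_w: float, reference_w: float = ATHLON_II_X2_255_TDP_W) -> float:
    """Return a part's TDP as a fraction of the reference part's TDP."""
    return part_w / reference_w


# Hypothetical Ontario outcomes, chosen only to illustrate the range.
# Neither value is an AMD specification.
scenarios = {
    "Ontario matches the Atom D510": ATOM_D510_TDP_W,
    "Ontario draws twice the D510 (hypothetical)": 2 * ATOM_D510_TDP_W,
}

for label, watts in scenarios.items():
    print(f"{label}: {watts:.0f}W, {tdp_share(watts):.0%} of the Athlon II X2 255")
```

Even the pessimistic placeholder, at double the Atom's TDP, works out to well under half of the Athlon II's 65W rating.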
The picture that emerges from these estimates is pretty darned attractive, we have to admit. Ontario may well be a watershed commodity PC component—fast enough not to annoy most power users in casual use, small enough to be breathtakingly cheap, and capable of enabling generous battery run times in mobile systems.
Burgess’ Hot Chips presentation lays out Bobcat’s internals in some detail, and unlike with Bulldozer, we don’t believe AMD is holding back any major bits of information about the architecture at this point, because the products are coming to market soon.
Overall, Bobcat looks to be a very modern processor architecture with a 13-stage main pipeline. The L1 instruction and data caches are 32KB each, and the L2 cache is 512KB. As far as we know, the L2 cache isn’t shared between two cores on Ontario. Dual-issue execution cores seem to be in vogue at AMD right now; Bobcat takes the same path as Bulldozer there. Instructions can be executed out of order, as we’ve mentioned, which should bring higher performance per clock than the Atom. The load/store engine is also out-of-order capable, with the ability to move loads ahead of stores.
Of course, the thing that most distinguishes Bobcat from current Phenoms or Bulldozer is its focus on keeping power consumption low. Ticking off check boxes on a feature list won’t always convey the impact of the thousands of little choices chip architects and designers make in fashioning a product like this one. For what it’s worth, though, Bobcat does include fine-grained clock gating, power gating, and a low-power C6 idle state. AMD has also used physical register files for local storage, to avoid the power overhead associated with dynamic register mapping.
One problem for low-power x86 processors is how to handle another sort of overhead, the sort created by decoding typically more complex x86 instructions into simpler “micro-ops” or internal instructions actually executed by the processor. Intel took a nearly CISC-like approach to the Atom in which 96% of x86 instructions are translated into single or “fused” dual micro-ops. AMD’s approach with Bobcat is similar. Burgess says 89% of x86 instructions decode to a single micro-op and 10% into a pair of micro-ops, while the remaining <1% of more complex instructions are handled in microcode.
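For a rough sense of what that decode mix implies, the sketch below computes the average number of micro-ops per x86 instruction. Several things here are assumptions on our part: we treat Burgess’ percentages as the mix a running program would actually see, we round the "<1%" microcoded share up to a full 1%, and the micro-op counts for microcoded instructions are hypothetical, since AMD hasn't said how long those sequences are.

```python
# Decode mix as reported from Burgess' talk.
SINGLE_UOP_SHARE = 0.89   # instructions that decode to one micro-op
DOUBLE_UOP_SHARE = 0.10   # instructions that decode to two micro-ops
MICROCODED_SHARE = 0.01   # article says "<1%"; 1% is used as an upper bound


def average_uops(uops_per_microcoded: float) -> float:
    """Average micro-ops per x86 instruction for a given microcode length.

    `uops_per_microcoded` is a hypothetical parameter: the real length of
    Bobcat's microcode sequences isn't public.
    """
    return (
        SINGLE_UOP_SHARE * 1
        + DOUBLE_UOP_SHARE * 2
        + MICROCODED_SHARE * uops_per_microcoded
    )


# Hypothetical microcode sequence lengths, for illustration only.
for length in (4, 10, 20):
    print(f"{length:>2} micro-ops per microcoded instruction: "
          f"{average_uops(length):.2f} micro-ops per x86 instruction on average")
```

Under these assumptions, the average stays between roughly 1.1 and 1.3 micro-ops per instruction even if microcoded sequences are fairly long, which is the point of keeping the decode overhead low.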
Bobcat’s x86 ISA support is quite extensive, too, with support for AMD’s 64-bit extensions and all SSE versions up to SSE4A, including SSSE3. The newest extensions for floating-point math like AVX and FMA aren’t supported, but they don’t really square with Bobcat’s mission in life. Notably, Bobcat does support AMD’s secure virtualization instructions, which suggests this core might be employed as part of a cloud server platform at some point in the future.
Indeed, we’re fascinated by the variety of prospects for Bobcat to expand its mission beyond notebooks and low-cost desktops when AMD so chooses. Intel has quite explicitly stated that Atom will push into ever-smaller form factors as progress allows, rather than gaining additional MIPS and FLOPS. Over the next few years, manufacturing process advances and additional integration could eventually make it possible for Bobcat to fit into pocketable devices like smart phones, too, but AMD is keeping mum on that subject for now.
AMD could also choose to license the Bobcat core to third parties, much like ARM does with its processor architectures. Intel has tiptoed around the possibility with the Atom, but we’ve not seen much actual progress. Since AMD is now a pure-play design house—that is, it has spun off its manufacturing capacity into GlobalFoundries and focuses solely on chip design—licensing might make more sense for it than for Intel. In fact, AMD’s graphics unit, the former ATI, has some experience on this front, having designed the Xbox 360 GPU for Microsoft. The fact that Ontario will be manufactured at TSMC opens up some possibilities for collaboration with other TSMC customers, although we’d be surprised to see AMD go the licensing route in this first generation of Bobcat-based devices.
For those who’d like to see them, we’ve assembled Burgess’ detailed slides on the Bobcat microarchitecture in the image gallery attached to this story.