The recent turmoil at AMD has taken place against the backdrop of longer-term problems. AMD has long been the junior partner in a duopoly of PC-compatible CPU makers, with industry behemoth Intel taking the lead role and the lion’s share of the market. The smaller firm has been a scrappy competitor at times, but an embarrassing string of delays and problems with key products, like Barcelona and Bulldozer, has sullied its record in the past five years. During that same time frame, Intel has established and executed quite well on its “tick-tock” cadence of new CPU architectures and process technology shrinks. The result has been an erosion of AMD’s competitiveness and market share, with a predictable effect on the company’s balance sheet.
In short, AMD faces some daunting challenges that Read and his hand-picked executive team will have to confront. For months now, whenever we’ve asked for specifics on the firm’s new direction, we’ve been told to wait for AMD’s Analyst Day event. That day finally came last Thursday, and we got a sense of how this new regime plans to steer AMD through some rough waters and, one hopes, into a better competitive position.
The day started with a speech from Read that was part vision statement and part pep talk. Read’s demeanor seems earnest and youthful, although he tends to speak in a variety of corporate lingo that is oddly indirect and can be difficult to parse quickly. That tendency may simply be a result of the strange task CEOs are sometimes given: to paint a picture in broad brush strokes without revealing too many specifics. Much of the day’s talk was simply the new management team sharing its assessment of AMD’s current challenges and opportunities with the outside world. What specifics there were to share, Read left to the other members of his executive team. Still, on reflection, we think he had some interesting things to say.
Before going on to address big-picture questions and possible new directions for his firm, Read made sure to underscore one central point: execution is key. Many of AMD’s problems in the past few years stem from an inability to deliver on its product plans, not from having the wrong plans in place. Read is keenly aware of that fact, and he said he has been working on getting AMD “into fighting shape.”
Read then launched into a bit of an overview of the market and AMD’s place in it. He described the current state of x86 processors as an “unhealthy duopoly” and hinted at how he intends to extract AMD from the situation. Rather than pursuing the most complex technology possible, he said, AMD needs to “skate to the puck”—in other words, to focus on where the market is going and attempt to provide products to meet those needs. As an example, Read pointed to AMD’s success with the low-power Brazos platform for netbooks and cheap ultraportable laptops. Among Brazos’ virtues is that it’s easy to manufacture and “customers love them.” The unspoken subtext: Intel didn’t really contend for that portion of the market, failing to give Atom the graphics and video decoding horsepower it needed, probably because it wanted to protect its business in higher-margin laptop chips. AMD was able to capitalize by offering a better user experience, without pushing on bleeding-edge process technology and without attempting to squeeze out every possible ounce of performance. This sort of savvy product targeting—almost asymmetrical warfare, if you will—appears to be a big part of the new executive team’s focus.
Looking forward, Read alluded to a gathering storm: the conflict between a “large incumbent” and “emerging players” in the computing industry—that is, between Intel’s PC businesses and the new order of “ubiquitous devices” largely powered by ARM-based systems-on-a-chip, or SoCs. He likened the coming transition to the conversion from mainframes to client/server computing in the 1980s, when “proprietary control points” eroded and prices plummeted. The implication: that Intel’s processor business could face a similar fate in a future where the x86 instruction set becomes much less important and established computer price points begin dissolving.
One might expect such a transition to be devastating to AMD, as the second-source supplier of x86-compatible PC processors, but Read portrayed it as an opportunity for his firm, instead. After all, AMD has over 7,000 engineers and quite a bit of competence in building computing platforms. In this context, Read would like to shift AMD’s focus toward pursuing business in places where the competition isn’t so suffocating and where the potential for growth is greater—from AMD’s current emphasis on mature markets, PCs, servers, and discrete graphics to a new focus on emerging markets, mobile clients, cloud servers, and embedded systems.
If such talk sounds like capitulation in AMD’s decades-long struggle against Intel, well, we suppose to some degree it is. Without pushing on process technology or trying to eke out the last few percentage points of instruction throughput, AMD won’t be contending against Intel for the high-end desktop or server CPU crowns. Then again, AMD hasn’t really been competitive on process technology or high-end CPU performance for quite some time, and it hasn’t consistently made money, either. By making the usual reality into official policy, Read may be able to defuse a perpetually difficult competitive situation and free up resources to pursue more profitable directions.
At the same time, we don’t expect AMD to “pull an HP” and abandon its core business. In the short term, through the end of 2013, Read’s focus is largely to deliver the products already on the roadmap. Many of those are too far along for the new team to make major changes to them, and they address established businesses. Longer term, we expect AMD to continue producing server CPUs and client-focused APUs while dipping its toes into “adjacent” markets—like embedded systems, smart TVs, and game consoles—using the same chips or tweaked variants of them.
Along the way, Read plans to employ several key tools, including a new “SoC-style” approach to chip creation intended to make AMD more agile, a more modest short-term roadmap that presumably frees up resources for other projects, and a programming standard called Heterogeneous System Architecture that should allow developers to exploit the mix of CPU and GPU power in AMD’s APU products. Read left it to his colleagues, CTO Mark Papermaster and Dr. Lisa Su, Sr. VP and GM of AMD’s Global Business Units, to explain the details.
A revised roadmap
Dr. Lisa Su provided an update on AMD’s public product roadmap, which now extends into 2013. Dr. Su opened by asserting that future AMD products will “align” to market trends and consumer needs. She described computing power as “practically free now” and said AMD wants to bring computing to lower power and cost points. With that said, Dr. Su also emphasized AMD’s commitment to several parts of its traditional core business, including APUs, servers, and “technology leadership” in graphics.
In fact, Su described graphics as one of AMD’s “crown jewels” that ends up being the “centerpiece of our roadmap.” Fittingly, then, graphics appears at the very top of the firm’s roadmap for the next couple of years, headlined by the “Southern Islands” chips that have already begun arriving in the form of the Radeon HD 7000 series. Cards based on the “Tahiti” GPU are already here, and the mid-range “Pitcairn” and low-end “Cape Verde” are expected before the end of this quarter.
Slated for 2013 is a new family of GPUs code-named “Sea Islands.” Like their predecessors, the Sea Islands chips will be manufactured on a 28-nm fabrication process, so most of the improvements to them will have to come from the revised GPU architecture and the compute-focused “HSA features.” Beyond that, we don’t know too terribly much about Sea Islands yet.
The second-generation “Trinity” APU, the replacement for Llano, continues to edge toward release; we’ve seen demos of it at multiple trade shows over the course of many months. Also a 32-nm chip, Trinity will benefit mainly from newer internal components. The CPU portion of the chip will comprise a pair of dual-core “modules” based on the Bulldozer microarchitecture. In fact, the CPU cores in Trinity have been massaged to improve instruction throughput and assigned a new code-name: Piledriver. Trinity will be the first demonstration of the potential for tweaking this troubled new microarchitecture to better live up to expectations. Meanwhile, Trinity’s graphics core will be derived from the Northern Islands generation of products—i.e., Cayman and friends, also known as the Radeon HD 6000 series. Interestingly, Trinity’s video block will be borrowed from Southern Islands, so it will presumably include an H.264 encoding engine.
Altogether, AMD thinks Trinity will deliver a nice performance boost over Llano, and it intends to turn those gains into power savings. The firm expects one variant of Trinity to offer performance equivalent to a 35W Llano processor, but in a 17W power envelope. Those 17W parts will be targeted at the same sort of systems that will house Intel’s 17W Ivy Bridge processors: ultra-thin laptops, also known as ultrabooks.
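If a 17W Trinity really does match the performance of a 35W Llano, the implied gain in efficiency is easy to work out. Here's a back-of-envelope sketch; the wattage figures are AMD's claims, and equal performance between the two parts is assumed:

```python
# Back-of-envelope: implied performance-per-watt gain if a 17W Trinity
# matches a 35W Llano, as AMD claims. Equal performance is assumed.
llano_tdp = 35.0    # watts (AMD's figure for the Llano part)
trinity_tdp = 17.0  # watts (AMD's figure for the Trinity part)

# With performance held constant, perf/W scales inversely with TDP.
perf_per_watt_gain = llano_tdp / trinity_tdp
print(f"Implied perf/W improvement: {perf_per_watt_gain:.2f}x")
```

That works out to roughly a doubling of performance per watt, which is the sort of generational gain AMD needs if it's to contend in the ultrabook-class power envelopes it's targeting.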
Dr. Su offered some positive early indications for Trinity. She said design wins are “tracking ahead of Llano,” and that chips are already shipping to PC makers, with products due by the middle of the year. She even pulled out an example of a Trinity-based ultra-thin laptop, a reference design from Compal, to illustrate the potential there. Frankly, at a fairly uniform 18 mm thick across most of its chassis, that system looked a little bit chunky for an ultrabook. Still, we can imagine tolerating a little more bulk if Trinity’s Radeon integrated graphics can enable a decent gaming experience in such a system.
The low-power portion of the APU roadmap has seen some changes. Gone are the Brazos follow-ons “Krishna” and “Wichita,” originally slated for 2012. Those chips were expected to have up to four enhanced “Bobcat” cores and to be fabricated on a 28-nm process. In their place now is a minor revision of the current product, dubbed “Brazos 2.0.” Still a 40-nm chip, Brazos 2.0 adds support for USB 3.0 and for AMD’s Turbo Core dynamic clock frequency scaling.
Also coming in 2012 is an important new member of the family: “Hondo,” an ultra-low power Brazos-derived part that will be aimed at Windows 8 tablets. Extending the Brazos platform into ULP territory, with TDPs half that of current parts, may prove difficult, but Dr. Su expressed confidence that we will see Win8 tablets with AMD silicon. More importantly, this attempt will be the first of many from AMD to contend in this space. In fact, Su explicitly described an aspiration to take x86 processors into power envelopes below 2W, which she said is “absolutely” feasible, although such a product isn’t on the near-term roadmap for 2012 or 2013.
Not mentioned in Dr. Su’s speech and only buried in the pre-briefing slide you see above is one other new chip for 2012: “Vishera,” a conventional desktop processor for Socket AM3+ motherboards. Yes, AMD still has plans on this front, in part because high-end desktops share silicon with the server product lineup. Vishera will be a 32-nm part with up to eight Piledriver cores and no integrated graphics. If the tweaks to those cores prove to be effective, Vishera could restore at least some of AMD’s competitiveness on the desktop.
2013 looks to be a very busy year for AMD, full of transitions to revamped CPU cores and new process technology. The “Kaveri” APU is scheduled to supplant Trinity, with up to four x86 processor cores based on “Steamroller,” the continued evolution of the Bulldozer microarchitecture, and graphics based on the GCN architecture inside the Radeon HD 7000 series. Kaveri will include some special sauce for the APU-focused HSA programming model, and it will be one of several 2013 APUs fabricated on a 28-nm process node. For what it’s worth, Dr. Su pegged Kaveri as the first APU with aggregate compute power in excess of a teraflop.
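AMD didn't break down how Kaveri gets to that teraflop, but a rough sketch shows the figure is plausible. The shader count and clock speeds below are hypothetical stand-ins, not announced specifications; GCN shaders are assumed to do two single-precision FLOPs per cycle (one fused multiply-add), and each CPU core is assumed to contribute eight:

```python
# Hypothetical back-of-envelope for an APU exceeding 1 TFLOPS of aggregate
# single-precision compute. Shader count and clock speeds are illustrative
# guesses, not announced Kaveri specifications.
gpu_shaders = 512          # hypothetical GCN shader (ALU) count
gpu_clock_ghz = 0.9        # hypothetical GPU clock
gpu_flops_per_clock = 2    # one fused multiply-add per shader per cycle

cpu_cores = 4              # up to four Steamroller-based cores
cpu_clock_ghz = 3.5        # hypothetical CPU clock
cpu_flops_per_clock = 8    # assumed FLOPs per core per cycle

gpu_gflops = gpu_shaders * gpu_flops_per_clock * gpu_clock_ghz
cpu_gflops = cpu_cores * cpu_flops_per_clock * cpu_clock_ghz
total_gflops = gpu_gflops + cpu_gflops
print(f"GPU: {gpu_gflops:.0f} GFLOPS, CPU: {cpu_gflops:.0f} GFLOPS, "
      f"total: {total_gflops:.0f} GFLOPS")
```

Under those assumptions, the GPU supplies the vast majority of the FLOPS, with the CPU cores adding a modest topper—enough to clear the teraflop bar without requiring a particularly exotic configuration.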
In the low-power domain, “Kabini” looks like the true spiritual successor to Krishna, a 28-nm APU with up to four enhanced “Jaguar” CPU cores and GCN-derived graphics. Kabini will also integrate south bridge I/O components, making it what Su called AMD’s “first real SoC.” Its sibling, “Temash,” will inhabit the ULP domain, with only two CPU cores.
Notably, the roadmap shows no planned successor to “Vishera,” the eight-core discrete desktop CPU. That probably means AMD’s Socket AM3+ offerings will have to survive the duration of 2013 with the same silicon available at the end of 2012. The fate of AMD’s products in this segment will most likely be determined by whatever AMD decides to do on the server front.
Speaking of which, here’s a look at AMD’s revised server plans. Notice that the 10-core “Sepang” processor and “Terramar,” its dual-chip derivative, have been canceled and replaced with the 8-core “Seoul” and its dual-chip variant, “Abu Dhabi.” Also notable by its absence is any mention of a transition to a new socket infrastructure. That means AMD’s next round of server chips should slide into the same C32 and G34 sockets we’ve known for several years now.
Keeping these products at “only” 8/16 cores probably makes sense in the context of the current sockets’ bandwidth limitations. Our larger concern is how AMD will remain competitive in the face of Intel’s soon-to-be-released Sandy Bridge-EP processors. The desktop version of that chip is already incredibly formidable, and desktop workloads offer much less opportunity to flex the massive amounts of I/O bandwidth—40 lanes of PCI Express Gen3—connected to each socket of a Sandy Bridge-EP system. We can’t talk about all of the details yet, but Intel appears to have captured some very nice power and performance benefits from the integration of high-speed I/O onto the processor die. AMD can’t follow suit until it transitions to a new socket, and that evidently won’t be happening before the end of 2013.
AMD still anticipates a future for its high-performance x86 CPU cores well beyond 2013, though, as illustrated by this slide showing, vaguely, how Opteron cores will evolve over time. Once we get to 2014 and beyond, we expect the direction set by the new executive team to begin taking hold in earnest. As that happens, AMD’s server chip portfolio may expand to include some non-traditional products targeted at specific workloads. For a better sense of how AMD’s roadmap might look into 2014 and beyond, we should look at what the company means when it says it’s moving toward an SoC-style design methodology.
The meaning of an “SoC-style” approach
Our understanding of AMD’s newly adopted approach to creating products came into sharper focus when we had the chance to participate in a “fireside chat” with CTO Mark Papermaster and a small group of journalists last Friday.
Papermaster opened by explaining his role at AMD; he is wearing two hats, acting as the Chief Technology Officer who sets the firm’s long-term technology direction and also running the development team. He told us he’s taken on both roles since it’s so important for AMD to execute well on its plans. Throughout the conversation, although he was willing to talk pretty freely about technology and ideas, Papermaster kept returning to the theme of solid execution as his top priority.
Hand in hand with the talk of consistent execution, Papermaster sounded several themes to describe AMD’s goals, including agility, flexibility, and architectures that are “ambidextrous.” At the heart of it all is a different approach to building chips, one that is borrowed from the world of low-power and embedded system-on-a-chip (SoC) products that are becoming nearly ubiquitous in smartphones, tablets, consumer routers, embedded systems, and a whole host of other devices.
SoCs are often assembled from blocks of custom logic—referred to as IP or intellectual property—whose basic design is licensed from a third-party provider. Think of a smartphone chip that incorporates CPU cores from ARM, graphics from Imagination Tech, baseband communications tech from another provider, and so on. Many different chip companies combine these basic IP building blocks into various configurations tailored for certain requirements. The IP blocks can be mixed and matched with relative ease because they all share a common, industry-standard communications interconnect.
We’ve seen aspects of the IP-based SoC approach in one part of the PC market over time: core-logic chipsets, where specific I/O blocks are often licensed from third parties and incorporated into support chips. Generally speaking, though, PC processors have been proprietary affairs. AMD’s Llano, for instance, combines Phenom-class x86 CPU cores with Radeon graphics, an in-house north bridge, and AMD’s own memory controller. Sandy Bridge is built largely from Intel’s proprietary tech, as well. PC chip designs have become increasingly modular in recent years, but that modularity is relatively limited. Papermaster explained that AMD’s current chips are not built from IP blocks that have been expressly tailored for re-use.
Going forward, Papermaster envisions a common interconnect that AMD can deploy across its entire lineup. This interconnect will be high-speed, low-power, and capable of sustaining memory coherency across multiple logic blocks. The interconnect will act as glue for AMD’s various types of IP, whether it’s graphics, CPU cores, video encoders, or what have you. The idea is to allow the firm to mix and match its assets, easing the creation of chips based on its core technologies.
Although this interconnect will necessarily have to be proprietary in order to feed AMD’s high-performance CPU and GPU cores, Papermaster said it will have a bridge to the AMBA interconnect created by ARM and used by other SoC providers. That fact opens up all sorts of intriguing possibilities, including the incorporation of third-party IP into AMD silicon, either as a means of adding new features or, more likely, as part of an effort to build a chip tailored for a specific customer.
Even with some modularity in its current chips, the move to an SoC-style approach appears to involve a fairly noteworthy change in the company’s operations. Papermaster told us he has restructured his organization to fit this strategy. In a separate conversation, Graphics CTO Eric Demers also asserted that the change requires a true shift in mentality compared to AMD’s prior methods. As an example, Papermaster said AMD’s product validation efforts have, in the past, largely focused on testing an entire chip. Now, he told us AMD is “investing very heavily in emulation technology” in order to perform validation on the various IP blocks it has in development. The idea is to move bug discovery earlier in the process, before the whole chip comes together.
First and foremost, the new executive team expects this modified method of building chips to make it easier for AMD to deliver on its product roadmap commitments. Beyond that, it may also enable new combinations of AMD IP and open up new business opportunities.
That’s especially true for AMD’s server products, where “workload-optimized” processors are a big part of its future plans. Rather than competing directly with Intel’s formidable Xeon processors in every case, AMD hopes to win business by building more varied processors targeted at specific types of workloads. With two x86 CPU core development tracks—high-performance and low power—and its increasingly compute-capable GPUs, one can envision many possible combinations. One possibility is a future server chip with a modest contingent of Opteron x86 cores for integer math and a boatload of FLOPS supplied by a host of GPU compute units. Another option Papermaster mentioned specifically is an ultra-dense server processor composed of a large number of low-power cores like those in Brazos. Either of those processors might be better suited to a specific application than a stock Opteron or Xeon. This sort of targeting looks like it could make quite a bit of sense given the way server-class workloads have diverged in recent years. Segments like HPC hinge almost entirely on FLOPS and memory bandwidth, while others, like cloud providers, require energy efficiency and scalable performance with lots of integer-focused threads.
When pushed for specific, non-theoretical examples of how AMD might incorporate third-party IP into future chips, Papermaster offered one scenario related to a “smart TV” product. Such products need some compute power, good display technology, and video codec hardware. They also have the very attractive property of being high-volume parts, so they could offer the economies of scale needed to make a chip business workable. The relationship between AMD and a customer, say a big consumer electronics firm, might start with AMD supplying a discrete GPU that the customer would pair with its own applications processor. Later, these components might be integrated into a single chip, provided by AMD, where Radeon graphics and video processor tech share die space with third-party IP.
AMD is now open to the possibility of such integration, where in the past, it probably wouldn’t have been (although it does have some history of making custom GPUs for game consoles). Papermaster was careful to explain that such relationships are likely to be few—he said there won’t be “hundreds of customers,” not even “dozens.” But AMD appears to be working toward some new types of relationships with select customers, made possible by a newfound willingness to combine its own technologies with those invented elsewhere.
Yes, such a relationship could mean that an ARM CPU core could be combined on the same silicon with, say, Radeon graphics. AMD clearly opened the door to that possibility and talked openly about “ISA flexibility” as part of its new strategy. Still, the firm’s public roadmap mentions only x86-compatible processors for the time being, and we don’t know of any specific plans for AMD to produce an ARM-based SoC to compete with the likes of Nvidia’s Tegra lineup. All we really know is that AMD’s new leadership is expressing an openness to new types of products and business relationships. We don’t have many specifics so far, and we’re unsure how many of the ideas being kicked around will turn into products. If they do, they’ll most likely become visible once AMD exposes a public roadmap for 2014 and beyond.
On the question of high-performance CPUs
All of the talk about not pushing “the bleeding edge” on process tech and not trying to eke out the last few bits of performance—along with the lack of emphasis on traditional high-end desktop and server CPUs—left us wondering about AMD’s intentions for its x86 processors. It’s one thing to deemphasize an area where competing is difficult and quite another to quit contending there. Several of us asked questions related to this topic in an attempt to gauge AMD’s commitment to pushing forward on x86 performance.
The Bulldozer microarchitecture, obviously, has some performance issues. Worryingly, AMD’s message at the time of the FX processor’s introduction was that future Bulldozer-based processors would see 10-15% performance gains each year, starting with Piledriver. Given where Bulldozer has started, that plan now looks like a recipe for failure, in light of Intel’s recent trajectory. Encouragingly, when asked about Bulldozer’s prospects, Papermaster pulled out the 10-15% estimate without being prompted and disputed it: “We need more than that. We’ll get more.” Also, although the 2012-2013 products are “in delivery mode,” he hinted at the possibility of a new socket in the next generation of server CPUs.
One of Bulldozer’s big weaknesses right now is its performance in individual threads. The CPU does relatively well on some broadly multithreaded workloads, but its IPC (and thus performance) in each thread is often relatively poor. David Kanter attempted to tease out the new CTO’s thoughts on single-threaded performance by asking what sort of gap with Intel is acceptable. 15%? 30%? More? Papermaster wasn’t willing to give us a number, but to our relief, he didn’t attempt to argue that single-threaded performance is unimportant, as some of AMD’s marketing folks have been doing. Instead, he said he refused to give a number because he “didn’t want to cede anything” to his development team. In other words, it looks like AMD will continue to push ahead on this front, even with the change in business strategies.
In the wake of the FX processors’ release, some folks began speculating about whether AMD had abandoned its traditional use of custom logic design for broader use of logic synthesis. The speculation was fueled by Bulldozer’s apparent inefficiencies and especially by the massively inflated Bulldozer transistor count AMD supplied to the press. This question is also something of a hot topic because, to one degree or another, most semiconductor companies are employing logic synthesis more extensively over time. When asked for his take, Papermaster acknowledged that AMD’s x86 CPU cores (outside of Brazos) have been “highly customized up to this point” and said that, historically, there has been a “huge gap” between custom and synthesized logic. However, he asserted that the gap has “come down significantly” and that a new emphasis on synthesis will begin affecting AMD’s roadmap in 2014.
One final tool that AMD will use to maximize its potential as a supplier of both CPUs and GPUs—and of products with both elements on a single chip—is something it calls the Heterogeneous System Architecture, or HSA. HSA has replaced “Fusion” in AMD’s lexicon, but happily, it’s a much more specific thing, with real technology behind it and a vision for realizing the potential of APUs.
Fundamentally, HSA is a software development target platform intended to allow applications to take advantage of both CPU and GPU computing resources on “converged” chips like AMD’s APUs. HSA has several components, including a virtual ISA, a memory model, and a system specification. The ISA, known as HSAIL, is conceptually similar to the PTX ISA in Nvidia’s CUDA, which provides a stable, fairly low-level programming target that still allows major changes in GPU architectures over time. HSAIL instructions will be translated into true machine code by a just-in-time compiler provided by the hardware vendor. Unlike CUDA, though, the HSA memory model and system specification will take into account the capabilities of APUs and other SoCs whose CPUs and GPUs can share the same memory.
HSA differs from familiar names like OpenCL and C++ AMP because it is a lower-level platform, even a possible compile target for apps written in OpenCL. As we understand it, HSA’s goal is to make a common virtual machine with easy access to all available computing resources. AMD expects it to be programmed just like current SMP systems, with “seamless” access to CPU and GPU execution resources using the same basic syntax. The abstraction layer should handle the details of what gets processed where, bringing CPU and GPU computing resources to bear on the data as appropriate.
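None of the HSA runtime's interfaces have been published yet, so any code is necessarily speculative, but the basic idea can be sketched: the application expresses a data-parallel kernel once, and an abstraction layer decides whether CPU or GPU resources execute it over shared memory. The dispatcher, device names, and size heuristic below are entirely hypothetical illustrations, not real HSA APIs:

```python
# A toy model of HSA-style "seamless" dispatch. The runtime, device names,
# and size heuristic here are hypothetical, not real HSA interfaces.
def kernel(x):
    """A data-parallel kernel: the same code regardless of where it runs."""
    return x * x

def hsa_style_dispatch(data, gpu_threshold=1024):
    """Pick an execution resource for the kernel. In a real HSA system,
    a vendor JIT would translate HSAIL into machine code for the chosen
    device; here both paths simply run on the host to illustrate the
    shared-memory model (no explicit copies between CPU and GPU spaces)."""
    device = "GPU" if len(data) >= gpu_threshold else "CPU"
    # Shared memory: the kernel reads `data` in place on either device.
    result = [kernel(x) for x in data]
    return device, result

device, result = hsa_style_dispatch(list(range(8)))
print(device, result)  # a small workload stays on the CPU path
```

The point of the abstraction is that the application code above never changes when the workload migrates from CPU to GPU—which is exactly the "programmed just like current SMP systems" property AMD is promising.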
Crucially, HSA will be ISA agnostic not just for the GPU, but for the CPU, as well—so an application written for HSA could run just as well on an x86/Radeon combination like Trinity as on, say, an ARM/Imagination Tech combination in a tablet.
AMD hopes to turn HSA into an open, industry-wide standard. To that end, the company has established a foundation much like the ones that govern other standards, and it has invited other hardware, software, and OS developers to join. So far, the firm says it’s hearing good things from its customers, but we’re not aware of any companies that have joined yet. Assessing the prospects for such an effort is notoriously difficult, but if it somehow takes off, HSA could become an incredibly important standard, perhaps the first to allow CPU-GPU convergence to begin realizing its potential in consumer applications—while undermining a host of competing standards, everything from CUDA to x86. If not, well, as HSA point man Manju Hegde explained to us, it could still be a useful tool for enabling development on AMD platforms.
AMD has published a roadmap for HSA, which is interesting because it suggests what capabilities will make it into future APU generations. The addition in 2012 of “bi-directional power management between CPU and GPU,” for instance, should be a Trinity feature. Looks like AMD will be progressively exposing features as it incorporates them into its APU hardware over time. Also, notice that HSA won’t be extended to support discrete GPUs until 2014. Initially, this effort is very much about taking advantage of APUs and the ease of programming made possible by shared memory in APUs and other SoCs.