
Papermaster details new AMD mobile, server roadmaps

AFDS — At the closing keynote of the Fusion Developer Summit in Bellevue, Washington today, AMD Executive VP and CTO Mark Papermaster gave us a glimpse of AMD’s latest roadmaps for its 2013 mobile APUs and server CPUs.

AMD has three APUs in the pipeline for next year—all of which Papermaster says will deliver "no-compromise solutions" for tablets and fanless notebooks. At the high end, Kaveri will deliver four cores based on AMD’s Steamroller architecture (the successor to Bulldozer and Piledriver) and integrated Radeon HD graphics. Papermaster said Kaveri will be the first AMD APU with fully shared memory and virtual shared memory. AMD expects Kaveri to fit into 15-35W thermal envelopes and to populate 13.3-15.6" notebooks with thicknesses of 0.83" or less.

For cheaper and lower-power applications, AMD is prepping Kabini, the successor to today’s Zacate and Ontario APUs. Kabini will feature four Jaguar cores (Jaguar being, of course, the successor to Bobcat), Radeon HD integrated graphics, and power envelopes in the 9-25W range. Kabini-based notebooks should have 11.6-15.6" screen sizes and thicknesses in the vicinity of 0.71-0.94".

Last, but not least, will be AMD’s next-gen APU for tablets: Temash. Like Kabini, Temash will feature four Jaguar cores and integrated Radeon HD graphics. However, it will squeeze into thermal envelopes as low as 3.6W. AMD says Temash will power 10-11" tablets as thin as 0.39".

On the server front, AMD plans to launch the Abu Dhabi, Seoul, and Delhi CPUs later this year. These chips will succeed Interlagos, Valencia, and Zurich, respectively, fitting into the same G34, C32, and AM3+ sockets. The memory configurations will remain the same, as will core counts. From what I can tell, the only difference will be a substitution of Piledriver cores for Bulldozer ones. (Piledriver is currently found in AMD’s Trinity APUs, but existing Opterons are still Bulldozer-based.)

Papermaster also communicated a desire for AMD to be more agile and to get new solutions to market more quickly. He brought up yesterday’s announcement about ARM’s TrustZone security IP finding its way into future AMD silicon. Speaking more broadly, Papermaster said AMD’s strategy involves "flexibility around ISA." ISA stands for Instruction Set Architecture, so this might hint at even deeper collaboration with ARM—like, say, coupling ARM CPU cores and Radeon graphics without an x86 CPU. Considering that ARM is espousing AMD’s Heterogeneous Systems Architecture and backing AMD’s HSA Foundation, such a combination doesn’t seem unlikely.

Responses to “Papermaster details new AMD mobile, server roadmaps”

  1. [quote<]AMD is Gaining Momentum at an Increasing Rate.[/quote<] Yes, but in what direction is this momentum heading?

  2. I think they use it only for multimedia as of now and have not opened it up widely to developers. But I could see it giving a hand to GPGPU stuff.

  3. I personally question just how beneficial Intel’s L3 is to graphics. If tons of cache were that great, Nvidia and AMD would have put it on their chips already. Even the biggest multi-billion-transistor GPU only has ~1MB of cache; those transistors are far better spent on the graphics part. Intel has even added another level of cache in the graphics processor to further isolate it from the L3, so as not to totally choke up the ring bus with graphics data.

  4. Yes, I got that they can’t communicate directly using the cache, but I think the cache still has a chance to improve GPGPU, and so does fast ring-bus communication. Though I really don’t know what the delta in CPU-to-GPU communication is between an AMD APU and Intel’s Ivy; it seems like the ring bus is faster.

  5. Great point! It could be done for sure, but I still have doubts about battery life in that form factor for this product at this point in time. I’d be glad if it turns out I’m wrong.

  6. [quote<]A node shrink generally enable about 40% less power consumption.[/quote<] Yes, but that's not all it is. They will also be moving to high-k gates. And that will apply to the entire chipset. Atom went from a 10W single-core to a dual-core SoC in a phone with a much smaller change. Just look how much battery life increased from the original Atom platform after moving only the northbridge to 45nm. And look what the same process AMD will be using is doing for phones right now. A dual-core can outpace a quad-core, [i<]and[/i<] save battery at the same time.

  7. The server APUs seem very interesting. I won’t be surprised if/when AMD is planning to put them in its SeaMicro high-density servers. They should also be attractive to some HPC companies: no real need for a GPGPU card, just plop the APU into two- or four-socket boards with really fast memory, along with your high-speed interconnects, and it should be flying when you turn it on.

    SemiAccurate did post a small article on SeaMicro boards having an Opteron chip:

    [url<][/url<] And according to here: [url<][/url<] (replay day 3 and scroll down to 11:06), you will see slides detailing the Opteron CPU on that SeaMicro board. They also mentioned that they are working on a version with an APU on the board as well. It’s not hard to connect those articles and slides with what AMD is trying to do with its APU future. Combine that with their HSA Foundation, and I really feel that AMD is breaking new ground here. I can't wait for 2013/2014.

  8. It was my understanding that IB, despite sharing the physical L3 cache between the cores and the IGP, doesn’t really offer “shared” memory: the IGP gets one slice of the L3 and the cores get another. Moreover, the IGP’s memory address space is exclusive; it has reserved a piece of xMB of RAM that can’t be used by the cores, at least not at the same time. As such, IB’s IGP and cores exchange data through the ring and PCIe, very little else, which is perfectly fine for IB. The classic issue is that GPUs don’t care about memory latency, they like bandwidth, and CPUs are exactly the other way around; that’s why it’s pretty hard to unify the memory address space in such a way that both are properly fed. One can hope AMD can pull it off next year, because Intel is surely gonna do it at some point.

  9. Well, Ivy already has a cache shared with the IGP, presumably much faster ring-bus communication, and support for OpenCL and DirectCompute. So these new development models won’t necessarily leave Ivy behind in performance. Some OpenCL-accelerated applications even run faster in software on Intel hardware than on an equivalent AMD APU. Note that the accelerated parts might represent only a modest investment on the software side as well. For such software, maybe the HD 4000 wouldn’t even be the bottleneck, which would probably still favor Intel.

    By the time OpenCL and DirectCompute are relevant to most users, Haswell or Broadwell might be able to make a decent showing on that front too. I sure hope AMD will retain a commanding lead on GPUs until then.

  10. These numbers are from a 28nm process, I presume. I don’t think even at 22nm a Brazos derivative could find a place in a smartphone. It is still a part that is more power-hungry than Atom in absolute terms, mostly for being out-of-order. It needs to be below 3.9W, near 1W in fact. A node shrink generally enables about 40% less power consumption. Presumably, if the architecture is power-gated more effectively, there are other possible gains, but still. Personally I think it will have to wait for 14nm, or for an AMD much more open to HSA than what is revealed here (think going ARM in that segment).

    By then I also expect Intel to redesign Atom for out-of-order execution as well; they have the resources to go back to the drawing board. Interesting times.

  11. AMD is Gaining Momentum at an Increasing Rate. I think the heterogeneous route is a genius one, and their effort in getting the software to work as a team with the hardware will pay off soon. I won’t be surprised to see them become a serious contender in the performance segment once they have unified memory between GPU and CPU. It’s quite strange that their competitor didn’t adopt that route as well, since I think it’s really the future of computing. Once they start rolling the game at full momentum... it would be something like this: people would be fused on their Fusion experience.

  12. I am not too familiar with the detailed economic organization of chip manufacture, but generally, advanced industrial production is done by networks of companies rather than a single company (a modern car includes contributions from roughly 5000 companies). Even Intel is certainly dependent on a whole army of external expertise (from wafers to etching machinery and so on). These companies are probably also working with GF, TSMC and so on and are part of the shift from one process node to the next. This means that they also have to adapt and invest. The process leader (Intel) therefore not only incurs an internal cost for being “first,” but its partners also have to constantly invest to keep pace. “Slower” manufacturers can very probably profit from this initial investment and therefore have a lower cost to reach the same node level as the initial pioneer.

    Long story short, Intel certainly profits a lot from its lead in process technology, but that lead also comes at a price as the technological trajectories of chip manufacturers are not completely independent. Slower actors profit from innovation spillover. This is part of why AMD is able to survive (more or less) by selling relatively cheap chips.

  13. I don’t know if it’s just me, but I don’t find all these announcements very exciting. Perhaps part of me thirsts to know what AMD has in the works to succeed the Bulldozer architecture sometime in 2015, if they even plan to continue doing x86 seriously.

  14. And mobile silicon ≠ server silicon, so gourd only knows when Broadwell Xeons materialize.

    Of course, I think that’s referring to Atom, but they have so many of these totally different chips now that require a particular manufacturing process variation. It’s just spreading the roll out of a new node across even more years than it was already taking.

  15. Hopefully by then there will be some more applications that make use of OpenCL or DirectCompute to really make it compelling with a GCN GPU and shared memory. What would be really nice is if Unreal Engine 4 or CryEngine 3 made extensive use of it. That, plus multimedia transcoding and content creation, both of which are happening now, should really spread its wings on a GCN-based APU with shared memory. This would probably make for a good value proposition.

  16. Well, 2014 is the target for 14nm for chipzilla, per the chip maker itself.

    [url<][/url<] By then GlobalFoundries will probably be at 22nm. So one full process behind, which is more in line with what we've seen for years.

  17. [quote<] Intel has way too many different cpu product families to make separate chips anyway[/quote<] That's a valid point. Realistically, I was suggesting something more like a third dual-core with a completely different cache configuration, much as AMD did specifically for the Athlon II X2. That's a lot of different ways to make the same CPU! In a roundabout way, my point was that "ultrabooks" seem to justify it now. But that idea probably only popped into my head because that's literally what they're doing with the ULV chips... never mind!

  18. Because of significantly more advanced power gating, turbo, and integration than we see from Bobcat today, a quad-core could wear all of those hats.

    So I guess my question is will AMD use different chips for different [b<]purposes[/b<], or for different prices? As you said, a dual-core part would also probably cut down the GPU. But the entire chipset will be integrated, so they'd undoubtedly cut that down, too. To make it work in tablets, the current Bobcat uses the same dual-core CPU, but a different chipset, with limited features. Some possibilities are that the 3.6W version won't have USB 3.0, or the memory controller will use LPDDR2 instead of DDR3. This would change the way the tablet is used, more so than how fast/slow it is.

  19. I think saying the loss of the L3 cache on the AII X4 didn’t have an impact is a bit exaggerated, though the impact was admittedly not that large (depending on the app, of course; maybe a bit more than 5% on average). The AII X2 sort of compensated for this with a larger L2. Since Trinity already has quite a large L2, though, any L3 cache would really only make sense if the GPU can use it.
    Current Celerons and Pentiums probably wouldn’t be very good without L3; their L2 cache is just too small. Removing the L3 would most likely be a disaster in terms of perf/W (remember the P4 Celerons with only 128KB of L2 cache? Same huge power draw as a regular P4 with half the performance). The L3 itself shouldn’t use that much power, and I’m not convinced the CPU as-is could even work without any L3 (and the GPU benefits from it too). Intel has way too many different CPU product families to make separate chips anyway (the smallest SNB/IVB chip is dual-core/3MB L3/6 EUs, there’s a second dual-core die with 4MB L3/12(16) EUs, and features just get disabled from there).
    To answer the OP’s question, I think it is reasonable to expect a ~15% per-clock increase or so over Trinity. AMD essentially promised there’d be gains of that order for each BD generation.

  20. [quote<]That gives Intel a 4x transistor density advantage.[/quote<] That's a funny and/or unsettling thought at first glance, but you have to keep things in perspective. As a smaller company, what's more important for AMD is the cost implication of a new process vs. what can be done by adjusting an existing one.

    When Intel moved Xeons to 32nm almost right out of the gate and made them about 50% more powerful, AMD ended up not having an equivalent process for its servers until over two years later. However, in response, AMD [b<]doubled[/b<] up, just using a matured process that had capacity to spare. It took Intel three years after the introduction of 32nm to ship the Sandy Bridge-E Xeons, and only then did they finally slap AMD back down. Ivy Bridge-E won't be here until 2013, but we know not to expect much of it.

    28nm is already going fine right now. The risk for AMD in 2013 is minimal. They can make big chips, and lots of them. 2014 is when things get interesting in the same way as the above examples: the Xeon version of Haswell, an extremely high-power chip on a mature process, will also introduce DDR4. That's probably going to be a game changer.

  21. If I had to guess, I’d suspect there might be two dies for Temash: one dual-core with slower graphics that goes down to 3.6W (though some versions might have higher power draw), and one quad-core with faster graphics using up to 6W (this one being physically identical to Kabini, just lower clocked, much like Ontario/Zacate). Or maybe just one die with disabled parts (it should be a pretty small chip anyway); I guess that would depend on how many dual-core parts they expect to sell.

  22. Hopefully he’s referring to the impact a shared L3 can have on the GPU, the important part of Llano, and not the CPU performance singled out, which would be unrealistic for the chip itself.

    But that’s not to say that a shared L3 cache with the GPU is a necessity. They’ve already talked about how differently Kaveri handles memory, and that suggests it’s not recycling the same system bus.

    While a large, high speed L3 surely helps the GPU in SB/IB, to give it the credit would be to ignore the true next step in integration that SB introduced in a PC context – the ring bus.

    We’ve seen for years that AMD knows very well how to implement an L3 cache. That’s not their missing architectural ingredient when compared apples to apples to Intel.

    As an aside, I’m actually curious why Intel doesn’t do the Celerons and Pentiums without the L3, or at least some ULV chips. It’s always sucking juice for no real reason in the laptops those end up in.

  23. The Athlon II X2/X4 almost never noticed the loss of the L3 cache relative to the Phenom II, so I’m dubious it would have made much of a difference for Llano. And given that Bulldozer’s L3 cache hardly offers much more bandwidth than main memory, I’m dubious it amounts to much for consumer workloads unless the implementation is dramatically improved.

  24. It’s a bit concerning that AMD will be using 28nm for future server products due in late 2013 since they will likely be competing against 14nm Intel chips. That gives Intel a 4x transistor density advantage.
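    For what it’s worth, that 4x follows from ideal area scaling: transistor density goes up with the inverse square of the feature size, so comparing 28nm to 14nm gives (28/14)² = 4. A trivial sketch of that arithmetic (my own illustration; real-world nodes rarely achieve full ideal scaling):

    ```python
    # Ideal transistor-density scaling between two process nodes.
    # Assumption: density scales with the inverse square of the feature
    # size, an idealization that real processes rarely achieve in full.
    def density_advantage(old_nm: float, new_nm: float) -> float:
        return (old_nm / new_nm) ** 2

    print(density_advantage(28, 14))  # 28nm vs. 14nm -> 4.0
    ```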

  25. Am I correct in understanding the “X & Y” descriptions on the slides mean “Desktop & Portables”?

    I’m kind of curious what kind of performance to expect from Kaveri. Maybe 2-10% faster clock for clock than its predecessor? I’m also wondering if it’ll have an L3 cache; the lack of one seemed to hurt Llano’s performance quite a bit.

  26. That’s already made it possible. The size of AMD itself is the limitation here.

  27. The way AMD has been performing of late, he might want to change his name to PowerPointMaster.

  28. I’m going to get you an industrial-strength prescription for dramamine because of the motion sickness you’ll be feeling from doing so much spinning.

  29. Wait a few years and Moore’s Law will make it possible for low-end AMD processors to target that market as well.

  30. Every time I see this guy’s name I think it’s some kind of printer company, like Lexmark renamed itself or maybe HP spun off their printer division when I wasn’t looking. (Though that would probably get some made-up, easily-trademarkable name that cost them millions in consulting fees to invent, of course.)

  31. No, he’s excluding the lower endpoint. It’s “something larger than 18mm, up to and including 24mm” or, equivalently, “24mm down to but not including 18mm”. At least, that’s the reasonable interpretation of it.

    It’s not the most elegant way to put that, but if you’ve ever tried to cram technical details onto slides you’re familiar with subverting absolute clarity in pursuit of back-of-the-room readability, knowing that you’re going to be expanding on the points verbally anyway.
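    Read that way, the slide is describing the half-open interval (18, 24]: the lower endpoint excluded, the upper one included. A throwaway sketch of that reading (the function name is my own, not anything from the slide):

    ```python
    # Kabini's quoted thickness range, read as the half-open interval
    # (18, 24]: strictly greater than 18mm, up to and including 24mm.
    def in_kabini_thickness_range(thickness_mm: float) -> bool:
        return 18 < thickness_mm <= 24

    print(in_kabini_thickness_range(18))  # excluded lower endpoint -> False
    print(in_kabini_thickness_range(24))  # included upper endpoint -> True
    ```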

  32. I for one am glad that AMD is not wasting resources trying to get into the smartphone market. Tablets are a good end point for the plucky little company.

  33. I’m curious if the tablet Jaguar chip that goes down to 3.6W will actually be the quad-core. It’s entirely possible, but the trouble is that they’d previously said there would be single-, dual-, and quad-cores, which implies the chips are actually built two different ways, much like Llano.

    But regardless, it’s nice to see that tablets with keyboard docks will no longer be a compromise compared to lower end ultraportables. Now, you won’t even have to really make a choice between one or the other.

  34. No, if he meant that it would say “18mm to 24mm”, because the whole interior of that range is greater than 18mm and including a sign would be totally superfluous.

  35. Minus one. He used it properly. The line reads: “greater than 18 mm, up to 24 mm”.

    The fact you can’t read is not the CTO’s problem.

  36. Seriously, you’re the CTO of a Fortune 500 company and you can’t figure out which one is the less than sign and which one is the greater than sign? Kabini obviously can fit into systems thicker than 18mm, that’s why you provided a range! You want the other sign, this one ‘<‘.