AMD CTO reveals first Steamroller details

Ever since the relatively disappointing debut of AMD’s Bulldozer microarchitecture, we’ve been curious to find out what happens next. New architectures sometimes have their share of troubles, but they often bring with them quite a bit of headroom for improvement, especially once there’s operating silicon to be examined and optimized. Bulldozer seemed to have more than the usual share of problems, so the question became: does it have a correspondingly large amount of headroom for improvements in successive revisions?

The first update to the architecture, code-named Piledriver, debuted aboard the Trinity mobile APU. Although the performance improvements to the Piledriver core were fairly modest, the new architecture’s superior dynamic voltage and frequency scaling helped Trinity achieve substantially higher performance per watt than Llano, which was based on the older “Stars” CPU architecture. Piledriver is slowly making its way into desktop systems aboard the Trinity APU, and we should see a broader desktop Trinity launch very soon. However, the eight-core “Vishera” processor isn’t expected until next year. Regardless, our conversations with AMD architects have made one thing clear: the generation after Piledriver, code-named Steamroller, is where the big gains in performance should happen.

That leads us to today, at the Hot Chips conference, where AMD CTO Mark Papermaster delivered a keynote speech detailing some of the tweaks the firm is making to the Steamroller core in order to boost its per-clock throughput and power efficiency. Sadly, we didn’t attend Hot Chips this year, but we are in possession of the slides from Papermaster’s speech, which we can share with you. Let’s walk through them and see what we can learn about AMD’s upcoming architectural refresh.

The first slide reads like a simple acknowledgement of Bulldozer’s current weaknesses, including the Amdahl’s Law problem created by its relatively weak single-core performance. We’d hope for these areas to improve in subsequent generations. Otherwise, the basic layout shown here looks like any other Bulldozer overview, I believe.
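To make the Amdahl's Law point concrete, here is a minimal Python sketch. The function name and the 50% parallel fraction are illustrative assumptions, not figures from AMD's slides; the formula itself is the standard statement of Amdahl's Law.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a workload scales with core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs on one core; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: half of the work can be spread across eight cores.
    print(f"{amdahl_speedup(0.5, 8):.2f}x")
```

With half the work stuck on a single thread, eight cores yield well under a 2x speedup in this model, which is why weak single-core performance hurts even a chip with many cores.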

The ‘net has been rife with speculation about the primary sources of Bulldozer’s problems. Looks like the shared front end of the dual-core Bulldozer “module” is indeed one of the culprits. Steamroller gets separate, dedicated decoders for each integer core, along with larger instruction caches.

There are some very big numbers in this slide, given what they represent. Branch mispredictions drop by 20%, instruction cache misses by 30%. Per-thread instruction dispatches that use the full width of the execution units are up by a quarter. Overall, these changes add up to a whopping 30% improvement in ops dispatched per clock cycle—and these numbers are based on simulation, not just hopeful estimation. Even more notably, this 30% figure comes from simulated client-focused workloads, including “digital media, productivity and gaming applications,” not just the server-class applications for which the original Bulldozer core was so obviously tuned.

Presumably, the revised front end is the single biggest improvement in Steamroller. Provided the rest of the engine can cope with how it’s being fed, these changes could result in a formidable boost to overall performance.

Steamroller’s cores should be better equipped to cope with the front end’s higher dispatch rate thanks to some changes to the schedulers and the memory subsystem. We don’t have too many specifics here, but the 5-10% improvement in scheduling efficiency again comes from simulation of client-side workloads like “digital media, productivity and gaming applications.”

Zooming back out, this slide offers a look at some power-efficiency provisions baked into Steamroller. The instruction fetch optimization, which detects loops and handles them more efficiently, is a familiar trick. The dynamic L2 cache resizing makes sense, too, since it’s a shared resource used by both integer cores and the working data set of different threads can vary. If not all of the L2 cache is needed, portions of it can be powered down.

We’re unsure what the floating-point “rebalance” is all about. Currently, Bulldozer’s floating-point performance is relatively weak, in part because a single FPU is shared between two integer cores. Streamlining the FPU’s execution hardware might save power, as is being claimed here, but we worry about performance. If “adjust to application trends” means hardware better suited to common workloads, then fine. If it means “gut the FPU and rely on the graphics processor to do floating-point math,” well, that’s less promising. We’ll have to get more specifics about what’s happening here.  Update: AMD tells us it’s not reducing the execution capabilities of the FP units at all. They’ve simply identified some redundancies (for instance, in the MMX units) and will be re-using some hardware in order “to save power and area, with no performance impact.” So no worries, there, it seems.

Moving from architecture to design opens up more opportunity. As you may recall, Bulldozer is relatively large for a 32-nm chip with its transistor count, especially after AMD revised down the transistor count estimate. Apparently, there’s plenty of room for improvement even in the same process node.

Shown above is a portion of the chip’s FPU. The top image comes from a current Bulldozer chip, which employs the hand-drawn custom logic that’s generally used in high-end x86 CPUs. The lower image comes from a potential future chip that uses a more automated high-density cell library. On the same 32-nm process node, the high-density library purportedly crams the same logic into 30% less area, with 30% less power use. As the slide notes, gains on this order would usually come from the transition to a newer, smaller fabrication process. We’d expect the more automated approach to design to reduce AMD’s time to market, as well.

What we don’t know is when we’ll see a product designed using a high-density cell library like this one. AMD tells us the future processor illustrated here is a post-Steamroller design, and it therefore seems likely that any improvements realized by using these tools will happen on a future process node, not at 32 nm.

When AMD purchased SeaMicro earlier this year, it acquired the technology used to build low-power, high-density server arrays that combine multiple CPU and system modules with a shared pool of virtualized I/O resources. The firm has since stated that it plans to hand over this fabric technology to its system vendor partners, to enable them to create high-density AMD-based solutions. Now, that fabric tech has a name: Freedom Fabric.

At the time of the acquisition, SeaMicro didn’t offer Opteron-based solutions, but AMD stated its intention to deliver Opteron-based offerings in the second half of this year. The card pictured above is the hardware that delivers on that promise; it’s populated with a Bulldozer-derived Opteron 4256 processor and dual SO-DIMMs. The four chips across the top are labeled “SeaMicro” and likely manage the interconnect between this module and the rest of the system.  As you can see, the card itself is pretty compact, at roughly 12″ in length and about half that in height. You can imagine a host of these cards packed into a single enclosure as part of a very high-density cloud server solution. Offerings like this one may be attractive enough to allow AMD to get a toe-hold in the growing cloud and blade server markets.

Meanwhile, we’re looking forward to more potent CPUs based on Steamroller, which should boost server performance while also addressing Bulldozer’s biggest liability: client-side single-thread performance. Although we have a few numbers now about improvements to the individual portions of Steamroller, we don’t yet have any sense of the overall IPC gains or how those might combine with clock frequency increases to affect overall performance. For that, we’ll probably have to wait a while yet.
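As a rough illustration of how IPC and clock speed combine, here is a first-order Python sketch. It assumes performance scales with the product of IPC and frequency, which ignores memory stalls, turbo behavior, and workload differences. The function name and the example percentages are hypothetical, not AMD figures.

```python
def relative_performance(ipc_gain: float, clock_gain: float) -> float:
    """First-order estimate: performance is proportional to IPC x clock frequency.

    Gains are fractions, e.g. 0.15 for a 15% improvement.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


if __name__ == "__main__":
    # Hypothetical inputs: 15% more instructions per clock and a 5% higher clock.
    print(f"{relative_performance(0.15, 0.05):.2f}x")
```

The gains multiply rather than add, so in this simple model a modest IPC improvement paired with a modest clock bump counts for slightly more than the sum of the two.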

0 responses to “AMD CTO reveals first Steamroller details”

  1. People seem to be hung up on IPC these days and are not keeping their eye on the ball. While IPC is important it’s not the end all or be all in processor performance. What matters is the user experience. If you have two systems that essentially perform the same and one uses a higher IPC while the other uses a higher frequency, then it doesn’t matter what the IPC is, what matters is how the PC functions, i.e. the user experience. In most cases you need to make a major change in processor performance to differentiate between similar processors running real applications, though APUs a little less so than just a CPU or GPU.

    When you investigate the nitty-gritty of processor design you will find that there are pluses and minuses to short vs. longer pipelines. All the aspects of a processor’s design however ultimately determine its system performance and that’s what users experience. In reality it doesn’t matter what processor design approach is employed, what matters is how the PC actually performs running real applications, not benches.

    Unless all you do is run benches all day long instead of actually using your PC for work/communication/entertainment, etc., IPC really doesn’t mean anything, though many seem to not understand this as they are led down a path of uninformed opinion. Any of the current desktop CPUs including the FX models provide an excellent user experience and the FX Zambezi models are highly overclockable delivering more performance for less cash.

    Those who haven’t actually used an FX CPU or OC’d an FX CPU simply don’t understand that while these were a lateral performance move in some applications, (think single thread as a generality), they were a forward movement in others, (think multi-thread as a generality). In either case if you were looking to upgrade from a slower Deneb or Thuban, Zambezi was most definitely a step forward even before OC’ing.

    Vishera will offer 10-15% improvement over Zambezi which is a good bump in performance and Vishera should hit close to 5 GHz on air. Considering Intel’s Ivy Bridge only got an ~5% improvement with a node drop from 32nm to 22nm and tri-gate, AMD is doing pretty darn good to get a ~10-15% improvement with Vishera, without a node change.

    Steamroller will add another ~10-15% over Vishera so AMD isn’t dead yet contrary to the false reports that have been made for the past 40 years regarding AMD. In addition only a small percentage of people ever buy the top of the line, over-priced, over-hyped models. The majority of enthusiasts and mainstream consumers buy the best bang-for-the-buck products which more often than not is AMD. Reading many PC hardware reviews however you’d be hard pressed to know this as it’s often understated in reviewers’ lust for the highest benches.

    As a consumer I want to know what the user experience is like and benches simply do not provide this information. In fact benches can be very misleading depending on how the benches are designed, what processor code drivers are actually used, how the benchmark “weighs” the results, if the bench is tainted to produce higher results for Intel, etc. Trust actual system performance with real applications, not potentially misleading benches.

  2. Sandy Bridge has all of those things, and performs admirably.

    Prescott’s Hyperthreading + replay + long pipeline was just a really stupid move on Intel’s part. Trace cache was probably helpful to performance, but not nearly enough to make up for the super duper long pipe.

  3. There are plenty of 65nm Pentium 4s and Pentium Ds that came out before Conroe. It’s not like 65nm and a brand new architecture hit at the same time.

  4. Steamroller is from the same family as Bulldozer. Are you 100% sure AMD put this picture with a steamroller caption ??

    Here is another slide from the deck: AMD uses a bulldozer for Bulldozer and a steamroller for Steamroller. And of course, an excavator for Excavator and a piledriver for Piledriver. Kind of makes sense, no? Edit: notice, no image of any front loader on that slide.

  5. The Bulldozer SKUs are very compellingly priced nowadays, but coming from someone using a Phenom II X4 right now, I don’t think I’ll see a dramatic performance improvement with an FX-8150 (unless I run 7zip and Handbrake all day). Idle power will probably be less but it won’t be enough to justify $600 for new parts. Priced well, I’ll opt for Vishera, but I don’t really need to upgrade for now so there’s a good chance I’ll wait for Steamroller. If not, then Haswell or Ivy Bridge.

  6. I was speaking generally, but yes I do believe that AMD is sharing resources on these CPUs as AVX is still in the ‘convince developers to start playing with it’ stage, so it’s not very important today.

    AVX does however have the potential to turn certain previously overbearing loads into manageable tasks that can be integrated into regular workflows. I expect it to be extremely useful in the next few years.

    But for now, as we have all guessed it, AMD needs most to make a competitive CPU.

  7. So at least you read carefully the article and found out Papermaster did not say anything specific in regard to performance. 45%, 30%, 15% or 5% faster?

  8. I hope this guy is right, and it shows in few months. I would be happy if there is “only” 30% improvement by whatever means – clocks, IPC, or whatever, provided no more power consumed.

  9. According to VR-Zone, they had an anonymous source say this about Steamroller: "Steamroller is not Bulldozer Enhanced. F*** no. The layout might look the same but our LEGO blocks are completely different. When all is said and done we should get 45% improvement and this goes to show how the Bulldozer was f***** design. This is all what Bulldozer was supposed to be." Haven't seen anything like this before.

  10. At 65nm, Netburst was abandoned as a desktop/consumer product and remained a Xeon. Intel isn’t known for having two competing technologies in the same market. So, it was dead at 90nm, whether or not it would have been any good at 65nm, 45nm, or beyond.

  11. Brontosaur-based, if I remember correctly. Fred’s model had a CPU the size of a walnut, but was unlikely to topple over, a feature still utilized by today’s machines.

    However, daily maintenance must have been rough, especially the morning disposal of the waste byproducts…that would make for a crappy day for even the most cheerful of mechanics! 😀

  12. One interesting point… while Papermaster talked up Steamroller, he didn’t appear to have much to say about Piledriver….

  13. Uh.. one typo in your comment: Bulldozer doesn’t have any AVX integer performance because Bulldozer (and Sandy Bridge and Ivy Bridge) only implements AVX 1.0 which does not really do integer and byte-manipulation operations.

    AVX 2.0 will include the integer operations and Haswell will have it first. I’m not sure but Steamroller may introduce AVX 2.0 on the AMD side.

  14. The Bulldozer or the upcoming Vishera FX?

    If I were to get one, I’d definitely get the improved-but-still-flawed Vishera version.

    But actually, I just ended up caving in and getting a lightly used 2600K instead — yeah, I know, what a cop out I am. And once people begin ditching their 3770Ks for Haswell or whatever, I’m probably going to get a late-stepping 3770K and pry off the heatshield and slap on some decent thermal paste and be happy with that for a couple of years.

  15. Yep, AMD copied all those features and produced a slower processor than they were able to build 2 years ago. What a great move by both companies.

  16. They had Pentium Ds, they sucked compared to both Core Duos and Athlon 64 X2s.

  17. It wasn’t the 90nm process, it was the 31-33 stage pipeline and “double pumped” ALUs that were an issue.

  18. Considering Netburst included features like the trace-cache that AMD is going to be copying in Steamroller, I’d be a little bit less snarky and a little more humble in your attitude. Additionally, when you look at the fact that Netburst was the first implementation of SMT in the consumer space and that the Bulldozer module is effectively an inefficient version of SMT, you might want to be careful in insulting Netburst since AMD seems to have taken a lot of inspiration from what Netburst included.

    Netburst had numerous issues, although the Northwood core P4s were pretty competitive with the Athlons of their time period. Things did hit the fan with Prescott + AMD being smart with the Athlon 64, but Intel learned its lessons from that debacle.

    In the meantime, AMD went out and refused to learn from the history of what Intel had gone through just a few years before and plowed ahead with Bulldozer anyway. That is what is inexcusable: You can fault Intel for sticking with the MHz wars for too long and for having too much hubris with Prescott, but AMD should have damn well learned from the earlier failures of Intel in the process. Basically AMD’s upper management had Intel’s arrogance x 10 when they forced through Bulldozer.

  19. I would think SB/IB is No. 1, Nehalem may be No. 2, and well, I really can’t decide whether to rank K10.5 or BD as No. 3. Either way, I think there’s no point in getting a Nehalem at this point, not when SB is already here, so it’s either SB or BD for most folks.

  20. Its funny to think that the BD family is likely the 2nd or 3rd most powerful CPU family ever made. (Based on the idea that there would be a reasonable thermal ceiling and tasks to run which used a small number of threads and locally attached memory, not some big-iron massively-parallel infrastructure.)

  21. Well, I know they don’t have the resources to pull off something as fast as Sandy Bridge, not when they were designing BD, at least. But regardless, they took some design decisions that, even if they were easier and less expensive to implement, still made them end up being slower. Don’t get me wrong, I am planning to get an FX anytime soon, or I might even wait for Steamroller. But as much as I am very intrigued by the design, I also acknowledge its weaknesses. It’s AMD’s biggest and best bet, it’s not that good compared to the competition, but it’s still a state-of-the-art microprocessor that should get the job done. It’s even surprising that AMD has managed to produce this design considering they don’t have nearly as many resources (engineers as well as money) as Intel.

  22. I’m under no illusion that AMD’s budget is minuscule compared to Chipzilla.

    If for some reason you misinterpreted my enthusiasm in the second paragraph, then that would explain the tone of your post 😉

  23. Well yes, if they can do TWO things, i.e. 40% IPC AND a 5GHz part, then they're still in the game. My fantasy world is one where AMD is so poorly managed, has a bunch of interesting ideas, and then fails to execute so badly that they're the joke of the industry. Oh wait, that already happened; I must be dreaming.....

  24. "At 130nm and 65nm, Netburst was more than competitive." Eh, I must admit that I do not understand how you can claim Netburst was competitive at 65nm. Core2 kicked its teeth in at 65nm. I expect AMD's 90nm X2's beat it soundly as well, but I can't recall specifics.

  25. I doubt AMD’s engineers seriously expected a big performance success. They sort of had a chance with the shot for high clocks, but they also must have known it was a gamble. They need to figure out clever ways to get close to Intel’s performance while spending way less on R&D.

  26. Well I challenge you to suggest what tradeoffs would be better. I think they did a decent job, the product appears to work for certain servers, it can work OK in mainstream desktop (Trinity), and it has lots of cool looking numbers to sell itself on (high clocks, lots of cores). Sure, they botched version #1 a bit, and 32nm has not been kind to them, it will probably never outright beat Intel’s competing products, but AMD doesn’t have the resources that Intel does.

  27. So, when presented with evidence that contradicts your opinions, do you change your opinions?

  28. you sir are correct.

    & if they are sticking with the same chipset…. great…. even if they come out with a new one while still allowing the old ones to be compatible it’ll all be good.

  29. I instantly thought that exact thing when I saw the picture, but knew somebody must have already pointed it out. A Ctrl-f on “loader” brought me here, to upvote your comment.

  30. "Negative comments are helpful for responsible company, to navigation towards better products." Yes, I'm sure that "responsible companies" have employees out there browsing every tech forum on the planet, in case some troll might be saying (in broken and misspelt English) that their product is bad. I guess that's how "responsible companies" make all their strategic and engineering decisions; nobody could "navigation towards better products" without your precious help. It's so marvelously charitable of you to grant these companies your incomparable insight for free! To preserve your own interests, maybe you should consider keeping your wisdom to yourself until they begin paying you for your valuable services.

  31. You can’t call something fail that D.N.E. yet.

    And DNE = does not exist, it’s burnt into my brain from calculus, sorry.

  32. I like the piano solo version myself


  33. That’s the party line, but it actually contradicts history.

    Netburst is actually an amazing architecture that’s very good at things that matter to desktop and server users. It’s a high-bandwidth high-transaction CPU that relies on custom logic for anything other than integer and rudimentary FP, specifically SSE2, and that worked very well once developers got on board.

    The real problem was Intel’s Netburst implementation on their 90nm process; that was borked. At 130nm and 65nm, Netburst was more than competitive.

    I’d even hazard to guess that if Intel adopted a current multi-core design based on Netburst, that it’d be more or less competitive with Core on the same process. It’d run at higher clock-speeds and vary in relative performance, but I believe that it’d simply blow Core et al away at certain things.

  34. …they couldn’t agree then as to what constituted a ‘quad-core’. In both that case and with dual-cores, AMD was first in integrating all of the cores into the same die. Intel’s first ‘dual-cores’ and ‘quad-cores’ took advantage of the 775 socket that allowed for two physical die to operate simultaneously in the same socket on the same bus.

    In all of those cases, every CPU after the x86 series had integrated FPUs of some sort. Some were weak as hell (K6), but they were there.

  35. When you think about it, this is AMD’s first truly ‘new’ architecture since the K7. Intel did the P6, and went back to it with the Core series, interrupted by Netburst, since abandoned. The only other ‘new’ architecture is IA64 used in the Itaniums, and it’s so radical that its success or failure is still up in the air.

  36. AMD has repeatedly said that the Phenom II line, which descends from the K7 architecture, was a dead end. I understand that the BD is a new-from-scratch architecture that may take some time to get up to speed. They do seem to suggest the necessary improvements though (I’m sure they know much better than we do where the CPU stalls).

  37. They were probably betting that by that time programs would have better threading. Programs that thread well are decently competitive on Bulldozer (video encoding, compression). Sadly, not all tasks are easy to implement in parallel and the 8 cores are frequently underutilized, especially in gaming.

  38. And at work, didn’t he operate an Excavator?

    Oh, and the Excavator was dinosaur-based. Bad omen.

  39. Any company will charge as much as they can. In fact, if they don’t, they aren’t doing good business. I’d say that overall, AMD has not managed to charge as much as they could have…

  40. My 486SX-25 handled doom2 fine, until the last level where there were tons and tons of enemies spawning, then it was a slide show haha.

  41. Let us pray: Pul-leeaazzee help AMD build a killer high-end cost-reasonable CPU that will re-up the competition, keep Intel’s feet to the fire, keep prices under some tension, and also be the first step in an on-going path of competitive CPUs.

    We hold these truths to be self-evident: that allegiance to a particular brand of CPU will get dumped as quickly as will a bad burrito in a combination plate if there is something better out there.


  42. Maybe you forgot but when Intel brought out their first “quad-core” AMD was saying it wasn’t a true quad-core because it did not lie all on the same piece of silicon. Intel said otherwise, they said it was a true quad-core. They couldn’t even agree back then as to what a definition of a core meant.

  43. Right. I think he’s saying it would be nice if Steamroller dropped into AM3+ motherboards.

  44. I think it’s more like they read the Sandy Bridge articles back in Sept. 2010 and realized it was a cool trick, but implementing it on Bulldozer was a little too late with BD so far along its development phase.

  45. Depends what clock the i486SX ran at. AMD’s Am486SX2-66 would’ve played Doom just as well as any DX2-66 ever did because Doom doesn’t even use the FPU at all.

    Edit – Oh, you were being sarcastic. I didn’t catch it.

  46. They promised 10-15% better performance each year when Bulldozer came out. Did they meet it with Piledriver? I think I read somewhere that PD is only 6% faster than BD. To get back on track on that 10-15% promise, SR needs to be 14-25% faster than PD. Yeah, they have their work cut out for them.

  47. If they’re sticking to the same chipset, well, isn’t that suggesting it’ll be compatible with current mobos?

  48. Yeah; it was a reply fail; it was meant for you. The engine comments were loosely connected to airplanes, but in this context they were referring to AMD’s modules with two fake cores (V4s) vs Intel’s true cores (V8s); MPG meant power efficiency

  49. Hilarity ensues: Steamroller’s µop cache has roots going all the way back to the trace-cache in the… Pentium IV. So this time around AMD is looking to copy the good features from the Pentium IV instead of the brain-dead features.

  50. Excuse me; “executive companions” and they all went to Ivy [s<]Bridge[/s<] League schools, you know. And are fluent in French. In so many ways....

  51. I was going to be a party pooper and remind everyone of their prior epic fail with the first iteration of this architecture and other famous misses like Stars. Then I remembered something, Piledriver was a nice surprise, exceeding expectations. I’ll be upgrading my GPU this year and next year I’ll update my CPU. I’d be very happy to return to AMD CPUs if they can deliver (Their processors allow overclocking and have all the extra instructions like VT, which I use, without forcing me to an expensive platform that has more PCI-E lanes and memory controllers than I need).

  52. it doesn’t have to be on par with ipc if it’s a hot clocked design and 30-40% increase in ipc would certainly make up the performance gap, not sure what fantasy world you live in.

  53. Excellent explanation, but what I’m confused by is why he replied to RobbyBob with it.

  54. See the Heinkel He 177 Greif for a good example of what he's mumbling about: in the Second World War, Germany needed a long-range heavy bomber. They developed the He 177, which could - incredibly - deliver 13,000 lbs of bombs with pin-point accuracy via dive-bombing. It had many innovative features, but to be able to dive-bomb, the designers went with 4 engines in just 2 nacelles, where each nacelle used 2 'coupled' engines (see where I'm going with this?). Individually, the engines were fine (the same as used by the Me 109 fighter), but when coupled with another in a confined space with inadequate cooling, they had a disturbing tendency to burst into flames. Ultimately (and very fortunately for the Allies), the He 177 was a greater danger to its own crews than to the enemy. Eventually, the engineers fixed the problems with the engines, but by that time, the war was over (once again, see the similarities?) If the Interwebz had existed in 1946, the discussion rooms would have been filled with Heinkel fanbois saying "Ah, but if only...."

  55. It all depends on whether the front end was the bottleneck or not. AMD seems to think so..

  56. "be nice to see them deliver and even nicer if it was compatible with the current motherboards." You can forget about that. AMD has already put the nix on bringing an updated chipset out for AM3+. They are sticking to the same old chipset that they brought out in 2010.

  57. Given AMD’s recent track record, I’d say that a healthy dose of skepticism is in order.

    +30% OPC? … I’ll believe that when I see it.

    From "The Bulldozer Aftermath: Delving Even Deeper":

    "It will be interesting to see if AMD will adopt a µop cache in the near future, as it would lower the branch prediction penalty, save power, and lower the pressure on the decoding part. It looks like a perfect match for this architecture. Another significant problem is that the L1 instruction cache does not seem to cope well with 2-threads. We have measured significantly higher miss rates once we run two threads on the 2-way 64KB L1 instruction cache. It looks like the associativity of that cache is simply too low. There is a reason why Intel has an 8-way associative cache to run two threads. (...) So far we found out that the instruction cache, the branch misprediction penalty, and the lack of clock speed are the main reasons why Bulldozer underperforms in the server world."

    Looks like AMD learned the same thing, except it would appear that they aim to improve IPC/OPC rather than clock speed, which will likely benefit the server chips in terms of keeping the TDP rating (so no need to upgrade the data center cooling), yet increase the performance with a simple CPU drop-in upgrade.

  58. when you don’t have anything competitive today it’s always good to talk about how you’ll have something amazing tomorrow.

    be nice to see them deliver and even nicer if it was compatible with the current motherboards.

  59. Umm not really, multicore CPUs have been around for a dog’s age and many of them were not complete processing units. Cell is an example of this. Multi-core CPUs can be homogeneous or heterogeneous in design.

  60. They’ll be competitive insomuch as they’ll be priced right around the Intel CPU of similar performance. It’s not like they can go charge a premium and still sell CPUs.

  61. I went with a Pentium 4 2.8C back in 2004 because (as I recalled) Athlon 64 parts were selling a little bit more than what I’d pay for them. Good thing too, because back then Intel was running a raffle, and with that purchase I won an LG air conditioner. 😀

    There are some advantages to having a weak AMD, and personally I’d rather enjoy this time while it’s here. Back when FX wasn’t released yet I asked myself whether I’d rather BD smoked SB or not. It was obviously gonna be expensive if it turned out to be an SB killer. Since I wanted to own BD because I find the architecture very intriguing, I thought it would be fine by me if BD wasn’t really stellar because I obviously would pay less for it.

  62. Yeah. That’s actually what’s happening right now. They took a bet on a very radical design. It works for servers and some applications, but even at its best it doesn’t really smoke the competition. It’s obviously not doing well on desktops.

    They CAN benefit from taking tradeoffs, unfortunately, it really seems they didn’t make the right tradeoffs with Bulldozer. I suppose the benchmarks make that plain. They gambled on giving folks tons of cores, and that just isn’t what most folks need at this time.

  63. The “next year” thing: Was that from an official source, or just another rumor?

    If it’s an official source, I’ll go bang my head off the wall for a while.

  64. Well, that’s a hell of a lot more than I understood.

    So, IPC is going to be something less than 30%. I’ve seen other figures being thrown around, such as 10%-15% (so the same as what Piledriver is supposed to deliver over Bulldozer). I just hope people do not overestimate the performance improvement, which would then lead to disappointment. Hopefully, as we get closer, AMD can better clarify what improvement we can expect.

  65. [quote<]I think they're making the wrong tradeoffs.[/quote<] AMD can benefit from making [i<]different[/i<] tradeoffs in the hope of doing [i<]something[/i<] better than Intel. They certainly stand no chance of making a chip that beats Intel at its own game. They can also get by with a less optimal CPU in some market segments, and rely on graphics to sell the product. So there is an opportunity to make a CPU core that can sell in certain compute intensive roles, can sell in an overall competitive mainstream consumer processor, and can be offered as an "enthusiast" part (by reusing the server parts). The enthusiast part is a weakest link, but it doesn't cost much to offer it. The fact they can talk about high clock speeds helps to sell product to consumers, regardless of actual value.

  66. [quote<]it gets tiring to constantly be subject to the marketing hype over these things[/quote<] They did the same thing with Barcelona.

  67. What, do you think Intel’s R&D budget is being spent on hookers? AMD will always have trouble keeping up. You should expect that.

  68. [quote<]This is still broken bulldozer architecture.[/quote<] It's only a broken architecture if they can't make it competitive. According to TR, it already beats K10 on performance per watt (based on Llano vs Trinity).

  69. If I understand this right, there are some dispatch and prediction gains that might, at best (if OPC is closely related to IPC) add up to 35-40% improvements? I know this is a complete guess and an early ballpark figure, but even 40% more performance doesn’t close the IPC gap between AMD and Intel.

    Assuming this, and if AMD get this chip out in the next 18 months (not something they have a good track record for doing) we could be looking at Sandy Bridge performance from AMD by 2014!
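    The arithmetic behind that worry can be sketched in a few lines of Python. The 0.65 starting ratio below is a hypothetical placeholder, not a measured figure, and the function names are made up for illustration; plug in whatever per-clock ratio you believe applies.

```python
def relative_after_uplift(ratio_now, uplift):
    """Per-clock performance relative to the competitor after an uplift.

    ratio_now: current per-clock performance as a fraction of the competitor's.
    uplift: fractional improvement, e.g. 0.40 for +40%.
    """
    return ratio_now * (1.0 + uplift)


def uplift_to_match(ratio_now):
    """Fractional uplift needed to reach parity with a competitor that stands still."""
    return 1.0 / ratio_now - 1.0


if __name__ == "__main__":
    ratio = 0.65  # hypothetical placeholder, NOT a benchmark result
    print(f"after +40%: {relative_after_uplift(ratio, 0.40):.2f} of competitor")
    print(f"uplift needed for parity: {uplift_to_match(ratio):.0%}")
```

    The point is only that the required uplift grows faster than the gap itself: a chip at two-thirds of a rival's per-clock performance needs +50%, not +33%, and that assumes the rival does not improve in the meantime.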


  70. Well, no, they’re not the same. IPC is an abbreviation of ‘instructions per cycle’, in which the word ‘instructions’ would be x86 instructions, or in the case of an ARM architecture, ARM instructions. Instructions are not the same as ops.

    The front end of a processor has to translate instructions into ops (one or several per instruction, or even multiple instructions per op). These ops can then be fed to the pipeline, and when these ops execute, the work that the instructions describe is being done. After executing the ops, care must be taken to make sure that the order in which the ops have been executed (which does not have to be the same as the instruction order) has the desired result as described by the instructions. When all ops associated with an instruction are done and the results saved in the correct order, an instruction is said to be retired, which means that the work it represented has been done.

    The entire process of translating an instruction into ops, executing them and retiring the instruction is what determines the speed at which a CPU can execute the instructions, or the IPC, when there are no other bottlenecks (such as RAM access). What I believe AMD is saying is that the front end can now feed the pipelines 30% faster, which is achieved by better caching and prediction, and more bandwidth. This however does not mean that the instructions can actually be executed 30% faster, because the rest of the CPU must be able to execute this larger amount of ops as well. This is what Scott meant with “Provided the rest of the engine can cope with how it’s being fed, these changes could result in a formidable boost to overall performance.”

    Please correct me if I’m wrong, I’m just trying to explain this from my own limited knowledge of CPU architecture.
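    A rough way to picture the argument above is as a bottleneck model: sustained throughput is capped by whichever of the front end or the execution engine is slower. The Python sketch below is a deliberate oversimplification, and every number in it is a made-up placeholder rather than anything AMD has stated.

```python
def sustained_ipc(frontend_ops_per_cycle, backend_ops_per_cycle, ops_per_instruction):
    """Toy bottleneck model: the slower stage sets the sustained op rate,
    and dividing by the average ops per instruction converts that to IPC."""
    ops_per_cycle = min(frontend_ops_per_cycle, backend_ops_per_cycle)
    return ops_per_cycle / ops_per_instruction


def ipc_gain(frontend_before, frontend_after, backend, ops_per_instruction):
    """Fractional IPC change when only the front end gets faster."""
    before = sustained_ipc(frontend_before, backend, ops_per_instruction)
    after = sustained_ipc(frontend_after, backend, ops_per_instruction)
    return after / before - 1.0


if __name__ == "__main__":
    # Hypothetical numbers: the front end delivers 30% more ops per cycle,
    # but the back end can only sustain 2.3 ops per cycle.
    gain = ipc_gain(frontend_before=2.0, frontend_after=2.6,
                    backend=2.3, ops_per_instruction=1.2)
    print(f"front end +30%, IPC gain: {gain:.0%}")
```

    With these placeholders the front end improves by 30% but IPC rises by only about half that, because the back end becomes the limit. If the back end had plenty of slack, the full 30% would show up.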

  71. Are Operations per Clock cycle equal to Instructions per Clock cycle?

    I see a lot of comments mentioning a claimed IPC of 30%. I saw the comment of 30% increase on the 2nd slide, but it mentioned “Ops per cycle improvement”. Are they the same?

  72. I call it Fallroller, because it will continue the general decline of AMD’s CPUs. Haswell will arrive sooner than SR, and the gap in perf/power will stay the same or grow even wider.
    Negative comments help a responsible company navigate towards better products. But AMD has constantly ignored this feedback since the Phenom launch, and they are hurting themselves.

  73. It’s nice that AMD isn’t giving up, but it gets tiring to constantly be subject to the marketing hype over these things. After the Bulldozer disaster (how it was going to be the best thing ever, to find out that it wasn’t, followed by “No, no…it will be great in servers” to find out that it isn’t that great), Piledriver was rated as the next big thing that was going to put AMD on the map again…it didn’t…now it’s Steamroller, the savior…Please! Just stop with the marketing nonsense and work on the thing to provide the best possible performance at the best possible price, while getting closer to the competition, even if you don’t flat out beat it, which of course is something AMD hasn’t done for years now. At the high-end where performance is king, AMD is just destroyed by Intel…no questions asked.

  74. Man, that’s kinda sad. Ok, they probably know nothing about construction equipment but… to miss this is simply too sloppy on their part. Heck, I think AMD needs to fire a few more marketers. Just Google Steamroller and I’m sure a ton of pictures will pop up.

  75. [quote<]Good change, though still not enough for me to want one over Haswell yet.[/quote<] I think the main reason I want to get FX instead of Intel is because this architecture has received so much interest, speculation, analysis, discussion, and even arguments. It's probably one of the most interesting and controversial microarchitectures ever spawned. I wanna try it out even if Haswell turns out to be 2x faster. Heck, so what, I have 8 cores! (A bit of sarcasm on the last sentence there, though).

  76. The question was, why do you think SR will fail? That’s a valid response to your post, isn’t it? Also, it was followed by “Do you want SR to fail?” See, I didn’t say you wanted it to fail. I was asking if you want it to fail. Of course you don’t. Nobody does. But the thing is, bro, negative comments aren’t helping here.

  77. Yeah, good question. We heard a number of rumours about B3 but it seems nothing ever came of them…

  78. This is eight core, four module. Vishera is rev C0 of Bulldozer (Zambezi was rev B2) so it’s tweaks, not a radical redesign.

  79. Bulldozer thermals were related to overall power use, especially OC, not hotspots.

    Ivy bridge thermals are 100% a result of Intel using thermal paste (a very crappy paste) internally, instead of solder. Some review sites disassembled, swapped the paste for good paste, and achieved substantial heat reductions.

    I could see the power reductions: less space used would also mean less length on the transistors and connections, and possibly fewer transistors overall. I know hand drawing is important for critical areas, but I have heard that automated tools are used these days by both companies because of the sheer complexity and number of transistors; hand drawing the entire chip, by highly skilled engineers, would be time and cost prohibitive. Also, it is listed as a “High Density Library”, so this could be a special one that is more advanced than the ones previously used. Also, Bulldozer was a bit rushed.

  80. One other thing: Why was there no work done to make each of the integer clusters wider? You can feed them all you want but as long as it’s a much narrower pipe compared to SB, there’s no way it’s gonna catch up unless AMD cranks up the clocks. Intel’s scheduler and dispatch mechanisms are top notch, feeding all the compute units well, and shoving in HT to boot for those cases where two threads can use some of the compute units for each of their tasks. Clever, really. AMD, however, has a relatively simpler execution engine compared to Intel, and instead relying on clock speeds to compensate. Come on, AMD, give us more ALUs!

  81. I think they’re making the wrong tradeoffs. Intel has a core design that compromises neither server nor desktop workloads, and on top of that has top-notch power efficiency to gain access to laptops. I know Intel has more resources to pull it off, but AMD either needs to keep dancing or sit down and just watch Intel rake it in.

  82. So all the chips prior to the i486 are not real cores? And the i486SX doesn’t have a functioning FPU as well.

  83. There is an 8-module part right now in the form of the 16-core Opteron 62xx. You can buy it right now and it’s not that expensive. Unfortunately I think they top out at just 2.3GHz or something.

    Edit – Ok, I take it back. The Opteron 6276 (16-core, 2.3GHz) is a bit pricey at $900. Then again, it is a server part.


  84. Something not often mentioned about why Netburst was so bad, is because Intel used it for so long, and tried to get it to go so far, when it was just crap from the start. They had little performance gain over the years, and even had slower, hotter, more expensive chips with die shrinks (back when die shrinks mattered more). They made the design worse by making the pipeline absurdly long.

    AMD is improving upon an architecture that has lots of potential, and already has big improvements with PD for efficiency. Intel was trying to milk a garbage architecture for way too long, simply as a marketing technique for the GHz war, and sadly many bit, even though AMD was clearly better.

  85. That’s what I thought too. It’s supposedly due in Q3 or Q4 this year. And the next question would be: If Piledriver-based Vishera chips are delayed to next year, how would it affect plans for the Steamroller release? Will it be pushed back, too? And whatever happened to the B3 stepping of Bulldozer?

  86. Why do you think Steamroller will fail? Do you want it to fail? Because if it does, you can bet higher CPU prices will be on the way too.

  87. Yeah, I got burned by that “Twin-Engine” crap. How was I supposed to know they were twin-4’s!!?? Friggin’ 10mpg and no torque!

    That V8 I got from the other guy is the real thing, goshdarn fast as hell and mpg to boot!

  88. It’s almost like they need a server line of cpus with strong integer performance and a desktop cpu that’s more focused on balanced resources.

  89. [quote<]Although we have a few numbers now about improvements to the individual portions of Steamroller, we don't yet have any sense of the overall IPC gains or how those might combine with clock frequency increases to affect overall performance. For that, we'll probably have to wait a while yet[/quote<] So the keynote wasn't too revealing i see.

  90. Funny how you wrote all that just to strike them out and make it more difficult for us to read. 😀

  91. Man, this sure made my day today. It seems AMD is really serious about getting the fundamental Bulldozer architecture up to speed. I wonder what they have in the works AFTER Bulldozer (or should I say, Excavator?). Also, I imagine AMD wants folks to always wait for the next redesign to come out. We waited from August 2010 to October 2011 for Bulldozer, then AMD promised better things with Piledriver when Bulldozer came out. And now this. Looks like I’ll end up using my Phenom II longer than any other processor I’ve ever owned.

    I think giving each integer cluster its own decode circuitry pretty much puts a big question mark on the idea of sharing resources. AMD made a lot of noise trying to innovate with the front end, but I guess that came back to bite them. With the fetch circuitry the only thing shared between the cores now (the shared L2 and L3 isn’t really something Bulldozer pioneered), each core pretty much comes to its own. This also sheds some light on whether a BD module is really a dual core machine, which I still believe is a valid argument. It’s like two independent cores sharing the fetch and decode circuits. Look at the BD module die shot to see this. It is in no way hyperthreading. It’s a brute force way of cramming two cores in a module and making them share the front end, which only results in lost performance.


    [quote<]The instruction fetch optimization, which detects loops and handles them more efficiently, is a familiar trick. [/quote<] So will AMD use the micro-op cache similar to Sandy Bridge? …. [quote<]Streamlining the FPU's execution hardware might save power, as is being claimed here, but we worry about performance. [/quote<] Looks like AMD is still crazy about trading performance for die space/power. That could again come back to haunt them. …. [quote<]will be re-using some hardware in order "to save power and area, with no performance impact. [/quote<] After how badly Bulldozer fared and with AMD’s claims that sharing resources is the way to go, I’m not sure I’ll believe this before I see it. …. The focus of AMD’s Steamroller also appears to be improving IPC. The only time frequency was even mentioned was in the first slide, which only said they will maintain the frequency engine, as well as some slides later on saying they will maintain latency on some circuits. Looks like AMD has pretty much realized it had a Prescott in its hands and is doing damage control.

  92. [url<][/url<]

  93. All you guys are wrong, their next great cpu is the RickRoller.

    AMD's new slogan for it is [quote<] Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you [/quote<]

  94. No he was merely differentiating between Summerroller and Winterroller.

    Though personally I prefer the tastier Springroller.

  95. I would wager that most people would consider (since p55c or pentium pro) a core to consist of at least one fpu unit and one integer unit.

  96. Hmm, since AMD is sticking with the high-clock architecture, I wonder if we will see base-clocks north of 4.4 GHz and maybe turbos to 5 GHz? Reminds me of the article from Tom’s where IBM’s z196 CPU (also a long pipeline/high-clock/high-throughput architecture) powering their behemoth mainframes climbs to 5.5 GHz, but at very high wattage.

    [url<],16716.html[/url<] It's great to see that they are tackling some of the problems that plagued Bulldozer. However, I am more sceptical now than ever about how this will perform against Haswell. I read somewhere (can't remember where) that Haswell will be a FP monster and so far from the details we have now of Steamroller, there won't be any significant enhancements to FP performance. Sure, they might make it smaller physically and more power efficient, but what about making it faster? Yes, I know that AMD wants to eventually have their APUs go beyond the mobile space and mid-range desktops into the high-end arena of gaming desktops, workstations (already happening with Trinity FirePro APUs), and into servers (although, according to a roadmap from TR [url<],[/url<] server APUs will be coming in after Piledriver) so that the GPU portion can do much of the FP grunt; but those changes aren't coming in anytime soon. That is why I am concerned about their lack of real changes in FP performance. Overall, I like seeing these changes, but will they actually translate into real performance increases and will it be enough to go head-to-head with Haswell? Well, I don't think AMD is going to make this mistake again seeing as how the fallout fell after Bulldozer with promises of big performance gains over their previous CPUs.

  97. BD’s AVX integer performance [s<]is actually quite good[/s<] provides a respectable increase to BD's already strong integer performance, but the software for new instructions has typically lagged behind by a couple of years[s<], which is also a problem with FMAC which should help BD quite a lot.[/s<] BD's AVX fpu performance helps some, but can't nearly keep up with sandy bridge. The AVX units in BD are merely two already utilized execution units ganged together, while SB's AVX units are discrete, resulting in a much more impressive boost. This is also a case where HT really shines. [url=<]Benchmark Source[/url<] [url=<]Architecture Source[/url<]

  98. It’s stupid?

    Really though. The GPU is great at organized FP, but not every FP operation is so structured. They can cut back on the FPU even more without really sacrificing performance. SSE/AVX are better at in-line FP.

    And that’s not even the real problem. The real problem is that the GPU is still so very far away from the execution pipeline, even if it shares the same L2 cache as the CPU while residing on the same die. For large batch operations it’s unbeatable, but for what currently passes as FP workloads using the GPU is simply untenable.

  99. If the past is any indication, then yeah, expect price to go up, at least until Intel decided to compete, which they can. Even with these improvements, Intel still has a lot more flexibility in their manufacturing and pricing.

  100. It was pretty clear up until Bulldozer, for x86.

    And I actually do want to know- 8-core or 8-module?

    Because 8-module would be a special kind of cool.

  101. Is it rational to fear that, the moment AMD realizes they have a competitive lineup with intel (or stronger, even) that they won’t be as competitive on price anymore? That’s always been the big draw of AMD, especially at the low end, and I’d hate to see that be lost as soon as their performance crown is won

  102. Once again, there is no industry accepted standards as to what constitutes a “core”.

  103. I had thought that the FPU units were a stopgap solution until AMD is able to offload all FPU work to the GPU cores that will be integrated into the CPU.
    Whatever happened to that idea?

  104. CPUs have been working towards ‘workload specific’ logic for quite some time; a distant memory would be the x87 coprocessor loaded on 386 and 486 systems that enabled FP.

    And it’s still a balance, no matter how you slice it. The K7+ cores have had better raw FPU performance than any Intel part yet released, but fall behind when float is done with specialized logic circuitry like SSE, and now AVX, et al.

    I think Bulldozer’s real point (and this is one Intel acknowledges) is that raw FPU isn’t really needed. In any algorithm where you’ll need to do bulk float computing, you’ll be using SSE/AVX/OpenCL, literally anything you can get your hands on before you use x87.

    It’ll be a thing of beauty if they can pull this architecture off right. It’d be like delivering on the promises of Netburst.

  105. The dynamic L2 cache is interesting since the bulk of their chips don’t use L3 at all, Bobcat derivatives included.

    So maybe the L2 will be bigger, and L3 will become pointless?

    That could fix a lot of issues that have been plaguing their CPUs for years, like high L2 latency, low L3 clock speed, and an inability to use their “full power” chips for mobile applications of any sort.

  106. They claimed 30% energy reduction with the high-density library. To me that implies lower speeds.

    I guess they gave up on the 6GHz nuclear reactor and aimed for a lower-clocked, higher efficiency design this time; IPC goes up at the expense of high clock and turbo.

  107. To quote the article quoting the AMD guy:

    [quote<]Shown above is a portion of the chip's FPU. The top image comes from a current Bulldozer chip, which employs the hand-drawn custom logic that's generally used in high-end x86 CPUs. The lower image comes from a potential future chip that uses a more automated high-density cell library. On the same 32-nm process node, the high-density library purportedly crams the same logic into 30% less area, with 30% less power use. As the slide notes, gains on this order would usually come from the transition to a newer, smaller fabrication process. We'd expect the more automated approach to design to reduce AMD's time to market, as well.[/quote<]

    I'll chalk this up to a misunderstanding, but it is rare to have an automated cell placement library achieve substantially better results than a properly done hand-drawn layout. In fact, let me take it a step further and say that in critical sections of the chip you almost always go to hand-drawn layouts to beat auto-placement setups. Further, Bulldozer was notorious for being thrown together using very little hand-placement of components and the over-reliance on auto-layout tools is part of the reason why BD has easily the lowest transistor density of any 32nm CPU in the x86 line from Intel or AMD.

    So basically: I am flat out not buying what this guy is saying insofar as that statement and those slides are concerned. Additionally, with increased transistor density comes more potential for cross-talk, capacitive coupling, etc, with the biggest etc by far being that you are now pumping out just as much (or more) heat into a much smaller die area. We've already seen Bulldozer's not so spectacular thermals and remember what happened to Ivy Bridge too.

    I'm interested in seeing what Steamroller can do, but remember that there is a good chance Steamroller will be going up against, at best, a very mature Haswell that is going to be *tiny* compared to Steamroller's die. If Steamroller slips at all (and we've seen it happen before) then there's a good chance it will be taking on Broadwell... we'll see.

  108. Ugh. Stop saying ‘broken’ when referring to a product that is shipping every month in the millions of SKUs. I keep thinking you are talking about some product that has been cancelled, delayed, etc. because it can’t be made yet.

  109. [s<]I always laugh at that. A tech site editor can distinguish between two GPUs without a problem, but when it comes to heavy machinery, they never impress. Front Loader: [url<][/url<] Steam Roller: [url<][/url<] Tech Report has posted the incorrect heavy machinery for several of AMD's products. A simple Google search will provide the proper pictures.[/s<] *Chuckling at AMD*

  110. “However, the eight-core “Vishera” processor isn’t expected until next year.”
    -> That’s a lot later than us enthusiasts had hoped. The last rumours seemed to be for (probably late) October. Did this detail slip out at Hot Chips?

  111. No upgrade would mean it’s still not competitive with SB/IVB FPU when it comes to AVX though, right?

  112. where did you get 30% IPC improvements from? i don’t think AMD is claiming that much!

    I think overall it will be 15% better in performance per watt than Piledriver.

    I’m just hoping in the future these will morph into 8-core APU’s 😉

  113. I would think AMD learned its lesson from painting a blue sky and delivering gloom?
    Because this sounds just way too good to be true.

    Extrapolating from known facts, like Bulldozer at 1.1 volt using up to 30% less energy, compared to its 1.4v ‘overvolted’ default at stock clocks,
    if Steamroller gets 30% higher IPC, and a 28nm process, again on paper, AMD has a chip that is very exciting.

    I foresee late 2013 to be a geek delight 🙂
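    For a sense of why a voltage drop matters so much, the usual first-order rule is that dynamic (switching) power scales with capacitance times voltage squared times frequency. The Python sketch below applies only that rule to the 1.4v and 1.1v figures mentioned above. It ignores leakage and everything else on the chip, so it is an idealized estimate under those assumptions, not a prediction of measured power; the function name is made up for illustration.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """First-order dynamic power scaling, P ~ C * V^2 * f.

    Returns new power as a fraction of old power, assuming the switched
    capacitance is unchanged and ignoring static (leakage) power.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    ratio = dynamic_power_ratio(v_new=1.1, v_old=1.4)
    print(f"dynamic power at 1.1v vs 1.4v, same clock: {ratio:.2f}x")
```

    On this idealized rule the switching power at 1.1v is roughly 0.62 times what it is at 1.4v for the same clock. A measured whole-chip saving would normally be smaller, since leakage and uncore power do not follow the same curve.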

  114. Can it finally catch the 2nd generation Intel Cores ?
    or will it still be relegated to core2duo performance status ..

  115. If you search the rumors, you’ll find that the figure that’s been thrown around for the Haswell CPU performance improvement is 10%. Improving 30% on BD per clock, along with some clock speed improvements, should put AMD in a position to compete.

    Haswell’s “really big potential” is supposedly in graphics, where people are claiming it will be 3x as fast as the HD 4000. Will be interesting to see how that compares with the GCN-based graphics cores in Kabini.

  116. Mmm stuff like this makes me go ‘doki doki’.

    Makes me wonder if BD was really just a ‘lets get this product to market even if it’s a pile because we need to refresh our current lineup’ sort of thing. While people are still saying how awesome Phenoms are and they would be better off buying them, I don’t think the Phenom 2 could’ve lasted another four years till Steamroller got here.

    What would be worse, Bulldozer being a flop or Phenom 2s being on the market for close to six years with close to non-existent speed increases?

    I still think the frame time benchmarks would turn out better if four of the eight ‘virtual’ cores were disabled, so each unit is stand alone.

    I’m getting itchy on my i7-860 which I’ve had for close to three years… Waiting another year may be what I need. As long as the $/performance is there, I’ll happily feed the underdog my money to stay in the game.

  117. That’s not a picture of a steamroller. That’s a frontloader. A steamroller is the machine with the huge metal cylinder at the front. 😀

    On topic, it would be nice to see AMD compete again.

  118. I think the shared FPU is actually a clever solution. 99% of cases it can handle 2 operations at once anyways. It also gets us closer to workload specific cores, which would be cool imo.

  119. Probably not, but at least they’ll be beating their own chips from 2 years ago.

  120. This is my fear as well, haswell has some really big potential as well. We’ll have to see if this plus the die shrink will be enough.

  121. Blows it away in every dimension!

    Edit: Ah memory lane: [url<][/url<]

  122. Well, at least then they’ll only be one generation behind and not 2 or 3 like they are now.

  123. i’m not sure a 30% increase would put them competitive with what intel will have by that point. I mean, i’d REALLY like to see a strong amd, but i think 30% faster is still slower… here’s hoping i’m wrong.

  124. Certainly a step in the right direction, okay so the fetch is still shared but at least now each core will have its own decoder. This takes it away from being imo AMD’s Netburst.

    I still don’t like the shared FPU, but we can’t have everything, at least this should decisively beat out the old Phenoms finally in integer IPC, if not for floating point.

    Good change, though still not enough for me to want one over Haswell yet.

  125. I will purchase if “competitive”(not perfect but close) with Intel on price, performance, and performance/watt.

  126. This is awesome.

    It’s very rare to see a company directly address what’s widely known to be wrong with their products openly. Further, Bulldozer, in concept form, is essentially ‘hardware’ Hyper-threading on a high-clock/high-throughput design. It should be amazing if implemented properly, and if AMD accomplishes what they’re aiming for with Steamroller, it looks like they might just jump right back into the game.

  127. I don’t know about that one. FedEx is always late when I order a package from them. One time I ordered something on 3-day shipping and it took them 8 days (excluding the weekend, even) just to get it to me. Never had a single problem with USPS.

  128. I’ve updated the article with a clarification about the nature of the FPU changes we noted. AMD claims they shouldn’t reduce performance.

  129. Fallroller on the way. I don’t believe in their 30% IPC improvements. This is still broken bulldozer architecture.

  130. “not just the sever-class applications for which the original Bulldozer core was so obviously tuned.”
    “sever” needs to be “server”.
    It’s the 2nd paragraph below the second slide.

  131. Yawn..I remember similar hype before Bulldozer came out, and we all know what came of that.

  132. It literally won’t be soon enough, because Haswell will have launched. They’ll get up painfully close to IVB just as it’s retired.

  133. I’m sure they can deliver…but deliver like FedEx (on time, when they say they will) or like USPS (eeeehhhh…we’ll deliver….eventually)?

  134. This looks good so far, improvement of the front end and cache which were the 2 items people pointed out right away with BD. Hopefully there will be improvements in L3 as well.

  135. Wow – a 30% performance improvement combined with the efficiency improvements seen in Trinity would put AMD firmly back in the game. Combine it with some of the serious graphics power they have available to them, and they could mount a serious challenge to Intel again.

    Now let’s just see if they can deliver….