ARM Fellow takes the mic at AMD event

Attendees at the AMD Fusion Developer Summit here in Bellevue, Washington were treated to an unexpected keynote speaker today: Jem Davies, Fellow and Technology VP at ARM. Davies wasn’t there to announce a grand rapprochement between AMD and ARM. He did, however, provide some interesting commentary on the two companies’ rather similar vision for the future.

Davies’ main point seemed to be that, while Moore’s Law isn’t dead yet, its relevance is shrinking. Linear increases in CPU performance are a thing of the past, and heterogeneous computing is the only sensible way forward.

This point of view shouldn’t come as a shock to anyone, of course. CPU and GPU cores have been sharing die area in ARM-based system-on-a-chip devices for some time, and the same goes for the latest wave of processors from AMD and Intel. Davies’ illustration of the underlying causes and their future implications was interesting, though. These two slides from his presentation sum up the problem:

Increases in clock speed may turn out to be considerably smaller than expected as chipmakers move to 28 nm, 20 nm, and finer process nodes. As the second slide shows, finer processes will definitely allow more transistors per square millimeter. However, the amount of power used by each transistor won't drop dramatically. Davies said these constraints will mean some parts of the silicon will have to remain "dark" (powered down) so that future chips, with higher transistor counts, similar-sized dies, and similar or only slightly lower power consumption per transistor, don't outgrow current power envelopes.
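A back-of-the-envelope sketch (with assumed scaling factors, not figures from the slides) shows how quickly the usable fraction of a fixed-power die shrinks if transistor counts double each node while per-transistor power falls by only 30%:

```python
def dark_fraction(nodes, density_gain=2.0, power_scale=0.7):
    """Fraction of the die that must stay dark to hold total power flat.

    Per process node, transistor count multiplies by density_gain while
    power per transistor multiplies by power_scale (0.5 would be ideal
    Dennard scaling; 0.7 is an assumed, pessimistic figure).
    """
    full_power_growth = (density_gain * power_scale) ** nodes
    return 1.0 - 1.0 / full_power_growth

for n in range(1, 4):
    print(f"after {n} node(s): {dark_fraction(n):.0%} of the die must stay dark")
```

Under these assumptions, roughly 29% of the die must go dark after one node and about 64% after three.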

He added that heterogeneous computing will be the only effective way to make efficient use of such chips. Instead of a faster monolithic design, developers will face chips with different types of processors tailored for different workloads. Processing workloads will need to be moved to the particular area of the chip capable of executing that workload the most efficiently. (Before you start getting any ideas, no, Davies made it clear that he does not expect CPUs and GPUs to merge in the future. They may simply remain perpetual roommates.)
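As a rough illustration of that idea, a runtime could pick the most efficient unit per workload from a cost table. The unit names and energy figures below are entirely invented for illustration:

```python
# Hypothetical energy cost (nanojoules per operation) of each compute
# unit for each workload class. All numbers are made up.
ENERGY_PER_OP = {
    "branchy_scalar": {"cpu": 1.0, "gpu": 9.0},
    "data_parallel":  {"cpu": 4.0, "gpu": 0.5},
    "media_decode":   {"cpu": 6.0, "gpu": 1.5},
}

def dispatch(workload):
    """Route a workload class to the unit that runs it most efficiently."""
    costs = ENERGY_PER_OP[workload]
    return min(costs, key=costs.get)

print(dispatch("branchy_scalar"))  # cpu
print(dispatch("data_parallel"))   # gpu
```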

The way forward is clear, then, at least from the standpoint of companies like ARM and AMD. Things will get trickier for developers, though. Davies didn't lay out a clear solution; he admitted that abstracting away the extra complexity of heterogeneous processors will be a must if developers are to take advantage of them, but he cautioned against wantonly modifying standards or creating new ones. In the ensuing question-and-answer session, though, Davies said of OpenCL, "I'm completely convinced."

Comments closed
    • dmjifn
    • 11 years ago

    [quote<]In any case heterogeneous architectures are not a scalable solution, and they're hard to develop complex applications for.[/quote<] Or you can look at this as a macro-level version of ALU vs. FPU. I don't disagree that specialized components complicate things, or that homogeneous architectures are easier to scale. But figuring out how to develop for FPUs or SIMD went pretty well - I wouldn't be surprised if static analysis could take care of a lot of this, to the point that it's just yet another compiler issue. By "moving" workloads, they may just mean "retarget", not "redirect at run time".

    • c0d1f1ed
    • 11 years ago

    Jem Davies: “Processing workloads will need to be moved to the particular area of the chip capable of executing that workload the most efficiently.”

    Moving data around also consumes power, and the latency overhead can kill performance. So effective performance/Watt may not be higher at all. In any case heterogeneous architectures are not a scalable solution, and they’re hard to develop complex applications for.

    Instead, we need each core to have a well balanced ISA, with possibly some task specific instructions. The most valuable instructions are those which are still fairly generic. Vector gather and FMA are great examples. x86 is an ugly duckling but the reality is that it’s quite efficient at many workloads. AVX2 expands on this success.

    The real challenge going forward is to find ways to lower the power consumption of control logic. With AVX this could be achieved by executing 1024-bit instructions as four sequenced 256-bit operations. The sequencing logic could be quite tiny, not unlike how GPUs process wide vectors on narrower ALUs.
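The commenter's sequencing idea can be mimicked with a toy model; this is just a sketch of splitting one 1024-bit operation (32 lanes of 32 bits) into four 256-bit (8-lane) micro-ops, not a model of real AVX hardware:

```python
LANES_WIDE = 32  # 1024-bit vector as 32-bit lanes
LANES_EXEC = 8   # 256-bit execution width

def wide_add(a, b):
    """Execute a 1024-bit add as four sequenced 256-bit operations."""
    assert len(a) == len(b) == LANES_WIDE
    result = []
    for start in range(0, LANES_WIDE, LANES_EXEC):  # four micro-ops
        result.extend(x + y for x, y in
                      zip(a[start:start + LANES_EXEC],
                          b[start:start + LANES_EXEC]))
    return result

print(wide_add(list(range(32)), [1] * 32))  # each lane i becomes i + 1
```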

    • gbcrush
    • 11 years ago

    Really. I just think it’s awesome that the guy included “Area ^ -1” as part of his slides.

    • stmok
    • 11 years ago

    At the AMD Fusion Developer Summit…

    Trinity (Llano’s replacement in 2012)
    Apparently => 2nd generation Bulldozer (dual/quad-core) + Cut-down version of Radeon HD 6900 class IGP?

    Llano = 2011 = Sets the stage for mainstream APU.
    Trinity = 2012 = Lights the afterburners?

    It won’t be using the expected VLIW5-based GPU; instead it’ll be VLIW4.

    • lilbuddhaman
    • 11 years ago

    Every Apple story involves many many negatives.

    • lilbuddhaman
    • 11 years ago

    Spammer. Bot.

    • odizzido
    • 11 years ago

    I am really interested in what they finally come up with to keep things going.

    • ronch
    • 11 years ago

    I just did you a favor. Thank me.

    • jimmy900623
    • 11 years ago

    Duke Nuked – SPAM

    • Arag0n
    • 11 years ago

    Agree, but I don’t see how that can improve the power efficiency of the systems…. it would be bothersome once we go down to 2 or 3 nm and need to layer, because we can’t have designs less than 5 atoms wide…. It would help scale up the complexity of chips but not the power efficiency…. dunno what we will do by then. Specialized parts that turn on/off, maybe… but I think there is also a limit to the number of specialized parts one chip can have… so I think we will definitely require a step up in number of cores and a step down in frequency… but some applications are damned hard to thread, or simply impossible, at least right now.

    I was working on a project that required some image processing: do some processing to a picture, then apply another filter, then check for some things, and then apply a final filter.

    While it’s possible to thread each filter, it’s impossible to thread the overall process. Step 1 must be done before Step 2, etc., and for a given picture size there is a limit to the number of processors that can help speed things up in a threads-vs-MHz scenario…

    So honestly, I feel we are starting to reach the ceiling of what today’s silicon can do… we need something different with higher frequency limits, and we can use the time in between to develop multicore techniques, APIs, and compilers. I’m pretty sure some threading could be done at the compiler level, especially for the kind of “for” loops that work on different data and are totally unrelated, without requiring OpenMP…
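The split described above (stages strictly in order, but data-parallel within each stage) can be sketched like this; the two filters are hypothetical stand-ins, and threads are used purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(row):            # hypothetical stage 1: add 10 to each pixel
    return [min(255, p + 10) for p in row]

def threshold(row):           # hypothetical stage 2: binarize at 128
    return [255 if p > 128 else 0 for p in row]

def run_pipeline(image, stages, workers=4):
    """Run stages in order; within a stage, rows are processed in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            # pool.map acts as a barrier: stage N finishes before N+1 starts
            image = list(pool.map(stage, image))
    return image

image = [[100, 200, 50], [120, 130, 140]]
print(run_pipeline(image, [brighten, threshold]))
```

The per-stage parallelism tops out at the number of rows, which is the commenter's point: the sequential structure between stages caps the speedup regardless of core count.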

    • Arag0n
    • 11 years ago

    And I think that’s the reason CPUs are stepping up from dual to quad channel… in the end I think we should expect bandwidths over 200 GB/s instead of today’s 20-30 GB/s…

    Anyways, Llano definitely isn’t designed for you; Llano is designed for casual gamers without hardcore requirements like yours…

    And next, Llano would be awesome in a world where we can harvest GPU computing power for general processing. Think about total computing capability instead of x86 single-thread performance, and Llano trashes any i7…

    • TaBoVilla
    • 11 years ago

    sure is =) ..I need someone to thumb down my post so the circle is complete =P

    • TaBoVilla
    • 11 years ago

    I thought this was serious, until I read the second para

    • mutarasector
    • 11 years ago

    [quote<]And if we see the introduction of layered designs, this could keep Moore's law applicable to processors for another decade or two.[/quote<] SoC POPs are coming, thanks to WideIO...

    • willyolio
    • 11 years ago

    i think there’s still room for 3D (or, at least, layered) processors on silicon.

    • ronch
    • 11 years ago

    It’s hard to imagine that we can keep shrinking transistors the way we (or they) do now. Someday, we’re gonna hit a wall. It’s not a question of if, but when. The laws of physics can’t be broken. We can look for new materials, but right now we don’t see any of those new-fangled quantum research hitting shelves. And they’ve been reported many years ago. ‘Course I could be wrong and we’d live to see the day of Cray-level performance in our wristwatch.

    • ronch
    • 11 years ago

    Earth calling OneArmedScissor… Earth calling OneArmedScissor… please respond… can you hear us… buzz.. crackle… please respo…

    • willyolio
    • 11 years ago

    it’s quite red, for sure.

    maybe we should take it more as a mass-effect style, Paragon vs Renegade.

    • ronch
    • 11 years ago

    Yeah. It’s fun thumbing down posts, isn’t it? 🙂

    • ronch
    • 11 years ago

    Good to see AMD has come a really long way since it started out simply cloning Intel processors. Although it’s still small compared to Intel, people should realize that competing with Intel is terribly difficult, and AMD has succeeded where so many others have failed. Now it’s good to see AMD filling in more pieces of the puzzle, such as chipsets, graphics, and servers. Moving forward, they should start thinking about living outside Intel’s shadow and exploring new horizons, pursuing those goals continuously without leaving x86. Although they’re denying it at this point, it looks like AMD and ARM are, or will be, cooking something up soon.

    • DeadOfKnight
    • 11 years ago

    I think they should just change it to a facebook approach. You either like it or you leave it alone.

    • BobbinThreadbare
    • 11 years ago

    Everyone in here has a negative score from voting.

    I love it.

    • MethylONE
    • 11 years ago

    Quantum electronics is what makes a semiconductor in the first place. I don’t think the ‘transistor’ design is going anywhere anytime soon, just changing elements and sizes.

    • Skrying
    • 11 years ago

    I bet you thought this was awesome.

    • NeelyCam
    • 11 years ago

    I thought you believe clock frequencies beyond 4GHz don’t exist.

    • NeelyCam
    • 11 years ago

    [quote<]Prescott's leakage issues was the clarion's call to the entire semiconductor industry. [/quote<] It wasn't leakage; it was the blind "up the clock for performance at any cost" approach that caused the power consumption to blow up. If there were something inherently power-inefficient in scaled-down transistors, we wouldn't have the continuous power-efficiency improvements we've seen in the past couple of years. Overall, you sound like the guys who used to say "CMOS [b<]can't[/b<] scale beyond 1um".

    • sschaem
    • 11 years ago

    I don’t think I will see this transition in my lifetime.
    (And no, I’m not 65 years old and about to die as I should according to Social Security computations.)

    Optical interconnects, yes…

    Genetic, bio-chemical, quantum… not in traditional programmable computers.
    Maybe in some AI function, but don’t expect the xbox 365 to include a Direct GenX api for AI that’s based on a bio-chemical chip.

    My prediction is that over the next decade we will manufacture traditional transistors in sick & massive quantities.
    Mostly destined for storage, but still. Unless the solar industry uses up all the sand in the world!

    • sschaem
    • 11 years ago

    Moore’s law wasn’t stated at the level of a single transistor, but of a collection of transistors.
    This can take the form of larger dies, or higher density in any dimension, while keeping the same price constraint.

    So I agree: if the new material is 10x more costly but only improves density by 2x, it doesn’t follow expectations.

    • DeadOfKnight
    • 11 years ago

    New materials could also be extremely expensive. And die stacking is not an example of Moore’s law. With Moore’s law everything gets smaller and cheaper. At worst, when Moore’s law ends, all you can do to improve performance is make chips larger and more expensive. At best, they partially mitigate this problem with efficiently optimized architectures and manufacturing processes that allow for greater yields of usable silicon.

    • sschaem
    • 11 years ago

    If you do the math, we are not that far off.

    2,300 transistors in 1971 in the 4004.
    1.17 billion transistors in 2010 in Westmere-EP.

    Moore’s law indicates that if we start with 2,300 in 1971, we would have chips with 1.1 billion transistors by 2008.

    He was off by only 5%, not a bad prediction for one that spans 40 years!
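The arithmetic above roughly checks out; a quick computation of the doubling period implied by the comment's own figures:

```python
import math

t0, n0 = 1971, 2300      # Intel 4004
t1, n1 = 2010, 1.17e9    # Westmere-EP, per the comment

doublings = math.log2(n1 / n0)   # about 19 doublings
period = (t1 - t0) / doublings   # years per doubling
print(f"{doublings:.1f} doublings, one every {period:.2f} years")
```

That comes out to roughly one doubling every 2.06 years, within a few percent of the canonical two-year cadence.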

    And if we see the introduction of layered designs, this could keep Moore’s law applicable to processors for another decade or two.
    He would then have been right for over half a century.

    BTW, he himself said that there is a point of saturation… Personally, I think we are far from it for transistor density.
    mm^2 will become mm^3 and literally open a new dimension to the process.

    • OneArmedScissor
    • 11 years ago

    I’m not a professional chip designer, but I play one on the internets, and I’m here to tell you that this is a lie perpetrated by two uncompetitive companies, who have run out of ideas and are colluding to manipulate the average joe, who does not understand technology. This is anti-competitive!

    I know this because I met this one guy (IRL) who thinks his new Droid is cool because he can talk for a few hours before the battery dies - except what he didn't stop to think about is what it would be like if his Droid were overclocked to 10 GHz, with [i<]ten times the single threaded performance.[/i<] Why talk for a few hours, when you could talk [i<]ten times as fast[/i<], and turn the phone off faster? I told him this, and he just stared at me blankly, because he couldn't even understand.

    That's what CPUs would do if they weren't wasting heat putting all these "specialized" parts in. Turbo boost would be limitless if we just had two Pentium 4 cores. First it was GPUs in every CPU, stealing bandwidth from my 2 GHz DDR3 and limiting my 4.5 GHz OC. I could have written this post twice as fast!

    What next? I read an article saying there are going to be octal core CPUs that don't use real cores, but they cost more! That doesn't even make sense! How could anyone expect to sell something that's worse for more? But Intel is going to play their game, anyways, because they can do the same thing at 6 GHz with 10nm 4D CPUs in 2018. It's over, so what now, beyotches?

    • Krogoth
    • 11 years ago

    Mr. Moore’s “observation” was already invalidated with Prescott.

    Transistor budgets aren’t doubling at the same rate as they did back in the pre-Prescott era.

    • Krogoth
    • 11 years ago

    Semiconductor-based computing is on its way out.

    Silicon is the first victim; moving to another material will only delay the inevitable. It always brings its own set of potential issues (cost, viability for mass production, unknown quirks/issues, etc.).

    The entire microprocessor and transistor boom is following exponential growth nicely. Physical limitations will always curb it. Prescott’s leakage issues were the clarion call to the entire semiconductor industry.

    In the end, this will help spark more interest in other potentially viable computing models (quantum, optical, genetic/bio-chemical etc.)

    • sschaem
    • 11 years ago

    I can imagine the introduction of new materials that nearly eliminate resistance to electron flow, allowing layered transistors.
    Starting with 2 layers and doubling every 18 months to please Mr. Moore…

    • DeadOfKnight
    • 11 years ago

    Slightly off-topic, but I honestly think Intel’s 22nm 3D tri-gate process will be the last one of any real significance.
