AMD’s heterogeneous queuing aims to make CPU, GPU more equal partners

Slowly but surely, details about AMD’s Heterogeneous Systems Architecture are trickling out. In April, we learned about hUMA, which will allow the Kaveri APU’s CPU and GPU components to access each other’s memory. Today, we can tell you about hQ. Otherwise known as heterogeneous queuing, hQ defines how work is distributed to the processor’s CPU and GPU.

The current queuing model allows applications to generate work for the CPU. The CPU can generate work for itself, too, and it does so efficiently. However, passing tasks to the GPU requires going through the OS, adding latency. The GPU is also a second-class citizen in this relationship; it can’t generate work for itself or for the CPU.

Heterogeneous queuing aims to make the CPU and GPU equal partners. It allows both components to generate tasks for themselves and for each other. Work is packaged using a standard packet format that will be supported by all HSA-compatible hardware, so there’s no need for software to use vendor-specific code. Applications can put packets directly into the task queues accessed by the hardware. Each application can have multiple task queues, and a virtualization layer allows HSA hardware to see all the queues.
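
For the programmer-minded, here is a rough sketch of what user-mode queuing looks like in principle. To be clear, this is purely illustrative: the structure names and fields below are our own stand-ins, not AMD’s API or the HSA packet format. The idea is simply a ring buffer of fixed-size packets living in memory both processors can see, plus a doorbell that tells the hardware new work has arrived.

```cpp
// Illustrative only -- not AMD's API or the HSA packet format.
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical 64-byte dispatch packet in a vendor-neutral layout.
struct DispatchPacket {
    uint16_t header;         // packet type, barrier/fence bits
    uint16_t setup;          // e.g. number of grid dimensions used
    uint32_t grid_size[3];   // how many work items to launch
    uint64_t kernel_object;  // pointer to finalized kernel code
    uint64_t kernarg;        // pointer to kernel arguments in shared memory
    uint8_t  reserved[32];   // pad the packet out to 64 bytes
};

// A user-mode queue: a ring buffer both processors can see, plus a doorbell.
struct UserQueue {
    DispatchPacket*       ring;       // ring buffer of packet slots
    uint32_t              size;       // power-of-two number of slots
    std::atomic<uint64_t> write_idx;  // bumped by whoever produces work
    std::atomic<uint64_t> doorbell;   // polled/monitored by the consumer
};

// Either processor claims a slot, copies the packet in, and rings the bell.
// No system call, no driver round trip.
void enqueue(UserQueue& q, const DispatchPacket& pkt) {
    uint64_t slot = q.write_idx.fetch_add(1, std::memory_order_relaxed);
    std::memcpy(&q.ring[slot % q.size], &pkt, sizeof(pkt));
    q.doorbell.store(slot, std::memory_order_release);  // signal new work
}
```

Because the queue lives in shared memory and the doorbell is just another memory write, either the CPU or the GPU can enqueue work without a round trip through the OS, which is exactly the latency hQ is trying to remove.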

AMD’s current hQ implementation uses hardware-based scheduling to manage how the CPU and GPU access those queues. However, that approach may not be required by the final HSA specification. Although hQ is definitely part of the spec, AMD says the OS could get involved in switching the CPU and GPU between the various task queues.

At least initially, Windows will be the only operating system with hQ support. AMD is working with Linux providers, and it’s also looking into supporting other operating systems.

Comments closed
    • Wirko
    • 7 years ago

    I’m pretty confused by the concept of task queues as it’s presented here. Are the queues part of the hardware, or part of the task scheduler in the OS? And how is the CPU supposed to generate tasks for itself? That sounds contrary to the CPU executing OS and application code in memory. Thanks for any explanation.

    • TwoEars
    • 7 years ago

    The more I read about this the more interesting it seems.

    I knew about Kaveri before, but I thought it was just about unified memory, not changing the entire instruction set structure! That’s a pretty big move! If this really happens it will be one of the biggest changes in computer graphics since… oh I don’t know… a really long time!

    I can’t help but wonder if it will be enough though. Intel will have Broadwell, which is 14nm and perfect for laptops since it will be super power efficient, even more so than Haswell already is. Will Kaveri have the power-saving features to compete with Intel in the laptop market? I certainly hope so!

    If Kaveri is brilliant on paper but can’t deliver in the real world in terms of performance/watt and price/performance, AMD will still be in trouble.

    But this certainly is interesting tech! I’ll be watching it much more closely now than I was before!

      • BlondIndian
      • 7 years ago

      I agree with your point about AMD needing to deliver on perf/watt to succeed in laptops.
      More importantly, there need to be design wins.
      I looked for a Trinity laptop in my market, online and retail (in India). The only ones available had dGPUs. What’s the use of an APU if a dGPU is present? I would rather they gave me a 900p or 1080p IPS panel for the same price.

    • chuckula
    • 7 years ago

    Ok… after a little more analysis… AMD is trying to turn the GPU into a coprocessor where binary machine code can get dispatched to the GPU in a similar manner to how it already gets dispatched to the CPU.

    That has both advantages and disadvantages compared to how things happen today according to their first slide. The big advantage is that you can gain access to the underlying GPU hardware more directly.. yay, no massive driver layer. If this works up to its potential then you can execute a compiled program that runs using both the CPU and GPU to execute different instruction paths.

    The disadvantage is this: the GPU now has to have a compatible ISA for your compiled code in pretty much the same way that x86 boxes need to have a compatible binary ISA or else your compiled program isn’t going to run.. just like if you try to run a compiled x86 binary on an ARM chip or vice-versa. In the past, the GPU micro-architectures have *not* been compatible across vendors and aren’t even compatible between different releases from the same vendor (see the complete lack of Mantle support on AMD’s GPUs from 2011 as one clear example of this trend).

    So basically, you either get locked into one vendor’s proprietary GPU ISA… OR… there needs to be a just-in-time compiler present that turns source code into binary machine code on the fly when you run a program, in much the same way that a huge chunk of your current graphics driver is actually a rather complex compiler that crunches shader programs into proprietary-format binary blobs. Nothing in AMD’s presentation implies that the requirement for the driver to be present for doing this compilation is going away, just that they have a hardware setup to dispatch binary commands after the compilation process is completed.

    This announcement + Mantle says one major thing: AMD is sticking with GCN for a *very long time* so get used to it.. for better and for worse.

      • madtronik
      • 7 years ago

      You are very, very wrong. There is absolutely no ISA lock-in nor compiler overhead. That’s what HSAIL is for. There will be a compiler that generates HSA bytecode that will be JITted to binary by a finalizer.

      From what I’ve read the compiler and a lot of the software will be open source. A lot of companies will reuse it, so the software development burden is lessened a lot. If you have an exotic ISA your main job is writing the finalizer for your architecture.

      The overall HSA idea is VERY well thought out. For a small technology company it will be very easy and cheap to jump on the bandwagon.

        • chuckula
        • 7 years ago

        [quote<]You are very, very wrong. There is absolutely no ISA lock-in nor compiler overhead. [/quote<]
        Skip a nice sentence that uses another buzzword...
        [quote<]There will be a compiler that generates HSA bytecode that will be JITted to binary by a finalizer.[/quote<]
        So no compiler overhead at all... except for the JIT compiler... that is required because of the incompatible underlying ISAs... So you go out of your way to call me wrong only to delve directly into a series of statements that confirm my original post was in fact correct. This is getting embarrassing.

          • madtronik
          • 7 years ago

          You talked about

          [quote<] in much the same way that a huge chunk of your current graphics driver is actually a rather complex compiler that crunches shader programs into proprietary-format binary blob [/quote<]
          The HSA finalizer is quite simple and fast. Most of the work is done by the compiler that generates the bytecode. Current graphics drivers do the entire process, from the initial code to binary. HSAIL is designed so it will be easy and fast to get the final code from it.
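
          (For illustration only: a toy version of the split described here, with made-up opcodes rather than real HSAIL. The point is that the finalizer is a thin, mostly one-to-one translation done per target, while the heavy lifting happens in the front-end compiler that emitted the portable IR.)

```cpp
// Toy only: made-up portable IR and made-up "native" opcodes, not real HSAIL.
#include <cstdint>
#include <iostream>
#include <vector>

enum class PortableOp : uint8_t { Load, Add, Store };  // stand-in for portable bytecode
struct NativeInsn { uint32_t opcode; };                // stand-in for a GPU's own ISA

// The "finalizer": a thin, mostly one-to-one translation to the target ISA,
// cheap compared to compiling C++ or shader source from scratch.
std::vector<NativeInsn> finalize(const std::vector<PortableOp>& ir) {
    std::vector<NativeInsn> native;
    for (PortableOp op : ir) {
        switch (op) {
            case PortableOp::Load:  native.push_back({0x10}); break;
            case PortableOp::Add:   native.push_back({0x21}); break;
            case PortableOp::Store: native.push_back({0x30}); break;
        }
    }
    return native;
}

int main() {
    // The heavy lifting (parsing, optimizing) already happened in the
    // front-end compiler that produced this portable sequence.
    std::vector<PortableOp> ir = {PortableOp::Load, PortableOp::Add, PortableOp::Store};
    std::cout << "finalized " << finalize(ir).size() << " native instructions\n";
}
```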

            • chuckula
            • 7 years ago

            [quote<]The HSA finalizer is quite simple and fast.[/quote<]
            Sounds more like an assembler than a full-blown compiler then... but who does the optimizations? Or are there no real optimizations?
            [quote<] Most of the work is done by the compiler that generates the bytecode.[/quote<]
            Once again, if this actually is platform agnostic, where do the optimizations to the code take place? You can do some generic optimizations in a platform-agnostic way, but to really wring out the performance there's got to be some program that has knowledge of the low-level architecture, and from your own description the finalizer isn't that program.

            • madtronik
            • 7 years ago

            Here you have the full details:

            [url<]http://www.slideshare.net/hsafoundation/hsail-final-11junepptx[/url<]

        • chuckula
        • 7 years ago

        [quote<]There is absolutely no ISA lock-in [/quote<] Good, then show me the implementation that works with AMD's VLIW4 GPU architecture. Just post a link.

          • pTmd
          • 7 years ago

          There is no ISA lock-in, yeah, and everyone who jumps on board can build their own finalizers for their underlying hardware. But the assumption is that the hardware complies with the specification, otherwise what’s the point of having a standard?

          Lack of support from a particular hardware and ISA lock-in of a particular system architecture are two different things. Period.

      • BlondIndian
      • 7 years ago

      yawnn ….
      Yet another speculative post by chuckula with a lot of “facts”, their inferences and conclusions.
      Every time AMD releases a new slide, chuckula comes up with a line of reasoning (based on random “facts” he pulls out of his a&&) that puts AMD in a bad light in a subtle way (not really. Everyone knows chuckula is the resident Intel fanboy).

      I really tried reading this post, but it just doesn’t make sense. Have you heard of HSA before? Do you have specifics of the communication protocol between modules? If so, point out some sources. We are all interested.

        • chuckula
        • 7 years ago

        [quote<]Yet another Speculative post by chuckula with a lot of "facts" , their inferences and conclusions .[/quote<]
        Fascinating, so please tell me again how binary code magically executes using ISA-incompatible hardware. Since that's my main statement, and it's obviously "speculative" in nature, you can easily prove it wrong... if you even know what an ISA is....

          • BlondIndian
          • 7 years ago

          Hey, unless you give me some source for the fact that binary “machine” code is dispatched to the GPU, you are speculating.
          I can expand ISA and know the difference between RISC/CISC, micro-ops in x86, etc. Still, I’d say I am an amateur in this field. That doesn’t mean I have to listen to BS from you :/

            • chuckula
            • 7 years ago

            [quote<]Hey , unless you give me some source for the fact that binary "machine" code is dispatched to the GPU , you are speculating .[/quote<]
            Sigh.. did you even bother to read madtronik's own link here: [url<]http://www.slideshare.net/hsafoundation/hsail-final-11junepptx[/url<]
            If that's not enough for you, how about this 298 page long reference guide: [url<]http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf[/url<]
            You'll see the description of low-level instructions that GCN actually executes to get anything done starting on page 2-6 and it goes on from there.. in particular I recommend you memorize all the opcodes starting in chapter 11.. of the GCN ISA. This may come as a shock to you but processors... yes even "magical" GPUs... execute these things called "instructions"... EVEN AMD ONES! OMG!
            Madtronik and I are having a technical discussion in which we disagree on certain points. You on the other hand think that GPUs run on puppies and rainbows as long as they are vaguely associated with AMD. I don't really care how many up-thumbs your sockpuppet accounts give you, you are still an idiot.

        • Bensam123
        • 7 years ago

        It always seems to come down to the one irrevocable, undeniable truth at the end… the source… AMD sucks. I mean, who doesn’t know this?

        Based on what I’ve seen, though, it’s not really that he’s an Intel fanboi so much as that he really hates on AMD. Posts like this seem to pertain to anything positive that has to do with AMD. So he’s sorta like an inverse fanboi, or just an AMD hater.

          • chuckula
          • 7 years ago

          Hey Bensam… I saw that thread where you got schooled when you started spewing crap against mdrejhon, a guy who’s obviously a whole lot smarter than you are.

          The good news is that you gave yourself some sage advice in that thread, and I invite you to take it:
          [url=https://techreport.com/blog/25542/a-few-thoughts-on-nvidia-g-sync?post=769526<][quote<]Stout (sic) spouting your half assed hypotheses around the comments as fact.[/quote<][/url<] -- Bensam123: The Self-Help edition.

      • anubis44
      • 7 years ago

      “Ok… after a little more analysis… AMD is trying to turn the GPU into a coprocessor where binary machine code can get dispatched to the GPU in a similar manner to how it already gets dispatched to the CPU.”

      Shades of the Commodore Amiga. The Amiga had a CPU, a graphics co-processor chip and sound co-processor chip. So we’re FINALLY about to get the technology of the 1985 Amiga on the PC. Who says the best technologies don’t always win out!

        • chuckula
        • 7 years ago

        The Amiga was a fascinating machine for its time. In the long run, the thing that doomed it (aside from Commodore’s complete mismanagement) was that PCs finally started to catch up, using brute force to match and eventually surpass what the Amigas did with some very clever engineering and integration. Having chips that were perfectly tuned to the NTSC frequencies for video editing work was a brilliant play.

        • sschaem
        • 7 years ago

        “Back to the future”

        It also had a unified memory architecture and fully leveraged DMA.

        Far from perfect, but this architecture did so much with so little.

    • Aliasundercover
    • 7 years ago

    Security will be the hard part.

    You can’t just shovel bits at present GPUs unless you trust the source. Given the wrong thing they lock up entirely. CPUs together with modern operating systems have a great deal of support for confining errors so an errant or even malign program can only mess its own bed but not the whole computer. GPUs don’t have this, instead depending on trusted programs like drivers.

    This is a lovely plan for getting the driver out of the loop. Does it really include giving GPUs the kind of process and memory isolation CPUs have so it can actually work with untrusted applications? All that virtual memory, TLB, page fault, trap and restart stuff took a long time developing and represents a big part of modern CPUs. Will a GPU still be a GPU if it does these things?

      • chuckula
      • 7 years ago

      [quote<]Security will be the hard part.[/quote<] Considering all the potential security holes that already exist in GPU drivers that are intentionally isolated from main memory, you just hit the nail on the head when it comes to the potential exploits we might end up seeing.

      • UnfriendlyFire
      • 7 years ago

      Anyone remember this? [url<]http://it.slashdot.org/story/10/09/27/1422205/malware-running-on-graphics-cards[/url<]
      Anti-virus companies, MS, Apple, and the Linux developers will have to address potential GPU-assisted malware. Or even malware that only resides in the GPU's VRAM to evade detection.

    • Bensam123
    • 7 years ago

    Neat… This sounds great. Hopefully we’ll get to see some products based off this soon. AMD may not have the performance crown as far as CPUs go, but they’re definitely changing things up in big ways. I can’t wait to see how things change in the next year or so!

    • Geonerd
    • 7 years ago

    I’m still rather skeptical that the GPU will have enough memory bandwidth available. If you look at the GPU applications we’re most familiar with, various distributed projects, the bandwidth available to the GPU usually has a huge impact on performance. If both the CPU and GPU are sucking through a 128 bit DDR3 straw, it seems there will be times they go thirsty. It all depends on what sort of apps AMD has in mind. A quick Photochop filter may do OK running out of CPU cache, or maybe not? Despite all the hype over HSA, much remains to be demonstrated.

    Can the HSA memory sharing principles be extended to apply to discrete GPUs? For ‘big’ GPU jobs, the PCI-E latency penalty and increased complexity would be nicely offset if the GPU could use its local VRAM to full effect. As a modest enthusiast who will be running a video card for the foreseeable future (again, an APU’s bandwidth just isn’t there yet), a tiered CPU > APU > discrete GPU hierarchy sounds intriguing. Or just do away with the APU (a waste of transistors at the moment, IMO) for now and give me a ‘big core’ CPU closely coupled via HSA tech to a discrete card.

      • bcronce
      • 7 years ago

      You don’t need lots of memory bandwidth for much of the kind of calculations they want to offload. A long while back, when T&L was just getting popular, someone asked what kind of load that data was putting on the GPU’s memory.

      10KB of static data was consuming about 30% of the GPU’s total bandwidth, but that 10KB was static, so it was easily cacheable in the GPU.

      There are some popular cases where the datasets are small but require A LOT of computational power. These kinds of workloads will move lots of data back and forth between the GPU and CPU, but will rarely touch main memory.

      This is why APUs are important compared to discrete GPUs. Small amounts of data needing low latency communications and high throughput.

      A great example is tessellation. It is faster to computationally increase the poly count by orders of magnitude in real time than it is to store the poly data in memory (not even worth caching) and constantly read it.

    • TwoEars
    • 7 years ago

    Will this change anything at the coding level?

    Because if it means more work for programmers, it’s pretty much a death sentence right there.

    But if, on the other hand, it gives the application more direct access to the GPU, akin to something like Mantle, and needs little extra effort, we could be on to something.

      • anotherengineer
      • 7 years ago

      Why is that?

      Is it because it is a ton of extra work?
      It’s just easier to stick with the de facto standard?
      Is it because programmers don’t like extra work or new different work?

      I don’t know what the reason is or why software development has been so slow compared to hardware development. I still see quite a few programs that run on a single thread, and/or are 32-bit when 64-bit would be beneficial (games, etc.)

      It’s too bad there wasn’t more focus on developing real good efficient software.

      • bcronce
      • 7 years ago

      All of their changes will overall make for less work. Some of the really cool things they will support are context switching, full protected mode, and full C++ support.

      This helps in several ways:

      1) Preemptive context switching means you don’t need to break work up into small blocks to make it play well with multi-tasking; it will all be handled by the scheduler.

      2a) Full protected mode means the application can play directly with the GPU and memory without having to go through system calls for everything, because the hardware will maintain permissions. This also allows the CPU to use pointers from the GPU and vice versa, because they both play in the exact same virtual memory.

      2b) The GPU also supports page faults now, which allows it to interrupt and notify the OS to page in data from swap. This also means the GPU can support data sets that are not only larger than the GPU memory, but also larger than the system memory, without having to do strange streaming setups.

      3) Full C++ support means you can easily port code that works on the CPU to the GPU. Optimizations will have to be different, but proofs-of-concept will be a lot easier, not to mention function pointers can make an easy abstraction.
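
      (A toy illustration of what point 2a means in practice: the same pointer is handed to the “GPU” without any staging copies. The gpu_kernel below is just a stand-in function running on a CPU thread, not real GPU code; under HSA the dispatch would go through a user-mode queue rather than std::thread.)

```cpp
// Stand-in only: gpu_kernel runs on a CPU thread here; under HSA the same
// pointer would be handed to real GPU code through a user-mode queue.
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Pretend GPU kernel: works in place on the application's own heap memory.
void gpu_kernel(float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

int main() {
    std::vector<float> samples(1 << 20);
    std::iota(samples.begin(), samples.end(), 0.0f);

    // Same pointer, same virtual address space: no copy-in, no copy-out.
    std::thread gpu(gpu_kernel, samples.data(), samples.size());
    gpu.join();

    std::cout << samples[10] << "\n";  // CPU reads the result directly: 20
}
```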

        • TwoEars
        • 7 years ago

        I still don’t understand even half the implications of all this but the more I read the better it seems… thanks for the reply!

          • bcronce
          • 7 years ago

          In a nutshell, programming the GPU will be much more like programming the CPU. It should be relatively simple to take something that works on the CPU and make it work on the GPU. Then you just need to worry about optimizations.

    • ronch
    • 7 years ago

    I’ve always wondered why the PC seems much less efficient when it comes to gaming compared to dedicated game consoles. For example, if you compare the old NES (1.7MHz 6502-based CPU) to an IBM PC/XT with a faster 8MHz 8086 CPU, the NES can render prettier graphics at good frame rates. Fat chance of that happening on an XT. For another example, if you followed the development of gaming consoles throughout the 1990s, you can see that, put alongside their ‘much less powerful’ console counterparts, PCs seem to require much more powerful hardware to match the consoles. Is this what AMD has the solution to?

      • torquer
      • 7 years ago

      It’s pretty much a universal law that dedicated custom hardware will always beat general purpose hardware, all else being equal. That’s why GPUs are so much better at graphics than CPUs.

      That being said, this is a great approach as GPUs and CPUs share more workload. The next 5-10 years should be really interesting in terms of overall system design.

        • ronch
        • 7 years ago

        Of course, but the 6502 is hardly dedicated hardware yet a far faster 8086 still can’t keep up with it. I think it has something to do with the system architecture, software stack, OS, all of which may add overhead to slow the whole system down, and not just the underlying hardware.

      • Bombadil
      • 7 years ago

      The IBM PC and XT came with Intel’s gimped 8088, not the 8086.

    • LaChupacabra
    • 7 years ago

    Sounds interesting, but if it’s done with virtualization will this only be available on systems where the OS is installed directly on the hardware?

      • Flatland_Spider
      • 7 years ago

      It might not matter. Virtualization is so pervasive now that they might have already baked in support for VMs, and it’s just up to the hypervisors, or VM software, to support it.

    • UnfriendlyFire
    • 7 years ago

    Management: “Damnit Bob, we don’t have time for that multithreaded nonsense! Get the game out by the scheduled date!”

    Bob: “But sir, Intel and AMD keep adding more cores, and yet only slightly improve each core.”

    Management: “I don’t care! Tell them to OC their dual core or something! Sell first, then fix it later!”

    Frank: “Uh, our DRM server keeps crashing… And we’re not sure if it will recognize the game.”

    Management: “Whatever. Launch the game on date.”

    • dzoner
    • 7 years ago

    “AMD is working with Linux providers, and it’s also looking into supporting other operating systems.”

    AMD’s Ritche Corpus: “It’s not that Mantle is the initial language with which developers are writing their games on each platform, as some have surmised; the point of Mantle is that it’s easy to reuse, in whole or in part, the development effort expended on the next-generation consoles when bringing the same game to life on the PC. This is because Mantle allows developers to use the same features and programming techniques they are already utilizing for next-gen game consoles. And while the initial iteration of Mantle is intended specifically for PCs, [b<]it has been designed from the beginning to be extensible to other platforms as well[/b<].”

    HSA/Mantle -> Windows -> Linux -> Android -> ??

    With AMD providing developers efficient and effective Mantle porting to Linux/Steam OS, Kaveri might well become [u<][b<]the[/b<][/u<] killer Steam Machine set-up, with an untouchable cost/performance advantage running Mantle-optimised next-gen games.

      • chuckula
      • 7 years ago

      [quote<]killer Steam Machine set-up with an untouchable cost/performance advantage running Mantle optimised next gen games.[/quote<]
      Yeah... when AMD hires back even one of those Linux developers... that they fired in one of their many rounds of "cost cutting".. then these fantasies might have a greater than zero chance of coming true. I'll believe that AMD actually cares about Linux when they produce working software and drivers... powerpoints and semi-functional windows-only demos do not count.
      P.S. --> What happened to the "am" at the beginning of your username? Or did that account get banned already?

        • dzoner
        • 7 years ago

        AMD has been cash strapped and resource tight. AMD’s United Gaming Strategy is, of necessity, being progressively implemented based on priority … consoles, the foundation, first, then Windows, then Linux.

        Linux is getting attention. As resources and cash are freed up it’s likely to get a lot more.

          • chuckula
          • 7 years ago

          Holy crap… not only are the shills not even trying to be discreet anymore, now they are getting their friends to up-thumb them. Are you capable of stringing together a sentence that wasn’t spoonfed to you by AMD’s marketing department?

          Since you represent AMD, how about responding to [b<]real complaints[/b<] from [b<]real users and developers[/b<] who are flat out recommending against using Catalyst on Linux because it just doesn't work right:
          [url<]http://www.phoronix.com/scan.php?page=news_item&px=MTQ5MjA[/url<]
          [url<]http://www.phoronix.com/scan.php?page=news_item&px=MTQ5MTE[/url<]

            • Flatland_Spider
            • 7 years ago

            That was an entirely reasonable answer.

            The first link, Football Manager, is the fault of some bad localization handling. If you read the comments in the Phoronix Forum, you’ll see the program incorrectly handles some data when commas are used to denote decimals. 1.00 == 1,00, but 1,00 is handled incorrectly by the program.

            Yes, the AMD Catalyst drivers are super ghetto. It would be more productive to ask them to put all of their effort into the FOSS driver.

            • dzoner
            • 7 years ago

            Holy crap… the trills are not even trying to address the reference post anymore.

            • BlondIndian
            • 7 years ago

            Come on, this is a classic case of the pot calling the kettle black. dzoner is just not as subtle as you, chuckula. Give him time; in around 6 months he will reach your level.

            • Fighterpilot
            • 7 years ago

            How amusing…Chuckula calling someone else a shill…

      • torquer
      • 7 years ago

      Edited 7 times?!

        • bcronce
        • 7 years ago

        8 am, needed coffee to kick in.

      • maxxcool
      • 7 years ago

      “”the point of Mantle is that it’s easy to reuse, in whole or in part, the development effort expended on the next-generation consoles when bringing the same game to life on the PC. “”

        Except it is not in use on either major console.

        • dzoner
        • 7 years ago

        Contextually nonsensical.

    • Chrispy_
    • 7 years ago

    Great idea, waiting on AMD for successful execution.

      • DPete27
      • 7 years ago

      I don’t know much about this hUMA and hQ stuff besides the quick articles on TR like this one, but I think AMD desperately needs this technology to come through for them. Seeing as they’re losing the CPU battle with Intel, this might give them an edge back… at least for a year or so until Intel implements the same stuff.

    • tipoo
    • 7 years ago

    This would be for APUs and not for discrete GPUs over PCIe, I assume, right? The bottleneck for compute on GPUs has normally been the time spent swapping over that bus; too much latency for the work to be worth it in a lot of cases. On APUs, I can see this being quite a boon though. I wonder if the 8G consoles are already doing something similar.

    • Star Brood
    • 7 years ago

    I wonder if this would give the CPU some slack even in single-threaded games like StarCraft 2. If it did, then the best CPU for the money would be an FX-8350, because then it could perform admirably in all gaming situations.

      • Krogoth
      • 7 years ago

      Starcraft 2 is actually dual-threaded. The point still stands though.

        • derFunkenstein
        • 7 years ago

        It might be “dual threaded” but it’s not very well optimized. Maybe physics runs in a separate thread or something. When I alt+tab out of the game and look at Task Manager (which can show each core in Windows 8 if you use the “logical processors” view), one core is maxed and two others are at around 30-40% at most.

          • Krogoth
          • 7 years ago

          One thread handles all of the graphical rendering, while the second thread handles all of the non-graphical stuff (physics, unit AI, general AI, audio processing etc.)

          The problem is that SC2 is simply demanding on modern CPUs, just like Supreme Commander 1 and 2, because they are handling scores to hundreds of units at a given time with a physics engine on top. BF3/BF4 run into the same problem with 48-64 player matches in “hot zones”.

      • Klimax
      • 7 years ago

      Just beware of reducing resources on the GPU side (aka balancing).

      • Theolendras
      • 7 years ago

      I would seriously doubt current-gen CPUs will benefit from that technology. With the mentioned “hardware scheduler”, it looks like hardware support has to be in both the CPU and GPU to enable something like this. Still, Kaveri is supposed to improve on IPC, and with hardware queuing this should be great.

      All this makes me wonder what kind of performance Mantle-enabled games will have with AMD CPUs (because AMD uses its hQ there, I suppose).

      • Bensam123
      • 7 years ago

      I think it’ll definitely be interesting how the 8350 and the module variants perform in the next year or two. Mixing something like this with the direction consoles are going, which will also take games along for the ride, yielding significant results in multithreading, may actually cause a paradigm shift.

      Imagine the 8350 being better than Haswell in two years? …possibly

        • Theolendras
        • 7 years ago

        Hmmm, Piledriver benefitting more than Haswell might be a more realistic scenario than outright beating it. But yeah, I kind of think single-thread performance might be less of an issue as more software exploits further multi-threading. Though Kaveri definitely looks attractive if HSA/hQ or multithread-heavy workloads get more common.

    • jimbo75
    • 7 years ago
      • Klimax
      • 7 years ago

      Try again.
      [url<]http://en.wikipedia.org/wiki/Intel_i860[/url<]

      • Vasilyfav
      • 7 years ago

      AMD is so innovative in fact, that they invented the first ever NEGATIVE IPC gains.

        • jimbo75
        • 7 years ago
        • oMa
        • 7 years ago

        The first? In the frequency race, every generation had lower IPC. Pentium 3 vs Pentium 4 for example.

    • Shouefref
    • 7 years ago

    The old system was the result of having GPUs and CPUs separated.
    I’m really curious to see the new products.

    • Unknown-Error
    • 7 years ago

    Oh, by the way, this “Heterogeneous” nonsense is either dead or the poor man’s alternative. The future is in Homogeneous computing. The Chinese have a rather radical “Unified Processing Unit” design which has the CPU & GPU completely integrated into one unit. Whether it’ll gain acceptance, especially due to the need for a new ISA, is another thing. It’s still in its early stages. A far more practical approach is what Intel is doing with AVX. Eventually most GPU-like functionality will be moved to the CPU, and unlike AMD’s stupid Heterogeneous turd, the Homogeneous solution will not have Heterogeneous overheads and large rewrites. Ideally only a recompile is needed. I honestly thought when AMD bought ATI that they would create a ground-up Homogeneous solution that’ll revolutionize things as we know it…

      • bcronce
      • 7 years ago

      In case someone believes any of these half-truths and outright lies: GPUs and CPUs work completely differently and have completely different use cases. They can never fully merge. It’s a scaling issue. SIMD (AVX) can only do so much.

      Registers can only get so wide before it’s more efficient to have lots of registers. If you have a lot of registers, you need a lot of execution pipelines to fully make use of them, and that’s what a GPU is: a VERY wide execution pipeline with a lot of registers, highly optimized for SIMD.

      GPU vs. CPU is like a trade-off between fuel consumption and power. There are limits before you have to choose one or the other; there is no “merge them”.

        • sschaem
        • 7 years ago

        Now, if you don’t mind a Pentium-class x86 pipeline, then Intel has already done this with their MIC.
        512-bit SIMD, x86 execution pipeline.

        Skylake should intro AVX3.2 / 512-bit.
        At 14nm, Skylake Xeons are expected to be massive compute platforms.

        What I don’t see ever happening is the GPU, as a graphics unit, being fused at the x86 execution level.
        So no matter what, we need progress in ‘heterogeneous’ computing… so AMD is on the right track.

          • bcronce
          • 7 years ago

          To add to this, the Intel many-CPU-core solution is different from GPUs. Each CPU has its own separate thread of execution, and while they share the same memory bandwidth, they can handle random memory access decently and have large on-core L2 caches.

          GPUs tend to have small L2 caches and do NOT support separate threads of execution well. Branching logic will crap all over a GPU and make 80 800MHz cores run like a single 800MHz core.

          GPUs are broken up into logical groups, where cores in a given group all run the exact same instruction at the exact same time. This means if even one core in the group branches, all of the other cores have to stop and wait for that core to re-merge back into the same execution path. GPUs have about 8-16 of these groups, but thousands of cores. This means a single core can potentially stop hundreds of other cores from doing anything until that core is finished. Random memory access causes similar issues.
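
          (Rough sketch of the lock-step behavior described above: both sides of a divergent branch are stepped through by the whole group, and a per-lane mask decides which results stick. The lane count and helper functions are made up for illustration, not any vendor’s actual hardware.)

```cpp
// Illustrative only: a software mock of one execution group, not vendor hardware.
#include <array>
#include <cstdio>

constexpr int kLanes = 16;                     // one "group" of lanes

int expensive_path(int i) { return i * i; }    // pretend long computation
int cheap_path(int i)     { return i + 1; }    // pretend short computation

int main() {
    std::array<int, kLanes> x{};
    std::array<bool, kLanes> mask{};           // per-lane branch condition
    for (int i = 0; i < kLanes; ++i) mask[i] = (i % 4 == 0);

    // "if" side: the whole group steps through it; masked-off lanes do nothing.
    for (int i = 0; i < kLanes; ++i)
        if (mask[i]) x[i] = expensive_path(i);

    // "else" side: the whole group steps through this too.
    for (int i = 0; i < kLanes; ++i)
        if (!mask[i]) x[i] = cheap_path(i);

    // Total time is roughly time(if side) + time(else side), even though each
    // lane only needed one of them -- that's the divergence penalty.
    std::printf("%d %d\n", x[0], x[1]);
}
```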

          On the other hand, GPUs are freaking awesome at matrix crunching.

            • sschaem
            • 7 years ago

            One thing I will note…

             Kepler can’t be thought of as a 64-core processor executing 1024-bit SIMD code.
             So: 2048 compute units at ~1GHz.
             (2048 * 2 (FMA) * 1.1GHz = 4.5 TFLOPS, which is what Nvidia is claiming for Kepler.)

             Skylake will also introduce 1024-bit SIMD with AVX3.2. An 8-core Xeon then should have
             256 compute units at 3-4GHz.

             In the end, OpenCL code will face the same type of branching limitation on CPU or GPU.
             Code will diverge and will need to be handled gracefully.
             GPUs have been built from day one to handle this fact nicely, but AVX3.2 will close the gap.

             So from what I can tell, Nvidia better have a 20nm product by the Skylake timeframe…

            • bcronce
            • 7 years ago

            Thanks for the info πŸ™‚

            Now I have more stuff to read up on.

        • WaltC
        • 7 years ago

        Thanks for your post straightening out the Unknown-Erroneous. It’s always delightful to see knowledge and facts triumph on the Internet.

        Really, in a sense “merging them” is what AMD is working on long term (remember FUSION?) and hQ is like a first step–because they first have to become “equal partners” before a “merger” of sorts can take place. Sort of. Well, kind of. OTOH, UE sounds as if he’s been reading too much Larrabee web propaganda from a few years ago…;)

      • sschaem
      • 7 years ago

      “That’s Fusion, Baby”

      There are serious issues for many uses if you truly fuse the GPU and CPU into one.
      Intel got the right approach with Knights Corner.

      BTW, AMD also runs AVX on their CPUs. I think today is the right balance. Except that Kaveri is 3 years late.

      • Theolendras
      • 7 years ago

      I seriously doubt it. CPU single-thread performance probably won’t improve much and requires tons of silicon to gain a few percent of performance. It’s gotten to an inflection point. Intel itself had to recognize it when going multi-core; that was the first move indicating that Moore’s law scaling is finite.

      A totally merged solution would have benefits, but would create trade-offs as well. It will never be as efficient, silicon-wise, at embarrassingly parallel work as a GPU. It would probably be easier to code for, but ultimately less flexible and less efficient hardware-wise.

      By “less flexible” I mean in the way it could scale out. HSA might very well consider a dGPU as a peer, for example.

      There might be breakthroughs eventually, but right now the name of the game is efficiency, and GPUs are way more efficient, silicon-wise, for some loads, so you might as well tap into them whenever it makes sense. Asynchronous computing is probably inevitable at some point. This is where the industry is heading. ARM, MIPS, Intel, AMD, and Oracle are leveraging some form of it, or are at least considering it seriously.

    • ssidbroadcast
    • 7 years ago

    I am totally in favor of CPU/GPU marriage, and would support that bill.

      • Srsly_Bro
      • 7 years ago

      What do you have against CPU/CPU marriages? After all, they are all silicon! I believe our electronics should be free to marry whomever they want!

        • Dagwood
        • 7 years ago

        Sid “I am totally in favor of CPU/GPU marriages”
        S_Bro “what do you have against CPU/GPU marriages?”

        looks like nothing, unless the definition of “totally in favor” changed since I was in school.

        On topic…
        What makes this any different than what goes on in your smart phone or playstation?

          • f0d
          • 7 years ago

          i think he was actually talking about same silicon marriage (cpu/cpu) not cpu/gpu marriages

          i have nothing against same silicon marriage or different silicon marriage – i think silicon should be able to marry whatever it likes πŸ˜›

            • Diplomacy42
            • 7 years ago

            I’m in favor of traditional silicon, if we allow same-silicon marriage, it will eventually lead to CPUs marrying memristors or photons, maybe both at the same time, who can ever draw that line? this slippery slope must be ended before humanity destroys itself.

            • Srsly_Bro
            • 7 years ago

            LOL

            • ermo
            • 7 years ago

            Dagwood pointed out that, strictly speaking, Srsly_Bro’s question was a textbook straw man argument, anchored in a false dichotomy. Being [i<]for[/i<] something does not automatically imply that you are [i<]against[/i<] something else. Anyway, all in good fun.

            • Srsly_Bro
            • 7 years ago

            Of course. Someone got it πŸ™‚

            • f0d
            • 7 years ago

            i agree actually you are right and i think we all just got a little excited trying to find ways to make a fun comment (at least thats what i did anyways – wasnt really having a go at anyone or anything)
            just trying to have some fun with it πŸ˜›

            same silicon / different silicon marriages are all ok by me

            what about different materials? or completely different computers?
            will we be ok with graphene having relations with silicon?
            will we be ok with quantum computers marrying silicon computers?
            if they like each other why not πŸ™‚

            πŸ˜›

            • Srsly_Bro
            • 7 years ago

            Lots of things must have changed since he was in school.

          • derFunkenstein
          • 7 years ago

          That isn’t actually what he asked. He asked what ssid has against CPU/CPU (homosilicon) marriages. But the truth is we’ve had that for a long time. My 3570K is homosilicon polygamy at its finest.

        • albundy
        • 7 years ago

        well, you cant be spectin a deevorce!

      • Musafir_86
      • 7 years ago

      -Of course you should, it’s [b<]hetero[/b<], not [b<]homo[/b<]. πŸ˜€ Regards.

      • sschaem
      • 7 years ago

      Actually we are talking about the end of slavery here.

      The proclamation is that all processors are now equal.
      AMD wants to end the CPUs’ domination over GPUs as slaves. (Read AMD’s slide.)

      CPUs will need to accept that they are not the masters anymore and do not have the right to enslave GPUs to do their heavy lifting.

      AMD is working toward making the GPU a first-class citizen, with the right to make its own path.

      Freeedom!

        • sirroman
        • 7 years ago

        *Braveheart’s Mel Gibson* FREEEEEDOOOM!!!

        • entropy13
        • 7 years ago

        Not just equal, but [b<]MORE[/b<] equal!

    • Unknown-Error
    • 7 years ago

    Oh please! AMD should become a graphic design company making posters, banners, brochures, wedding cards, etc., because they seem to be making very nice shiny presentation slides for absolute turd products. It should be [b<]A[/b<]dvanced [b<]M[/b<]edia [b<]D[/b<]esigns.

      • anotherengineer
      • 7 years ago

      Poll
      1. Troll?
      2. Hater?
      3. Intel fanboi?

        • dzoner
        • 7 years ago

        4. Banana’s Foster?

        • Bensam123
        • 7 years ago

        Chuckula?

          • chuckula
          • 7 years ago

          Really Bensam…. please provide one link where I have ever made a post like that about AMD… just one. Please note that accurate portrayals of your hypocrisy don’t count as attacks on AMD… believe me Bensam, I’d never stoop so low as to equate you with AMD.

          P.S. –> Where’s all the hate for the magical “dzoner” account that’s appeared out of nowhere and spews AMD marketing copy in the same way a cult member spews religious mantras right before drinking the koolaid? Oh wait, I forgot… one set of rules for the “chosen ones”, another set of rules for everyone else…

            • Bensam123
            • 7 years ago

            If negatives are a sign, it appears your BS voodoo is wearing off. You had a good run though.

            Just keep lobbing more hyperbole, more spins on the truth, more cherry picked facts and maybe something, SOMETHING! will stick. This is what politicians do when they’re trying to misdirect.

            You know what the best thing I learned for dealing with you was? Vaguely reading over your posts and not actually listening to what you say, because 95% of it is usually complete and utter shit supporting other crap, which in turn supports other crap, which in turn has some hyperbole in there, a joke about AMD being bad, a cherry-picked insult directed exclusively at someone, and a line that can be misconstrued as satire if you need a way out.

            Little did I know you’re the type to self-destruct if I leave you to your own devices.

            If you didn’t notice, the ‘Chuckula’ post above yours is a clever joke. This is where you laugh, because you can be an option on the list (insinuating that the first post is actually yours), the list could be talking about you, or both at the same time, which is what really makes it funny.

            • jimbo75
            • 7 years ago
            • NeelyCam
            • 7 years ago

            “Intel fanboys” don’t exist. I think you mean “consumers with above average intelligence”

            • Unknown-Error
            • 7 years ago

            OUCH!

            • Krogoth
            • 7 years ago

            Tell that to the people who picked up late generations of NetBurst and the first generation of P6x chipsets. Also to those who got the first generation of the Pentium Classic. 😉

            • jimbo75
            • 7 years ago
            • chuckula
            • 7 years ago

            Jimbo75 sez: [quote<]I have a 2500K[/quote<] Carrying on the proud tradition of many AMD fanboys on this site who... if they are to be believed.. have done a whole lot more to contribute to Intel's bottom line than I ever have. Once again, with fanboys like this, AMD doesn't need enemies.

            • Spunjji
            • 7 years ago

            KNEEL BEFORE THE MIGHT OF MY OBVIOUS PROCESSOR SELECTION

        • Unknown-Error
        • 7 years ago

        πŸ˜‰ none of the above. Sometimes the sheer number of “thumbs-down” is a great sight. :p

          • NeelyCam
          • 7 years ago

          Soo… a troll in denial?

            • Unknown-Error
            • 7 years ago

            noooooo…….I ain’t a troll πŸ˜‰

            PS: Haven’t really seen neely the troll in action lately πŸ™

            • NeelyCam
            • 7 years ago

            Still trolling AMD fanbois a bit on the GPU side with NVidia.

            But good targets are starting to disappear now that winners are becoming clear. AMD has given up the CPU side altogether, ARM is getting killed on Windows, Bay Trail is starting to take over non-Apple tablet market. Non-Apple phones are still all-Qualcomm for the next six months. Intel owns mid-high-server end CPU markets.

            Really the GPU market is the only one that’s relatively competitive

            • kc77
            • 7 years ago

            You are not serious, are you?

            [quote<] ARM is getting killed on Windows[/quote<]
            How much of the market is that happening in? I think it's like 3%.
            [quote<]Bay Trail is starting to take over non-Apple tablet market[/quote<]
            In what universe? 95% of the tablet market is made up of Android and Apple, and Windows tablets make up the remainder. The number of Bay Trail tablets sold is barely a drop in the bucket compared to the major players.
            [quote<] Non-Apple phones are still all-Qualcomm for the next six months.[/quote<]
            Most still will be...

        • BlondIndian
        • 7 years ago

        Noob troll

      • forumics
      • 7 years ago

      You obviously don’t see the benefits of their implementation.
      Following this, in the distant future, CPUs will not have an FPU module any longer. There will no longer be a need for an FPU when the GPU does all the floating-point calculations.

      This would mean that the remaining CPU space can be utilized for building more integer units or better code/decode units, or it could open up the possibility of CPUs with 256MB of on-chip cache.
      In essence, this could potentially mean over a 10-fold increase in CPU and GPU capabilities over what we have today.

      Those 2 slides up there are the front pages of what could be a whole new generation of x86 processing all over again. Think about how far we’ve come since vacuum tubes. That is how far this will bring us into the future.

        • Deanjo
        • 7 years ago

        [quote<]this would mean that the remaining CPU space can be utilized to building more integer units or better code/decode units or it could open up the possibility of having 256mb on cache CPUs.[/quote<] Only if you think that the additional die space won't be taken up by additional GPU or just simply reduced to get more chips per wafer.

      • torquer
      • 7 years ago

      As someone who almost exclusively uses Intel/Nvidia these days, I still respect AMD for pushing forward with what is a pretty forward thinking approach. I don’t personally use AMD graphics but I appreciate that despite all of the challenges they face as a business they are still able to compete with Nvidia in a very strong way.

      It’s sad to me that AMD is no longer able to compete at the high end with Intel on the CPU front, but frankly raw CPU power doesn’t much matter for gaming, which is about all I care about. I was an AMD fan for a very long time and only finally made the switch when the Core i7-860 came out.

      Again speaking as a current Intel/Nvidia fan, I hope for a long life and thriving business for AMD if for nothing else than to keep my chosen companies at least somewhat in check. Competition is a wonderful thing.

      • anubis44
      • 7 years ago

      Yeah, you’re right, Mr. Paid Shill/Troll.

      Hypertransport, x86-64, SIMD instructions like 3DNow!, dual-core, unified cache architecture in the first true x86 quad core CPU, Radeon 9000 series, Radeon HD 4000, 5000, 6000, and 7000-series, Mantle API… It’s all just a load of hooey.

      They’ve been around since 1969, but somehow, AMD just NEVER builds anything worthwhile, do they?

      Riiiggghhhttttt.

        • Modivated1
        • 7 years ago

        AMD SPAAANNNKKKEEED Intel for about 4 years with the Athlon 64 and the first 64-bit x86 CPUs! If they had had their engineers cranking out another masterpiece to maintain the reign, then they would be on top today.

        Ah, the coulda, woulda, shoulda, but they didn’t moments……

        CURSE YOU HECTOR RUIZ!!!!

      • Theolendras
      • 7 years ago

      You sure look like an uninformed consumer who bought the FX-9590 5GHz chip and looked at benchmarks after the fact.

      • jimbo75
      • 7 years ago
    • willyolio
    • 7 years ago

    I really hope this takes off. This is the kind of creativity that let AMD take on Intel, the giant that was 10x their size. Last time it was AMD64 + HyperTransport.

    But Intel hasn’t made any missteps this time, so I’m not sure if this will get much traction.

      • Klimax
      • 7 years ago

      Will likely benefit Intel much more. The CPU side is still killing AMD and pushing their APU performance under. (They weren’t able to surpass CPU-only performance in OpenCL with an APU.)

        • forumics
        • 7 years ago

        This would open the way for Bulldozers with at least 2x more integer cores.
        It’d be difficult to see how Intel could catch a Bulldozer with 16 highly efficient cores.

          • maxxcool
          • 7 years ago

          Until 2015, Intel still has this locked up. Until the redesign of the FPU to be more dual-issue than it currently is, the BD cores are gimped and slightly behind.

          • chuckula
          • 7 years ago

          [quote<]it'd be difficult to see how intel can catch a bulldozer with 16 highly efficient cores.[/quote<] Let me fix that for you: [quote<]it'd be difficult to see how intel can catch [b<]the Easter Bunny riding on Santa's rocket-sled running from the Tooth Fairy[/b<][/quote<]

          • Klimax
          • 7 years ago

          How would they fit them in alongside the GPU? (The FPU doesn’t take that much space.)

          As for the second scenario, you’d lose too much, making the P4 and old BD look like just a small mistake. Not all FP code is a good fit for the GPU, yet FP code shows up even in integer-dominated workloads (+OS).

          And you wouldn’t even see much from 16 integer cores without massive upgrades to the on-chip infrastructure (and fixes for the many performance problems plaguing BD), because their utilization would be low and murdered by the FP design.

          Simply too many problems to be solved and for little to no benefit.

      • Flatland_Spider
      • 7 years ago

      AMD64 is the exception to the rule.

      Does anyone besides AMD use Hypertransport?

        • willyolio
        • 7 years ago

        I wasn’t talking about the entire industry shifting to AMD’s tech, just that AMD’s tech was good enough to beat Intel despite a budget/process disadvantage.

      • bcronce
      • 7 years ago

      It should get traction in the console area, where AMD has 100% control and the consoles are now all x64 with the same general archs as PCs. AMD has a fighting chance; they best not blow it.

      • Bensam123
      • 7 years ago

      Intel is just striving for the performance crown in all forms, while AMD is striving for that and innovation in the entire industry. I think that’s a pretty large misstep for Intel.

    • Welch
    • 7 years ago

    This sounds like it could be a big win for AMD. If hQ is able to deliver some serious reductions in GPU latency and possibly even offload simpler tasks to the CPU, thus freeing the GPU to crunch more complicated tasks… this may be one of the largest boosts to GPU performance in a long time.

    Curious if hQ is going to be for PCI-E GPUs at first. Their explanation is vague enough not to explain how they plan on doing it via hardware. It would require direct access to PCI-E slots (x16 slots anyhow); otherwise PCI-E traffic goes through the CPU’s memory controller.

    Either way, this reminds me of AMD’s first few generations of dual cores, where they added the bridge between the two cores to share information. That changed the way the entire industry did things, as opposed to Intel’s first dual cores being two CPU cores glued to one chip.

    I’ll be watching this new standard as it unfolds.

      • Maff
      • 7 years ago

      About PCI-E GPUs: apparently the latency is too high when you have to wait for these small-scale tasks, meaning that the CPU is probably better off calculating stuff itself instead of waiting on the data to go back and forth to the PCI-E GPU.

      More on it [url=http://semiaccurate.com/2013/10/21/amd-makes-gpu-comute-reality-hq/<]here[/url<].

        • willg
        • 7 years ago

        A relatively in-depth technical article from Charlie, and without a single mocking of Nvidia. Amazing.

        • Welch
        • 7 years ago

        S|A Article – “It doesn’t take much imagination to put a QoS value in to a reserved field on the packets and have a simple sort shuffle things around here and there.”

        OK, so they were wondering the same thing I was. So essentially the author is suggesting QoS packets similar to a TCP/IP header in a network packet. Obviously it would work, but as the article also suggests (and you stated), you’d have high latency via a PCI-E bus; why exactly? If that’s the case, you’d also effectively lower the bandwidth by a very small amount by having a header for QoS in that packet too.

        So from the sounds of it, at release… discrete graphics taking advantage of hQ isn’t going to really be beneficial. /shrug. This could be the start of the end for discrete, finally. The performance of on-chip GPUs is getting pretty damn good; even in the Intel realm they are acceptable. I used to scoff at Intel IGPs for even basic computer use. Now with the HD 4000 and the newer 4600, full 1080p and even some semi-decent gaming are possible. Of course the AMD camp’s integrated stuff is even more usable.

        I guess they are hoping that by unlocking their CPU/GPU to one another, they will strengthen their GPU further and give traditional CPU tasks a fighting chance on their APU platform that is struggling to compete with Intel. I sure hope it works; I would love to see another PC price war. What’s killing the PC isn’t even its lack of mobility, it’s the cost. Why buy a $500-$1000 computer when your phone or a $230 tablet can do the majority of what most consumers can even imagine doing with technology, like email, Facebook, and surfing?

          • DPete27
          • 7 years ago

          [quote<]Why buy a $500-$1000 computer when on your phone or a $230 tablet can do the majority of what most consumers can even imagine doing with technology like email, facebook, surf.[/quote<]
          (Climbs out from under rock) What? People don't buy desktops and laptops just for checking email and basic web surfing anymore? By the way, $500-$1000 is a gaming PC IMO.

          Let's face reality here: desktops have been losing market share for years because of notebooks and, more recently, smartphones/tablets. The only niche left for "traditional" desktops is workstations and gaming. That's not necessarily a bad thing, we just have more choices. Smartphones, tablets, notebooks, nettops, and desktops all have their strengths and weaknesses. Back in the 90's you (basically) either had a desktop or no computer at all. Nowadays, all these devices have to share the pie. IMO, the reason for the tablet boom is that they're best suited for the tasks that the vast majority of consumers use frequently.

      • chuckula
      • 7 years ago

        [quote<]Either way, this reminds me of AMDs first few generation dual cores where they added the bridge between the two cores to share information. Changed the way the entire industry did things, opposed to Intel's first first dual cores being two CPU cores glued to one chip.[/quote<]
        Uh.. AMD is playing catch-up right now because Intel has had a shared-cache architecture for the IGP and the CPU going all the way back to Sandy Bridge. You really see it come into play with the Iris pro parts where the shared L4 cache gives a nice boost in compute workloads in addition to games. AMD up to and including Trinity is still using the standard memory controller to control interaction between the GPU and the CPU as if they were connected over a memory bus that happens to be built into the chip in silicon.

        The heterogeneous stuff is interesting, but there needs to be a massive amount of work done to make it available to developers in a meaningful way and it starts to make you wonder if AMD is about to abandon OpenCL in favor of another low-level AMD-only approach.

        • Fighterpilot
        • 7 years ago

        Why don’t you change your user name back to Rollo/Brian_S?

      • Theolendras
      • 7 years ago

      I would bet APU first, discrete eventually. It’s easier in an APU, close communication will make it more efficient, and as a hardware maker, AMD controls everything in its APU. The PCI-E trip will probably still be costly. I hope some software will be able to offload some of its CPU load onto the iGPU and leverage the dGPU for the actual rendering.

    • NeelyCam
    • 7 years ago

    So when is Kaveri coming out? First of January?

      • sschaem
      • 7 years ago

      It will be out not long after Intel floods the market with 14nm Broadwells…

      • Unknown-Error
      • 7 years ago

      Neely, you should know that the NDA lifting date is also under NDA when it comes to A-M-Douche.

      • Srsly_Bro
      • 7 years ago

      I’ve heard April 1st from a solid source.

        • Klimax
        • 7 years ago

        Incorrect. I heard it’s 30th February…

      • willg
      • 7 years ago

      My guess is paper launch at AMD APU 13 in November, then full reviews/availability around CES.
