
AMD sheds light on Kaveri’s uniform memory architecture

At the Fusion Developer Summit last June, AMD CTO Mark Papermaster teased Kaveri, AMD’s next-generation APU due later this year. Among other things, Papermaster revealed that Kaveri will be based on the Steamroller architecture and that it will be the first AMD APU with fully shared memory.

Last week, AMD shed some more light on Kaveri’s uniform memory architecture, which now has a snazzy marketing name: heterogeneous uniform memory access, or hUMA for short.

Current APUs have non-uniform memory access (NUMA) between the processor and graphics logic. In those solutions, the CPU cores and IGP are both tied to system memory, but they each have their own separate memory pools. The processor cores must jump through hoops to access memory being used by the graphics hardware, and vice versa. Different heaps and different address spaces are involved, and when data needs to be shared, it has to be copied back and forth between the CPU and IGP pools. There is, as you’d expect, a performance cost to all those intermediate steps.
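The cost of that arrangement can be sketched in a few lines of Python. This is a toy model, not real driver code: the `NumaApu` class and its methods are invented for illustration, but the pattern (copy into the IGP's pool, compute, copy back) mirrors the hoop-jumping described above.

```python
# Illustrative model of pre-hUMA APU memory: the CPU cores and the IGP each
# own a separate pool, so sharing data means duplicating it across pools.

class NumaApu:
    def __init__(self):
        self.cpu_pool = {}   # system-memory heap visible to the CPU cores
        self.gpu_pool = {}   # carve-out heap visible to the IGP
        self.copies = 0      # count the intermediate copy steps

    def cpu_write(self, key, data):
        self.cpu_pool[key] = data

    def share_with_gpu(self, key):
        # the data must be duplicated into the IGP's pool before
        # the graphics hardware can touch it
        self.gpu_pool[key] = list(self.cpu_pool[key])
        self.copies += 1

    def gpu_double(self, key):
        # stand-in for a GPU compute kernel
        self.gpu_pool[key] = [x * 2 for x in self.gpu_pool[key]]

    def share_with_cpu(self, key):
        # ...and copied back again for the CPU to see the result
        self.cpu_pool[key] = list(self.gpu_pool[key])
        self.copies += 1

apu = NumaApu()
apu.cpu_write("buf", [1, 2, 3])
apu.share_with_gpu("buf")
apu.gpu_double("buf")
apu.share_with_cpu("buf")
print(apu.cpu_pool["buf"], "copies:", apu.copies)  # [2, 4, 6] copies: 2
```

Every round trip between the two compute units costs two copies, and real workloads make many such trips.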

In Kaveri, hUMA takes away the hoops: the processor cores and integrated graphics have a shared address space, and they share both physical and virtual memory. Also, data is kept coherent between the CPU and IGP caches, so there are no cycles lost to synchronization as in current, NUMA-based solutions. All of this should translate into higher performance (and lower power consumption) in general-purpose GPU compute applications. Those applications tap into both the CPU cores and the IGP shaders and must pass data back and forth between them, which would require extra steps without hUMA. AMD said Kaveri’s hUMA architecture has been implemented entirely in hardware, so it should work with any operating system and programming model. Virtualization is supported, as well.
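The shared-address-space model can be sketched the same way. Again, this is a toy model with invented names, not AMD's hardware interface: the point is that both the "CPU" and "GPU" methods dereference the very same allocation, so the copy count stays at zero.

```python
# Illustrative model of hUMA: one pool, one address space. The "GPU" works
# on the same allocation the CPU wrote, so no staging copies are needed.

class HumaApu:
    def __init__(self):
        self.memory = {}   # single shared pool for CPU cores and IGP
        self.copies = 0    # stays at zero: nothing is ever duplicated

    def cpu_write(self, key, data):
        self.memory[key] = data

    def gpu_double(self, key):
        # the IGP dereferences the same allocation; hardware cache
        # coherence (per AMD) keeps the CPU and IGP views in sync
        self.memory[key] = [x * 2 for x in self.memory[key]]

    def cpu_read(self, key):
        return self.memory[key]

apu = HumaApu()
apu.cpu_write("buf", [1, 2, 3])
apu.gpu_double("buf")              # no copies before or after the kernel
print(apu.cpu_read("buf"), "copies:", apu.copies)  # [2, 4, 6] copies: 0
```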

Will hUMA mean CPUs and discrete GPUs can share a unified pool of memory, too? Not quite. When the question came up during the briefing, AMD said hUMA “doesn’t directly unify those pools, but it does put them in a unified address space.” The company then stressed that bandwidth won’t be consistent between the CPU and discrete GPU memory pools—that is, GDDR5 graphics memory will be quicker, while DDR3 system memory will lag behind, so some hoop-jumping will still be required. (As an interesting side note, AMD then added that “people will be able to build APUs with either type of memory [DDR or GDDR] and then share any type of memory between the different processing cores on the APU.”)

Kaveri is due out in the second half of 2013—likely late in the year, judging by AMD’s latest processor roadmap. In addition to Steamroller CPU cores and hUMA, Kaveri will feature integrated graphics based on the same Graphics Core Next architecture as current Radeon HD 7000-series GPUs. Also, the chip will be fabbed on a 28-nm process, finer than the 32-nm process used to manufacture today’s A-series APUs (a.k.a. Trinity and Richland).

71 responses to “AMD sheds light on Kaveri’s uniform memory architecture”

  1. Have you considered the obvious? You can use the chip in FM2 but you don’t get the GDDR5 access unless you use the new socket?

  2. That would be all you see. Perhaps you’re applying a bias to what he’s saying before reading it?

    He’s saying stop giving AMD shit, the world would suck without either them or Intel, but people, like yourself often times forget this and instead choose to pander to the masses in order to drum up popularity for yourself and make yourself feel better by making easy jabs. You did this to me for a bit before I called your shit on it and then you switched to AMD (because they can’t defend themselves).

    You’re way too literal, so much so you can’t even see the points of posts besides the list of comments supporting his main point inside. So here, I’ll remove the points backing up his overall argument so you can see it:

    [quote<]but we all forget the difficult times amd had when i.e. there were rumors that they were going out of the cpu market (thank goodness they didn't) i don't think that making such a product is easy, nor does it take so little time to release. i just hope that amd will be able to push forward and give intel a hard time cause if it fails, we may end up buying a cpu for 1000 euros that is a bit faster than haswell. i'm no amd fanboy. i have an intel cpu. but if steamroller matches intel's counterpart, hell yeah, i will buy one.[/quote<] It really still amazes me how you can call someone a fanboi when they own products for the people they don't 'support', myself included. That must really blow your mind.

  3. still, i’m pretty sure AMD announced their plans for an integrated GPU first, and Intel as usual managed to charge in with loads of money and fabs. I’m not sure they’d have considered it seriously if AMD hadn’t made their “fusion” plans a few years earlier.

  4. In the voice of Bobby as done by Howie Mandel, “That’s what I’m afraid of!”

  5. I read the question as about OS support. The ‘page tables’ are in memory, which is shared, and state change is interrupt driven. These are the most relevant to the OS, and again, necessarily the same as before.

    The TLBs are just an optimization and a cache in some kind of local memory; they are not strictly functionally necessary and not necessary to share. How they work is very device dependent, and they exist just for a particular ‘core’.

    I made no statements toward difficulty, only toward “how is this dealt with, how could you not have to update the OS”. This was for clarification.

  6. Name one thing in his comment that has anything to do with decency or courtesy.
    All I see is a poorly translated wishlist of AMD fanboy talking points from about 2005 before Core 2 came out and AMD decided that buying other companies for too much money and churning out powerpoints was more important than competition.

  7. I said TLB. The TLB is a circuit in the processor. The data describing the page mapping is stored in memory, but the logic to fetch and parse it is in the processor. Keeping that coordinated between identical CPUs is hard enough, but doing so between a CPU and a GPU, a device that has never had to deal with a TLB before, is harder still.

    The kind of TLB implementation you would need to keep a GPU fed is harder to design than the kind you’d need for a CPU: different kinds of memory access patterns, etc.

    It’s not hard to see this is a very non-trivial undertaking.

  8. I believe he’s talking about a fully integrated GPU, not simply a GPU slapped on the die alongside the processor.

  9. Holy shit a comment about common sense and courtesy, don’t let Chuckula see this.

  10. It could also be like other AMD chips where it has more than one mode of operation… Like this could be dropped into existing sockets with a BIOS flash, but it doesn’t have super-fast memory, so its performance is cut down.

    No one mentions whether or not the chip has any GDDR on it either.

  11. Actually, comparing Kaveri to Ivy is just about as fair as comparing Kaveri to Broadwell. Let’s do the second one, so we can see some dismemberment again

  12. Kaveri “launching” in late 2013 is exactly the same thing as Haswell “launching” in 2012: the fabs start turning out chips, but you aren’t going to be able to buy them for several more months. You’ll see some buzzwords like “ramping” and “shipping for revenue” bandied about to justify that Kaveri “shipped” this year, but that doesn’t mean there will be one next to your Festivus pole.

  13. Uh… nothing about accessing GPU memory on a discrete card makes one lick of sense. You would basically be copying data to and from the video card in exactly the same manner that has been done for years using, at best, an 8GB/second PCIe bus (since AMD refuses to upgrade to PCIe 3.0). So basically, you found a way to make highly expensive and highly power consuming GDDR5 memory substantially slower than the DDR2 memory in my 5 year old desktop…

    The whole point of that presentation was that the exact same chip would have direct access to memory that would be shared between the GPU and CPU components transparently. If you want that, then you need a direct interface. Will there be DDR3-only versions of Kaveri that fit in FM2 sockets? Sure, but that’s exactly what you are getting: a DDR3-only chip. Anything with GDDR5 is going to be soldered to a PCB that looks like a video card on steroids.

  14. It’s all that Sony music they keep piping through their building speakers. If they didn’t dance, they’d weep.

  15. Intel was the first to integrate the GPU onto a CPU, but it wasn’t a great IGP. AMD simply said they wanted to do it first. It took Intel to actually do it first. Another similar case was Intel being the first to go quad core. AMD said they wanted to do it first, but didn’t want to cut the same corners Intel did (putting two dual-core dies in one package to merge them).

    AMD went x64 and the reason it succeeded was because EVERYONE believed/followed. What “noone believed/followed” was the Itanium failure. That’s why AMD succeeded.

    I think you should have mentioned the fact that AMD was the one who proved definitively that an integrated memory controller really made performance fly for CPUs, if you’re trying to make the case that AMD is technologically leading Intel by the nose.

    Not that I think it’s a great argument. I mean, sure, those instances are true. Then again, Intel has been leading the way on power/performance beyond having better IPC with Hyper-Threading, SpeedStep, and Turbo Boost. Moreover, Intel brought SSDs to the mainstream-ish price range. Intel gave us far more reliable chipsets back in a time when chipsets from other companies were prone to being horrible. Intel gave us Centrino, which gave us far more performance for the same amount of battery life. Intel gave us SSE and all its derivatives, which were so far and away superior to AMD’s alternatives that even AMD uses them now.

    That’s ignoring the obvious: Intel gave us x86. Intel gave us socketable CPUs available to the regular user. Intel has continued to make enthusiast CPUs despite the fact they can’t make a lot of money off that market segment.

    To my eyes, Intel is like the rich uncle who’s stuck up but (mostly) knows how to hang with the kids because they make allowances for him because he’s so rich he has all the cool toys. AMD’s like the dirty uncle who wants you to get in “at the ground floor” on “an exciting opportunity” if you’ll just excuse his bad breath and the creditors calling every hour or so he keeps dodging.

    Neither one is really all that awesome in reality, but they have their uses and their drawbacks. Neither one is “better” because they’re corporations. When AMD was the only GPU maker for a generation, they ripped you off (e.g., 7970). When AMD needs a review to go their way, they’re not above limiting what reviewers can say to put them in a favorable light (e.g., Piledriver).

    Intel does the same when they need to (e.g., the end of the Pentium III era/all of the Pentium 4-RAMBUS era).

    They both have advanced the industry, but lately AMD’s done not much more than stall and hope their money situation gets better by the end of the year. They’re treading water with their bundles, but that’ll only work for a little while before it’s seen for the desperation it is.

  16. Both your assumptions could be true. Kaveri could be compatible with socket FM2 because we haven’t seen the actual chip yet and the pin rearrangement could mean it’s a simple BIOS update to get it on. Using hUMA, Kaveri could also access two memory pools – the DDR3 system RAM and the GDDR5 RAM on the discrete graphics chips.

    So you can have your cake and eat it, in theory. All that’s left to do, is wait.

  17. If you put that ‘laptop’ on your lap, the only thing it will be steamrolling will be your balls.


  18. on the other hand, to answer to rootheday3, yes, intel was 1st. but amd having advertised the fusion platform so long before release makes me wonder if intel’s move was just to release such a product 1st.. we all know that intel being the huge company they are have the resources to make a drastic change.

    anyway, next year seems to be a really interesting one since amd seems to try to catch up to intel on the performance level. i surely hope they succeed cause it’s better for us.

    competition drives the prices down while (usually) everything else up

    Cheers !!!

  19. i don’t get why someone posted a negative on this one… weird… anyway, just looked it up in case i made a mistake. the 1st generation of the i3 had 2 separate dies (maybe using “die” confused someone here… oh well, i already apologized for my bad english)


  20. The battle of the mediocre that no self respecting gamer would use for their graphics needs.

  21. This started long, long ago, years before consumers thought it was possible that companies were thinking of these concepts. There was one pointer, though: the Xbox 360. It gave us the first APU in 2010 (the “Trinity” hardware revision) and had an early version of hUMA, dynamically splitting the 512MB of RAM between the GPU and CPU as needed in various games as far back as 2006.

    AMD saw the writing on the wall for integrated chips years before, and the reason they bought ATi was because ATi already had the expertise AMD was looking for, having produced the Xbox 360’s Xenos graphics core. Before then, they were the first to integrate the memory controller onto the processor, with the Athlon 64.

    So yeah, Bulldozer was all part of a master plan. It might not have been a very good idea given market conditions on its release, but it was necessary (Hey! Just like Windows Vista) to bring us where we are today.

  22. At comparable price points, Kaveri rapes Haswell front and back and kicks it into the gutter … but no dismemberment.

  23. A comparable price point would be Kaveri vs. I3-3220 straight up. That would be a violent rape followed by dismemberment.

  24. PS4 programmers are dancing in the aisles every day on the way to their workstations.

  25. I’m running an FX-8350 right now but honestly, I think I’d be perfectly happy with a Kaveri-based desktop. Augment that with a Kaveri-based laptop and you’d be Steamrollin’ all the way!

  26. [quote<][Edit: I wonder how many people got the irony of the "dancing in the aisles" and "blows X away in every dimension" lines. Some people need to review the history of the hypefest that occurred during the Barcelona development process.][/quote<] Those were probably [i<]ironic[/i<] downvotes. /sarcasm The scary thing is, very little separates an ironic post from a fanboi post except intent.

  27. I disagree with the other two replies on point #4. GPGPU/OpenCL applications are the ones that will benefit from this the most. Today’s games aren’t really OpenCL-aware (with the exception of some newcomers such as Tomb Raider), but other applications are. They will transparently benefit because data won’t have to be transferred around anymore.

    As more OpenCL-aware applications and games appear (especially thanks to AMD’s design wins in PS4 and Xbox 720) the difference will be even more pronounced, with AMD finally catching up to Intel’s raw x86 performance advantage or perhaps even overtaking them.

  28. Try reading the system guide: [url<][/url<] Completely dropping the GPU buys you a 20% price savings... on an already cheaper system.

  29. not really, they said the PS4 was using an AMD chip and that Sony “tweaked” it. For all we knew the shared memory was part of that tweaking, and AMD was still going with NUMA designs for their own products.

  30. This is better: essentially a memory hypervisor in metal to main memory… whatever it may be. Along with the volatile bit for GPU/CPU L2 sharing, this has potential.

  31. This discussion is about unified memory between CPU and GPU. I don’t think it matters what kind of CPU cores are used.

  32. This isn’t really a right or wrong thing. This is inevitable. The questions are: how much later from now? in what form? and who’s driving? This is the 64-bit question… so to speak. Same as before.

  33. Easier prediction: Ivy Bridges already on the market now + 2011 era GTX 560 DESTROYS Kaveri in gaming.

  34. I’m saying the 32MB cache is a relevant, closely related analog to what the OP was suggesting. I didn’t mean to suggest it is or is not stacked in that case.

  35. [quote<]To quote Aerosmith: It's the same ol', same ol' situaaaaation.[/quote<] Hey, hey, I bought a Phenom II right after SB was released. I'm doing my part, unlike you guys.

  36. Page tables are in… memory, and the OS side is driven by interrupts automatically, so it really doesn’t *have* to know what’s going on at that level. The rest is APU-side implementation. A lot of this is already being dealt with in one way or another just in order to deal with all the ‘PC’ features that exist now.

  37. I like your thinking on this one. Not to say Intel hasn’t broken quite a lot of ground on their own, but I have to agree that they have been less aggressive in their developments in recent years. To give them credit, there are apparently lots of complications with these lower process nodes.

    You might be interested in this thread on the forums:


  38. My guess is that there will be different Kaveri releases, one that’s socket and DDR3 compatible and another that uses GDDR5 exclusively and that the latter will most likely be in embedded systems only.

  39. ok, Intel released Sandy Bridge with the GPU integrated on the CPU die (Jan 2011) before Llano was released (~April 2011?) and at approximately the same time as Brazos (Jan 2011).

    For what it’s worth, other SoCs in the phone space had the GPU integrated much earlier. And Intel had a design with CPU, memory controller, and GPU all integrated on one die back in ~2000:
    [url<][/url<] Unfortunately, Timna was a victim of the RDRAM fiasco...

  40. 1. Kaveri is compatible with socket FM2!
    2. Kaveri directly accesses GDDR5!

    At least one of those points ain’t going to hold up in the real world. I’ve got some bad news for all you Intel haters out there: If AMD ships Kaveri that actually supports GDDR5, then you are either getting BGA’d (likely) or there’s going to be another incompatible socket for those chips. Commence the spin-cycle for how Broadwell-BGA is evil but Kaveri-BGA is a miracle in 3, 2, 1…..

  41. if i remember correctly it wasn’t embedded in the cpu die but rather sat beside it. 2 different dies in a “mother die”

    something like the 1st athlon 64 which was 2*x32 cores with some registers to operate some x64 orders…

  42. [quote<]Hopefully the anti-trust regulators will do their jobs and destroy all of Intel's inventory and "fabrication" facilities where they make their "fabricated" fake-chips so that consumers finally have the freedom to buy the AMD products that they truly want.[/quote<] Because nothing else spells better "freedom to buy" than only one choice (in this case AMD, since Intel will be no more) I hope you were joking, or at least you were a little drunk.

  43. Holy TLB nightmare, Batman!

    The TLB in the CPU and the TLB in the GPU will have to be kept in sync (let’s assume only the CPU can actually change it), and the CPU will have to take page faults for the GPU when it references memory with a missing TLB entry. How is that going to be done without OS support? Maybe they’ll include a dedicated chunk of table-walking logic in the GPU side of things so that’s not needed, but, yikes. TLB maintenance is already one of those parts of OS design that has big headers warning “IF YOU DON’T KNOW EXACTLY WHAT YOU’RE DOING, DO *NOT* EDIT THIS CODE!!!111eleventy!”

    Please let them have correctable firmware for that little chunk of logic, because it’s going to have bugs.

  44. AMD could be wrong again, but as of today AMD puts Kaveri at Q4 2013.
    AMD positions Richland as a less-than-a-year stopgap between Trinity and Kaveri.

    Haswell will make sure nothing changes (keeping AMD down in the low-margin market).
    AMD will have a hard time with Kaveri unless it wins some ‘gaming’ tablet-type mind share.

  45. [quote<]amd put the gpu in the cpu die and again noone believed they could make something. now intel follows.[/quote<] It's my understanding that the first Intel Core i3/i5/i7 processors had an IGP launched Q1 2010 (first "Intel HD Graphics") whereas Llano wasn't launched until Q1 2011.

  46. But they also pointed out that the PS4 was going to use Jaguar cores, not the Steamroller cores used in Kaveri. So that revelation has no bearing on this discussion.

  47. Seriously doubt Kaveri is launching that soon. Why would they take all the time to do Richland, then release Kaveri a few months later? I doubt Kaveri will be out in 2013.

    I want to see kaveri in action and I think I will like what I see but intel’s haswell will also have the unified memory structure (but no GDDR5).

    I’m also afraid that we may initially see some security issues if the gpu can access cpu memory.

  48. They already did. When the PS4 specs were released they explicitly pointed out that the 8GB of GDDR5 onboard would be shared by the CPU and GPU.

  49. [quote<]Along with CPU dedicated ddr3 slots also.[/quote<] Which would break the whole point of unified memory. I guess they could make DDR3 and GDDR5 have a unified address space in theory, but managing it would be hard (uneven latency etc) and I don't think the APUs are that capable yet. That's why I'm very eager to see what CPU performance impact GDDR has.

  50. i never meant that. my post was totally on the “comment section” of amd doing everything they can to survive and stay competitive.

    i DO know that it’s not our comments that have put amd in the situation it now is. but reading comments on other forums too, everyone puts them down forgetting that it’s them that are risky and truly innovate. it’s them who help to drive the market to somewhere.

    and of course, i understand that their mistakes + other stuff has put them to the current position. but if it weren’t for Athlon 64, we might not have Windows 7 x64. like this move with the apu. who knows where the market will be headed after 3-4 years. but they do have a vision (or a dream) and they try to get there. intel makes really fast cpus and that’s it. they may innovate in other areas (trigate or whatever) but do they truly innovate to drive the market? i would expect more from the cpu giant intel is.

    oh well

    Cheers !!!

  51. Doubt Kaveri sports 7000 class GCN architecture when it is on the same engineering/fabrication timeline as the PS4 APU which uses 8000 class GCN architecture – GCN 2.0.

    Kaveri is first and foremost about gaming, which is the key to AMD’s broader HSA effort and is the primary realizable differentiator from Intel. The obvious route there would have been to have a Kaveri modification team running parallel with the PS4 and Xbox 720 teams, optimizing Kaveri to take fullest advantage of the console architectures to increase its gaming chops. Hence Richland and the delay to the end of 2013, which also happens to be concurrent with the release of the PS4 and Xbox 720.

    So the PS4 and Xbox 720 release and concurrently the PC versions of an array of next gen games release and … Voila! … AMD releases Kaveri which has turned into a synergistic gaming monster and starts chowing down on Intel’s gaming market share. Along with Kaveri AMD releases the 8xxx cards, which are now additive to Kaveri’s GPU and AMD starts chowing down on Nvidia’s market share.

    And a huge and growing pool of programmers get more and more experienced with HSA in general and AMD APU HSA in particular that will substantially help the broader HSA effort into the future.

    HSA has been the missing piece of the Bulldozer architecture. With Kaveri we’re going to finally see it all come together. Gonna be fun.

  52. It would be nice to have GDDR5 on sticks instead of having the GDDR5 soldered onto the MB. It could have a dedicated GDDR5 memory slot or slots for GPU DIMMs, along with CPU-dedicated DDR3 slots.
    I wonder if having both types of dedicated memory installed on the board at the same time would supercharge memory bandwidth and overall system performance.
    I think removing the memory bottleneck of the current AMD APUs would give Kaveri 7770 performance or more,
    making it ideal for micro-ATX systems that can game great at 1080p and make fantastic HTPCs.

    I am really looking forward to seeing IGP graphics performance and GPGPU performance, as well as CPU performance if GDDR5 memory is used with the CPU, with comparison figures against DDR3-1866 memory.

  53. You’ve clearly been reading too much S|A :). At least, to the extent that S|A still allows it.

  54. [quote<]If AMD follows what they've done in the last few years the feature would stay unused until Intel launches the feature on their CPUs along with a decent compiler.[/quote<] Sad, but funny because it's true. It's always perplexed me why AMD isn't more pro-active about this. They may not be able to change proprietary software without dropping lots of cash, which they don't have, but they could work with the FOSS community better. GCC and LLVM are open source compilers, and they could sponsor someone to write patches and get them integrated upstream.

  55. I would like to see people going Ballmer at AMD. On a more large-scale version of frenzy than just Ballmer, of course.

    For the sake of the joke, shouldn’t we be calling the current lineup an overbuilt P3? A few more spelling errors might help too.

  56. The best part is that GF doesn’t even have a 14nm node on the roadmap. They have 20nm-SHP and 10nm-SHP. 14nm-XM is not suitable for desktop or laptop AMD CPUs; it’s for mobile.

  57. [quote<]Also, the chip will be fabbed on a 28-nm process, finer than the 32-nm process used to manufacture today's A-series APUs (a.k.a. Trinity and Richland).[/quote<] Oh Yeah! Here it comes! Where are Intel's products at 28-nm? Not seeing any? That's because Intel doesn't have the sheer manufacturing power of GloFo (which is a wholly-owned subsidiary of AMD and is devoted to serving AMD's needs exclusively). Intel isn't even in the same league as GloFo, that's for sure!

  58. I was down at the AMD HQ the other day and they were dancing in the aisles over the latest stepping of Kaveri… we’re calling it the Miracle Stepping since Kaveri now has the full APU experience with 8000 series graphics and the ultimate in Steamroller core technology. Kaveri blows Broadfail and Skyfail away in every dimension.

    You see, AMD is at least 20 years ahead of Intel in real innovation right now. Kaveri’s integrated design blows away Hasbeen’s joke of a 1990’s era GPU that is superglued to a poorly overclocked P4. Do you know that Intel won’t even have products that use DDR4 out until 2015? Kaveri is already shipping in products that use DDR5 [i<]today[/i<]. The game is over and AMD has won. Hopefully the anti-trust regulators will do their jobs and destroy all of Intel's inventory and "fabrication" facilities where they make their "fabricated" fake-chips so that consumers finally have the freedom to buy the AMD products that they truly want. [Edit: I wonder how many people got the irony of the "dancing in the aisles" and "blows X away in every dimension" lines. Some people need to review the history of the hypefest that occurred during the Barcelona development process.]

  59. Either is probably possible. I would guess the chips would be soldered on the board, but I’m sure a company like Sony or Microsoft could pay a manufacturer to make some GDDR DIMMs.

  60. [quote<]but if x matches intel's counterpart, hell yeah, i will buy one.[/quote<] To quote Aerosmith: It's the same ol', same ol' situaaaaation. AMD has been running behind for almost 7 years now since Core 2 popped out of the evil evil bad Intel labs. It's not lack of faith, belief, following, or positive thought that hurts AMD. It's the lack of compelling processors.

  61. Bulldozer was heavily biased toward server workloads, so the weak floating-point capability wasn’t a detriment in that space. The idea of having the IGP take over floating-point duties may have been there, but at the time AMD made a conscious decision to focus on the needs of the server market instead of the consumer market. Remember, they said they were ceding the extreme-performance desktop market to Intel at that time.

  62. [quote<]"people will be able to build APUs with either type of memory [DDR or GDDR] and then share any type of memory between the different processing cores on the APU."[/quote<] How will that work, will they sell GDDR5 memory on sticks? Or is it more of an integrated into the board thing? In which case, can you still upgrade the memory somehow? I'm eager to see that tested, I wonder if GDDR5s latency will hurt the CPU, or if the bandwidth will help more.

  63. I expect that’s true of most people that read this site. We might be happy with an APU in a tablet or a smart phone but it’s likely everyone reading the site wants more performance than an APU can currently provide for their laptop or desktop PC. However there’s a huge market out there that would be perfectly happy with the performance of an APU and that’s what AMD is after.

    They’ve said they are not so interested in pushing products for the enthusiast now as it’s a limited market. Given their size and monetary issues I don’t blame them. The margins may be less in the mass market but the potential revenue is much greater. Of course Intel sees this too and is why they are pushing hard on their low TDP designs.

    In the end we can all only win as AMD and Intel compete. (At least I hope AMD continues to produce products that can compete.)

  64. Oh don’t be such snobs. Obviously these slides are being seen by a broader audience, and I’m sure they anticipated that. There’s nothing wrong with trying to make technical information a bit more accessible to a broader audience!

  65. This is both awesome and not awesome at all. I really like that AMD is making their APUs good, but I have very little interest in desktop ones. The only way I would be interested in buying one is if I build a computer for someone who plays casual games.

    What I am looking forward to, though, is seeing this in whatever APU AMD makes after Temash. AMD has great low-power laptop APUs, and I am looking forward to seeing them keep making progress with them.

  66. (hello everyone. i’ve been reading TR for more than a decade but only now i decided to make an account to post a comment.
    P.S. excuse my English. i’m Greek so, as you understand, English is not my mother tongue… 🙂 )

    so, please, correct me if i’m wrong here.

    amd started so many things that we tend to forget. and the bad thing is that, while amd is bombarded with non-believer comments, they still drive the industry forward.

    amd went to x64 and while no one believed/followed, we now have 64bit windows/cpus/apps.
    amd put the gpu in the cpu die and again no one believed they could make something. now intel follows.
    amd keeps the cpu sockets for far longer than intel and no one thinks positively.
    now we’re talking about gpu/cpu unified memory pools and again we’re all complaining about ATi buyout and late in market products (see kaveri).

    but we all forget the difficult times amd had when i.e. there were rumors that they were going out of the cpu market (thank goodness they didn’t)

    i don’t think that making such a product is easy, nor does it take so little time to release. i just hope that amd will be able to push forward and give intel a hard time cause if it fails, we may end up buying a cpu for 1000 euros that is a bit faster than haswell.

    I’m no AMD fanboy. I have an Intel CPU. But if Steamroller matches Intel’s counterpart, hell yeah, I will buy one.

    Anyway, sorry for the long post.

    Cheers !!!

  67. 1) Drop some die area for stack- and SIMD-based units. 2) Let the “GPU” operate in the same memory domain. 3) Disseminate from the bottom to the top of the product stack. 4) Profit.

  68. [quote<]Does it make me a sandwich?[/quote<] That depends on what you want in the sandwich. 🙂

  69. The eventual long-term goal for AMD is to create a hybrid microprocessor that has GPU elements integrated at an architectural level: a microprocessor that seamlessly and dynamically adjusts itself based on workload in order to get the best of both the GPGPU and CPU worlds in one convenient solution. They’ll achieve that long-term goal in 2015/2016-ish. (Presuming nothing majorly bad happens!)

    Right now, the GPU-based IGP and CPU are still in distinct segments on the APU silicon. This uniform memory architecture is just one of the stages on getting to where they want to go.

  70. Late 2013, and that’s the best the ATI+AMD ‘merger’ was able to accomplish toward Fusion?

    Disappointing, but finally a step forward.

    I recall a few developers asking for a simple version of this over 10 years ago.
    Back then the memory pool, even though unified on IGP motherboards, was mapped via two buses and required AGP arbitration.
    So having the CPU/GPU work on a block of memory required an AGP bus transaction, even though it was on the same physical DDR DIMM.
    This could have easily been solved back then, but nobody cared about the CPU and GPU working on the same data set.
    How painful was it? ~10 megabytes a second.
    Well, that was until TR exposed this in 2002, and Nvidia and ATI turned on AGP DMA for a 10x to 20x speedup for sharing data between the CPU and GPU.

    What a long, long road this has been.

  71. I wonder how far ahead they plan this stuff. Was Bulldozer designed with such a lopsided integer-to-floating-point capability because they intended to marry it with some Radeon cores and unified memory from the start?

  72. AMD does not provide drivers for this but you can probably find third party drivers that will allow this APU to make you a sandwich.

  73. The pointer thing is the only thing I could see. I couldn’t get past it. Pretty funny.

  74. 1. No, it’s in hardware; the BIOS has to know about it, but it would have had to anyway.
    2-3. See 1.
    4. Likely, unless bottlenecked by the CPU cores or IGP.
    5. They’ll come when the hardware is available. The cores have AVX/AVX2, and the IGP can help. My 2 cents is that everything hinges on the compiler. If AMD follows what they’ve done in the last few years, the feature will stay unused until Intel launches it on their CPUs along with a decent compiler. Hope I’m wrong.

  75. Too many pins, and DMA exists already. This doesn’t solve the coherency and address-space problem, which is what this change is about.

    As for going faster: like the pins, it’s largely a physical problem, and it’s why GDDR gets soldered. Sockets have issues. You could conceivably solder some GDDR, but you would still have mammoth physical issues with the pins. The current direction is stacked die/via memory, which is sort of the opposite of dealing with large, parallel, socketed high-speed interfaces. Memory sticks are the last of their kind in a ‘PC’.

    Smaller stacked memory in-package is like what you are talking about, and sort of like the next Xbox’s approach. The hUMA part is sort of like the PS4. In reality it’s à la carte: you can do both at the same time, and that’s probably what you will see in the end. 🙂 The gating factor right now is getting the testing and packaging economical.

    So yes. Just not now, and not on the board.

  76. [quote<]Current APUs have non-uniform memory access (NUMA) between the processor and graphics logic. In those solutions, the CPU cores and IGP are both tied to system memory, but they each have their own separate memory pools. The processor cores must jump through hoops to access memory being used by the graphics hardware, and vice versa. Different heaps and different address spaces are involved, and when data needs to be shared, it has to be copied back and forth between the CPU and IGP pools. There is, as you'd expect, a performance cost to all those intermediate steps.[/quote<] Question: why not add another RAM slot on the motherboard that would be used only by the IGP? So instead of 2 RAM slots on small-form-factor mobos, we'd get 3, and 5 on bigger ones. Wouldn't they both (CPU and IGP) be faster with dedicated RAM compared to shared RAM? Or would the performance be about the same, except for cases where you'd run out of physical RAM with the shared solution?

  77. So, all along they’ve been talking about unifying the CPU and GPU. This doesn’t really make them one, but at least now they’re more able to work with one pool of memory. And it’s a simple and elegant solution: pointers, that is.

    Personally I’m not sure exactly how they’re going to completely unify the CPU and GPU into one truly hybrid core. And more importantly, is it going to dramatically improve performance or energy efficiency? Or cost? To me, this is about as far as combining CPU and GPU can go. Then again, I’m not Mark Papermaster.

  78. 1. No. It breaks too many things at once. Pretty much everything currently copies bitmaps, including fallback VESA drivers on Windows.

    2. Blue would not be late enough to risk explicit support the way explicit GPU drivers would. Other support may be enhanced peripherally for the CPU(s).

    3. Same for ‘1.’ for Linux. Needs to boot. Needs to support bulk of existing software. Driver updates are a different issue.

    4. In a blissfully unaware way, like the limited optimizations that already exist in previous APUs. This dramatically changes the game for the driver and friends; i.e., DMA was fun, but this is fundamentally way better.

    5. No. But it’s like being able to read the same letter sitting on the table together rather than having to use the postal system. 🙂 As far as float and vector math go, there are plenty of good comparisons out there for throughput between them.

  79. Wow, technical briefing slides with a footnote that explains “pointer”? Oh…kay.

    Nothing really unexpected here, though if it’s true that the APU can be used with any combination of GDDR and DDR, we may see the return of “sideport” memory in a form that actually makes a difference, though I’d expect to see that more as a point of differentiation amongst various mobile configurations than as an open slot that end-users can upgrade.

  80. A few questions:
    1. Does this hUMA need an operating system that’s aware of (explicitly supports) the feature (does “any operating systems” mean current/existing ones too)?
    2. If yes, will Windows “Blue” contain the required enhancement(s)?
    3. Also, is the current Linux kernel already prepared for this/are patches already being put in (if yes to question #1)?
    4. Will it have any impact on existing applications? Will current games benefit from video driver-side updates/tweaks (with this hUMA)?
    5. Any performance estimates? A comparison between calculation on the CPU’s FPU+AVX/AVX2 and the APU’s integrated GPU+hUMA would be nice.

    BTW, I’m a bit disappointed that the imminent “Temash” and “Kabini” don’t have this feature, considering they use the same “Jaguar” cores as the PS4, which has unified memory (between CPU & GPU). Yes, the PS4 mostly implements its unified memory using AMD’s hUMA technique.


  81. So this is essentially what’s going on with the PS4? Why didn’t they say so in the first place?