Llano CPU-GPU chimera begins sampling

That Llano processor with its four cores and integrated GPU may arrive sooner than you think. When AMD revealed its latest desktop and mobile roadmaps last November, Llano's arrival was set for some time in 2011. During AMD's latest financial conference call, however, AMD CEO Dirk Meyer provided more specific information. X-bit labs jotted down Meyer's statement:

We plan to commence volume production in the back half of this year. We do now have internal samples of both of our initial Fusion designs, we are learning quite a lot, and are quite happy with what we see, and we started sampling to select customers, one of those two designs.

Perhaps those "select customers" include Apple, which is now rumored to have AMD chips kicking around in its labs. Llano seems like a prime candidate for integration in future Macs.

As we learned in February, Llano will feature four Phenom II-derived cores clocked above 3GHz, 4MB of L2 cache, a graphics processor, and a DDR3 memory controller, all sharing the same slice of 32-nm silicon—and manufactured using GlobalFoundries’ silicon-on-insulator process technology, of course. Considering AMD’s Turbo Core dynamic clock scaling scheme will soon debut in six-core Phenom IIs, we wouldn’t be surprised if Llano included a similar feature.

Comments closed
    • NeelyCam
    • 10 years ago

    Chimera had three heads.

      • UberGerbil
      • 10 years ago

      In this case it’s a […]

        • Sahrin
        • 10 years ago

        A better list would be:

        GPU, CPU and Northbridge/Uncore.

        And given those three elements, I think it *certainly* qualifies.

    • ronch
    • 10 years ago

    Any word on how good the GPU on this thing is? GPUs normally run around 700MHz, and the CPU cores are probably gonna run at around 3GHz. Will this be a mixed clock device? Or will the GPU run at 3GHz too? Any word on how many stream processors the GPU will have?

      • MadManOriginal
      • 10 years ago

      If the GPU ran at 3GHz that would be f*ing insane…but very cool. I’d think it would have problems with memory bandwidth at that speed though.

        • UberGerbil
        • 10 years ago

        It’s going to be limited by memory no matter what, if it doesn’t have something like sideport (but better).

      • Anonymous Coward
      • 10 years ago

      All these modern chips with on-die memory controllers and L3 caches are already mixed-clock devices. The cores run at all sorts of speeds, independent of each other and of the L3 cache and the memory controller.
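
      You can see this on any recent multi-core box. Here's a minimal sketch (assuming a Linux system with the cpufreq sysfs interface; paths and driver support vary by machine) that prints each core's current clock so you can watch them differ:

      /* Print each core's current clock via the Linux cpufreq sysfs files.
       * Illustrative only: assumes /sys/devices/system/cpu/cpuN/cpufreq
       * exists, which depends on the kernel and driver in use. */
      #include <stdio.h>

      int main(void)
      {
          char path[128];
          for (int cpu = 0; cpu < 64; cpu++) {
              snprintf(path, sizeof(path),
                       "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq",
                       cpu);
              FILE *f = fopen(path, "r");
              if (!f)
                  break;  /* no more CPUs, or no cpufreq support */
              long khz;
              if (fscanf(f, "%ld", &khz) == 1)
                  printf("cpu%d: %.2f GHz\n", cpu, khz / 1e6);
              fclose(f);
          }
          return 0;
      }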

    • NeelyCam
    • 10 years ago

    Is the graphics core any good?

      • jdaven
      • 10 years ago

      It has been said that Llano’s fastest spec is on par with the Radeon HD 5670, since it has 400 SPs.

        • NeelyCam
        • 10 years ago

        I’d call that fairly sufficient… As long as it doesn’t suck too much juice, it looks like a winner.

        I can’t wait to see what intel has up its sleeve to battle this baby. They must know how formidable Llano will be…

        • BlackStar
        • 10 years ago

        It will probably be slower due to lack of dedicated memory (unless we get some sideport memory). Still, that’s a couple of orders of magnitude faster than Intel’s offerings, plus it’s going to have better drivers to boot (OpenGL 4.0, D3D11, OpenCL). Even if the CPU cores were 20% slower than Intel’s offerings (doubtful), that’s a net win in the end.

        Llano is shaping up into a formidable combo.

        Not to mention that 3d developers everywhere would rejoice. Intel’s IGPs are our personal nightmares!

        • bimmerlovere39
        • 10 years ago

        So that’s roughly 4770 performance. Wow.

          • OneArmedScissor
          • 10 years ago

          Whoa there! Don’t get ahead of yourself. The 4770 has 640 SPs at a relatively high clock speed, and like 60GB/s of bandwidth.

          That is literally impossible for an integrated GPU at this point.

        • MuParadigm
        • 10 years ago

        It’ll probably be running on DDR3, though, which would give it something like low-end 5570 performance.
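
        For rough context, some back-of-the-envelope peak-bandwidth math (our assumed figures, not AMD specs: dual-channel DDR3-1333 for the APU, versus the 4770's 800MHz GDDR5 on a 128-bit bus):

        /* Peak theoretical bandwidth = transfer rate x bus width x channels.
         * The DDR3 figure is also shared with the CPU cores, which makes
         * the real gap even bigger in practice. */
        #include <stdio.h>

        int main(void)
        {
            double ddr3  = 1333e6 * 8 * 2 / 1e9; /* dual-channel DDR3-1333, 64-bit/channel */
            double gddr5 = 3200e6 * 16 / 1e9;    /* HD 4770: 3.2GT/s GDDR5, 128-bit bus */

            printf("Dual-channel DDR3-1333: ~%.1f GB/s\n", ddr3);  /* ~21.3 */
            printf("HD 4770 GDDR5:          ~%.1f GB/s\n", gddr5); /* ~51.2 */
            return 0;
        }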

      • wiak
      • 10 years ago

      Miles ahead of all Intel integrated crapstics.
      Llano is basically 4 CPU cores alongside 400 DX11 GPU shaders;
      Intel puts a CPU and GPU in the same package, but not truly together.

      Why does this look like a winner? Apple supports OpenCL, and we know AMD does too.

        • maroon1
        • 10 years ago

        Sandy Bridge’s CPU and GPU are going to be on one die, and the L3 cache will also be shared with the integrated graphics core.

          • Jigar
          • 10 years ago

          Still 1 km ahead of it. 😛

          • mesyn191
          • 10 years ago

          Yes, lots of on-die integration is impressive in its own way. That, however, doesn’t mean that GPU performance will be anywhere near what Llano’s is rumored to be (i.e., near 4750 or 4770 performance).

          If all Intel does is update their current GPU with a clock speed bump and/or a bandwidth increase they won’t get anywhere near Llano’s performance in games or GPGPU apps.

          They’ll probably still win soundly on CPU performance, but a moderately clocked quad core PhII is no slouch either. Unless they screw up somehow AMD looks to have a real solid winning platform on their hands.

            • Meadows
            • 10 years ago

            Those rumours are wrong, it’ll use the equivalent of GDDR2 memory and even have to share it with a CPU.

            • mesyn191
            • 10 years ago

            My understanding is that the CPU doesn’t really need all that much bandwidth most of the time, so the GPU will be able to use most of it.

            There are plenty of laptop GPUs that use DDR3 as VRAM and still get decent performance (in 128-bit configurations, too), so I’m not too worried.

            • Meadows
            • 10 years ago

            Sure it might perform, just not where these hopeful people are expecting.

    • esterhasz
    • 10 years ago

    true, latency is a problem but getting the GPU on die and some smart caching will help. IMO, the real problem is getting compilers and dev tools to the point where complexity is sufficiently abstracted without losing too much speed…

    • flip-mode
    • 10 years ago

    AMD is getting back into a stride of sorts. It is so nice to hear good rather than sour news about AMD.

      • sschaem
      • 10 years ago

      AMD is really ‘milking’ this K8 architecture until it’s dry and crusty…

      If Fusion is not out next year, AMD is done for in the x86 business.

        • StashTheVampede
        • 10 years ago

        Phenom isn’t directly K8. K8 was very successful for AMD, but they’ve made several upgrades over that tech to get to the current Phenom II core.

        Fusion will take the very mature Phenom II tech and pair it with a GPU. On a separate track, they’ll be working on Bulldozer, which is all new. At some point, Bulldozer will be mature enough to pair with a GPU as well.

        And what’s wrong with milking their R&D? Plenty of other companies do it.

          • Goty
          • 10 years ago

          The really amazing thing is that the execution units of all of AMD’s processors are still very similar to those used in the AthlonXP. Talk about a great design, even if performance isn’t exactly bleeding edge anymore.

            • mesyn191
            • 10 years ago

            More like original Athlon, not AthlonXP.

            K11 or whatever it’s named is supposed to be their first truly new architecture in a very, very long time. Hopefully they haven’t screwed the pooch again. We won’t know until very late 2010 or early 2011, though.

            • Sahrin
            • 10 years ago

            You guys have no idea what you’re talking about. K10 has the same execution units as K7 in the same way that Nehalem has the same execution units as the 486. Which is to say, not at all.

            • Chryx
            • 10 years ago

            i7’s design can be traced back to Pentium Pro
            Phenom II’s design can be traced back to K7

            Both have been heavily reworked, but there are definite familial resemblances. Unlike, say, the Pentium 4, which is a drastically different beast.

            • ronch
            • 10 years ago

            How can you say K10 is not an evolution of K7? The original K7 had three ALUs with an AGU for each, 3 FPUs, 3 decoders, etc. In fact, as AMD was designing the original K8 many years ago, that design was scrapped in favor of an evolved K7 architecture, and so the present K8 was born. K8 was basically a K7 with HyperTransport, an IMC, more instructions hard-wired/fast-path, extended pipelines (K7 had 10 stages, K8 had 12, for integer execution at least), provisions for dual-core computing, beefed-up branch prediction, and most importantly, 64-bit capabilities. If you look at a silicon die of K8, it is actually very similar to K7. In fact, some reviewers back in 2003 were kinda shocked to see how similar K8 was to K7.

            Now with the K10, AMD actually did less to evolve the K8 than it did going from K7 to K8. Fetch operations were just widened, branch prediction was beefed up further, some more instructions are now fast path, some new power management features were added, the FPU data paths were widened (the old FPU data path and the new one look quite similar, actually. Copy and paste?), etc.
            And if you look at a K8 core and a K10 core, again, they look very similar.

            From K7 to K10.5, the cores look very similar. In fact, the same 64K + 64K L1 caches are still there, the FPUs are still where they were 11 years ago, the decode logic, ROB, reservation stations, branch logic, AGUs, etc. are still where they pretty much were back in 1999.

            I’m not saying this is a bad thing. In fact, I’m amazed at how far AMD can push its architecture. It really is a great architecture. How much performance delta is there between an Athlon 500 and a Phenom II X4 965? Even Intel must be amazed. But my concern here is that AMD has stuck too long with this architecture. They haven’t been exercising their brains for too long in coming up with newer, more innovative architectures. Maybe Bulldozer is gonna prove me wrong, but they constantly have to prove that they have the engineering prowess to match Intel. They have to have a better tick-tock strategy, not just their new Velocity stuff where they promise new GPU architectures glued to their CPUs every year. That’s cool if you can tap the GPU for general-purpose computing, but for graphics functions, it’s not that amazing, honestly.

            Going back, yes, AMD has milked the original K7 architecture for too long. It’s time to move on. I just hope Bulldozer won’t be milked till 2022 or something.

            • Sahrin
            • 10 years ago

            “How can you say K10 is not an evolution of K7?”

            I don’t think you can, which is why I didn’t say that. I was pointing out that saying that the K10 is ‘the same as’ K7 is the same as saying that Nehalem is the same as P6/486. (I should’ve said Pentium Pro/P6 instead of 486, though).

            • ronch
            • 10 years ago

            Oh, ok. Yeah, Pentium Pro/P6 would’ve been much more appropriate.

            No harm done. 😀

            • Anonymous Coward
            • 10 years ago

            I think there is a lot more validity in equating K10 to K7 than there is equating Nehalem to P6.

            • mesyn191
            • 10 years ago

            Uh, the post I replied to said “similar”; while I didn’t use that word, I also didn’t say “exactly the same” either. I was just pointing out that the Phenom II’s lineage went back further than “just” the Athlon XP.

        • Corrado
        • 10 years ago

        The same could be said for Intel. They really milked that P3 core.

          • cygnus1
          • 10 years ago

          exactly, p pro, p2, p3, p m, core, core 2, core i

          if that’s not milking an arch, i don’t know what is

        • Xenolith
        • 10 years ago

        Actually x86 is getting dry/crusty.

          • shank15217
          • 10 years ago

          Funny, it’s also becoming the fastest and the de facto standard.

            • khands
            • 10 years ago

            It’s reaching a lot of limitations, actually; x64 looks like the future.

        • SomeOtherGeek
        • 10 years ago

        I don’t get it… Why fix something that isn’t broken?

          • OneArmedScissor
          • 10 years ago

          Because then you can spend $1,000 to show off screen shots of a benchmark that dates back to 1995 going 5% faster than everyone else and feel justified.

            • SomeOtherGeek
            • 10 years ago

            LOL, yea true. I just think that people should just be happy if something works.

        • blubje
        • 10 years ago

        like Intel is milking their P6 core?

        • alwayssts
        • 10 years ago

        I think milking it is an apt term. Even with the slight architectural differences that have been incorporated into products since K8, they’re still fairly similar.

        I don’t think it’s crazy to essentially call Llano an Athlon X4 with 1MB L2 per core, or an Ontario an X2 with 512k per core. Sure they’ll have the turbo clocking and other minor improvements, but that’s essentially the gist. What’s more interesting is that you COULD call bulldozer more-or-less two K8 cores melded together although one core can use all available ops; so K8 does more-or-less live on, although granted with a different cache structure. Instead of 2 individual cores with 1MB L2 cache each like K8-Llano, you have one module with twice the ops sharing 2MB of L2 cache that can be load-balanced. While awesome, it’s still similar.

        Fusion is supposed to be refreshed every 12 months. In early 2012 it seems likely we’ll see a Fusion product mirroring the lower-end bulldozer product (2 modules, 4 ‘cores’), just like Llano is to Athlon, and later an Ontario product with one such module. 28nm should be in full swing by then…

        22nm should mirror what happens with 28nm a year later, so in 2013 it would make sense for the process to repeat with another doubling, just as it likely will on the desktop, with the Fusion products getting what the old discrete products had and using the ‘extra’ space from the new node for GPU transistors. It’s a trickle-down tick-tock, with no real overlap because the replaced products then contain a GPU.

        Flame me for theory, but it’s ‘informed’ theory.

        At any rate, depending on how you define low-end, we probably won’t see the end of the K7/K8 architecture until the successor to Ontario, sometime in 2012, and that’s IF you don’t define bulldozer as a carry-over of K8. That’s a damn long life.

        • alwayssts
        • 10 years ago

        Sorry, dbl post.

        • sschaem
        • 10 years ago

        My comment relates to the fact that the IPC of a 90nm K8 at 3GHz is no worse than that of a 45nm 3GHz K10 (a tweaked K8).
        Aside from updates to the SSE instruction set (not considered an architectural change), things haven’t moved much in the past 6 years.
        IPC is stagnant on the AMD front.

        Intel in the same time frame moved from P4 to Core 2 to Core i7.
        The IPC of the Core i7 is much, much higher than the P4’s
        (especially on the SSE front).

        Intel played catch-up when AMD released a 64-bit processor with an integrated memory controller, but the roles have switched and the gap is widening. That’s what concerns me if Fusion/Bulldozer is a flop.
        Intel already gets significantly more performance per transistor, better IPC, and higher clocks.
        If Bulldozer doesn’t close the gap, I think AMD’s margins are going down, way down. AMD can’t afford to be 3+ years behind.

        So milking something is fine if the competition is weak or years behind…

        The Amiga died because of ‘milking’ a great product; I don’t want to see AMD follow.

        All that to say, Fusion’s success (Bulldozer, not Llano) is a huge deal;
        AMD has exhausted its early-2000s lead.

        BTW, I’m still hopeful for AMD (but mainly because of ATI’s expertise).

        They could probably survive by selling their CPU business, since it brings in like a third of their revenue and is the most profitable…

        AMD would then become ATI…
        And maybe Arab.Technology.Investment would acquire AMD’s CPU business.

          • oMa
          • 10 years ago

          AMD has promised 80% of the performance of a dual-core K10.5 from 50% more transistors than a single-core K10.5. Bulldozer is going to increase performance per transistor, but whether total performance will be better is unknown.
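
          Working that claim through (assuming a dual-core K10.5 delivers roughly twice the throughput of a single core, which is our simplification, not AMD's):

          /* AMD's claim: ~80% of dual-core K10.5 throughput from a module
           * with ~50% more transistors than one K10.5 core. */
          #include <stdio.h>

          int main(void)
          {
              double perf  = 0.80 * 2.0; /* module throughput, in single-core units */
              double xtors = 1.50;       /* module size, in single-core units */
              printf("throughput per transistor: %.2fx a single K10.5 core\n",
                     perf / xtors);      /* ~1.07x: a modest gain, as noted above */
              return 0;
          }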

          • kc77
          • 10 years ago

          That’s not true at all. If you think a K8 is clock-for-clock similar to a K10 with 6MB of L3 cache, you’re greatly mistaken. They are not. K8s are essentially Athlon IIs, which perform considerably slower than a Phenom.

        • Anonymous Coward
        • 10 years ago

        Not everything which hasn’t changed recently is bad.

    • Shining Arcanine
    • 10 years ago

    If AMD is able to manufacture 32-nm parts at GlobalFoundries, what is keeping ATI from manufacturing Northern Islands there?

      • BlackStar
      • 10 years ago

      [deleted- oops!]

      • BlackStar
      • 10 years ago

      The fact that each design is tailored specifically for each foundry. Add the fact that chip design is a 2-4 year process and it’s obvious that GF won’t be producing Ati chips before 2011 (or 2012-13, more likely).

        • flip-mode
        • 10 years ago

        Yeah, don’t chip fabbers supply chip designers with process node libraries? I think they’re pretty analogous to libraries a software developer uses with any particular programming language. Switching to a new process node is like switching to a new programming language, so you need to use the corresponding library.

      • ztrand
      • 10 years ago

      Probably because NI started development 2+ years ago, when GF was an unknown entity. A design is tailor-made for a specific foundry process and can’t just be moved over to another one.

      I suspect the time and work required make it not worth the effort. Better to go with plan B (Southern Islands) and switch to GF as soon as possible for future designs that won’t be impacted as much.

      edit: d’oh! beaten by 2 minutes….

      • Voldenuit
      • 10 years ago

      What others have said. Also, the pure GPU parts have been on bulk silicon, not SOI, and GF has also cancelled its 32nm bulk node, just like TSMC. To make matters worse, GF and TSMC have opposite implementations of 28nm (gate-first and gate-last), so designs will not be transferable across the two.

      #1, OpenCL and APUs are probably why AMD is focusing on increasing Bulldozer’s integer performance rather than FP, as they expect GPUs to do most of the heavy lifting in that space in the near future.

      We’re gonna co-processor like it’s 1986!
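
      For a taste of what that 1986-style co-processing looks like today, here's a minimal OpenCL sketch that pushes a floating-point SAXPY loop to whatever device the runtime offers first (on an APU, ideally the on-die GPU). Error handling and cleanup are omitted, and the build line is just an example:

      /* saxpy.c: build with e.g. gcc saxpy.c -lOpenCL */
      #include <stdio.h>
      #include <CL/cl.h>

      #define N 1024

      /* The FP kernel that runs on the device's stream processors. */
      static const char *src =
          "__kernel void saxpy(float a, __global const float *x,\n"
          "                    __global float *y) {\n"
          "    int i = get_global_id(0);\n"
          "    y[i] = a * x[i] + y[i];\n"
          "}\n";

      int main(void)
      {
          float x[N], y[N];
          for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

          /* Grab the first platform/device the runtime reports. */
          cl_platform_id plat;
          cl_device_id dev;
          clGetPlatformIDs(1, &plat, NULL);
          clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

          cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
          cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

          /* Build the kernel from source at runtime. */
          cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
          clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
          cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

          /* Copy the input arrays into device buffers. */
          cl_mem bx = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                     sizeof(x), x, NULL);
          cl_mem by = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                     sizeof(y), y, NULL);

          float a = 3.0f;
          clSetKernelArg(k, 0, sizeof(a), &a);
          clSetKernelArg(k, 1, sizeof(bx), &bx);
          clSetKernelArg(k, 2, sizeof(by), &by);

          /* Run N work-items, then read the result back. */
          size_t global = N;
          clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
          clEnqueueReadBuffer(q, by, CL_TRUE, 0, sizeof(y), y, 0, NULL, NULL);

          printf("y[0] = %f (expect 5.0)\n", y[0]);
          return 0;
      }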

        • khands
        • 10 years ago

        Haha, pretty much, although if they can get the gains they’re talking about, we could see real competition on the CPU front again.

      • TheEmrys
      • 10 years ago

      IIRC, for a spin-off to be legal a certain percentage of business must come from companies other than the former parent. Not sure where I read that.

        • Flying Fox
        • 10 years ago

        Huh? What does the legality of GF have to do with manufacturing?

        • Goty
        • 10 years ago

        That was a term of the cross-licensing agreement between Intel and AMD before AMD won its lawsuit, if I recall correctly.

      • JrezIN
      • 10 years ago

      …also SOI process is quite different from the “bulk” process used for GPUs…

      • wiak
      • 10 years ago

      It takes 3 years or something to make a GPU, and when something goes wrong, you do the only thing you can: stay on 40nm and wait for the next gen.

      A 32nm CPU process is different from a 32nm GPU process as far as I know, but not hugely so. There are some differences, and they cost a lot of millions and take time to implement.

    • esterhasz
    • 10 years ago

    hmm, if there is a global thermal envelope, decent power gating, etc. it may be interesting to include the GPU part in a dynamic switching scheme. in the long run, these hybrids could also map the fp part of SSE & similar instructions to the GPU part to save redundant circuitry…

      • sweatshopking
      • 10 years ago

      exactly what i was thinking.

      • jdaven
      • 10 years ago

      Very nice comment. Power and clock gating would allow the GPU to “act” like an integrated GPU when you don’t have a lot of graphics work or FP calculations and then ramp up to full power for heavy graphic/FP work similar to the performance of a discrete GPU. Maybe even switching on a sideport connection to dedicated memory in this “discrete” mode.

      • maxxcool
      • 10 years ago

      And that’s the point of Fusion. While it may not have the same IPC as an Intel part, they are going to beat Intel to the punch on this one by a long shot. Moreover, it will give them more time to tune their API and silicon to better use DirectCompute.

      I am not even remotely surprised that we have a new version of Windows for 2011… in the same time frame that AMD will release a true on-CPU GPU, with Intel right behind them. I suspect that 2011 and 2012 will be banner years for floating-point power. With the upcoming DirectCompute enhancements and eventual GPU pass-through NOT needing an API to route work to the GPU arithmetic units, I can see us getting some very serious floating-point horsepower in the next 18 months… regardless of whose chip you use, it’s gonna be kick arse.
