RWT goes inside Sandy Bridge’s graphics architecture

David Kanter over at Real World Tech has been producing a nice string of processor architecture articles this summer, and his latest may be one of the most interesting. He’s managed to wrangle a wealth of information about the graphics processor integrated into Intel’s Sandy Bridge CPUs, and he’s mapped out that IGP in quite a bit of detail.

Nearly everything one could want is there, from an exposition of the shader processing resources to a map of the ROP hardware, and David makes direct comparisons to the IGP in AMD’s Llano processor throughout to provide context. Crucially, he sorts out some of the rather confusing terminology differences between AMD, Nvidia, and Intel, so you’ll end the article with a better sense of how the Sandy Bridge IGP’s "12 EUs" match up to the seemingly vastly more numerous "shader cores" in the Llano IGP. He addresses with clear eyes the cases where the Intel IGP is deficient—support for DirectX 11 and OpenCL, for instance—and where it leads the industry—integration on a common ring and smart sharing of the last-level cache. Recommended reading.

Comments closed
    • lilbuddhaman
    • 8 years ago

    Offtopic:
    When will we be able to use the GPU in Sandy for secondary processing along with our dedicated cards? Such as how you can use a 2nd (or 3rd) Nvidia card strictly for PhysX?

      • mesyn191
      • 8 years ago

      For games? Probably years away from common use. IIRC, only recently have the APIs been standardized and the tools gotten decent enough for widespread use.
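
      A minimal sketch of what that kind of split could look like through OpenCL's host API, assuming drivers that expose both the integrated and the discrete GPU as separate devices (the physics-on-one, rendering-on-the-other split is purely illustrative):

          /* Hypothetical sketch: enumerate every GPU that OpenCL exposes so work
           * could be divided between an IGP and a discrete card. Assumes the
           * vendor drivers actually report both devices; Sandy Bridge's IGP
           * currently does not, which is exactly the limitation discussed above. */
          #include <stdio.h>
          #include <CL/cl.h>

          int main(void) {
              cl_platform_id platforms[8];
              cl_uint num_platforms = 0;
              clGetPlatformIDs(8, platforms, &num_platforms);

              for (cl_uint p = 0; p < num_platforms; p++) {
                  cl_device_id devices[8];
                  cl_uint num_devices = 0;
                  if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                                     8, devices, &num_devices) != CL_SUCCESS)
                      continue;  /* this platform has no GPU devices */

                  for (cl_uint d = 0; d < num_devices; d++) {
                      char name[256];
                      clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                                      sizeof(name), name, NULL);
                      /* Each device found here could get its own context and
                       * command queue, e.g. physics kernels on one GPU while
                       * the other handles rendering. */
                      printf("GPU %u.%u: %s\n", p, d, name);
                  }
              }
              return 0;
          }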

      • PenGun
      • 8 years ago

      You can use CrossFire with the Llano GPU if you get a dedicated card to go with it.

        • Voldenuit
        • 8 years ago

        I’d rather see hardware accelerated physics on an APU. But that’s been a pipe dream for a while now…

        • cygnus1
        • 8 years ago

        But… it can’t be a high-end Radeon; it has to be one of the low-end cards. I’m not even sure you can use a midrange card with it. If you try to CrossFire the Llano GPU with something high-end, it doesn’t work because the disparity in performance is too great and the Llano GPU would be overwhelmed. Apparently the CrossFire algorithms just don’t work when the GPUs are too far apart in performance.

        It’d be nice if they came up with a new CrossFire mode that just offloaded some small part of the process to the Llano GPU instead of trying to have it render entire frames. Or if they could get it running the physics stuff or something. For that they’d have to work out a deal with Nvidia for PhysX or really push game developers toward Havok.

          • Voldenuit
          • 8 years ago

          Sadly, since Havok is owned by Intel and SB graphics don’t support OpenCL (OpenCL code is run on the CPU), we’re unlikely to see hardware acceleration from Havok until Intel updates their IGP.

            • cygnus1
            • 8 years ago

            But we already have hardware acceleration for Havok on the Radeon. I don’t know if it’s been kept up to date, but I know they were showing it off several years ago.

            • Voldenuit
            • 8 years ago

            You remember correctly. The project was called Havok FX, and it was supposed to have been cross-compatible with both ATI (at the time) and Nvidia GPUs.

            However, it was the first thing to be canned when Intel acquired Havok back in 2007.

            • cygnus1
            • 8 years ago

            Intel, ruining good things since 1968…

      • Sahrin
      • 8 years ago

      The next generation of Fusion begins software integration; within two generations (so the one after that), AMD’s roadmaps say the scheduler will be able to dispatch instructions to either the CPU or the GPU from a common instruction stream (meaning hardware integration that’s transparent to the software).

    • Farting Bob
    • 8 years ago

    SB and Llano have brought the long-awaited “golden age” of IGPs, or at least have brought it very close. Ever since discrete GPUs started becoming common in the ’90s, the IGP offering has basically meant “don’t play games that are less than 5 years old.” It’s still a struggle to play modern games, but they are finally getting there. Within a generation or two I expect IGPs to be able to compete with most low-to-midrange cards, which is a huge achievement. Obviously things like onboard RAM, more connectivity options, and cooling will usually mean a discrete card is the preferred option, but we may finally be able to start recommending IGPs to light gamers.

      • LoneWolf15
      • 8 years ago

      I think we have a chance to see that in Ivy Bridge.

        • Vulk
        • 8 years ago

        It’s possible. It’s also possible that they’ll not be able to make significant gains on the GPU front. It’s all speculation until we see working silicon.

        • mesyn191
        • 8 years ago

        Supposedly IB will have Llano-esque GPU performance.

        Which is pretty good for an IGP but not really close at all to midrange performance.

        I think AMD/Intel will have to put large amounts (i.e. 64-128MB) of fast RAM on package or go with quad-channel system RAM for mainstream use if they want to get midrange performance out of an IGP.

      • maxxcool
      • 8 years ago

      Once the die space is available to add in a 32-128MB frame buffer on die with a ridiculous bus width and access to secondary RAM on a separate DMI/transport protocol, I can see that being the best an IGP can get.

        • mesyn191
        • 8 years ago

        32-128MB buffer on die for a mainstream low cost part? I don’t think you’ll see that for years…

          • maxxcool
          • 8 years ago

          Yeah, hence the “when” 🙂 I’d give it ’til the next revamp for AMD, and the next chip after that will be Intel’s.

          • BobbinThreadbare
          • 8 years ago

          They have a couple of megs of cache right now. 32 isn’t a huge leap; two die shrinks could probably do it. That would be around three years away, but still, it’s coming (maybe).

            • mesyn191
            • 8 years ago

            Nah, 32MB is for high-end enthusiast/server chips right now. The sub-$200 SB chips have 6MB right now, so we probably have a long way to go.

    • maxxcool
    • 8 years ago

    Do both support 1080p@60fps? Yes. Do they both support bitstreaming? Yes. After that, I couldn’t care less until you touch my wallet; then the cheaper TCO wins.

      • willmore
      • 8 years ago

      For what, video playback? If that’s all you want, pick up a G210 for $10 after rebate and go home happy.

    • chuckula
    • 8 years ago

    From the Article:

    Competitively, Sandy Bridge’s graphics was quite impressive at introduction in January 2011, but lags behind AMD’s Llano, which was launched in the middle of 2011. The Llano GPU is essentially twice as fast, which is not surprising given that it is also twice the die area.

    A few days ago I pointed out this exact same fact when noting that Intel could easily improve the graphics performance of future architectures just by adding extra resources to the graphics side of the equation. Of course, several AMD fanboys led by Goty responded by:
    1. Claiming that Llano uses less die area than Sandy Bridge for graphics (seriously, and this post was actually up-modded by the AMD squad).
    2. Downmodding me when I corrected Goty’s inaccurate comments.

    I’m sure this post will also get downmodded by the AMD squad, but I’m still right and they’re still wrong.

      • ish718
      • 8 years ago

      You seem to be oblivious to the fact that Llano supports more features (DX11, OpenCL, etc.), which also take up die space. So you can’t directly compare them that way…

      Not to mention Llano has 4 cores while the Sandy Bridge with IGP has 2 cores…

        • sschaem
        • 8 years ago

        A 4-core Llano is 1.45 billion transistors, 45% dedicated to the GPU.
        A 4-core (hyper-threaded) Sandy Bridge is <1 billion transistors, 20% dedicated to the GPU.

        There is no way around it: AMD had to spend a massive number of transistors to beat Intel’s GPU for general graphics (DX10) and games.
        When DX11/OpenCL/AMP matter, Ivy Bridge will be there.

          • ish718
          • 8 years ago

          Until Intel has a CPU/GPU with 4 physical CPU cores and a GPU that supports DX11, OpenCL, and OpenGL 4.1 all in one, your argument is weak…

          You keep talking about Ivy Bridge; did you forget about Fusion 2? Bulldozer cores and a Northern Islands GPU with more cores…

            • maroon1
            • 8 years ago

            Please delete

          • kalelovil
          • 8 years ago

          Not all transistors are equal. While Llano has 1.45 billion transistors, its die area (which is what counts for manufacturing cost) is barely more than that of the 0.995-billion-transistor Sandy Bridge.

          Also, Llano offers more than double Sandy Bridge’s graphics performance if you do an apples-to-apples comparison. In all the reviews so far, Sandy Bridge is taking significant shortcuts and producing noticeably lower image quality than Llano: [url<]https://techreport.com/articles.x/21099/11[/url<]

        • OneArmedScissor
        • 8 years ago

        And Sandy Bridge uses the L3 cache and ring bus for the GPU, which are not free, and have their own advantages and disadvantages that dictate the way the entire chip functions. Apples and oranges.

        So many people are trying to dumb increasingly complex CPUs down into an e-pene comparison where you just arbitrarily pick one number to judge them by. They weren’t built for the same purpose, and the next generation of them will continue to diverge and become even more specialized, not pile on more of the same. We’re so many years past that point it’s just silly to even consider it.

        A year from now, it will be a whole different playing field, with all sorts of things we’ve never seen before. If you’re truly a technology enthusiast, that should be the exciting part, not that it’s feasible to make one CPU do something another CPU already does. Good grief…

        • EtherealN
        • 8 years ago

        Wait, sorry?

        Sandy Bridge with IGP has 2 cores? I swear my i7-2600K has 4 cores along with its HD 3000… Maybe I’m hallucinating, though?

        Or were you comparing specifically with price points? Well, then I sort of noticed this one: [url<]https://techreport.com/r.x/amd-a8-3850/civv-lgv.gif[/url<]

        Hm. 2-core Sandy beats Llano? That's funny... 😛

        Fact is, in the end, that the two target different markets. Llano is better for lightweight gaming as long as the game in question doesn't require too much CPU, because it has a better GPU component. Sandy slaughters Llano whenever CPU power is the important thing, but the Sandy IGP is sort of disgustingly bad.

        I remember the article about fanboyism fondly. 🙂

      • Kurotetsu
      • 8 years ago

      I wasn’t part of the discussion you are citing, but as an impartial bystander I have to say that while it’s great you were proven right, you don’t have to be so infantile about it.

      Oh, and before you start tossing out “AMD FANBOI!!11!!1!” accusations, I’m running a Core 2 Quad/Nvidia system.

        • derFunkenstein
        • 8 years ago

        Yeah “I’m right! You’re wrong!” is not productive and only incites people. It doesn’t put anything to bed.

      • swaaye
      • 8 years ago

      Yeah I’ve said something like that too. Another angle I’ve seen is saying that Intel’s architecture is inferior due to the filtering quality, but that is just them saving transistors in the texture units because higher image quality probably doesn’t matter.

      Anyway, Intel has always been selling zillions of their IGPs since forever. Even while some enthusiasts whine endlessly about how terrible Intel IGPs are. It doesn’t matter because their IGPs serve the intended purpose of running the OS GUI. 3D performance beyond Aero’s needs and GPGPU features don’t mean jack to the majority of people buying PCs. This is undoubtedly why Intel hasn’t spent more transistors on their IGPs, or bothered to develop their own graphics cards.

      • derFunkenstein
      • 8 years ago

      Did you ever think maybe you were downmodded because you’re an asshole?

        • sweatshopking
        • 8 years ago

        I lol’d at this

          • derFunkenstein
          • 8 years ago

          No, seriously. I mean, that’s why I gave him a minus. Not because I’m on the AMD Squad.

            • Vasilyfav
            • 8 years ago

            You’re more of a jerk for downmodding someone because of their tone, rather than their argument (which was correct).

            • kc77
            • 8 years ago

            Except the argument wasn’t correct. It looked at a rather simple number and guesstimated performance within a fixed environment. Want AA? Then that 2X deficit easily balloons to 3X if you look at the quality of the AA being done. Tessellation itself adds to GPU real estate, and decent compute adds even more, which the Intel GPU doesn’t even do. That’s before we even talk about image quality, which relies not only on the quality of the driver but also on GPU resources.

            Hell, with all of that, want to talk about power? Please, the GPUs from AMD and Nvidia are leaps and bounds above what Intel churns out. The deficit in power per mm² is so great that it nullifies the poor CPU performance of AMD’s CPU cores.

            • cygnus1
            • 8 years ago

            Except humans tend to judge a conversation by its content, where both tone and argument are components.

            • Anonymous Coward
            • 8 years ago

            It’s not worth making a reasoned argument if the delivery method takes center stage. Nothing wrong with down-modding jerks.

            • derFunkenstein
            • 8 years ago

            Uh, no? I mean, I’m not even aware of the conversation he referred to, but the content of the comment I replied to was full of asshattery. If that’s how he talks to people, he had it coming.

            • NeelyCam
            • 8 years ago

            That is rather pathetic.

        • NeelyCam
        • 8 years ago

        Yeah – when you can’t attack the argument, attack the person and wield your thumb. Good stuff.

        When one can’t formulate a solid counterargument, one should stfu and go crying to one’s mommy.

          • derFunkenstein
          • 8 years ago

          what? Being a dick is *the* premier reason to get downvoted. There’s nothing worth discussing if you can’t be civil.

            • NeelyCam
            • 8 years ago

            That’s an excuse AMD fanbois are using. “What? Fusion chips have weak CPUs? You’re a DICK!!” (thumb thumb thumb…)

            • derFunkenstein
            • 8 years ago

            Believe it or not, it’s not all about you. Quit changing the topic; chuckles was being a dick and rightly got downvoted.

      • Vulk
      • 8 years ago

      Can they improve performance purely by adding more resources though?

      Read the whole article and get into the memory bandwidth limitations the ring bus imposes. The whole chip is reduced to a single 32-byte internal memory interconnect. There are going to be severe diminishing returns after a certain GPU size is reached. Intel may not currently be there, but I’ll bet that is part of the reason they’re redesigning a lot of Ivy Bridge, not just to support OpenCL and DX11.

      You’re right that Llano uses far more die space than Sandy Bridge. The two products are clearly trying to capture different market segments, with Intel focusing on media playback to the extent of including fixed-function hardware for H.264 support, and Llano focusing less on media performance and more on game acceleration.

      Your focus on being right and them being wrong is at best a partial victory. Yes, they were wrong. However, you are also basing your argument on pure speculation. Most things in life don’t scale cleanly as you add resources. Intel could probably make gains, and almost certainly will with Ivy Bridge, but one of the reasons they were able to make a 25x improvement with Sandy was that their previous generation of graphics was so incredibly poor.

      Also remember that a lot of Intel’s issues, traditionally and currently (for that matter), are caused by their drivers, not the hardware. AMD and Nvidia products, in many cases, actually get faster as their drivers mature. Intel so far is lacking rather noticeably in this area.

      All of this means that you MIGHT be right, but that’s hardly the unqualified win that you seem to think you have. The only thing that is certain is that there won’t be another iteration of Sandy with better graphics, which makes the whole thing moot for another quarter or two.

        • mczak
        • 8 years ago

        You are quite right that there may be serious problems with simply upscaling resources (FWIW, it doesn’t work THAT well for Radeons either past the 8-SIMD mark or so).
        I think, though, that the ring bus won’t be the issue. It currently has twice the bandwidth of both of Llano’s buses (Garlic/Onion) combined; arguably SNB might need more bandwidth there for a speculative same-performance Llano-class IGP because it doesn’t really have a dedicated ROP/L2 cache like Llano does (well, it has a texture cache, but it’s small). Also, there are ways to improve the bandwidth without even changing the ring bus architecture. For example, a rather trivial way would be to add two ring stops instead of one for the IGP (remember, ring bus bandwidth is per stop; you just need to make sure the data doesn’t need to travel through the same parts of the ring, which should be trivial for this scenario).

        • sschaem
        • 8 years ago

        Have you checked the GT1 vs GT2 benchmarks with games like Left 4 Dead, CoD, …?
        It’s close to a 2x speedup: 6 EUs vs 12 EUs, double the units, double the performance.

        Sure, it’s not a given that Sandy Bridge scales that well past 12 EUs, but Ivy Bridge is not just old Sandy with more units, far from it.
        AMD betting on Intel failing to scale their architecture to more transistors would be a grave, grave mistake.

        Also check the GPU specs of a Radeon HD 5450 vs an HD 3000, power vs power, transistor vs transistor…
        Intel has the winning architecture for graphics in chips using <400 million transistors.

          • mczak
          • 8 years ago

          The HD 6450 is still below 400 million transistors, and there it’s not so obvious which is faster (at least not with GDDR5 memory).
          Also, you can’t really compare it like that to a discrete chip. The HD 3000 most assuredly benefits from the L3 cache it can use; if you included that cache in these calculations, it wouldn’t look so economical in transistor count. This is of course the benefit of “getting the integration right.”

      • cal_guy
      • 8 years ago

      Don’t forget that Intel’s anisotropic filtering is atrocious, being on par with the quality of the Radeon 8500 generation.

      • Goty
      • 8 years ago

      Ooh, I’m popular.

