David Kanter dissects Haswell

Intel’s upcoming Haswell processor is likely to be as big an advance over today’s Sandy and Ivy Bridge CPUs as the "Bridge sisters" were over the prior-gen Nehalem chips. That’s huge progress, but how could Intel achieve such gains once again?

The answer comes down to three broad areas of change: a nicely improved CPU microarchitecture, design work more carefully tailored to the 22-nm fabrication process, and quite likely an improved system architecture. We don’t yet have details on the Haswell system architecture, but Intel has disclosed quite a bit about Haswell otherwise. As usual, David Kanter at Real World Tech has put together a very informative write-up covering Haswell’s microarchitecture, with some nods to coming changes in the other areas as well. The article is replete with custom diagrams of the microarchitecture’s high-level layout, comparing Haswell directly to Sandy Bridge, and is very much worth taking the time to read carefully.

The scope of changes coming with Haswell may be difficult to absorb if you’ve been focusing on the breathless talk about 10W variants of the processor. As Kanter notes:

Turning to the microarchitecture, the Haswell core has a modestly larger out-of-order window, with a substantial increase in dispatch ports and execution resources. Together with the ISA extensions, the theoretical FLOPs and integer operations per core have doubled. More significantly, the bandwidth for the cache hierarchy, including the L1D and L2 has doubled, while reducing utilization bottlenecks. Compared to Nehalem, the Haswell core offers 4× the peak FLOPs, 3× the cache bandwidth, and nearly 2× the re-ordering window.
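The ratios in that passage can be reproduced with simple per-cycle arithmetic. The sketch below is a back-of-the-envelope check in Python, not Kanter's own calculation. The per-core peak double-precision and L1D figures it plugs in are commonly cited values and should be treated as assumptions: Nehalem issues one 128-bit SSE add and one 128-bit SSE multiply per cycle, Sandy Bridge one 256-bit AVX add and one 256-bit AVX multiply, and Haswell two 256-bit FMAs, each counted as two FLOPs.

```python
# Back-of-the-envelope peak figures per core, per clock cycle.
# All inputs are assumed, commonly cited values (not taken from the article).

def dp_flops_per_cycle(vector_units: int, lanes: int, flops_per_lane_op: int) -> int:
    """Peak double-precision FLOPs per cycle = units x lanes x FLOPs per lane op."""
    return vector_units * lanes * flops_per_lane_op

# Nehalem: one SSE add + one SSE multiply, 2 doubles per 128-bit vector
nehalem_flops = dp_flops_per_cycle(vector_units=2, lanes=2, flops_per_lane_op=1)
# Sandy Bridge: one AVX add + one AVX multiply, 4 doubles per 256-bit vector
sandy_flops = dp_flops_per_cycle(vector_units=2, lanes=4, flops_per_lane_op=1)
# Haswell: two FMA units; each FMA counts as a multiply plus an add
haswell_flops = dp_flops_per_cycle(vector_units=2, lanes=4, flops_per_lane_op=2)

# L1D bandwidth in bytes per cycle: loads + stores (assumed widths)
nehalem_l1d = 1 * 16 + 1 * 16   # one 16-byte load, one 16-byte store
sandy_l1d = 2 * 16 + 1 * 16     # two 16-byte loads, one 16-byte store
haswell_l1d = 2 * 32 + 1 * 32   # two 32-byte loads, one 32-byte store

print(f"DP FLOPs/cycle: Nehalem {nehalem_flops}, Sandy Bridge {sandy_flops}, Haswell {haswell_flops}")
print(f"Haswell vs. Sandy Bridge FLOPs: {haswell_flops // sandy_flops}x")
print(f"Haswell vs. Nehalem FLOPs: {haswell_flops // nehalem_flops}x")
print(f"L1D bytes/cycle: Nehalem {nehalem_l1d}, Sandy Bridge {sandy_l1d}, Haswell {haswell_l1d}")
print(f"Haswell vs. Nehalem L1D bandwidth: {haswell_l1d // nehalem_l1d}x")
```

With these inputs the script reports 4, 8 and 16 FLOPs per cycle and 32, 48 and 96 L1D bytes per cycle, which lines up with the 2× (vs. Sandy Bridge), 4× and 3× (vs. Nehalem) figures quoted above. Peak numbers like these assume perfectly scheduled code and say nothing about sustained performance.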

He estimates that Haswell will deliver 10% higher performance than Sandy Bridge on existing software. We take that to be a per-clock improvement, with higher clock frequencies also possible. The potential ramps up from there when software takes advantage of the new instructions: fused multiply-add (FMA) and the rest of AVX2 add FLOPS and integer throughput, while hardware lock elision can cut locking overhead in multithreaded code.
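Fused multiply-add computes a×b + c as a single instruction with a single rounding step, which is why it counts as two FLOPs and can also be slightly more accurate than a separate multiply followed by an add. The Python sketch below only illustrates that rounding difference. It emulates the fused operation with exact rational arithmetic from the standard `fractions` module (a software stand-in, not the hardware instruction), and the input values are hand-picked to make the difference visible.

```python
from fractions import Fraction

def fma_emulated(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double.

    Mimics the single rounding of a fused multiply-add in software.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step

def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c

a = b = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

print(mul_then_add(a, b, c))  # product rounds to 1 + 2**-29, so this prints 0.0
print(fma_emulated(a, b, c))  # prints a tiny nonzero value (2**-60)
```

The exact product is 1 + 2⁻²⁹ + 2⁻⁶⁰. The separate version rounds away the 2⁻⁶⁰ term before the add and cancels to zero, while the emulated fused version keeps it.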

Interestingly, Kanter doesn’t expect the performance gap between Intel and AMD to widen with the coming clash between Haswell and Steamroller. His reason? "Realistically, the performance gap should narrow given the scope of opportunities for AMD to improve, but Haswell will continue to have significant advantages." Hmm. Time will tell if he’s right about that.

    • HisDivineOrder
    • 7 years ago

    I expect Steamroller to be delayed for all chips. AMD’s ability to execute new designs in a timely manner is highly questionable.

      • BaronMatrix
      • 7 years ago

      Why do you want to be a poopy head…?

      I definitely doubt ANYONE ELSE’S ability to compete with EVIL INSIDE…

    • ronch
    • 7 years ago

    I’ve read at VR-Zone that AMD is expecting 45% better performance with Steamroller compared to Piledriver, and that, although the building blocks look the same from a high level, Steamroller is actually made up of completely new building blocks. If true, Jim Keller must really be pushing his engineers hard. 45% looks like an optimistic claim, but if they actually achieve it at 95W or less, it’s gonna be a big boost for AMD regardless of what Intel will have on offer by that time.

    I’m also wondering why it will be built using 28nm, not 22nm. Compared with the 32nm process Piledriver uses, 28nm will only make the die ~25% smaller, all things being equal, instead of the ~50% reduction that 22nm could deliver.
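    The ~25% and ~50% figures follow from ideal area scaling, where die area shrinks with the square of the feature size. A quick Python check, assuming Piledriver's 32nm process as the starting point and a perfect shrink (real designs rarely scale this well, since I/O and analog circuitry shrink poorly):

```python
# Ideal ("all things being equal") area scaling: die area scales with the
# square of the feature size. Assumes a 32nm baseline (Piledriver).
BASELINE_NM = 32

def area_reduction(new_nm: float, old_nm: float = BASELINE_NM) -> float:
    """Fractional die-area reduction under an ideal linear shrink."""
    return 1.0 - (new_nm / old_nm) ** 2

for node in (28, 22):
    print(f"{BASELINE_NM}nm -> {node}nm: ~{area_reduction(node):.0%} smaller")
```

    This prints roughly 23% for 28nm and 53% for 22nm, consistent with the rough numbers in the comment.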

    Good luck, AMD. We’re all counting on you.

    • chuckula
    • 7 years ago

    Another interesting take-away: While Haswell doesn’t look that much like Bulldozer/Vishera architecturally, they both share one property: To get the best performance you’ll need to be running the right type of code. A 10% per-clock boost on legacy code is OK, but if you really look at all of the architectural improvements, you need to have properly written & compiled software to really get a speedup.

    The good news is that AVX/AVX2 are descendants of SSE, so older code that is already vectorized shouldn’t be super hard to port to the new setup. However, there are gobs & gobs of unoptimized code out there, not to mention legacy binaries that aren’t getting recompiled with the latest compiler optimizations. I think you’ll see more results along the lines of Bulldozer/Vishera, where the same chip is so-so at one workload while doing quite well at another. Intel’s big advantage should be that Haswell will be, at a minimum, about as fast as Ivy Bridge even in worst-case scenarios.

    • chuckula
    • 7 years ago

    I love these Kanter reports, so much juicy detail. From the last page (http://www.realworldtech.com/haswell-cpu/6/):

    “Second, the ring and LLC are on a separate frequency domain from the CPU cores. This enables the ring and LLC to run at high performance for the GPU, while keeping the CPUs in a low power state. This was not possible with Sandy Bridge, since the cores, ring and LLC shared a PLL, and wasted power on some graphics heavy workloads.”

    The separate clock domain means that the big and hot L3 cache can be clocked at a different rate than the cores. In addition to the power-saving features he is describing, this could make overclocking Haswell easier: clock the cores at (for example) 5 GHz, but only clock the L3 at 4 GHz. IIRC, the Nehalems also had different clock generation for the L3 cache vs. the cores & L2 caches.

    • anotherengineer
    • 7 years ago

    “but how could Intel achieve such gains once again?”

    Easy: lots of money buys lots of brains.

      • Meadows
      • 7 years ago

      Does that mean zombies used to be rich?

        • UberGerbil
        • 7 years ago

        Only the fat ones.

    • TurtlePerson2
    • 7 years ago

    I kind of wonder why Intel reveals so much of the internals of the processor. In the RWT article they’re talking about details as minute as the number of ways per set in the instruction cache. They keep a lot of algorithm stuff secret, but they seem willing to give away almost any number people want to know.

    I doubt that software developers actually take into account the associativity of the instruction cache when they write code.

      • chuckula
      • 7 years ago

      “I doubt that software developers actually take into account the associativity of the instruction cache when they write code.”

      For 99% of developers that is true, but in some high-performance applications a whole lot of time & energy is spent in making sure your working set fits in cache. Understanding the associativity (not just the total size) of the cache can be very important.
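      To illustrate why associativity, not just total size, matters: in a set-associative cache each address can only live in one set, chosen from its line address. The Python sketch below assumes a 32 KiB, 8-way cache with 64-byte lines (the commonly cited L1 configuration for Intel cores of this generation; the parameters are illustrative) and shows that addresses spaced 4 KiB apart all collide in the same set.

```python
# Set mapping in a set-associative cache.
# Assumed parameters: 32 KiB, 8-way, 64-byte lines (illustrative values).
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 64 sets

def set_index(address: int) -> int:
    """Set index = cache-line number modulo the number of sets."""
    return (address // LINE_BYTES) % NUM_SETS

# 16 lines spaced one "cache way" apart (4096 bytes): only 1 KiB of data
stride = NUM_SETS * LINE_BYTES
strided_sets = {set_index(i * stride) for i in range(16)}

# The same 16 lines laid out contiguously
contiguous_sets = {set_index(i * LINE_BYTES) for i in range(16)}

print(f"{NUM_SETS} sets; 4 KiB-strided lines land in sets {sorted(strided_sets)}")
print(f"Contiguous lines land in {len(contiguous_sets)} different sets")
print(f"16 strided lines compete for {WAYS} ways -> conflict misses")
```

      Sixteen lines at a 4 KiB stride hold only 1 KiB of data, yet they all fight over the 8 ways of set 0 and keep evicting each other, while the same 16 lines laid out contiguously spread across 16 different sets.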

      • Stranger
      • 7 years ago

      The hard part of what Intel does isn’t the design part; it’s implementing a monster design like Haswell in silicon. You can count the number of companies that could implement a design like Haswell on one hand (possibly one finger, aka Intel). Everyone on the internet likes to focus on the design aspects of CPUs, but most design tweaks only add a couple of percentage points’ worth of performance. Most of the performance comes from things that are invisible to us. We can’t tell if every portion of a chip has been hand-tweaked with custom layouts, and we can’t tell how consistent Intel’s transistors are (and when there are a billion-some transistors, consistency really begins to matter). And by all measures, Intel is by far the best chip maker in the world.

      http://www.chip-architect.com/news/2007_02_19_Various_Images.html

      These numbers are a bit dated, but if you scroll down to the table of mm² per MB of L2, Intel’s SRAM is about half the size on the same process node (and about twice as fast in terms of bandwidth).

      “I doubt that software developers actually take into account the associativity of the instruction cache when they write code.”

      People who write compilers definitely do, and by extension so does almost everyone else.

    • Silus
    • 7 years ago

    Well, if this is true:

    https://techreport.com/news/23876/leaked-roadmap-suggests-no-steamroller-desktop-chips-next-year

    Haswell won’t even have competition from Steamroller next year... Intel hasn’t had competition in the high end for a few years now.

      • NeelyCam
      • 7 years ago

      Just watch Intel continue selling Sandy Bridge 6-cores and SB-Es all the way to 2014 (with fully amortized fabs), while Ivy gets cancelled and Haswell becomes a mobile-only part.

        • chuckula
        • 7 years ago

        Intel will eventually be forced to put out new server parts, but I agree that using server parts in the “extreme” edition workstations is annoying because servers move at a much slower cadence and you get stuck with “high-end” parts with cores that are significantly behind the mainstream parts.

      • just brew it!
      • 7 years ago

      I am starting to doubt that we will *ever* see Steamroller-based desktop chips. My guess is that it will end up being released as a server CPU (Opteron) only.

        • chuckula
        • 7 years ago

        It’s possible that we could see the CPU-only socket AM3+ line going away on the desktop and have it relegated to servers only. We’d still see a Steamroller derivative in the APU category (that would take over the desktop consumer space from AMD).

          • Deanjo
          • 7 years ago

          I doubt they would move the AM3+ socket to server only. They would more than likely just kill the AM3+ socket altogether and stick to using the C32/G34 sockets.

            • chuckula
            • 7 years ago

            Yeah, that’s really what I meant. You’re right that AM3+ by itself is not a server platform. I’m really saying that AMD’s consumer parts could go 100% APU with the only CPU-only parts being in the server world on C32/G34/etc. sockets.

            • derFunkenstein
            • 7 years ago

            You could argue that they have already done that, if you segment out “enthusiast” as something separate from “consumer”. They are really into this APU thing for mainstream consumers.

      • sschaem
      • 7 years ago

      2014 will see the first 64-bit ARM CPU, and its performance will be only marginally better than the A15 (according to ARM).

      It will probably take 2 years after that for any player to make an ARM CPU that can match Nehalem.

      Apple might be the best contender... but who thinks Apple will license its CPU to Samsung or Acer to build Windows RT PCs?

      Seems like we are stuck with Intel… and their new 4GHz 6-core CPU is pre-listed at $1200.

        • Deanjo
        • 7 years ago

        “Apple might be the best contender.. but who think apple will license their CPU to samsung or Acer to build Windows RT PCs ?”

        Samsung would just steal the design from Apple like they do already. ;D

        • BestJinjo
        • 7 years ago

        “It will take probably 2 years after that for any player to make a ARM CPU that can match Nehalem.”

        The fastest quad-core ARM CPUs now are only as fast as a Pentium 3. It will take another 5-6 years before an ARM CPU can be as fast as a Core i7 920, maybe longer.

          • Beelzebubba9
          • 7 years ago

          What about nVidia’s Project Denver? Or whatever rumored product Apple is working on to replace x86 CPUs in their lineup?

          ARMv8 is just an instruction set; there’s no reason why someone with deep pockets couldn’t build a very high-end CPU core using it. Obviously I don’t expect anyone to dethrone Intel any time soon (if ever), but I wouldn’t be surprised at all if nVidia’s Project Denver got ARM at least within shooting distance of Haswell in terms of absolute performance.

            • Airmantharp
            • 7 years ago

            If the market for non-x86 desktop chips widens, expect Intel to try to bring IA-64 to the desktop; they’re already bringing it to a Xeon socket in the near future 🙂

          • CuttinHobo
          • 7 years ago

          You mean those quad-core ARM CPUs in thin tablets?

        • A_Pickle
        • 7 years ago

        I’d be amazed if Apple bought AMD. There’d be no point — it’d be a gain in low end integrated graphics, but a loss in power consumption, performance, and probably cost, too.

    • chuckula
    • 7 years ago

    “Interestingly, Kanter doesn't expect the performance gap between Intel and AMD to widen with coming clash between Haswell and Steamroller. His reason? "Realistically, the performance gap should narrow given the scope of opportunities for AMD to improve, but Haswell will continue to have significant advantages."”

    Is Kanter right? Probably, but remember that Steamroller won't be coming out at the same time as Haswell, so in the second half of 2013 it will be Haswell vs. a flavor of Piledriver. In 2014, we'll see Steamroller in action.

      • Game_boy
      • 7 years ago

      With what probability?

        • chuckula
        • 7 years ago

        Probability of: 0.872514263069*

        * I’m precise but not necessarily accurate.

          • bcronce
          • 7 years ago

          I got 0.639857830528475021766603716555840

          I’m much more precise but probably completely wrong.

            • moog
            • 7 years ago

            Let’s do this! LEEROOOOOOOOYYYYYYYYYYYY Jenkins!

            • chuckula
            • 7 years ago

            At least David Kanter has chicken.

      • NeelyCam
      • 7 years ago

      Bulldozer had plenty of opportunities, too..

        • Theolendras
        • 7 years ago

        At least before Bulldozer, Phenom II could still compete in the midrange on just about any type of load. Bulldozer struggled to compete even at the low end in lightly threaded loads…

      • kilkennycat
      • 7 years ago

      AMD’s likelihood of survival for another year as an independent entity seems to be very low and rapidly getting lower. They are leaking money like a sieve. And they have just brought JP Morgan in to “explore options”. Don’t bet on Steamroller. Too little, far too late….

        • sschaem
        • 7 years ago

        It’s as close to official as it can be: Piledriver is the last stop for the Bulldozer architecture.

        AMD will focus on Jaguar going forward.

          • MadManOriginal
          • 7 years ago

          It’s sad because they really could have had something in desktop SFFs and all-in-ones, plus mobile, with their higher than 18W APUs. Trinity is decent and clearly ahead of Ivy Bridge graphics-wise.

        • shank15217
        • 7 years ago

        Interesting conclusion there, genius... even TR benches show that Piledriver can go toe to toe with Ivy Bridge in heavily threaded applications. I fail to see this enormous divide that 80% of the users on this forum see between AMD’s and Intel’s architectures. DK is right: AMD needs to implement fixes, whereas Intel is adding enhancements.

          • NeelyCam
          • 7 years ago

          “Genius”? Why the thinly veiled insult?

            I agree with kilkennycat – AMD is in serious trouble. Going toe to toe with Ivy Bridge doesn’t cut it – AMD has to significantly improve cost/performance, especially on the cost side, if they’re going to become profitable. Performance/watt wouldn’t hurt either, as the high-margin server market likes that.

            • A_Pickle
            • 7 years ago

            Turns out that, in the high margin server market, Bulldozer was actually quite competitive in performance-per-watt. Check out Anand’s review.

      • sschaem
      • 7 years ago

      What information do you have that shows AMD releasing Steamroller in 2014?

      And from the info we have, wasn’t Haswell accelerated to Q1 2013?

      So H1 2013 is Haswell, and just maybe (all data points to it not happening) Steamroller sometime in 2014.

      Edit: Just so this doesn’t degenerate, please post where AMD states that Steamroller is even still a planned product.

      Intel 19th October 2012
      “We expect an increase in inventory reserves as we start production on our next-generation micro architecture product code-named Haswell, which we expect to qualify for sale in the first quarter of 2013,” said Stacy Smith, chief financial officer of Intel.

    • chuckula
    • 7 years ago

    The increased execution resources and instruction dispatch ports could help improve the efficiency of Hyper-Threading, since different applications could take advantage of different on-chip resources.

      • UberGerbil
      • 7 years ago

      Maybe, but there’s not a lot of evidence that was a bottleneck in most loads right now. The resources are clearly there to enable full-bore AVX2 and (especially) FMA ops, which is how they get to a theoretical 2x throughput over Sandy Bridge; any bonus for SMT is just gravy.

    • sweatshopking
    • 7 years ago

    it’s not going to widen if AMD sells itself and stops making CPUs.

      • Deanjo
      • 7 years ago

      Technically, GlobalFoundries has been “making” AMD’s CPUs for a while now.

        • colinstu12
        • 7 years ago

        I think he meant “designing” by that

          • sweatshopking
          • 7 years ago

          I DID IN FACT.

            • Deanjo
            • 7 years ago

            THEN SAY THAT! 😛

            • sweatshopking
            • 7 years ago

            then what would you do all day?

            • MadManOriginal
            • 7 years ago

            Retire harder?

            • sweatshopking
            • 7 years ago

            i thought you guys were coming over. i’ve been waiting all day….

      • rrr
      • 7 years ago

      Do you have anything to confirm that? Vishera is a pretty good CPU, quite competitive with similarly priced Intel offerings, in fact.

    • NeelyCam
    • 7 years ago

    Six more months…

      • sweatshopking
      • 7 years ago

      WHO WINS?

        • Deanjo
        • 7 years ago

        Via.

          • sweatshopking
          • 7 years ago

          not likely.

          • chuckula
          • 7 years ago

          In 6 months Neely is going to convert to the ARM religion so he’ll be buying a Chromebook and walking around in robes chanting a lot… (just like now except the Chromebook is lighter than his full-size desktop tower).

            • Deanjo
            • 7 years ago

            I thought he was being indoctrinated into the Church of MIPS…

            • chuckula
            • 7 years ago

            You have a good point. I’ve been thinking about becoming a MIPS fanboy just for the fun of calling ARM an inefficient and bloated architecture that can never scale down to truly low-power applications.

            It’s good to stay ahead of the curve.

            • MadManOriginal
            • 7 years ago

            The Church of MIPS is now part of the growing United Church of non-x86 since ARM bought them into the fold.

            • NeelyCam
            • 7 years ago

            “(just like now except the Chromebook is lighter than his full-size desktop tower)”

            I’ve been using a shopping cart for the desktop, but it’s kind of clunky and always pulls towards the curb. Maybe I really should get one of these Chromebooks. Should score me some street cred, too.

      • anotherengineer
      • 7 years ago

      Until you’re a dad?

      You’d be trading in your gadgets for diapers 😉

        • NeelyCam
        • 7 years ago

        Speaking from experience?

        No; just play with the gadgets when the offspring is sleeping

          • anotherengineer
          • 7 years ago

          Yes, the 2 little rug rats consume most of my time and money.

            • NeelyCam
            • 7 years ago

            Daycare here in the USA is insanely expensive, and the time commitment is pretty high too, I agree. My solution has been to sleep and work less. Or at least to try to work less... I’ve given up on the ‘corporate ladder’ (as a good socialist would, I guess), but slacking off during project deadlines would feel like betraying the rest of the team.

      • flip-mode
      • 7 years ago

      That seems so far away. I’ll see if I can tough it out. I told myself that if I get a Christmas bonus this year … upgrades … upgrades everywhere.
