A quick primer on Sandy Bridge

In a bit of a strange move, Intel disclosed next to nothing about its upcoming Sandy Bridge processor during the opening IDF keynote last week, which you’ll know if you vigilantly refreshed the page as my live blog of the speech descended into tragic irrelevance and hairdo critiques. It has been quite some time since we’ve been this close to the release of an Intel processor—Sandy Bridge-based CPUs are expected to arrive right as we ring in 2011—without a sense of its basic internals. Fortunately, Intel did finally disclose many of the architectural details of Sandy Bridge later at IDF, during the technical sessions led by Sandy Bridge architects. We had the good fortune to attend some of them, but I’ve been traveling and unable to gather my thoughts on what we learned until now.

The first things to know about Sandy Bridge are that it’s a chip built using Intel’s high-speed 32-nm fabrication process, and that initial variants are expected to put four traditional CPU cores, an integrated graphics processor, cache, and a memory controller together on the same piece of silicon. Intel essentially skipped building a quad-core processor at 32 nm, opting to accelerate the schedule for Sandy Bridge instead. We’ve long known most of the above: that Sandy Bridge would include integrated graphics, that it would require a new CPU socket and motherboards, and that it would support Intel’s AVX instructions for faster vector processing of media workloads and the like. The mystery has been pretty much everything else beyond those preliminaries.

Sandy Bridge in the flesh. Source: Intel.

A substantially new microarchitecture

That mystery, it turns out, is pretty juicy, because Sandy Bridge is part of the unprecedented wave of brand-new x86 microprocessor architectures hitting the market. Just weeks after AMD disclosed the outlines of its Bulldozer and Bobcat cores, Intel has offered us an answer in the form of its own substantially new microarchitecture.

Now, making a claim like I just did is fraught with peril, since new chip designs almost inevitably build on older ones, especially when you’re talking about Intel CPUs. That’s the thing about Sandy Bridge, though: one of its architects proclaimed at IDF that it was essentially a from-the-ground-up rebuild of the out-of-order and floating-point execution engines. Such changes were necessary to accommodate the doubled vector width of the AVX instruction set, and it means something fairly momentous. As my friend David Kanter observed, this is, at long last, the breaking point where one can finally say virtually nothing remains of the P6 (Pentium Pro) roots that have undergirded everything from the Conroe/Merom Core 2 to the Nehalem/Westmere Core i-series processors.

Not only has the execution engine changed, but nearly everything around it has been replaced with new logic, as well, from the front-end and branch predictor to the memory execution unit. Outside of Sandy Bridge’s CPU cores, the “glue” logic on the chip is all new, too. The inter-core connections, memory controller, and power management microcontroller have been tailored to accommodate the presence of a graphics processor. Even the integrated graphics engine bears little resemblance to what has come before. If you’re looking for a golden age of CPU design, we’re living in it, folks.

The most monumental change in Sandy Bridge has to be the incorporation of graphics onto the CPU die, and Intel has almost assuredly gone further toward deep integration than AMD did in its Ontario “fusion” chips. Still, that step feels almost like an afterthought, as part of a logical progression like the integration of the memory controller and PCIe logic in the past few generations. The IGP here is more of an application-specific accelerator, not a true co-processor for data-parallel computation. Such lofty goals will have to wait for later generations. For now, the biggest opportunities for head-turning progress come from the sweeping changes to Sandy Bridge’s CPU microarchitecture, where smart new logic may potentially deliver formidable increases in per-clock performance.

The CPU front-end looks fairly similar to Merom or Nehalem from a high-altitude, block-diagram sort of view. The instruction cache is 32KB in size, and the decoder that turns CISC-style x86 instructions into RISC-like internal “micro-ops” can still process four instructions per cycle in most cases. Intel’s architects point to two key changes here.

The first is that rebuilt branch predictor. In most processors, the branch prediction unit uses a clever algorithm to “guess” what path a program will take prior to execution and then feeds the out-of-order engine with instructions to be processed speculatively. If it guesses right, the result is higher delivered performance, but if it guesses wrong, the results must be discarded and the proper program path must be executed instead, leading to a considerable performance hit. Modern CPUs have very accurate branch predictors, causing some folks to wonder whether pushing further on this front makes sense. Sandy Bridge’s architects suggested thinking about the problem not as a question of how much better one can do when one is already at 96% accuracy. Instead, one should think in terms of reducing mispredictions, where a change from, say, 7% to 4% represents an improvement of over 40%. With that in mind, they attacked the branch prediction problem anew in Sandy Bridge to achieve even lower rates of error. Unfortunately, we didn’t get any hard numbers on the accuracy of the new branch predictor, but it should be superior to Nehalem’s.
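
To put the architects’ framing in concrete terms, here’s a minimal C sketch of that arithmetic. The 7% and 4% figures are the illustrative rates from above; the misprediction penalty and branch frequency are assumptions of mine, not Intel numbers, included only to show how a few points of predictor accuracy ripple into average cycles per instruction.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative rates from the article's example: 7% vs. 4% mispredictions. */
    double old_rate = 0.07, new_rate = 0.04;

    /* Assumed, not Intel-published: ~20-cycle misprediction penalty and
       one branch per five instructions, just to show the shape of the math. */
    double penalty_cycles = 20.0;
    double branches_per_insn = 0.20;
    double base_cpi = 1.0;

    double reduction = (old_rate - new_rate) / old_rate;

    double old_cpi = base_cpi + branches_per_insn * old_rate * penalty_cycles;
    double new_cpi = base_cpi + branches_per_insn * new_rate * penalty_cycles;

    printf("Relative drop in mispredictions: %.0f%%\n", reduction * 100.0);
    printf("Modeled CPI: %.3f -> %.3f (%.1f%% faster)\n",
           old_cpi, new_cpi, (old_cpi / new_cpi - 1.0) * 100.0);
    return 0;
}
```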

The other innovation of note in Sandy Bridge’s front end is the addition of a cache for decoded micro-ops. Old-school CPU geeks may recognize this mechanism from a similar one, called the execution trace cache, used in the Pentium 4. Again, this provision is a nod to the fact that modern x86 processors don’t execute CISC-style x86 instructions natively, preferring instead to translate them into their own internal instruction sets. The idea behind this new cache is to store instructions in the form of the processor’s internal micro-ops, after they’ve been processed by the decoders, rather than storing them as x86 instructions. Doing so can reduce pressure on the decoders and, I believe, improve the chip’s power efficiency in the process. Unlike the Pentium 4, Sandy Bridge retains robust decode logic that it can call on when needed, so the presence of a micro-op cache should be a straightforward win, with few to no performance trade-offs.
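
As a rough illustration of why caching decoded micro-ops helps, here’s a hypothetical C model of a hot loop: each instruction goes through the (modeled) legacy decoders once, and every subsequent fetch of that instruction is served from the micro-op cache, letting the decoders idle. The loop size and iteration count are invented purely for illustration and don’t reflect the real structure’s capacity.

```c
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define LOOP_INSNS   64      /* x86 instructions in the hot loop (made-up size) */
#define ITERATIONS   1000    /* times the loop body executes */

int main(void)
{
    bool in_uop_cache[LOOP_INSNS];     /* has this instruction been decoded yet? */
    memset(in_uop_cache, 0, sizeof in_uop_cache);

    long decoded = 0, served_from_cache = 0;

    for (int iter = 0; iter < ITERATIONS; iter++) {
        for (int i = 0; i < LOOP_INSNS; i++) {
            if (in_uop_cache[i]) {
                served_from_cache++;       /* decoders skipped entirely */
            } else {
                decoded++;                 /* legacy decode path does the work once */
                in_uop_cache[i] = true;
            }
        }
    }

    printf("decoder activations: %ld of %ld fetches (%.2f%%)\n",
           decoded, decoded + served_from_cache,
           100.0 * decoded / (decoded + served_from_cache));
    return 0;
}
```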

To find the feature with the largest impact on Sandy Bridge performance, though, one has to look beyond the front end to the memory execution units. In Nehalem, those units have three ports, but only one can do loads, so the chip is capable of a single load per cycle. In Sandy Bridge, the load/store units are symmetric, so the chip can execute two 128-bit loads per cycle. Store and cache bandwidth is higher, as well. Removing these constraints and doubling the number of loads per cycle allows Sandy Bridge to feed its formidable execution engine more fully, resulting in more work completed. This and the other improvements discussed above should lead to general performance increases, even in familiar tasks where we haven’t necessarily seen much improvement in per-clock performance in recent years.

Of course, programs that make use of the AVX instruction set may see even larger gains, thanks to Sandy Bridge’s ability to process more data in parallel via wider, 256-bit vectors. AVX should benefit some familiar workload types, including graphics and media processing, where the data to be processed can be grouped together in large blocks. We’ve known the outlines of Sandy Bridge’s abilities here for a while, including the potential to execute a 256-bit floating-point add and a 256-bit floating-point multiply concurrently in the same clock cycle. At IDF, we got a better sense of how complete an AVX implementation Sandy Bridge really has, right down to a physical register file to store those 256-bit vectors. This chip should be in a class of its own on this front, at least until AMD’s Bulldozer arrives later in 2011. Even then, Bulldozer will have half the peak AVX throughput of Sandy Bridge and may only catch up when programs make use of AMD’s fused multiply-add (FMA) instruction—which only Bulldozer will support.
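
For a sense of what AVX looks like from the software side, below is a minimal C sketch using the 256-bit intrinsics: each loop iteration issues a 256-bit multiply and a 256-bit add across eight packed single-precision floats, the sort of paired floating-point work Sandy Bridge should be able to co-issue. This is an illustration of the instruction set rather than Intel sample code, and it needs an AVX-capable compiler (e.g., gcc -mavx).

```c
#include <immintrin.h>
#include <stdio.h>

/* y[i] = a * x[i] + y[i] over 256-bit (8-float) vectors. */
static void saxpy_avx(float a, const float *x, float *y, int n)
{
    __m256 va = _mm256_set1_ps(a);
    for (int i = 0; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);          /* 256-bit load   */
        __m256 vy = _mm256_loadu_ps(y + i);          /* second load    */
        __m256 prod = _mm256_mul_ps(va, vx);         /* 256-bit FP mul */
        vy = _mm256_add_ps(prod, vy);                /* 256-bit FP add */
        _mm256_storeu_ps(y + i, vy);
    }
}

int main(void)
{
    float x[16], y[16];
    for (int i = 0; i < 16; i++) { x[i] = (float)i; y[i] = 1.0f; }
    saxpy_avx(2.0f, x, y, 16);
    printf("y[15] = %.1f\n", y[15]);   /* expect 2*15 + 1 = 31.0 */
    return 0;
}
```

Compilers can also generate AVX automatically from plain loops when told to target the new instructions, so hand-written intrinsics like these are only one path to the wider vectors.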

The pathways connecting Sandy Bridge’s cores together have expanded to enable this increased throughput thanks to a new ring-style interconnect that links the CPU cores, graphics, last-level cache, and memory controller. Intel first used such a ring topology to connect the eight cores of the ultra-high-end Nehalem-EX processor. That concept has been borrowed and refined in Sandy Bridge. The chip’s architects saw the need for a high-bandwidth interconnect to allow CPU cores and the IGP to share the cache and memory controller, and they liked the ring concept because of its potential to scale up and down along with the modular elements of the architecture. Because each core has some L3 cache and a ring stop associated with it, cache bandwidth grows with the core count. At 3GHz, each stop can transfer up to 96 GB/s, so a dual-core Sandy Bridge implementation peaks at 192 GB/s of last-level cache bandwidth, while the quad-core variant peaks at a torrential 384 GB/s.
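
The bandwidth math here is straightforward: the quoted 96 GB/s per stop at 3GHz implies roughly 32 bytes transferred per clock, and the aggregate figure simply scales with the number of ring stops. Treat the quick C calculation below as a sanity check on those numbers, with the bytes-per-clock figure inferred from Intel’s examples rather than taken from a spec sheet.

```c
#include <stdio.h>

int main(void)
{
    /* Inferred from Intel's quoted figures, not an official spec:
       each ring stop appears to move about 32 bytes per clock. */
    double bytes_per_clock_per_stop = 32.0;
    double clock_hz = 3.0e9;                 /* the 3GHz example from the talk */

    double per_stop = bytes_per_clock_per_stop * clock_hz / 1e9;   /* GB/s */

    for (int stops = 1; stops <= 4; stops++)
        printf("%d ring stop(s): %6.0f GB/s aggregate L3 bandwidth\n",
               stops, per_stop * stops);
    return 0;
}
```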

Intel’s Opher Kahn said his team had made significant changes to the ring interconnect compared to the one used in Nehalem-EX, and he expects it will scale up and be viable for use in client-focused processors for multiple generations. The same ring will likely be used in server-focused derivatives of Sandy Bridge with more cores and very modest graphics capabilities, if any.

Re-thought integrated graphics and other improvements

The fact that the graphics processor is just another stop on the ring demonstrates how completely Sandy Bridge integrates its GPU. The graphics device shares not just main memory bandwidth but also the last-level cache with the CPU cores—and in some cases, it shares memory directly with those cores. Some memory is still dedicated solely to graphics, but the graphics driver can designate graphics streams to be cached and treated as coherent.

Inside the graphics engine, the big news isn’t higher unit counts but more robust individual execution units. Recent Intel graphics solutions have claimed compatibility with the feature-rich DirectX 10 API, but they have used their programmable shaders to process nearly every sort of math required in the graphics pipeline. Dedicated, custom hardware can generally be faster and more efficient at a given task, though, which is why most GPUs still contain considerable amounts of graphics-focused custom hardware blocks—and why those Intel IGPs have generally underachieved.

For this IGP, Intel revised its approach, using dedicated graphics hardware throughout, wherever it made sense to do so. A new transcendental math capability, for instance, promises 4-20X higher performance than the older generation. Before, DirectX instructions would break down into two to four internal instructions in the IGP, but in Sandy Bridge, the relationship is generally one-to-one. A larger register file should facilitate the execution of more complex shaders, as well. Cumulatively, Intel estimates, the changes should add up to double the throughput per shader unit compared to the last generation. The first Sandy Bridge derivative will have 12 of those revised execution units, although I understand that number may scale up and down in other variants.

Like the prior gen, this IGP will be DirectX 10-compliant but won’t support DX11’s more advanced feature set with geometry tessellation and higher-precision datatypes.

Sandy Bridge’s large last-level cache will be available to the graphics engine, and that fact purportedly will improve performance while saving power by limiting memory I/O transactions. We heard quite a bit of talk about the advantages of the cache for Sandy Bridge’s IGP, but we’re curious to see just how useful it proves to be. GPUs have generally stuck with relatively small caches since graphics memory access patterns tend to involve streaming through large amounts of data, making extensive caching impractical. Sandy Bridge’s IGP may be able to use the cache well in some cases, but it could trip up when high degrees of antialiasing or anisotropic filtering cause the working data set to grow too large. We’ll have to see about that.
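
Some rough, back-of-the-envelope math shows why we’re curious. The sketch below uses resolution, antialiasing, and cache-size figures we’ve chosen for illustration (none of them are Sandy Bridge specifics), but it makes the point that a single multisampled color buffer can dwarf a CPU-sized last-level cache before textures and depth data even enter the picture.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions, not Sandy Bridge specifics. */
    int width = 1920, height = 1080;
    int bytes_per_pixel = 4;     /* 32-bit color */
    int msaa_samples = 4;        /* 4X multisampled antialiasing */
    double llc_mb = 8.0;         /* a plausible last-level cache size */

    double color_mb = (double)width * height * bytes_per_pixel * msaa_samples
                      / (1024.0 * 1024.0);

    printf("4X MSAA color buffer: %.1f MB vs. %.1f MB of last-level cache\n",
           color_mb, llc_mb);
    return 0;
}
```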

We also remain rather skeptical about the prospects for Intel to match the standards of quality and compatibility set by the graphics driver development teams at Nvidia and AMD any time soon.

One bit of dedicated hardware that’s gotten quite a bit of attention on Sandy Bridge belongs to the IGP, and that’s the video unit. This unit includes custom logic to accelerate the processing of H.264 video codecs, much like past Intel IGPs and competing graphics solutions, with the notable addition of an encoding capability as well as decoding. Using the encoding and decoding capabilities together opens the possibility of very high speed (and potentially very power-efficient) video transcoding, and Intel briefly demoed just that during the opening keynote. We heard whispers of speeds up to 10X or 20X that of a software-only solution.

Sandy Bridge’s transcoding capabilities raise all sorts of funny questions. On one hand, using custom logic for video encoding as well as decoding makes perfect sense given current usage models, and it seems like a convenient way for Intel to poke a finger into the eye of competitors like AMD and Nvidia, whose GPGPU technologies have, to date, just one high-profile consumer application: video transcoding. On the other hand, this is Intel, bastion of CPUs and tailored instruction sets, embracing application-specific acceleration logic. I’m also a little taken aback by all of the excitement surrounding this feature, given that my mobile phone has the same sort of hardware.

Because the video codec acceleration is part of Sandy Bridge’s IGP, it will be inaccessible to users of discrete video cards, including anyone using the performance enthusiast-oriented P-series chipsets. Several folks from Intel told us the firm is looking into possible options for making the transcoding hardware available to users of discrete graphics cards, but if that happens at all, it will likely happen some time after the initial Sandy Bridge products reach consumers.

One more piece of the Sandy Bridge picture worth noting is the expansion of thermal-sensor-based dynamic clock frequency scaling—better known as Turbo Boost—along several lines. Although the Westmere dual-core processors had a measure of dynamic speed adjustment for the graphics component, the integration of graphics onto the same die has allowed much faster, finer-grained participation in the Turbo Boost scheme. Intel’s architects talked of “moving power around” between the graphics and CPU cores as needed, depending on the constraints of the workloads. If, say, a 3D game doesn’t require a full measure of CPU time but needs all the graphics performance it can get, the chip should respond by raising the graphics core’s voltage and clock speed while keeping the CPU’s power draw lower.
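
One crude way to picture the “moving power around” idea is as a fixed package power budget that a controller reapportions between the CPU cores and the IGP according to demand. The C sketch below is purely a conceptual model with made-up wattages and demand levels; Intel’s actual power-control microcontroller and its policies are far more sophisticated and haven’t been disclosed in detail.

```c
#include <stdio.h>

/* Conceptual model only: split a fixed package budget between CPU and GPU
   in proportion to demand, in the spirit of "moving power around". */
static void split_budget(double package_tdp_w, double cpu_demand, double gpu_demand)
{
    double total = cpu_demand + gpu_demand;
    double cpu_w = package_tdp_w * cpu_demand / total;
    double gpu_w = package_tdp_w * gpu_demand / total;
    printf("CPU demand %.0f%%, GPU demand %.0f%% -> CPU %.1f W, GPU %.1f W\n",
           cpu_demand * 100.0, gpu_demand * 100.0, cpu_w, gpu_w);
}

int main(void)
{
    double tdp = 95.0;   /* hypothetical package TDP in watts */

    split_budget(tdp, 0.9, 0.3);   /* CPU-heavy workload: cores get most of the budget */
    split_budget(tdp, 0.3, 0.9);   /* GPU-bound game: graphics gets the headroom */
    return 0;
}
```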

Furthermore, Intel claims Sandy Bridge should have substantially more headroom for peak Turbo Boost frequencies, although it remains coy about the exact numbers there. One indication of how expansive that headroom may be is a new twist on Turbo Boost aimed at improving system responsiveness during periods of high demand. The concept is that the CPU will recognize when an intensive workload begins and ramp up the clock speed so the user gets “a lot more performance” for a relatively long period—we heard the time frame of 20 seconds thrown around. With this feature, the workload doesn’t have to use just one or two threads to qualify for the speed boost; the processor will actually operate above its maximum thermal rating, or TDP, for the duration of the period, so long as its on-die thermal sensors don’t indicate a problem.

We worry that this feature may make computer performance even less deterministic than the first generation of Turbo Boost, and it will almost surely place a higher premium on good cooling. Still, the end result should be more responsive systems for users, and it’s hard to argue with that outcome.

Comments closed
    • sigher
    • 9 years ago

    A few thoughts come to mind, one being that intel has had a history of math errors and hardware bugs in new CPUs, and if this is an all-new everything then it’s time to worry.
    But of course MS also claimed vista was 100% new and we hopefully all know now that that claim was 100% BS, so maybe we can take this with a grain of salt.

    As for the branch prediction, doesn’t the compiler used play a role there? And with the issue in the past where intel made their own compiler run stuff slower on AMD CPUs by not doing any optimizations, you have to wonder what we’ll hear next, and you have to wonder if this kind of thing won’t lead to stifling any development in compilers and even in higher languages used.

    I’d also like to add a question: do you think intel will eventually come with a news statement that they will ‘enable use of the video encoding unit when you pay $50 for the unlock code’? Because you have to wonder at this point.

    Oh and I like to thank the author for a good chuckle at the line “the prospects for Intel to match the standards of quality and compatibility set by the graphics driver development teams at Nvidia and AMD”

    Quality standards, compatibility, from AMD and nvidia, haha, I’m busting a rib.

      • indeego
      • 9 years ago

      “A few thoughts come to mind, one being that intel has had a history of math errors and hardware bugs in new CPUs, and if this is an all-new everything then it’s time to worry. But of course MS also claimed vista was 100% new and we hopefully all know now that that claim was 100% BS, so maybe we can take this with a grain of salt.” So I stopped reading after this second sentence. Two statements that aren’t true or provide any sort of context or citation tend to do this.

    • ronch
    • 9 years ago

    Can’t wait to see benchmarks pitting Sandy against Bulldozer. I hope both have what it takes to be called next-generation, especially in the case of Bulldozer. We already know Nehalem is fast, and it will only get faster with Sandy, but little is known about Bulldozer’s performance, scalability and power characteristics outside of Powerpoint presentations. And yes, can AMD still double Bulldozer’s AVX throughput? It looks like it’ll be just half of what Sandy is capable of, and you can bet Intel will pull some strings to make developers use it. I’m already getting a bit worried that Bulldozer may not bulldoze Sandy.

    • Chrispy_
    • 9 years ago

    A processor that runs “as fast as it can” within the cooling constraints is fantastic news.

    I look forward to a renewed interest in heatsinks and watercooling. I know there are plenty of options available but now a better cooler can make a difference to that vast majority of people who *don’t* overclock.

    • Mr Bill
    • 9 years ago

    “…Several folks from Intel told us the firm is looking into possible options for making the transcoding hardware available to users of discrete graphics cards…” Perhaps another opportunity for one of those $50 upgrades we just heard about recently?

    • esterhasz
    • 9 years ago

    I also don’t think that anyone at that time could have foreseen the massive leakage at growing clockspeeds. The initial idea of designing a longer pipeline that allows for higher clockspeeds and cranking that up high was not nonsensical…

    And if anyone here tells me “I knew all along that the juice would leak all over the place when the P4 architecture was introduced and if Intel had bothered to drive to my junior high school and ask me, shit would’ve gone better for them and they wouldn’t have had to bribe Dell and invent the stupid bunny people” you better have a Ph.D. in engineering or I’ll punch you on the nose…

    edit: that’s what you get for trying to be funny, should have replied to #12

      • sotti
      • 9 years ago

      Actually plenty of people inside did indeed know just that.

      The problem was that the P4 was designed by the marketing department.

      Marketing: Mhz sells, we need more Mhz!
      Engineering: We could do that, but it could cause problems X,Y,Z
      Marketing: But we need more Mhz.

        • DaveJB
        • 9 years ago

        Well, Prescott was really a perfect storm of poor design decisions, Intel’s 90nm process having severe leakage problems, and CPU power management still being in an immature state. If Cedar Mill was anything to go by, then Intel could have kept on pushing clockspeeds for quite a while longer; unfortunately NetBurst had long ceased to be a viable architecture by that point.

        • pedro
        • 9 years ago

        Nowadays it’s all about the GB’s. My how things have changed!

      • WaltC
      • 9 years ago

      /[

    • dpaus
    • 9 years ago

    Start clearing some bench space in the lab, Scott; I sense an epic showdown coming…..

    • flip-mode
    • 9 years ago

    Good write up, Scott. SB sounds freaking awesome.

    • Hattig
    • 9 years ago

    So how does an Intel graphics Execution Unit compare to an AMD or NVIDIA Shader?

    I’ll assume that with only 12 EUs, each EU must process more than one vector component (i.e., x, y, z, t), so would a 12 EU system equal a 48 shader GPU?

    Of course clocks and throughput change things, but maybe the graphics are more equivalent now. Zacate’s 80 SPs should still outperform Sandy Bridge’s 12 EUs in most tasks, and Llano’s 320 to 480 SPs will be in a different class.

      • MadManOriginal
      • 9 years ago

      It seems that the SB EUs are actually better than NV shaders (at least G92-ish ones) or AMD’s that stretch back to the HD 2000 series (divide by 5 to get a comparison to NV and SB EUs). The 12 EU SB outperforms the 16 shader G9400 and ’16’ (80/5) AMD IGPs.

    • jackbomb
    • 9 years ago

    “this is, at long last, the breaking point where one can finally say virtually nothing remains of the P6 (Pentium Pro) roots that have undergirded everything from the Conroe/Merom Core 2 to the Nehalem/Westmere Core i-series processors.”

    Uh-oh. Remember what happened the last time Intel tried a brand new processor architecture? 😛

      • Voldenuit
      • 9 years ago

      Well, there’s nothing to suggest that SB will be a repeat of the Willamette. Performance previews are very solid and the architecture looks to be sound.

      In fact, my only gripes are that the IGP ‘does not compute’ and that they /[

        • Krogoth
        • 9 years ago

        Stop spreading the nonsensical FUD.

        SB didn’t kill off overclocking. The on-die clock generator only makes QPI link speed overclocking extremely difficult. Which means you have to rely on multiplier-level overclocking. K series and Extreme Edition chips are factory-unlocked. The regular line isn’t completely locked. It has some headroom due to Turbo-clocking.

          • Voldenuit
          • 9 years ago

          Being able to overclock any intel processor you like and only being able to overclock the processors intel approves are two very different things.

          Yes, there is a price premium involved, and yes, it’s not presently /[

            • MadManOriginal
            • 9 years ago

            I think before deeming overclocking dead you should at least wait to see actual product and how things work out in real life. It’s possible that through Turboboost multiplier flexibility even low-end CPUs will have some overclocking potential. I do lament the loss of overclocking the snot out of a cheap CPU but, like you said, CPUs are ‘fast enough…’ and SB may be ‘more than fast enough.’ The only sad thing I see in the CPU lists available is that the 2c/4t ones won’t have Turboboost :-/ so who knows how the overclocking will work out. We’ll also have to see how the Taiwanese mobo makers respond. So at least withhold your final verdict rather than rendering it while knowing few if any facts.

            Also note how many ‘enthusiasts’ bought way more CPU than they needed just to be ‘tinkerers.’ So instead of buying a Q6600 or E8600 or Q9550 when it wasn’t a smart purchase they’ll just buy ‘K’ CPUs instead.

            • Voldenuit
            • 9 years ago

            What worries me is that ‘value overclocking’ may be dead on the intel platform.

            Sure, everyone talks about the K series, but they are situated near the higher end of the product line and cost more on top of that. So intel is already charging you an ‘OC tax’ for the convenience. Historically, the best value overclocker CPUs have been near the lower end of the product line – Celeron 300A, Thoroughbred B Athlon64s, Q6600s. As you go up the product line, the performance/$ metric goes very nonlinear, so you’re already losing value with the K series.

            It’s the same with the BE chips, especially since the Phenoms have less headroom than Nehalem, but in their case, an unlocked multiplier is more of a ‘bonus’ than a necessity, and they’re not very pricey to begin with. Once FSB overclocking is killed, multiplier overclocking will become a necessity for those still so inclined, and it will be a loss to enthusiasts.

            • pot
            • 9 years ago

            When did he deem overclocking dead? He specifically used the phrase “/[

            • MadManOriginal
            • 9 years ago

            And I said ‘i[

            • Voldenuit
            • 9 years ago

            You say it the same way someone preambles “not to disagree, but…” and then goes on to disagree.

            Locking off the FSB/QPI bus speed is not a good thing for enthusiasts, and many tech sites have voiced their concern over intel’s actions and their potential consequences to the enthusiast. To not even bring up the issue and potential ramifications (and possible fixes/justifications/etc) in an article about Sandy Bridge was rather egregious, hence my original post about it and why it was omitted from the article.

            Although my intent was not to start a flame war about this issue, at least this argument has brought it into the light, whatever one’s personal take on it may be.

            • MadManOriginal
            • 9 years ago

            The core of what I’m saying is simply ‘let’s wait and see what happens in the real world’ before saying the sky is falling. Is it great overclock-the-heck out of a $60 CPU news? No, but it may not be ‘overclocking is/may be dead news either and we won’t know until there are production products including motherboards available.

            • JumpingJack
            • 9 years ago

            Intel will certainly lose mindshare among enthusiasts, as it will certainly cut out the cheap $60 overclocks. However, I am not so sure this is such a terrible disaster; it really depends on what price points the K skus arrive at.

            One of the most popular enthusiast processors on the Intel brand was the Q6600 when it dropped into the 300 buck or lower price point, shortly after there were plenty of other and cheaper options, but if it falls in a sweet spot any particular sku becomes the most popular and sought after.

            If Intel blends the product mix with the right price on the K skus, then this will blunt the impact overall I would suspect.

            • Voldenuit
            • 9 years ago

            Yeah, we’re pretty much left at the mercy of intel’s pricing scheme.

            Even right now, the cheapest unlocked Lynnfield is $329. That’s not exactly cheap. Speaking for myself, I’d rather pair a $100 CPU (say, an Athlon II X4 640) with a $250 GPU any day of the week than a $250 CPU with a $100 GPU. The last time I paid $300 for a CPU was for a dual-core Opteron, and even then I thought it was grossly overpriced. At least I was able to OC it to 2.9 GHz, which ameliorated the value aspect somewhat.

            The Core i5 760 at $199 is much more attractive to me than an 875K, even if I have to give up Hyperthreading. It simply is not worth an extra 65% of the price. But if the 760 was locked down, then I’d rather get an AMD.

            Yes, there’s the dual core Clarkdale K at $199, but 2 cores isn’t enough these days. I was doing file recovery work on an old HDD the other day on my C2D, and it basically tied up my PC for the whole day because I couldn’t do any other strenuous CPU work while the file recovery was running. Ouch. And where are the LGA1366 unlocked CPUs? Oh wait, there’s only the EE. At $999. If intel is serious about *[

        • Meadows
        • 9 years ago

        I think he referred to Itanium.

          • jackbomb
          • 9 years ago

          Pentium 4

            • Meadows
            • 9 years ago

            That doesn’t make sense, it’s not “brand new” and it wasn’t technically a failure. You /[

            • Voldenuit
            • 9 years ago

            I dunno, Pentium 4 was so bad that they had to revert to an earlier architecture (P6 via Core) that then became the building blocks for their eventual recovery – Core 2, Nehalem.

            An equivalent fiasco for Itanium would have involved hp going back to selling PA-RISC, which obviously didn’t happen. And Itanium2 servers are still being sold today, though of course they are no match in terms of raw performance for POWER6 and POWER7.

            • Meadows
            • 9 years ago

            Ten times as many RISC servers are being sold, and both Itanium and RISC are overshadowed ten times over by x86 servers.

            • flip-mode
            • 9 years ago

            He was making a joke, pinhead. Lighten up.

      • DaveJB
      • 9 years ago

      The Willamette/Northwood incarnation of the Pentium 4 was actually a pretty nifty design in many aspects – the big problem was that it was released too early, on an old manufacturing process, without the cache and bus bandwidth that it needed (the FPU was kinda weaksauce as well, but that was less of an issue once SSE2 became established). It wasn’t until Prescott that Intel REALLY fouled up the design.

        • LoneWolf15
        • 9 years ago

        I’ll give you Northwood, but Willamette was a dog.

          • DaveJB
          • 9 years ago

          Willamette was a bad /[

      • ronch
      • 9 years ago

      Well, nothing lasts forever, not even P6.

    • HisDivineShadow
    • 9 years ago

    I sincerely hope that Intel doesn’t keep nVidia and/or AMD from building some kind of Optimus tech to take advantage of the presence of the GPU in SB. That could be the big draw of SB: the fact that, top to bottom, for the 80% of the time they aren’t gaming, a lot of users could have computers with no power being used by the video card. No more ‘idle’ measurements of your video card. Just no power. Then when you want to game, it switches automagically to your GPU (or your SLI/CF) and the gaming muscle is there.

    This would be great for laptops, but also useful for desktops. The way they keep saying it doesn’t function at all when a discrete video card is present makes me think they aren’t going to pursue this angle, which is unfortunate. It’d be a lot of power savings over time and a good amount of heat lost from your PC. I guess expecting Intel to want to work with either nVidia or the remnants of ATI within AMD is a lot to ask to improve CPU-GPU interaction.

    Still, the performance sounds nice, even if a little unnecessary currently. Even without Optimus-like tech, these chips should be a nice power savings and create a generation of cheap laptops with integrated video that we don’t have to COMPLETELY avoid for the occasional WoW session. Get prices down to below $300 and I think there’d be a market for said machines. Cooler, smaller CPUs with built-in GPUs that are superior to anything integrated on the market today would lead to smaller computers, one would hope.

    And if Intel can’t get SB down there, expecting Atom to carry the load, well at least we have AMD’s Bobcat…

      • Voldenuit
      • 9 years ago

      Don’t forget Llano, which will have a GPU with 5 times more shaders than Bobcat does.

    • codedivine
    • 9 years ago

    SB looks interesting on paper but I guess we will have to wait to see how it performs in the real world. But 2011 should be interesting. I wonder though, does the average home consumer care about CPU performance anymore? I sure do, but only for my work. I see no need whatsoever to upgrade the CPU of my home computer.

      • Voldenuit
      • 9 years ago

      Agreed. Most modern CPUs are “fast enough” even for power users, and the areas that could use the most speedup – rendering, encoding, folding – are potentially riper targets for GPGPU than incremental increases in CPU power. Even then, they are pretty niche applications that are not performed on a regular basis by the vast majority of users and/or enthusiasts.

      • OneArmedScissor
      • 9 years ago

      Most people haven’t cared about CPUs since dual-cores came around. There hasn’t been anything remotely close enough to match that sort of change.

      The next generation of CPUs could come close, though, considering the very sudden level of tight integration and memory improvement. We’re going from a hodge-podge of platforms that all have their own compromises, to universal integration of memory controllers, PCIe controllers, GPUs (excluding Bulldozer), and redesigned memory and cache systems tied to them all.

      That should lower the latency of damn near everything and help save power in new ways.

      The end result that everyone is overlooking is that run of the mill laptops will soon be able to completely replace desktops for most people, without killing their battery life as high clocked CPUs and even low end graphics cards did in the past. That will mean something to the average person.

      What remains to be seen, though, is what any of the new platforms can do to outright increase battery life in their most minimal form. Even Bobcat has just been talked up like it’s some sort of high performance part.

      And therein lies the problem. Battery life never seems to make big strides with common laptops because it’s not used as an advertising point. The disconnect between computer manufacturers and the people buying them is mind boggling.

        • Voldenuit
        • 9 years ago

        There is a point of diminishing returns on battery life. Naturally, everyone has different needs and expectations, but after 6-8 hrs (more or less a full working day), I’d rather trade off additional battery life for more power or less weight. There are a lot of poorly designed and poorly optimized laptops that still scrape the 4 hr mark*, but 5-6 hrs is readily available with few compromises today.

        GPU power continues to be a bottleneck. SB’s GPU is about the same level as entry-level discrete GPUs (Anand showed it performing in the same ballpark as a Radeon 5450), and even that only in the 12 EU SKU**. We still don’t know how many models will only feature 6 EUs in the IGP, which will probably bump performance back down to current Clarkdale or AMD IGPs. And there’s the persistent question mark over whether intel can provide graphics drivers as robust as people expect from nvidia and AMD.

        * Actually, that says a lot, when 4 hrs’ battery life is considered ‘poor’ these days. I remember when 2.5 hrs was asking a lot from a laptop back in the Pentium II days…

        ** I’m operating on the assumption that Anand’s preview was with the 12 EU part. I don’t think this has been confirmed one way or the other AFAIK, so if someone could post any extra information, that’d be appreciated.

          • MadManOriginal
          • 9 years ago

          I believe Anand himself verified it was a 12 EU part with his source in an update to the article. It also didn’t have graphics Turboboost though. It almost makes me wish Intel would come out with a CPU that had i[

      • ronch
      • 9 years ago

      I agree, as I don’t plan to ditch my Phenom II X4 for at least a couple more years. But just when you think nothing can touch your computer, developers will come up with something to make you want to upgrade. Some of today’s latest games make the fastest PCs crawl, and for those PCs to become mainstream and affordable we need faster chips to cause their prices to drop.

    • Voldenuit
    • 9 years ago

    Good read, but the omission of SB’s hardware lock on the bus speed (and potentially, overclocking) is a bit glaring, especially on an enthusiast site like TR.

    Granted, intel probably wanted to focus on their strengths rather than how they plan to screw enthusiasts, but I don’t see why TR should grant them a free pass on this.

      • Damage
      • 9 years ago

      My choice to focus on Sandy Bridge’s architecture rather than the specifics of its use had nothing to do with Intel’s preferences–we make our own agenda. I simply wanted to cover what I learned from my time at IDF and leave the overclocking issues for later, probably when we’d had hands on the products and could speak from experience about those things.

        • Voldenuit
        • 9 years ago

        Scott, I’m looking forward to TR’s review of SB when it comes out, just thought that something like a decision to place the clock generator on the CPU/package would have at least merited a passing reference in an article about the architecture.

          • MadManOriginal
          • 9 years ago

          But wait I thought you i[

            • Voldenuit
            • 9 years ago

            Because enthusiasts like to tinker, and intel is denying them that ability (some say “right”, but I don’t subscribe to this theory). Most people who OC their CPUs to 4 GHz+ don’t actually need or even use that extra horsepower, much in the same way people who tune their cars don’t redline the engine for every gear change.

            Of course, there are enthusiasts who also max out their CPUs for work/leisure/hobbies, some (not all) of whom overclock. My guess is that OCers are in the minority these days, or at least a smaller subset than they were in the Celeron A days, because the speedup from a faster CPU is less noticeable today than it was in the past. But it doesn’t mean they don’t still enjoy doing it.

            Locking down overclocking is a big “screw you” message to the enthusiast community in general, even if the people directly affected by this may be made up of several small overlapping minorities.

            EDIT: Now for the people who do encode/render an extra 20-50% performance from overclocking is worth a lot more to them than for, say, a gamer who is GPU-bound. But a GPGPU software revolution could potentially produce a many-fold increase in encoding and rendering speed, which is why I find AMD and nvidia’s efforts in GPGPU to be more interesting than incremental speed upgrades in CPUs, whether they come from overclocking or architectural improvements.

            • Krogoth
            • 9 years ago

            WTH are you smoking?

            Intel isn’t killing off overclocking or ignoring overclocking crowd.

            They are separating their needs with a new line, the “K series”. It will probably carry a $50-100 premium over its non-K counterpart. Multiplier-level overclocking is much easier to work with, since you don’t have to worry about the motherboard being unable to handle high QPI link speeds.

      • MadManOriginal
      • 9 years ago

      Intel raped my sister and killed my mother but you don’t see me crying at every opportunity about them ‘screwing enthusiasts.’

        • sweatshopking
        • 9 years ago

        You too…? QQ that must be why we’re kindred spirits!! ♥♥♥ They got my dad too…

      • sigher
      • 9 years ago

      I think the team that designs the core and such are not the same people or even in the same building as the people that decide what is locked and how much and what speeds are sold and at what price and such.
      So it’s a bit of a separate discussion going on at another level, and one that can change on a day to day basis so it’s hard to pin down I bet.

        • Voldenuit
        • 9 years ago

        Yeah. One team goes “how do we make the fastest CPU in the world?” and the other goes “Now that the geeks have made this, how shall we cripple it so we can charge people more for it?”.

        Knowing this, you’d think the CPU design team would have been more reticent to put the clockgen on the die.

    • yuhong
    • 9 years ago

    q[
