She's finally here. At last, Intel is taking the wraps off of one of the most anticipated bits of silicon we've seen in years: Sandy Bridge. We've known the architectural details of the processor code-named Sandy Bridge for months—they are formidable, new, and different—but we haven't known exactly how the changes would translate into performance and power efficiency, which is the big question about any product overhauled this extensively. Fortunately, Damage Labs has been churning away for weeks in anticipation of this moment, and we have a pleasantly extensive look at Sandy Bridge's—ahem, I mean "the second-generation Core microprocessors'"—performance ready for your perusal.
Sandy takes the stage
Sandy Bridge is, essentially, a next-generation replacement for Intel's primary CPUs for desktops and laptops, including those based on quad-core Lynnfield and dual-core Clarkdale silicon. Because so much information about Sandy Bridge has been available for months, we're going to skip the architectural deep dive in this review, give you a quick overview of Sandy's key features, and then focus on our test results. The thing is, even a quick overview of this new chip will take some time, simply because so very much has changed.
At the heart of Sandy Bridge is an essentially new processor microarchitecture, the most sweeping architectural transition from Intel since the introduction of the star-crossed Pentium 4. Nearly everything has changed, from the branch predictors through the out-of-order execution engine and into the memory subsystem. The goal: to achieve higher performance and power efficiency, even on single-threaded tasks, where the integration of multiple CPU cores hasn't been much help. Additionally, each of those cores holds a revamped floating-point unit that supports a new instruction set called AVX. These instructions allow the processing of vectors up to 256 bits in width, and the hardware supports them quite fully. The result should be much higher sustained rates of throughput for floating-point math, giving new life to media processing applications and other sorts of data-parallel computation.
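To put the 256-bit figure in perspective, here is a minimal sketch of how many values fit in one vector register. The 256-bit width comes from the text above. The 128-bit SSE baseline and the `lanes` helper are our additions for illustration, not anything taken from Intel's documentation.

```python
# How many elements fit in one vector register at a given width?
SSE_BITS = 128  # width of the older SSE registers (baseline added for comparison)
AVX_BITS = 256  # maximum AVX vector width cited above


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of element_bits each that fit in one register."""
    return register_bits // element_bits


for name, element_bits in (("single precision", 32), ("double precision", 64)):
    sse = lanes(SSE_BITS, element_bits)
    avx = lanes(AVX_BITS, element_bits)
    print(f"{name}: {sse} values per SSE vector, {avx} per AVX vector")
```

Twice the elements per instruction does not guarantee twice the speed in practice. Real gains depend on whether the code has been vectorized for AVX and whether memory can keep the wider units fed.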
Beyond its potent new cores, Sandy Bridge incorporates more of a PC's basic functions on a single square of silicon than any prior CPU in its class. Not only does it have the memory controller and PCIe links (in addition to two to four CPU cores), but it also brings a graphics processor onboard. This creeping integration of system components has resulted in higher performance, lower platform power consumption, and more compact packaging, which is why both Intel and AMD are moving deliberately toward further integration.
At the same time, integrated graphics processors (IGPs) are growing more capable, relatively speaking. Sandy Bridge's IGP bears little resemblance to Intel's past attempts at graphics; its execution units are capable of substantially more work per instruction and per clock cycle. What's more, the IGP's video processing block can both decode and, somewhat distinctively (if you don't count, say, the iPhone 4), encode H.264 high-definition video streams, opening up the possibility of fully hardware-accelerated video transcoding that barely burdens the CPU cores.
To facilitate better integration, Intel's architects gave Sandy Bridge a high-bandwidth, ring-style interconnect between the cores, with their associated L3 cache partitions, and the IGP. This fast (up to 384 GB/s in a 3GHz quad-core chip) interconnect has a number of purported benefits, including easing data sharing between cores, providing the throughput needed for the processor's revamped floating-point units, and allowing the onboard graphics component to expand its available bandwidth by making use of the L3 cache.
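The 384 GB/s figure can be sanity-checked with some back-of-the-envelope math. The sketch below assumes each L3 cache partition can move 32 bytes per clock and that a quad-core chip has four such partitions on the ring. The per-partition width is our assumption, not a number given above, and the function name is purely illustrative.

```python
# Rough check of the quoted ring bandwidth for a 3GHz quad-core chip.
BYTES_PER_CLOCK_PER_PARTITION = 32  # assumption: not stated in the article
L3_PARTITIONS = 4                   # one partition per core in a quad-core chip
CLOCK_HZ = 3_000_000_000            # the 3GHz example from the text


def ring_bandwidth_gb_s(partitions: int, bytes_per_clock: int, clock_hz: int) -> float:
    """Peak aggregate bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return partitions * bytes_per_clock * clock_hz / 1e9


per_partition = ring_bandwidth_gb_s(1, BYTES_PER_CLOCK_PER_PARTITION, CLOCK_HZ)
total = ring_bandwidth_gb_s(L3_PARTITIONS, BYTES_PER_CLOCK_PER_PARTITION, CLOCK_HZ)
print(f"per partition: {per_partition:.0f} GB/s, aggregate: {total:.0f} GB/s")
```

Under those assumptions the aggregate works out to the quoted 384 GB/s. It is a theoretical peak, so real workloads should not be expected to sustain it.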
Better integration has created new possibilities for power management, as well. Sandy Bridge extends Intel's Turbo Boost feature, which takes advantage of available headroom in the CPU power delivery and cooling mechanisms to deliver higher clock frequencies at lower load levels, in several ways. The first change is simply more clock speed headroom generally. Although Turbo behavior varies from model to model, Sandy Bridge reaches higher clock speeds and ramps up more aggressively than older processors. The revised Turbo algorithm also does something that may seem a little counterintuitive at first: it allows the CPU to ramp beyond its maximum rated power use (thermal design power or TDP) for brief periods of time. As I understand it, Intel is taking advantage of the lag between when a relatively cool idle chip begins to warm up its environment and when temperatures have risen to levels where full cooling capacity is needed. During this span of time, the chip may opportunistically push beyond its rated thermal peak by running at higher-than-usual frequencies within its Turbo Boost range. Once the surrounding system has warmed up or enough time has passed (the algorithm is complex, and Intel hasn't shared all of the details with us), the chip will drop back to operating within its TDP max. Intel claims this feature has an important usability benefit for common usage patterns, where periods of high utilization are "bursty" by nature. Think of opening a program or running a Photoshop filter. Furthermore, Sandy Bridge's Turbo Boost algorithm incorporates not just the CPU cores but the IGP, as well; it can raise the operating frequency of the graphics processor when the CPU cores aren't at full utilization.
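|Code name||Key products||Cores||Threads||Last-level cache size||Process node (nm)||Estimated transistors (millions)||Die area (mm²)|
|Penryn||Core 2 Duo||2||2||6 MB||45||410||107|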
|Bloomfield||Core i7||4||8||8 MB||45||731||263|
|Lynnfield||Core i5, i7||4||8||8 MB||45||774||296|
|Westmere||Core i3, i5||2||4||4 MB||32||383||81|
|Gulftown||Core i7-980X||6||12||12 MB||32||1168||248|
|Sandy Bridge||Core i5, i7||4||8||8 MB||32||995||216|
|Sandy Bridge||Core i3, i5||2||4||4 MB||32||624||149|
|Deneb||Phenom II||4||4||6 MB||45||758||258|
|Propus/Rana||Athlon II X4/X3||4||4||512 KB x 4||45||300||169|
|Regor||Athlon II X2||2||2||1 MB x 2||45||234||118|
|Thuban||Phenom II X6||6||6||6 MB||45||751||346|
The table above shows the key specs for the quad- and dual-core versions of Sandy Bridge alongside other recent chips. Thanks to Intel's 32-nm, high-k metal gate fabrication process, the nearly one billion transistors in the quad-core version of Sandy Bridge fit into a die area smaller than that of either the Lynnfield chip it replaces or the "Deneb" Phenom II with which it competes, and neither of those other chips has integrated graphics. If you're counting along at home, Intel tells us each CPU core is made up of roughly 55 million transistors, while the graphics core accounts for 114 million. We suspect a great many of the remaining transistors are packed tightly into the chip's 8 MB of L3 cache.
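For those counting along at home, the arithmetic can be sketched as follows. All of the figures come from the table and from Intel's numbers above. The variable names are ours, and the density comparison is only a rough illustration, since transistor counts are estimates and the chips are built on different process nodes.

```python
# Transistor budget for quad-core Sandy Bridge, in millions of transistors.
TOTAL_M = 995      # estimated total, from the table
PER_CORE_M = 55    # rough per-core figure from Intel
CORES = 4
GRAPHICS_M = 114   # graphics core figure from Intel

cores_m = PER_CORE_M * CORES
remaining_m = TOTAL_M - cores_m - GRAPHICS_M
print(f"cores: {cores_m}M, graphics: {GRAPHICS_M}M, everything else: {remaining_m}M")

# (estimated transistors in millions, die area in mm²), from the table
dies = {
    "Lynnfield": (774, 296),
    "Deneb": (758, 258),
    "Sandy Bridge (quad)": (995, 216),
}

density = {name: transistors / area for name, (transistors, area) in dies.items()}
for name, value in density.items():
    print(f"{name}: {value:.2f} million transistors per mm²")
```

That leaves roughly 661 million transistors for the L3 cache, the ring interconnect, the memory controller, and the rest of the chip, which is consistent with our suspicion that the cache takes up much of the remainder.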
Now that you have a sense of the scope of the thing, it should come as no surprise that there's much, much more to Sandy Bridge than we can cover in this context. If you'd like more detail, please have a look at our Sandy Bridge primer, which considers the microarchitectural changes in more depth. If you want even more detail, we suggest reading David Kanter's overview of the Sandy Bridge microarchitecture, too.