Haswell's CPU cores bring major changes, but Intel's graphics team pushed out a new architecture in the last generation with Ivy Bridge. This time around, Haswell's integrated graphics processor (IGP) is more of a refinement. The IGP is software-compatible with Ivy's, though it has been tweaked for better efficiency. In spite of this continuity, graphics may still be the single biggest area of improvement from Ivy to Haswell, for several reasons.
For one, Intel has completely rewritten the software driver stack for its IGP, and it claims the new driver has some of the lowest CPU overhead in the industry. Combined with this new driver, Haswell's IGP supports the latest graphics and compute APIs, including DirectX 11.1, OpenGL 4.0, and OpenCL 1.2. Also, as part of ever-creeping integration, the display output block has moved from the platform controller hub chip onto the Haswell CPU die. The new display block offers double the bandwidth and can drive displays with 4K resolutions via DisplayPort 1.2. Triple-display configs are possible now, too. And the IGP's media processing block gets its own little "tock" worth of innovation, with claimed advancements in QuickSync video encoding speed and quality.
The biggest factor in Haswell's graphical improvement, though, is simply giving us more of the same. This graphics architecture is modular, with various "slices" capable of being scaled up as needed. Intel is taking advantage of that fact with Haswell, producing three different versions of the IGP, known as GT1, GT2, and GT3, in increasing order of size and potency.
As with Ivy Bridge, the GT2 part will see duty in most desktop Core i5 and i7 models. Haswell's GT2 IGP has 20 execution units, up from 16 in Ivy Bridge. (Each of those execution units has eight shader ALUs onboard, so you're looking at 160 "cores" in Haswell GT2, if you're talking like Nvidia and AMD do about these things.) Haswell GT2 also has the benefit of slightly higher graphics clock speeds and a more expansive top thermal envelope, 84W versus 77W.
GT3 doubles up on resources from GT2, with 40 execution units and thus 320 shader ALUs. The GT3 version of Haswell's IGP will be deployed primarily in laptops, where the additional parallelism should allow for healthy performance gains at fairly modest clock frequencies. In ultrabook-class CPUs, Intel expects Haswell to achieve roughly 2X the performance of the prior Ivy Bridge-based parts.
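The ALU totals quoted above are simple multiplication, and a few lines of Python make a quick sanity check. This is only an illustrative sketch: the execution-unit counts and the eight-ALUs-per-EU figure come from the text, while the dictionary and function names are made up for the example.

```python
# Shader ALU counts implied by the figures in the text.
# The names below are illustrative, not Intel terminology.
ALUS_PER_EU = 8  # each execution unit has eight shader ALUs

EXECUTION_UNITS = {
    "Ivy Bridge GT2": 16,
    "Haswell GT2": 20,
    "Haswell GT3": 40,
}


def shader_alus(eu_count: int) -> int:
    """Total shader ALUs for an IGP with the given number of execution units."""
    return eu_count * ALUS_PER_EU


ALU_TOTALS = {name: shader_alus(eus) for name, eus in EXECUTION_UNITS.items()}

if __name__ == "__main__":
    for name, total in ALU_TOTALS.items():
        print(f"{name}: {total} shader ALUs")
```

By the same math, Ivy Bridge's 16-EU GT2 works out to 128 ALUs, so Haswell GT2 is a 25% increase in raw shader resources before clock speeds enter the picture.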
The most interesting version of Haswell's graphics, though, is something different. Known as GT3e, it's the same GT3 graphics hardware backed up by a massive embedded DRAM cache. The 128MB eDRAM chip is manufactured by Intel on a specially tuned version of its 22-nm fab process, and it's situated on the same package as the CPU. Employing a large eDRAM cache for graphics has little precedent in consumer PCs, but it does address one of the primary constraints that integrated graphics solutions face: the amount of memory bandwidth available onboard a CPU socket.
The eDRAM connects to the Haswell mother ship using a narrow, high-speed, on-package interconnect that offers about 50GB/s of bandwidth in each direction. That's about 4X the bandwidth offered by a single channel of DDR3-1600 memory, and the bandwidth is additive, since the eDRAM can be accessed in parallel to main memory. On the CPU, this connection is routed through the system agent, as is the memory controller interface. The eDRAM chip is fully power managed and can be powered down when it's not needed. Intel tells us it uses between half a watt and 1W at idle, while consuming about 3.5W at peak.
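For anyone who wants to check the "about 4X" comparison, the arithmetic is sketched below. It assumes a standard 64-bit (8-byte) DDR3 channel, which is not spelled out in the text, and it uses the approximate 50GB/s per-direction figure quoted for the eDRAM link. The variable names are illustrative.

```python
# Peak bandwidth of one DDR3-1600 channel versus the eDRAM link.
# Assumption (not stated in the article): a standard 64-bit DDR3 channel.
TRANSFERS_PER_SECOND = 1600e6      # DDR3-1600 performs 1600 million transfers/s
BYTES_PER_TRANSFER = 8             # 64-bit channel width
EDRAM_GBPS_PER_DIRECTION = 50.0    # approximate figure quoted in the text

# Peak theoretical bandwidth of a single DDR3-1600 channel, in GB/s.
ddr3_channel_gbps = TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER / 1e9

# How many single DDR3-1600 channels one direction of the eDRAM link equals.
edram_ratio = EDRAM_GBPS_PER_DIRECTION / ddr3_channel_gbps

if __name__ == "__main__":
    print(f"DDR3-1600, one channel: {ddr3_channel_gbps:.1f} GB/s")
    print(f"eDRAM link vs. one channel: {edram_ratio:.2f}x")
```

These are peak theoretical numbers; sustained bandwidth in real workloads will be lower on both sides of the comparison.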
The GT3e cache doesn't work quite like you might expect. It's not a frame buffer; frames are still written to DRAM. Instead, it's a cache for the graphics data used to create frames—and only the graphics data that makes sense to cache. Intel's graphics driver is smart enough to know not to cache certain things, like streaming vertex buffers, that would likely spill out of the cache or otherwise derive no benefit from caching.
What's more, the eDRAM isn't just a graphics cache, but a full-fledged L4 cache, accessed coherently and available to the entire CPU. The bandwidth it provides has the potential to benefit CPU workloads as well as graphics. Certain types of applications, like computational fluid dynamics and OpenCL media processing, are obvious candidates. Once you know that, you may be shocked to hear this: Intel has no plans to bring a GT3e-class chip to socketed desktop systems. These things are slated for BGA-style packages, for surface mounting into laptops and the like. I'm not sure why no one thought, "Hey, we have a CPU with a massive 128MB L4 cache. We should sell it to people who want to buy it and put it into their systems." But apparently that didn't happen, or at least that guy didn't persuade everybody else. Happily, we do have a GT3e system to test, so we can show you the benefits of its L4 cache in a full suite of graphics and CPU workloads.
Oh, right. To go along with the new graphics goodness Haswell brings to the table, Intel has coined a new brand name: Iris graphics. The wonderfully generic "Intel HD Graphics" is sticking around, attached to the IGPs for slower Haswell variants. The 28W GT3 parts, which will face off directly against Radeons and GeForces, get the new Iris brand name. The higher-end GT3e offering forgoes amateur status to become the Iris Pro 5200.
Dialing up more power management mojo
We've talked about the CPU core and graphics, but we haven't yet covered the most consequential new technology in Haswell. Although it makes for a fine desktop processor, much of the innovation in Haswell is centered around the quest to bring a full-fledged PC into smaller devices with better battery life. That means (points if you guessed it) even more integration of former platform elements into the CPU die.
This time around, the object of Intel's attention is the power delivery portion of the platform. Haswell introduces what Intel calls "FIVR" for "fully integrated voltage regulator" or something along those lines. In short, the VRMs that used to live on the motherboard have migrated into the CPU silicon for all Haswell-based parts. Essentially, the entire power delivery control and power train for the CPU now lives on the CPU itself.
As with past integration efforts, bringing the VRMs onboard the CPU has some immediate and tangible benefits. Intel claims its integrated VRs can replace as many as seven different VRMs that would otherwise be scattered throughout a traditional PC platform, and it says Haswell systems should realize between 600 and 1000 mm² of physical space savings thanks to FIVR. The firm also expects that systems will be able to tolerate higher peak voltages without costing more to produce.
Because power is delivered into the chip and then distributed, Haswell is able to offer greater control over how power is routed and used. The chip has double the number of internal voltage rails of Sandy Bridge, and it can decide with much finer granularity where power should be delivered, depending on the present workload. The transitions between voltage states can happen dramatically faster with the VRs on die, too. Intel claims FIVR is 5-10X faster than external VRMs at ramping voltages.
As a result, Intel has decoupled many of Haswell's internal units to run at independent frequencies and voltages, where before they were linked. For example, Haswell's CPU cores are no longer tied to the chip's internal communications ring. During a heavy graphics workload, the IGP can therefore pull the ring up to full speed and power in order to take advantage of the bandwidth it provides, without the CPU cores having to clock up as well. Thanks to this finer granularity, Haswell can more effectively "shift power around" on the CPU die, granting one unit permission to run at a higher-than-usual frequency and voltage because another one is powered down. Back when, Intel talked a good game about moving power around on Sandy and Ivy Bridge, but apparently the mechanisms for doing so were fairly limited. For instance, those parts had a limited set of fixed ratios between CPU and IGP speeds, and transitions between states were relatively slow. With Haswell's finer-grained power distribution and decoupled clocks, the power sharing has grown more sophisticated. Intel's architects now talk about "transferring credits" to track how power and thermal capacity are transferred from one portion of the die to another.
Haswell's VR integration happens just as another effort is bearing fruit. In 2007, Intel started a sweeping program to revamp the entire power management infrastructure of the PC platform. PCs weren't originally built with the same set of assumptions as today's mobile devices, where battery life is paramount. As a result, the various support chips scattered around a PC motherboard weren't built to stay in a low-power state until needed or to make transitions between power states quickly. The communication protocols and the I/O devices that use them weren't built to communicate info about power states, either. Intel has undertaken the daunting task of addressing that problem under a program with the wonderfully generic name "Power Optimizer." This effort touches nearly every PC standard, PCI Express, USB, SATA, and DisplayPort among them.
The goal is to provide instrumentation about power states and state transitions across the entire system, so that the PC can tune itself for low-power operation. Each device can specify its latency tolerance, that is, how long it can afford to wait for the rest of the system to wake up and service it. Armed with this info, the system can decide how deep a sleep state to enter, depending on the current workload. When it's time to wake up from sleep, either to service a timed event or to respond to user input, the system knows which devices to wake, and in what order, to resume operation safely.
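One way to picture that decision: take the tightest latency budget reported by any active device, then pick the lowest-power state the system can still wake from within that budget. The Python below is a toy sketch of that idea only. The state names, exit latencies, power figures, and device names are all invented for illustration, and it treats each reported figure as the longest delay a device can tolerate. It does not describe Intel's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepState:
    name: str
    exit_latency_us: int    # time needed to wake from this state (hypothetical)
    relative_power: float   # lower is better; arbitrary illustrative units


# Hypothetical platform states; every number here is made up.
STATES = [
    SleepState("active", 0, 1.00),
    SleepState("light sleep", 50, 0.40),
    SleepState("deep sleep", 500, 0.10),
    SleepState("deepest sleep", 3000, 0.02),
]


def pick_sleep_state(device_tolerances_us, states=STATES):
    """Return the lowest-power state whose exit latency every device can tolerate."""
    # The device with the tightest tolerance sets the budget for the whole system.
    if device_tolerances_us:
        budget = min(device_tolerances_us.values())
    else:
        budget = float("inf")  # nothing is waiting, so any state is acceptable
    allowed = [s for s in states if s.exit_latency_us <= budget]
    return min(allowed, key=lambda s: s.relative_power)


if __name__ == "__main__":
    busy = {"audio": 100, "storage": 1000, "network": 5000}
    quiet = {"storage": 1000, "network": 5000}
    print(pick_sleep_state(busy).name)   # audio limits how deep the system can go
    print(pick_sleep_state(quiet).name)  # with audio idle, a deeper state fits
```

The point of the sketch is that a single latency-sensitive device can hold the whole platform in a shallow state, which is why every bus and peripheral has to participate for the scheme to pay off.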
To complement this capability, Haswell introduces a new platform-level "active idle" state that is very low power, known as S0ix. Systems based on Haswell should essentially default to this state, seeking it whenever possible, even between keystrokes, in order to pursue power savings aggressively. The user, application software, and the OS should be none the wiser. At last year's IDF, one of Intel's architects characterized S0ix as "automatic, continuous, fine-grained, and transparent to well-written software." As you can see in the diagram above, in S0ix, the CPU cores and caches are power gated, DRAM is powered down, and even the FIVR rail is shut off. When needed, the CPU can recover from being completely powered down in about three milliseconds.
That's vaguely amazing, huh? Of course, the benefits of Haswell's new power plumbing will vary according to the platform. They should be most fully realized in ultrabooks with U-series processors. One key enabler for continuously low-power operation is a tech we've been hearing about for years called panel self-refresh. This nifty feature removes the burden of refreshing the display 60 times a second from the CPU or chipset. Instead, the LCD has its own small pool of DRAM and, when the screen's contents are static, the LCD will continue displaying the same image without outside assistance. This simple optimization allows the CPU and system memory to remain in a low-power state continuously, until something more important happens. Plans for panel self-refresh have been kicking around for years, but the feature is now part of the Embedded DisplayPort (eDP) 1.3 spec. Intel expects the first wave of Haswell-based laptops to make use of PSR, at last.
In larger laptops and desktops, the Haswell chip and its PCH will live in separate packages, just like in past generations. Haswell processors headed for ultrabooks and convertible tablets will be mounted on the package above, which Intel calls a "1-chip BGA solution." You can see the, er, two chips on this package, the oblong Haswell CPU and the smaller platform controller hub. This system-on-a-chip package will squeeze into power envelopes from 28W to as little as 6W. For ultrabooks, the U-series processors will fit a dual-core Haswell and its PCH into a 15W limit, versus 20W for Ivy Bridge (17W CPU + 3W PCH). For convertibles, the Y-series parts will slide into a 6W envelope, compared to 10W total for the comparable Ivy Bridge/PCH combo. Those are just peak numbers, though, and don't reflect the benefits of Haswell's active idle mojo. Intel expects Haswell U-series parts to offer 40-50% better battery life than the last generation, with over twice the battery run time in connected standby mode, all at measurably higher performance levels.
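To put those peak envelopes in proportion, here is a small Python sketch that works out the reduction for each class. The wattages are the ones quoted above; the function and variable names are just for illustration, and these are peak power limits, not measured consumption.

```python
def percent_reduction(old_watts: float, new_watts: float) -> float:
    """Percentage drop in peak power envelope from the old platform to the new."""
    return 100.0 * (old_watts - new_watts) / old_watts


# Ultrabooks: Ivy Bridge CPU (17W) plus PCH (3W) versus Haswell U-series (15W).
U_SERIES_REDUCTION = percent_reduction(17 + 3, 15)

# Convertibles: Ivy Bridge/PCH combo (10W total) versus Haswell Y-series (6W).
Y_SERIES_REDUCTION = percent_reduction(10, 6)

if __name__ == "__main__":
    print(f"U-series peak envelope reduction: {U_SERIES_REDUCTION:.0f}%")
    print(f"Y-series peak envelope reduction: {Y_SERIES_REDUCTION:.0f}%")
```

That comes to a 25% smaller envelope for ultrabooks and a 40% smaller one for convertibles.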