Call it a tick-plus?
In spite of holding such a commanding lead, Intel hasn't simply shrunk Sandy Bridge and left it at that. The ~50% increase in Ivy Bridge's transistor count should be a clue that there's much more going on here.
As often happens with "ticks," Ivy's CPU core microarchitecture has been tweaked in a host of small ways in order to improve per-clock performance. Intel architect Stephen Fischer told us he estimates the cumulative effect of those improvements to be a 4-6% gain in IPC, or instructions per clock. Among the changes is deeper pipelining in the divider unit, which should result in double the throughput for both integer and floating-point math. The cache prefetcher has gotten smarter and is able to cross page boundaries, allowing it to better track and anticipate complex access patterns. The prefetcher also has an adaptive mechanism to avoid hogging memory bandwidth; when queues grow too deep, it will throttle back its activity. There are other tweaks to improve Hyper-Threading (a few queues are now partitioned dynamically between two threads, rather than split statically at 50-50) and AVX performance (more registers to help deal with memory accesses that cross cache lines).
The neatest trick is probably the virtualization of move operations; rather than moving data through the ALU, such operations can be accomplished via register renaming, so long as the source and destination datatypes are the same. Fischer told us this feature alone results in an IPC gain of roughly 1.5%.
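To make the idea concrete, here is a toy Python model of a rename table. It is only a sketch of the general technique, not Intel's implementation: the RenameTable class, its method names, and the register counts are all our own inventions for illustration, and a real design would also have to track when a shared physical register can be freed.

```python
class RenameTable:
    """Toy rename table: architectural register names map to physical registers.

    Hypothetical model for illustration only; it never recycles physical
    registers, which real hardware must do (for example with reference counts).
    """

    def __init__(self, arch_regs, num_phys):
        # Each architectural register starts out bound to its own physical register.
        self.mapping = {name: i for i, name in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_phys))
        self.phys = [0] * num_phys
        self.exec_ops = 0  # operations that actually occupied an execution unit

    def write(self, dst, value):
        # A normal result is written to a freshly allocated physical register.
        p = self.free.pop(0)
        self.phys[p] = value
        self.mapping[dst] = p
        self.exec_ops += 1

    def read(self, src):
        return self.phys[self.mapping[src]]

    def mov_through_alu(self, dst, src):
        # Conventional move: the data is copied into a new physical register.
        self.write(dst, self.read(src))

    def mov_eliminated(self, dst, src):
        # Eliminated move: dst simply points at src's physical register.
        self.mapping[dst] = self.mapping[src]


table = RenameTable(["rax", "rbx"], num_phys=8)
table.write("rax", 42)
table.mov_eliminated("rbx", "rax")  # no execution unit involved
table.write("rax", 7)               # rax gets a new physical register
print(table.read("rbx"), table.read("rax"), table.exec_ops)
```

Because a later write to the source allocates a fresh physical register, the destination keeps seeing the old value, which is what makes the pointer swap safe in this simplified model.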
Ivy has even added several new instructions. Some are related to a new feature intended to prevent privilege-escalation exploits. Another accesses a new on-chip digital random number generator, which will act as a high-quality entropy source for encryption algorithms of all types, not just the AES algorithm that's already accelerated explicitly. Ivy also adds AVX instructions to convert quickly between 32-bit and 16-bit floating-point datatypes, allowing for high-precision 32-bit computation to be combined with more compact 16-bit storage.
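The compute-wide, store-narrow trade-off behind those conversion instructions can be illustrated in software. The Python sketch below does not touch the new hardware instructions at all; it leans on the standard struct module's 'e' (IEEE 754 half-precision) format, and the helper names and sample values are ours. Note that Python floats are 64-bit, so this only approximates the 32-bit compute path.

```python
import struct


def to_half_bytes(values):
    """Pack floats into IEEE 754 binary16 storage, 2 bytes per value."""
    return struct.pack(f"<{len(values)}e", *values)


def from_half_bytes(data):
    """Unpack binary16 storage back into floats for computation."""
    return list(struct.unpack(f"<{len(data) // 2}e", data))


# Sample values chosen (by us) to be exactly representable in 16 bits.
samples = [1.0, 0.5, 3.140625, 1000.0]
packed = to_half_bytes(samples)
restored = from_half_bytes(packed)

half_size = len(packed)                               # bytes used as 16-bit floats
single_size = struct.calcsize(f"<{len(samples)}f")    # bytes used as 32-bit floats
print(half_size, single_size, restored == samples)

# Most values are not exactly representable and lose precision when stored.
lossy = from_half_bytes(to_half_bytes([0.1]))[0]
print(abs(lossy - 0.1))
```

The storage footprint is halved, at the cost of rounding error on values such as 0.1 that the 16-bit format cannot represent exactly.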
All in all, the microarchitectural changes are fairly extensive for a "tick," but they are just the tip of the iceberg. This chip has a number of new power-saving features, too numerous to recount in any detail here. One of the big ones is the power-gating of DDR memory at idle, which should help notebook battery life quite a bit. Also, interestingly enough, Intel now tests the optimal voltage for each chip at multiple frequencies and stores that information on the die, where it can be used by the power management controller. Previously, only two frequency points were tested, and the power controller would interpolate between them. Products with Turbo Boost enabled should presumably operate more efficiently, and Intel has some related tricks up its sleeve, such as products with configurable TDPs. A laptop chip could, for instance, operate at one TDP while on battery power and switch to a higher TDP when snapped into a docking station.
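Why would more tested points help? A rough Python sketch shows the intuition. Everything in it is hypothetical: the frequency and voltage figures are invented for illustration, and we are only assuming that the controller looks up voltages in something like a piecewise-linear table. Intel hasn't told us how the stored data is actually used.

```python
from bisect import bisect_left


def voltage_for(points, freq_mhz):
    """Piecewise-linear voltage lookup over sorted (freq_mhz, volts) pairs."""
    freqs = [f for f, _ in points]
    # Clamp requests that fall outside the tested range.
    if freq_mhz <= freqs[0]:
        return points[0][1]
    if freq_mhz >= freqs[-1]:
        return points[-1][1]
    # Find the two tested points that bracket the requested frequency.
    i = bisect_left(freqs, freq_mhz)
    (f0, v0), (f1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (freq_mhz - f0) / (f1 - f0)


# Invented per-chip test results: voltage rises faster near the top speed.
multi_point = [(1600, 0.80), (2400, 0.88), (3200, 1.00), (3900, 1.20)]
# The older scheme as described: only the two end points are tested.
two_point = [multi_point[0], multi_point[-1]]

for freq in (2400, 3200):
    old = voltage_for(two_point, freq)
    new = voltage_for(multi_point, freq)
    print(freq, round(old, 3), round(new, 3))
```

With a voltage curve shaped like this made-up one, interpolating between just two end points overestimates the voltage needed at intermediate speeds, so a table with more tested points would let the chip run those speeds at lower voltage.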
Meanwhile, Intel graphics architect Tom Piazza isn't content to call Ivy Bridge a "tick" at all. He calls it a "tick+" because the graphics architecture has been extensively overhauled, more along the lines of what happens with a "tock" refresh on the CPU side. At IDF last fall, Piazza acknowledged some risk in introducing an "unknown" new graphics core in concert with a process shrink, especially because "the last thing you want to do at Intel is hold up a factory," but the move was apparently a success. In fact, he said he saw no reason not to continue with major graphics architectural improvements like this one, particularly since "graphics move fast."
Ivy's new graphics core adds a broad range of new capabilities, in many ways bringing Intel up to feature parity with AMD's Llano (and forthcoming Trinity) IGPs. The headliner is support for the DirectX 11 graphics API, with all that implies, including hardware tessellation capabilities and a broader selection of texture formats. Additionally, like most DX11 GPUs, Ivy's IGP supports a range of compute-focused features, making it compatible with both Microsoft's DirectCompute and the OpenCL 1.1 standard. As we understand it, all of the major compute-focused capabilities are truly present in the hardware, not just emulated in software, including double-precision FP datatypes, denorms, and support for atomic transactions.
The IGP's execution unit count is up from Sandy Bridge—from 12 EUs to 16—but don't let that number lead you astray. The EUs have been totally restructured in what amounts to a doubling of almost all resources versus Sandy Bridge, with the exception of memory bandwidth. Another interesting change is the addition of a 256KB L3 cache in the graphics core, a feature Piazza said was originally intended for Sandy Bridge but was "retracted" because it didn't offer much performance benefit. Piazza claims this cache delivers an "amazing" reduction of bandwidth utilization between the graphics core and the 8MB last-level cache. Those reductions in ring traffic translate directly into power savings, which turns out to be the cache's primary benefit.
Overall, it sounds like Intel is cleaning up quite a few loose ends with this IGP refresh. Although the firm has miles to go in catching AMD and Nvidia in terms of software support and game compatibility, we do expect changes like the expansion of texture formats to go a long way toward improving compatibility with existing games. Another issue Piazza says they're cleaning up this time around is the anisotropic filtering algorithm, whose output in Sandy varied widely with the surface's angle of inclination. Now, he tells us, the IGP will "draw circles instead of flowers" in the aniso tunnel test. In part thanks to the doubling of texture samplers, the IGP's media processing capabilities should be substantially faster, too, including QuickSync video encoding.
In a bit of a surprise to us, Intel has upped the number of discrete displays the IGP can support from two to three, of any major output type, including DVI, HDMI, and DisplayPort.
Piazza told us the IGP has been laid out in five physically distinct "slices" composed of different resource types. Most notably on that front, the EU/texture sampler slice can be scaled up and down. The first use of that capability will surely be the lower-end versions of Ivy with Intel HD 2500 graphics, which should have 8 EUs and half the texturing capacity of the HD 4000. However, Piazza explicitly mentioned future "scale-up opportunities" in this context, as well. Hmmm. We're unsure whether he was thinking of the next "tock," code-named Haswell, or something more imminent.
A new-ish platform, too
Although Ivy Bridge fits into the same LGA1155 socket as Sandy, hardware compatibility will depend on the motherboard maker and chipset type. At the very least, motherboards based on Intel's older 6-series chipsets will require a BIOS update to ensure compatibility with Ivy. Some of Intel's business-focused chipsets officially won't support Ivy Bridge at all.
Instead, Intel has introduced a range of new 7-series chipsets to go along with its new CPU. The one of most interest to enthusiasts will surely be the Z77 Express. We've already published a nice round-up of Z77 boards, for those who are interested. The only major update in the 7-series platform controller hub (PCH) silicon is the addition of support for USB 3.0. There are a few software enhancements, though, including the addition of a suspend-to-SSD feature inherited from Intel's mobile offerings.
Above are a couple of pictures of the Core i7-3770K alongside the MSI Z77A-GD65 motherboard in our test system. As you can see, Ivy's packaging will be difficult to distinguish from Sandy's by looks alone.