Intel's Core i5-750 and Core i7-870 processors

Nehalem plays for the masses

Almost a year ago, the Core i7 burst onto the scene and created a well-deserved stir with its incredibly bandwidth-rich system architecture and sometimes-astonishing performance in multithreaded applications. For the right sort of jobs, such as 3D rendering or scientific computing applications, the Core i7 delivered a performance leap beyond its precursors that was nearly unprecedented. With this new processor, Intel removed any semblance of doubt about who held the lead in CPU technology.

Trouble was, the Core i7's most dramatic performance gains were largely confined to specific types of applications, many of which have little relevance to the average computer user. On top of that, the price of entry for a Core i7-based system was fairly steep. The CPUs themselves weren't especially cheap, nor were the motherboards, and one had to populate those boards with three memory modules, one per channel, to achieve optimal performance. All of this was a natural consequence of the fact that those first Core i7 products were repurposed silicon mainly intended for servers and workstations in the guise of Nehalem Xeons, roles for which those CPUs are exceptionally well suited.

Thus, the Core i7 has held undisputed technology leadership in desktop processors, but Intel's older Core 2 technology has remained the bread and butter of its product lineup. Against this less potent opposition, a resurgent AMD has made headway in the middle of the market all year with the steady improvements to the Phenom II. Today, however, Intel is officially taking the wraps off of a new weapon in its arsenal: the chip code-named Lynnfield, which looks to bring the native quad-core Nehalem microarchitecture into the mainstream. Thanks to some clever engineering and integration, this processor promises to enable systems that are faster, cheaper, quieter, smaller, and more energy efficient than prior desktop PCs.

You have, perhaps, heard such claims before, no? The question, then, is whether Intel has really pulled off such a feat with Lynnfield.

A brief Nehalem refresher
Lynnfield is simply a new implementation of the same Nehalem microarchitecture inside the first Core i7 processors—those chips were code-named Bloomfield. Nehalem is, in turn, an evolutionary step beyond the familiar Core 2, but with a heaping helping of consequential changes, especially to the system architecture. Nehalem consolidates four execution cores onto a single piece of silicon, integrates an on-die memory controller, and eliminates the front-side bus—adopting a system layout familiar from AMD's Athlon 64 and its descendants, although Intel's version of the same is intended to be faster and better.

Despite the new plumbing around them, Nehalem's execution cores are still based on the Core 2's, but they have been tweaked in ways big and small. For example, changes to instruction decoding and branch prediction bring higher performance per clock. Tweaks to the internal memory subsystem complement the revamping of the whole memory hierarchy, which has been tuned for the freer flow of data and instructions. Each core has 32KB L1 data and instruction caches and a 256KB dedicated L2 cache. The third-level cache is larger, at 8MB, and is shared by all four cores; as a result, the L3 cache is crucial to inter-core communication.
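To put those cache sizes in one place, here is a quick tally of the on-die cache described above. It is only arithmetic on the figures quoted in this article; the variable names are my own, and I'm assuming binary units (1KB = 1024 bytes).

```python
# Tally of Nehalem's on-die cache, using the sizes quoted in the article.
# Assumption: binary units, so 1KB = 1024 bytes and 1MB = 1024KB.
KB = 1024
CORES = 4

L1_DATA = 32 * KB         # per core
L1_INSTR = 32 * KB        # per core
L2 = 256 * KB             # per core, dedicated
L3_SHARED = 8 * 1024 * KB # one pool shared by all four cores

per_core_private = L1_DATA + L1_INSTR + L2
total_cache = CORES * per_core_private + L3_SHARED
total_mb = total_cache / (1024 * KB)

print(f"Private cache per core: {per_core_private // KB}KB")
print(f"Total on-die cache:     {total_mb:.2f}MB")
```

The shared L3 accounts for the bulk of the total, which fits its role as the meeting point for the four cores.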

One of the big highlights on Nehalem's feature list is the return of simultaneous multithreading (SMT), better known in Intel marketing-speak as Hyper-Threading. Each Nehalem core can track and execute two hardware threads at once to make better use of its rich, four-issue-wide execution resources. Although Hyper-Threading proved to be a bit of a mixed blessing in the Pentium 4, Nehalem's SMT implementation has proven to be a nearly unqualified win in the server and workstation markets, and it shows real promise for the desktop, too, for reasons we'll soon explore.

Nehalem's L3 cache, memory controller, and off-chip I/O components are separate from the quad execution cores and together make up what Intel calls the chip's "uncore." The uncore is clocked separately from the cores and has its own power states, as well. As you might be gathering, this design is really quite modular, and Intel has played mix-and-match with the uncore elements of the base Nehalem microarchitecture while cooking up Lynnfield. For a deeper look at Nehalem itself, I suggest reading our reviews of the original Core i7 and the Xeon 5500 series.

Romping through Lynnfield
The most pertinent question today is how Intel has adapted the Nehalem microarchitecture to suit the mainstream market. Those changes are oriented toward related goals of cost reduction and integration.

The most obvious change is a move from three memory channels in Bloomfield to two channels in Lynnfield. Dual channels have been the standard in desktop PCs for ages now, and Lynnfield regresses to the mean. Those memory channels are potent, though—twin sets of DDR3 DIMMs, with officially sanctioned speeds up to 1333MHz, although higher frequencies are possible on some models, if you're willing to be an outlaw. The bump up in memory clocks somewhat offsets the loss of a channel, since Bloomfield is officially limited to 1066MHz RAM.
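A rough calculation shows how much the faster memory claws back. The sketch below computes peak theoretical bandwidth only, treating the quoted "MHz" figures as millions of transfers per second and assuming each DDR3 channel is 64 bits (8 bytes) wide. Neither assumption comes from Intel's materials, and real-world throughput will be lower.

```python
# Peak theoretical DRAM bandwidth = channels * transfers/s * bytes per transfer.
# Assumptions: a 64-bit (8-byte) DDR3 channel, and the article's "MHz" speeds
# read as millions of transfers per second.
BYTES_PER_TRANSFER = 8

def peak_bandwidth_gbs(channels, megatransfers_per_s):
    """Return peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * megatransfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9

bloomfield = peak_bandwidth_gbs(3, 1066)      # three channels, official max
lynnfield = peak_bandwidth_gbs(2, 1333)       # two channels, official max
lynnfield_slow = peak_bandwidth_gbs(2, 1066)  # hypothetical: no clock bump

print(f"Bloomfield:          {bloomfield:.1f} GB/s")
print(f"Lynnfield:           {lynnfield:.1f} GB/s")
print(f"Lynnfield at 1066:   {lynnfield_slow:.1f} GB/s")
print(f"Lynnfield/Bloomfield: {lynnfield / bloomfield:.0%}")
```

On paper, that works out to roughly 25.6 GB/s for Bloomfield against 21.3 GB/s for Lynnfield, so the two-channel chip keeps about five-sixths of the peak rather than the two-thirds it would have at equal memory clocks.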

Lynnfield also, somewhat surprisingly, does away with the QuickPath Interconnect used on prior Nehalem incarnations. In its place are 16 lanes of built-in, on-die PCI Express 2.0 connectivity, used to connect the CPU directly to a discrete graphics card. These PCIe lanes can be split, if needed, into twin x8 links for use with dual graphics cards. Although this split arrangement is technically less than ideal, we've found its performance to be indistinguishable from that of the X58 chipset used with Bloomfield, which offers 32 lanes for graphics, in our initial tests with SLI and CrossFireX. The on-die integration of PCIe has the potential to reduce CPU-to-device latency over PCIe, which may turn out to be preferable to more lanes and higher latency, once GPU makers have had the opportunity to tune their drivers to take advantage of it.
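For a sense of what the x16-versus-x8 split costs in raw bandwidth, here is a small sketch. The signaling rate (5 GT/s per lane) and the 8b/10b encoding overhead are the standard PCIe 2.0 figures rather than numbers from this article, and the result is a per-direction peak, not a measured rate.

```python
# Peak one-direction bandwidth of a PCIe 2.0 link.
# Assumptions (standard PCIe 2.0 figures, not taken from the article):
# 5 GT/s per lane and 8b/10b encoding, so 8 of every 10 bits carry data.
GT_PER_S = 5.0
ENCODING_EFFICIENCY = 8 / 10

def link_bandwidth_gbs(lanes):
    """Return peak bandwidth in GB/s, per direction, for a given lane count."""
    data_bits_per_s = lanes * GT_PER_S * 1e9 * ENCODING_EFFICIENCY
    return data_bits_per_s / 8 / 1e9  # bits -> bytes -> GB

x16 = link_bandwidth_gbs(16)  # single graphics card
x8 = link_bandwidth_gbs(8)    # each card in a dual-card setup

print(f"x16 link: {x16:.1f} GB/s per direction")
print(f"x8 link:  {x8:.1f} GB/s per direction")
```

Each card in a dual-GPU Lynnfield system gets half the peak link bandwidth of a full x16 slot, which is the sense in which the split is less than ideal on paper.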

A block diagram of the Lynnfield system architecture. Source: Intel.

The other bit of novel I/O in Lynnfield's uncore is a relatively pedestrian 2GB/s DMI link used to talk to Intel's single-chip core logic solution, the P55 platform controller hub, or PCH. We have a rather complete review of the P55 online today, with a look at motherboard solutions from three major manufacturers, so I won't dwell too much on its specifics.

Consider, however, what moving to a single-chip solution saves in terms of space, power, and thus cost. Intel claims this dual-chip (CPU and PCH) solution offers a roughly 40% reduction in package size versus the Core 2 and friends. Intel reckons Nehalem's power-saving mojo reduces idle power consumption for the CPU alone by up to 50%. Not only that, but the 95W maximum power envelope, or thermal design power (TDP), of Lynnfield processors now encompasses the major PCIe links, and what's left fits inside the P55 chip, which has a tiny 4.7W TDP. Compare that to a 22W TDP for the P45 north bridge and 4.5W for the ICH10R south bridge, along with a 95W Core 2 processor. Platform power consumption for Lynnfield systems should be down substantially.
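Adding up the TDP ratings quoted above makes the comparison concrete. Keep in mind that TDP is a thermal design ceiling, not measured power draw, so this is only a paper comparison; the dictionary labels are my own.

```python
# Sum of component TDP ratings for each platform, from the article's figures.
# TDP is a design ceiling, not measured consumption.
lynnfield_platform = {
    "Lynnfield CPU (includes PCIe)": 95.0,
    "P55 PCH": 4.7,
}
core2_platform = {
    "Core 2 CPU": 95.0,
    "P45 north bridge": 22.0,
    "ICH10R south bridge": 4.5,
}

def total_tdp(platform):
    """Return the summed TDP in watts for a dict of component ratings."""
    return sum(platform.values())

lynnfield_total = total_tdp(lynnfield_platform)
core2_total = total_tdp(core2_platform)
savings = core2_total - lynnfield_total

print(f"Lynnfield + P55:        {lynnfield_total:.1f}W")
print(f"Core 2 + P45 + ICH10R:  {core2_total:.1f}W")
print(f"Difference:             {savings:.1f}W ({savings / core2_total:.0%})")
```

By that tally, the Lynnfield platform's combined rating is 99.7W against 121.5W for the Core 2 setup, a difference of about 22W.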

One upshot of this integration should be the flourishing of motherboards and systems that pack tremendous power into smaller form factors. Fewer chips, simpler power delivery and cooling, and easier routing all help on this front. Already, major players like Gigabyte have introduced feature-rich mATX P55 motherboards, and I expect Mini-ITX solutions will be forthcoming, as well.

With that said, Lynnfield itself isn't exactly a small chip. Both Lynnfield and Bloomfield are manufactured on Intel's 45nm high-k fab process. At 774 million transistors and 296 mm², Lynnfield is actually larger than Bloomfield (731M transistors and 263 mm²). Based on the die shot above, it appears much of Lynnfield's additional die area is concentrated in its PCIe block, which clearly occupies more area than Bloomfield's two QPI blocks (picture here). Lynnfield is also slightly larger than AMD's 45nm Phenom II, which packs 758M transistors into 258 mm².
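The die figures above can be turned into a rough density comparison. The sketch below uses only the transistor counts and areas quoted in this article. Vendors count transistors differently, so treat the cross-vendor comparison loosely.

```python
# Transistor density from the article's quoted die figures.
# Each entry is (millions of transistors, die area in mm^2).
chips = {
    "Lynnfield": (774, 296),
    "Bloomfield": (731, 263),
    "Phenom II": (758, 258),
}

def density(chip):
    """Return millions of transistors per mm^2 for a named chip."""
    transistors_m, area_mm2 = chips[chip]
    return transistors_m / area_mm2

for name in chips:
    print(f"{name:<10} {density(name):.2f}M transistors/mm^2")

# How much bigger is Lynnfield than Bloomfield?
area_growth = chips["Lynnfield"][1] / chips["Bloomfield"][1] - 1
count_growth = chips["Lynnfield"][0] / chips["Bloomfield"][0] - 1
print(f"Area growth:       {area_growth:.1%}")
print(f"Transistor growth: {count_growth:.1%}")
```

Lynnfield's area grows by about 12.5% over Bloomfield while its transistor count rises only about 6%, so it is the least dense of the three. That is consistent with the extra area going to I/O circuitry such as the PCIe block rather than to tightly packed logic or cache.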