In a press conference earlier today, Intel provided a sneak peek at some of the papers it will present next week during the International Solid-State Circuits Conference in San Francisco. The chipmaker revealed a few juicy details about Gulftown, its upcoming six-core, 32-nm processor, as well as some interesting research prototypes kicking around in its labs.
Gulftown, also known as Westmere 6C, is essentially a six-core version of the dual-core Westmere design that recently debuted in Core i3 and Core i5 processors. Intel has fashioned Westmere 6C not out of three dual-core dies, but out of a single piece of silicon featuring six cores, a generous 12MB of L3 cache, a couple of 6.4GT/s QuickPath Interconnect links, and a triple-channel DDR3-1333 memory controller. (Westmere 2C, by contrast, comes with a companion die that houses its memory controller and graphics processor.) Hyper-Threading and Turbo Boost capabilities are part of the formula, too.
All told, Westmere 6C packs 1.17 billion transistors and measures 240 mm². That’s actually smaller physically than the 45-nm Bloomfield die from quad-core Core i7-900 processors, which spreads out 731 million transistors over 263 mm². Intel has no plans for a native quad-core 32-nm chip, as far as we know, but it does intend to release quad-core versions of Westmere 6C. Those products will simply have a couple of cores disabled.
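For a rough sense of what the 32-nm shrink buys, the quick Python sketch below works out transistor density from the figures quoted above. The comparison against ideal area scaling is our own back-of-the-envelope assumption, not something Intel presented, and the two dies differ in core count and cache size, so treat the ratio as indicative only.

```python
# Back-of-the-envelope transistor density from the die figures quoted above.
# The "ideal scaling" line assumes area shrinks with the square of the
# feature size, which is a simplification and not an Intel claim.

def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6

westmere_6c = density_millions_per_mm2(1.17e9, 240)   # 32-nm Gulftown
bloomfield = density_millions_per_mm2(731e6, 263)     # 45-nm Bloomfield

density_ratio = westmere_6c / bloomfield
ideal_ratio = (45 / 32) ** 2  # idealized 45-nm to 32-nm area scaling

print(f"Westmere 6C: {westmere_6c:.2f}M transistors/mm^2")
print(f"Bloomfield:  {bloomfield:.2f}M transistors/mm^2")
print(f"Measured density ratio: {density_ratio:.2f}x (ideal: {ideal_ratio:.2f}x)")
```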
Intel has also done quite a bit of work to make Westmere, in both its 2C and 6C forms, more power-efficient than previous offerings. For example, Westmere is the first Intel processor design that can do power gating with not just CPU cores, but also the "uncore" elements of the chip. Those uncore parts include the cache, QuickPath links, and memory controller; power gating, meanwhile, cuts off power delivery to certain areas of the chip, which helps improve power efficiency at idle. Intel has gone so far as to make the processor flush the contents of its L3 cache into an SRAM so the cache can power down completely. Also, Westmere 6C supports low-voltage DDR3 memory, which runs at 1.35V instead of the standard 1.5V. Using low-voltage DIMMs purportedly brings a 20% reduction in memory power draw.
Intel also gave us a peek into some of the advanced projects taking place in its research labs. There, the chipmaker is working on package-to-package connectivity, as well as "digital intelligence" to get the most performance out of many-core designs.
On the data connectivity front, Intel has developed a 47-channel interconnect that links chip packages directly and enables 470Gb/s (58.8GB/s) of bandwidth using only 0.7W of power. The interconnect takes the form of a ribbon cable—not unlike flexible multi-GPU bridge connectors—connected directly to each package. Intel says this approach saves considerable amounts of power over a more conventional design.
Intel sees this type of interconnect as especially handy for future, so-called "tera-scale" devices, which might need to move a terabyte per second from one chip to another. A conventional interconnect design might require about 150W of power to maintain that bandwidth, but the ribbon-cable approach could do it with "about 11W."
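To put those interconnect numbers on a common footing, here is a small Python sketch that converts each power and bandwidth pair into energy per bit. It assumes decimal units (1 TB/s = 8 x 10^12 bits per second) and treats the quoted wattages as exact, which they almost certainly are not.

```python
# Energy per bit for the interconnect figures quoted above.
# Assumes decimal units: 1 Gb/s = 1e9 bits/s, 1 TB/s = 8e12 bits/s.

def energy_per_bit_pj(power_watts: float, bits_per_second: float) -> float:
    """Return energy per transferred bit in picojoules."""
    return power_watts / bits_per_second * 1e12

PROTOTYPE_BPS = 470e9   # 47-channel prototype, 470 Gb/s
TERASCALE_BPS = 8e12    # one terabyte per second, expressed in bits

prototype_gbytes = PROTOTYPE_BPS / 8 / 1e9             # bandwidth in GB/s
prototype = energy_per_bit_pj(0.7, PROTOTYPE_BPS)      # prototype at 0.7 W
conventional = energy_per_bit_pj(150, TERASCALE_BPS)   # conventional at 150 W
ribbon = energy_per_bit_pj(11, TERASCALE_BPS)          # ribbon cable at ~11 W

# Power needed if the prototype's efficiency held at 1 TB/s.
scaled_prototype_watts = 0.7 * TERASCALE_BPS / PROTOTYPE_BPS

print(f"Prototype bandwidth: {prototype_gbytes:.2f} GB/s")
print(f"Prototype:    {prototype:.2f} pJ/bit")
print(f"Conventional: {conventional:.2f} pJ/bit")
print(f"Ribbon cable: {ribbon:.2f} pJ/bit")
print(f"Prototype scaled to 1 TB/s: {scaled_prototype_watts:.1f} W")
```

Under those assumptions, the prototype and the projected tera-scale link both land at roughly 1.4 to 1.5 picojoules per bit, so the "about 11W" figure looks like a straightforward scale-up of the prototype.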
Speaking of tera-scale processors, Intel brought up that 80-core proof of concept it originally revealed at its fall 2006 developer forum. Such many-core designs would be particularly vulnerable to inconsistencies in clock and voltage scaling between different cores, but Intel has come up with some interesting ways to get around that problem and maximize performance.
These days, both Intel and AMD set the speeds and voltages of their CPUs based on what the slowest core can achieve. However, Intel says it could define those parameters on a per-core basis in many-core designs, so some cores would run at higher clock speeds or with lower voltages than others. Then, in light workloads, the processor could intelligently map threads to its most capable cores. That approach alone could result in a 6-35% energy saving over simply assigning threads to cores on a random basis. Intel thinks it could achieve an additional 20-60% energy savings for "certain tasks" by using a more aggressive scheme called "thread hopping," which would move threads to faster cores as soon as those cores became free.
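Intel didn't share its scheduling algorithm, but the basic idea of steering threads toward the most capable cores can be illustrated with a toy Python model. Everything here is hypothetical: the `Core` class, the `energy_per_op` values, and the greedy selection are our own stand-ins, and the resulting numbers say nothing about the 6-35% figure Intel quoted.

```python
# Toy model of variation-aware thread mapping. All names and numbers are
# hypothetical illustrations, not Intel's actual algorithm or data.
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    core_id: int
    energy_per_op: float  # made-up relative cost; lower means more efficient

def pick_most_efficient(cores, n_threads):
    """Greedy mapping: choose the n_threads cores with the lowest energy cost."""
    return sorted(cores, key=lambda c: c.energy_per_op)[:n_threads]

def total_energy(assigned, ops_per_thread):
    """Energy consumed when each assigned core runs one thread's worth of work."""
    return sum(c.energy_per_op * ops_per_thread for c in assigned)

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# An 80-core chip with invented per-core variation of roughly +/-20%.
cores = [Core(i, rng.uniform(0.8, 1.2)) for i in range(80)]

N_THREADS = 4            # a light workload
OPS_PER_THREAD = 1_000_000

chosen = pick_most_efficient(cores, N_THREADS)
random_pick = rng.sample(cores, N_THREADS)

mapped_energy = total_energy(chosen, OPS_PER_THREAD)
random_energy = total_energy(random_pick, OPS_PER_THREAD)

savings = 1 - mapped_energy / random_energy
print(f"Saving versus random placement in this toy run: {savings:.1%}")
```

The greedy pick can never do worse than random placement in this model, since it always selects the cheapest cores available. A real scheduler would also have to weigh clock speed, thermals, and the cost of migrating threads.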
The firm is also looking into what happens in harsh conditions, when a chip can’t produce correct results every time. In some of the research scenarios, as we understand it, processor cores could be pushed to run at higher clock speeds or lower voltages than usual. The processors could then be allowed to make errors and correct them either by running instructions again at half the speed or by running the same instructions multiple times. Purportedly, if an error were flagged in such a case, the instructions would only have to be run twice to ensure a correct result. The chipmaker didn’t go into too much detail here, but it claims running at nominal settings with error correction could improve performance by 40% or power efficiency by 21%.
Intel’s research projects can sometimes produce real innovations that get incorporated into future products, but they’re usually many years from production when they’re first presented in a context like this one. Still, the work itself is often fascinating, and it gives us a sense of what to expect many years down the road.