
Not quite an integrated memory controller, but...
Those folks who expected Intel to make the move to an integrated memory controller with its next-gen architecture might have been disappointed by last Tuesday's announcement, but Intel's Justin Rattner offered some cause for hope in his keynote about Intel's R&D initiatives on Thursday morning. Rattner spoke about Intel's efforts to reduce power consumption even further through various techniques, including the development of a silicon radio that moves parts of a wireless radio's electronics onto a single chip.

The most striking concept he presented, though, was a CMOS-based voltage regulator module that could replace the VRMs on a motherboard with a single chip. Such a VRM could ramp up and down much more quickly than traditional VRMs, allowing SpeedStep-like clock- and voltage-throttling capabilities to eliminate much of the inefficiency caused by slow response times in current systems.
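To make the efficiency argument a little more concrete, here is a toy model; it is not anything Rattner presented. It uses the standard first-order approximation for CMOS dynamic power, P ≈ C·V²·f, and a handful of hypothetical P-states whose frequency and voltage values are invented for illustration. A simple governor picks the lowest state that covers the current utilization. The key assumption, labeled in the code as well, is that a slow regulator can raise voltage on demand but needs `lag` ticks before it can lower it again, so it keeps burning power at the highest recently needed state; a fast on-package regulator corresponds to `lag=0`.

```python
# Hypothetical P-states as (frequency in GHz, core voltage in V); values are illustrative only.
P_STATES = [(0.6, 0.90), (1.0, 1.00), (1.4, 1.10), (1.8, 1.25)]


def pick_state(util):
    """Return the lowest P-state whose frequency covers util * peak frequency."""
    f_max = P_STATES[-1][0]
    for freq, volt in P_STATES:
        if freq >= util * f_max:
            return (freq, volt)
    return P_STATES[-1]


def dynamic_energy(trace, lag, cap=1.0):
    """Sum C * V^2 * f over a per-tick utilization trace.

    Assumption: the regulator can step voltage up immediately but needs `lag`
    ticks before it can step down, so each tick runs at the highest state
    demanded in the last lag + 1 ticks. lag=0 models an ideal fast regulator.
    """
    energy = 0.0
    for t in range(len(trace)):
        window = trace[max(0, t - lag): t + 1]
        freq, volt = max(pick_state(u) for u in window)
        energy += cap * volt * volt * freq
    return energy


trace = [0.1, 0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1]  # bursty utilization, one sample per tick
fast = dynamic_energy(trace, lag=0)
slow = dynamic_energy(trace, lag=3)
print(f"fast regulator: {fast:.2f}  slow regulator: {slow:.2f}  ratio: {slow / fast:.2f}")
```

With bursty load, the slow regulator spends most of the trace stuck at the top P-state after each spike, which is the kind of waste a faster-responding VRM would, in principle, avoid.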


The proposed three-chip package

To illustrate how such a thing might be implemented, Rattner showed off a package holding three chips: a Pentium M processor, a north bridge chip, and a CMOS VRM. He said this arrangement would allow both the CPU and the Memory Controller Hub to ramp voltages up and down very quickly in response to changes in utilization, saving power.

Rattner presented this concept as a power-saving measure and didn't mention the possibility that an on-package memory controller could help cut memory access latencies. I would expect that moving the memory controller onto the same package as the CPU would have that effect, though. Unfortunately, the CMOS VRM is still a few years off, according to a presenter who mentioned the subject in a session following Rattner's keynote. We will simply have to see when—and if—this concept makes it into an end-user product.

Multi-core mojo and power efficiency
Bob Crepps, a Technology Strategist in Intel's Microprocessor Technology Lab, presented some eye-opening results from Intel's research into power efficiency in multi-core architectures. He covered a lot of ground in his presentation, but a few of the concepts were especially notable, in my view.

First, Crepps made an observation that may not be news, except for the source. Intel doesn't often say such things, but he pointed out that special-purpose hardware can achieve higher efficiency in terms of MIPS per watt than general-purpose processors. Coming from Intel, this admission may be significant because it could lead to the incorporation of specialized hardware blocks into general-purpose CPUs, or perhaps more extensive hardware acceleration in core-logic chipsets. He identified several areas of opportunity for custom hardware engines, including TCP/IP offloading, MPEG encoding and decoding, speech recognition, and (obviously) graphics.

Crepps pointed out that, for a given increase in die area and power consumption, multi-core processors deliver larger performance gains than a single core does. As an example, he cited the case of a four-core processor, each core with multithreading (à la Hyper-Threading), sharing a cache and front-side bus. This chip would be able to move intensive computational loads from one core to the next, a technique he called "core hopping," in order to manage hot spots. (I'm not sure exactly what sort of processor he's been playing with in the lab, but it doesn't sound like anything we've heard of yet.)
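Crepps didn't describe how the lab chip decides when to hop, so the following is only a toy thermal model showing why migration can help. Every constant (heating per tick, cooling rate, ambient temperature, and the 70-degree hop threshold) is made up. Each tick, every core relaxes toward ambient, the core running the heavy thread heats up, and, with hopping enabled, the thread moves to the coolest core once its current core crosses the threshold.

```python
def peak_temperature(ticks, hop, n_cores=4, heat=10.0, cool=0.2,
                     ambient=40.0, threshold=70.0):
    """Return the hottest temperature any core reaches while running one heavy thread.

    All constants are hypothetical: each tick every core loses `cool` of its excess
    over ambient, and the core running the thread gains `heat` degrees.
    """
    temps = [ambient] * n_cores
    core = 0  # index of the core currently running the heavy thread
    peak = ambient
    for _ in range(ticks):
        # Passive cooling: every core relaxes toward ambient.
        temps = [t - cool * (t - ambient) for t in temps]
        # The busy core heats up.
        temps[core] += heat
        peak = max(peak, temps[core])
        # Core hopping: migrate the thread to the coolest core once this one is too hot.
        if hop and temps[core] > threshold:
            core = min(range(n_cores), key=lambda i: temps[i])
    return peak


print(f"no hopping: {peak_temperature(100, hop=False):.1f}")
print(f"hopping:    {peak_temperature(100, hop=True):.1f}")
```

Without hopping, the single busy core climbs toward its steady-state temperature; with hopping, the heat is spread across all four cores and no single hot spot gets far past the threshold. A real implementation would also have to weigh the cost of moving a thread's state and cache contents between cores, which this sketch ignores.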


He asserted, with a graph to drive home the point, that such a multi-core device could offer much better performance than a single-core processor in the same die area and power envelope. I expect that's true, given a good, parallelizable workload.

Next came a whopper of a slide that captures the problems with the Pentium 4 better than anything I've seen from outside of Intel.


If one factors out changes in process technology and looks only at design changes, the Pentium 4 is six times faster than the 486, but with 23 times the power consumption. That's not a stellar proposition—hence the need to spend transistors on additional cores rather than trying to make a single CPU core perform better.
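The slide's two ratios can be combined into energy per instruction (EPI), the metric Crepps turns to next. Assuming "six times faster" means six times the instruction throughput, EPI is power divided by throughput, so:

```latex
\frac{\mathrm{EPI}_{\text{P4}}}{\mathrm{EPI}_{486}}
  = \frac{P_{\text{P4}} / \mathrm{perf}_{\text{P4}}}{P_{486} / \mathrm{perf}_{486}}
  = \frac{23}{6} \approx 3.8
```

By this measure, each instruction on the Pentium 4 costs roughly 3.8 times the energy it did on the 486; equivalently, the Pentium 4 delivers only about 6/23 ≈ 0.26 times the 486's performance per watt.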

So how can we further reduce the energy per instruction used by a multi-core processor? One option is a technique called AMP, for asymmetric multiprocessing. The concept is simple enough: try running different processors (or cores) at different clock speeds within the same power envelope, and see which config achieves the best performance. Intel's researchers tested several different configurations using a four-way Xeon system. The power envelope they chose allowed them to run four Xeons at 1GHz each or a single Xeon at 2GHz. They also were able to fit into the same power envelope a simple AMP setup that would run highly sequential code on a single 2GHz Xeon and then switch over, for more parallel code, to three Xeons running at 1.25GHz. This AMP setup would deactivate either the 2GHz single processor or the three 1.25GHz processors, depending on the usage pattern.


The researchers found that an AMP system delivered higher performance overall for the same amount of power. In benchmarks with mostly parallel components, the SMP system was better, while in benchmarks with mostly sequential components, the single-processor system was better. Overall, though, with the sort of part-sequential/part-parallel workloads that are so often typical, AMP performed best.
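The intuition behind this result can be sketched with a simple Amdahl's-law-style model. This is not the researchers' methodology: it assumes performance scales linearly with clock speed, parallel code splits perfectly across the active processors, and switching between the AMP modes costs nothing. The configuration labels and the sequential fraction `seq_fraction` are hypothetical parameters; only the clock speeds and processor counts come from the four-way Xeon setup described above.

```python
# Idealized model: runtime scales as 1/clock, parallel work splits perfectly
# across active processors, and switching between AMP modes is free (assumptions).
CONFIGS = {
    # label: (clock for sequential code in GHz,
    #         number of processors for parallel code, their clock in GHz)
    "single 2GHz": (2.0, 1, 2.0),
    "SMP 4x1GHz": (1.0, 4, 1.0),
    "AMP 2GHz + 3x1.25GHz": (2.0, 3, 1.25),
}


def runtime(seq_fraction, seq_clock, n_par, par_clock):
    """Normalized runtime of one unit of work (one unit takes 1 s on one 1GHz processor)."""
    return seq_fraction / seq_clock + (1.0 - seq_fraction) / (n_par * par_clock)


def best_config(seq_fraction):
    """Label of the configuration with the shortest modeled runtime."""
    return min(CONFIGS, key=lambda name: runtime(seq_fraction, *CONFIGS[name]))


for s in (0.0, 0.25, 0.5, 0.75):
    times = {name: round(runtime(s, *cfg), 3) for name, cfg in CONFIGS.items()}
    print(f"seq={s:.2f}  best={best_config(s)}  {times}")
```

Under these assumptions, the four 1GHz processors win only when the code is almost entirely parallel, and AMP wins across the broad middle. Because the model ignores switching overhead, it never lets the single 2GHz processor beat AMP (the two tie on purely sequential code), so it captures the SMP-versus-AMP side of the finding but not the single-processor advantage the researchers measured on mostly sequential benchmarks.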

This research suggests an obvious way forward for optimizing performance per watt on multi-core processors. I'll be interested to see when it's first implemented.