We’ve known for a while the basic outlines of Intel’s plans for future processors built on its upcoming 45nm fab process technology, but Intel exec Pat Gelsinger filled in more of the picture in a press conference today. Chief among the revelations were some reasons why the 45nm Penryn chips are, Gelsinger said, “not just a simple die shrink,” and more specifics about Nehalem, the chip based on the next-generation microarchitecture that will follow Penryn.
Here are Penryn’s key characteristics, some of which we’ve known about and some of which were just revealed or officially confirmed today.
- A 45nm die shrink of the Core microarchitecture — Penryn will be based on the Core architecture of current Core 2 processors, but will be built using Intel’s 45nm high-K process, which Gelsinger reminded us involves a “fundamental restructuring of the transistor,” with 20% faster switching and 30% lower power. Like the Core 2, Penryn chips will have two cores onboard and will be employed in dual-chip packages for quad-core products. Each Penryn chip will cram 410 million transistors into a 107mm² die; current Core 2 chips pack 291 million transistors into 143mm².
- 6MB of L2 cache per chip — Credit larger caches for much of Penryn’s increased transistor count. The chips will have 6MB of L2 cache, shared between two cores. Naturally, dual-chip quad-core configurations will have a total of 12MB of L2 cache.
- SSE4 and “Super Shuffle Engine” — We’ve already reported on the 50 new instructions of SSE4, and Penryn will support them, as expected. We learned today that Penryn will have the ability to perform 128-bit data shuffle operations in a single cycle. Gelsinger said this fast shuffle capability should make SSE4 much more programmable and more useful for compiled code, because the CPU will quickly handle realigning data as needed for vector execution.
- A faster divider — Penryn will be faster clock-for-clock than current Core 2 processors, and not just because of larger caches and SSE4. The CPU has a new, faster divider that can process four bits per clock versus the two bits per clock of current Conroe chips. Accordingly, Gelsinger expects twice the divide performance of Core 2 Duo and up to four times the performance for square-root operations.
- Bus speeds up to 1600MHz — We’ll see front-side bus speeds in Penryn derivatives of up to 1.6GHz, depending on the market segment. Gelsinger offered few specifics here, only noting that Xeon server CPUs will have bus speeds of “up to 1600MHz,” with no mention of specific bus frequencies for desktop or mobile chips.
- A new lower power state — Penryn will be able to drop into an additional low-power state when idle, which Intel has designated as the C6 state (or “deep power down capability,” if you’re into marketing names). This mode turns off CPU clocks, disables caches, and goes to what Gelsinger said is the lowest power state the process technology allows. Waking from this mode takes longer than it does from other power states, as one might expect.
- Dynamic Acceleration Tech — Penryn will also play with power by introducing a novel dynamic clock speed scaling ability. When one CPU core is busy while the other is idle, thus not requiring much power or producing much heat, Penryn will take advantage. The chip will boost the clock speed of the busy core to a higher-than-stock frequency while staying within its established thermal envelope.
- A split-load cache — Gelsinger said this will allow speculative execution across cache line boundaries, but offered little additional detail.
- Improved virtualization — No details here, although I believe they may have been disclosed before.
- Clock speeds over 3GHz and bitchin’ performance — Intel expects both the desktop and server versions of Penryn to reach clock speeds in excess of 3GHz, and in fact has been testing 3.2GHz versions of desktop and server chips already.
Gelsinger said Intel had measured a 3.2GHz desktop part delivering 20% higher gaming performance than the fastest current Conroe. For applications that use SSE4, like media encoding, we can expect to see improvements of over 40%.
As for the server parts, Gelsinger said a 3.2GHz quad-core Penryn-derived system based on the Caneland platform with a 1600MHz front-side bus was achieving over 45% gains versus today’s fastest quad-core Xeon systems in certain apps. The apps he cited were bandwidth- and floating-point-intensive ones like Stream, some sub-elements of SPECfp, and HPC workloads like computational fluid dynamics.
- Familiar power envelopes — Dual-core desktop versions of Penryn are slated to have a 65W TDP rating, like most Core 2 Duos today. The quad-core versions will come with 95W and 130W TDPs. The Xeon variants will hit 40, 65, and 80W TDP targets in dual-core form and 50, 80, and 120W in quad-core form. Gelsinger didn’t quote any thermal envelopes for mobile CPUs from this family, but there are evidently no plans for a quad-core mobile version of this processor.
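
A couple of the numbers in the list above are easy to sanity-check with back-of-the-envelope arithmetic. The Python sketch below uses only the transistor counts, die sizes, and bits-per-clock figures quoted by Gelsinger. The divider model is a deliberately idealized assumption (a fixed number of quotient bits retired every clock, no setup or rounding overhead), and the 64-bit quotient width is a hypothetical value picked for illustration, not something Intel disclosed.

```python
import math

# Figures quoted in the article.
PENRYN_TRANSISTORS = 410e6
PENRYN_DIE_MM2 = 107
CORE2_TRANSISTORS = 291e6
CORE2_DIE_MM2 = 143


def density(transistors, area_mm2):
    """Return transistors per square millimeter."""
    return transistors / area_mm2


def divide_clocks(quotient_bits, bits_per_clock):
    """Clocks needed by an idealized divider.

    Assumption: the divider retires a fixed number of quotient bits
    per clock and has no setup or rounding overhead. Real hardware
    will differ.
    """
    return math.ceil(quotient_bits / bits_per_clock)


penryn_density = density(PENRYN_TRANSISTORS, PENRYN_DIE_MM2)
core2_density = density(CORE2_TRANSISTORS, CORE2_DIE_MM2)

print(f"Penryn density: {penryn_density / 1e6:.2f}M transistors/mm^2")
print(f"Core 2 density: {core2_density / 1e6:.2f}M transistors/mm^2")
print(f"Density ratio:  {penryn_density / core2_density:.2f}x")
print(f"Die area ratio: {PENRYN_DIE_MM2 / CORE2_DIE_MM2:.2f}")

# Hypothetical 64-bit quotient, chosen only to illustrate the scaling.
QUOTIENT_BITS = 64
conroe_clocks = divide_clocks(QUOTIENT_BITS, 2)  # two bits per clock
penryn_clocks = divide_clocks(QUOTIENT_BITS, 4)  # four bits per clock
print(f"Divide clocks:  {conroe_clocks} (Conroe) vs {penryn_clocks} (Penryn)")
```

Under those assumptions, Penryn packs roughly 1.9 times as many transistors per square millimeter into a die about three-quarters the size, and doubling the bits handled per clock halves the number of divider iterations, which lines up with the "twice the divide performance" claim.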
Gelsinger said the entire family of Penryn-derived products is still on track to be in production this year, and Intel still expects to launch the first products from this family in the server segment in 2007. We’ll cover today’s revelations about the next-generation Nehalem architecture in a separate post shortly.