In his opening keynote today, Intel CEO Paul Otellini revealed that Intel has shipped over five million Core 2 Duo processors in the past 60 days, along with over a million Xeon 5100-series processors in the three months since their introduction. That means, he said, that the Xeon 5100 chips represented nearly half of all of Intel's DP-class (dual-socket low-end server) processor shipments and over 40% of the entire dual-socket market in the Xeon 5100's first three months. Obviously, Intel has made significant progress in converting away from Netburst and getting Core-based processors into the market, but they still have work to do.
Four cores in a socket, if not a chip
One of the most notable ways Intel plans to make use of its Core chips in the coming months is by deploying them in the same sort of dual-chip package used in the Netburst-based Presler and Dempsey products, essentially cramming two dual-core CPUs into a single processor socket to make a "quad core" result. We've known about these products, code-named Kentsfield for the desktop and Clovertown for servers, for some time now, but Intel recently announced these processors would launch later this year, rather than in early 2007 as originally planned. The decision to pull these products forward into late '06 isn't entirely surprising given the relative ease of gluing a pair of Core 2 Duo chips together on a single package.
As with Presler, which came to be best known as the Pentium D 900 series, the multi-chip package approach for Kentsfield and Clovertown will have its advantages and drawbacks. The advantages mostly come for Intel in the form of higher yields made possible by the ability to take two good chips from anywhere on a wafer (or, presumably, even from different wafers) and use them together. Trying to produce a single quad-core chip with the same die area as two Core 2 Duo processors would be a drag on yields. In fact, if I understood Intel's Steve Smith correctly, he estimated that the dual-chip approach could improve yields by about 20% compared to making a single, large quad-core chip. Smith also noted that Intel could choose the very best chips, with the ability to operate at relatively high frequencies and relatively low voltages, for use in quad-core products.
The drawbacks of putting two chips together on one package have to do mainly with less-than-optimal performance. As one might expect from two Core 2 Duos paired up, Kentsfield and Clovertown will feature a total of 8MB of L2 cache, with 4MB of L2 cache shared between two cores per chip. The two chips will communicate with one another, share data between their internal caches, and transmit cache coherency updates solely by means of the front-side bus. Each chip will put an electrical load on this bus, potentially making higher clock frequencies harder to reach. On balance, though, these new "quad-core" processors should probably offer noteworthy improvements over the current Core 2 Duo, provided that highly multithreaded software or multitasking is the name of the game.
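To get a feel for why two small dies can out-yield one big one, here is a rough sketch using the simple Poisson yield model, in which the fraction of defect-free dies falls off exponentially with die area. The die area and defect density below are illustrative assumptions of mine, not Intel figures, and the result is not meant to reproduce Smith's 20% estimate.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Illustrative assumptions only; neither number comes from Intel.
DUAL_CORE_AREA_CM2 = 1.43  # assumed area of one dual-core die
DEFECTS_PER_CM2 = 0.2      # assumed defect density

# Yield of one dual-core die versus a hypothetical monolithic quad-core
# die with twice the area.
small_die_yield = poisson_yield(DUAL_CORE_AREA_CM2, DEFECTS_PER_CM2)
large_die_yield = poisson_yield(2 * DUAL_CORE_AREA_CM2, DEFECTS_PER_CM2)

# Any two good small dies can be paired, so the usable share of wafer area
# is small_die_yield for the dual-chip package and large_die_yield for the
# monolithic part. The ratio is the relative gain in quad-core parts per wafer.
relative_gain = small_die_yield / large_die_yield - 1.0

print(f"dual-core die yield:       {small_die_yield:.1%}")
print(f"monolithic quad-core yield: {large_die_yield:.1%}")
print(f"gain from pairing dies:     {relative_gain:.1%}")
```

The size of the gain depends entirely on the assumed defect density, so treat the output as a way to see the shape of the trade-off rather than as a prediction.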
Kentsfield will be the first of these quad-core products to arrive in the form of a new Core 2 Extreme product in November, Otellini announced today. The Extreme version will be followed by less expensive processors to be called "Core 2 Quad." Seriously. The first Extreme version should run at 2.66GHz, I believe, or slightly less than the dual-core Core 2 Extreme X6800.
Also in November, Intel will intro the server-oriented Clovertown in the Xeon 5300 series. Like the 5100 series, Xeon 5300s will use the formidable Bensley server platform with its dual, independent front-side busses and FB-DIMM memory subsystem.
Both the desktop and server quad-core parts should be drop-in replacements for their existing dual-core counterparts. Intel intends for them to fit into the same basic power envelopes, plug into the same sockets, and work with the same chipsets. If the motherboard vendor or computer maker has done its job well, the quad-core stuff should require only a BIOS update. On the desktop, even the older 975X chipset should work fine with Kentsfield.
Also, to squash a rumor, Intel made clear that shipping products will have all of the same power saving features, such as C1E halt and SpeedStep, as their dual-core variants. Early steppings of the processors didn't have these features enabled, but newer ones do.
Intel intends eventually to deliver various versions of these quad-core CPUs to fit into different power envelopes, including one that will have a thermal design power of only 50W, to be introduced in the first quarter of next year.
Given everything, the move to four cores per socket seems surprisingly undramatic from a hardware perspective. The more interesting questions will almost certainly center on whether software applications can extract additional performance from more than two CPU cores, especially in mainstream desktop PCs. That part of the equation looks to be substantially more complicated than situating two dual-core CPUs together in a single package.
Teraflop chip stirs intrigue
In fact, Intel appears to be looking beyond simply adding more x86 execution cores to future chips in order to take advantage of the ever-ballooning transistor budgets made possible by Moore's Law. One of the most dramatic (and yet oddly underplayed) facets of the opening IDF keynotes was the revelation of a new prototype chip with a tremendous amount of parallel floating-point computing power. Otellini mentioned the chip in his speech, but left it to CTO Justin Rattner to fill in the details.
The chip itself is quite a departure for Intel. Its basic building block is a simple floating-point execution core that is not x86-compatible. Next to the core is a router that connects the core to the rest of the chip, including memory. Intel calls this basic building block a "tile," and many tiles are arranged on the chip in an 8x10 grid, for a grand total of 80 FP execution cores.
Rattner said Intel has stacked this chip full of execution resources on top of a 20MB SRAM chip in such a way that there are thousands of electrical connections between them, making paths for tremendous amounts of bandwidth. Each execution core gets 256KB of dedicated SRAM via this arrangement (shades of Sony's Cell processor here) for an aggregate of one trillion bytes per second of bandwidth from the execution cores to memory.
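The numbers Rattner gave hang together, as a quick back-of-the-envelope check shows. The sketch below uses only the figures quoted above; the one assumption of mine is that the aggregate bandwidth is split evenly across the tiles.

```python
# Figures quoted in the keynote.
TILES = 8 * 10                            # 8x10 grid of tiles
SRAM_PER_TILE_KB = 256                    # dedicated SRAM per execution core
AGGREGATE_BANDWIDTH_BYTES_PER_S = 1e12    # "one trillion bytes per second"

# Total stacked SRAM implied by the per-tile figure.
total_sram_mb = TILES * SRAM_PER_TILE_KB / 1024

# Per-tile share of memory bandwidth, assuming an even split (my assumption).
per_tile_bandwidth_gb_s = AGGREGATE_BANDWIDTH_BYTES_PER_S / TILES / 1e9

print(f"tiles:              {TILES}")
print(f"total SRAM:         {total_sram_mb:g} MB")
print(f"bandwidth per tile: {per_tile_bandwidth_gb_s:g} GB/s")
```

Eighty tiles at 256KB apiece come to exactly the 20MB Rattner cited for the SRAM chip.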
Could this processor be the hardware behind all of the rumors about Intel making a foray into high-end desktop GPUs? Could this project have had a hand in convincing AMD and ATI to pursue their merger? That still sounds like wild-eyed speculation to me at this point, but given where things stand, such talk is hard to avoid. Intel already has preliminary silicon, after all. They're saying this chip can deliver over a teraflop of computational power at its current 3.1GHz clock speed, so it might well be able to emulate DirectX 10 or handle other exotic tasks, such as real-time ray-tracing. And Otellini said Intel believed this chip would be available as a production product in the future, although he gave no ETA or any other information. We will have to see if we can learn more about this chip, its capabilities, and its potential applications in the near future.
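For a sense of what that teraflop claim means per core, here is a quick calculation. It assumes all 80 cores contribute equally and treats "over a teraflop" as exactly one trillion operations per second, so the result is a floor rather than a spec.

```python
# Figures quoted in the keynote; the even split across cores is my assumption.
CORES = 80
CLOCK_HZ = 3.1e9          # current 3.1GHz clock speed
TOTAL_FLOPS = 1e12        # "over a teraflop", taken here as a lower bound

# Floating-point operations each core must retire per clock cycle
# to reach the claimed total.
flops_per_core_per_cycle = TOTAL_FLOPS / (CORES * CLOCK_HZ)

# Per-core throughput in gigaflops.
gflops_per_core = TOTAL_FLOPS / CORES / 1e9

print(f"FLOPs per core per cycle: {flops_per_core_per_cycle:.2f}")
print(f"GFLOPS per core:          {gflops_per_core:g}")
```

In other words, each of these simple cores would need to sustain roughly four floating-point operations per clock.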
For now, that wraps our quick look at day one of Fall IDF 2006. We'll have more from the show as the week unfolds.