Rattner reiterated some of the known features of the Core microarchitecture and revealed a few new ones, as well. Among the features of the new architecture will be:
- A four-issue-wide, 14-stage main pipeline: This will obviously be a shorter pipeline than the 31 stages in Netburst processors, much closer to the current Pentium M and Core Duo CPUs, as expected.
- Micro-fusion: Known as micro-ops fusion in the Pentium M, this allows the processor to fuse together certain types of internal “micro-ops” instructions (behind the CPU’s instruction decoder, in the RISC core) and execute them as one for more performance per clock.
- Macro-fusion: This is a new one, but it wasn’t explained in great detail. Presumably, the CPU will be able to fuse together certain x86 ISA instructions a la micro-ops fusion. The example given was the fusion of the compare and jump instructions.
- Single-cycle execution of 128-bit SSE: Core processors will execute the entire family of 128-bit SSE instructions in a single cycle, for what Intel is calling a boost in digital media performance. Obviously, higher performance per clock in SSE instructions will accelerate a range of applications.
- Shared on-chip L2 cache: The dual-core Core (ack!) processors will feature a single, unified L2 cache that should allow for efficient sharing of data between the processor cores, with no need for cache coherency protocol traffic between the cores to travel over the external bus. Rattner said that there would be no partitioning of the L2 cache between cores, and in the event that one core should shut itself down to save power during a period of inactivity, the other core could make use of the full L2 cache if needed.
- Smarter memory access: This one seems to come around every time Intel revises its CPU, but Core will indeed include new cache prefetch algorithms, which are probably necessary for best results given the move to a shared L2 cache. Also, as we learned at the last IDF, Core will have a feature called memory disambiguation that attempts to opportunistically reorder memory loads and stores when possible in order to lower effective access latencies.
- Advanced power gating: Clock gating shuts down logic on the chip when it’s not needed. In his keynote speech, Intel’s Pat Gelsinger described the Core architecture’s clock gating as “super-fine grained.”

The poster boy for the Core architecture’s characteristics in both Rattner’s and Gelsinger’s keynotes was Conroe, the future desktop version of Core set to replace the Pentium 4/D. Rattner claimed that Conroe will deliver up to 40% more performance with 40% less power use than a Pentium D 950.
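Since macro-fusion wasn’t explained in detail, the closest we can get is a toy model of the one example given: a compare followed directly by a jump. The Python sketch below is only an illustration of the idea, not Intel’s mechanism. The instruction tuples, the CONDITIONAL_JUMPS set, and the macro_fuse function are all hypothetical names made up for this example.

```python
from typing import List, Tuple

# Toy instruction format: (mnemonic, operand, ...). Purely illustrative.
Instr = Tuple[str, ...]

# Hypothetical set of conditional jumps that this toy model allows to fuse.
CONDITIONAL_JUMPS = {"je", "jne", "jl", "jg"}


def macro_fuse(instrs: List[Instr]) -> List[Instr]:
    """Merge each cmp that is immediately followed by a conditional jump."""
    fused: List[Instr] = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "cmp" and nxt is not None and nxt[0] in CONDITIONAL_JUMPS:
            # One fused op carries the compare operands plus the branch target.
            fused.append(("cmp+" + nxt[0],) + cur[1:] + nxt[1:])
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


program = [
    ("mov", "eax", "1"),
    ("cmp", "eax", "ebx"),
    ("jne", "loop"),
    ("add", "eax", "2"),
]
fused_program = macro_fuse(program)
print(len(program), "->", len(fused_program))  # prints "4 -> 3"
```

In this model the four-instruction sequence occupies three slots after fusion, which is the sort of per-clock saving the feature is presumably after. How the real hardware picks eligible pairs is not something the keynote spelled out.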
Gelsinger pinpointed Conroe’s TDP, or thermal design power, at 65W. He also invited a gent from Microsoft up on stage for an Office 12 demo and on-stage benchmark. Office 12 will be optimized for multithreaded performance, and in a head-to-head competition, a Conroe-based system completed an Excel simulation in 11.4 seconds that took 28.7 seconds on a comparable Pentium D system.
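For a sense of scale, the numbers quoted in the keynotes can be turned into ratios. The short Python sketch below does that arithmetic. It assumes the two “40%” figures apply at the same time, which the “up to” wording doesn’t guarantee, and the function names are just illustrative.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new system finished the same job."""
    return baseline_seconds / new_seconds


def perf_per_watt_gain(perf_gain: float, power_reduction: float) -> float:
    """Relative performance per watt, given fractional gain and reduction."""
    return (1.0 + perf_gain) / (1.0 - power_reduction)


# Excel demo: 28.7 seconds on the Pentium D system, 11.4 on Conroe.
excel_speedup = speedup(28.7, 11.4)

# Rattner's claim: up to 40% more performance with 40% less power.
# Assumption: both figures hold simultaneously (a best case).
efficiency_gain = perf_per_watt_gain(0.40, 0.40)

print(f"Excel demo speedup: {excel_speedup:.2f}x")                 # 2.52x
print(f"Implied performance per watt: {efficiency_gain:.2f}x")     # 2.33x
```

So the on-stage demo works out to roughly a 2.5x speedup, and the 40/40 claim, taken at face value, implies a bit over twice the performance per watt of the Pentium D 950.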