
Intel's Pentium 4 3.06GHz processor


Two heads and a whole lotta hertz
— 12:42 AM on November 14, 2002

INTEL'S PENTIUM 4 processor crosses the 3GHz threshold today. We've been playing with this new P4 for a while, and now we get to show you what it's all about. This chip brings Intel's Hyper-Threading technology to desktop processors for the first time. Like clockwork, only much messier, we've loaded up eleven different test configs with a brutal suite of benchmarks to see how the new P4 measures up. Read on to get the lowdown on the new Pentium 4's nosebleed-inducing 3GHz speeds and creamy smooth Hyper-Threading tech.



The Pentium 4 3.06GHz

Simultaneous multithreading lands on the desktop
The big change with the new Pentium 4 isn't actually its clock speed, although 3GHz sounds like a whopper of a number to most folks. Instead, the big news about the new P4 is its Hyper-Threading technology. The impetus behind Hyper-Threading is simple. Throughout the Pentium 4's young life, it has been a relatively slow performer at a given clock speed. Just last month, in our last big CPU review, we saw AMD's Athlon XP 2800+ running at 2.25GHz perform roughly as well as a Pentium 4 at 2.8GHz.

The Pentium 4 is relatively slow on a clock-for-clock basis because of its unusually deep, 20-stage main pipeline. Each stage does less work per cycle, and a mispredicted branch throws away more work in flight. That same deep pipeline is also what lets the Pentium 4's clock speeds reach such amazing heights. So it's a tradeoff. The P4 is by no means a poor performer, but it's a little slow clock-for-clock.
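To put a rough number on "a little slow clock-for-clock," here's a back-of-the-envelope sketch using the two clock speeds from the Athlon XP 2800+ comparison above. It assumes "roughly as well" means exactly equal performance, which is a simplification of ours, not a measured result.

```python
# Back-of-the-envelope clock-for-clock comparison.
# Assumption: "performs roughly as well" is treated as exactly equal performance.
athlon_xp_ghz = 2.25   # Athlon XP 2800+ clock speed, from the text
pentium4_ghz = 2.8     # Pentium 4 clock speed, from the text

# If both chips deliver the same performance, work done per clock
# is inversely proportional to clock speed.
athlon_per_clock_advantage = pentium4_ghz / athlon_xp_ghz
p4_per_clock_relative = athlon_xp_ghz / pentium4_ghz

print(f"Athlon XP work per clock vs. P4: {athlon_per_clock_advantage:.2f}x")
print(f"P4 work per clock vs. Athlon XP: {p4_per_clock_relative:.0%}")
```

Under that assumption, the Athlon XP gets about 1.24 times as much done per clock, or put the other way, the P4 manages about 80% of the Athlon's per-clock work.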

Intel has taken a number of steps to improve the P4's clock-for-clock (and overall) performance. Most notably, the company has raised the P4's front-side bus speed and doubled the size of the L2 cache. Hyper-Threading, or simultaneous multithreading (SMT) as it's known in the wider, non-trademarked world, is yet another way to increase the average number of instructions a processor can execute per clock cycle, or instructions per clock (IPC). Simultaneous multithreading makes a single physical processor look like two logical processors, and in doing so, it keeps the CPU's execution units busier. This isn't symmetric multiprocessing (SMP), that creamy smooth goodness that comes from having multiple processors in a single system, but it essentially looks like it to operating systems and programs. As with SMP, software will have to be multithreaded in order to take full advantage of SMT.
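For a feel of what "multithreaded" means from the software side, here's a minimal Python sketch that asks the operating system how many logical processors it sees and splits a job into that many chunks. The helper names (`sum_of_squares`, `split`) and the workload are ours, purely for illustration. One caveat: standard CPython's global interpreter lock keeps CPU-bound threads from truly running in parallel, so treat this as a picture of how the work gets divided, not as a benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(bounds):
    """Hypothetical workload: sum n*n over one (start, stop) chunk."""
    start, stop = bounds
    return sum(n * n for n in range(start, stop))

def split(total, parts):
    """Cut range(total) into at most `parts` contiguous (start, stop) chunks."""
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]

# os.cpu_count() reports logical processors, so a Hyper-Threading chip
# shows up here as two CPUs even though there is one physical processor.
logical_cpus = os.cpu_count() or 1

chunks = split(1_000_000, logical_cpus)
with ThreadPoolExecutor(max_workers=logical_cpus) as pool:
    result = sum(pool.map(sum_of_squares, chunks))

print(logical_cpus, "logical CPUs ->", result)
```

A program that never splits its work this way hands the second logical processor nothing to do, which is why single-threaded software sees little direct benefit from SMT.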

The logic needed to make Hyper-Threading work adds only 5% to the Pentium 4's die size, including duplicate copies of key resources necessary for maintaining two architectural states on one chip. Intel points out that's not much extra real estate for an enhancement that can improve performance by as much as 30% in the right scenarios.
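Intel's pitch boils down to simple arithmetic, sketched below with the two figures quoted above. Keep in mind that the 30% number is a best case, so the result is an upper bound rather than a typical gain.

```python
# Best-case numbers quoted in the text; real workloads will vary.
die_area_increase = 0.05   # Hyper-Threading logic adds about 5% to the die
best_case_speedup = 0.30   # "as much as 30%" in the right scenarios

relative_area = 1 + die_area_increase
relative_performance = 1 + best_case_speedup

# Performance delivered per unit of silicon, relative to a P4 without HT.
performance_per_area = relative_performance / relative_area
print(f"best-case performance per unit of die area: {performance_per_area:.2f}x")
```

In the best case, that works out to roughly 1.24 times the performance per unit of die area.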

Hyper-Threading adds so little to the Pentium 4's die size because it only requires physical duplicates of a small subset of the processor's resources. Many other CPU resources, including the caches, registers, execution units, and scheduling queue, are shared, either through static partitioning (splitting 'em in two) or dynamic sharing. The most important shared resources are the processor's execution units, where integer math, floating-point math, and load/store functions are handled. Execution stages in the deeply pipelined Pentium 4 are likely to be unused during some CPU cycles, and Hyper-Threading is intended to help keep the chip's execution pipelines busier by exposing a second logical processor.
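The idea of filling idle execution slots is easy to see in a toy model. The sketch below imagines a single shared execution unit with one slot per cycle and two threads that are each stalled some of the time. The ready/stalled patterns are invented for illustration; they are not measurements, and the real Pentium 4 has multiple execution units and far more complicated scheduling.

```python
# Toy model of one shared execution unit, one slot per cycle.
# 1 = thread has an instruction ready; 0 = stalled (e.g. waiting on memory).
# These patterns are invented for illustration, not measured data.
THREAD_A = [1, 1, 0, 0, 1, 0, 1, 1]
THREAD_B = [0, 1, 1, 0, 0, 1, 1, 0]

def utilization(*threads):
    """Fraction of cycles in which at least one thread can use the unit."""
    cycles = len(threads[0])
    busy = sum(1 for ready in zip(*threads) if any(ready))
    return busy / cycles

def conflicts(a, b):
    """Cycles in which both threads want the unit and one has to wait."""
    return sum(1 for x, y in zip(a, b) if x and y)

single = utilization(THREAD_A)
shared = utilization(THREAD_A, THREAD_B)
print(f"one thread:  {single:.1%} busy")
print(f"two threads: {shared:.1%} busy, {conflicts(THREAD_A, THREAD_B)} conflicts")
```

With these made-up patterns, the unit is busy 62.5% of the time with one thread and 87.5% with two. The two conflict cycles are the flip side: when both threads want the same resource at once, one of them has to wait.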

I would like to cover what gets shared in HT and why in more detail, but that's another article altogether. If you want to understand the specifics of Intel's Hyper-Threading implementation, let me recommend Jon Stokes' article on the subject. He explores the complexities of adjudicating between logical CPUs competing for resources better than I can here.

Hyper-Threading's resource sharing has the potential to sap performance in certain situations. Sharing the L2 cache between two logical processors means only half the cache space and bandwidth may be available to a given thread. We'll examine this issue in more detail in our processor benchmarks shortly.