The big change with the new Pentium 4 isn't actually its clock speed, although 3GHz sounds like a whopper of a number to most folks. Instead, the big news about the new P4 is its Hyper-Threading technology. The impetus behind Hyper-Threading is simple. Throughout the Pentium 4's young life, it has been a relatively slow performer at a given clock speed. Just last month, in our last big CPU review, we saw AMD's Athlon XP 2800+ running at 2.25GHz perform roughly as well as a Pentium 4 at 2.8GHz.
The Pentium 4 is relatively slow on a clock-for-clock basis because of its unusually deep, 20-stage main pipeline. The Pentium 4's clock speeds can reach amazing heights because of this pipeline, too. So it's a tradeoff. The P4 is by no means a poor performer, but it's a little slow clock-for-clock.
Intel has taken a number of steps to improve the P4's clock-for-clock (and overall) performance. Most notably, the company has raised the P4's front-side bus speed and doubled the size of the L2 cache. Hyper-Threadingor simultaneous multithreading (SMT) as it's known in the wider, non-copyrighted worldis yet another way to increase the average number of instructions a processor can execute per clock cycle, or instructions per clock (IPC). Simultaneous multithreading makes a single physical processor look like two logical processors, and in doing so, it keeps the CPU's execution units busier. This isn't symmetric multiprocessing (SMP)that creamy smooth goodness that comes from having multiple processors in a single systembut it essentially looks like it to operating systems and programs. As with SMP, software will have to be multithreaded in order to take full advantage of SMT.
The logic needed to make Hyper-Threading work adds only 5% to the Pentium 4's die size, including duplicate copies of key resources necessary for maintaining two architectural states on one chip. Intel points out that's not much extra real estate for an enhancement that can improve performance by as much as 30% in the right scenarios.
Hyper-Threading adds so little to the Pentium 4's die size because it only requires physical duplicates of a small subset of the processor's resources. Many other CPU resources, including the caches, registers, execution units, and scheduling queue, are shared, either through static partitioning (splitting 'em in two) or dynamic sharing. The most important shared resources are the processor's execution units, where integer math, floating-point math, and load/store functions are handled. Execution stages in the deeply pipelined Pentium 4 are likely to be unused during some CPU cycles, and Hyper-Threading is intended to help keep the chip's execution pipelines busier by exposing a second logical processor.
I would like to cover what gets shared in HT and why in more detail, but that's another article altogether. If you want to understand the specifics of Intel's Hyper-Threading implementation, let me recommend Jon Stokes' article on the subject. He explores the complexities of adjudicating between logical CPUs competing for resources better than I can here.
Hyper-Threading's resource sharing has the potential to sap performance in certain situations. Sharing the L2 cache between two logical processors means only half the cache space and bandwidth may be available to execute a given thread. We'll examine this issue in more detail in our processor benchmarks here shortly.