The Pentium 4 is relatively slow on a clock-for-clock basis because of its unusually deep, 20-stage main pipeline. The Pentium 4's clock speeds can reach amazing heights because of this pipeline, too. So it's a tradeoff. The P4 is by no means a poor performer, but it's a little slow clock-for-clock.
Intel has taken a number of steps to improve the P4's clock-for-clock (and overall) performance. Most notably, the company has raised the P4's front-side bus speed and doubled the size of the L2 cache. Hyper-Threadingor simultaneous multithreading (SMT) as it's known in the wider, non-copyrighted worldis yet another way to increase the average number of instructions a processor can execute per clock cycle, or instructions per clock (IPC). Simultaneous multithreading makes a single physical processor look like two logical processors, and in doing so, it keeps the CPU's execution units busier. This isn't symmetric multiprocessing (SMP)that creamy smooth goodness that comes from having multiple processors in a single systembut it essentially looks like it to operating systems and programs. As with SMP, software will have to be multithreaded in order to take full advantage of SMT.
The logic needed to make Hyper-Threading work adds only 5% to the Pentium 4's die size, including duplicate copies of key resources necessary for maintaining two architectural states on one chip. Intel points out that's not much extra real estate for an enhancement that can improve performance by as much as 30% in the right scenarios.
Hyper-Threading adds so little to the Pentium 4's die size because it only requires physical duplicates of a small subset of the processor's resources. Many other CPU resources, including the caches, registers, execution units, and scheduling queue, are shared, either through static partitioning (splitting 'em in two) or dynamic sharing. The most important shared resources are the processor's execution units, where integer math, floating-point math, and load/store functions are handled. Execution stages in the deeply pipelined Pentium 4 are likely to be unused during some CPU cycles, and Hyper-Threading is intended to help keep the chip's execution pipelines busier by exposing a second logical processor.
I would like to cover what gets shared in HT and why in more detail, but that's another article altogether. If you want to understand the specifics of Intel's Hyper-Threading implementation, let me recommend Jon Stokes' article on the subject. He explores the complexities of adjudicating between logical CPUs competing for resources better than I can here.
Hyper-Threading's resource sharing has the potential to sap performance in certain situations. Sharing the L2 cache between two logical processors means only half the cache space and bandwidth may be available to execute a given thread. We'll examine this issue in more detail in our processor benchmarks here shortly.
|Corsair Lighting Node Pro brings light strip control to every PC||8|
|In the lab: Asus' Tinker Board SBC||14|
|In the lab: HyperX's Alloy FPS mechanical gaming keyboard||10|
|Team Group Cardea SSDs are ready to handle the heat||7|
|Gigabyte's Ryzen motherboards are home, home on the range||32|
|Zotac molds GTX 1050s into low-profile tiny terrors||7|
|TR forums spotlight: krazyredboy's crazy simulator PC||13|
|Deals of the week: a high-end Mini-ITX mobo, fast RAM, storage, and more||27|
|Steam Audio SDK promises better surround sound gratis||19|
|Best part of the article? We're flying home with Ryzen review samples as of this writing.||+44|