The Pentium 4 is relatively slow on a clock-for-clock basis because of its unusually deep, 20-stage main pipeline. The Pentium 4's clock speeds can reach amazing heights because of this pipeline, too. So it's a tradeoff. The P4 is by no means a poor performer, but it's a little slow clock-for-clock.
Intel has taken a number of steps to improve the P4's clock-for-clock (and overall) performance. Most notably, the company has raised the P4's front-side bus speed and doubled the size of the L2 cache. Hyper-Threading, or simultaneous multithreading (SMT) as it's known in the wider, non-copyrighted world, is yet another way to increase the average number of instructions a processor can execute per clock cycle, or instructions per clock (IPC). Simultaneous multithreading makes a single physical processor look like two logical processors, and in doing so, it keeps the CPU's execution units busier. This isn't symmetric multiprocessing (SMP), that creamy smooth goodness that comes from having multiple processors in a single system, but it essentially looks like it to operating systems and programs. As with SMP, software will have to be multithreaded in order to take full advantage of SMT.
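To make "software will have to be multithreaded" a bit more concrete, here's a minimal Python sketch of what that looks like from the program's side: ask the OS how many logical processors it sees, then split a job into that many pieces. The function names (partial_sum, threaded_sum) are made up for illustration. One loud caveat: in standard CPython, the global interpreter lock keeps pure-Python threads from running in parallel, so this shows the shape of a multithreaded program, not a speedup you should expect to measure.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Sum the integers in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(range(start, stop))


def threaded_sum(n, workers=None):
    """Sum 0..n-1 by splitting the range across worker threads.

    os.cpu_count() reports logical processors, so a Hyper-Threading
    CPU shows up here as two CPUs per physical core.
    """
    workers = workers or os.cpu_count() or 1
    step = max(1, -(-n // workers))  # ceiling division, never zero
    chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


total = threaded_sum(1_000_000)
print(f"logical CPUs seen by the OS: {os.cpu_count()}, total: {total}")
```

A single-threaded program would do all of that work on one logical processor and leave the second one idle, which is exactly the case where SMT has nothing to offer.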
The logic needed to make Hyper-Threading work adds only 5% to the Pentium 4's die size, including duplicate copies of key resources necessary for maintaining two architectural states on one chip. Intel points out that's not much extra real estate for an enhancement that can improve performance by as much as 30% in the right scenarios.
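Intel's pitch boils down to simple arithmetic, which the few lines of Python below work through. The only inputs are the two figures quoted above. Keep in mind that the 30% number is a best case "in the right scenarios," so the result is an upper bound and not a typical gain.

```python
# Figures quoted by Intel in the text; 30% is a best case, not an average.
die_area_increase = 0.05
best_case_speedup = 0.30

relative_area = 1.0 + die_area_increase   # 1.05x the original die
relative_perf = 1.0 + best_case_speedup   # 1.30x the original performance

# Performance delivered per unit of die area, relative to a P4 without HT.
perf_per_area = relative_perf / relative_area
print(f"Best-case performance per unit of die area: {perf_per_area:.3f}x")
```

In other words, in the best case each square millimeter of silicon delivers roughly 24% more performance than it did without Hyper-Threading.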
Hyper-Threading adds so little to the Pentium 4's die size because it only requires physical duplicates of a small subset of the processor's resources. Many other CPU resources, including the caches, registers, execution units, and scheduling queue, are shared, either through static partitioning (splitting 'em in two) or dynamic sharing. The most important shared resources are the processor's execution units, where integer math, floating-point math, and load/store functions are handled. Execution stages in the deeply pipelined Pentium 4 are likely to be unused during some CPU cycles, and Hyper-Threading is intended to help keep the chip's execution pipelines busier by exposing a second logical processor.
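Here's a toy model of the idea in Python. It is emphatically not how the Pentium 4's scheduler works: it assumes a single shared execution unit, threads described as made-up traces of "execute" and "stall" cycles, and a simple alternating priority between the two logical processors. All of the names (EXEC, STALL, run_smt) are invented for this sketch. What it does show is how a second thread can slip its instructions into cycles the first thread would have wasted.

```python
EXEC, STALL = "exec", "stall"


def run_smt(trace_a, trace_b):
    """Cycles needed to finish both traces on one shared execution unit."""
    traces = [trace_a, trace_b]
    pos = [0, 0]
    cycles = 0
    while pos[0] < len(trace_a) or pos[1] < len(trace_b):
        unit_busy = False
        # Alternate priority each cycle so neither thread starves.
        order = (0, 1) if cycles % 2 == 0 else (1, 0)
        for t in order:
            if pos[t] >= len(traces[t]):
                continue  # this thread has finished
            if traces[t][pos[t]] == STALL:
                pos[t] += 1  # the stall elapses without using the unit
            elif not unit_busy:
                unit_busy = True  # this thread issues an instruction
                pos[t] += 1
            # Otherwise the thread is ready but the unit is taken: wait.
        cycles += 1
    return cycles


# Hypothetical traces: each thread executes 3 instructions and stalls 3 times.
thread_a = [EXEC, STALL, STALL, EXEC, STALL, EXEC]
thread_b = [EXEC, EXEC, STALL, EXEC, STALL, STALL]

instructions = thread_a.count(EXEC) + thread_b.count(EXEC)
sequential_cycles = len(thread_a) + len(thread_b)  # one thread after the other
smt_cycles = run_smt(thread_a, thread_b)

print(f"one at a time: {instructions / sequential_cycles:.2f} IPC")
print(f"interleaved:   {instructions / smt_cycles:.2f} IPC")
```

The same six instructions get done either way. The interleaved run simply finishes in fewer cycles because one thread's stalls overlap with the other thread's useful work, and that is the whole point of exposing a second logical processor.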
I would like to cover what gets shared in HT and why in more detail, but that's another article altogether. If you want to understand the specifics of Intel's Hyper-Threading implementation, let me recommend Jon Stokes' article on the subject. He explores the complexities of adjudicating between logical CPUs competing for resources better than I can here.
Hyper-Threading's resource sharing has the potential to sap performance in certain situations. Sharing the L2 cache between two logical processors means only half the cache space and bandwidth may be available to execute a given thread. We'll examine this issue in more detail in our processor benchmarks here shortly.
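A quick, deliberately exaggerated illustration of why halving a thread's cache can hurt. The sketch below assumes a tiny fully associative cache with least-recently-used replacement and a made-up access pattern; none of the sizes correspond to the Pentium 4's actual L2, and the "half" case assumes the worst, where the other logical processor claims exactly half the space.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses that hit in an LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits / len(accesses)


# Hypothetical workload: loop over a 6-line working set 100 times.
accesses = list(range(6)) * 100

full_cache = lru_hit_rate(accesses, capacity=8)  # thread has the cache to itself
half_cache = lru_hit_rate(accesses, capacity=4)  # thread gets only half of it

print(f"hit rate with the whole cache: {full_cache:.2f}")
print(f"hit rate with half the cache:  {half_cache:.2f}")
```

When the working set fits, nearly every access hits. Cut the available space below the working-set size and this particular looping pattern evicts each line just before it's needed again. Real workloads rarely fall off a cliff this steep, but the direction of the effect is the same.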