The Pentium 4 is relatively slow on a clock-for-clock basis because of its unusually deep, 20-stage main pipeline. That same pipeline is also what lets the Pentium 4's clock speeds reach amazing heights, so it's a tradeoff. The P4 is by no means a poor performer; it just gets a little less done in each clock cycle.
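To put the tradeoff in rough numbers, delivered throughput scales approximately with instructions per clock multiplied by clock frequency. Here is a minimal sketch of that arithmetic. The function name and all of the figures are made up for illustration and are not measurements of any real chip.

```python
def relative_throughput(ipc, clock_ghz):
    """Instructions completed per nanosecond: IPC times clock rate in GHz."""
    return ipc * clock_ghz

# Hypothetical figures for illustration only, not benchmark results.
deep_pipeline = relative_throughput(ipc=0.8, clock_ghz=3.0)     # lower IPC, higher clock
shallow_pipeline = relative_throughput(ipc=1.2, clock_ghz=1.8)  # higher IPC, lower clock

print(f"deep pipeline:    {deep_pipeline:.2f} instructions/ns")
print(f"shallow pipeline: {shallow_pipeline:.2f} instructions/ns")
```

With these invented inputs the deeper pipeline comes out ahead despite its lower IPC, which is the bet a design like this makes. Different inputs could easily flip the result.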
Intel has taken a number of steps to improve the P4's clock-for-clock (and overall) performance. Most notably, the company has raised the P4's front-side bus speed and doubled the size of the L2 cache. Hyper-Threading, or simultaneous multithreading (SMT) as it's known in the wider, non-copyrighted world, is yet another way to increase the average number of instructions a processor can execute per clock cycle, or instructions per clock (IPC). Simultaneous multithreading makes a single physical processor look like two logical processors, and in doing so, it keeps the CPU's execution units busier. This isn't symmetric multiprocessing (SMP), that creamy smooth goodness that comes from having multiple processors in a single system, but it essentially looks like it to operating systems and programs. As with SMP, software will have to be multithreaded in order to take full advantage of SMT.
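One way to see the "looks like two processors" effect from a program's point of view is to ask the operating system how many CPUs it exposes. The sketch below uses Python's standard `os.cpu_count()`, which counts logical CPUs. On a single Hyper-Threading chip with the feature enabled, one would expect it to report two, though that depends on the OS and BIOS settings.

```python
import os

# os.cpu_count() counts logical CPUs, so each physical core with
# SMT enabled typically shows up as two entries.
logical_cpus = os.cpu_count()  # may be None if the count cannot be determined

print(f"logical processors visible to this program: {logical_cpus}")
```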
The logic needed to make Hyper-Threading work adds only 5% to the Pentium 4's die size, including duplicate copies of key resources necessary for maintaining two architectural states on one chip. Intel points out that's not much extra real estate for an enhancement that can improve performance by as much as 30% in the right scenarios.
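Taking Intel's two figures at face value, the best-case payoff per unit of die area works out as follows. This is back-of-the-envelope arithmetic on the quoted numbers, and the variable names are just for illustration.

```python
area_overhead = 0.05   # extra die area for Hyper-Threading, per Intel's figure
best_case_gain = 0.30  # best-case performance improvement, per Intel's figure

# Best-case performance per unit of die area, relative to the same chip without HT.
perf_per_area = (1 + best_case_gain) / (1 + area_overhead)

print(f"best-case performance per unit area: {perf_per_area:.3f}x")
```

In the best case, that is roughly a 24% improvement in performance per unit of die area. By the same arithmetic, any average gain above 5% would be enough for the feature to pay for its own silicon.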
Hyper-Threading adds so little to the Pentium 4's die size because it only requires physical duplicates of a small subset of the processor's resources. Many other CPU resources, including the caches, registers, execution units, and scheduling queue, are shared, either through static partitioning (splitting 'em in two) or dynamic sharing. The most important shared resources are the processor's execution units, where integer math, floating-point math, and load/store functions are handled. Execution stages in the deeply pipelined Pentium 4 are likely to be unused during some CPU cycles, and Hyper-Threading is intended to help keep the chip's execution pipelines busier by exposing a second logical processor.
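The idea of filling otherwise idle execution slots can be illustrated with a deliberately crude toy model. It assumes a single shared execution unit and fixed, invented patterns of which cycles each thread has work ready. It also ignores the fact that when both threads are ready in the same cycle, one of them has to wait. None of this reflects how the Pentium 4's scheduler actually arbitrates.

```python
def utilization(*threads):
    """Fraction of cycles in which at least one thread has work for the unit."""
    cycles = len(threads[0])
    busy = sum(1 for c in range(cycles) if any(t[c] for t in threads))
    return busy / cycles

# 1 = thread has an instruction ready this cycle, 0 = stalled (say, waiting on memory).
# Both patterns are invented for illustration.
thread_a = [1, 1, 0, 0, 1, 0, 1, 0]
thread_b = [0, 1, 1, 0, 0, 1, 0, 0]

single = utilization(thread_a)            # one logical processor
shared = utilization(thread_a, thread_b)  # two logical processors sharing the unit

print(f"one thread:  {single:.0%} of cycles busy")
print(f"two threads: {shared:.0%} of cycles busy")
```

The second thread only helps in cycles where the first one is stalled, which is why the gain depends so heavily on the workload.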
I would like to cover what gets shared in HT and why in more detail, but that's another article altogether. If you want to understand the specifics of Intel's Hyper-Threading implementation, let me recommend Jon Stokes' article on the subject. He explores the complexities of adjudicating between logical CPUs competing for resources better than I can here.
Hyper-Threading's resource sharing has the potential to sap performance in certain situations. Sharing the L2 cache between two logical processors means that only half the cache space and bandwidth may be available to a given thread. We'll examine this issue in more detail in our processor benchmarks here shortly.