Prescott clock speeds will initially range from 2.8GHz to 3.4GHz. To keep Prescott-based P4s distinct from older “Northwood” cores, Intel is tacking an “E” on to the product names, so they’ll be called the Pentium 4 2.8E or 3.2E. The product mix gets most confusing at 2.8GHz, where one could buy four different Pentium 4s: the 2.8GHz (a Northwood core with a 533MHz front-side bus), the 2.8C (Northwood again, but with an 800MHz bus), the 2.8A (Prescott with a 533MHz bus), or the 2.8E (Prescott with 800MHz bus). Clear as mud?
Anatomy of a die shrink
Let’s start with a look at the gory details of the die shrink. With Prescott, Intel is moving the Pentium 4 from a 130nm (or 0.13 micron) fabrication process to a 90nm process. As always, such a transition brings immediate benefits in the form of smaller die sizes and, usually, higher potential clock speeds. The conversion to 90nm is far from trivial, though, and Intel has enhanced its manufacturing process in a number of ways in order to facilitate the change.
One of the most notable changes is the use of a strained silicon substrate. When stretched slightly, the lattice structure of silicon atoms spreads out and opens up, allowing for freer flow of electrons. This lower resistance, in turn, allows for smaller gate lengths and faster transistors. Intel claims here that its new process only adds two percent to manufacturing costs, which is remarkable given the use of strained silicon.
Intel’s 90nm process replaces the fluorine-doped silicon oxide dielectric film used previously with an even lower capacitance carbon-doped oxide film. This process also employs a layer of nickel silicide, essentially as caps on the transistors, to lower resistance versus the cobalt silicide used in Intel’s 130nm process. The result of these changes is gate lengths as small as 50nm. SRAM cells are down from 2 square microns to 1.15.
Not only is the 90nm process smaller, but Intel is also manufacturing Prescott using seven layers of copper interconnects, instead of the six used at 130nm. All told, the changes shrink the Pentium 4’s die size to 122 mm², from 145 mm² for Northwood, and this despite the fact that Prescott’s transistor count is 125 million, over twice Northwood’s 55 million transistors.
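To put those figures in perspective, here's a quick back-of-the-envelope sketch in Python that turns the die sizes and transistor counts quoted above into average transistor density. The dictionary layout and function name are mine, not Intel's, and the average lumps dense cache cells together with sparser logic, so treat the ratio as a rough guide rather than a measure of the process itself.

```python
# Die figures as quoted in the text above.
CHIPS = {
    "Northwood (130nm)": {"transistors": 55e6, "die_mm2": 145.0},
    "Prescott (90nm)": {"transistors": 125e6, "die_mm2": 122.0},
}


def density_per_mm2(transistors: float, die_mm2: float) -> float:
    """Average transistors per square millimeter of die area."""
    return transistors / die_mm2


densities = {name: density_per_mm2(**spec) for name, spec in CHIPS.items()}
ratio = densities["Prescott (90nm)"] / densities["Northwood (130nm)"]

# What a perfect linear shrink from 130nm to 90nm would give on its own.
ideal_shrink = (130 / 90) ** 2

for name, value in densities.items():
    print(f"{name}: {value / 1e6:.2f} million transistors per mm^2")
print(f"Measured density ratio: {ratio:.2f}x")
print(f"Ideal 130nm -> 90nm scaling: {ideal_shrink:.2f}x")
```

The measured ratio comes out higher than the ideal shrink alone would predict, which is probably down to Prescott spending a larger share of its die on tightly packed cache.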
The Pentium 4’s new look
Wondering how Prescott got to have so many more transistors? The answer is that Prescott is a serious overhaul of the Netburst microarchitecture all Pentium 4s share. In fact, Prescott is arguably a more major revamp than the P6 core got during its long tenure at the heart of the Pentium Pro, Pentium II, and Pentium III processors. There are too many changes to cover in depth here, but I will attempt to summarize them and talk about the most significant modifications of the chip’s design.
The watchwords for the Prescott changes are “higher clock frequencies.” Virtually all the modifications to the Prescott core are intended to produce high performance while allowing the chip to run at clock speeds of 4GHz and beyond. Many of the radical elements of the original Netburst design are present here in even more radical form, including the deep main pipeline, execution trace cache, and ample amounts of speculative logic and prefetching. Most of these changes represent tradeoffs of various types, between, say, higher clock speeds and higher clock-for-clock performance, or, in many cases, between higher latencies and better peak performance. Generally, Prescott has been tuned for higher clock frequencies, and the choices Intel’s design team has made reflect that emphasis.
With that said, we’ll let the bullets start flying on our summary of Prescott’s new features.
- A much longer pipeline: Probably the biggest news of the day is the fact that Netburst’s main branch prediction/recovery pipeline has been lengthened from a healthy 20 stages in its previous incarnation to 31 stages in Prescott. To give you a point of reference, that’s longer than the Alaskan oil pipeline. Pipelines of around 10 stages are much more common. AMD’s Hammer core in the Athlon 64 and Opteron processors is 12 stages.
By making each stage of the pipeline less complex, Intel increases the processor’s tolerance for running at higher clock speeds. In doing so, though, Intel’s engineers have chosen to reduce clock-for-clock performance. This change, by itself, would significantly lower the number of instructions per clock (IPC) the Pentium 4 can execute. Higher clock speeds can offset a lower IPC, but Prescott starts out at only 3.4GHz, and Northwood runs at that speed, too.
Fortunately, there are a number of countervailing forces to take into account. For one thing, instruction latencies vary; not all instructions use all stages of the pipeline. More importantly, Prescott includes a whole raft of enhancements aimed at increasing its clock-for-clock performance, some in very specific ways. That’s what the rest of these bullet points are about.
Before we move on, I should point out once more that taken in context, a lower IPC isn’t necessarily a bad thing. Higher or lower IPCs in processor design are tradeoffs, and need not evoke a value judgment. What is true of the Pentium 4, and of Prescott more so than prior revisions, is that Intel has chosen to go full-bore the way of lower IPC and higher clock speeds. This “speed demon” approach to processor design seems to fit reasonably well with Intel’s technological prowess in chip fabrication.
- A larger L2 cache: The main contributor to Prescott’s massive transistor count is its new 1MB L2 cache. We’ve seen larger caches help performance many times before, the most dramatic recent example being the Pentium 4 Extreme Edition processors with 2MB of L3 cache onboard. The Extreme Edition is a screamer as a result of this massive cache. Prescott’s larger L2 cache necessarily has higher latencies, so going to a larger cache has its drawbacks. Still, in a chip designed to run so much faster than main memory, the larger on-chip cache makes sense.
- A larger L1 data cache: Northwood’s L1 data cache was 8K and 4-way associative. Prescott’s is 16K and 8-way associative, so Prescott’s L1 cache should have a higher hit rate and, thus, be more effective.
Like previous Netburst processors, Prescott’s L1 instruction cache is an unconventional execution trace cache that holds decoded micro-ops for the processor’s RISC-like core instead of CISC-style x86 instructions. Prescott’s execution trace cache still holds roughly 12,000 micro-ops, but the chip can now encode more types of micro-ops into the trace cache, making it more efficient.
- SSE3 instructions: Intel has endowed the Prescott core with 13 new instructions now known as SSE3. Like previous SSE revisions, these extended instructions are intended to accelerate certain types of computational tasks. Five new instructions for complex arithmetic allow for better handling of tasks like Fast Fourier Transforms; these instructions should enhance the Pentium 4’s potential in scientific and distributed computing scenarios. Another four new instructions should make the Pentium 4 a better vertex shader for graphics applications by allowing manipulation of data organized as an array of structures, as is common in graphics vertex databases. A pair of new instructions enhances thread synchronization in Hyper-Threading, allowing an unoccupied logical processor to enter a dormant state in order to release resources for the other logical processor, to consume less power, or both. The remaining instructions should improve video encoding and x87-to-integer data conversions.
Of course, programs must be rewritten or recompiled to take advantage of SSE3 instructions, so we won’t see SSE3’s benefits immediately.
- Better prefetching: Intel has improved Prescott’s hardware and software prefetch abilities, so it can anticipate what data will be needed next and fetch them into its L2 cache. Most importantly, the hardware prefetching algorithm, which requires no special code, should be smarter about what to grab and when to grab it.
- Enhanced Hyper-Threading: Intel’s engineers have modified Prescott in various ways to make Hyper-Threading better. Shared resources have been expanded, and more types of operations can be conducted in parallel. The number of store instructions in flight is up from 24 to 32, for instance, and the number of write-combining buffers used to track stores is up from six to eight. These changes should allow multiple threads to execute better simultaneously. Also, Prescott includes measures to reduce L1 cache contention between its two logical processors.
- Lots of microarchitectural tweaks: Here’s where the bullet point thing breaks down. There are too many important little tweaks to list them all under their own headings.
For instance, Prescott’s branch prediction unit has been improved to avoid branch mispredictions, which will be more costly than ever with Prescott’s long pipeline. One of the enhancements is the addition of an indirect branch predictor, borrowed from the work of the Pentium M team.
Another key change is a new shifter/rotator block added to one of the chip’s simple arithmetic logic units, or ALUs. You will recall that the Pentium 4’s simple ALUs run at twice the speed of the rest of the chip; that’s still true for Prescott, and now one of the ALUs can handle shift and rotate operations. Also, Prescott now does integer multiplication in a dedicated integer multiplier instead of using the floating-point multiplier, as previous Netburst chips did.
There are also store-to-load forwarding enhancements, improvements to SSE/2/3 and x87 multimedia performance, and more.
All told, Prescott is a rather different animal from the Northwood and Willamette chips that precede it and share the Pentium 4 name. These changes will affect performance in ways that are difficult to predict. Instruction latencies will be higher, except where they’re lower. The same is true for performance in general, and that’s why we run the benchmarks.
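For a feel of the pipeline tradeoff described in the bullets above, here's a minimal sketch of the classic textbook model in which every mispredicted branch costs a full pipeline flush. Only the 20- and 31-stage depths and the 3.4GHz clock come from this article. The base CPI, branch fraction, and misprediction rate are made-up illustrative values, and the model deliberately ignores Prescott's better branch predictor, larger caches, and everything else that pushes the other way.

```python
def cycles_per_instruction(base_cpi: float, branch_fraction: float,
                           mispredict_rate: float, flush_penalty: int) -> float:
    """Average CPI when each mispredicted branch costs `flush_penalty` cycles."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


def break_even_clock_ghz(old_clock_ghz: float, old_cpi: float, new_cpi: float) -> float:
    """Clock the new design needs so that time per instruction matches the old one."""
    return old_clock_ghz * new_cpi / old_cpi


# Hypothetical workload parameters, chosen only for illustration.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20   # one instruction in five is a branch
MISPREDICT_RATE = 0.05   # 5% of branches are mispredicted

# Simplification: flush penalty taken as equal to the pipeline depth in stages.
northwood_cpi = cycles_per_instruction(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
prescott_cpi = cycles_per_instruction(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)
break_even = break_even_clock_ghz(3.4, northwood_cpi, prescott_cpi)

print(f"Northwood CPI: {northwood_cpi:.2f}")
print(f"Prescott CPI:  {prescott_cpi:.2f}")
print(f"Prescott clock needed to match a 3.4GHz Northwood: {break_even:.2f}GHz")
```

Under these invented numbers, the longer pipeline alone would cost roughly nine percent per clock. Change the misprediction rate and the answer moves a lot, which is part of why real benchmarks are the only reliable guide.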
Prescott pullin’ the juice
There has been some concern, leading up to Prescott’s launch, about how much power the chip will consume and how much heat it will produce. The key spec Intel provides in this realm is TDP, or Thermal Design Power. TDP is not, however, a peak power load number; it is a thermal design guideline. As Intel puts it, “The TDP is not the maximum power that the processor can dissipate.” So we have something to go on there, but perhaps not much.
Northwood’s TDP at 3.2GHz is 82W, while the Extreme Edition’s is about 92W. Prescott’s TDP at 3.2GHz is 103W. So yeah, this thing pulls some juice and generates some heat.
To manage Prescott’s thermal prowess, Intel has created a new specification for thermals that allows for finer-grained control of fan speeds based on a value returned from the CPU. This value is set “based on the power dissipation of each unit,” according to Intel, and combined with the thermal diode temp, will dictate safe fan speeds for coolers. Implementing this scheme will require motherboard changes, but not changes to the actual cooler designs. In fact, Intel-approved coolers for current Pentium 4s should work for Prescott at its initial speed grades.
Intel is also pushing a verification program for ATX cases, trying to ensure enclosures have proper venting and the like. Clearly, Intel is squeezing all it can from ATX while waiting for the new BTX form factor to arrive in force.
So the hundred dollar question is: will Prescott work with my motherboard? The answer is, as with so many things in life, it depends. These first Prescott chips drop into 478-pin sockets, just like Northwoods. Newer motherboards from top vendors have probably been ready for Prescott for some time, but they will have to provide adequate power for Prescott, and not all older motherboards can. So Intel’s answer is, “Check with your motherboard manufacturer.” We checked with Abit about our IC7-G test platform, and they were able to provide us with a Prescott-ready BIOS. Once we flashed to it, the Prescott ran like a champ on our board. Depending on your motherboard’s age and power design, your mileage may vary.
Regular readers may recall our recent review of the Athlon 64 3000+, which included benchmarks for lots of different processors at various speed grades, including some very fresh results for a number of new chips. We threw those out for this review, and started over with a clean slate. This time out, we have new drivers, new BIOSes, and new revisions for many of our test applications. Also, we’re now using ATI Radeon 9800 Pro graphics cards in our test systems. As a result, the benchmark scores from the previous reviews will not be directly comparable to our results here. Not to worry, though: we’ve tested plenty of speed grades and CPU types.
We tested all the Pentium 4 chips with Hyper-Threading enabled. To make things even more interesting, we tested the Prescott and Northwood Pentium 4s with Hyper-Threading turned off, to better understand the relative benefits of Prescott’s improved Hyper-Threading implementation.
Also, in order to obtain the results for a Northwood Pentium 4 running at 3.4GHz, we used the handy-dandy BIOS option on our Abit IC7-G motherboard to disable the L3 cache on our 3.4GHz Extreme Edition processor. The Extreme Edition, of course, is just a Northwood with a 2MB L3 cache. By all appearances, with its L3 cache disabled, the chip performs exactly as one would expect a Pentium 4 3.4GHz chip to perform.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test systems were configured like so:
|Processor||Athlon XP ‘Barton’ 3200+ 2.2GHz||Athlon XP ‘Barton’ 3000+ 2.167GHz||AMD Athlon 64 3000+ 2.0GHz; AMD Athlon 64 3200+ 2.0GHz; AMD Athlon 64 3400+ 2.2GHz||AMD Athlon 64 FX-51 2.2GHz||Pentium 4 2.8’C’GHz; Pentium 4 3.2GHz; Pentium 4 3.2GHz Extreme Edition; Pentium 4 3.4GHz Extreme Edition; Pentium 4 2.8’E’GHz; Pentium 4 3.0’E’GHz; Pentium 4 3.2’E’GHz|
|Front-side bus||400MHz (200MHz DDR)||333MHz (166MHz DDR)||HT 16-bit/800MHz downstream; HT 16-bit/800MHz upstream||HT 16-bit/800MHz downstream; HT 16-bit/800MHz upstream||800MHz (200MHz quad-pumped)|
|Motherboard||Asus A7N8X Deluxe v2.0||Asus A7N8X Deluxe v2.0||MSI K8T Neo||MSI 9130||Abit IC7-G|
|North bridge||nForce2 SPP||nForce2 SPP||K8T800||K8T800||82875P MCH|
|South bridge||nForce2 MCP-T||nForce2 MCP-T||VT8237||VT8237||82801ER ICH5R|
|Chipset drivers||ForceWare 3.13||ForceWare 3.13||4-in-1 v.4.51||INF Update 5.1.1002|
|Memory size||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)|
|Memory type||Corsair TwinX XMS4000 DDR SDRAM at 400MHz||Corsair TwinX XMS4000 DDR SDRAM at 333MHz||Corsair TwinX XMS4000 DDR SDRAM at 400MHz||Corsair CMX512RE-3200LL PC3200 registered DDR SDRAM at 400MHz||Corsair TwinX XMS4000 DDR SDRAM at 400MHz|
|Hard drive||Seagate Barracuda V 120GB ATA/100||Seagate Barracuda V 120GB ATA/100||Seagate Barracuda V 120GB SATA 150||Seagate Barracuda V 120GB SATA 150||Seagate Barracuda V 120GB SATA 150|
|Audio||Creative SoundBlaster Live!|
|Graphics||Radeon 9800 Pro 256MB with CATALYST 4.1 drivers|
|OS||Microsoft Windows XP Professional|
|OS updates||Service Pack 1, DirectX 9.0b|
All tests on the Pentium 4 systems were run with Hyper-Threading enabled, except where otherwise noted.
Thanks to Corsair for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, Corsair’s RAM is definitely worth considering.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- Cachemem 2.65MMX
- SiSoft Sandra 2004 (9.89)
- Compiled binary of C Linpack port from Ace’s Hardware
- Discreet 3ds max 5.1 SP1
- NewTek Lightwave 7.5
- Cinebench 2003
- POV-Ray for Windows v3.5
- PICCOLOR v4.0 build 472
- SPECviewperf 7.1.1
- ScienceMark 2.0 beta (23SEP03 build)
- Sphinx 3.3
- LAME 3.95.1 (build from mitiok.cjb.net)
- Xmpeg 5.0.3 with DivX Video 5.11
- FutureMark 3DMark03 build 340
- Comanche 4 demo
- Quake III Arena v1.31
- Serious Sam SE v1.07
- Splinter Cell v1.2
- Unreal Tournament 2003 demo v.2206
- Wolfenstein: Enemy Territory v2.55
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Our synthetic memory tests should enlighten us about the effects of Prescott’s larger caches and improved prefetch mechanisms, so we’ll kick things off with them, as usual.
Better prefetching seems to give the Pentium 4 ‘E’ chips (that is, the Prescotts) a slight edge over the Northwoods in Sandra’s bandwidth test, which is already very aggressive about using buffering and the like to achieve the fastest possible results. The Athlon 64 FX-51, with its built-in dual-channel memory controller, still leads the pack in Sandra.
Cachemem’s bandwidth test is less aggressive, and Prescott’s improved prefetch algorithm appears to make a much bigger difference here. The Pentium 4 ‘E’ chips romp in this test.
Linpack shows us just what we might expect from Prescott’s larger L2 cache. You can see how the Northwood’s performance drops off around about 512K, but Prescott continues unabated into larger matrix sizes. Prescott also performs very well on the far right side of the graph, when we’re well into main memory. Again, better hardware prefetch is the likely reason for this improvement.
However, the real shocker here, at least for me, is the Prescott’s relatively low peak throughput in terms of MFLOPS. The Prescott peaks well below the Northwood at the same clock speed. That’s likely due to lower floating-point math performance produced by Prescott’s longer pipeline.
One other thing to note: the crazy insane light blue line towering above all the rest is the Pentium 4 Extreme Edition 3.4GHz. That puppy, with 2MB L3 cache, just abuses our Linpack test. Just thought I should mention that.
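To see why the Linpack curves fall off where they do, it helps to work out how big a matrix fits in each cache. The sketch below assumes the Linpack port stores its matrix as 8-byte double-precision values, which I haven't verified against the Ace's Hardware source, and it ignores the extra vectors and bookkeeping that share the cache with the matrix. The function names are mine.

```python
import math

BYTES_PER_DOUBLE = 8  # assumption: the benchmark works in double precision
KB = 1024


def matrix_bytes(order: int) -> int:
    """Memory footprint of a dense order x order matrix of doubles."""
    return order * order * BYTES_PER_DOUBLE


def largest_matrix_order(cache_bytes: int) -> int:
    """Largest square matrix of doubles that fits entirely in the cache."""
    return math.isqrt(cache_bytes // BYTES_PER_DOUBLE)


CACHES = (
    ("Northwood L2 (512K)", 512 * KB),
    ("Prescott L2 (1MB)", 1024 * KB),
    ("Extreme Edition L3 (2MB)", 2048 * KB),
)

for label, size in CACHES:
    order = largest_matrix_order(size)
    print(f"{label}: up to {order} x {order} "
          f"({matrix_bytes(order) // KB}K of matrix data)")
```

Going from a 256 x 256 problem to a 362 x 362 one before spilling into main memory is the kind of headroom that lets Prescott keep cruising past the point where Northwood drops off.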
Prescott demonstrates slightly higher memory access latencies, which might be the result of its slightly higher L2 cache latencies. Northwood chips are generally a few ticks faster than the Prescott at our chosen sample size. Prescott’s improved prefetch seems to mask this latency in our bandwidth tests, though.
And, of course, the Athlon 64 chips with their built-in memory controllers are easily quickest overall here.
Not only are our 3D graphs indulgent, but they’re useful, too. I’ve arranged them manually in a very rough order from worst to best, for what it’s worth. I’ve also colored the data series according to how they correspond to different parts of the memory subsystem. Yellow is L1 cache, light orange is L2 cache, and orange is main memory. The red series on the Extreme Edition graph represents L3 cache. Of course, caches sometimes overlap, so the colors are just an interesting visual guide.
If you look at the row for the 1024KB block size, you can see the effects of Prescott’s larger L2 cache. So… there you have it. Let’s see how all of this affects performance in real applications.
Unreal Tournament 2003
The Pentium 4 ‘E’ processors run a few frames per second slower than the P4 Northwood in UT2003, giving us our first taste of Prescott’s real-world performance. Then again, in this game, the Athlon 64 simply dominates.
Quake III Arena
The P4 ‘E’ processors find friendly territory in Quake III Arena. In this game, they can outperform the older P4s on a clock-for-clock basis. The Athlon 64s, though, are fastest once more.
Wolfenstein: Enemy Territory
The Prescott essentially ties Northwood in Wolf: ET.
Tom Clancy’s Splinter Cell
Splinter Cell is a new addition to our CPU test suite, and it seems to run best on Athlon 64 processors. The P4 ‘E’ chips, though, beat out the Northwood P4s once again.
The Pentium 4 Extreme Edition comes out looking good here with a win over the Athlon 64 FX-51. However, the Prescott P4s are bunched up near the bottom of the pack, along with the aging Athlon XPs.
Serious Sam SE
It’s a similar story in Serious Sam, where the Prescotts give up about 5 frames per second to the Northwoods. That’s not much to write home about, but then the Athlon 64s are considerably faster than both.
Prescott handles itself well in 3DMark03, coming in with a faster score overall than both the corresponding P4 Northwood and the Athlon 64 with an equivalent model number. The individual CPU tests are mixed.
Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine that needs the latest computer hardware to run at speeds close to real-time processing. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.
There are two goals with Sphinx. The first is to run it faster than real time, so real-time speech recognition is possible. The second, more ambitious goal is to run it at about 0.8 times real time, where additional CPU overhead is available for other sorts of processing, enabling Sphinx-driven real-time applications.
We have a new winner in the Sphinx sweeps, as the Pentium 4 ‘E’ models nearly break the 50% barrier. Notice, also, how different Prescott and Northwood are. The Northwood has always been faster running the Sphinx binary produced by the Microsoft compiler, but the Prescott is faster with the Intel compiler’s executable.
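The Sphinx scores are easiest to read as a real-time factor: processing time divided by the length of the audio. Here's a small sketch of that arithmetic with the two goals described above built in. The 60-second clip and 31-second run are invented numbers for illustration, not our measured results, and the function names and labels are my own.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time as a fraction of audio length; below 1.0 beats real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def classify(rtf: float) -> str:
    """Map a real-time factor onto the two Sphinx goals described in the text."""
    if rtf <= 0.8:
        return "real time with headroom for other work"
    if rtf < 1.0:
        return "real time, but with little to spare"
    return "slower than real time"


# Hypothetical example: a 60-second clip recognized in 31 seconds.
example = real_time_factor(31.0, 60.0)
print(f"Real-time factor: {example:.2f} ({classify(example)})")
```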
LAME MP3 encoding
We used LAME to encode a 101MB 16-bit, 44kHz audio file into a very high-quality MP3. The exact command-line options we used were:
lame --alt-preset extreme file.wav file.mp3
Prescott’s a little bit slower in LAME than Northwood. In fact, Prescott at 3.2GHz is just a little faster than Northwood at 2.8GHz.
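That kind of result can be turned into a rough clock-for-clock figure. The sketch below treats the 3.2GHz Prescott as exactly matching a 2.8GHz Northwood in LAME, which slightly undersells Prescott since it was a touch faster, and it assumes performance scales linearly with clock speed, which real chips rarely manage. The function names are mine.

```python
def relative_per_clock(matching_old_ghz: float, new_ghz: float) -> float:
    """Per-clock throughput of the new chip relative to the old one,
    given the old chip's clock that produces the same score."""
    return matching_old_ghz / new_ghz


def clock_to_match(old_ghz: float, per_clock: float) -> float:
    """Clock the new chip needs to equal the old chip running at old_ghz."""
    return old_ghz / per_clock


per_clock = relative_per_clock(2.8, 3.2)
needed = clock_to_match(3.4, per_clock)

print(f"Prescott per-clock throughput in LAME: {per_clock:.1%} of Northwood's")
print(f"Clock needed to match a 3.4GHz Northwood: about {needed:.2f}GHz")
```

This applies to LAME only; as the rest of the results show, the per-clock gap swings in both directions from one application to the next.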
DivX video encoding
This new version of XMPEG includes a benchmark feature, so we’re reporting scores in frames per second now. This is the first app we’ve looked at so far that makes good use of Hyper-Threading, so keep an eye on the HT and non-HT results.
Prescott looks very good here, easily outperforming the other Pentium 4 chips at the same clock speed, and laying waste to the Athlon 64. Both the P4 3.2 and the 3.2E are helped by Hyper-Threading.
3ds max rendering
We begin our 3D rendering tests with Discreet’s 3ds max, one of the best known 3D animation tools around. 3ds max is both multithreaded and optimized for SSE2. We rendered a couple of different scenes at 1024×751 resolution, including the Island scene shown below. Our testing techniques were very similar to those described in this article by Greg Hess. In all cases, the “Enable SSE” box was checked in the application’s render dialog.
The Pentium 4 ‘E’ is at an ever-so-slight disadvantage in 3ds max, but I wouldn’t consider the performance difference significant.
NewTek’s Lightwave is another popular 3D animation package that includes support for multiple processors and is highly optimized for SSE2. Lightwave can render very complex scenes with realism, as you can see from the sample scene, “A5 Concept,” below.
To test the effects of Hyper-Threading, we’ve tested the Hyper-Threaded processors with one, two, and four rendering threads. For non-Hyper-Threaded processors, we just tested with one and two threads.
Don’t be fooled by the sort orders of the graphs. We had to sort by something, so we used the “1 thread” results. However, you can see Prescott’s improved Hyper-Threading at work here. As the number of threads goes up, the Pentium 4 ‘E’ scores go down. The opposite is true for the Northwood P4s. As a result, once we reach four threads, the Pentium 4 ‘E’ 3.2GHz gives its best performance: 45.5 seconds for rendering the “Radiosity_ReflectiveThings” scene. Northwood’s best time at 3.2GHz, by contrast, is 47.4 seconds. Prescott is very competitive, but it’s also very different from its predecessor.
POV-Ray is the granddaddy of PC ray-tracing renderers, and it’s not multithreaded in the least. Don’t ask me why; it seems crazy to me. POV-Ray also relies more heavily on x87 FPU instructions to do its work, because it contains only minor SIMD optimizations.
With almost entirely x87 math and no help from multithreading, the Prescott has a rough time in our POV-Ray scene.
Cinebench 2003 rendering and shading
Cinebench is based on Maxon’s Cinema 4D modeling, rendering, and animation app. This revision of Cinebench measures performance in a number of ways, including 3D rendering, software shading, and OpenGL shading with and without hardware acceleration. Cinema 4D’s renderer is multithreaded, so it takes advantage of Hyper-Threading, as you can see in the results.
The Pentium 4 ‘E’ at 3.2GHz performs on par with a Northwood Pentium 4 at 2.8GHz in Cinebench’s rendering test.
Prescott takes two of the remaining three tests from Northwood, although the Athlon 64 is decidedly on top overall.
SPECviewperf workstation graphics
SPECviewperf simulates the graphics loads generated by various professional design, modeling, and engineering applications.
Prescott comes out looking great in SPECviewperf, beating the Northwood clock for clock with some consistency.
I’d like to thank Alex Goodrich for his help working through a few bugs in the 2.0 beta version of ScienceMark. Thanks to his diligent work, I was able to complete testing with this impressive new benchmark, which is optimized for SSE, SSE2, and 3DNow! and is multithreaded, as well. Unfortunately, we don’t yet have a version of ScienceMark capable of taking advantage of SSE3’s new complex arithmetic instructions.
In the interest of full disclosure, I should mention that Tim Wilkens, one of the originators of ScienceMark, now works at AMD. However, Tim has sought to keep ScienceMark independent by diversifying the development team and by publishing much of the source code for the benchmarks at the ScienceMark website. We are sufficiently satisfied with his efforts, and impressed with the enhancements to the 2.0 beta revision of the application, to continue using ScienceMark in our testing.
The molecular dynamics simulation models “the thermodynamic behaviour of materials using their forces, velocities, and positions”, according to the ScienceMark documentation.
Primordia “calculates the Quantum Mechanical Hartree-Fock Orbitals for each electron in any element of the periodic table.” In our case, we used the default element, Argon.
Prescott struggles in all three of the above tests without the assistance of SSE3, running, in the case of Primordia, nearly a full 60 seconds behind Northwood.
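For a sense of what SSE3's complex arithmetic instructions could bring to code like this, here's a pure-Python model of the lane-by-lane behavior of three of them: ADDSUBPS, MOVSLDUP, and MOVSHDUP. Those names come from Intel's instruction set documentation rather than from this article, the helper functions are my own stand-ins, and none of this is how ScienceMark works today. Each four-element list plays the part of one 128-bit register holding two complex numbers as [real, imaginary] pairs.

```python
def movsldup(v):
    """Model of MOVSLDUP: duplicate the even-indexed (real) lanes."""
    return [v[0], v[0], v[2], v[2]]


def movshdup(v):
    """Model of MOVSHDUP: duplicate the odd-indexed (imaginary) lanes."""
    return [v[1], v[1], v[3], v[3]]


def swap_pairs(v):
    """Stand-in for a shuffle that swaps real and imaginary within each pair."""
    return [v[1], v[0], v[3], v[2]]


def mulps(a, b):
    """Model of MULPS: lane-by-lane multiply."""
    return [x * y for x, y in zip(a, b)]


def addsubps(a, b):
    """Model of ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]


def complex_mul_packed(x, y):
    """Multiply two packed complex pairs: (a+bi)(c+di) = (ac-bd) + (ad+bc)i."""
    t1 = mulps(movsldup(x), y)              # [a*c, a*d, ...]
    t2 = mulps(movshdup(x), swap_pairs(y))  # [b*d, b*c, ...]
    return addsubps(t1, t2)                 # [ac-bd, ad+bc, ...]


x = [1.0, 2.0, 3.0, 4.0]   # (1+2i) and (3+4i)
y = [5.0, 6.0, 7.0, 8.0]   # (5+6i) and (7+8i)
product = complex_mul_packed(x, y)
print(product)
```

The point is that two complex multiplies fall out of a handful of packed operations with no separate sign-flipping step, which is the sort of saving that adds up inside an FFT.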
These last two tests, SGEMM and DGEMM, measure matrix math performance using several different codepaths optimized with several instruction set extensions, including SSE, SSE2, and 3DNow!
Prescott does especially well in DGEMM, showing off exceptional peak performance with properly vectorized data and SSE2.
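If you're wondering how a matrix multiply turns into a throughput number, the usual convention counts 2n³ floating-point operations for an n x n multiply: one multiply and one add per inner-loop step. The sketch below shows that bookkeeping with a deliberately naive, unvectorized multiply. It is not ScienceMark's code, and interpreted Python will post numbers far below anything in our graphs; the function names are mine.

```python
import time


def gemm_flops(n: int) -> int:
    """Conventional operation count for an n x n matrix multiply."""
    return 2 * n ** 3


def mflops(n: int, seconds: float) -> float:
    """Throughput in millions of floating-point operations per second."""
    return gemm_flops(n) / seconds / 1e6


def naive_gemm(a, b):
    """Plain triple-loop multiply of two square matrices (lists of lists)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c


n = 64
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

start = time.perf_counter()
naive_gemm(a, b)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(f"{n} x {n} multiply: {gemm_flops(n)} operations, {mflops(n, elapsed):.1f} MFLOPS")
```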
picCOLOR image analysis
We thank Dr. Reinert Muller with the FIBUS Institute for pointing us toward his picCOLOR benchmark. This image analysis and processing tool is partially multithreaded, and it shows us the results of a number of simple image manipulation calculations. The overall score is indexed to a Pentium III 1GHz system based on a VIA Apollo Pro 133. In other words, the reference system would score a 1.0 overall.
The new version of picCOLOR we’re using today is optimized for SSE and SSE2, so it should perform differently than past revisions.
The Pentium 4 ‘E’ is fastest overall. In fact, despite a cache and clock speed deficit, the Prescott at 3GHz beats out the Pentium 4 Extreme Edition at 3.4GHz. Here are the individual test scores for some of our key participants.
The P4 ‘E’ 3.2GHz pulls out the overall win by getting the top scores in both fixed and float interpolation and putting in a strong overall effort. I have to wonder if some of the new bits inside Prescott, such as the shift/rotate unit, haven’t contributed to its strong performance here. Notice how much stronger the Prescott is in the Fixed Interpolation test. It’s much faster than the Northwood with or without Hyper-Threading, and Prescott gains performance with HT enabled, unlike Northwood.
Our benchmark results for the new Prescott-based Pentium 4 ‘E’ processors are the very definition of mixed. In some cases, Prescott looks very good, but in others, it’s slower than current Pentium 4 chips at the same clock speed. The larger caches and architectural tweaks have helped immensely in offsetting Prescott’s super-long 31-stage pipeline, but they haven’t entirely made up the gap. On balance, Prescotts are slower than Northwoods. I expect Prescott P4s will look relatively stronger over time as SSE3 instructions are adopted and, especially, as clock speeds ramp up.
Obviously, Intel is aiming for the future with Prescott, and that future includes nosebleed-inducing clock speeds, BTX cases with much-improved cooling solutions, fancy new CPU sockets, PCI Express, and more multithreaded applications. In the context of all of these changes, Intel’s modifications to Prescott make lots of sense. However, the future isn’t here yet, and the Pentium 4 ‘E’ chips are now consumer products. As consumer products, they’re not a great proposition. They’re slower overall than previous Pentium 4s, they run hotter, and they draw more power. I doubt there’s enough of a performance difference between Northwood and Prescott to matter to most folks, but Northwood is probably a better overall choice right now.
That’s especially true given the pricing, which follows a very simple formula. At a given clock speed, the Pentium 4 and Pentium 4 ‘E’ will cost the same. Also, AMD has priced the equivalent Athlon 64 models at parity with the Pentium 4. So, for instance, a Pentium 4 3.2GHz, a P4 3.2’E’GHz, and an Athlon 64 3200+ all list for $278 at present. The 3.4GHz/3400+ variants are $417. All other things being equal, I’d pick the Athlon 64 any day of the week, unless I were into video editing, media encoding, image processing, Lightwave rendering…
All depends on what you want to do.
If what you want to do is throw a whole truckload of cash Intel’s way to have the most impressive processor possible, you may want to consider the Pentium 4 Extreme Edition 3.4GHz. It will run you $999, or $799 plus your first-born child. The P4 Extreme Edition 3.4GHz may well be the fastest x86 processor on the planet right now, although the title belt isn’t unified. The Athlon 64 FX-51 shares a piece of the championship, especially when it comes to 3D gaming. And the A64 FX-51 is a virtual steal at only $733. But, then, you weren’t worrying about money, now, were you?