Intel’s Pentium 4 Prescott processor

INTEL IS LAUNCHING A BEVY of new chips today, including new speed grades of the current Pentium 4 and Pentium 4 Extreme Edition processors clocked at 3.4GHz. The biggest news, though, is the new processor core, code-named Prescott. Prescott isn’t just a die shrink, though it is that. Prescott is also a major reworking of the Pentium 4’s microarchitecture—major enough that I’m surprised Intel didn’t opt to call this processor the Pentium 5.

Prescott clock speeds will initially range from 2.8GHz to 3.4GHz. To keep Prescott-based P4s distinct from older “Northwood” cores, Intel is tacking an “E” on to the product names, so they’ll be called the Pentium 4 2.8E or 3.2E. The product mix gets most confusing at 2.8GHz, where one could buy four different Pentium 4s: the 2.8GHz (a Northwood core with a 533MHz front-side bus), the 2.8C (Northwood again, but with an 800MHz bus), the 2.8A (Prescott with a 533MHz bus), or the 2.8E (Prescott with 800MHz bus). Clear as mud?


We tested, well, lots of chips against Prescott and the new P4 3.4GHz processors

Anatomy of a die shrink


Anatomy of Intel’s 90nm process tech (Source: Intel)

Let’s start with a look at the gory details of the die shrink. With Prescott, Intel is moving the Pentium 4 from a 130nm (or 0.13 micron) fabrication process to a 90nm process. As always, such a transition brings immediate benefits in the form of smaller die sizes and, usually, higher potential clock speeds. The conversion to 90nm is far from trivial, though, and Intel has enhanced its manufacturing process in a number of ways in order to facilitate the change.

One of the most notable changes is the use of a strained silicon substrate. When stretched slightly, the lattice structure of silicon atoms spreads out and opens up, allowing for freer flow of electrons. This lower resistance, in turn, allows for smaller gate lengths and faster transistors. Intel claims here that its new process only adds two percent to manufacturing costs, which is remarkable given the use of strained silicon.

Intel’s 90nm process replaces the fluorine-doped silicon oxide dielectric film used previously with an even lower capacitance carbon-doped oxide film. This process also employs a layer of nickel silicide, essentially as caps on the transistors, to lower resistance versus the cobalt silicide used in Intel’s 130nm process. The result of these changes is gate lengths as small as 50nm. SRAM cells are down from 2 square microns to 1.15.

Not only is the 90nm process smaller, but Intel is also manufacturing Prescott using seven layers of copper interconnects, instead of the six used at 130nm. All told, the changes shrink the Pentium 4’s die size to 122 mm2, from 145 mm2 for Northwood—this despite the fact Prescott’s transistor count is 125 million, over twice Northwood’s 55 million transistors.
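Run the numbers and the density gain is clear: 125 million transistors in 122 mm2 is just over 1 million transistors per square millimeter, versus roughly 0.38 million per square millimeter for Northwood's 55 million in 145 mm2, an increase of about 2.7X.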


The Prescott die (Source: Intel)
 

The Pentium 4’s new look
Wondering how Prescott got to have so many more transistors? The answer is that Prescott is a serious overhaul of the Netburst microarchitecture all Pentium 4s share. In fact, Prescott is arguably a more major revamp than the P6 core got during its long tenure at the heart of the Pentium Pro, Pentium II, and Pentium III processors. There are too many changes to cover in depth here, but I will attempt to summarize them and talk about the most significant modifications of the chip’s design.

The watchwords for the Prescott changes are “higher clock frequencies.” Virtually all the modifications to the Prescott core are intended to produce high performance while allowing the chip to run at clock speeds of 4GHz and beyond. Many of the radical elements of the original Netburst design are present here in even more radical form, including the deep main pipeline, execution trace cache, and ample amounts of speculative logic and prefetching. Most of these changes represent tradeoffs of various types, between, say, higher clock speeds and higher clock-for-clock performance, or, in many cases, between higher latencies and better peak performance. Generally, Prescott has been tuned for higher clock frequencies, and the choices Intel’s design team has made reflect that emphasis.

With that said, we’ll let the bullets start flying on our summary of Prescott’s new features.

  • A much longer pipeline — Probably the biggest news of the day is the fact that Netburst’s main branch prediction/recovery pipeline has been lengthened from a healthy 20 stages in its previous incarnation to 31 stages in Prescott. To give you a point of reference, that’s longer than the Alaskan oil pipeline. Pipelines of around 10 stages are much more common; AMD’s Hammer core in the Athlon 64 and Opteron processors uses 12 stages.

    By making each stage of the pipeline less complex, Intel increases the processor’s tolerance for running at higher clock speeds. In doing so, though, Intel’s engineers have chosen to reduce clock-for-clock performance. This change, by itself, would significantly lower the number of instructions per clock (IPC) the Pentium 4 can execute. Higher clock speeds can offset a lower IPC, but Prescott starts out at only 3.4GHz, and Northwood runs at that speed, too.

    Fortunately, there are a number of countervailing forces to take into account. For one thing, instruction latencies vary; not all instructions use all stages of the pipeline. More importantly, Prescott includes a whole raft of enhancements aimed at increasing its clock-for-clock performance—some in very specific ways. That’s what the rest of these bullet points are about.

    Before we move on, I should point out once more that, taken in context, a lower IPC isn’t necessarily a bad thing. Higher or lower IPCs in processor design are tradeoffs, and need not evoke a value judgment. What is true of the Pentium 4, and of Prescott more so than prior revisions, is that Intel has chosen to go full-bore down the path of lower IPC and higher clock speeds. This “speed demon” approach to processor design seems to fit reasonably well with Intel’s technological prowess in chip fabrication. (A rough back-of-the-envelope model of the clock-versus-IPC tradeoff follows this list.)

  • A larger L2 cache — The main contributor to Prescott’s massive transistor count is its new 1MB L2 cache. We’ve seen larger caches help performance many times before, the most dramatic recent example being the Pentium 4 Extreme Edition processors with 2MB of L3 cache onboard. The Extreme Edition is a screamer as a result of this massive cache. Prescott’s larger L2 cache necessarily has higher latencies, so going to a larger cache has its drawbacks. Still, in a chip designed to run so much faster than main memory, the larger on-chip cache makes sense.
  • A larger L1 data cache — Northwood’s L1 data cache was 8K and 4-way associative. Prescott’s is 16K and 8-way associative, so Prescott’s L1 cache should have a higher hit rate and, thus, be more effective.

    Like previous Netburst processors, Prescott’s L1 instruction cache is an unconventional execution trace cache that holds decoded micro-ops for the processor’s RISC-like core instead of CISC-style x86 instructions. Prescott’s execution trace cache still holds roughly 12,000 micro-ops, but the chip can now encode more types of micro-ops into the trace cache, making it more efficient.

  • SSE3 instructions — Intel has endowed the Prescott core with 13 new instructions now known as SSE3. Like previous SSE revisions, these extended instructions are intended to accelerate certain types of computational tasks. Five new instructions for complex arithmetic allow for better handling of tasks like Fast Fourier Transforms; these instructions should enhance the Pentium 4’s potential in scientific and distributed computing scenarios. Another four new instructions should make the Pentium 4 a better vertex shader for graphics applications by allowing manipulation of data organized as an array of structures, as is common in graphics vertex databases. A pair of new instructions enhances thread synchronization in Hyper-Threading, allowing an unoccupied logical processor to enter a dormant state in order to release resources for the other logical processor, to consume less power, or both. The remaining instructions should improve video encoding and x87-to-integer data conversions.

    Of course, programs must be rewritten or recompiled to take advantage of SSE3 instructions, so we won’t see SSE3’s benefits immediately. (A brief sketch of the complex-arithmetic case appears after this list.)

  • Better prefetching — Intel has improved Prescott’s hardware and software prefetch abilities, so it can anticipate what data will be needed next and fetch them into its L2 cache. Most importantly, the hardware prefetching algorithm, which requires no special code, should be smarter about what to grab and when to grab it.
  • Enhanced Hyper-Threading — Intel’s engineers have modified Prescott in various ways to make Hyper-Threading better. Shared resources have been expanded, and more types of operations can be conducted in parallel. The number of store instructions in flight is up from 24 to 32, for instance, and the number of write-combining buffers used to track stores is up from six to eight. These changes should allow multiple threads to execute side by side more efficiently. Also, Prescott includes measures to reduce L1 cache contention between its two logical processors.
  • Lots of microarchitectural tweaks — Here’s where the bullet point thing breaks down. There are too many important little tweaks to list them all under their own headings.

    For instance, Prescott’s branch prediction unit has been improved to avoid branch mispredictions, which will be more costly than ever with Prescott’s long pipeline. One of the enhancements is the addition of an indirect branch predictor, borrowed from the work of the Pentium M team.

    Another key change is a new shifter/rotator block added to one of the chip’s simple arithmetic logic units, or ALUs. You will recall that the Pentium 4’s simple ALUs run at twice the speed of the rest of the chip; that’s still true for Prescott, and now one of the ALUs can handle shift and rotate operations. Also, Prescott now does integer multiplication in a dedicated integer multiplier instead of using the floating-point multiplier, as previous Netburst chips did.

    There are also store-to-load forwarding enhancements, improvements to SSE/2/3 and x87 multimedia performance, and more.
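
To make a couple of these points concrete, here are two quick illustrations. First, a back-of-the-envelope model of the pipeline-depth tradeoff, written in C. The branch frequency, misprediction rate, and base CPI below are round numbers I’ve assumed for illustration, not Intel figures; the pipeline depth simply stands in for the misprediction penalty.

#include <stdio.h>

/* Back-of-the-envelope model only: the branch statistics and base CPI are
   assumptions, and the pipeline depth is used directly as the misprediction
   penalty in cycles. */
int main(void)
{
    const double branch_freq = 0.15;  /* assumed: ~15% of instructions are branches */
    const double mispredict  = 0.05;  /* assumed: 5% of those branches mispredict   */
    const double base_cpi    = 0.60;  /* assumed baseline CPI absent mispredictions */

    struct { const char *name; int stages; double ghz; } cpu[] = {
        { "20-stage core", 20, 3.4 },
        { "31-stage core", 31, 3.4 },
        { "31-stage core", 31, 4.0 },
    };

    for (int i = 0; i < 3; i++) {
        /* effective cycles per instruction = base CPI + misprediction penalty */
        double cpi = base_cpi + branch_freq * mispredict * cpu[i].stages;
        double ns  = cpi / cpu[i].ghz;    /* GHz = cycles per nanosecond */
        printf("%s @ %.1fGHz: %.2f effective CPI, %.3f ns per instruction\n",
               cpu[i].name, cpu[i].ghz, cpi, ns);
    }
    return 0;
}

With those made-up inputs, the 31-stage design loses a little time per instruction at 3.4GHz but pulls ahead once the clock reaches 4GHz, which is essentially the bet Prescott is making.

Second, a minimal sketch of the sort of complex-arithmetic kernel SSE3’s new ADDSUBPS and MOVSLDUP/MOVSHDUP instructions target. The function and data layout are my own illustration, not code from Intel or from any benchmark used in this review.

#include <pmmintrin.h>   /* SSE3 intrinsics; build with -msse3 or equivalent */

/* Multiply two pairs of single-precision complex numbers packed as
   [re0, im0, re1, im1]. ADDSUBPS subtracts in the even lanes and adds in the
   odd lanes, which is exactly the sign pattern a complex multiply needs. */
__m128 complex_mul_sse3(__m128 a, __m128 b)
{
    __m128 b_re = _mm_moveldup_ps(b);      /* [b0.re, b0.re, b1.re, b1.re] */
    __m128 b_im = _mm_movehdup_ps(b);      /* [b0.im, b0.im, b1.im, b1.im] */
    __m128 t1   = _mm_mul_ps(a, b_re);     /* [a.re*b.re, a.im*b.re, ...]  */
    __m128 a_sw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)); /* swap re/im */
    __m128 t2   = _mm_mul_ps(a_sw, b_im);  /* [a.im*b.im, a.re*b.im, ...]  */
    return _mm_addsub_ps(t1, t2);          /* [re*re - im*im, im*re + re*im, ...] */
}

Without SSE3, the same work takes extra shuffles and separate add and subtract steps, which is the kind of overhead the Fast Fourier Transform claim above is about.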

All told, Prescott is a rather different animal from the Northwood and Willamette chips that precede it and share the Pentium 4 name. These changes will affect performance in ways that are difficult to predict. Instruction latencies will be higher, except where they’re lower. The same is true for performance in general, and that’s why we run the benchmarks.

Prescott pullin’ the juice
There has been some concern, leading up to Prescott’s launch, about how much power the chip will consume and how much heat it will produce. The key spec Intel provides in this realm is TDP, or Thermal Design Power. TDP is not, however, a peak power load number; it is a thermal design guideline. As Intel puts it, “The TDP is not the maximum power that the processor can dissipate.” So we have something to go on there, but perhaps not much.

Northwood’s TDP at 3.2GHz is 82W, while the Extreme Edition’s is about 92W. Prescott’s TDP at 3.2GHz is 103W. So yeah, this thing pulls some juice and generates some heat.
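That works out to roughly 25% more thermal headroom required than Northwood needs at the same clock speed.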

To manage Prescott’s thermal prowess, Intel has created a new specification for thermals that allows for finer-grained control of fan speeds based on a value returned from the CPU. This value is set “based on the power dissipation of each unit,” according to Intel, and combined with the thermal diode temp, will dictate safe fan speeds for coolers. Implementing this scheme will require motherboard changes, but not changes to the actual cooler designs. In fact, Intel-approved coolers for current Pentium 4s should work for Prescott at its initial speed grades.
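
As a rough illustration of the concept only (my own sketch with invented thresholds, not Intel’s actual specification), a motherboard fan controller following this scheme might look something like this:

/* Rough sketch with invented numbers; not Intel's spec. The fan chases a
   control point reported by the CPU itself rather than one fixed, worst-case
   temperature, so quieter speeds are safe on cooler-running units. */
int fan_duty_percent(int diode_temp_c, int cpu_control_point_c)
{
    if (diode_temp_c >= cpu_control_point_c)
        return 100;                       /* at or above the control point: full speed */
    if (diode_temp_c <= cpu_control_point_c - 10)
        return 30;                        /* comfortably below it: quiet floor */
    /* linear ramp across the last 10 degrees below the control point */
    return 30 + 70 * (diode_temp_c - (cpu_control_point_c - 10)) / 10;
}

The point is that the fan tracks a per-chip value reported by the CPU rather than a single worst-case temperature, so coolers can stay quiet on chips that dissipate less than the TDP suggests.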

Intel is also pushing a verification program for ATX cases, trying to ensure enclosures have proper venting and the like. Clearly, Intel is squeezing all it can from ATX while waiting for the new BTX form factor to arrive in force.

So the hundred dollar question is: will Prescott work with my motherboard? The answer is, as with so many things in life, it depends. These first Prescott chips drop into 478-pin sockets, just like Northwoods. Newer motherboards from top vendors have probably been ready for Prescott for some time, but they will have to provide adequate power for Prescott, and not all older motherboards can. So Intel’s answer is, “Check with your motherboard manufacturer.” We checked with Abit about our IC7-G test platform, and they were able to provide us with a Prescott-ready BIOS. Once we flashed to it, the Prescott ran like a champ on our board. Depending on your motherboard’s age and power design, your mileage may vary.

 

Test notes
Regular readers may recall our recent review of the Athlon 64 3000+, which included benchmarks for lots of different processors at various speed grades, including some very fresh results for a number of new chips. We threw those out for this review, and started over with a clean slate. This time out, we have new drivers, new BIOSes, and new revisions for many of our test applications. Also, we’re now using ATI Radeon 9800 Pro graphics cards in our test systems. As a result, the benchmark scores from the previous reviews will not be directly comparable to our results here. Not to worry, though: we’ve tested plenty of speed grades and CPU types.

We tested all the Pentium 4 chips with Hyper-Threading enabled. To make things even more interesting, we tested the Prescott and Northwood Pentium 4s with Hyper-Threading turned off, to better understand the relative benefits of Prescott’s improved Hyper-Threading implementation.

Also, in order to obtain the results for a Northwood Pentium 4 running at 3.4GHz, we used the handy-dandy BIOS option on our Abit IC7-G motherboard to disable the L3 cache on our 3.4GHz Extreme Edition processor. The Extreme Edition, of course, is just a Northwood with a 2MB L3 cache. By all appearances, with its L3 cache disabled, the chip performs exactly as one would expect a Pentium 4 3.4GHz chip to perform.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.

Our test systems were configured like so:

Athlon XP test system
Processors: Athlon XP ‘Barton’ 3200+ 2.2GHz; Athlon XP ‘Barton’ 3000+ 2.167GHz
Front-side bus: 400MHz (200MHz DDR) with the 3200+; 333MHz (166MHz DDR) with the 3000+
Motherboard: Asus A7N8X Deluxe v2.0 (BIOS revision C1007)
North bridge: nForce2 SPP
South bridge: nForce2 MCP-T
Chipset drivers: ForceWare 3.13
Memory: 1GB (2 DIMMs) Corsair TwinX XMS4000 DDR SDRAM at 400MHz (3200+) or 333MHz (3000+)
Hard drive: Seagate Barracuda V 120GB ATA/100

Athlon 64 test system
Processors: AMD Athlon 64 3000+ 2.0GHz; Athlon 64 3200+ 2.0GHz; Athlon 64 3400+ 2.2GHz
System bus: HyperTransport, 16-bit/800MHz downstream and upstream
Motherboard: MSI K8T Neo (BIOS revision 1.1)
North bridge: K8T800
South bridge: VT8237
Chipset drivers: 4-in-1 v.4.51, ATA 5.1.2600.220
Memory: 1GB (2 DIMMs) Corsair TwinX XMS4000 DDR SDRAM at 400MHz
Hard drive: Seagate Barracuda V 120GB SATA 150

Athlon 64 FX test system
Processor: AMD Athlon 64 FX-51 2.2GHz
System bus: HyperTransport, 16-bit/800MHz downstream and upstream
Motherboard: MSI 9130 (BIOS revision 1.1)
North bridge: K8T800
South bridge: VT8237
Chipset drivers: 4-in-1 v.4.51, ATA 5.1.2600.220
Memory: 1GB (2 DIMMs) Corsair CMX512RE-3200LL PC3200 registered DDR SDRAM at 400MHz
Hard drive: Seagate Barracuda V 120GB SATA 150

Pentium 4 test system
Processors: Pentium 4 2.8’C’GHz; Pentium 4 3.2GHz; Pentium 4 3.2GHz Extreme Edition; Pentium 4 3.4GHz Extreme Edition; Pentium 4 2.8’E’GHz; Pentium 4 3.0’E’GHz; Pentium 4 3.2’E’GHz
Front-side bus: 800MHz (200MHz quad-pumped)
Motherboard: Abit IC7-G (BIOS revision IC7_21.B00)
North bridge: 82875P MCH
South bridge: 82801ER ICH5R
Chipset drivers: INF Update 5.1.1002
Memory: 1GB (2 DIMMs) Corsair TwinX XMS4000 DDR SDRAM at 400MHz
Hard drive: Seagate Barracuda V 120GB SATA 150

Common to all systems
Audio: Creative SoundBlaster Live!
Graphics: Radeon 9800 Pro 256MB with CATALYST 4.1 drivers
OS: Microsoft Windows XP Professional
OS updates: Service Pack 1, DirectX 9.0b

All tests on the Pentium 4 systems were run with Hyper-Threading enabled, except where otherwise noted.

Thanks to Corsair for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, Corsair’s RAM is definitely worth considering.

The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 
Memory performance
Our synthetic memory tests should enlighten us about the effects of Prescott’s larger caches and improved prefetch mechanisms, so we’ll kick things off with them, as usual.

Better prefetching seems to give the Pentium 4 ‘E’ chips—that is, the Prescotts—a slight edge over the Northwoods in Sandra’s bandwidth test, which is already very aggressive about using buffering and the like to achieve the fastest possible results. The Athlon 64 FX-51, with its built-in dual-channel memory controller, still leads the pack in Sandra.

Cachemem’s bandwidth test is less aggressive, and Prescott’s improved prefetch algorithm appears to make a much bigger difference here. The Pentium 4 ‘E’ chips romp in this test.

Linpack shows us just what we might expect from Prescott’s larger L2 cache. You can see how the Northwood’s performance drops off at around 512K, where the working set outgrows its 512K L2 cache, but Prescott continues unabated into larger matrix sizes. Prescott also performs very well on the far right side of the graph, when we’re well into main memory. Again, better hardware prefetch is the likely reason for this improvement.

However, the real shocker here, at least for me, is the Prescott’s relatively low peak throughput in terms of MFLOPS. The Prescott peaks well below the Northwood at the same clock speed. That’s likely due to lower floating-point math performance produced by Prescott’s longer pipeline.

One other thing to note: the crazy insane light blue line towering above all the rest is the Pentium 4 Extreme Edition 3.4GHz. That puppy, with 2MB L3 cache, just abuses our Linpack test. Just thought I should mention that.

Prescott demonstrates slightly higher memory access latencies, which might be the result of its slightly higher L2 cache latencies. Northwood chips are generally a few ticks faster than the Prescott at our chosen sample size. Prescott’s improved prefetch seems to mask this latency in our bandwidth tests, though.

And, of course, the Athlon 64 chips with their built-in memory controllers are easily quickest overall here.

 
Memory performance (continued)
Not only are our 3D graphs indulgent, but they’re useful, too. I’ve arranged them manually in a very rough order from worst to best, for what it’s worth. I’ve also colored the data series according to how they correspond to different parts of the memory subsystem. Yellow is L1 cache, light orange is L2 cache, and orange is main memory. The red series on the Extreme Edition graph represents L3 cache. Of course, caches sometimes overlap, so the colors are just an interesting visual guide.

If you look at the row for the 1024KB block size, you can see the effects of Prescott’s larger L2 cache. So… there you have it. Let’s see how all of this affects performance in real applications.

 

Unreal Tournament 2003

The Pentium 4 ‘E’ processors run a few frames per second slower than the P4 Northwood in UT2003, giving us our first taste of Prescott’s real-world performance. Then again, in this game, the Athlon 64 simply dominates.

Quake III Arena

The P4 ‘E’ processors find friendly territory in Quake III Arena. In this game, they can outperform the older P4s on a clock-for-clock basis. The Athlon 64s, though, are fastest once more.

Wolfenstein: Enemy Territory

The Prescott essentially ties Northwood in Wolf: ET.

Tom Clancy’s Splinter Cell

Splinter Cell is a new addition to our CPU test suite, and it seems to run best on Athlon 64 processors. The P4 ‘E’ chips, though, beat out the Northwood P4s once again.

 

Comanche 4

The Pentium 4 Extreme Edition comes out looking good here with a win over the Athlon 64 FX-51. However, the Prescott P4s are bunched up near the bottom of the pack, along with the aging Athlon XPs.

Serious Sam SE

It’s a similar story in Serious Sam, where the Prescotts give up about 5 frames per second to the Northwoods—not much to write home about, but then the Athlon 64s are considerably faster than both.

3DMark03

Prescott handles itself well in 3DMark03, coming in with a faster score overall than both the corresponding P4 Northwood and the Athlon 64 with an equivalent model number. The individual CPU tests are mixed.

 

Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine that needs the latest computer hardware to run at speeds close to real-time processing. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.

There are two goals with Sphinx. The first is to run it faster than real time, so real-time speech recognition is possible. The second, more ambitious goal is to run it at about 0.8 times real time, where additional CPU overhead is available for other sorts of processing, enabling Sphinx-driven real-time applications.
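In concrete terms, at 0.8 times real time a minute of recorded speech takes about 48 seconds to process, leaving roughly 20% of the CPU’s time free for the rest of the application.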

We have a new winner in the Sphinx sweeps, as the Pentium 4 ‘E’ models nearly break the 50% barrier. Notice, also, how different Prescott and Northwood are. The Northwood has always been faster running the Sphinx binary produced by the Microsoft compiler, but the Prescott is faster with the Intel compiler’s executable.

LAME MP3 encoding
We used LAME to encode a 101MB 16-bit, 44KHz audio file into a very high-quality MP3. The exact command-line options we used were:

lame --alt-preset extreme file.wav file.mp3

Prescott’s a little bit slower in LAME than Northwood. In fact, Prescott at 3.2GHz is just a little faster than Northwood at 2.8GHz.

DivX video encoding
This new version of XMPEG includes a benchmark feature, so we’re reporting scores in frames per second now. This is the first app we’ve looked at so far that makes good use of Hyper-Threading, so keep an eye on the HT and non-HT results.

Prescott looks very good here, easily outperforming the other Pentium 4 chips at the same clock speed, and laying waste to the Athlon 64. Both the P4 3.2 and the 3.2E are helped by Hyper-Threading.

 

3ds max rendering
We begin our 3D rendering tests with Discreet’s 3ds max, one of the best known 3D animation tools around. 3ds max is both multithreaded and optimized for SSE2. We rendered a couple of different scenes at 1024×751 resolution, including the Island scene shown below. Our testing techniques were very similar to those described in this article by Greg Hess. In all cases, the “Enable SSE” box was checked in the application’s render dialog.

The Pentium 4 ‘E’ is at an ever-so-slight disadvantage in 3ds max, but I wouldn’t consider the performance difference significant.

 

Lightwave rendering
NewTek’s Lightwave is another popular 3D animation package that includes support for multiple processors and is highly optimized for SSE2. Lightwave can render very complex scenes with realism, as you can see from the sample scene, “A5 Concept,” below.

To test the effects of Hyper-Threading, we’ve tested the Hyper-Threaded processors with one, two, and four rendering threads. For non-Hyper-Threaded processors, we just tested with one and two threads.

Don’t be fooled by the sort orders of the graphs. We had to sort by something, so we used the “1 thread” results. However, you can see Prescott’s improved Hyper-Threading at work here. As the number of threads goes up, the Pentium 4 ‘E’ scores go down. The opposite is true for the Northwood P4s. As a result, once we reach four threads, the Pentium 4 ‘E’ 3.2GHz gives its best performance—45.5 seconds for rendering the “Radiosity_ReflectiveThings” scene. Northwood’s best time at 3.2GHz, by contrast, is 47.4 seconds. Prescott is very competitive, but it’s also very different from its predecessor.

 

POV-Ray rendering
POV-Ray is the granddaddy of PC ray-tracing renderers, and it’s not multithreaded in the least. Don’t ask me why—seems crazy to me. POV-Ray also relies more heavily on x87 FPU instructions to do its work, because it contains only minor SIMD optimizations.

With almost entirely x87 math and no help from multithreading, the Prescott has a rough time in our POV-Ray scene.

 

Cinebench 2003 rendering and shading
Cinebench is based on Maxon’s Cinema 4D modeling, rendering, and animation app. This revision of Cinebench measures performance in a number of ways, including 3D rendering, software shading, and OpenGL shading with and without hardware acceleration. Cinema 4D’s renderer is multithreaded, so it takes advantage of Hyper-Threading, as you can see in the results.

The Pentium 4 ‘E’ at 3.2GHz performs on par with a Northwood Pentium 4 at 2.8GHz in Cinebench’s rendering test.

Prescott takes two of the remaining three tests from Northwood, although the Athlon 64 is decidedly on top overall.

 

SPECviewperf workstation graphics
SPECviewperf simulates the graphics loads generated by various professional design, modeling, and engineering applications.

Prescott comes out looking great in SPECviewperf, beating the Northwood clock for clock with some consistency.

 

ScienceMark
I’d like to thank Alex Goodrich for his help working through a few bugs in the 2.0 beta version of ScienceMark. Thanks to his diligent work, I was able to complete testing with this impressive new benchmark, which is optimized for SSE, SSE2, and 3DNow!, and is multithreaded as well. Unfortunately, we don’t yet have a version of ScienceMark capable of taking advantage of SSE3’s new complex arithmetic instructions.

In the interest of full disclosure, I should mention that Tim Wilkens, one of the originators of ScienceMark, now works at AMD. However, Tim has sought to keep ScienceMark independent by diversifying the development team and by publishing much of the source code for the benchmarks at the ScienceMark website. We are sufficiently satisfied with his efforts, and impressed with the enhancements to the 2.0 beta revision of the application, to continue using ScienceMark in our testing.

The molecular dynamics simulation models “the thermodynamic behaviour of materials using their forces, velocities, and positions”, according to the ScienceMark documentation.

Primordia “calculates the Quantum Mechanical Hartree-Fock Orbitals for each electron in any element of the periodic table.” In our case, we used the default element, Argon.

Prescott struggles in all three of the above tests without the assistance of SSE3, running, in the case of Primordia, nearly a full 60 seconds behind Northwood.

These last two tests, SGEMM and DGEMM, measure matrix math performance using different codepaths optimized for various instruction set extensions, including SSE, SSE2, and 3DNow!

Prescott does especially well in DGEMM, showing off exceptional peak performance with properly vectorized data and SSE2.

 

picCOLOR image analysis
We thank Dr. Reinert Muller with the FIBUS Institute for pointing us toward his picCOLOR benchmark. This image analysis and processing tool is partially multithreaded, and it shows us the results of a number of simple image manipulation calculations. The overall score is indexed to a Pentium III 1GHz system based on a VIA Apollo Pro 133. In other words, the reference system would score a 1.0 overall.
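Assuming the overall index is a straight performance ratio against that reference machine (my reading, not something spelled out here), a system that finished the suite three and a half times as quickly as the 1GHz Pentium III would score 3.5.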

The new version of picCOLOR we’re using today is optimized for SSE and SSE2, so it should perform differently than past revisions.

The Pentium 4 ‘E’ is fastest overall. In fact, despite a cache and clock speed deficit, the Prescott at 3GHz beats out the Pentium 4 Extreme Edition at 3.4GHz. Here are the individual test scores for some of our key participants.

The P4 ‘E’ 3.2GHz pulls out the overall win by getting the top scores in both fixed and float interpolation and putting in a strong overall effort. I have to wonder if some of the new bits inside Prescott, such as the shift/rotate unit, haven’t contributed to its strong performance here. Notice how much stronger the Prescott is in the Fixed Interpolation test. It’s much faster than the Northwood with or without Hyper-Threading, and Prescott gains performance with HT enabled, unlike Northwood.

 
Conclusions
Our benchmark results for the new Prescott-based Pentium 4 ‘E’ processors are the very definition of mixed. In some cases, Prescott looks very good, but in others, it’s slower than current Pentium 4 chips at the same clock speed. The larger caches and architectural tweaks have helped immensely in offsetting Prescott’s super-long 31-stage pipeline, but they haven’t entirely made up the gap. On balance, Prescotts are slower than Northwoods. I expect Prescott P4s will look relatively stronger over time as SSE3 instructions are adopted and, especially, as clock speeds ramp up.

Obviously, Intel is aiming for the future with Prescott, and that future includes nosebleed-inducing clock speeds, BTX cases with much-improved cooling solutions, fancy new CPU sockets, PCI Express, and more multithreaded applications. In the context of all of these changes, Intel’s modifications to Prescott make lots of sense. However, the future isn’t here yet, and the Pentium 4 ‘E’ chips are now consumer products. As consumer products, they’re not a great proposition. They’re slower overall than previous Pentium 4s, they run hotter, and they draw more power. I doubt there’s enough of a performance difference between Northwood and Prescott to matter to most folks, but Northwood is probably a better overall choice right now.

That’s especially true given the pricing, which follows a very simple formula. At a given clock speed, the Pentium 4 and Pentium 4 ‘E’ will cost the same. Also, AMD has priced the equivalent Athlon 64 models at parity with the Pentium 4. So, for instance, a Pentium 4 3.2GHz, a P4 3.2’E’GHz, and an Athlon 64 3200+ all list for $278 at present. The 3.4GHz/3400+ variants are $417. All other things being equal, I’d pick the Athlon 64 any day of the week—unless I were into video editing, media encoding, image processing, Lightwave rendering…

All depends on what you want to do.

If what you want to do is throw a whole truckload of cash Intel’s way to have the most impressive processor possible, you may want to consider the Pentium 4 Extreme Edition 3.4GHz. It will run you $999, or $799 plus your first-born child. The P4 Extreme Edition 3.4GHz may well be the fastest x86 processor on the planet right now, although the title belt isn’t unified. The Athlon 64 FX-51 shares a piece of the championship, especially when it comes to 3D gaming. And the A64 FX-51 is a virtual steal at only $733. But, then, you weren’t worrying about money, now, were you? 

Comments closed
    • alphaGulp
    • 16 years ago

    Correct me if I’m wrong, but I believe that the speed of a processor is determined by multiplying its basic speed by the depth of its pipeline.

    Intel likes to create processors with ridiculously long pipelines so that they can offer chips running at insane speeds – never mind the fact that the pipeline hardly ever gets filled.

    This is why Mac users used to talk about the ‘megahertz myth’: their processor had a ~10 stage pipeline so it ‘ran’ at much lower speeds. Similarly, the AMD processors run at a much lower speed since they also have a shorter pipeline, yet they have comparable performance.

    It is pretty amazing that the prescott is roughly as fast as the northwood, seeing how it is actually running much slower. Although the chip may be able to fill its pipeline some of the time, it surely must be getting a massive amount of misses.

    On the other hand, from the impressive list of changes (the huge cache being just one of them), it is clear that they didn’t get there cheaply. The prescott has the potential to be a very fast chip…

    As an aside, the cynic in me wonders why they picked 31 as the length for their pipeline – could it be that this is the number which makes the prescott match the northwood’s performance?

    • Perezoso
    • 16 years ago

    More news:

    http://www.anandtech.com/guides/showdoc.html?i=1958

    • Anonymous
    • 16 years ago

    This is #104 again – just so people don’t misunderstand, I know branch mispredict penalties are a function of pipeline length – the 400-500 cycle number I am quoting is a forced access to memory when you consider a L1 miss (4 cycles) and then an L2 miss (additional 28 cycles).

    • Anonymous
    • 16 years ago

    #103 – That’s the most damaging thing about Intel’s situation. Since they have such a high clock speed, not only do they have to deal with scalability problems that arise and the stalls that are incurred on a cache miss or branch mispredict (those get FRIGGING huge at high clock speeds – we’re talking 400-500 cycles as we approach 5 ghz), those 200 mhz increases aren’t going to do them any good for the reasons you’ve already stated.

    • wesley96
    • 16 years ago

    I’d like to see the scaling graphs for the Prescott and the Hammer and see which one’s got better curve…

      • wagsbags
      • 16 years ago

      great idea actually. though it may be rather misleading because amd doesn’t clock a proc up 200mhz to raise the model number accordingly.

        • Logan[TeamX]
        • 16 years ago

        Not really, because if anything it shows AMD engineers more efficient chips. They easily keep pace with Intel, while needing less clock speed per iteration to do so.

    • Anonymous
    • 16 years ago

    I want to see what an FX-51 at 2.3ghz or whatever it can manage vs P4EE at 3.6 and Prescott at 3.9. I think the results will begin to highlight where Prescott is going. It OC’s like crazy, and I think its performance starts to get really good when you do it. I’m not sure but that’s my thinking.

    $.02

      • Mr Bill
      • 16 years ago

      Over the time the P4E clock increases 18% from 3.4GHz to 4.0GHz, AMD needs to reach only 2.6GHz. That’s two speed grades.

    • leor
    • 16 years ago

    i’m sure prescott will show its strength when it clocks higher, just like the p4 did. when it was released it was a dog, but it found its legs.

    the only question is how well it will ramp up, if they can handle the heat issue, and whether AMD, already using SOI, will be able to outrun them. they have a real opportunity here . . .

    • Anonymous
    • 16 years ago

    I take back my previous naysaying. Intel’s chips are now the cheapest processors out. I never thought I would EVER say that. So right now, with slightly better performance than the 3400+ on tests, it’s without a doubt the best priced processor out. Sorry amd!

      • indeego
      • 16 years ago

      Link to evidence/claims?

      • Anonymous
      • 16 years ago

      pricewatch.com I of course am referring to the prescott’s not the northwoods…

        • wagsbags
        • 16 years ago

        Ummmm hate to burst your bubble but all the articles showed that Prescotts are slower than northwoods and amds comparable offerings….

          • Anonymous
          • 16 years ago

          In SOME tests, otherwise it’s very close throughout. Plus the price even beats the 3200+!

            • Anonymous
            • 16 years ago

            Eh? P4 3.4 $424
            AMD 3400+ $404
            AMD 3200+ $268

            I may have missed something, but the P4 3.4, cheaper than the 3200+?

    • Anonymous
    • 16 years ago

    I’m surprised there hasn’t been more Intel bashing in response to this topic as I thought there would be. Just as well, we as consumers should be happy when AMD and Intel release new processors since the competition between the two companies is doing us a lot of good regardless of whose processor you prefer. For those of you who would like to see one company or the other die off, just remember this.

      • albundy
      • 16 years ago

      well said!

    • Anonymous
    • 16 years ago

    Ok, the new chip seems a bit of a dud at the speeds it was launched…but saying that Athlon XPs are generally faster than P4s in most commonly used software around the world is a bit off…
    For the past 4 years I have always owned two PCs, one of each variety and more or less of similar speed ratings. I do a lot of different stuff, from compiling big projects to gaming, video encoding, web app serving etc. They both have their plus and minuses but I can’t really say i have seen noticeable speed differences between the two platforms (but i would take an intel chipset from all the others any day). However, i still haven’t got a hold of an A64, maybe it’s about time…too bad i am a bit short in cash 🙁

    • meanfriend
    • 16 years ago

    Just a small suggestion. For the graphs, has there ever been any thought to color coding the graphs?

    If the bars do not occur in the same order on every graph (because you are ranking by performance, or whatever), it would be nice to show the bars of the products being reviewed in a different color or pattern so they stand out and are easier to find. Even highlighting the name of the product on the axis would be helpful.

    • WaltC
    • 16 years ago

    (This was meant for Anand X in thread #74.)

    Thanks for the info…I admit the only site I checked was newegg. I’ll keep my eyes peeled…:)

    • liquidsquid
    • 16 years ago

    I would expect that a lot of compilers were optimized for the deficiencies in the non-prescott core, and those are the programs being tested now. I would be willing to say that if these products were re-compiled for the Prescott you would see larger gains (i.e. the shifter unit). However the AMD 64 still seems to own some pretty important tests. (And we have yet to see 64-bit performance!)

    -LS

      • LiamC
      • 16 years ago

      I had some feisty discussions with some prominent Intel boosters circa Willamette -> they were saying (and Intel PR did as well) “wait for P4 optimised apps – they should be available in several months…”

    • Anonymous
    • 16 years ago

    I have yet to find anyone make a plausible statement that represents a heavy multitasking workload. I’m sorry, but having multiple windows open doesn’t count because usually only the active window is consuming CPU cycles! Switching between multiple windows would be a function of whether or not the process was in memory or cached on disk, not whether or not your CPU has HT capability!!

    #1) Running Anti-Virus scan doesn’t count. Anyone actually try this and also attempt to get anything work related done? I’ll tell you why – it’s impossible. This is coming from someone who owns BOTH a dual processor opteron system and a 3.4 ghz pentium4 with “HT technology”. The reason? Your hard drive gets bogged down so much so from the scan that the other thread is stalled when you attempt to do anything I/O related. Same thing that happens with single processor machines.

    #2) Memory, memory, memory has more to do with how fast an operating system ‘responds’. This is more of a factor IMO than dual processors or HT. Think about it from another angle – when was the last time, using a preemptive multitasking operating system, did you have an occurance where you were running SETI/Folding@home/Prime95 in the background and you noticed it impacted your performance in the application you were working on significantly? It DOESN’T. That’s why it’s called “preemptive multitasking”. You don’t need to wait for a process to voluntarily give up a task before it starts to work. It just WORKS.

    #3) The ONLY plausible scenario that I can think of for dual processing/hyperthreading would be a SINGLE process with multiple threads that do work to make the task at hand go faster. This is what dual processing and hyperthreading were meant for!!!

      • DerekBaker
      • 16 years ago

      Try setting the anti-virus software’s priority to low.

      Derek

    • AmishRakeFight
    • 16 years ago

    I’m surprised they didn’t call it pentium 5 either, after all – remember the first pentium 4’s were slower than the pentium III’s they were supposedly replacing.

      • indeego
      • 16 years ago

      It’s probably not a good market to release a new marketing campaign quite yet. “Pentium 6” would fall on deaf ears. Late this year/early next year I think would be more appropriate, depending on how growth hits this year (it’s expected to grow quite fast).

    • Anonymous
    • 16 years ago

    Well anandadnawahatevertheheckitsnameis ran the prescott at 3.7 gig air cooled normal voltage and all without a hitch and expect you can get beyond 4 ghz rather easy… so it seems it is a faster chip if you overclock.

    Also interesting bit the faster its clocked the BETTER the prescott works compared to a northwood at the same clock speed..

    What that means is if you could get a 4 ghz northwood and a 4 ghz prescott the prescott would beat it hands down.

    Heck even going from 2.8 to 3.2 was a marked difference.

      • WaltC
      • 16 years ago

      I checked http://www.newegg.com this morning, and unlike the day after the A64 launch, when you could buy boxed A64’s there because they were already in the channel, Prescott does not appear to be available at the moment (at least not from NewEgg as of this a.m.). So I’m not sure anything that any of these sites say about P4E and overclocking is relevant, since they have obviously received engineering samples versus samples of shipping chips. I would think that if there were no Prescott yield issues at present, then, like A64, Prescott would already have been in the distribution channels on the day of its launch, wouldn’t you? As well, and obviously, when Intel does ship them it has no intention of shipping them beyond 3.4GHz according to the launch data, so I think it may be difficult to see the general kind of overhead in clocking that works for Anand with the specific chip sample Anand has.

        • anand
        • 16 years ago

        Pricewatch lists several companies that are already selling the Prescott chips. Search for “prescott 2.8” or “prescott 3.2”. I went to one of them and tried ordering and it didn’t say anything about it being backordered, so it looks like they have chips in hand.

        • Anonymous
        • 16 years ago

        No, what I’m saying is that when higher speed prescotts come out they will be even better performers than some might expect, simply due to the fact that the prescott scales better than the northwood did.

        Now the billion dollar question is will the prescott scale better than the athlon 64? It’s gonna hit 3.6 and 3.8 fairly fast and overclockers likely will be doing 4 and over rather soon…

    • Anonymous
    • 16 years ago

    I thought HT was supposed to help when running multiple applications concurrently. All the benches are using single apps. Can Tech Report find a benchmark that runs multiple apps at the same time and compare the CPUs using that? I’m curious…

    Tech Report rocks as always.

      • WaltC
      • 16 years ago

      Running “multiple aps” generally is referred to as “multitasking,” which is different from “multithreading,” but the differences are more in the software than in the way the cpu operates. A single application may have more than a single “thread” of code which, in the case of SMP (two or more physical processors) may be executed simultaneously in real time, a thread to each physical processor. The great bulk of existing applications, however, are “single threaded,” and do not benefit from being run by more than one physical processor. In a single cpu environment, if you run more than one application simultaneously you are “multitasking,” and the single cpu is “time sharing” between the two applications, so that they appear to be running simultaneously. This switching happens very fast, so fast that the multiple applications appear to be running “at the same time,” but the reality is that at any given moment the cpu is actually running one or the other and never both. How the cpu is shared among the multiple applications in this manner is determined by the OS, as well as by the applications themselves.

      The so-called “hyperthreading” feature in the P4 family is not to be confused with SMP (that is, SMT does not equal SMP.) What hyperthreading does in a P4 is to multitask among the multiple threads in a single, multithreaded application, so that the single physical cpu “time shares” the application threads just as it would time share the application threads generated by multitasking two or more single-threaded applications you might run. Generally, the cpu doesn’t know the difference (but the software could be different.) But the important point is that never does a single hyperthreading cpu execute two threads *simultaneously* as happens under a supporting OS in an environment with two or more physical cpus, in which the OS may designate each thread in a multithreaded ap to run on a separate processor, so that thread execution is truly simultaneous instead of “time shared.”

      That’s why you can see ~100% linear performance improvement in some multithreaded applications when run on two cpus versus one cpu of the same architecture, but will see a maximum of ~30% improvement running multithreaded aps on an HT-enabled P4 versus an HT-disabled P4. In short, a single HT-enabled P4 is no substitute for dual, HT-disabled P4s in terms of multithreaded application performance. It’s not even close.

      What’s interesting about the P4 and HT is that what Intel has done is simply to include circuitry which raises the IPC of the cpu in the narrow condition of multiple software threads. The Athlon/A64, for instance, always runs at a higher IPC than P4 in all conditions, single threaded and multithreaded alike, which is nice because the bulk of all software is single threaded…:) The downside for the P4 with HT turned on is that in the absence of multiple threads the performance will drop compared to what it would be with HT turned off, and so basically, in order to get maximum performance out of an HT P4, the user has to know when to turn on HT and when to turn it off. That will vary according to the software different people use–a person running mainly multithreaded professional software would never have to turn it off, but someone running mainly 3d games (which are all single threaded) would never want to turn it on, IMO, which is but one of the things I don’t like about Intel’s HT…:)

      I prefer consistency in IPC in a single cpu among single and multithreaded situations both. And if my concern was primarily multithreaded software performance, I would go with dual-cpu SMP always over single-cpu hyperthreading.

        • Anonymous
        • 16 years ago

        Except you’re wrong. HT does allow 2 threads or processes to run simultaneously, but since execution resources and cache are shared, the performance gains range from 10-30%, an excellent result given the ~5% increase in die size. And it sets the stage to take better advantage of clock speed, execution resources, and cache size increases.

    • WaltC
    • 16 years ago

    What I’m looking forward to is getting to a point where we have “heavily optimized” A64, 64-bit code we can test against “heavily optimized” P4, 32-bit code. At that time I think we’ll have apples to apples. Most, if not all, of the performance evaluations the A64 “loses” currently it loses to applications which have been heavily optimized for the P4 over the last few years.

    That performance differences are software related in these cases is spelled out by looking at Ace’s Hardware findings at http://www.aceshardware.com/read.jsp?id=60000319, in which 3.2 GHz Prescott performance is behind not only the A64 3400 & 3200, but also behind the P4C 3.2. So clearly, the idea that “Prescott is faster for media encoding” is simply untrue. Prescott’s speed in media encoding relative to other cpus will depend on the specific media-encoding software being used when all of the cpus are tested. I suspect that when we compare software equally optimized for both A64 and P4, A64 will win probably every time. Really, saying that one cpu is inherently faster than another for certain tasks, without looking at the specific pieces of software used for the test, indeed the specific tests themselves run within that software, cannot possibly be accurate, IMO.

    Here is what Ace’s says about Prescott’s loss in the Ace’s “media encoding” tests: “We have to do more encoding tests to be absolutely sure, but we were quite surprised to see the Pentium 4 family beaten in WME 9.0, which is well optimized for SSE-2. However, WME 9 also uses the SSE-2 capabilities of the Athlon 64 much better. Prescott takes 11% more time to complete the encoding than the previous 3.2 GHz Pentium 4. According to some early reports we heard that SSE-3 instructions can improve encoding performance by 5-7% at most, but even that will not make Prescott shine.”

    • Anonymous
    • 16 years ago

    What baffles me is how terrible the Prescott results were in ScienceMark. I was honestly expecting a better show given the improved latency of some of the instructions, and the fact that the code is not very branch dependent. In reality the bottleneck is memory latency and bandwidth, both 2 issues that really are taken care of by a larger cache (increasing cache hit ratio) and the strength of the 800 fsb platform. Additional bottlenecks are multiplication/addition (FP – MolDyn) and some transcendentals (a fast power function is a must for Primordia).

    I’m going to have to do some more research and see if there’s any bottleneck I can code around.

    • Logan[TeamX]
    • 16 years ago

    The A64 also runs FPU/X87 math faster, and for those of us who participate in Grid.org’s cancer and / or smallpox research projects, X87 horsepower is all we need. I swear by my OC’ed Barton 2500+ (@ 2.25GHz) for cancer research.

    Also, I’ve been grilling them for a bit about running a PiFast benchmark, ala the boys at Hexus.net. That tests your FPU, memory bandwidth, and cache size/speed effectiveness.

    Heck, for that matter I know they’ve bantered about a F@H bench, but never got it off the ground. F@H can be forced to run SSE or SSE2 extensions (not sure on the latter, actually) and pitting two classes (three with the EE) against the two classes of A64 would be a rather nice shooting match.

    There is a LOT more than games that an AMD64 is good at. Perhaps file compression? Compiling?

    Just because Damage hasn’t covered it here, doesn’t mean it hasn’t been done. Obviously your fingers aren’t broken if you’re posting here, so go Google some more A64/P4 comparos and see what other reviewers have to say.

    [EDIT] – just thought I’d add that it’s curious that the P4s don’t appear to have the best memory bandwidth in SiSoft Sandra, but do in Cachemem. Yet, they fall behind pretty good when it comes to the A64’s single-channel, on-die memory controller coupled with HT and low-latency RAM. 48 ns for memory access is really quick in ANYONE’s books.

    What’s even more “fishy” is the fact that the Prescotts, for all their doubled L2 caches and “superior” strained silicon design… they get absolutely murdered at Sciencemark’s tests by their Northwood brethren. Last I checked Sciencemark depended on whatever instruction sets you ran (SSE2 included, and HT-aware as well), and *gasp* memory bandwidth. For the Prescotts to get absolutely flogged by their own stablemates, nevermind AMD… that’s embarrassing.

    At least the P4 3.4EE manages to edge out the FX-51 in Molecular Dynamics, but still gets its clock cleaned by the 2.2GHz A64. Once again, in Primordia, which leans more favourably on memory bandwidth, the P4 3.4GHz EE manages to edge out one, but not two, of the 2.2GHz AMD64 offerings. If nothing else, Primordia confirms what SiSoft Sandra says about the memory bandwidth of the FX-51: it’s there in droves. A chuckle goes to the P4 2.8GHz Prescott for finishing dead last in the M.D. bench.

    Finally, when comparing the benches for MD and Primordia, you’ll see that the tests are indeed HT-aware, as the non-HT scores are considerably higher for the P4s of both (three?) varieties.

    Do I draft? No. Do I design funny cartoons? No. Then I guess I have no need for a P4, either Prescott, Northwood (which appears to be the better of the two), or the Xeon-cloaked-as-an-EE.

    Sure, you can argue that the FX-51 is really an Opteron 148, but then, why does the A64 3400+ manage to edge it out still in a few benches? Because it doesn’t matter, AMD still has the bang for the buck, and the A64 single-channel line proves it.

    • bozzunter
    • 16 years ago

    #58 Sorry, but talking about “video editing, media encoding, image processing, Lightwave rendering… “, given that a Pentium 4 is also faster in Office applications, leaves out only games for Athlon 64…

    • Anonymous
    • 16 years ago

    bozzunter, i think damage is fair,

    he is just saying that intel is better for video editing and encoding and that amd is better for games.

    so what is your problem. Damage did not take sides, if he likes to play games than he should take a amd cpu. If he wanted a video station that is very stable, he would have said that he would prefer an intel chip.

    Where is the big deal??

    Cem

    • bozzunter
    • 16 years ago

    “All other things being equal, I’d pick the Athlon 64 any day of the week—unless I were into video editing, media encoding, image processing, Lightwave rendering… ”

    That’s great, we have Tom’s Hardware which can’t see anything but Intel, now let’s go to the opposite direction with Techreport. Given the benchmarks, I’d say:

    “All other things being equal, I’d pick the Athlon 64 any day of the week—unless I were into everything but games”.

    I can’t really understand why people here, and almost everywhere in the world, root for a processor as if it were a football team. For an AMD supporter it is almost impossible to realise:

    a) A Pentium 4 3 GHz is less powerful than an Athlon 64 3000 in games
    b) The same Pentium 4 3 GHz is almost always more powerful than an Athlon 64 3000 in any other kind of application
    c) Every Pentium 4 3 GHz can be brought to 3.4/3.6 GHz just by raising the FSB and the voltage (1.6V). Any person able to deal with the BIOS can obtain (for free) a processor which ALWAYS outperforms an Athlon 64 3000.

    I’m simply reading the benchmarks, I’m really no kind of AMD-Intel supporter (I’d rather support my football team, or a tennis player, or whatever, but not an enterprise, hell!), but I can’t understand why a technical article must be transformed in a football match discussion.

      • Koly
      • 16 years ago

      “a) A Pentium 4 3 GHz is less powerful than an Athlon 64 3000 in games
      b) The same Pentium 4 3 GHz is almost always more powerful than an Athlon 64 3000 in any other kind of application”

      Non-gaming benchmarks where A64 3000+ is equal or better than 3.0GHz Northwood or Prescott:

      Sphinx: A64 0.642 Nw 0.645 Pr:0.571 (lower is better)
      Lame: A64 58.9 Nw:54.0 Pr: 59.5 (lb)
      3ds max OceanSunset: A64 23.0 Nw 22.0 Pr 24.0 (lb)
      POV-Ray: A64 310 Nw 360 Pr 382.5 (lb)
      Cinebench CINEMA4D: A64 334.5 Nw 317 Pr 338.5
      Cinebench OGL softw.: A64 1548 Nw 1441 Pr 1366
      Cinebench OGL hardw: A64 3146 Nw 2834 Pr 2841
      SPEC viewper. drv09: A64 36.96 Nw 35.22 Pr 39.32
      SPEC viewper. ugs03: A64 22.50 Nw 21.99 Pr 23.58
      ScienceMark Mol.Dyn.: A64 92.96 Nw 97.22 Pr 109.66 (lb)
      ScienceMark Primor.: A64 425.6 Nw 448.2 Pr 507.2 (lb)
      ScienceMark C.AES: A64 13.31 Nw 14.85 Pr 15.3 (lb)
      picColor: A64 3.56 Nw 3.10 Pr 3.92

      As you can see, there are quite a few. Open your eyes, Intel fanboy. In reality, A64 is more powerful in scientific applications, a little behind in rendering and multimedia/encoding. And much better in games.

        • Samlind
        • 16 years ago

        There’s another reality that all these benchmarks don’t cover. That reality is that for a huge, and I mean huge, installed base of software used by businesses every day, Athlons, even the non-64 variety, are faster. The reason is this software base will never be optimized for SSE, let alone SSE2 & SSE3, and relies on X87 to get the floating point done. Companies don’t buy $40k software packages because there’s a new revision out, they just keep using what’s working. Also, there’s stuff that falls below the benchmarking radar – like huge Excel spreadsheets that take minutes to recalculate.

        All of those applications are helped dramatically by a triple-issue FPU.

          • Anonymous
          • 16 years ago

          Spreadsheet recalculation! You’re right, that’s definitely a benchmark I’d like to see in future CPU reviews.

      • Ruiner
      • 16 years ago

      You can OC the A64 3000+ too (usually to 3200+ rated speeds), unfortunately not as easily.

      Damn multiplier locks.

        • ionpro
        • 16 years ago

        Gee, glad to know you can overclock the 2.0GHz Athlon 64 3000+ to… 2.0GHz Athlon 64 3200+ speeds. It’s the cache that differs between the two processors, nothing more or less. In most benchies, that’s less than a 10% difference in CPU performance, meaning less than a 5% difference in system speed. Not a huge deal.

          • Ruiner
          • 16 years ago

          Well, I meant roughly. The overclocks I have seen have been in the 2150-2350MHz range. Yes, that’s with half the cache of the ‘real’ 3200+. What the PR rating of such an OCed chip would be is up to you.

    • Anonymous
    • 16 years ago

    Queen put out a song that fits this situation.

      • derFunkenstein
      • 16 years ago

      Fat-Bottomed Girls?

        • NeXus 6
        • 16 years ago

        Intel fanboy: We Will Rock You/We Are The Champions.

        AMD fanboy: Another One Bites The Dust.

          • Logan[TeamX]
          • 16 years ago

          More like:

          Intel fanboy: Why does my computer need LN2 to win? – A concerto in E Major.

          AMD fanboy: The trials and tribulations of clearing the 2.4Ghz barrier – an opera in D Minor.

    • Anonymous
    • 16 years ago

    Completely off topic, but… http://www.mercurynews.com/mld/mercurynews/business/7849191.htm Can anyone else smell the irony? Using Macs, bwahahaa.

      • Anonymous
      • 16 years ago

      Oh, I forgot to mention I do realize a good chunk is FUD, but the PPC thing kinda sounds true? I’m not all that techy, but wouldn’t that help cut down on pirating due to a different architecture…? OK, I will shut my newb mouth up.

    • FubbHead
    • 16 years ago

    Gee. Again they release an underperforming processor, one which relies on very specific conditions to do the actual processor speed any justice at all. The P4 has really become the bloatware of the processor industry in my book. They can do much better, as they have shown with the Centrino.

    The P4 is kind of like a fast train. It can go fast, but you still need a damn railway to do it.

    • Rakhmaninov3
    • 16 years ago

    Excellent review, if only I had the money to buy bleeding edge stuff and the time to play with it!!

    • Anonymous
    • 16 years ago

    Maybe Intel should name the P4E after element 115, ununpentium (Uup). http://www.webelements.com/

    • Anonymous
    • 16 years ago

    Wow, so much venom #31. More source will be posted, but not enough to make a full build.

    • Anonymous
    • 16 years ago

    Intel: Introducing our newest processor the P4 E. You can downgrade today!

      • albundy
      • 16 years ago

      Me first ME FIRST!!!

    • Anonymous
    • 16 years ago

    What the heck… This processor sucks. It pulls its weight in a couple of tests… but that’s it. I hope SSE3 gives it a huge boost.

    Another thing I noticed is that half the time you need to DISABLE HT if you want the full potential of the processor. “Hold on guys, I gotta restart and disable HT before I play.” Pffffffft

    Great review, btw; nice to have some hard numbers on the Prescott.

      • indeego
      • 16 years ago

      Let’s look into our historical crystal ball: Did SSE1 or SSE2 provide any great advantage on release? Absolutely NOT. Six months after release? No. A year? Maybe. Were higher-clocked processors released anyway that would have made the marketing term an asterisk regardless? Yeah.

      • Anonymous
      • 16 years ago

      Thanks for the info, I actually didn’t know that.

    • Mr Bill
    • 16 years ago

    Good write-up. The P4E sure seems to be different, and it does not seem to like older code compilations. I wonder how the Linpack benchmark would perform if recompiled for each vendor? (A rough sketch of what that might look like is below.)

    Say AMD keeps ratcheting the clock 200MHz per release. How much do you think Intel will have to increment their clock to match the performance? It seems like Intel is going to continue using the CPU clock in the CPU name. Will AMD have to come up with yet another naming scheme?
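    A minimal sketch of what “recompiled for each vendor” could mean in practice, using GCC’s per-architecture flags; the linpack.c file name, the output names, and the exact flag choices are illustrative assumptions, not anything from the article:

    import subprocess

    # One build per target microarchitecture; -march picks the vendor-specific
    # instruction scheduling and the SSE extensions the compiler may use.
    targets = {
        "linpack_p4":       ["-O3", "-march=pentium4", "-mfpmath=sse"],  # Northwood-class P4
        "linpack_prescott": ["-O3", "-march=prescott", "-mfpmath=sse"],  # Prescott (adds SSE3)
        "linpack_a64":      ["-O3", "-march=k8", "-mfpmath=sse"],        # Athlon 64
    }

    for binary, flags in targets.items():
        # Compile the same source once per target and time each binary afterwards.
        subprocess.run(["gcc", *flags, "-o", binary, "linpack.c"], check=True)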

    • atidriverssuck
    • 16 years ago

    hmm, bit of a toss-up. There’s something inside me that dislikes relying on software optimisations too much (even tho processors and the software for them have to change as time goes on), and from a pure hardware standpoint (and it might just be my “hardware-centric-isms” here), I’d have to say I’d be much more comfortable with buying an Athlon 64 if I had to slap out the cash right now. It just comes across to me as more consistent overall, and less reliant on the ‘whims’ of (un)optimised software.

    I suppose a heavy reliance on encoding/video (and rendering) in your work with some of the apps tested would be a good reason to go Intel, but I’d feel dirty doing it. Also, we don’t really know how optimisations for the Athlon 64 will go, since the product is relatively new compared to the years the P4 line has been coded for, yet it still puts on a very competitive showing despite all of these ‘limitations’. This makes me think it will kick major arse once optimised for.

    What I found amusing was seeing lots of runs with Hyper-Threading disabled score higher than with it enabled. The Sphinx speech recognition benchmarks also show that the Intel compiler benefits the Athlon 64 more than it does Intel’s own processors. Hmmm 🙂

    Yes, I think the hardware-centric-isms are gonna decide for an Athlon.

      • Anonymous
      • 16 years ago

      Of course, when most of the benchmarks are single-threaded, HT is not going to show its benefit.

        • atidriverssuck
        • 16 years ago

        I suppose, but declines with HT enabled aren’t favourable, you know? Which brings me back to software reliance, and the fact that Intel has already had a nice head start with Hyper-Threading over AMD’s processors. It all comes down to which direction software goes in, I think, and of course price for performance. Right now I’d go Athlon 64.

        Time will tell how well the hardware is exploited by future software for both processors.

          • Anonymous
          • 16 years ago

          The differences are typically <2%. The benefits to multi-threaded applications are usually >10%. Running two instances of some single-threaded benchmarks (like LAME) would show large increases in throughput as well.

            • atidriverssuck
            • 16 years ago

            Hmm, now there’s an idea for the next bunch of benchmarks. Running 2 instances of LAME & comparing…

            • Anonymous
            • 16 years ago

            I’ve done some testing of running multiple instances of applications. LAME gets an increase of throughput of 27%. Other applications usually show 10-40% gain.
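            A minimal sketch of that two-instance idea, assuming a bare “lame in.wav out.mp3” invocation and placeholder track file names (neither is from this thread); it times one encode, then two concurrent encodes, and reports the throughput gain:

            import subprocess, time

            def run_encodes(n):
                """Launch n LAME encodes in parallel and return the wall time."""
                start = time.time()
                procs = [subprocess.Popen(["lame", f"track{i}.wav", f"track{i}.mp3"])
                         for i in range(n)]
                for p in procs:
                    p.wait()
                return time.time() - start

            one = run_encodes(1)
            two = run_encodes(2)
            # If Hyper-Threading helps, two concurrent encodes finish in well under
            # twice the single-encode time; throughput gain = (2 * one) / two.
            print(f"1 instance: {one:.1f}s, 2 instances: {two:.1f}s, "
                  f"gain: {(2 * one) / two:.2f}x")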

        • rmstow
        • 16 years ago

        What? You don’t think the OS, the device drivers, the services running in the background, etc., are going to have a few threads running too?

        These days, when you think your W2K or XP machine is sitting there with all apps closed and nothing much going on, chances are you have at least twenty processes running.

          • Yahoolian
          • 16 years ago

          And guess what? Most of the time those processes are doing nothing, so the ability to run them at the same time provides very little boost, if any…

    • Anonymous
    • 16 years ago

    AGAIN I call bullsh*t on your comments on ScienceMark.

    Most of the source code is, in fact, not available. Even the old code on the SM1 site isn’t representative of their SM1 benchmark binary. So please, double-check before you speak, and ditch the comment.

    ” However, Tim has sought to keep ScienceMark independent by diversifying the development team and by publishing much of the source code for the benchmarks at the ScienceMark website.”

    • Anonymous
    • 16 years ago

    YAWN! Not only because it’s 6:28am here (Central European Time), but what a boring CPU. Let’s see what Socket 939 can do.

    • Anti-M$
    • 16 years ago

    Looks like Intel’s 2nd big failure.

      • wagsbags
      • 16 years ago

      Why does it not surprise me that you said that? Of course, you’re completely disregarding the fact that by the time it gets to 3.6GHz it’ll be faster than Northwood is ever going to get.

      • atidriverssuck
      • 16 years ago

      it will be far from a failure. Intel has plenty of customers and the brand recognition (and promotional $) AMD can only dream about. I don’t see anything inherently ‘wrong’ with the direction this processor is going in.

      It’s just another choice you can make when buying your parts, which is a GOOD thing. It has different strengths and weaknesses than its competitor.

      Now, price for performance? That’s another story entirely. But since Joe User buys from Dell & Co., his choices will likely be dictated by marketing and by whether or not the computer with the ‘cool case’ comes with Intel inside and a high GHz number.

      That alone qualifies almost anything Intel does as a success 🙂

        • indeego
        • 16 years ago

        Had the release come at those higher scaled clock speeds and shown improvements across the board, there would be no question: it would have been a success. But to say “wait and see, this thing will really take off!!1” eventually, we think, that ain’t going to cut it. The Athlon 64, on release, beat the absolute shit out of the Athlon XPs on its release date.

          • Anonymous
          • 16 years ago

          Yeah, my thoughts exactly. I’m sure in time it will show improvement, but… why would you even buy the more expensive P4 E when the regular P4 does better? But I guess Intel has to start somewhere, right? I guess they couldn’t throw out a 3.6 or 3.8 right off the bat.

    • Steel
    • 16 years ago

    So how is Damage Jr. adjusting to his new life at Intel?

    • derFunkenstein
    • 16 years ago

    I’m going to have to reserve judgement until I see how it scales. Anandtech *tried* to say it showed promise because of the way it scales…but all I could tell in those graphs is that it scales differently depending on the app…no real trends…I’d like to see one wide-open at 4GHz or so…

    • Captain Ned
    • 16 years ago

    IIRC, one of the big knocks against the P4 when it came out was its lack of a barrel shifter, which at the time was reputed to cost 6-7 cycles for code not optimized for the P4. Is the new shift/rotate unit Intel’s way of admitting that they were wrong about the barrel shifter, or is it an after-the-fact band-aid, recognizing that not everyone recompiled their source code just for P4s?

    • Anonymous
    • 16 years ago

    “To keep Prescott-based P4s distinct from older ‘Northwood’ cores, Intel is tacking an ‘E’ on to the product names, so they’ll be called the Pentium 4 2.8E or 3.2E.”

    I was wondering what they’d call it. I’m guessing calling it ‘P4 X.XP,’ or just ‘P4P’ for short would have sounded too close to ‘P2P,’ which we all know is a four-letter word in computing these days.

    So they call it the ‘P4E?’ What happened to ‘D’? Perhaps they wanted to try to…

    • Krogoth
    • 16 years ago

    It looks like Prescott really needs apps that take advantage of SSE3, plus a major ramp in clock speeds. Of course, this is currently limited by the severe leakage issues that the Prescotts have.

    BTW Damage, excellent article, top-notch as always. 😀

    • endersdouble
    • 16 years ago

    Posting this that damn close to the superbowl….damn you Damage, way too much of a conflict of interest!

    • LicketySplit
    • 16 years ago

    Good review as usual, Damage… looks like it will show promise at higher MHz… 😉

    • Anonymous
    • 16 years ago

    A great peace of bad engineering. Seemingly it’s only good where the code just loops and does the same thing over and over again, like in video encoding.

    Basically it’s better at what Northwood was best at and worse at what it was bad at.

    If AMD’s next revision of the Athlon 64 has improved SSE performance, say double, it will rule the Prescott almost everywhere.

      • DaveJB
      • 16 years ago

      Actually, considering the 31-stage pipeline, the higher cache latency, and that the ALUs are (supposedly) no longer double-pumped, the fact that it doesn’t perform like a P4 2A is a miracle of engineering!

      And it’s “piece”, not “peace”.

        • Anonymous
        • 16 years ago

        I totally agree with you #7

        • Anonymous
        • 16 years ago

        Piece, maaan.

          • wagsbags
          • 16 years ago

          actually I’m very surprised at how good a processor this is. Not something to buy now, but when it hits 3.6GHz it’ll definitely be at least as good as Northwood would have been.

            • Anonymous
            • 16 years ago

            …and that’s good!?!?

            ………. 3.6 to compete with 3.2!? Anyone find it ironic that this fixes AMD’s QS rating problem that they had with the AthlonXP line? 😉

            • wesley96
            • 16 years ago

            Well, I suppose they (AMD) knew this was coming. 🙂

        • Ardrid
        • 16 years ago

        The ALUs are still double-pumped.

          • Chryx
          • 16 years ago

          No, they aren’t. I’ve checked with an ex-Intel guy I know (he was on the design staff for Willamette); Prescott’s ALUs are single-pumped.

      • Krogoth
      • 16 years ago

      Also, Prescott is at a disadvantage versus Northwood since nothing out there yet takes advantage of SSE3. The areas where Northwood excelled will be overtaken by Prescott once developers update their software to take advantage of SSE3.

    • terabithian
    • 16 years ago

    Whatever happened to a Pentium 4 ‘D’?

      • wesley96
      • 16 years ago

      Given that there were Pentium 3 ‘E’s back in the old days, they didn’t ‘skip’ D, per se.

    • Yahoolian
    • 16 years ago

    No overclocking test!?

      • Anonymous
      • 16 years ago

      Does anybody except ghetto kids “overclock” their PC? Serious question!

        • wagsbags
        • 16 years ago

        I would say a large portion of people that build their own computers overclock. I didn’t know enough at the time, so I bought a Dell; otherwise I’d definitely be overclocking.

          • indeego
          • 16 years ago

          I build all my own PCs and many friends’ (and am considering building the whole refresh for work), and I almost never overclock other than for the occasional test. Reliability and price/performance at stock are far more important (to me).

          • Anonymous
          • 16 years ago

          I’ve been building my own PCs for years. I don’t overclock 🙂

        • My Johnson
        • 16 years ago

          You know, I haven’t a clue. Aren’t all the rich kids running 250MHz FSBs with their current Canterwood setups?

          • Krogoth
          • 16 years ago

          More like the smarter enthusiasts are getting a P4 2.4C or 2.6C and pairing it with a Springdale board using a PAT-like hack, to save a few hundred while getting near the performance of a 3.2C on Canterwood and beyond.

        • nexxcat
        • 16 years ago

        Well, a one-liner like “we managed to reach x.yGHz on our overclocking attempt” would give us an idea of how soon we may expect faster processors, and also of how good Intel’s yields are on the new process and such.

      • Ruiner
      • 16 years ago

      HOCP also declined to mention their OC results. I’m guessing they were horrible.

    • Spotpuff
    • 16 years ago

    Any thermal tests?

    Did Intel introduce a new HSF for the chip, considering it dissipates more heat?

    This is very reminiscent of the P4 launch, where the Willamette got stomped by the AXP… it’s a lot closer now, but again, this is a chip for the future, not the present. Intel releasing it now kinda makes you feel like if you buy it you get to be a guinea pig/beta tester.

    Guess everyone is busy watching the Super Bowl rather than reading TR. Shame! 😛

      • DaveJB
      • 16 years ago

      I…

      • wagsbags
      • 16 years ago

      Damage, great article, but we seriously need to know how this thermally stacks up against Northwood. Maybe with some overclocking? How hot does this thing get when it hits 4GHz or so? Will air be enough?

        • Spotpuff
        • 16 years ago

        Yeah, at 3.2GHz it’s doing 102W or something, I think it said in the article, which is far, far too high. I think BTX overall is a good idea, but Intel is…

      • Anonymous
      • 16 years ago
