The move was a radical conversion and a stunning act of agility from a company so large and so invested in long-range planning. Judging by the results, however, it seems to have worked. The Pentium Extreme Edition 840 has arrived roughly one year after news broke of Intel’s big shift in direction, the first fruits of the new era of parallelism.
The first thing you may notice about the Pentium Extreme Edition 840 is the name, which conspicuously does not include the words “Pentium” and “4” arranged together back to back. Intel has revamped its naming scheme for its dual-core chips, even though the Extreme Edition 840 has a thoroughly Pentium 4 heritage.
In fact, this dual-core chip, code-named “Smithfield,” is actually a pair of Pentium 4 “Prescott” cores situated together on a single piece of silicon. Each core has 1MB of L2 cache onboard, and the two cores share an 800MHz front-side bus. Beyond the Siamese twins action, there’s very little to differentiate these chips from the latest Pentium 4 processors like the 600 series. Like those chips, Smithfield features support for Intel’s EM64T extensions for 64-bit computing, and it can save on power consumption via the one-two punch of an enhanced halt state and SpeedStep power management.
Smithfield is also manufactured using the same basic 90nm fabrication process as current Pentium 4 chips. However, it’s roughly twice the size of the Prescott core at 230 million transistors and 206 mm² of die space. Heck, have a look at a picture of the Smithfield die, and you’ll know instantly what’s up.
I swear I did not make this up in a paint program. Smithfield really is two Prescott cores side by side.
For starters, Intel will be selling four versions of Smithfield, and the version we’re reviewing today is the highest end product. The Extreme Edition 840 is clocked at 3.2GHz and has Intel’s Hyper-Threading technology enabled. If you pay the premium for the Extreme Edition, you get the extra coolness factor of seeing four CPU utilization graphs charted in Task Manager or your CPU monitor of choice. The more pedestrian versions of Smithfield will be dubbed the Pentium D, and they will not include Hyper-Threading support. Here’s how the whole product line shapes up:
|Model|Clock speed|Hyper-Threading|Price|
|---|---|---|---|
|Pentium D 820|2.8GHz|No|$241|
|Pentium D 830|3.0GHz|No|$316|
|Pentium D 840|3.2GHz|No|$530|
|Pentium Extreme Edition 840|3.2GHz|Yes|$999|
Prices will range from $999 for the Extreme Edition 840 to as low as $241 for the Pentium D 820, although pricing on the Pentium D processors isn’t yet 100% official. Obviously, with these kinds of prices, Intel is dead serious about pushing deep into the desktop PC market with dual-core products right away.
I am a little perplexed about the choice of Hyper-Threading as a means of setting apart the Extreme Edition from the regular Pentium D. Yes, I understand that Intel needs something to make its flagship processor special, but Hyper-Threading offers decent performance gains, sometimes of up to 50%, through thread-level parallelism precisely at a time when Intel is pushing the industry heavily in that direction. Disabling Hyper-Threading for the vast majority of Smithfield-based chips out there, which will be Pentium D processors, seems like a drastic measure. We’ve generally accepted Hyper-Threading as a Good Thing on the Pentium 4 and would hate to see it go.
With this concern in mind, I asked Intel whether there wasn’t some ulterior motive involved in this choice beyond product segmentation. To my surprise, their answer was no. Apparently, this capability will lie dormant in most of the world’s Smithfield-based processors in order to keep the Extreme Edition looking, er, extreme.
New chips, new chipsets
The new dual-core processors will require the support of new core logic, and for our testing, Intel supplied us with an early version of its own D955XBK motherboard based on the new 955X chipset. This new chipset is more than just a tweak for dual-core capabilities, however; it includes several other improvements over its predecessor, the 925XE. Let’s review them quickly:
- An enhanced memory controller: The 955X officially supports DDR2 memory clocked at 667MHz, up from 533MHz in the 925XE. Heck, the Intel 955X motherboard actually has an option listed for DDR2 800, although I haven’t tested to see if it works yet. The 955X will also support up to 8GB of RAM (optionally with ECC support), so users of 64-bit operating systems can cram in lots of memory.
- Improved Matrix Storage: The 955X north bridge will pair up with Intel’s new ICH7 south bridge I/O chip, and the ICH7 adds even more server-class features to Intel’s Matrix Storage Technology with the addition of RAID levels 5 and 10. Both of these RAID flavors offer a combination of better performance, higher capacity, and better data integrity than a single drive or the more common RAID levels 0 or 1. RAID 5 allows for lots of storage with redundancy on a three-drive array, and RAID 10 offers better fault tolerance in a four-drive array than RAID 0+1. Of course, the ICH7 will continue to support Native Command Queuing for SATA drives, and it adds 3Gbps SATA-II transfer rates. The addition of these new RAID levels and SATA-II transfer rates should keep Intel at least half a step ahead of NVIDIA’s nForce4 SLI Intel Edition on the storage front.
- More PCI Express lanes: The ICH7 south bridge also has two more PCI Express lanes than the ICH6, for a total of six. Intel has attached four of those lanes on its D955XBK motherboard to a PCI Express x16 physical slot that promises the possibility of a dual-graphics configuration, like so:
Intel is quick to caution, though, that this slot hasn’t been validated for use with graphics cards. Yet.
Beyond those changes, the 955X chipset is fairly similar to the 925XE. Intel will be bringing out lower end versions of this chipset in the 945 lineup, and those won’t feature internal memory controller timings as aggressive as the 955X’s.
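RAID 5’s redundancy comes from parity: each stripe stores the XOR of its data blocks, so any one missing block can be recomputed from the survivors. Here is a minimal Python sketch of that idea, not the ICH7’s implementation; real arrays rotate the parity block across drives and work on much larger blocks. The capacity helpers assume equal-size drives, and the 250GB figure below is simply the size of our test drives, not a RAID setup we tested.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_usable_capacity(drive_gb, n_drives):
    """RAID 5 gives up one drive's worth of space to parity."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return drive_gb * (n_drives - 1)


def raid10_usable_capacity(drive_gb, n_drives):
    """RAID 10 mirrors pairs of drives, so half the raw space is usable."""
    if n_drives < 4 or n_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    return drive_gb * n_drives // 2


# One stripe on a three-drive array: two data blocks plus one parity block.
data = [b"\x0f\xf0\xaa", b"\x33\x55\x01"]
parity = xor_blocks(data)

# If the drive holding data[0] fails, its block is the XOR of the survivors.
rebuilt = xor_blocks([data[1], parity])
assert rebuilt == data[0]

print(raid5_usable_capacity(250, 3), raid10_usable_capacity(250, 4))
```

Both calls yield 500GB of usable space; RAID 5 gets there with one fewer drive, while RAID 10 trades capacity for mirroring and can ride out some two-drive failures (as long as both failed drives aren’t in the same mirrored pair).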
Dual-core approaches compared
Intel’s approach to dual-core processors is somewhat different from the approach taken by its primary competitor, AMD. The Pentium Extreme Edition 840 offers essentially the same thing as a pair of Xeon 3.2GHz processors, but in a single CPU socket with a faster memory subsystem. Although its two cores are on the same chip, they communicate with one another and with the rest of the system by means of a shared 800MHz front-side bus. All memory accesses, I/O, and cache coherency updates between processors must traverse this shared bus, which has a peak throughput of 6.4GB/s. That’s less bandwidth than the 10.7GB/s theoretical peak transfer rate of the 955X chipset’s dual channels of DDR2 667 alone.
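Those bandwidth figures are straightforward arithmetic: peak throughput is transfers per second times bytes per transfer times the number of channels. A quick check in Python, assuming a 64-bit (8-byte) data path for both the front-side bus and each DDR2 channel, and treating DDR2 667 as exactly 667 million transfers per second:

```python
def peak_bandwidth_gbs(transfer_rate_mt, bus_width_bits=64, channels=1):
    """Peak bandwidth in decimal GB/s: transfers/s x bytes per transfer x channels."""
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mt * 1e6 * bytes_per_transfer * channels / 1e9


fsb = peak_bandwidth_gbs(800)                    # 800MHz quad-pumped FSB, shared by both cores
ddr2_667 = peak_bandwidth_gbs(667, channels=2)   # two 64-bit DDR2 667 channels

print(f"FSB: {fsb:.1f} GB/s, dual-channel DDR2 667: {ddr2_667:.1f} GB/s")
# prints "FSB: 6.4 GB/s, dual-channel DDR2 667: 10.7 GB/s"
```

Because both Smithfield cores sit behind that single 6.4GB/s bus, the memory controller can theoretically deliver more data than the processor is able to accept.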
AMD’s dual-core processors, including the newest Opterons and the upcoming Athlon 64 X2, have a design modified specifically for dual-core implementations. You can read my review of the dual-core Opterons for a more detailed discussion of AMD’s design, but I’ll summarize here. AMD’s dual-core chips share some common resources between cores, including a single system request queue, a single on-die memory controller, and a single set of HyperTransport links for external I/O and cache coherency updates. The two cores can share data with one another via the very high speed on-chip system request interface, which is how cache coherency updates (and cache-to-cache data transfers) are passed. Overall, this arrangement gives the Athlon 64 more paths for critical data with higher bandwidth and lower latencies than Intel’s shared bus approach. In short, it’s more elegant.
That’s not the whole story, though. If the Pentium Extreme Edition 840 performs well, we can forgive some technical inelegance. And Intel’s plans for dual-core processors extend well beyond this first implementation or its technical merits. Intel has made public its ambitions for dual-core products, and those plans rely heavily on Intel’s traditional strengths: manufacturing and selling chips in volume. While AMD is focusing primarily on servers for its dual-core parts, Intel has committed to bringing dual-core processors to desktop PCs sooner, and at lower prices, than the relatively expensive Athlon 64 X2. The X2’s availability will likely be rather limited until late 2005, too. As a result, the value proposition for Intel’s dual-core processors may be much more tempting.
Another key to success in dual-core chips for Intel will likely be its transition to a new 65nm fab process, which is coming soon. The 65nm dual-core desktop processor, code-named Presler and due in early 2006, will pack 2MB of L2 cache but should be much cheaper to manufacture. Not only will the chips be much smaller thanks to the die shrink, but Intel also intends to package together two separate pieces of silicon to make one dual-core processor. This approach should bring much higher yields, because two “good” cores need not be cut from the same place in the wafer, and a “bad” core doesn’t necessarily torpedo the one next door. As much as we dislike Intel’s shared system bus, this is a smart way to make chips, and it probably wouldn’t be possible with AMD’s dual-core design.
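The yield argument can be made concrete with the textbook Poisson yield model, in which the chance that a die has no fatal defect is exp(-D*A) for defect density D and die area A. This is a simplified model we’re assuming for illustration, not Intel’s data: the defect density below is invented, and we borrow half of Smithfield’s 206 mm² as a per-core area only to give the numbers a plausible scale (Presler’s 65nm cores will be smaller). A monolithic dual-core die needs both cores to be good, while a two-die package can pair up any two good dies.

```python
import math


def die_yield(area_cm2, defects_per_cm2):
    """Poisson yield model: probability a die of this area has zero fatal defects."""
    return math.exp(-defects_per_cm2 * area_cm2)


def sellable_fraction(core_area_cm2, defects_per_cm2):
    """Fraction of core-sized silicon that ends up in a working dual-core part."""
    y = die_yield(core_area_cm2, defects_per_cm2)
    monolithic = y ** 2   # one die of twice the area: both cores must be good
    two_die = y           # any two good single-core dies can share a package
    return monolithic, two_die


# Hypothetical inputs: 1.03 cm^2 per core, 0.5 fatal defects per cm^2.
mono, multi = sellable_fraction(1.03, 0.5)
print(f"monolithic: {mono:.0%}, two-die package: {multi:.0%}")
```

With these made-up inputs, about 36% of the silicon becomes sellable monolithic dual-core chips versus about 60% with separate dies. The model ignores tricks like salvaging half-working dies as single-core parts, but the direction of the effect matches the reasoning above.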
Savor the creamy smoothness
Dual-core processors deliver additional performance in two ways: through the use of multithreaded applications or through improved multitasking. Much of the buzz about the first wave of dual-core processors has centered on better multitasking and the improved user experience that multiprocessing can provide, and rightly so. Dual-processor systems (whether dual CPU or dual core) exhibit a very nice performance characteristic apart from high peak throughput; they don’t slow down as easily when things get bad. In a fast, modern PC, avoiding extreme slowdowns (some of which we may not even notice because we’ve learned to accept them) is a wonderful thing. The move to symmetric multiprocessing on the desktop coincides nicely with the migration of other multitasking-oriented technologies from the workstation/server worlds into desktops, such as RAID and command queuing for storage or the ability to reserve bandwidth for isochronous data transfers. Taken together, these things should allow desktop PCs to deliver a much more responsive experience in difficult scenarios where they have traditionally felt sluggish.
That said, concerns about smoother multitasking will only take us so far. A dual-core CPU like the Pentium XE 840, with its four front-ends thanks to Hyper-Threading, already offers copious amounts of creamy smoothness for multitasking. Adding even more cores in the future probably won’t bring many tangible improvements in the user experience. In order for CPU performance truly to continue ramping up over time as it has in the past, thread-level parallelism in software must succeed in delivering real, consistent performance benefits, and not just in easy cases. Some types of tasks lend themselves well to multithreading, but others do not. Currently, virtually no games are multithreaded, for instance, because adding threading complicates development work substantially. Over the long term, the question of thread-level parallelism is the larger one for CPU performance, and that’s the issue we’ve attempted to address in our test selection for this first set of multi-core CPU reviews.
Our focus on thread-level parallelism won’t prevent us from savoring the smoothness of dual-core systems, though. We’ve already hatched plans to test some multitasking scenarios in a future article. If you have any ideas about scenarios you’d like to see tested, feel free to let us know.
These questions about the performance benefits of dual-core processors are magnified by the stark choice presented by the Pentium XE 840. Think about the tradeoff here: you could have a Pentium 4 Extreme Edition with 2MB cache running at 3.73GHz on a 1066MHz front-side bus, or, for the same price, you can get two of the same CPU core with 1MB of cache each running at only 3.2GHz on an 800MHz bus. Which is better? Tough call, but that’s the kind of choice Intel’s dual-core strategy will present to consumers. We’ll keep an eye on that choice as we look through the benchmark results.
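One rough way to frame that choice is Amdahl’s law with a clock-speed correction. The sketch below assumes per-core performance scales linearly with clock speed, and it ignores the 3.73GHz chip’s larger cache and faster bus as well as Hyper-Threading, so treat it as a back-of-the-envelope model rather than a prediction of the benchmark results that follow.

```python
def relative_performance(clock_ghz, cores, parallel_fraction, baseline_ghz):
    """Amdahl's law scaled by clock, relative to one core at baseline_ghz.

    Assumes performance scales linearly with clock and ignores cache,
    bus speed, and Hyper-Threading.
    """
    serial = 1.0 - parallel_fraction
    speedup = 1.0 / (serial + parallel_fraction / cores)
    return (clock_ghz / baseline_ghz) * speedup


for p in (0.0, 0.5, 0.9):
    dual = relative_performance(3.2, 2, p, baseline_ghz=3.73)
    print(f"parallel fraction {p:.0%}: dual 3.2GHz = {dual:.2f}x single 3.73GHz")
```

Under these assumptions the dual-core chip runs at about 0.86x the single-core chip on purely serial work, 1.14x when half the work is parallel, and 1.56x at 90% parallel. The break-even point is where 1 - p/2 = 3.2/3.73, or roughly 28% of the work running in parallel.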
We’ve included results for a wide range of CPUs in the following pages, including some workstation-class processors like the Opteron and Xeon. Intel has decided to position the Extreme Edition 840 as a single-socket workstation solution as well as a high-end desktop processor, so these comparisons make some sense. Still, putting the XE 840 up against a pair of Opteron 275s isn’t really a fair fight. Those results are included because they shed some light on the question of thread-level parallelism, not because the XE 840 needs to match them.
In fact, the Pentium Extreme Edition 840’s most direct competitors here are the Pentium 4 Extreme Edition 3.73GHz, the Athlon 64 FX-55, and the Opteron 175. The FX-55 is AMD’s fastest single-core solution, while the P4 XE 3.73GHz is Intel’s fastest single-core part. Both are priced comparably to the Pentium XE 840. More intriguingly, the Opteron 175 is AMD’s dual-core chip, also priced at $999 along with the XE 840 and the rest. Moreover, the Opteron 175 is essentially the same thing as an Athlon 64 X2 4400+.
We have included results for the Pentium D 840 in our testing, which we obtained by disabling Hyper-Threading on our Extreme Edition 840. Since the Pentium D 840 is just an Extreme Edition 840 sans HT, the numbers should be valid.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test systems were configured like so:
|Processor|Opteron 148 2.2GHz, Opteron 152 2.6GHz, Opteron 175 2.2GHz, Dual Opteron 248 2.2GHz, Dual Opteron 252 2.6GHz, Dual Opteron 275 2.2GHz|Xeon 3.2GHz (Nocona 1MB), Xeon 3.4GHz (Nocona 1MB), Dual Xeon 3.2GHz (Nocona 1MB), Dual Xeon 3.4GHz (Nocona 1MB)|Pentium 4 660 3.6GHz, Pentium D 840 3.2GHz, Pentium Extreme Edition 840 3.2GHz|Pentium 4 Extreme Edition 3.73GHz|Athlon 64 4000+, Athlon 64 FX-55|
|---|---|---|---|---|---|
|System bus|1GHz HyperTransport|800MHz (200MHz quad-pumped)|800MHz (200MHz quad-pumped)|1066MHz (266MHz quad-pumped)|1GHz HyperTransport|
|Motherboard|Tyan Thunder K8WE S2895|SuperMicro X6DAL-G|Intel D955XBK|Intel D955XBK|Asus A8N-SLI Deluxe|
|BIOS revision|2/21/2005 beta|080010|BK95510J.86A.1152|BK95510J.86A.1234|MCT2/dualcore|
|North bridge|nForce Professional 2200, nForce Professional 2050, AMD 8131 PCI-X Tunnel|Intel E7525|955X MCH|955X MCH|nForce4 SLI|
|Chipset drivers|SMBus driver 4.45, IDE driver 4.75|OS integrated|INF Update 220.127.116.119|INF Update 18.104.22.1689|SMBus driver 4.45, IDE driver 4.75|
|Memory size|2GB (4 DIMMs)|2GB (4 DIMMs)|1GB (2 DIMMs)|1GB (2 DIMMs)|1GB (2 DIMMs)|
|Memory type|OCZ PC3200 512MB registered ECC DDR SDRAM at 400MHz|Kingston PC3200 512MB registered ECC DDR SDRAM at 333MHz|Corsair XMS2 5400UL DDR2 SDRAM at 533MHz|Corsair XMS2 5400UL DDR2 SDRAM at 667MHz|Corsair XMS Pro 3200XL DDR SDRAM at 400MHz|
|CAS latency (CL)|3|2.5|3|4|2|
|RAS to CAS delay (tRCD)|3|3|2|2|2|
|RAS precharge (tRP)|3|3|2|2|2|
|Cycle time (tRAS)|8|7|8|8|5|
|Audio drivers|NVIDIA 4.60|Realtek 22.214.171.12420|SigmaTel 5.10.4456.0|SigmaTel 5.10.4456.0|Realtek 126.96.36.19920|

All systems used a Maxtor DiamondMax 10 250GB SATA 150 hard drive, a GeForce 6800 Ultra 256MB PCI-E graphics card with ForceWare 71.84 drivers, and Windows XP Professional x64 Edition.
Note that we have more total memory on the workstation-class setups. I don’t believe any of our benchmarks are constrained by available RAM in a 1GB system, but you’ll still want to keep the difference in mind.
All tests on the Pentium systems were run with Hyper-Threading enabled, except where otherwise noted.
Thanks to Corsair, OCZ, and Kingston for providing us with memory for our testing. This matchup required lots of high-quality RAM, so we had to spread the love around. All three brands are far and away superior to generic, no-name memory.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- SiSoft Sandra 2005 SR1 10.50 64-bit
- ScienceMark 2.0 64-bit
- Compiled binary of C Linpack port from Ace’s Hardware
- POV-Ray for Windows 3.6 64-bit
- SMPOV 4.3
- 3ds max 7.0
- Cinebench 2003
- LAME MT 3.97a 64-bit
- Xmpeg 5.0.3 with DivX Video 5.21
- Windows Media Encoder 9
- Sphinx 3.3
- picCOLOR v4.0 build 545 64-bit
- DOOM 3 1.1 with trdelta1 demo
- Far Cry 1.3 with tr3-pier demo
- Unreal Tournament 2004 v3355 with trdemo1
- 3DMark05 v120
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
We generally start out with some memory subsystem tests, so we can see how the processors match up on that front. These results sometimes help us to understand some of the later benchmark results from real applications.
Despite the presence of a second CPU core on the bus, the Extreme Edition 840 accesses memory only slightly slower than the Pentium 4 660. The P4 Extreme Edition 3.73GHz, with its faster bus (and matching faster RAM in our test config), achieves higher throughput with comparable latencies. The AMD chips, meanwhile, use their built-in memory controllers to achieve extremely low latencies, especially the Athlon 64.
Up next are some gaming tests, which will essentially serve to illustrate the futility of running a dual-core processor in a single-threaded application. Notice that we’ve included above each result a little graph generated by the Windows Task Manager as the benchmark ran. This should give you some indication of the amount of threading in the application. In some cases with single-threaded apps like the games below, the task will oscillate back and forth between one CPU and the next, but total utilization generally won’t go above 25% for a quad-core (or quad-front-end, in the case of the XE 840 with Hyper-Threading) system.
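The 25% ceiling is just one saturated thread averaged across four logical CPUs. A small Python helper makes the arithmetic explicit, assuming (as a simplification) that each busy thread keeps exactly one logical CPU fully occupied. The standard library’s `os.cpu_count()` reports logical processors, so a Hyper-Threading-enabled XE 840 would show up as four.

```python
import os


def overall_utilization(busy_threads, logical_cpus):
    """Task Manager-style total CPU %, assuming each busy thread pins one logical CPU."""
    return 100.0 * min(busy_threads, logical_cpus) / logical_cpus


print(f"This machine reports {os.cpu_count()} logical CPUs")
print(f"One busy thread on four logical CPUs: {overall_utilization(1, 4):.0f}%")  # 25%
```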
We tested performance by playing back a custom-recorded demo that should be fairly representative of most of the single-player gameplay in Doom 3.
Our Far Cry demo takes place on the Pier level, in one of those massive, open outdoor areas so common in this game. Vegetation is dense, and view distances can be very long.
Unreal Tournament 2004
Our UT2004 demo shows yours truly putting the smack down on some bots in an Onslaught game.
Here is that big tradeoff in action. The Extreme Edition 840 can only use one of its two 3.2GHz cores in these games, and as a result, it can’t keep up with the Pentium 4 660 clocked at 3.6GHz, let alone the P4 XE 3.73GHz. To make matters worse, both of the Athlon 64s are well ahead of any of the Pentiums. The XE 840 may be able to deliver acceptable (or better) frame rates in many of today’s games, but those UT2004 frame rates are low enough to make me a little nervous about this processor’s suitability for a high-end gaming rig.
The dual-core Opteron 175, interestingly enough, doesn’t ask users to make such a harsh tradeoff. The 175’s performance in these single-threaded games is quite good.
3DMark’s CPU test perhaps gives us a little glimpse of the future, when games are multithreaded. This test uses multithreading to execute a software vertex shader routine, and the XE 840 comes out near the top of the heap.
POV-Ray just recently made the move to 64-bit binaries, and thanks to the nifty SMPOV distributed rendering utility, we’ve been able to make it multithreaded, as well. SMPOV spins off any number of instances of the POV-Ray renderer, and it can divide the scene among them in several different ways. For this scene, the best choice was to divide the screen up horizontally between the different threads, which provides a fairly even workload.
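We haven’t seen SMPOV’s own partitioning code, so what follows is only a hypothetical Python sketch of a horizontal split: divide the image’s rows into one contiguous band per thread, with band sizes differing by at most one row. The 480-row image height is an assumption for the example.

```python
def horizontal_bands(height, n_threads):
    """Split rows [0, height) into n_threads contiguous, near-equal bands.

    Each band is a half-open (start_row, end_row) pair.
    """
    base, extra = divmod(height, n_threads)
    bands, start = [], 0
    for i in range(n_threads):
        rows = base + (1 if i < extra else 0)  # spread leftover rows over the first bands
        bands.append((start, start + rows))
        start += rows
    return bands


print(horizontal_bands(480, 4))  # [(0, 120), (120, 240), (240, 360), (360, 480)]
```

Equal row counts only produce an even workload when rendering cost is spread fairly uniformly from top to bottom, which is why the best split direction depends on the scene.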
With four threads for rendering, the Extreme Edition 840 trounces its single-core competitors, the Athlon 64 FX-55 and Pentium 4 Extreme Edition 3.73GHz. Notice, also, how Hyper-Threading allows the XE 840 to shave nearly 30 seconds off of its shortest render time in comparison to the Pentium D 840. However, AMD’s dual-core chip, the Opteron 175, is faster still.
3ds max 7 rendering
We tested 3ds max performance by rendering 20 frames of a sample scene at 320×240 resolution. This particular scene makes use of a motion-blur effect that requires extensive multi-pass rendering. We tried two different renderers: 3ds max’s default scanline renderer and its built-in version of the mental ray renderer.
Both of these renderers make good use of multithreading, and the results for the Extreme Edition 840 are positive. Once more, it soundly beats out the single-core FX-55 and XE 3.73GHz processors, but once more, the Opteron 175 is even faster.
Cinema 4D rendering
Cinema 4D’s rendering engine does a very nice job of distributing the load across multiple processors, as the Task Manager graph shows.
The XE 840 thrives on this kind of well-threaded workload and even beats out the Opteron 175 slightly.
The rest of Cinebench’s tests aren’t multithreaded, and it shows in the results. The XE 840 finishes consistently in the lower half of the pack, behind the P4 Extreme Edition 3.73GHz and FX-55.
LAME audio encoding
LAME MT is, as you might have guessed, a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. You can even download a paper (in Word format) describing the programming effort.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. The author notes, “In general, this approach is highly recommended, for it is exponentially harder to debug a parallel application than a linear one.”
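That two-stage pipeline can be sketched with Python’s standard `threading` and `queue` modules. The `psychoacoustic_analysis` and `encode_frame` functions here are hypothetical stand-ins, not LAME’s code; the point is the structure, in which one thread analyzes frames ahead of time and hands its results through a bounded buffer to the thread that finishes encoding. In CPython, the global interpreter lock keeps these pure-Python stages from truly running in parallel, so this shows the shape of the design rather than its speedup.

```python
import queue
import threading


def psychoacoustic_analysis(frame):
    """Hypothetical stand-in for the psycho-acoustic model."""
    return sum(frame) / len(frame)


def encode_frame(frame, analysis):
    """Hypothetical stand-in for the rest of the encoder."""
    return (len(frame), round(analysis, 2))


def pipelined_encode(frames, lookahead=1):
    """Analyze frames on a second thread, running ahead of the encoder."""
    results_q = queue.Queue(maxsize=lookahead)  # bounded buffer between the stages
    done = object()                             # sentinel marking end of stream

    def analyzer():
        for frame in frames:
            results_q.put((frame, psychoacoustic_analysis(frame)))
        results_q.put(done)

    worker = threading.Thread(target=analyzer)
    worker.start()
    encoded = []
    while (item := results_q.get()) is not done:
        frame, analysis = item
        encoded.append(encode_frame(frame, analysis))
    worker.join()
    return encoded


print(pipelined_encode([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The bounded queue stops the analysis thread from running more than a frame or two ahead, which mirrors LAME MT’s buffering while keeping memory use small; the sentinel object tells the encoder thread when the stream has ended.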
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here, as we have done in our previous CPU reviews.
With only two threads, the Pentium XE 840 and Pentium D 840 produce near-identical encoding times. The XE 840 outperforms the Pentium 4 XE 3.73GHz here, though not by much.
Xmpeg/DivX video encoding
We used the Xmpeg/DivX combo to convert a DVD .VOB file of a movie trailer into DivX format. Like LAME MT, this application is only dual threaded.
Windows Media Encoder video encoding
We asked Windows Media Encoder to convert a gorgeous 1080-line WMV HD video clip into a 640×460 streaming format using the Windows Media Video 8 Advanced Profile codec.
The XE 840 encodes video quite a bit quicker than the single-core competition, and it essentially matches the Opteron 175.
We’re using the 64-bit beta version of ScienceMark for these tests, and several of its components are multithreaded. ScienceMark author Alexander Goodrich says this about the Molecular Dynamics simulation:
Molecular Dynamics is lightly multithreaded – one thread takes care of U/I aspects, and the other thread takes care of the computation. The computation itself is not multithreaded, though Tim and I were looking into ways of changing the algorithm to support multi-threading programming a couple years ago – it’s a lot of effort, unfortunately. When MD [is] running there [is] a total of 2 threads for the process.
Here are the results:
The Primordia test “calculates the Quantum Mechanical Hartree-Fock Orbitals for each electron in any element of the periodic table.” Alex says this about it:
Primordia is multithreaded. Two main tasks occur which allow this to happen. Essentially, we identified 2 parallel tasks that could be done. We could probably take this a step further and optimize it even more. There is an issue, however, with the Pentium Extreme Edition that we’ve identified. The second computation thread gets executed on the logical HT thread rather than the 2nd core, so performance isn’t as good as it could be. This will be fixed in the next revision. This doesn’t effect [sic] the regular Pentium D. A workaround could include disabling HT on Pentium EE. There are 3 threads for primordia – 2 threads for computation, 1 thread for U/I.
The next two tests are only single-threaded, and they don’t make as good use of any of the CPUs here as they could if they were better optimized. The ScienceMark team has plans to incorporate linear algebra libraries from Intel and AMD in order to boost performance.
Unfortunately, the 64-bit ScienceMark beta needs a little bit of work. Somehow, it doesn’t comprehend Hyper-Threading properly on the XE 840, causing the XE to run slower than the Pentium D in the Moldyn and Primordia tests.
Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX and SSE/2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:
This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.
The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.
We’re using the 64-bit port of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations at once.
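Sandra’s description combines two ideas: an interlaced split of columns across threads, and (in the integer version) fixed-point arithmetic standing in for floating point. The Python sketch below illustrates both under our own assumptions: a fixed-point format with 12 fractional bits, a small 64×48 image, and a thread pool rather than Sandra’s one-thread-per-CPU scheme. None of these details come from SiSoft, and CPython threads won’t actually run this math in parallel; it shows how the work is divided, not how fast it goes.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 64, 48, 255  # small image; Sandra's own test is 640x480
FRAC_BITS = 12                         # hypothetical fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(v):
    """Convert a float to a scaled integer with FRAC_BITS fractional bits."""
    return int(round(v * ONE))


def fixed_mul(a, b):
    """Multiply two fixed-point values and drop the extra fractional bits."""
    return (a * b) >> FRAC_BITS


def escape_count(cr, ci):
    """Mandelbrot iterations for c = cr + ci*i using integer math only."""
    zr = zi = 0
    for n in range(MAX_ITER):
        zr2, zi2 = fixed_mul(zr, zr), fixed_mul(zi, zi)
        if zr2 + zi2 > 4 * ONE:  # |z|^2 > 4: the point has escaped
            return n
        zi = 2 * fixed_mul(zr, zi) + ci
        zr = zr2 - zi2 + cr
    return MAX_ITER


def column(x):
    """Iteration counts for every pixel in column x."""
    cr = to_fixed(-2.5 + 3.5 * x / WIDTH)
    return [escape_count(cr, to_fixed(-1.25 + 2.5 * y / HEIGHT)) for y in range(HEIGHT)]


def render(n_threads):
    """Interlaced split: thread t computes columns t, t + n_threads, t + 2*n_threads, ..."""
    image = [None] * WIDTH

    def worker(t):
        for x in range(t, WIDTH, n_threads):
            image[x] = column(x)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(worker, range(n_threads)))  # list() surfaces any worker exception
    return image
```

Interlacing columns spreads the expensive regions (points inside the set run all 255 iterations) across every thread, which keeps the threads evenly loaded without any explicit load-balancing logic.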
The Pentium 4 has long excelled at parallel SIMD computation. The XE 840 continues that tradition here, and then some. All of the XE 840’s direct competitors are left choking in its dust.
Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance. However, the versions of Sphinx we’re using are only single-threaded.
Here’s a single-threaded app where the XE 840 does fairly well, just barely outpacing the Athlon 64 FX-55.
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA.
At our request, Dr. Müller added larger image sizes to this latest build of picCOLOR. We were concerned that thread creation overhead on the test’s rather small default image size would overshadow the benefits of threading. Dr. Müller has also made picCOLOR’s multithreading more extensive: eight of the 12 functions in the test are now multithreaded.
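That concern can be expressed as a toy cost model: threading pays off only when the time saved by splitting the work exceeds the cost of creating the threads. The sketch assumes a fixed per-thread startup cost and perfectly divisible work; the millisecond figures are invented for illustration, not measured from picCOLOR.

```python
def threaded_time(work_ms, n_threads, overhead_ms_per_thread):
    """Toy model: thread startup cost plus work divided evenly among threads."""
    return overhead_ms_per_thread * n_threads + work_ms / n_threads


def threading_pays_off(work_ms, n_threads, overhead_ms_per_thread):
    """True if the threaded version beats running the same work on one thread."""
    return threaded_time(work_ms, n_threads, overhead_ms_per_thread) < work_ms


# Hypothetical: 0.5ms to start each thread, four threads.
print(threading_pays_off(2.0, 4, 0.5))    # small image: 2.5ms threaded vs. 2ms serial
print(threading_pays_off(200.0, 4, 0.5))  # large image: 52ms threaded vs. 200ms serial
```

With these numbers, the small job gets slower when threaded while the large one runs nearly four times faster, which is why bigger test images give a fairer picture of multithreaded performance.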
Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.
This version of picCOLOR also has some problems with Hyper-Threading, so the Pentium D turns in the faster score overall.
We measured the power consumption of our entire test systems, except for the monitor, at the wall outlet using a Watts Up PRO watt meter. The test rigs were all equipped with OCZ PowerStream 520W power supply units. The idle results were measured at the Windows desktop, and we used SMPOV and the 64-bit version of the POV-Ray renderer to load up the CPUs. In all cases, we asked SMPOV to use the same number of threads as there were CPU front ends in Task Manager, so four for the dual Opteron 275, four for the Pentium XE 840, two for the Opteron 175, and so on.
The graphs below have results for “power management” and “no power management.” That deserves some explanation. By “power management,” we mean SpeedStep or PowerNow/Cool’n’Quiet. (In the case of the Pentium 4 600-series processors and the XE 840, the C1E halt state is always active, even in the “no power management” tests.) Sadly, the beta BIOS we used for our Tyan S2895 motherboard didn’t support AMD’s PowerNow, so we couldn’t report scores for the Opterons with power management enabled.
Intel has tamed the Prescott and Smithfield cores’ power consumption at idle through the use of SpeedStep and the C1E halt state. As a result, the dual-core Pentium XE 840 system consumes fewer watts at idle than the system based on AMD’s Opteron 175 (without the benefit of CPU power management). The picture changes dramatically, though, when the processors are under load. The XE 840-based system pulls 313W at the wall socket, while the Opteron 175 box only requires 201W. Indeed, the XE 840 setup uses more power under load than the test rig that’s housing a pair of dual-core Opteron 275 processors.
Notably, there’s a 26W delta between the Pentium D 840 and the XE 840 under load, simply because of the addition of Hyper-Threading in the Extreme Edition.
Our benchmarks have dramatically illustrated the power of thread-level parallelism to enhance performance. The Pentium Extreme Edition 840 was able to take advantage of this effect to outshine its primary single-core competitors, the Athlon 64 FX-55 and P4 Extreme Edition 3.73GHz, in a range of multithreaded applications. Even without considering its multitasking benefits, the Pentium XE 840 looks like a worthy addition to Intel’s Extreme Edition lineup, so long as threading is the name of the game. Throw in the smoother multitasking that comes with any SMP system, and the logic behind the XE 840 begins to make sense.
Unfortunately, not all applications are multithreaded, and many won’t be for months or even years into the future. The relatively slow 3.2GHz clock speed of the XE 840 demands a real shift in mindset in order to accept the loss of single-threaded performance in exchange for multicore bliss. I wouldn’t expect big-time PC gamers, some of the Extreme Edition line’s supposed target customers, to accept this tradeoff willingly. Gamers would be better served by the P4 Extreme Edition 3.73GHz, or, as we have pointed out in the past, practically any variety of Athlon 64 down to the 3500+. Non-gamers who use mainly single-threaded applications may not want to accept the XE 840’s slower performance, either.
Probably the most direct competitor for the Pentium Extreme Edition 840, in terms of pricing and technology, is AMD’s Opteron 175, which is also a dual-core chip and also priced at a princely $999. Compared to the Opteron 175, the Extreme Edition 840 is generally somewhat slower overall in multithreaded applications and much slower overall in single-threaded applications. Not only that, but a system based on the Extreme Edition 840 is extreme in another sense: it consumes over 100W more power under load than an Opteron 175-based system. All told, that’s a rough combination of elements for the Extreme Edition 840. The competition is not making things easy. Perhaps the XE 840’s saving grace will be the relative obscurity of the Opteron 100 series, should AMD continue that odd tradition. However, since neither the Extreme Edition 840 nor the Opteron 175 is generally available for purchase today, I hate to speculate on how the availability picture will shake out.
Taking the long view, Intel seems to have a sound strategy for dual-core processors overall. Although this thousand-dollar wonder is a little bit inaccessible for most folks, the Pentium D 840 looks promising, and the cheaper versions look even more exciting. If Intel follows through with Smithfield chips for under $300, as I expect it will, then the Pentium D will be doing battle in earnest for the hearts and wallets of an awful lot of real PC enthusiasts, who will be asking some very hard questions about the merits of exchanging better gaming performance for better video encoding and creamy smooth multitasking. I’m curious to see what answers they choose.