The Extreme Edition’s stellar resume isn’t likely to intimidate AMD, because Athlon 64 processors have been outperforming Intel’s CPUs for a good while now. Just to be sure, though, AMD has cooked up its own new flagship CPU for release today, the Athlon 64 FX-60. The FX-60 isn’t as flashy as Intel’s new number, but it does represent a major change for AMD’s high-end gaming-oriented processors, because it is the first dual-core member of the FX product line.
Now, these two new thousand-dollar processors face one another in our broad suite of performance tests, ranging from highly multithreaded 64-bit rendering applications to the latest games. Thanks to new multithreaded graphics drivers, these new dual-core CPUs may even challenge their single-core counterparts for supremacy in 3D gaming. Let’s see whether the boffo specs on Intel’s new 65nm Extreme Edition processor translate into a credible challenge for the dual-core FX-60.
Presler gets extreme
The Pentium Extreme Edition 955 is the top member of a whole new family of Pentium desktop processors from Intel. At the heart of this lineup is a single chip, code-named Cedar Mill, which is a rendition of the Pentium 4’s familiar Netburst microarchitecture manufactured via Intel’s 65nm fab process. Cedar Mill processors pack 188 million transistors into a die that’s only 81 square millimeters, well below the 122 mm² of Pentium 4 “Prescott” processors, thanks to the transition from 90nm to 65nm process tech. This reduction in die area comes in spite of the fact that Cedar Mill processors carry quite a few more transistors. Intel estimates Cedar Mill’s transistor count at 188 million, versus 169 million for the version of Prescott that has 2MB of L2 cache. Cedar Mill also has 2MB of L2 cache, so the additional transistors likely come from other sources, including the addition of support for Intel’s new virtualization technology, dubbed VT.
Cedar Mill chips support not only VT but the entire legacy of Intel alphabet-soup extensions, including MMX, SSE/2/3, EM64T, and HT or Hyper-Threading. VT, the latest addition to the soup, consists of a handful of new instructions intended to facilitate the creation and operation of multiple virtual machines on a single CPU, a la virtual machine software packages like VMware. Hardware-assisted virtualization can segment virtual machines at a lower level than software packages alone, allowing for better partitioning between VMs for the sake of security, stronger isolation of faults or crashes to a single VM, and higher performance. The use of virtual machines is largely confined to servers right now, but virtualization will likely spread to the desktop in the coming years for the sake of security or digital rights management schemes, or because I’d really like to run the MacOS alongside Windows, assuming Mr. Jobs will allow such crazy things to happen.
Intel will mix and match Cedar Mill silicon and features to the various products in its desktop CPU lineup, disabling Hyper-Threading or binning out clock speeds according to its needs. Cedar Mill chips in their most basic form will make up the Pentium 4 6×1 series of processors at clock speeds ranging from 3GHz for model 631 to 3.6GHz for the Pentium 4 661. These products will talk to the world via an 800MHz front-side bus and will not support VT technology.
The more radical implementations of Cedar Mill fall under the umbrella of the “Presler” code name. Presler is not a separate chip, but two Cedar Mill chips situated together on a single package to make a “dual-core” CPU. Like prior dual-core CPUs from Intel, Presler’s two halves communicate with one another over a shared front-side bus, with no provisions for point-to-point intra-chip data transfers. Thus, there’s really no performance penalty for moving to a two-chip design. There are manufacturing advantages, though. Any two Cedar Mill chips can be joined together to make a Presler processor, and the total surface area of the wafer that must be defect-free to produce a Cedar Mill chip is only the aforementioned 81 mm². Yes, it has to happen twice, but not on adjacent portions of the wafer or even on the same wafer. Contrast that with the die sizes of the Pentium D or the Athlon 64 X2 (both roughly 200 square millimeters) and you begin to see the advantages of Presler’s dual-chip-per-package approach. These puppies should be much cheaper to produce as Intel’s yields ramp up on its 65nm process.
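To put rough numbers on that yield argument, here is a back-of-the-envelope sketch. It uses the simple Poisson yield model, Y = exp(-area × defect density), which is a textbook approximation and not anything Intel has shared with us. The defect density is a made-up placeholder, and the function names are ours.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Expected fraction of defect-free dies under the Poisson yield model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_cpu(die_area_mm2, dies_per_cpu, defects_per_cm2):
    """Average wafer area (mm^2) spent to get one working CPU.

    Good dies can be paired freely, so a multi-die package only pays
    the yield penalty of the small die, once per die.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_cpu * die_area_mm2 / good_fraction

# ASSUMPTION: illustrative defect density only, not a real process figure.
DEFECTS_PER_CM2 = 0.5

presler = silicon_per_good_cpu(81.0, 2, DEFECTS_PER_CM2)      # two Cedar Mill dies
monolithic = silicon_per_good_cpu(200.0, 1, DEFECTS_PER_CM2)  # one ~200 mm^2 die

print(f"Two 81 mm^2 dies per package: {presler:.0f} mm^2 per good CPU")
print(f"One 200 mm^2 die:             {monolithic:.0f} mm^2 per good CPU")
```

The absolute numbers mean little with an invented defect density, but the direction holds for any positive value: the smaller the area that must be flawless in one piece, the less silicon gets thrown away.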
Presler CPUs will form the meat of Intel’s desktop processor lineup, the Pentium D 900 series. The 900s initially range from the 920 at 2.8GHz to the 950 at 3.4GHz, all riding an 800MHz bus with 2MB of L2 per core. These products have VT support enabled but not Hyper-Threading. (Trust me, all of this somehow makes sense to the marketing majors inside Intel, if not the engineers.)
Our subject today, the Pentium Extreme Edition 955, is the fully-realized, self-actuated version of Presler with support cranked up for all of the goodies, including Hyper-Threading, VT, and 2MB of cache per core. The Extreme Edition also separates itself from the riff-raff with its ability to run on a 1066MHz front-side bus, a welcome development given the prevalence of dual-channel memory subsystems on Intel core-logic chipsets and the rise of DDR2 memory modules capable of running at 800MHz and beyond. Progress often comes at a penalty, and in this case, the price we pay is motherboard compatibility. Although the Extreme Edition 955 comes in the same LGA775 package as other recent desktop Pentiums, it requires additional voltage that can only be supplied by newer motherboards, such as the boards arriving alongside the Extreme Edition 955 that are based on the Intel 975X chipset.
The 975X chipset: Intel does dual graphics
The 975X pairs up a revamped north bridge with the familiar ICH7R south bridge I/O chip.
The only notable change between the older 955X chipset and the new 975X is that the 975X can slice up its 16 lanes of PCI Express dedicated to graphics into a pair of eight-lane connections for use in dual-graphics schemes. ATI’s CrossFire and S3’s DuoChrome will support the 975X, but NVIDIA apparently intends to maintain a driver-based lockout that will limit SLI to its own nForce4 chipsets. This development comes as something of a surprise, since rumors were circulating widely about NVIDIA supporting SLI on the 975X, and even some of the initial communications we received from Intel about the 975X mentioned SLI support. NVIDIA has long claimed that it would validate third-party hardware for SLI if the hardware were submitted to its SLI validation program. We recently learned at CES that Intel and its motherboard partners have submitted 975X boards for validation, and we also learned that NVIDIA is not likely to validate those boards for SLI. We also got the distinct impression that NVIDIA’s refusal to validate the 975X is almost certainly not motivated by technical problems, especially given the fact that the initial SLI demo systems were based on Intel core logic chips.
Intel says it’s still working on getting SLI validation for the 975X, but I wouldn’t hold my breath waiting for it. Perhaps if Intel agrees to certify nForce chipsets for its Centrino or Viiv platforms, NVIDIA will open up SLI support on the 975X. Turnabout is fair play, right?
Regular readers may recall that our review of the Pentium Extreme Edition 955 was delayed by thermal problems, as we’ve explained. Intel claims the problems we experienced (with two separate kits consisting of a motherboard, CPU, and cooler) are still something of a mystery. The problem was resolved by switching to a third-party cooler with a different retention mechanism, and our current best guess is that the trouble we encountered was related to the CPU cooler retention mechanism. This possibility was first suggested to us by Chris Angelini, another reviewer who had encountered similar problems. Chris speculated that the CPU cooler might be tensioned too tightly, causing the Intel D975XBX motherboard to warp and thus preventing the cooler from making clean, even contact across the surface of the CPU. This theory would seem to explain why we saw relatively better (though not entirely satisfactory) results when using the cooler’s included TIM pad on initial installation of the CPU rather than thermal grease.
Whatever the case, we don’t believe the CPU itself was at fault. The Extreme Edition 955 pulls less power and thus dissipates less heat under load than its predecessor, the Extreme Edition 840, as our test results will show. Intel may have larger problems with its LGA775 thermal infrastructure, though. We’ll have to keep an eye on this issue.
AMD’s FX flagship doubles up
So Intel has chopped its dual-core product up into two chips, doubled the L2 cache, raised the clock and bus speeds, and performed a die shrink to 65nm. AMD’s response is much simpler than all of that. The Athlon 64 FX-60 is pretty much just an Athlon 64 X2 4800+ blessed with a 2.6GHz clock speed and a fancy name.
This simple move represents a major transition, though, because AMD’s high-end gaming CPU is at last making the leap to dual cores. The FX-60 gives up 200MHz to the Athlon 64 FX-57, but its second core should benefit from the introduction of multithreaded graphics drivers by both ATI and NVIDIA. In fact, AMD is saying that the Athlon 64 FX-57 may be the last of its high-speed, single-core processors. The FX-57 will exist alongside the FX-60 for the time being, allowing consumers to choose between a single 2.8GHz core or the equivalent of two Athlon 64 FX-55 cores in the FX-60.
Like all dual-core AMDs to date, the FX-60 is a 90nm chip. AMD has yet to make the transition to 65nm, but continues to claim that its transition to 65nm is “on track.” For now, AMD will have to rely on its current mix of CPU microarchitecture and 90nm SOI manufacturing capabilities to deliver an attractive performance per watt proposition. We’ll have to see whether the FX-60 can match the 65nm Presler there.
That brings us to the main event, the match-up between the FX-60 and the Extreme Edition 955, which is rife with subplots. The biggest question, of course, is which of these high-dollar CPUs can claim to be fastest, and the natural follow-up will deal with power consumption as it relates to performance. Beyond that, we’ll want to keep track of several other notable questions.
We expect some sort of a boost from multithreaded graphics drivers (we’ll be testing with NVIDIA’s), but how much and what sort of performance gains? NVIDIA has said that it will offload some vertex processing to the CPU in these drivers, and so we should expect to see gains at lower resolutions, where vertex throughput is more likely a bottleneck than graphics pixel-pushing power. Fortunately, we already tend to test CPU gaming performance at lower resolutions precisely because we don’t want graphics fill rates to become a bottleneck. Then again, the performance gains from multithreading in the graphics driver aren’t likely to be huge. Can dual-core CPUs really take advantage of multithreaded drivers well enough to outpace faster single-core processors in otherwise single-threaded games? For that matter, can the single-core Pentium 4 Extreme Edition 3.73GHz capitalize on its Hyper-Threading abilities to become more competitive with the Athlon 64 FX-57 in 3D games?
In a similar vein, we’ve tested largely with 64-bit applications on Windows XP Pro x64 Edition, and many of those applications are multithreaded. We’ll be interested to see how newer 64-bit code and multithreading affect performance on different CPU microarchitectures.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test systems were configured like so:
|Processor||Pentium Extreme Edition 840 3.2GHz||Pentium 4 Extreme Edition 3.73GHz; Pentium Extreme Edition 955 3.46GHz||Athlon 64 FX-60 2.6GHz; Athlon 64 FX-57 2.8GHz; Athlon 64 X2 4800+ 2.4GHz|
|System bus||800MHz (200MHz quad-pumped)||1066MHz (266MHz quad-pumped)||1GHz HyperTransport|
|Motherboard||Intel D975XBX||Intel D975XBX||Asus A8N32-SLI Deluxe|
|North bridge||975X MCH||975X MCH||nForce4 SLI X16|
|South bridge||ICH7R||ICH7R||nForce4 SLI|
|Chipset drivers||INF Update 126.96.36.1996; Intel Matrix Storage Manager 188.8.131.525||INF Update 184.108.40.2066; Intel Matrix Storage Manager 220.127.116.115||SMBus driver 4.5; IDE/SATA driver 5.52|
|Memory size||2GB (2 DIMMs)||2GB (2 DIMMs)||2GB (2 DIMMs)|
|Memory type||Crucial Ballistix PC2-8000 DDR2 SDRAM at 800MHz||Crucial Ballistix PC2-8000 DDR2 SDRAM at 800MHz||DDR SDRAM at 400MHz|
|CAS latency (CL)||4||4||2.5|
|RAS to CAS delay (tRCD)||4||4||3|
|RAS precharge (tRP)||4||4||3|
|Cycle time (tRAS)||15||15||8|
|Hard drive||Maxtor DiamondMax 10 250GB SATA 150|
|Audio||with SigmaTel 5.10.4825.0 drivers||with SigmaTel 5.10.4825.0 drivers||with Realtek 18.104.22.16870 drivers|
|Graphics||GeForce 7800 GTX 512 PCI-E with ForceWare 81.98 drivers|
|OS||Windows XP Professional x64 Edition; Windows XP Professional with Service Pack 2 (WorldBench only)|
All tests on the Pentium systems were run with Hyper-Threading enabled, except where otherwise noted.
Thanks to Crucial for providing us with memory for our testing. Their products and support are both far and away superior to generic, no-name memory.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- SiSoft Sandra 2005 SR3 10.10.69 64-bit
- CPU-Z 1.31
- Compiled binary of C Linpack port from Ace’s Hardware
- POV-Ray for Windows 3.6.1 64-bit
- SMPOV 4.3
- Cinebench 2003 64-bit Edition
- 3ds max 7.0 with Service Pack 1
- LAME MT 3.97a 64-bit
- Windows Media Encoder 9 x64 Edition
- Sphinx 3.3
- picCOLOR 4.0 build 561 64-bit
- Half-Life 2 64-bit Edition with trbuggy2 demo
- Battlefield 2 1.12
- FEAR 1.02
- Unreal Tournament 2004 v3369 and 3369 64-bit Edition with trdemo1
- 3DMark05 v120
- WorldBench 5.0
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
These synthetic tests don’t always mirror real-world performance, but they can tell us some interesting things about the CPUs and their memory subsystems, so we’ll start here.
The Extreme Edition 955’s combination of 800MHz DDR2 memory and a 1066MHz front-side bus gives it an edge in raw memory bandwidth. The FX-60’s RAM runs at half that clock speed, but it’s still fast enough to stay in the same neighborhood.
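For a sense of the theoretical ceilings involved, here is a quick sketch of the peak-bandwidth arithmetic. It assumes a 64-bit (8-byte) front-side bus and 64-bit memory channels, with both platforms running their two DIMMs in dual-channel mode. These are paper maximums computed from the clock speeds in our test-system table, not measured results, and the function name is ours.

```python
def peak_gb_per_sec(mega_transfers_per_sec, bytes_per_transfer=8, channels=1):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9

# Effective transfer rates taken from the test-system table.
fsb_955 = peak_gb_per_sec(1066)              # 266MHz quad-pumped front-side bus
ddr2_800 = peak_gb_per_sec(800, channels=2)  # dual-channel DDR2 at 800MHz
ddr_400 = peak_gb_per_sec(400, channels=2)   # dual-channel DDR at 400MHz

# On the Intel platform, memory traffic must cross the front-side bus,
# so the slower of the two links sets the ceiling.
intel_ceiling = min(fsb_955, ddr2_800)

# The Athlon 64's memory controller is on the CPU, so there is no bus hop.
amd_ceiling = ddr_400

print(f"Extreme Edition 955 ceiling: {intel_ceiling:.1f} GB/s")
print(f"Athlon 64 FX-60 ceiling:     {amd_ceiling:.1f} GB/s")
```

Seen this way, the front-side bus rather than the DDR2 memory is the limiting link on the Intel side, which helps explain why the Athlon 64 can stay in the same neighborhood with memory running at half the clock speed.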
Our simple Linpack test isn’t nearly as well optimized as some versions that serve as excellent scientific computing benchmarks. It can, however, show us the basic shape and bandwidth of the L1 and L2 caches in these CPU cores. The FX-60 delivers performance right between the 2.4GHz Athlon 64 X2 4800+ and the 2.8GHz FX-57. However, the Presler gives us something of a surprise; at 3.46GHz, its L2 cache proves faster than that of the Pentium 4 Extreme Edition 3.73GHz. We’ve seen this sort of thing before, like when the Athlon 64 3500+ moved from 130nm to 90nm and the cache got faster. The likely explanation is that Intel made the cache faster, clock for clock, on Cedar Mill than on its 90nm chips.
The Athlon 64 processors have had a leg up in memory access latency ever since AMD brought a memory controller onboard the CPU. That advantage remains here, despite the Extreme Edition 955’s 1066MHz bus and fast DDR2 memory.
We tested F.E.A.R. by manually playing through a specific point in the game five times for each CPU while recording frame rates using the FRAPS utility. Each gameplay sequence lasted 60 seconds. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe that five sample sessions are sufficient to get reasonably consistent and trustworthy results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
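Here is a small sketch of how that summary works and why the median helps. The frame rates below are invented for illustration, and we are assuming the per-session averages are simply averaged together. The function name is ours.

```python
from statistics import mean, median

def summarize_sessions(sessions):
    """Collapse several FRAPS sessions into one (average, low) pair.

    sessions: list of (average_fps, low_fps) tuples, one per playthrough.
    The averages are averaged; the lows are reduced with the median so
    a single bad hitch in one session doesn't drag the result down.
    """
    overall_average = mean(avg for avg, _ in sessions)
    median_low = median(low for _, low in sessions)
    return overall_average, median_low

# HYPOTHETICAL data: five 60-second sessions, one with an outlier low.
sessions = [(60, 40), (62, 41), (61, 12), (59, 42), (63, 43)]

average_fps, low_fps = summarize_sessions(sessions)
print(f"Average: {average_fps} FPS, median low: {low_fps} FPS")
```

With these made-up numbers, the one session that dipped to 12 FPS doesn't move the reported low at all, whereas averaging the five lows would have pulled it down noticeably.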
Above the following benchmark graph, and throughout most of the tests in the review, we’ve included a Task Manager plot showing CPU utilization. These plots were captured on the Pentium Extreme Edition 955 system, and they should offer some indication of how much impact multithreading has on the operation of each application. Single-threaded apps may sometimes show up as spread across multiple processors in Task Manager, but the total amount of space below all four lines shouldn’t equal more than the total area of one square if the test is truly single-threaded. Anything significantly more than that is probably an indication of some multithreaded component in the execution of the test. (FRAPS was not running when we captured the Task Manager plots.)
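That “area of one square” rule of thumb can be written down as a quick check. This is only a sketch of the reasoning: the utilization samples are hypothetical, the function name is ours, and the 10% tolerance is our own stand-in for “significantly more.”

```python
def looks_multithreaded(per_cpu_samples, tolerance=0.10):
    """Apply the one-square rule to Task Manager style utilization data.

    per_cpu_samples: one list of utilization samples (0.0 to 1.0) per
    logical CPU, all covering the same time span.
    Returns True if the combined average load exceeds one fully busy CPU
    by more than the tolerance.
    """
    total_load = sum(sum(samples) / len(samples) for samples in per_cpu_samples)
    return total_load > 1.0 + tolerance

# HYPOTHETICAL: one thread bouncing between four logical CPUs.
single_thread = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# HYPOTHETICAL: two CPUs pegged the whole time, two idle.
two_threads = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

print(looks_multithreaded(single_thread))
print(looks_multithreaded(two_threads))
```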
We played F.E.A.R. with both CPU and graphics performance options set to the game’s built-in “High” settings.
The single-core Athlon 64 FX-57 just edges out the dual-core FX-60 in this single-threaded game. Looks to me like the graphics drivers are indeed making use of multithreading here, based on the Task Manager output. In fact, the Extreme Edition 955 edges out the Pentium 4 Extreme Edition 3.73GHz, despite a substantial clock speed deficit. Median low frame rates track largely with the average frame rates, but note how the Extreme Edition 955 keeps pace with the Athlon 64 X2 4800+ on this front. All things considered, that’s impressive for a desktop Pentium.
We used FRAPS to capture BF2 frame rates just as we did with F.E.A.R. Graphics quality options were set to BF2’s canned “High” quality profile. This game has a built-in cap at 100 frames per second, and we intentionally left that cap enabled so we could offer a faithful look at real-world performance.
The dual-core processors pull ahead in BF2, probably thanks to NVIDIA’s multithreaded video drivers. All of these CPUs can play BF2 quite well, as the average and low frame rate numbers attest. Our seat-of-the-pants experience was good across the board. The most impressive number here, though, has got to be the FX-60’s low frame rate of 81 FPS. That’s well ahead of anything else in the pack, which suggests the FX-60 has lots of headroom left in it for future games with more complex physics, AI, and the like.
Unreal Tournament 2004
We used a more traditional recorded timedemo for testing UT2004, but we tried out two versions of the game, the original 32-bit flavor and the just-released 64-bit version. UT has long been one of the more CPU-bound games around. Can multithreaded drivers and a 64-bit executable help matters?
Indeed, both things seem to help. The dual-core CPUs outperform the higher-frequency single cores, and the relatively slower Intel processors show some gains from the move to 64 bits. The FX-60 stakes a claim on the FX line’s traditional territory as the fastest gaming processor, though, while the Extreme Edition 955 can only outdo its Pentium siblings.
We also decided to try out the 64-bit version of Half-Life 2. This one is also a timedemo.
Again, the AMD processors take the top three spots, and again, the dual-core CPUs make a statement about their ability to serve as gaming processors.
These results would seem jarring had we not seen similar things in several of the games on the previous page. Dual-core processors are now faster for 3DMark05, and not just for the multithreaded CPU test, but for the single-threaded main tests. The two new CPUs prove faster than their predecessors, but all of the Athlon 64s are faster than their Intel counterparts.
The Task Manager plots tell the story of multithreading in the graphics drivers. The drivers really make use of multiple CPUs in the “Firefly Forest” and “Canyon Flight” scenes, it appears.
3DMark’s CPU tests use the processor to handle vertex calculations, and they are inherently multithreaded. In CPU test 1, the Extreme Edition 955’s four logical CPUs are all put to good use, and it takes the overall 3DMark CPU test as a result.
WorldBench overall performance
WorldBench uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. More impressively, WorldBench spits out individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests. Because WorldBench tests are entirely scripted, we weren’t able to capture Task Manager plots for them.
AMD’s lead from our gaming tests carries over into WorldBench, surprisingly enough. We’ve seen closer results between AMD and Intel processors in WorldBench in the past, but scores are much higher now overall, and the field has separated in AMD’s favor.
Audio editing and encoding
LAME MP3 encoding
LAME MT is, as you might have guessed, a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. You can even download a paper (in Word format) describing the programming effort.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. The author notes, “In general, this approach is highly recommended, for it is exponentially harder to debug a parallel application than a linear one.”
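The linear pipelining idea can be sketched in a few lines. To be clear, this is not LAME MT’s actual code: `analyze` and `encode` are hypothetical placeholders for the psycho-acoustic analysis and the rest of the encoder, and we’re using Python’s standard `queue` and `threading` modules to show the structure only.

```python
import queue
import threading

def analyze(frame):
    # Placeholder for the psycho-acoustic analysis step (hypothetical).
    return sum(frame) / len(frame)

def encode(frame, analysis):
    # Placeholder for the rest of the encoder (hypothetical).
    return (len(frame), analysis)

def pipelined_encode(frames, lookahead=1):
    """Run analysis on its own thread, buffered ahead of the encoder."""
    buffer = queue.Queue(maxsize=lookahead)
    done = object()  # sentinel marking the end of the stream

    def analysis_stage():
        for frame in frames:
            # put() blocks once the buffer is full, which keeps the
            # analysis thread from running far ahead of the encoder.
            buffer.put((frame, analyze(frame)))
        buffer.put(done)

    worker = threading.Thread(target=analysis_stage)
    worker.start()

    encoded = []
    while True:
        item = buffer.get()
        if item is done:
            break
        frame, analysis = item
        encoded.append(encode(frame, analysis))

    worker.join()
    return encoded

print(pipelined_encode([[1, 2, 3], [4, 5, 6]]))
```

Because each stage only ever hands finished results forward through a single queue, frames come out in order and there is very little shared state to reason about, which is the debugging advantage the author is talking about. (Python’s interpreter lock means this sketch won’t show a real speedup; it’s here for the shape of the design.)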
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here, as we have done in our previous CPU reviews.
It’s close, especially with the Intel compiler, but the FX-60 proves fastest at audio encoding.
This one doesn’t look to be multithreaded; the fast single-core CPUs fare best.
Video editing and encoding
Windows Media Encoder x64 Edition Advanced Profile
We asked Windows Media Encoder to convert a gorgeous 1080-line WMV HD video clip into a 320×240 streaming format using the Windows Media Video 8 Advanced Profile codec.
Windows Media Encoder
VideoWave Movie Creator
The tests change, but the FX-60 stays on top. The Extreme Edition 955 is clearly the fastest Intel processor of the bunch, but that’s not enough.
The FX-57 hangs on by a toenail in these two image processing tests, but the FX-60 is close enough that it doesn’t matter.
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded.
Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.
Chalk up another win for the FX-60 and AMD. The dual-core processors outrun the single-core models across the board here, too.
Multitasking and office applications
WorldBench’s Office test involves switching between the various components of the Office suite, which are all running at once. The whole field runs pretty close together overall here, but once more, the FX-60 finishes first.
Mozilla and Windows Media Encoder
The Mozilla test is sensitive to memory access latencies, as I understand it, and that helps explain why the Athlon 64s clean up here. Adding Windows Media Encoder alongside it for a multitasking test moderates things a bit, but the Athlon 64s are faster by a wide margin nonetheless.
Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.
At last, a place where the Pentiums can claim a win. The Extreme Edition 955 can’t quite push past its single-core sibling here, but it’s very close considering the difference in clock frequency.
The tables turn back AMD’s way with WinZip and Nero.
3D modeling and rendering
Cinebench measures performance in Maxon’s Cinema 4D modeling and rendering app. This is the new 64-bit version of Cinebench, primed and ready for these new 64-bit processors.
The FX-60 continues to astonish with its dominance.
Cinebench’s shading tests are single-threaded, and it shows in the order of results.
POV-Ray just recently made the move to 64-bit binaries, and thanks to the nifty SMPOV distributed rendering utility, we’ve been able to make it multithreaded, as well. SMPOV spins off any number of instances of the POV-Ray renderer, and it will divvy up the scene in several different ways. For this scene, the best choice was to divide the screen horizontally between the different threads, which provides a fairly even workload.
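As a rough illustration of that kind of split, the sketch below carves an image into contiguous bands of rows, one per renderer instance. This is our own guess at the general approach, not SMPOV’s code; the function name and the 480-row height are hypothetical.

```python
def split_rows(height, workers):
    """Divide image rows into contiguous bands, one per worker.

    Returns a list of (start_row, end_row) pairs with end_row exclusive.
    Any leftover rows are handed out one apiece to the first few bands,
    so band sizes never differ by more than a single row.
    """
    base, extra = divmod(height, workers)
    bands = []
    start = 0
    for index in range(workers):
        size = base + (1 if index < extra else 0)
        bands.append((start, start + size))
        start += size
    return bands

# HYPOTHETICAL: a 480-row image split across four logical CPUs.
for start, end in split_rows(480, 4):
    print(f"rows {start} to {end - 1}")
```

Equal-sized bands only give an even workload if the scene’s complexity is spread fairly evenly from top to bottom, which is why the best way to carve things up depends on the scene.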
We considered using the new beta of POV-Ray with native support for SMP, but it proved to be very, very slow. We’ll have to try it again once development has progressed further.
We’ve been rendering this same scene at the same resolution since the 800MHz days, and render times are now a tenth of what they used to be. With multiple threads, the Pentiums get a nice boost thanks to Hyper-Threading. Overall, though, AMD once more renders the scene fastest.
3ds max 7 rendering
We tested 3ds max performance by rendering 20 frames of a sample scene at 320×240 resolution. This particular scene makes use of a motion-blur effect that requires extensive multi-pass rendering. We tried two different renderers: 3ds max’s default scanline renderer and its built-in version of the mental ray renderer.
AMD, uh, wins again. Sorry, folks, but we don’t rig ’em like NASCAR.
Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX and SSE/2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:
This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.
The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.
We’re using the 64-bit port of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations at once.
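The column-interlacing scheme SiSoft describes can be sketched as follows. This is our own illustration, not Sandra’s code: the viewport coordinates and function names are assumptions, and we don’t pin threads to particular CPUs the way Sandra says it does.

```python
import threading

def mandelbrot_iterations(cx, cy, max_iter=255):
    """Count iterations before the point (cx, cy) escapes, up to max_iter."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter

def render(width, height, threads):
    """Render the fractal, with each thread grabbing the next free column."""
    image = [[0] * height for _ in range(width)]  # indexed [column][row]
    next_column = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_column
        while True:
            # Claim the next column nobody else is working on.
            with lock:
                column = next_column
                next_column += 1
            if column >= width:
                return
            # ASSUMPTION: viewport spans x in [-2, 1) and y in [-1, 1).
            cx = -2.0 + 3.0 * column / width
            for row in range(height):
                cy = -1.0 + 2.0 * row / height
                image[column][row] = mandelbrot_iterations(cx, cy)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return image

# Small image to keep the example quick; the benchmark uses 640x480.
picture = render(32, 24, threads=4)
print(len(picture), len(picture[0]))
```

Handing out columns one at a time balances the load nicely, since columns through the middle of the set take far longer than those near the edges. Every pixel is independent of every other, which is exactly the sort of math that lends itself to heavy optimization.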
High degrees of optimization can produce very good results on the Pentium Netburst microarchitecture, especially if the math being done lends itself to such things. Unfortunately, this example isn’t typical of the vast bulk of the real-world tests we’ve run.
We measured the power consumption of our entire test systems, except for the monitor, at the wall outlet using a Watts Up PRO watt meter. The test rigs were all equipped with OCZ PowerStream 520W power supply units. The idle results were measured at the Windows desktop, and we used SMPOV and the 64-bit version of the POV-Ray renderer to load up the CPUs. In all cases, we asked SMPOV to use the same number of threads as there were CPU front ends in Task Manager: four for the Pentium XE 840, two for the Athlon 64 X2, and so on.
The graphs below have results for “power management” and “no power management.” That deserves some explanation. By “power management,” we mean SpeedStep or Cool’n’Quiet. In the case of the Pentium XE 840 CPU, the C1E halt state is always active, even in the “no power management” tests. The Extreme Edition 955 and the P4 Extreme Edition 3.73GHz don’t support the C1E halt state or SpeedStep.
The FX-60 and Extreme Edition 955 consume exactly the same amount of power at idle, until AMD’s Cool’n’Quiet kicks in and the FX-60’s power use drops by 45 watts. It’s unfortunate that Intel chose not to include power management in the Extreme Edition 955.
The move to 65nm serves the Extreme Edition 955 well. Despite having more L2 cache, more transistors, and a higher clock frequency, it uses less power under load than the Extreme Edition 840. Sadly for Intel, the process shrink isn’t sufficient to close the power consumption gap with AMD’s amazing dual-core Athlons, which barely require more power than their single-core counterparts.
I’ve gotta hand it to the Extreme Edition 955 for one thing: it overclocks like a champ. Our sample would boot into Windows at 4.5GHz, although it wasn’t quite stable there; it threw calculation errors in Prime95. The thing was rock-solid at 4.266GHz, though, with only a minor voltage increase. Now, mind you, I was using one of these things to cool it, but it still hit 4.26GHz on air cooling.
Intel, by the way, is now doing what AMD has long done with the FX series and making the upper multiplier on the Extreme Edition unlocked. I suppose if you pay $999 for a processor, they figure you should be able to have some fun with it.
Anyhow, the FX-60 wasn’t as strong an overclocker, relatively speaking. It wouldn’t boot into Windows at 3.0GHz, even with lots of extra voltage. At 2.9GHz, core 0 threw errors in Prime95, although core 1 seemed fine. At 2.8GHz, both cores were stable in Prime95 at 1.4V.
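To put “relatively speaking” into numbers, here is the headroom arithmetic using the stock and stable overclocked speeds reported above. The function name is ours.

```python
def headroom_percent(stock_ghz, overclocked_ghz):
    """Overclocking gain as a percentage of the stock clock speed."""
    return (overclocked_ghz / stock_ghz - 1.0) * 100.0

extreme_edition_955 = headroom_percent(3.46, 4.26)
fx_60 = headroom_percent(2.6, 2.8)

print(f"Extreme Edition 955: {extreme_edition_955:.1f}% over stock")
print(f"Athlon 64 FX-60:     {fx_60:.1f}% over stock")
```

That works out to roughly 23% for the Intel chip against a little under 8% for the FX-60, at least for our two samples.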
Fortunately for the FX-60, it started out pretty fast anyhow. One can see how Intel might have remained more competitive had they been able to raise clock speeds on the Netburst architecture as originally planned. At 4.26GHz, it’s pretty darned fast.
More clock speed means more power consumption, though, and the 4.26GHz monster system sucks up 339W under load.
The Pentium Extreme Edition marks real progress for Intel on multiple fronts. It is the fastest all-around desktop CPU that Intel has ever produced, and thanks to its faster bus, larger cache, and higher clock speeds, the Extreme Edition 955 consistently outruns the older Extreme Edition 840. These features, combined with NVIDIA’s multithreaded graphics drivers, even make the Extreme Edition 955 a reasonably solid choice for 3D gaming (faster than the P4 Extreme Edition 3.73GHz, believe it or not). At the same time, the Extreme Edition 955 consumes less power at peak than the Extreme Edition 840, proving that Intel’s 65nm fabrication process can deliver the tangible benefits that we’ve come to expect from a die shrink. That’s comforting news after our faith was shaken by the Pentium 4’s power and heat problems at 90nm. Not only that, but there’s apparently quite a bit of clock frequency headroom left in this 3.46GHz processor. Ours ran stable for hours at 4.26GHz with nary a hiccup.
I’ve never recommended buying a $999 processor and I’m not going to start now, but most of these things are good news for the rest of Intel’s desktop processor lineup, as well. We will have to test the new Pentium D 900 series soon to see exactly how it handles. Products in the Pentium D 900 line are available in the wild right now, and the Extreme Edition 955 should be available in the next week or so, according to Intel.
Our test results make it clear, however, that Intel probably won’t be able to catch up with AMD using processors based on the Netburst microarchitecture; it will have to wait for its new microarchitecture for that. Despite being produced at 90nm and having a much lower clock speed, the Athlon 64 FX-60 nearly ran the tables in our array of benchmarks, and it did so while consuming less power, both at idle and under load, than the Pentium Extreme Edition 955. The FX-60’s performance dominance wasn’t always deep, but it was very wide, with the top spot in only a few tests going to an Intel processor.
I had kind of expected our use, this time out, of newly compiled 64-bit binaries, multithreaded applications that can take advantage of Hyper-Threading, exceptionally fast Crucial 800MHz DDR2 memory, and multithreaded graphics drivers to give the Netburst architecture something of a boost in relative performance. Turns out that wasn’t the case.
AMD was certainly right to choose this time to transition the FX line to dual-core processors. The FX-60 looked every bit a worthy successor to the FX-57 in our gaming tests, and it creamed the FX-57 in pretty much anything multithreaded. It only makes sense for AMD’s image product to be a dual-core design, as Intel’s has been for a while now. Fortunately, the FX-60 requires virtually no compromises of gamers, despite its slightly lower clock speed than the FX-57.
One wouldn’t expect to have to make any compromises if one were paying $1031 for a microprocessor, which is what AMD intends to charge for the Athlon 64 FX-60. Personally, I’d be content with an overclocked Athlon 64 X2 3800+, but if you want to have the fastest desktop CPU money can buy, the FX-60 is undoubtedly it.