Before you nod off to sleep and plant your face in the keyboard, realize that this CPU is actually a pair of chips wound up to 3.73GHz and, well, that’s a lot and stuff. Perhaps more importantly, thanks to refinements to its 65-nanometer manufacturing process, Intel has found a way to crank up the clock frequency while dialing back the heat on this double-barreled blowtorch of death. In fact, it’s more like a blowtorch of pain now, with power consumption actually reduced from the Pentium Extreme Edition 955 we reviewed a few months back, despite the 965’s increased clock speed.
The result could be that this CPU based on a lame-duck microarchitecture manages to do something few Pentiums have done in recent times: catch up with the competition from AMD in terms of performance and power use. And if that doesn’t work, we can always try turning up the clock speed to 4.53GHz, right? Let’s have a look at what Intel’s new fastest processor has to offer.
The 965 takes a bow
For the unfamiliar, Pentium Extreme Edition processors are Intel’s flagship desktop products, the top-of-the-line fastest CPUs it sells, traditionally priced just one buck shy of a cool thousand. The Extreme Edition 965 is primarily distinguished from the previous model 955 by its higher 3.73GHz clock speed. Like the rest of the Pentium D 900 series, Intel manufactures these chips using its 65nm fabrication process, and although they’re billed as dual-core processors, they’re really more like Siamese twins, with a pair of identical Pentium 4 “Cedar Mill” chips arranged together in one package. Each “core” is an independent CPU, complete with its own 2MB of L2 cache onboard.
Because the 965 is an Extreme Edition, though, it has a few extras the Pentium D 900 series lacks. The 965 comes with official support for a 1066MHz front-side bus, allowing it to talk to the rest of the system, and its two cores to one another, at an accelerated pace. Dual-core Extreme Editions also have support for Hyper-Threading, which creates an irresistible bragging-rights scenario. Fire up Windows Task Manager or the like, and you’ll see four virtual CPUs showing on this single-socket wonder. If that’s not enough to impress your friends, perhaps the Extreme Edition’s unlocked multiplier will do the trick. This thing overclocks easily with no need for bus speed adjustments or running the rest of the system at odd frequencies. Heck, the Intel motherboard we used for this review comes complete with easy BIOS-based multiplier adjustments and fine-grained control over CPU overvolting. (There was a day when the impact of those words would have bordered on cataclysmic, but now, such things are practically expected, even from Intel.)
The Pentium Extreme Edition 965 processor comes in a standard LGA775 package
The 965 has also learned a trick the 955 didn’t know: the enhanced “C1E” halt state that kicks in when the operating system lets the CPU know it can sit idle briefly. C1E turns down the CPU clock frequency dynamically, conserving power and reducing heat production. Previous Pentium 4 and D processors came with C1E halt state, but it wasn’t implemented in Intel’s earlier production 65nm processors. This useful mechanism makes a return in recent steppings of the 965, including the one we received for review. When idle, this puppy eases back its clock rate to 3.2GHz. The 965 still doesn’t work with Intel’s Enhanced SpeedStep clock throttling tech, but that’s hardly a major drawback given the minimal practical differences between C1E and SpeedStep.
Here’s your morsel of Moore’s Law food for thought for the day: the Extreme Edition 965 is literally twice the CPU of the Pentium 4 Extreme Edition 3.73GHz introduced just a little more than a year ago. The P4 Extreme Edition 3.73GHz was a single-core 90nm chip based on the same NetBurst microarchitecture, with the same 2MB of L2 cache, the same 1066MHz front-side bus, and obviously the same 3.73GHz clock speed. Only thing is, each of the Extreme Edition 965’s two cores has a faster L2 cache than the older 90nm processors did, so the 965 is a little more than double the fun. Not bad for a year’s progress, huh?
Please note that the two Pentium D 900-series processors in our test are actually a Pentium Extreme Edition 955 chip that’s been set to the appropriate core and bus speeds and had Hyper-Threading disabled in order to simulate the actual products. Performance should be identical to the real McCoys.
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test systems were configured like so:
|Processor||Pentium Extreme Edition 840 3.2GHz; Pentium D 930 3.0GHz; Pentium D 950 3.4GHz||Pentium 4 Extreme Edition 3.73GHz; Pentium Extreme Edition 955 3.46GHz||Pentium Extreme Edition 965 3.73GHz||Athlon 64 X2 3800+ 2.0GHz; Athlon 64 X2 4800+ 2.4GHz; Athlon 64 FX-57 2.8GHz; Athlon 64 FX-60 2.6GHz; Opteron 165 1.8GHz; Opteron 180 2.4GHz|
|System bus||800MHz (200MHz quad-pumped)||1066MHz (266MHz quad-pumped)||1066MHz (266MHz quad-pumped)||1GHz HyperTransport|
|Motherboard||Intel D975XBX||Intel D975XBX||Intel D975XBX||Asus A8N32-SLI Deluxe|
|North bridge||975X MCH||975X MCH||975X MCH||nForce4 SLI X16|
|South bridge||ICH7R||ICH7R||ICH7R||nForce4 SLI|
|Chipset drivers||INF Update 220.127.116.116; Intel Matrix Storage Manager 18.104.22.1685||INF Update 22.214.171.1246; Intel Matrix Storage Manager 126.96.36.1995||INF Update 188.8.131.526; Intel Matrix Storage Manager 184.108.40.2065||SMBus driver 4.5; IDE/SATA driver 5.52|
|Memory size||2GB (2 DIMMs)||2GB (2 DIMMs)||2GB (2 DIMMs)||2GB (2 DIMMs)|
|Memory type||Crucial Ballistix PC2-8000 DDR2 SDRAM at 800MHz||Crucial Ballistix PC2-8000 DDR2 SDRAM at 800MHz||Crucial Ballistix PC2-8000 DDR2 SDRAM at 800MHz||DDR SDRAM at 400MHz|
|CAS latency (CL)||4||4||4||2.5|
|RAS to CAS delay (tRCD)||4||4||4||3|
|RAS precharge (tRP)||4||4||4||3|
|Cycle time (tRAS)||15||15||15||8|
|Hard drive||Maxtor DiamondMax 10 250GB SATA 150|
|Audio||SigmaTel 5.10.4825.0 drivers||SigmaTel 5.10.4825.0 drivers||SigmaTel 5.10.4825.0 drivers||Realtek 220.127.116.1170 drivers|
|Graphics||GeForce 7800 GTX 512 PCI-E with ForceWare 81.98 drivers|
|OS||Windows XP Professional x64 Edition; Windows XP Professional with Service Pack 2 (WorldBench only)|
Thanks to Crucial for providing us with memory for our testing. Their products and support are both far and away superior to generic, no-name memory.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- SiSoft Sandra 2005 SR3 10.10.69 64-bit
- CPU-Z 1.31
- Compiled binary of C Linpack port from Ace’s Hardware
- POV-Ray for Windows 3.6.1 64-bit
- SMPOV 4.3
- Cinebench 2003 64-bit Edition
- 3ds max 7.0 with Service Pack 1
- LAME MT 3.97a 64-bit
- Windows Media Encoder 9 x64 Edition
- Sphinx 3.3
- picCOLOR 4.0 build 561 64-bit
- Half-Life 2 64-bit Edition with trbuggy2 demo
- Battlefield 2 1.12
- FEAR 1.02
- Unreal Tournament 2004 v3369 and 3369 64-bit Edition with trdemo1
- 3DMark05 v120
- WorldBench 5.0
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Although the dual-channel DDR2-800 memory subsystems on our Intel 975X test rigs have a peak theoretical memory bandwidth of 12.8GB/s, the Extreme Edition 965’s system bus maxes out in theory at 8.5GB/s. In actual use, with overhead, our synthetic memory bandwidth tests place this combo at about 6.6GB/s, below its theoretical peak but above anything else around. The dual-core Extreme Edition 965 achieves only a hair’s breadth more throughput than the single-core P4 Extreme Edition 3.73GHz.
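For readers who want to see where those peak figures come from, here is a rough back-of-the-envelope sketch. It assumes the standard 64-bit (8-byte) width for both a DDR2 channel and the front-side bus, which the text above doesn't spell out, and the function name is ours.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bytes, channels=1):
    """Peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_sec * 1e6 * bus_width_bytes * channels / 1e9

# Assumption: each DDR2 channel and the front-side bus are 64 bits (8 bytes) wide.
ddr2_800_dual = peak_bandwidth_gbs(800, 8, channels=2)
# 266.67MHz quad-pumped gives roughly 1066.67 million transfers per second.
fsb_1066 = peak_bandwidth_gbs(4 * 800 / 3, 8)

measured = 6.6  # GB/s, the synthetic test result quoted above
print(f"Dual-channel DDR2-800 peak: {ddr2_800_dual:.1f} GB/s")
print(f"1066MHz front-side bus peak: {fsb_1066:.1f} GB/s")
print(f"Measured share of bus peak: {measured / fsb_1066:.0%}")
```

The bus, not the memory, is the narrower pipe here, which is why the measured number is best judged against the 8.5GB/s figure.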
Our rendition of Linpack is no great measure of scientific computational power, but it does give the Extreme Edition 965 a chance to show off its boffo L2 cache, which markedly outperforms the 90nm Pentium 4 XE 3.73GHz’s like-sized cache. Intel’s 65nm SRAM offers superior performance at the same clock rate.
The 965’s impressive memory bandwidth and large, speedy L2 cache can help mask memory access latencies, but those latencies remain quite a bit longer than on Athlon 64 processors with their built-in, on-chip memory controllers. Note, also, that the Extreme Edition 965’s faster front-side bus doesn’t really help cut access latencies in comparison to the Pentium systems with an 800MHz bus.
We tested F.E.A.R. by manually playing through a specific point in the game five times for each CPU while recording frame rates using the FRAPS utility. Each gameplay sequence lasted 60 seconds. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent and trustworthy results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
We played F.E.A.R. with both CPU and graphics performance options set to the game’s built-in “High” settings.
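To illustrate why we report the median of the lows, here is a minimal sketch of the summary step. The frame rates below are made-up numbers, not results from this review, and taking the mean of the five per-run averages is our assumption about how the averages would be combined.

```python
from statistics import mean, median

# Hypothetical FRAPS results for five 60-second sessions on one CPU:
# (average fps, lowest fps) per session. Not real review data.
runs = [(61.2, 38), (59.8, 35), (60.5, 21), (62.1, 37), (60.9, 36)]

def summarize(runs):
    """Return (mean of the per-run averages, median of the per-run lows)."""
    avg_fps = mean(avg for avg, _ in runs)
    low_fps = median(low for _, low in runs)
    return avg_fps, low_fps

avg_fps, low_fps = summarize(runs)
print(f"Average: {avg_fps:.1f} fps, median low: {low_fps} fps")
```

Note how the one stray 21 fps dip in the third session doesn't drag the reported low down; a plain minimum or mean of the lows would let that single hiccup color the result.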
Above the following benchmark graph, and throughout most of the tests in this review, we’ve included a Task Manager plot showing CPU utilization. These plots were captured on the Pentium Extreme Edition 955 system, and they should offer some indication of how much impact multithreading has on the operation of each application. Single-threaded apps may sometimes show up as spread across multiple processors in Task Manager, but the total amount of space below all four lines shouldn’t equal more than the total area of one square if the test is truly single-threaded. Anything significantly more than that is probably an indication of some multithreaded component in the execution of the test. Because WorldBench’s tests are entirely scripted, however, we weren’t able to capture Task Manager plots for them.
We used FRAPS to capture BF2 frame rates just as we did with F.E.A.R. Graphics quality options were set to BF2’s canned “High” quality profile. This game has a built-in cap at 100 frames per second, and we intentionally left that cap enabled so we could offer a faithful look at real-world performance.
Unreal Tournament 2004
We used a more traditional recorded timedemo for testing UT2004, but we tried out two versions of the game, the original 32-bit flavor and the 64-bit version.
We also decided to try out the 64-bit version of Half-Life 2. This one is also a timedemo.
Our real-world application benchmarks begin painfully for the Extreme Edition 965, as it shows itself to be the cream of Intel’s crop but not up to the task of taking on AMD’s finest. This story will be a familiar one to many watchers of the CPU wars of the past couple years, but things have improved for Intel for several reasons. The Extreme Edition 965’s gravity-defying clock speed is one reason; although its performance per clock may be relatively weak in these types of applications, clock speed makes up for a lot. On top of that, the advent of multithreaded graphics drivers looks like it provides a real boost for the Extreme Edition 965 over its like-clocked P4 Extreme Edition 3.73GHz counterpart. As a result, the 965 could be a credible choice as the centerpiece of a gaming system, with average and median low frame rates that are passable in our tests. Our seat-of-the-pants impression during our gameplay testing was reasonably good, as well. You could save several nice fistfuls of cash and get comparable performance by going with a low-end Athlon 64 X2 instead, though.
Our 3DMark results align pretty well with the results of our gaming tests. 3DMark05’s CPU test is a different animal, though; it’s multithreaded and uses the CPU to perform vertex calculations.
The Extreme Edition 965 excels at handling 3DMark’s multithreaded vertex processing algorithms, beating out even the ferocious Athlon 64 FX-60 in one test.
WorldBench uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. More impressively, WorldBench spits out individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.
Desktop processors from AMD and Intel used to be relatively closely matched in overall WorldBench performance, but the introductions of higher speed grades and revision-E CPU cores from AMD have opened up a performance gap. The Extreme Edition 965 makes up a little ground, but not nearly enough.
Audio editing and encoding
LAME MP3 encoding
LAME MT is, as you might have guessed, a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. You can even download a paper (in Word format) describing the programming effort.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. The author notes, “In general, this approach is highly recommended, for it is exponentially harder to debug a parallel application than a linear one.”
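The linear pipelining idea can be sketched in a few lines. This is not LAME MT’s actual code; `analyze` and `encode` are hypothetical stand-ins for the psycho-acoustic analysis and the rest of the encoder, and the bounded queue is our way of keeping the analysis stage only about a frame ahead.

```python
import queue
import threading

def analyze(frame):
    # Hypothetical stand-in for psycho-acoustic analysis of one frame.
    return sum(abs(sample) for sample in frame)

def encode(frame, analysis):
    # Hypothetical stand-in for the rest of the encoder.
    return (len(frame), analysis)

def pipeline_encode(frames, depth=1):
    """Run analysis on one thread and encoding on another, in frame order."""
    handoff = queue.Queue(maxsize=depth)  # buffered analysis results
    encoded = []

    def analysis_stage():
        for frame in frames:
            handoff.put((frame, analyze(frame)))  # blocks when the buffer is full
        handoff.put(None)  # sentinel: no more frames

    worker = threading.Thread(target=analysis_stage)
    worker.start()
    while True:
        item = handoff.get()
        if item is None:
            break
        frame, analysis = item
        encoded.append(encode(frame, analysis))
    worker.join()
    return encoded

result = pipeline_encode([[1, -2], [3, 4, -5]])
print(result)
```

Because each stage sees frames strictly in order, the output matches what a single-threaded encoder would produce, which is the debugging advantage the author describes. In Python the global interpreter lock keeps this from running truly in parallel for CPU-bound work, so treat it as a picture of the structure rather than a speed-up.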
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here, as we have done in many of our previous CPU reviews.
The Extreme Edition 965 breezes through our audio encoding tests with ease, but the FX-60 again takes the top spot overall.
Windows Media Encoder x64 Edition Advanced Profile
We asked Windows Media Encoder to convert a gorgeous 1080-line WMV HD video clip into a 320×240 streaming format using the Windows Media Video 8 Advanced Profile codec.
Windows Media Encoder
VideoWave Movie Creator
You can’t go wrong with an Extreme Edition 965 for video processing, but some of the AMD processors again finish faster.
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded.
Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.
The 965 further solidifies its grasp on the title of Intel’s Fastest Processor, but it’s a step behind the Athlon 64 X2 4800+ and its twin, the Opteron 180.
Mozilla and Windows Media Encoder
The 965’s performance in office applications is obviously excellent, but not as excellent as the competing AMD products’.
Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.
Despite its single-threaded nature, Sphinx runs faster on the Extreme Edition 965 than on anything else. Sphinx has shown an affinity for large, fast L2 caches with data prefetch mechanisms and systems with lots of memory bandwidth. The 965 delivers in spades on both counts.
WinZip
The 965 XE retains its usual spot above the rest of the Pentiums but below the faster AMD offerings in these tests.
Cinebench measures performance in Maxon’s Cinema 4D modeling and rendering app. This is the 64-bit version of Cinebench, primed and ready for these 64-bit processors.
The Extreme Edition 965 nearly catches up to the Athlon 64 X2 4800+ in Cinebench’s multithreaded rendering test, aided by Hyper-Threading, which the Cinema 4D renderer uses well.
The rest of the Cinebench tests are single-threaded shading exercises, which the 965 handles adeptly.
POV-Ray just recently made the move to 64-bit binaries, and thanks to the nifty SMPOV distributed rendering utility, we’ve been able to make it multithreaded, as well. SMPOV spins off any number of instances of the POV-Ray renderer, and it will divvy up the scene in several different ways. For this scene, the best choice was to divide the screen horizontally between the different threads, which provides a fairly even workload.
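As a rough picture of that kind of split, the sketch below carves an image’s rows into contiguous bands, one per renderer instance. It is only an illustration of the idea; SMPOV’s real partitioning logic and options aren’t shown here, and `split_rows` is a name we made up.

```python
def split_rows(height, workers):
    """Divide `height` rows into `workers` contiguous (start, end) bands."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread any remainder rows
        bands.append((start, start + size))
        start += size
    return bands

bands = split_rows(480, 4)
print(bands)
```

Equal-sized bands only give an even workload if the scene’s complexity is spread fairly evenly across the image, which is why the best split varies from scene to scene.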
We considered using the new beta of POV-Ray with native support for SMP, but it proved to be very, very slow. We’ll have to try it again once development has progressed further.
We’ve been rendering the same scene in POV-Ray for years, and it has been a long, hard struggle for the NetBurst microarchitecture to handle it well. With four threads, though, the 965 comes within a handful of seconds of matching the Athlon 64 X2 4800+. Not too shabby, all things considered.
3ds max 7 rendering
We tested 3ds max performance by rendering 20 frames of a sample scene at 320×240 resolution. This particular scene makes use of a motion-blur effect that requires extensive multi-pass rendering. We tried two different renderers: 3ds max’s default scanline renderer and its built-in version of the mental ray renderer.
3ds max proves to be a tougher challenge for the Extreme Edition 965, especially with the mental ray renderer, where the relatively low-end Athlon 64 X2 3800+ outdoes it.
Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX and SSE/2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:
This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm. The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assigns [sic] each thread to a different CPU.
We’re using the 64-bit port of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations at once.
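Here is a small sketch of the interlacing scheme the FAQ describes, with each thread taking every Nth column of the Mandelbrot image. It is plain scalar Python, not Sandra’s code and not SSE2; the fixed column stride is our simplification of “the next column not being worked on,” and the coordinate window is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def mandel_iters(cx, cy, max_iter=255):
    """Iterations before the point (cx, cy) escapes, capped at max_iter."""
    zx = zy = 0.0
    for i in range(max_iter):
        if zx * zx + zy * zy > 4.0:
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter

def render(width, height, threads, max_iter=255):
    """Render iteration counts, with columns interleaved across threads."""
    image = [[0] * width for _ in range(height)]

    def worker(tid):
        # Thread `tid` handles columns tid, tid + threads, tid + 2*threads, ...
        for x in range(tid, width, threads):
            cx = -2.0 + 3.0 * x / width
            for y in range(height):
                cy = -1.0 + 2.0 * y / height
                image[y][x] = mandel_iters(cx, cy, max_iter)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(worker, range(threads)))
    return image

small = render(16, 8, threads=4)  # tiny image so the demo runs instantly
print(len(small), len(small[0]))
```

Interleaving columns tends to balance the load because neighboring columns cost about the same to compute, so no one thread gets stuck with all of the expensive interior of the set.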
If you want to do lots and lots of iterations, high clock speeds can be a powerful ally, and so it is here. The Extreme Edition 965 posts new records in these tests, leaving the AMDs in the dust.
We measured the power consumption of our entire test systems, except for the monitor, at the wall outlet using a Watts Up PRO watt meter. The test rigs were all equipped with OCZ PowerStream 520W power supply units. The idle results were measured at the Windows desktop, and we used SMPOV and the 64-bit version of the POV-Ray renderer to load up the CPUs. In all cases, we asked SMPOV to use the same number of threads as there were CPU front ends in Task Manager, so four for the Pentium XE 840, two for the Athlon 64 X2, and so on.
The graphs below have results for “power management” and “no power management.” That deserves some explanation. By “power management,” we mean SpeedStep, PowerNow!, or Cool’n’Quiet. In the cases of the Pentium XE 840 and the Pentium XE 965, the C1E halt state is always active, even in the “no power management” tests. The Extreme Edition 955 and the P4 Extreme Edition 3.73GHz don’t support the C1E halt state or SpeedStep. We have omitted the Pentium D 930 and 950 processors here because we don’t have actual samples of these individual chips; our “simulated” versions with an underclocked Extreme Edition 955 are fine for performance testing, but not for power consumption.
In spite of its higher clock speed, the Extreme Edition 965 draws much less power than the 955. That’s impressive. Of course, these are power consumption numbers at the wall socket, so various inefficiencies in the system power supply chain will inflate the differences between the chips to some degree. But over 50W less power use under load for the 965 XE is a noteworthy result, regardless. At idle, the C1E halt state on the 965 kicks in, lowering power draw there compared to the 955 XE. Notice that we also have numbers for the 955 XE on a “new mobo.” You may have noted in our testing methods section that we tested the Extreme Edition 965 on a newer revision of Intel’s “Bad Axe” D975XBX motherboard. We received the new motherboard during our efforts to resolve some overheating problems with our original 955 XE setup. I wanted to be sure the motherboard wasn’t the main cause of the power consumption differences between the 955 and 965 processors, so I tested the 955 on the newer-rev motherboard, as well. As you can see, the motherboard did play a partial role in the power use difference under load, but not at idle.
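To put a rough number on how wall-socket readings inflate chip-to-chip differences, consider the sketch below. The 80% power supply efficiency is purely an assumed figure for illustration, we did not measure it, and the sketch ignores losses in the motherboard’s voltage regulators.

```python
def dc_side_delta(wall_delta_watts, psu_efficiency):
    """Estimate the power difference downstream of the PSU from a wall-socket difference."""
    return wall_delta_watts * psu_efficiency

# Assumption: the power supply is about 80% efficient at this load (hypothetical).
wall_delta = 50.0
estimate = dc_side_delta(wall_delta, psu_efficiency=0.80)
print(f"{wall_delta:.0f}W at the wall is roughly {estimate:.0f}W past the power supply")
```

Under that assumption, a 50W gap at the wall shrinks to about 40W delivered to the system, which is still a sizable difference.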
The most eyebrow-raising result of all here is that the Extreme Edition 965 system consumes no more power under load than the Athlon 64 FX-60 system. That’s huge. I would be more impressed were it not for the relatively high power consumption of the Asus A8N32-SLI motherboard and the nForce4 SLI X16 chipset. The AMD processors can post lower numbers on a different motherboard like the Asus A8R32-MVP. I note that issue because it’s only fair, not to take away from Intel’s accomplishment here, which is still eye-popping. Obviously, Intel’s 65nm fabrication process is improving with time.
I should mention, however, that the cooler Intel shipped with our Extreme Edition 965 review sample was very loud under load, just like the replacement cooler we eventually used to help resolve our thermal problems with the 955 XE. It’s not exactly whisper-quiet at idle, and when you fire off a program that heats up the chip, the cooler spins up linearly in a whining, hissing crescendo. You will almost certainly want to go with an aftermarket cooler with this CPU if this cooler is representative of what Intel is shipping with retail boxed processors.
So the Extreme Edition 965’s performance at stock speeds is decent but unspectacular. Power consumption, however, is magically lower than the 955’s. And it gets even better. Using a Zalman CNPS9500 LED cooler and a bump in voltage from the stock 1.3V to 1.4375V, I was able to overclock the 965 to a staggering 4.53GHz, simply by raising the CPU multiplier in the BIOS.
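The arithmetic behind a multiplier-only overclock is simple enough to sketch. The multipliers of 12, 14, and 17 below are inferred from the clock speeds quoted in this review and the 266MHz bus clock; they aren’t values we’ve listed from the BIOS.

```python
BUS_MHZ = 800 / 3  # ~266.67MHz base clock behind the 1066MHz quad-pumped bus

def core_clock_ghz(multiplier, bus_mhz=BUS_MHZ):
    """Core clock in GHz for a given CPU multiplier and bus clock."""
    return multiplier * bus_mhz / 1000

# Multipliers inferred from the quoted speeds (assumption, not read from the BIOS).
idle, stock, overclocked = (core_clock_ghz(m) for m in (12, 14, 17))
clock_gain = overclocked / stock - 1
voltage_gain = 1.4375 / 1.3 - 1

print(f"C1E idle: {idle:.2f}GHz, stock: {stock:.2f}GHz, overclocked: {overclocked:.2f}GHz")
print(f"Clock increase: {clock_gain:.1%}, voltage increase: {voltage_gain:.1%}")
```

Since only the multiplier changes, the front-side bus and memory keep running at their stock speeds, which is what makes this kind of overclock so painless.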
The processor was stable at this speed while running four simultaneous instances of Prime95’s torture test loop for 15-20 minutes, so I decided to go ahead and run some benchmarks.
When the Extreme Edition flips bits at 4.53GHz, its performance is directly in league with the Athlon 64 FX-60, even in UT2004, which has given this CPU microarchitecture nothing but fits over the years. I was curious to see what this massive overclock did for power consumption, so here are the numbers.
The C1E halt state brings the overclocked 965 back to 3.2GHz at idle, just like it does at stock speeds. However, the higher CPU voltage raises idle power consumption. Under load at 4.53GHz, the 965 doesn’t exactly conform to the Kyoto protocol, but it could be worse, as the Extreme Edition 840 is.
I said in my review of the Extreme Edition 955 a few months ago that Intel wouldn’t likely catch up to AMD using processors based on the NetBurst microarchitecture. My faith in that prediction has been shaken somewhat by the Extreme Edition 965’s combination of overclocking headroom and reduced power draw. This CPU is still no match for the Athlon 64 FX-60, or even the Athlon 64 X2 4800+, when running at its default 3.73GHz clock speed. But this puppy is a powerful reminder of the benefits better process technology can bring. For its mission in life as a thousand-dollar play-toy that will serve as the centerpiece of an ultra-high-end PC, likely with exotic cooling and extensive overclocking, the Extreme Edition 965 is a startlingly worthy rival to the Athlon 64 FX-60. That mission isn’t exactly a populist one, and the 965’s virtues don’t put Intel’s other desktop processors on par with the Athlon 64 X2 in the meatier, more value-driven part of the market. For high-rent PCs, though, the 965 has undeniable appeal, not that I recommend dropping a grand on a processor. My Midwestern sensibilities would never condone such madness.
Intel also plans to bump up the regular Pentium D to 3.6GHz with the release of the Pentium D 960. Had I known that sooner, I’d have disabled Hyper-Threading and underclocked the Extreme Edition 965 in order to provide some performance numbers for it. Unfortunately, I found out too late, so we’ll have to look at the Pentium D 960 in a future article. Intel also hasn’t yet set pricing on the 960, so I can’t comment on its likely mix of price and performance.
The Extreme Edition 965 is almost certainly one of the last of its kind before the sun sets on the NetBurst microarchitecture and on the Pentium name, believe it or not. It’s also the best of its breed, as is expected in the ever-progressing world of microprocessors. Already, though, most of our attention is focused intently on the promise of what comes next: a new microarchitecture with much higher performance per clock and per watt than this one. Given what Intel’s 65nm fab process has been able to do for the Extreme Edition 965, AMD may have one heck of a fight on its hands if the upcoming Core microarchitecture is anywhere near reasonably competent.