Good things often come in powers of two, especially in computers. Two, four, eight, or sixteen copies of a common resource (rendering pipelines, megabytes of memory, processors, what have you) are instantly recognizable quantities that will most assuredly lead to additional goodness.
But three? Not so much.
Oh, sure, you have the odd exceptions, like a three-disk RAID 5 array or three-way SLI, but these are exceptions, and they are quite literally odd. Even less common is the case of three CPUs. I’ve been racking my brains for a few days trying to come up with past examples of three-way multiprocessor configurations in PC history, and I’ve been coming up blank. Now that I’ve said that, some old-timer will post in the comments about the Univac EP-3333, to which he fed punch cards back in the day. Bully for you, Methuselah, but my point remains: triple-processor configurations are exceptionally rare in the PC world.
They are, however, about to get a whole heckuva lot more common thanks to AMD’s new triple-core Phenom X3 processors. These are essentially just quad-core chips with one core disabled, sacrificed for the cause of product segmentation. Can’t you just hear millions of tiny transistors screaming out in pain and then going silent? The core-botomy has happy side-effects, though, not least of which is extending the Phenom lineup to under 150 bucks.
The advent of these triple-core specimens raises some intriguing questions. Can AMD gain ground on Intel’s very potent dual-core CPUs by disabling a core and slashing its prices? Will the Phenom’s relatively low per-core performance be offset by the presence of a third core? What’s the right tradeoff here? We’ve taken these questions as an excuse to run way too many benchmarks on the new Phenom X3 chips. Then we made up some answers. Keep reading to see what we found.
The powers of three
Since the Phenom X3 is simply a quad-core Phenom with one core disabled, there’s not much new to know about it. (You can find out more about the Phenom itself in our original review of it.)
We should talk about cache briefly, though. Each core on a Phenom has 64K of L1 cache and 512KB of L2 cache associated with it, so Phenom X3s have a total of 1.5MB of L2 cache onboard. The L2 caches are augmented by a larger 2MB level-3 cache designed to assist in data sharing between the cores. CPU geeks take note: because the Phenom shares its L3 cache between all of its cores in a round-robin fashion, we had suspected the deletion of one core might reduce L3 cache access latencies, an Achilles’ heel of the Phenom architecture. Unfortunately, we weren’t able to measure this effect in our testing. At 2.4GHz, the Phenom X3 and X4 have more or less identical access latencies.
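For anyone who wants to check the cache arithmetic, here is a minimal sketch. It assumes only the per-core L2 size and shared L3 size quoted above; the function name is made up for illustration.

```python
# Cache sizes quoted in the text, in KB.
L2_PER_CORE_KB = 512
L3_SHARED_KB = 2048  # the 2MB L3 is shared, so it does not scale with core count


def total_l2_kb(cores: int) -> int:
    """L2 is private to each core, so the total scales with the core count."""
    return cores * L2_PER_CORE_KB


for name, cores in (("Phenom X3", 3), ("Phenom X4", 4)):
    l2_mb = total_l2_kb(cores) / 1024
    print(f"{name}: {l2_mb:.1f}MB total L2, {L3_SHARED_KB // 1024}MB shared L3")
```

Disabling a core takes its private 512KB of L2 with it, which is why the X3 lands at 1.5MB while the shared L3 stays put.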
Anyhoo, that’s the cache picture. Here’s how the Phenom lineup looks with the addition of the triplets:
X4 9850 Black Edition
Notice, first, that all of the Phenoms in the table above come with model numbers ending in -50. That means they’re all based on B3-revision silicon, which squashes the unfortunate TLB bug. AMD has been selling tri-core Phenoms based on older silicon with -00 model numbers through PC vendors, but only these newer chips should make it into regular distribution channels.
Step past that issue, and your eye will probably focus on the $195 price point, where AMD presents us with a perplexing choice. You can buy a Phenom X4 9550 with four cores and a 2.2GHz clock speed, or you may choose a Phenom X3 8750 with three cores at 2.4GHz for the same price. On the face of it, giving up a CPU core in order to gain 200MHz seems like a bad bargain to me. Then again, Phenoms are currently a little low on single-threaded performance, so perhaps the compromise works. The 8750’s closest competition from Intel is probably the Core 2 Duo E8400, which lists at $183.
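The $195 dilemma can be put in rough numbers. The sketch below is a back-of-the-envelope comparison, not a performance model: it assumes perfect scaling across cores for the multithreaded case and identical per-clock performance (both chips are Phenoms), and the `Chip` class and "core-GHz" metric are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    clock_ghz: float

    def aggregate_ghz(self) -> float:
        """Upper bound on throughput if a workload scaled perfectly across cores."""
        return self.cores * self.clock_ghz


x4_9550 = Chip("Phenom X4 9550", cores=4, clock_ghz=2.2)
x3_8750 = Chip("Phenom X3 8750", cores=3, clock_ghz=2.4)

# Best case for the X3: a single thread that only cares about clock speed.
single_thread_gain = x3_8750.clock_ghz / x4_9550.clock_ghz - 1
# Worst case for the X3: a workload that keeps every core busy.
multi_thread_loss = 1 - x3_8750.aggregate_ghz() / x4_9550.aggregate_ghz()

print(f"Single-threaded: X3 ahead by about {single_thread_gain:.0%}")
print(f"Fully threaded:  X3 behind by about {multi_thread_loss:.0%}")
```

Under those assumptions the X3 8750 gains roughly 9% on one thread and gives up roughly 18% when all cores are loaded. Real applications fall somewhere between the two extremes.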
Things become infinitely simpler as we move down the price ladder and the quad-core options become more distant. At $145, the Phenom X3 8450 brings AMD’s new microarchitecture into territory formerly occupied by the Athlon 64 X2. The X3 8450 has a relatively low 2.1GHz clock frequency, but packs a third core; the tradeoff here is clear. This product will face off against Intel’s Core 2 Duo E7200, similarly priced at $133.
Three cores is weird
There, I’ve said it. You know you were thinking it. We’re modern folks, open to many possibilities in life, including this one. But three cores is just plain weird. You will need to know this before making the decision to drop a Phenom X3 into your own computer.
This weirdness manifests itself in several ways. Although many of the applications we use for CPU testing had no trouble recognizing the X3’s triple cores and putting them to good use, some did. Several SiSoft Sandra modules lost bladder control when asked to quantify the performance of a tri-core processor and simply refused to run. Microsoft’s Windows Media Encoder pegged the X3 at 67% utilization and would go no further; two cores were all it would use. Even the 32-bit versions of Windows Vista apparently have trouble recognizing odd numbers of CPU cores. Already, updates are becoming available to fix some of these problems, but owners of Phenom X3s are bound to run into such issues over the next little while as software developers adjust to unconventional core counts.
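That 67% figure is what you would expect when an application spins off only two busy threads on a three-core chip. A minimal sketch, assuming each busy thread can saturate at most one core (the function name is hypothetical):

```python
def max_utilization_pct(busy_threads: int, cores: int) -> float:
    """Highest overall CPU utilization reachable when each busy thread
    can saturate at most one core."""
    usable_cores = min(busy_threads, cores)
    return 100.0 * usable_cores / cores


print(round(max_utilization_pct(2, 3)))  # two threads on a triple-core: 67
print(round(max_utilization_pct(2, 4)))  # two threads on a quad-core: 50
print(round(max_utilization_pct(4, 3)))  # four threads on a triple-core: 100
```

An application that sized its thread pool to the actual core count would not hit this ceiling. The trouble comes from code that only expects one, two, or four cores.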
We totally faked some of this. I don’t actually have a Core 2 Quad Q9450, so I underclocked a Core 2 Extreme QX9650 in order to simulate one. The performance should be exactly the same, so no worries. I can’t say the same for sure about power consumption, though, so I left the Q9450 out of those tests. We used the same basic method to simulate a Core 2 Duo E8400 with an underclocked E8500 and a Phenom X3 8450 with an underclocked 8750.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test systems were configured like so:
|Processor||Core 2 Quad Q6600 2.4GHz||Core 2 Duo E6750 2.66GHz
Core 2 Extreme QX6850 3.00GHz
Core 2 Extreme QX9770 3.2GHz
Core 2 Extreme QX9775 3.2GHz
| Athlon 64 X2 5600+ 2.8GHz
Athlon 64 X2 6000+ 3.0GHz
Athlon 64 X2 6400+ 3.2GHz
|Core 2 Extreme QX9650 3.00GHz||Phenom X3 8450 2.1GHz
Phenom X3 8750 2.4GHz
Phenom X4 9850
Core 2 Duo E7200 2.53GHz
Core 2 Duo E8400 3.0GHz
Core 2 Duo E8500 3.16GHz
Core 2 Quad Q9300 2.5GHz
Core 2 Quad Q9450 2.66GHz
|System bus||1066MHz (266MHz quad-pumped)||1333MHz (333MHz quad-pumped)||1600MHz
|1GHz HyperTransport||1GHz HyperTransport|
|Motherboard||Gigabyte GA-P35T-DQ6||Gigabyte GA-P35T-DQ6||Gigabyte
|Asus M2N32-SLI Deluxe||MSI
V1.2B1 (TLB patch)
|North bridge||P35 Express MCH||P35 Express MCH||X38
|nForce 590 SLI SPP||790FX|
|South bridge||ICH9R||ICH9R||ICH9R||6321ESB ICH||nForce 590 SLI MCP||SB600|
|Chipset drivers||INF Update 126.96.36.1993
Intel Matrix Storage Manager 7.5
|INF Update 188.8.131.523
Intel Matrix Storage Manager 7.5
|INF Update 184.108.40.2063
Intel Matrix Storage Manager 7.5
Intel Matrix Storage Manager 7.8
|Memory size||4GB (4 DIMMs)||4GB (4 DIMMs)||4GB (4 DIMMs)||4GB
|4GB (4 DIMMs)||4GB (4 DIMMs)|
|Memory type||Corsair TWIN3X2048-1333C9DHX
DDR3 SDRAM at 1066MHz
DDR3 SDRAM at 1333MHz
DDR2 SDRAM at 800MHz
ECC DDR2-800 FB-DIMM at 800MHz
DDR2 SDRAM at ~800MHz
DDR2 SDRAM at 800MHz
|CAS latency (CL)||8||8||4||5||4||4|
|RAS to CAS delay (tRCD)||8||9||4||5||4||4|
|RAS precharge (tRP)||8||9||4||5||4||4|
|Cycle time (tRAS)||20||24||18||18||18||18|
with Realtek 220.127.116.1149 drivers
with Realtek 18.104.22.16849 drivers
with Realtek 22.214.171.12449 drivers
with SigmaTel 6.10.5511.0 drivers
|Integrated nForce 590 MCP/AD1988B
with Soundmax 126.96.36.19900 drivers
with Realtek 188.8.131.5232 drivers
|Hard drive||WD Caviar SE16 320GB SATA|
|Graphics||GeForce 8800 GTX 768MB PCIe with ForceWare 163.11 and 163.71 drivers|
|OS||Windows Vista Ultimate x64 Edition|
|OS updates||KB940105, KB929777 (nForce/790FX systems only), KB938194, KB938979|
Please note that testing was conducted in two stages. Non-gaming apps and Supreme Commander were tested with Vista patches KB940105 and KB929777 (nForce/790FX systems only) and ForceWare 163.11 drivers. The other games were tested with the additional Vista patches KB938194 and KB938979 and ForceWare 163.71 drivers.
Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.
Our single-socket test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket systems were powered by PC Power & Cooling Turbo-Cool 1KW-SR power supplies. Thanks to OCZ for providing these units for our use in testing.
Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.
The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.
We used the following versions of our test applications:
- SiSoft Sandra XI.SP4a 64-bit
- CPU-Z 1.40
- WorldBench 6 beta 2
- Team Fortress 2
- Lost Planet: Extreme Condition with DirectX 10
- BioShock 1.0 with DirectX 10
- Supreme Commander 1.1.3260
- Valve VRAD map build benchmark
- Valve Source Engine particle simulation benchmark
- Cinebench R10 64-bit Edition
- POV-Ray for Windows 3.7 beta 21a 64-bit
- CASE Lab Euler3d CFD benchmark multithreaded edition
- MyriMatch proteomics benchmark
- notfred’s Folding benchmark CD 8/8/07 revision
- picCOLOR 4.0 build 598 64-bit
- The Panorama Factory 4.5 x64 Edition
- Windows Media Encoder 9 x64 Edition
- LAME MT 3.97a 64-bit
- VirtualDub 1.7.6 with DivX 6.7
The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Team Fortress 2
We’ll kick off our gaming tests with some Team Fortress 2, Valve’s class-driven multiplayer shooter based on the Source game engine. In order to produce easily repeatable results, we’ve tested TF2 by recording a demo during gameplay and playing it back using the game’s timedemo function. In this demo, I’m playing as the Heavy Weapons Guy, with a medic in tow, dealing some serious pain to the blue team.
We used a relatively low display resolution with low levels of filtering and AA in order to prevent the graphics card from becoming a primary performance bottleneck, so we could show you the performance differences between the CPUs. We tested at 1024×768 resolution with the game’s detail levels set to their highest settings. HDR lighting and motion blur were enabled. Antialiasing was disabled, and texture filtering was set to trilinear filtering only.
Notice the little green plot with four lines above the benchmark results. That’s a snapshot of the CPU utilization indicator in Windows Task Manager, which helps illustrate how much the application takes advantage of up to four CPU cores, when they’re available. I’ve included these Task Manager graphics whenever possible throughout our results. In this case, Team Fortress 2 looks like it probably only takes full advantage of a single CPU core, although Nvidia’s graphics drivers use multithreading to offload some vertex processing chores.
Our first test is a single-threaded game, and it illustrates nicely the dilemma posed by the Phenom X3. The 8750 is every bit as fast as the more expensive Phenom X4 9750 because it runs at the same clock frequency. Dropping a core simply doesn’t hurt here, which is not a bad tradeoff. On the other hand, the Phenom X3 8450 has trouble keeping pace with the rest of the pack because of its combination of a low clock speed and modest per-clock performance. The 8450’s additional core is of no use, and the similarly priced Athlon 64 X2 5600+ outperforms it.
Unfortunately for AMD, all of this product positioning banter seems strangely academic when the Core 2 Duo E7200 outruns anything AMD has to offer. The single-core/single-threaded performance of the Phenom simply isn’t up to the standard set by Intel’s Core 2 processors right now.
Lost Planet: Extreme Condition
Lost Planet puts the latest hardware to good use via DirectX 10 and multiple threads (as many as eight, in fact). Lost Planet’s developers have built a benchmarking tool into the game, and it tests two different levels: a snow-covered outdoor area with small numbers of large villains to fight, and another level set inside of a cave with large numbers of small, flying creatures filling the air. We’ll look at performance in each.
We tested this game at 1152×864 resolution, largely with its default quality settings. The exceptions: texture filtering was set to trilinear, edge antialiasing was disabled, and “Concurrent operations” was set to match the number of CPU cores available.
Lost Planet’s Snow test is pretty much graphics-bound and so not terribly interesting to us, except to offer the lesson that in some games, any reasonably good CPU will do. The Cave level, meanwhile, shows us the other side of the triple-core compromise. This test puts multiple cores to good use, and as a result, the Phenom X3 8450 delivers higher frame rates than the Core 2 Duo E7200. The X3 8750, though, still can’t quite catch the Core 2 Duo E8400, whose two execution cores combine higher frequencies with strong clock-for-clock throughput.
BioShock
We tested BioShock by manually playing through a specific point in the game five times while recording frame rates using the FRAPS utility. The sequence? Me trying to fight a Big Daddy, or more properly, me trying not to die for 60 seconds at a pop.
This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
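To see why the median is the better pick for the lows, consider a minimal sketch. The frame rates below are invented for illustration, not measured, and averaging the per-session averages is an assumption about how the runs are combined.

```python
from statistics import mean, median

# Hypothetical (average FPS, lowest FPS) pairs from five manual sessions.
sessions = [(71.2, 38), (69.8, 41), (72.5, 22), (70.1, 40), (71.9, 39)]

avg_fps = mean(avg for avg, _ in sessions)
low_fps_median = median(low for _, low in sessions)
low_fps_mean = mean(low for _, low in sessions)

print(f"Average FPS across sessions: {avg_fps:.1f}")
print(f"Median of the lows: {low_fps_median}")  # ignores the one bad session
print(f"Mean of the lows:   {low_fps_mean}")    # dragged down by the 22 FPS outlier
```

One hiccup in a single session (the 22 FPS low) pulls the mean of the lows down to 36, while the median stays at 39, closer to what the other four sessions experienced.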
For this test, we largely used BioShock’s default image quality settings for DirectX 10 graphics cards, but again, we tested at a relatively low resolution of 1024×768 in order to prevent the GPU from becoming the main limiter of performance.
BioShock performance suffers a little bit with slower processors, but once you reach a certain level of performance, the pack bunches up into what is essentially a many-way tie. Both Phenom X3s are potent enough to fit into the pack, more or less. The 8450’s lower clock speed puts it right on the bubble, just a few FPS behind the Core 2 Duo E7200.
Supreme Commander
We tested performance using Supreme Commander’s built-in benchmark, which plays back a test game and reports detailed performance results afterward. We launched the benchmark by running the game with the “/map perftest” option. We tested at 1024×768 resolution with the game’s fidelity presets set to “High.”
Supreme Commander’s built-in benchmark breaks down its results into several major categories: running the game’s simulation, rendering the game’s graphics, and a composite score that’s simply composed of the other two. The performance test also reports good ol’ frame rates, so we’ve included those, as well.
Supreme Commander gives us one more example of a game whose performance isn’t especially CPU-bound. The Phenom X3s don’t gain much here from their extra cores, but like the other processors, they offer acceptable performance.
Valve Source engine particle simulation
Next up are a couple of tests we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up a couple of benchmarks to demonstrate the benefits of multithreading.
The first of those tests runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.
Thanks to a nicely multithreaded workload, the X3 8450 scores a clear win over the Core 2 Duo E7200 here. Once again, though, the X3 8750’s additional core isn’t sufficient to close the gap with the E8400.
Valve VRAD map compilation
This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to precompute lighting that goes into games like Half-Life 2. This isn’t a real-time process, and it doesn’t reflect the performance one would experience while playing a game. Instead, it shows how multiple CPU cores can speed up game development.
And just like that, a pattern begins to emerge. With relatively parallel workloads, the Phenom X3 8450 outperforms the Core 2 Duo E7200, but the Core 2 Duo E8400 more than holds its own against the X3 8750.
WorldBench
WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests. Because WorldBench’s tests are entirely scripted, we weren’t able to capture Task Manager plots for them, as you’ll notice.
Most of today’s desktop applications are single-threaded, and WorldBench’s overall score reflects that fact. Consequently, the Phenom X3s place near the bottom of the pack. Here’s another case where the Core 2 Duo E7200 convincingly beats out any AMD processor.
Productivity and general use software
MS Office productivity
Firefox web browsing
Multitasking – Firefox and Windows Media Encoder
WinZip file compression
Nero CD authoring
WorldBench’s MS Office and Firefox/Windows Media Encoder tests are both multitasking workloads where multiple applications are open and in use simultaneously. This is a place where the Phenom X3’s extra core might make a difference, and perhaps it does. However, the X3 8750 still finishes behind the E8400 in both cases, just as the X3 8450 places behind the E7200. As for the tri-versus-quad question, the X3 8750 consistently edges out the Phenom 9500.
The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.
picCOLOR image analysis
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded, and in this latest revision, five of those eight functions use four threads.
Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1 GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.
The Phenom X3s don’t show well in our image manipulation tests, sadly. In case you were wondering, The Panorama Factory does appear to be making use of all three cores. picCOLOR, though, doesn’t seem to be. CPU utilization tops out at 67%, suggesting it’s using only two threads on the X3.
Video encoding and editing
Windows Media Encoder x64 Edition video encoding
Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.
Like picCOLOR, Windows Media Encoder only spins off two threads on the X3, maxing out at 67% CPU utilization. AMD says it’s working with Microsoft on this issue.
Windows Media Encoder video encoding
Roxio VideoWave Movie Creator
VirtualDub and DivX encoding with SSE4
Here’s a brand-new addition to our test suite that should allow us to get a first look at the benefits of SSE4’s instructions for video acceleration. In this test, we used VirtualDub as a front-end for the DivX codec, asking it to compress a 66MB MPEG2 source file into the higher compression DivX format. We used version 6.7 of the DivX codec, which has an experimental full-search function for motion estimation that uses SSE4 when available and falls back to SSE2 when needed. We tested with most of the DivX codec’s defaults, including its Home Theater base profile, but we enabled enhanced multithreading and, of course, the experimental full search option.
The rest of our video encoding tests also seem to participate in the 67% pathology. This last test, with DivX and SSE4, definitely does. However, we’ve included this particular VirtualDub/DivX test largely as a showcase for SSE4’s potential. The other three are better real-world examples of video encoding workloads.
LAME MT audio encoding
LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors. You can download a paper (in Word format) describing the programming effort.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
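The two-stage arrangement described above can be sketched as a simple producer/consumer pipeline. This is not LAME MT's code: `analyze` and `encode` are hypothetical stand-ins, and the sketch only shows the structure (in CPython, the global interpreter lock would keep CPU-bound threads like these from actually running in parallel).

```python
import queue
import threading


def analyze(frame):
    """Stand-in for the psycho-acoustic analysis stage (hypothetical)."""
    return sum(frame) / len(frame)


def encode(frame, analysis):
    """Stand-in for the rest of the encoder (hypothetical)."""
    return (len(frame), analysis)


def run_pipeline(frames):
    # A one-slot buffer holds finished analysis results and keeps the
    # analysis stage from running far ahead of the encoder.
    buffered = queue.Queue(maxsize=1)
    encoded = []

    def analysis_stage():
        for frame in frames:
            buffered.put((frame, analyze(frame)))
        buffered.put(None)  # sentinel: no more frames

    def encode_stage():
        while True:
            item = buffered.get()
            if item is None:
                break
            frame, analysis = item
            encoded.append(encode(frame, analysis))

    workers = [threading.Thread(target=analysis_stage),
               threading.Thread(target=encode_stage)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return encoded


print(run_pipeline([[1, 2, 3], [4, 5, 6]]))
```

With exactly two stages there are exactly two threads to schedule, so a third or fourth core has nothing to do no matter how the buffer is sized.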
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.
With only two threads in play, the Phenom X3s must rely on only two cores, which makes for tough going.
Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores are available.
Here’s a more parallel workload, and what do you know? The competition from Intel brackets the two X3s, with the E8400 just above and the E7200 just below.
We caved in and moved to the beta version of POV-Ray 3.7 that includes native multithreading. The latest beta 64-bit executable is still quite a bit slower than the 3.6 release, but it should give us a decent look at comparative performance, regardless.
3ds max modeling and rendering
The remaining rendering tests yield mixed results. In both POV-Ray’s chess2 scene and the 3ds max rendering test, the Phenom 9500 clearly outperforms the Phenom X3 8750.
Folding@Home
Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.
The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.
notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.
On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
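The final estimate described above boils down to an average and a multiplication. Here is a minimal sketch of that step only; the per-WU numbers and the "other" label are made up, and this is not notfred's actual script.

```python
def estimated_points_per_day(ppd_by_wu_type: dict[str, float], cores: int) -> float:
    """Average the per-core points per day across WU types, then scale by core count."""
    per_core_average = sum(ppd_by_wu_type.values()) / len(ppd_by_wu_type)
    return per_core_average * cores


# Hypothetical per-core results for four WU types (illustrative values only).
sample = {"Tinker": 100.0, "Amber": 120.0, "Gromacs": 300.0, "other": 280.0}

print(estimated_points_per_day(sample, cores=3))  # 600.0
print(estimated_points_per_day(sample, cores=4))  # 800.0
```

Because the per-core average is simply multiplied by the core count, the estimate assumes every core performs the same no matter how many are busy, which is one reason to treat it as a rough guide.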
This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.
Core count trumps single-threaded performance here. The Phenom 9500 easily produces more points per day, in total, than the X3 8750, and the 8750 in turn outproduces the Core 2 Duo E8400. Still, the E8400 keeps things very close. If the WUs you’re crunching mainly use the Gromacs cores, I expect the E8400 would outproduce the X3 8750 overall.
Power consumption and efficiency
Now that we’ve had a look at performance in various applications, let’s bring power efficiency into the picture. Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system: the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.
Almost all of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile. The exception here was the Skulltrail system, since its BIOS didn’t support SpeedStep.
Anyhow, here are the results:
Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.
Somewhat unexpectedly, the Phenom X3 8750’s idle power consumption is literally no lower than the quad-core Phenoms’, despite the fact that one of its execution cores is entirely disabled. Interesting.
Next, we can look at peak power draw by taking an average from the ten-second span from 30 to 40 seconds into our test period, during which the processors were rendering.
The Phenom X3’s peak power draw looks to be quite a bit lower than its quad-core counterpart’s. Even so, the X3 pulls more juice than the Core 2 Quad Q6600, a quad-core processor produced on Intel’s older 65nm process tech. Intel’s 45nm dual-cores, the closest performance and price competition, use substantially less power.
Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.
We can quantify efficiency even better by considering the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
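The power metrics described in this section can all be derived from one logged trace. The sketch below assumes one reading per second and uses an invented trace (a 60-second render at 200W followed by 40 seconds of idle at 120W); the function names are hypothetical.

```python
def window(samples, start_s, end_s, interval_s=1.0):
    """Samples whose timestamps fall in [start_s, end_s)."""
    first = int(start_s / interval_s)
    last = int(end_s / interval_s)
    return samples[first:last]


def average_power_w(samples, start_s, end_s, interval_s=1.0):
    chunk = window(samples, start_s, end_s, interval_s)
    return sum(chunk) / len(chunk)


def energy_j(samples, start_s, end_s, interval_s=1.0):
    """Rectangle-rule integral of power over time: watts x seconds = joules."""
    return sum(window(samples, start_s, end_s, interval_s)) * interval_s


# Hypothetical wall-socket log in watts, one reading per second.
log = [200.0] * 60 + [120.0] * 40
render_end_s = 60  # assumed known from the render's completion time

print("Peak power (30-40s):", average_power_w(log, 30, 40), "W")
print("Idle power (trailing edge):", average_power_w(log, 90, 100), "W")
print("Total energy:", energy_j(log, 0, len(log)), "J")
print("Render energy:", energy_j(log, 0, render_end_s), "J")
```

A chip that draws more power but finishes sooner shortens the render window, which is how it can end up using fewer joules for the render itself.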
The Phenom X4 9750 may draw more power under load than the X3 8750, but it finishes first, allowing it to consume less energy while rendering the scene. More strikingly, Intel CPUs are vastly more power efficient overall than AMD’s, by every measure. Quite the reversal from several years ago.
Overclocking
The X3 8750 made it up to 2.8GHz (well, OK, 2.796GHz, if you want to be exact) on a 233MHz HT clock without much fuss, at stock voltage, but then it hit a wall. I tried raising the core voltage as high as 1.432V, but was unable to get it to boot Windows when clocked at 2.88GHz. Dropping down to 2.82GHz wasn’t any help, either. So I settled on 2.8GHz.
Notice that the north bridge is overclocked to 2.1GHz in the screenshot above. Our MSI K9A2 Platinum motherboard apparently had no facility for adjusting the north bridge multiplier or for locking down the PCI Express clock. Either one of these things may have been holding back my overclocking efforts, but it’s hard to say. That’s a true disappointment, because the K9A2 Platinum is one of the better, more reasonably priced Socket AM2+ motherboards around. Asus’ 790FX board is more tweakable, but costs more than any Phenom, which is a little upside down. If you want to overclock a Phenom, you’re better off getting the Phenom X4 9850 Black Edition, which has an unlocked multiplier.
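The clock speeds mentioned here follow from a reference clock times a fixed multiplier. The sketch below assumes a 200MHz stock HT reference clock, which gives the 2.4GHz X3 8750 a 12x multiplier. The north bridge multiplier of 9 is inferred from the 2.1GHz figure, not something stated above.

```python
STOCK_REF_MHZ = 200     # assumed stock HyperTransport reference clock
STOCK_CORE_MHZ = 2400   # Phenom X3 8750 stock clock, from the text
CPU_MULTIPLIER = STOCK_CORE_MHZ / STOCK_REF_MHZ  # works out to 12x


def core_clock_mhz(ref_mhz: float, multiplier: float = CPU_MULTIPLIER) -> float:
    """Core clock produced by a given reference clock and a locked multiplier."""
    return ref_mhz * multiplier


def ref_clock_for(target_mhz: float, multiplier: float = CPU_MULTIPLIER) -> float:
    """Reference clock needed to hit a target core clock."""
    return target_mhz / multiplier


print(core_clock_mhz(233))  # 2796.0, the "2.8GHz" result
for target in (2820, 2880):
    print(target, "MHz needs a", ref_clock_for(target), "MHz reference clock")

# Inferred, not stated in the article: a 9x north bridge multiplier.
NB_MULTIPLIER = round(2100 / 233)
print(233 * NB_MULTIPLIER)  # 2097, i.e. roughly 2.1GHz
```

With a locked CPU multiplier, every step up in core clock drags the north bridge (and anything else tied to the reference clock) along with it, which is why an adjustable north bridge multiplier matters.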
Conclusions
A three-core processor is a little bit oddball, but I’ve made peace with the concept. For multitasking or more parallelizable workloads like image and video processing, choosing three lower speed execution cores over two higher performance ones might make some sense. This is a debatable proposition because desktop PC software has been annoyingly slow to migrate to multiple threads overall, but I see the potential merits. I can even see past the momentary road bumps we hit in programs like Windows Media Encoder, where only two cores were put to use. Those problems will be ironed out in due time.
The Phenom X3 processors’ problems aren’t in the concept, but the execution. The three cores simply aren’t quick enough, individually, to make this triple-core product look appealing. They’re a liability in single- and dual-threaded tasks, where the X3 8750 sometimes falls behind the much older Athlon 64 X2. The X3 8450 almost always does so. And the cores aren’t quick enough to really sell the three-way concept when they are all working together. More than once, we saw the Core 2 Duo E8400 outperform the Phenom X3 8750 in intelligently multithreaded applications. Not only that, but the savings in peak power draw from deactivating a core weren’t enough to put the Phenom into the same league as Intel’s 45nm chips, which are astoundingly power efficient.
I can’t help but think this all must have looked different on AMD’s roadmap when it was first being put together. I doubt they expected that the fastest Phenom would only run at 2.4GHz and, in doing so, would only just match the Core 2 Quad Q6600, an older product on the way out, replaced by the Core 2 Quad Q9300. That’s the reality, though, and it’s constrained AMD’s pricing so much that the top Phenom quad core is $235. The compression through the rest of the lineup makes the triple-core value proposition suspect. Give up a core to get 200MHz more at $195? Not likely when the Phenom X4 9850 Black Edition, at 2.5GHz with an unlocked multiplier, is 40 bucks more. The logic of the pricing scheme may be internally consistent, but the stakes are too low. I’d go with the X4 9850 ten times out of ten. If, that is, I were somehow bound and determined to choose an AMD processor over one of Intel’s current offerings.