Five desktop quad-core solutions compared

AS YOU MAY KNOW, we’ve already reviewed the top-end desktop quad-core processor rigs from Intel and AMD. We examined the Core 2 Extreme QX6700 upon its debut, and we covered AMD’s Quad FX platform even before it hit store shelves. What we found, in a nutshell, is that four processor cores is a wonderful thing to have, but only if you have some heavy multitasking to do or you happen to make extensive use of one of the few applications out there capable of taking full advantage of four cores simultaneously. But things have changed somewhat since our last dance with quad-core systems, and so we’re gathered here today to take another look.

Chief among the new developments is the availability of cheaper—err, less expensive—quad-core options like the Core 2 Quad Q6600 and Athlon 64 FX-70. Intel and AMD like to showcase their top performing chips in order to show off what they can do, but top-speed-grade processors are rarely the best values. What’s more, we’ve found that practically any top-speed-grade incarnation of a processor tends to be in a rough spot with respect to heat and power consumption. Lower speed grades promise higher performance per watt.

For instance, the basic power and heat rating, or TDP, of the Core 2 Extreme QX6700 is 130W. Although it’s the same technology and runs only 266MHz slower, the Core 2 Quad Q6600’s TDP is an Al Gore-approved 105W. Officially, the Athlon 64 FX-70’s thermal power rating is the same considerable 125W per chip (in a two-chip solution) as its bigger brother, the FX-74, but we had a hunch the 2.6GHz FX-70 wouldn’t be the same class of double-barreled blowtorch as the 3GHz FX-74. There is, of course, one way to find out: test ’em. And so that’s what we’ve done.

We’ve also recently made the transition to Windows Vista for our test platforms, a move that promises to take better advantage of these quad-core system architectures in various ways. Join us as we fire up our widely multithreaded suite of test applications, many of them 64-bit executables, to see which quad-core solution offers the best mix of price, performance, and energy efficiency.

Four cores, two chips, and how many sockets?
Since we’ve already reviewed both of these basic technologies, I will spare you most of the gory details here. If you’d like more info on the “Kentsfield” quad-core processors from Intel, go have a look at our Core 2 Extreme QX6700 article. And if you’re unfamiliar with AMD’s Quad FX platform, you can find our take on it right here. The Cliff’s Notes version goes like this: neither of these products features a truly native quad-core processor. Kentsfield consists of two Core 2 Duo chips cuddled up together in a single package. You get four cores in total, but in groups of two, and the two chips have to communicate with one another by means of the system’s front-side bus. The great advantage of this scheme is that Core 2 Quad variants can use the same motherboard infrastructure as the Core 2 Duo.


A pair of Athlon 64 FX-70s and the Core 2 Quad Q6600.

Quad FX, meanwhile, uses a variation of the Opteron infrastructure, complete with dual CPU sockets and a pair of dual-core processors, in order to reach four cores in a desktop platform. Quad FX’s main concession to desktop use is that it doesn’t require registered DIMMs, just run-of-the-mill DDR2 memory. Although it makes for larger, pricier motherboards, Quad FX’s dual-socket arrangement arguably offers better integration between two dual-core chips thanks to AMD’s HyperTransport interconnect. Because each CPU has dual memory channels hanging off of it, Quad FX also has twice the peak theoretical memory bandwidth of the Core 2 Quad—or more, if you count the Core 2 Quad’s front-side bus as the primary constraint.
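For the curious, the bandwidth claim above is easy to check with some back-of-the-envelope math. This is only a sketch: it assumes 64-bit (8-byte) DDR2 channels and a 64-bit front-side bus, DDR2-800 memory in dual-channel mode as in our test rigs, and decimal gigabytes. The function and variable names are made up for illustration.

```python
# Peak theoretical bandwidth = transfers per second x bytes per transfer x channels.
BYTES_PER_TRANSFER = 8  # assumption: 64-bit DDR2 channel and 64-bit front-side bus


def peak_gb_per_s(mega_transfers: float, channels: int = 1) -> float:
    """Peak bandwidth in GB/s (decimal) for a 64-bit-wide interface."""
    return mega_transfers * BYTES_PER_TRANSFER * channels / 1000


# DDR2-800 in dual-channel mode.
core2_memory = peak_gb_per_s(800, channels=2)        # one memory controller, in the chipset
quad_fx_memory = 2 * peak_gb_per_s(800, channels=2)  # one memory controller per socket
core2_fsb = peak_gb_per_s(1066)                      # 1066MHz quad-pumped front-side bus

print(f"Core 2 Quad memory:  {core2_memory:.1f} GB/s")
print(f"Core 2 Quad FSB:     {core2_fsb:.1f} GB/s")
print(f"Quad FX memory:      {quad_fx_memory:.1f} GB/s")
print(f"Quad FX vs. FSB:     {quad_fx_memory / core2_fsb:.1f}x")
```

Measured against the memory channels alone, Quad FX has double the peak bandwidth. Measured against the front-side bus, the gap on paper is larger still. Real-world throughput will be lower on both platforms.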

Making the most of the Quad FX platform’s memory architecture, however, can be tricky. Since each CPU has its own integrated memory controller, access to RAM can be very quick, but if CPU 1 has to access data in the memory space controlled by CPU 2, memory access will be considerably slower. Fortunately, Windows Vista incorporates kernel improvements for non-uniform memory access (NUMA) architectures like Quad FX, so testing with this OS ought to let the Quad FX platform put its best foot forward.

As I’ve mentioned, the Core 2 Quad’s motherboard situation is simple: it will work with most newer motherboards intended for the Core 2 Duo. Quad FX’s motherboard situation is also simple, but in a different way: you can choose any motherboard you want, so long as it’s the Asus L1N64-SLI WS. This beast is indeed a decent enthusiast-class motherboard, but it’s a little too, er, enthusiastic for some of us, with a total of four PCIe x16 slots, 12 SATA ports, and dual Nvidia chipsets that draw nearly as much power as a theater screening An Inconvenient Truth. Here’s another inconvenient truth: the L1N64-SLI WS costs $350.

That little wrinkle complicates life for the Quad FX platform. AMD has said that more Quad FX motherboards are on the way, but as far as we know, no others are imminent. For now, Quad FX remains tied to one board.

AMD has attempted to keep the price equation in line by selling Quad FX processors at relatively reasonable prices. The original plan was to sell them in pairs at $599 per pair for the FX-70, $799 per pair for the FX-72, and $999 per pair for the FX-74. In fact, they’re still listed in pairs on AMD’s price sheet today. If you go to buy ’em at Newegg, though, you’ll find the CPUs priced and selling individually at a bit of a premium over list. Two FX-70s will set you back about $614, while two FX-72s are $820, and a couple of FX-74s total up to $1040. That’s still not a bad price for the FX-70, relatively speaking, but then you have to take the $350 mobo into account. All told, the cheapest Quad FX mobo and processor combo will add $964 to your next Visa bill, while the most expensive one rings up just ten bucks short of $1400.

By contrast, quality mobos for Intel’s quad-core processors can be had for under $180—well under, if you’re not too concerned about high-end features like dual PCIe x16 slots. Probably the most apt comparison to the L1N64-SLI WS, if there is such a thing, is eVGA’s version of the Nvidia nForce 680i SLI motherboard, which currently lists for $220. The combination of a Core 2 Quad Q6600 and this mobo costs $1066 at Newegg. Switching to a Core 2 Extreme QX6700 would raise the total to $1190 for the mobo and CPU.
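If you want to play along at home, the platform math from the last two paragraphs can be tallied in a few lines. The prices are the ones quoted above. The "implied" Intel CPU prices are simply what is left after subtracting the $220 board from the combo totals, so treat them as derived figures rather than quoted ones.

```python
QUAD_FX_BOARD = 350  # Asus L1N64-SLI WS
INTEL_BOARD = 220    # eVGA nForce 680i SLI

# Street prices for a pair of Quad FX processors.
quad_fx_pairs = {"FX-70": 614, "FX-72": 820, "FX-74": 1040}
# Quoted motherboard-plus-CPU totals for the Intel quad-core options.
intel_combos = {"Q6600": 1066, "QX6700": 1190}

quad_fx_combos = {name: pair + QUAD_FX_BOARD for name, pair in quad_fx_pairs.items()}
implied_intel_cpu = {name: total - INTEL_BOARD for name, total in intel_combos.items()}

for name, total in quad_fx_combos.items():
    print(f"2x {name} + L1N64-SLI WS: ${total}")
for name, total in intel_combos.items():
    print(f"{name} + 680i SLI: ${total} (implied CPU price ${implied_intel_cpu[name]})")
```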

None of these are exactly value products, but Quad FX systems wind up costing more than you might think, if you were to start with the AMD processor price list. Of course, the question of value becomes cloudy when talking about four-core rigs whose two main components add up to a grand or more. I should say right here, at the outset, that few of us really need four cores, especially since few of today’s applications can really put ’em all to use at once. Glance over the test results in this article if you want to see that fact illustrated. We have now compiled a nice set of apps for our test suite, most of which can use four cores, but we had to dig a little bit to find them. So the results you’re about to see are as much about potential as about what most folks would get out of a quad-core system today—something to keep in mind.

 

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

We’ve tested our quad-core systems against one another and against the top dual-core desktop processors, as well. In some cases, getting the results meant simulating a slower chip with a faster one. For instance, our Core 2 Duo E6700 is actually a Core 2 Extreme X6800 processor clocked down to the appropriate speed. Its performance should be identical to that of the real thing. Similarly, our Athlon 64 FX-72 results come from an underclocked pair of Athlon 64 FX-74s.

Our test systems were configured like so:

| | Intel Core 2 systems | Athlon 64 X2 system | Quad FX systems |
|---|---|---|---|
| Processor | Core 2 Duo E6700 2.66GHz; Core 2 Extreme X6800 2.93GHz; Core 2 Quad Q6600 2.4GHz; Core 2 Extreme QX6700 2.66GHz | Athlon 64 X2 6000+ 3.0GHz (90nm) | Athlon 64 FX-70 2.6GHz; Athlon 64 FX-72 2.8GHz; Athlon 64 FX-74 3.0GHz |
| System bus | 1066MHz (266MHz quad-pumped) | 1GHz HyperTransport | 1GHz HyperTransport |
| Motherboard | Intel D975XBX2 | Asus M2N32-SLI Deluxe | Asus L1N64-SLI WS |
| BIOS revision | BX97520J.86A.2618.2007.0212.0954 | 0903 | 0205 |
| North bridge | 975X MCH | nForce 590 SLI SPP | nForce 680a SLI |
| South bridge | ICH7R | nForce 590 SLI MCP | nForce 680a SLI |
| Chipset drivers | INF Update 8.1.1.1010; Intel Matrix Storage Manager 6.21 | ForceWare 15.00 | ForceWare 15.00 |
| Memory size | 2GB (2 DIMMs) | 2GB (2 DIMMs) | 2GB (4 DIMMs) |
| Memory type | Corsair TWIN2X2048-6400C4 DDR2 SDRAM at 800MHz | Corsair TWIN2X2048-8500C5 DDR2 SDRAM at 800MHz | Crucial Ballistix PC6400 DDR2 SDRAM at 800MHz |
| CAS latency (CL) | 4 | 4 | 4 |
| RAS to CAS delay (tRCD) | 4 | 4 | 4 |
| RAS precharge (tRP) | 4 | 4 | 4 |
| Cycle time (tRAS) | 12 | 12 | 12 |
| Audio | Integrated ICH7R/STAC9274D5 with Sigmatel 6.10.0.5274 drivers | Integrated nForce 590 MCP/AD1988B with Soundmax 6.10.2.6100 drivers | Integrated nForce 680a SLI/AD1988B with Soundmax 6.10.2.6100 drivers |
| Hard drive | Maxtor DiamondMax 10 250GB SATA 150 | Maxtor DiamondMax 10 250GB SATA 150 | Maxtor DiamondMax 10 250GB SATA 150 |
| Graphics | GeForce 7900 GTX 512MB PCIe with ForceWare 100.64 drivers | GeForce 7900 GTX 512MB PCIe with ForceWare 100.64 drivers | GeForce 7900 GTX 512MB PCIe with ForceWare 100.64 drivers |
| OS | Windows Vista Ultimate x64 Edition | Windows Vista Ultimate x64 Edition | Windows Vista Ultimate x64 Edition |
| OS updates | | | |

Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.

Also, all of our test systems were powered by OCZ GameXStream 700W power supply units. Thanks to OCZ for providing these units for our use in testing.

The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

The Elder Scrolls IV: Oblivion
We tested Oblivion by manually playing through a specific point in the game five times while recording frame rates using the FRAPS utility. Each gameplay sequence lasted 60 seconds. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
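To make the reporting method concrete, here is a small sketch of how five sessions boil down to two numbers. The frame rates are invented for illustration and are not our results, and the sketch assumes the reported average is the plain mean of the five per-session averages.

```python
from statistics import mean, median

# Hypothetical (average FPS, lowest FPS) pairs from five 60-second FRAPS sessions.
sessions = [(92.4, 61), (95.1, 58), (90.8, 33), (93.7, 60), (94.0, 62)]


def summarize(runs):
    """Return (mean of the session averages, median of the session lows)."""
    averages = [avg for avg, _ in runs]
    lows = [low for _, low in runs]
    return mean(averages), median(lows)


avg_fps, low_fps = summarize(sessions)
print(f"Average FPS: {avg_fps:.1f}")
print(f"Median low FPS: {low_fps}")
```

Note how the one ugly session with a low of 33 FPS has no effect on the reported low. That is the point of taking the median rather than the minimum or the mean.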

For this test, we set Oblivion’s graphical quality to “Medium,” but with HDR lighting enabled and vsync disabled, at 800×600 resolution. We’ve chosen this relatively low display resolution in order to prevent the graphics card from becoming a bottleneck, so differences between the CPUs can shine through.

Notice the little green plot with four lines above the benchmark results. That’s a snapshot of the CPU utilization indicator in Windows Task Manager, which helps illustrate how much the application takes advantage of up to four CPU cores, when they’re available. I’ve included these Task Manager graphics whenever possible throughout our results. In this case, Oblivion really only takes full advantage of a single CPU core, although Nvidia’s graphics drivers use multithreading to offload some vertex processing chores.

I guess you could say things don’t start well for the Quad FX rigs, but all of these systems can run Oblivion smoothly without breaking a proverbial sweat. You simply don’t need a high-end processor to run many of today’s games quite well. This one is mostly a speed contest involving one or two cores, as the dual-core Core 2 Extreme X6800’s first-place finish indicates. Overall, the Intel systems sweep the top spots.

Rainbow Six: Vegas
Rainbow Six: Vegas is based on Unreal Engine 3 and is a port from the Xbox 360. For both of these reasons, it’s one of the first PC games that’s multithreaded, and ought to provide an illuminating look at CPU gaming performance.

For this test, we set the game to run at 800×600 resolution with high dynamic range lighting disabled. “Hardware skinning” (via the GPU) was disabled, leaving that burden to fall on the CPU. Shadow quality was set to very low, and motion blur was enabled at medium quality. I played through a 90-second sequence of the game’s Terrorist Hunt mode on the “Dante’s” level five times, capturing frame rates with FRAPS, as we did with Oblivion.

The Task Manager plots and the benchmark results agree: Rainbow Six: Vegas doesn’t really seem to use more than two cores to any great effect (at least on the PC). Regardless, all of our test rigs are able to run the game very smoothly, with very little separation between them performance-wise.

 

Valve Source engine particle simulation
Next up are a couple of tests we picked up during a visit to Valve Software, the developers of the Half-Life games. They’ve been working to incorporate support for multi-core processors into their Source game engine, and they’ve cooked up a couple of benchmarks to demonstrate the benefits of multithreading.

The first of those tests runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects is limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

Both the Intel and AMD systems scale up nicely from two cores to four here. The Core 2 Duo E6700 is the dual-core equivalent, at 2.66GHz, of the Core 2 Extreme QX6700, and its performance is just about half that of the QX6700. Similarly, the Athlon 64 X2 6000+ is a single 3GHz processor, while the FX-74 system uses two of the same; the FX-74 is nearly twice as fast as the X2 6000+. The big contrast here is that the Intel processing cores execute the particle simulation much faster, so they come out on top.

Valve VRAD map compilation
This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to precompute lighting that goes into its games. This isn’t a real-time process, and it doesn’t reflect the performance one would experience while playing a game. It does, however, show how multiple CPU cores can speed up game development.

Again we see reasonably good scaling from two cores to four, but the Intel systems are faster overall.

 

3DMark06
3DMark06 combines the results from its graphics and CPU tests in order to reach an overall score. Here’s how the processors did overall and in each of those tests.

As you’ll see below, performance in the graphical tests in 3DMark06 is limited almost entirely by the graphics card (which is as it should be for a graphics benchmark). 3DMark06 also incorporates a couple of multithreaded CPU tests that involve physics simulations, AI, and game logic. That’s where we’ll see separation between the CPUs.

3DMark’s CPU tests give us a glimpse of the potential for quad-core CPUs in games, and indications are good. Even the slowest quad-core solution, the FX-70, comes out well ahead of the top dual-core processor.

 

The Panorama Factory
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.

Virtually everything this program does is widely multithreaded, and the quad-core systems outperform the dual-cores substantially. Yet again, the Core 2 processors are faster than the Athlon 64s. Those of you who are exceptionally bright may be noticing the beginnings of a pattern there.

picCOLOR
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded, and in this latest revision, five of those eight functions use four threads.

Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1 GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.

Look at those trippy colors in the picCOLOR screenshot. I’ve replicated them in the lower graph. Sweet, eh? Oh, and the Core 2 processors come out ahead again.

 

Windows Media Encoder x64 Edition
Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

We keep varying the tests and trying new things, and the Core 2 processors keep beating the Athlon 64s. This time around, the Core 2 Quad Q6600 finishes 20 seconds before the Athlon 64 FX-74.

LAME MP3 encoding
LAME MT is the multithreaded version of the LAME MP3 encoder that we discussed earlier. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. (Of course, multithreading works even better on multi-core processors.) You can download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
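Here is a minimal sketch of that two-stage pipeline idea. It is not LAME MT’s code: the `analyze` and `encode` functions are hypothetical stand-ins, and the one-slot buffer is my assumption about how "one frame ahead" might be modeled.

```python
import queue
import threading


def analyze(frame):
    """Hypothetical stand-in for the psycho-acoustic analysis step."""
    return sum(frame) / len(frame)


def encode(frame, analysis):
    """Hypothetical stand-in for the rest of the encoder."""
    return (len(frame), analysis)


def pipelined_encode(frames, depth=1):
    """Run analysis on one thread and encoding on another, linked by a small buffer."""
    handoff = queue.Queue(maxsize=depth)  # buffered analysis results
    encoded = []

    def analysis_stage():
        for frame in frames:
            handoff.put((frame, analyze(frame)))  # blocks while the buffer is full
        handoff.put(None)  # sentinel: no more frames

    def encoding_stage():
        while (item := handoff.get()) is not None:
            frame, analysis = item
            encoded.append(encode(frame, analysis))

    stages = [threading.Thread(target=analysis_stage),
              threading.Thread(target=encoding_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return encoded


print(pipelined_encode([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

With only two stages, there are only ever two threads with work to do, which is why extra cores beyond the second sit idle. (In Python this shows the structure only; the interpreter’s global lock keeps it from delivering a real speedup.)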

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here, as we have done in many of our previous CPU reviews.

With only two threads at work, the fastest dual-core solution grabs the top spot, and that’s definitely the Core 2 Extreme X6800.

 

Cinebench
Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores are available.

At last, an AMD system picks up an undisputed victory as the FX-74 comes in first. The FX-70 even outperforms the Core 2 Quad Q6600 for once.

POV-Ray rendering
We’ve finally caved in and moved to the beta version of POV-Ray 3.7 that includes native multithreading. The latest beta 64-bit executable is still quite a bit slower than the 3.6 release, but it should give us a decent look at comparative performance, regardless.

Chalk up another win for AMD in POV-Ray, where the three FX rigs finish 1-2-3. Obviously, both the Athlon 64 and Core 2 systems scale well to four threads in this app.

The POV-Ray benchmark scene uses some features that our Chess2 scene does not, and some of those features are not (yet?) multithreaded. As a result, the dual-core processors finish stronger here. The Intel CPUs also make up some ground on the AMDs.

 

MyriMatch
Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He recently offered to provide us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.

In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
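The job-queue scheme described above can be sketched roughly as follows. This is an illustration of the general technique, not MyriMatch’s actual source: the function names are hypothetical, and the `handle` callback stands in for whatever work is done per protein.

```python
import queue
import threading


def split_into_jobs(proteins, threads, jobs_per_thread=10):
    """Cut the in-memory database into roughly threads x 10 equal-sized jobs."""
    job_count = threads * jobs_per_thread
    size = max(1, -(-len(proteins) // job_count))  # ceiling division
    return [proteins[i:i + size] for i in range(0, len(proteins), size)]


def process_database(proteins, threads, handle):
    """Each worker pulls a job, handles every protein in it, then asks for another."""
    jobs = queue.Queue()
    for job in split_into_jobs(proteins, threads):
        jobs.put(job)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            handled = [handle(protein) for protein in job]
            with lock:  # one synchronization point per job, not per protein
                results.extend(handled)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


jobs = split_into_jobs(list(range(6714)), threads=4)
print(len(jobs), len(jobs[0]))
```

With the 6714-protein yeast database and four threads, that works out to 40 jobs of up to 168 sequences apiece. Handing out work in job-sized chunks keeps locking rare, while having ten jobs per thread means a thread that finishes early can grab more work instead of sitting idle.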

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to four threads. Also, this is a newer version of the MyriMatch code than we’ve used in the past, with a larger spectral collection, so these results aren’t comparable to those in some of our past articles.

Here’s an intriguing result. Notice that the FX-72 and FX-74 are neck and neck with four threads, and the FX-72 is actually faster with one and two threads. I believe that’s the result of a quirk of Athlon 64 processors. Since they base their memory clocks on overall CPU speeds, they don’t always run their RAM at the precise frequency requested. In this case, the FX-74 is running at 3GHz, and its memory is running at one eighth that speed, or 375MHz. The FX-72, on the other hand, can run its memory at one seventh its 2.8GHz clock speed, or right at 400MHz (that’s 800MHz DDR) on the nose. MyriMatch looks to be hitting the memory subsystem hard enough that the FX-72’s faster RAM essentially offsets its lower CPU frequency. If you watch, you’ll see this effect subtly at work in some of our other results where the FX-74 isn’t much faster than the FX-72, but it’s more prominent here.
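The divider arithmetic is simple enough to sketch. I’m assuming the memory controller picks the smallest whole-number divider that keeps the RAM at or below the requested 400MHz, which matches the one-eighth and one-seventh figures above. The function name is made up, and the FX-70 figure that falls out of it is a consequence of that assumption rather than something we measured.

```python
import math


def athlon64_memory_clock(cpu_mhz, requested_mem_mhz=400):
    """Return (divider, actual memory clock in MHz) under the whole-divider assumption."""
    divider = math.ceil(cpu_mhz / requested_mem_mhz)
    return divider, cpu_mhz / divider


for name, cpu_mhz in [("FX-70", 2600), ("FX-72", 2800), ("FX-74", 3000)]:
    divider, mem_mhz = athlon64_memory_clock(cpu_mhz)
    print(f"{name}: {cpu_mhz}MHz / {divider} = {mem_mhz:.1f}MHz ({2 * mem_mhz:.0f}MHz DDR)")
```

Only CPU clocks that are an even multiple of 400MHz, like the FX-72’s 2.8GHz, land on the full requested memory speed.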

Even in this case where the Athlon 64 systems appear to be memory limited, the quad Core 2 systems scale up nicely. The Core microarchitecture’s memory disambiguation feature can be very effective in avoiding memory bottlenecks, and I expect it’s helping out quite a bit in this test.

STARS Euler3d computational fluid dynamics
Our next benchmark is also a relatively new one for us. Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us recently to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here. (I believe the score you see there at almost 3Hz comes from our eight-core Clovertown test system.)

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. I understand the STARS Euler3D routines are both very floating-point intensive and oftentimes limited by memory bandwidth. Charles has updated the benchmark for us to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

The Quad FX systems struggle an inordinate amount here, as witnessed by the fact that the FX-74 is only marginally faster than the Athlon 64 X2 6000+ when four threads are in use. With two threads, the 6000+ is faster. I asked Charles to enable thread control for us because I suspected such scaling problems might be under the surface, given the results we’ve seen. This application may be a case where the program needs to be coded with an explicit awareness of NUMA architectures in order to achieve optimal performance.

Even among the dual-core processors, though, the E6700 is 30% faster than the X2 6000+.

 

Folding@Home
Next, we have another relatively new addition to our benchmark suite: a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
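The final scoring step described above amounts to the following. The per-WU numbers here are invented placeholders, not results from the benchmark CD, and the two Gromacs labels are generic names of my own.

```python
from statistics import mean


def estimated_points_per_day(ppd_by_wu_type, cores):
    """Average the per-core points per day across WU types, then scale by core count."""
    return mean(ppd_by_wu_type.values()) * cores


# Hypothetical per-core points-per-day figures for the four WU types.
sample = {"Tinker": 100.0, "Amber": 120.0, "Gromacs A": 300.0, "Gromacs B": 280.0}

print(estimated_points_per_day(sample, cores=2))
print(estimated_points_per_day(sample, cores=4))
```

Because the estimate is a straight multiple of the per-core average, it assumes every core performs the same and that the WU types show up in equal proportion, which is part of why the method is a little quirky.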

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

As we’ve seen before, the performance race between Intel and AMD for Folding is divided pretty strongly by work unit type. For the Tinker and Amber WUs, the Athlon 64s are easy winners. Just the opposite is true in the two Gromacs WU types, where the Intel processors reign supreme. If you want to pump out lots of work units for your team, though, any brand of quad-core setup will do quite well.

 

SiSoft Sandra Mandelbrot
Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX, SSE, and SSE2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.

The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.

We’re using the 64-bit version of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations in parallel.
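As a rough illustration of the column-interlacing scheme from SiSoft’s FAQ, here is one way to deal out Mandelbrot columns to worker threads. This is not Sandra’s code. It reads "interlacing" as a fixed stride (worker 0 takes columns 0, 4, 8 and so on with four workers), which is only one possible interpretation, and the region of the complex plane is my own choice. It also uses plain floating-point math rather than SSE2.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 640, 480, 255  # picture size and iteration cap from the FAQ


def escape_count(cx, cy, max_iter=MAX_ITER):
    """Iterate z = z^2 + c and return how many steps pass before |z| exceeds 2."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i
        x, y = x * x - y * y + cx, 2 * x * y + cy
    return max_iter


def render_columns(worker, workers, width, height):
    """Compute every `workers`-th column, starting at column `worker`."""
    columns = {}
    for col in range(worker, width, workers):
        cx = -2.0 + 3.0 * col / width  # assumed view: real axis from -2 to 1
        columns[col] = [escape_count(cx, -1.0 + 2.0 * row / height)
                        for row in range(height)]
    return columns


def render(workers, width=WIDTH, height=HEIGHT):
    """Render the image as a list of columns using one thread per worker."""
    image = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda w: render_columns(w, workers, width, height),
                         range(workers))
        for part in parts:
            image.update(part)
    return [image[col] for col in range(width)]
```

Calling `render(4, width=64, height=48)` gives a quick small-scale run. Interleaving the columns spreads the expensive pixels near the middle of the set across all of the threads, so no one thread gets stuck with the slow part of the picture.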

The results from this test are predictably dramatic, since the Core microarchitecture can process 128-bit SSE instructions in a single cycle. We’ll keep tracking the results, though, because we expect big things from AMD’s next-gen Barcelona core when it arrives. That chip will be a native quad-core processor with enhanced 128-bit SSE capabilities, and a pair of ’em should drop right into any Quad FX motherboard to yield an eight-core monster.

 

Power consumption and efficiency
We’re trying something a little different with power consumption. Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, video card, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor and speakers into a separate outlet, though.) We measured how each of our test systems used power during a roughly one-minute period, during which time we executed Cinebench’s rendering test. All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests.

You’ll notice that I’ve not included the Athlon 64 FX-72 here. That’s because our “simulated” FX-72 system is based on underclocked FX-74s, and we can’t enable Cool’n’Quiet on the L1N64-SLI WS motherboard when manual multiplier control is in use. I have included our simulated Core 2 Duo E6700, because SpeedStep works fine on the D975XBX2 motherboard alongside underclocking. The simulated E6700’s voltage may not be exactly the same as what you’d find on many retail E6700s. However, voltage and power use can vary from one chip to the next, since Intel sets voltage individually on each chip at the factory.

The results are dramatically different for the various processors, especially when you look at that 460W-plus peak for the double-barreled blowtorch, the FX-74. Rather than get ahead of ourselves, let’s analyze the data a little bit. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

There’s not much of a power premium at idle for the Intel quad-core processors—about 10-15W over their dual-core counterparts. The Athlon 64 X2 6000+ draws a little more power on our Asus M2N32-SLI Deluxe motherboard than the competing Intel dual-core processors, but the gap is pretty small. And then we come to the Quad FX system, where two separate CPU sockets and two full Nvidia chipsets are onboard. Idle power use is between 50 and 60W higher than the quad-core Intel systems.

Next, we can look at peak power draw by taking an average from the five-second span from 10 to 15 seconds into our test period, during which the processors were rendering.
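To make the averaging concrete, here is a minimal Python sketch of a windowed average over logged readings. It assumes the meter's log has been exported as (seconds, watts) pairs, one per second; the function name, the idle window, and the sample values are hypothetical and not taken from our logs.

```python
def mean_power(samples, start_s, end_s):
    """Average the wattage of all samples with start_s <= t < end_s."""
    window = [watts for t, watts in samples if start_s <= t < end_s]
    if not window:
        raise ValueError("no samples in the requested window")
    return sum(window) / len(window)

# Made-up log: 150W idle, 250W while rendering, then back to idle.
example_log = (
    [(t, 150.0) for t in range(0, 5)]
    + [(t, 250.0) for t in range(5, 40)]
    + [(t, 150.0) for t in range(40, 60)]
)

peak_watts = mean_power(example_log, 10, 15)  # the 10-to-15-second span
idle_watts = mean_power(example_log, 50, 60)  # assumed trailing-edge window
```

Averaging over a few seconds rather than taking a single reading smooths out sample-to-sample jitter in the meter's output.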

Here’s where things become even more interesting. The first effect I want to point out here is how the top speed grades of the quad-core processors tend to consume quite a bit more power than the lower ones. That’s obviously most apparent in the case of the FX-74 versus the FX-70. Despite sharing the same 125W thermal rating, the FX-70 system peaks at “only” 334W, while the FX-74 tops out at 466W. A similar dynamic is at work, in a smaller way, with the Intel quad-core processors, where the Q6600 draws 32W less than the QX6700.

The other result that pops out is how much less power draw we see from the Core 2 CPUs. The Q6600’s peak power draw with multithreaded rendering is a few watts below the idle power draw of the FX-74.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, equivalent to joules.
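Total energy is simply power integrated over time. As a rough sketch of that arithmetic, the Python below sums logged readings with the trapezoidal rule. The (seconds, watts) log format and the function name are assumptions for illustration, and the numbers are invented.

```python
def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule.

    `samples` is a time-ordered list of (seconds, watts) pairs.
    The result is in watt-seconds, which is the same thing as joules.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total

# A made-up system drawing a constant 200W for 60 seconds.
flat_log = [(t, 200.0) for t in range(61)]
flat_energy = energy_joules(flat_log)
```

A constant 200W over 60 seconds works out to 12,000 watt-seconds, which is what the sketch returns for the flat log above.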

All of the systems are more power efficient over the time period when multithreading is employed. However, the dual-core systems use less energy over the duration of the test period than their quad-core counterparts, thanks mainly to their lower power draw at idle.

Finally, we can consider the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve chosen to identify the end of the render as the point where power use begins to drop from its steady peak. There seems to be some disk paging going on after that, but we don’t want to include that more variable activity in our render period.

We’ve computed the amount of energy used by each system to render the scene, expressed in watt-seconds. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
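One way to picture the render-period isolation is the Python sketch below. It is a simplification under stated assumptions, not the exact procedure we followed: it treats the highest reading as the steady peak, counts a sample as "at peak" when it is within 5% of that level, and ends the render at the last sample before power first drops below that band. The names, the tolerance, and the sample log are all hypothetical.

```python
def render_span(samples, tolerance=0.05):
    """Return (start_index, end_index) of the first steady-peak run."""
    peak = max(watts for _, watts in samples)
    floor = peak * (1.0 - tolerance)
    start = next(i for i, (_, watts) in enumerate(samples) if watts >= floor)
    end = start
    while end + 1 < len(samples) and samples[end + 1][1] >= floor:
        end += 1
    return start, end

def render_energy(samples, tolerance=0.05):
    """Watt-seconds used between the start and end of the peak run."""
    start, end = render_span(samples, tolerance)
    span = samples[start:end + 1]
    return sum(
        (p0 + p1) / 2.0 * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(span, span[1:])
    )

# Made-up log: idle, a 300W render, a lower-power tail (paging), idle again.
render_log = (
    [(t, 150.0) for t in range(5)]
    + [(t, 300.0) for t in range(5, 25)]
    + [(t, 180.0) for t in range(25, 30)]
    + [(t, 150.0) for t in range(30, 60)]
)
span_indices = render_span(render_log)
joules_used = render_energy(render_log)
```

Taking the maximum as the peak level is sensitive to a single spike, so a real log might call for a median over the busy period instead. The underlying trade-off is plain arithmetic: energy is average power times time, so a hypothetical chip that takes 10% longer to render but draws 20% less power while doing so would use about 12% less energy.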

This may well be our best measure of energy-efficient performance, and sure enough, quad-core systems with multithreading require the least energy to render the scene. The key thing to notice here is that among their respective types, the lower-speed quad-core processors are most efficient. Even though it takes longer to finish rendering, the Q6600 uses less energy than the QX6700. The same is true of the FX-70 versus the FX-74, although the Quad FX systems are clearly much less efficient overall.

 
Conclusions
When the Quad FX platform first debuted, we were disappointed with its very high peak power consumption and its lack of cheaper, more power-efficient motherboard options. We also had some concerns about whether a NUMA platform was being used optimally in Windows XP. Now, time has passed, Vista has debuted, and we’ve gotten our hands on a pair of FX-70s, but not enough has changed. We were right to suspect that the lower speed grade Quad FX processors might not have the outrageous peak power consumption of the FX-74s. The FX-70s are much tamer on that front, to the tune of roughly 130W less. But Vista’s NUMA-aware kernel hasn’t allowed Quad FX to gain much, if any, performance ground on Intel’s quad-core processors. And the nice-sounding idea of a pair of FX processors for $599 doesn’t truly translate into a good value for a quad-core system. In fact, moving up to a Q6600-based system is worth every penny of the extra hundred bucks or so. As much as we like the idea of a dual-socket platform for PC enthusiasts, Quad FX isn’t likely to gain any traction unless and until we see some more reasonable—in every sense of the word—motherboards for it. Here’s hoping somebody comes out with a Quad FX mobo that has the admirable power efficiency of Opteron systems before AMD scuttles the Quad FX concept entirely.

Meanwhile, Intel’s quad-core processors are just amazing. Their performance doesn’t seem to be significantly impacted by a front-side bus bottleneck, contrary to what one might expect. They typically scale up to four threads at least as well as Quad FX systems. Their power use is restrained, and as we anticipated, the Core 2 Quad Q6600 looks to be even more power-efficient than the Core 2 Extreme QX6700. In fact, the Q6600 proved to be the most power-efficient processor in our render energy test, beating out dual-core chips in the process. That’s poster-boy-type behavior for the multi-core revolution.

As I said earlier, not everyone needs a quad-core CPU, and our gaming tests illustrated that abundantly. Even the higher-end dual-core CPUs aced those tests. For most folks, an affordable dual-core CPU will probably be more than enough processor for the next couple of years. Still, there’s something to be said for quad-core rigs, if you know you’ll put those cores to good use. A Core 2 Quad Q6600 or Core 2 Extreme QX6700 system purchased today should offer ample power for now and could achieve extensive longevity if widely multithreaded applications become more common. 

Comments closed
    • stryqx
    • 13 years ago

    Nice review.
    One thing missing from quad-core reviews is benchmarking of virtualisation systems.
    It would be really good to see benchmarking stats across AMD/Intel quad-cores, as well as how they stack up against the quickest dual-cores.

    • arildo
    • 13 years ago

    Great article. I am surprised that there is software that can use 4 cores at this moment. I like the comparison with dual core CPUs.
    I am into video editing and the test with Windows Media Encoder was interesting. But the file used in the test was not so big, approx. 150 MB. I want to see a similar test with bigger files. Files big enough to involve the disk system. Perhaps a test which shows the difference between CPU-controlled disk systems like SATA and autonomous systems like SCSI or FW.

    • Bensam123
    • 13 years ago

    Not all that related, however not entirely unrelated, it would be nice to see some legacy comparisons. I know you can look at past articles, but so much has changed between older articles and now that they aren’t directly comparable anymore.

    Such as throwing in an Intel 9xx or 8xx, and some single core variants of the 64 or Intel 5xx/6xx. I’m sure there are still people running Athlon XPs and P4 ‘C’s as well.

    Not everyone has a top of the line or new generation system. After older system parts work their way off the benchmarks page people gain the “my computer is still amazing because I can’t see how much it sucks” mind set.

    • albundy
    • 13 years ago

    nice review, but the quads don’t seem to scale properly. maybe it’s the apps. should we be seeing a dramatic improvement over dual core? Hell YEAH!

    • raymin
    • 13 years ago

    Why the crowded graphs?

    TR had it right when they dropped all the other CPUs from the charts and focussed on the direct comparison of the Athlon64 X2 6000 and the C2D E6700 – since as a platform, they had price parity and could therefore be considered direct competitors.

    I wish they had done the same in this case for the FX-70 and Q6600. Previous articles had already established the dualcore-quadcore comparison.

    Anyway, I hold my breath – and my wallet! – for Barcelona! It’ll finally give us AMD users the drop-in-upgrade advantage for quadcore that enthusiast Intel users now enjoy.

    • Coruscant
    • 13 years ago

    Is it nit-picking to note that a Watt-second is really a Joule? I suppose conceptually it might be easier to understand a Watt-second. Draws one watt, and is on for 1 second.

      • Damage
      • 13 years ago

      I dunno. Was I nitpicking when I noted in the review that a watt-second is equivalent to a joule?

        • eitje
        • 13 years ago

        yes. 🙂

      • IntelMole
      • 13 years ago

      I’ve held off on being nit-picky on the phrase “this is equivalent to” as opposed to “this is the unit of”, because it’s … well, nitty nitty nit-picking of the finest order.

        • Mr Bill
        • 13 years ago

        Of the first order, you mean, oh nittiest nit picker?

          • IntelMole
          • 13 years ago

          This is going to turn into a tongue twister in a minute.

          Which is the finest order of nit-picking? I’m thinking it might not be the first. Who knows!

      • eitje
      • 13 years ago

      yes.

    • DrDillyBar
    • 13 years ago

    A 45nm variant of the Q6600 is looking better and better as time passes. Just can’t find a good reason to own one now.
    Glad to see that the FX-70 reduced that power consumption a bit, l[

    • Chaos-Storm
    • 13 years ago

    #35.You can run a benchmark with a command line parameter that runs a simulated game at current graphic settings. I think the command is “map /perftest”. It then returns a score (higher is better with latest patch)

    • Chaos-Storm
    • 13 years ago

    sorry double post

    • wierdo
    • 13 years ago

    Nice review, but pretty early imho, I mean we’re still trying to work with dual cores right now, I don’t think software’s sufficiently taking advantage of just two cores enough 😀

    But, something to keep in mind for three years down the road I guess when they might possibly be more mainstream 😛

    • flip-mode
    • 13 years ago

    NICE review. That was a good mix of systems you put to the test, although the C2D6600’s absence is inexcusable.

    Only the extreme fringe of computer users have any use for quad core, and it is very silly to suggest that making the purchase now can be an investment for the future IMO. Buy now what you need now, buy in the future what you need in the future. In the future there will be $100 quad core CPUs. For the time being, I’m still wondering how, other than folding, to put my dual-core to good use. Sadly, even at work I just don’t do enough with my computer to make dual-core noticeable over single core. Well, I take that back, Firefox now has a core all to itself to senselessly thrash with Flash ads. And I guess there’s virtual machines too. Hey is there any way virtual machine benchmarks can be added to the test suite?

    Too bad there’s no new QuadFX mobos as this article really needed some hardware porn.

    • tfp
    • 13 years ago

    Nice review. I would have loved to see Supreme Commander used as a multi-threaded benchmark. The game in general is a multi-threaded CPU hog. So at least to me, it would have been interesting to see.

      • drfish
      • 13 years ago

      Yeah, I thought for sure it would make an appearance.

        • Damage
        • 13 years ago

        Turns out I couldn’t get sound working on the Intel rig in Supreme Commander. I’m still hoping for a fix, but it will have to wait for a few other things before I can revisit it.

          • tfp
          • 13 years ago

          I know they are having problems with Creative sound cards but I don’t remember reading anything about problems with Intel onboard sound. I didn’t look that hard because I have an old Turtle Beach sound card and sound seems to work.

          I guess you could always run with the /nosound, I know there are people doing that to try to improve performance but on a quad core setup it would be silly not to try to run everything.

          (*edit: I mean leave all SupCom features enabled for the test.*)

          • StashTheVampede
          • 13 years ago

          Can you run the bench without sound on both setups? Even if they aren’t official TR approved results, I’d like to see how well Supreme Commander scales across cores.

            • Damage
            • 13 years ago

            Like I said, it will have to wait for several other things first. When I get back to it, I’ll probably either find a new audio driver for the Intel system or use a sound card. There is no “bench” to run, and you have to play it for quite a while in order for frame rates to average out. Doing that without sound would be… trying.

            Quick preview: X2 6000+ is faster than a pair of FX-70s, so it’s nothing to get too excited about.

            • tfp
            • 13 years ago

            Just as an FYI, at the end of the shortcut you would do the following:

            "(path)\Supreme Commander\bin\SupremeCommander.exe" /map perftest

            or without sound:
            "(path)\Supreme Commander\bin\SupremeCommander.exe" /nosound /map perftest

            With either of these it will run the same bench each time. The results are stored in the SupremeCommander bin file directory. Within that there are 3 scores: one for simulation (CPU), one for render, and a combined score. Elsewhere within that log file there are min, max, and average FPS scores.

            Anyways, I hope that is helpful. I completely understand why SupCom wasn’t benchmarked. A number of people have been having problems getting it to run or run well from what I have read.

            • swaaye
            • 13 years ago

            I really want to see SupCom used as one of the benches in your vid card and CPU roundups/reviews. That game is so demanding!

            Not saying this review is lacking! It is an excellent looksee and informative as ever. Thanks for the work! Duggeded

            • drfish
            • 13 years ago

            Food for thought, here’s a post I made in the forums:

            “I had task manager running on one screen last night and SupCom on the other… 7 supreme AIs, 81km map… CPU usage started out as about 70-80% of one core and RAM at 800-900MB… Over the course of about an hour of play both cores were running at 90-95% and I was up to 1.8GB of memory usage… It was beautiful and frightening to watch… In truth the game had barely started considering the map size, things got ugly really fast after that…”

            By ugly I mean it crashed because I hit the 2GB addressing limit of 32bit XP… I fully believe that a system with 4GB of memory and four cores will allow for games that would be completely unplayable on any dual core system. That’s exciting to me anyway, and the reason I’ll be buying a QX6700, 4GB, and Vista 64-bit soon…

            • flip-mode
            • 13 years ago

            Are there any good reviews of that game about? I know I could Google and find probably a dozen but is there one that is actually worth reading?

            • tfp
            • 13 years ago

            There is a post in the GPG forums for a work around with the 2GB limit.

            I believe the perftest only gets the game into the 400-600 MB area.

            *edit should have been a reply to #41*

            • Shintai
            • 13 years ago

            The /3GB kernel flag? Nothing special about that.

            • tfp
            • 13 years ago

            thanks, that was very helpful…

            • derFunkenstein
            • 13 years ago

            I didn’t know it existed until I read the thread tfp referenced, maybe you shouldn’t be such a snide moron.

            • Shintai
            • 13 years ago

            Eh? Did I miss something? Or you just need to let out some steam at someone?

            • derFunkenstein
            • 13 years ago

            no, you just generally not being very polite. that’s nothing special, either, though…is it? 😉

            • Shintai
            • 13 years ago

            Says the guy that uses the word “moron”? I just said the /3GB flag wasn´t something special. Aka. it doesn´t need some exotic fix and can be used for other games aswell.

            So please relax.

            • Bensam123
            • 13 years ago

            I didn’t know that either…

            Perhaps you should post your “common” knowledge more often, as most people aren’t on that level. It would be beneficial to the entire human race.

            • drfish
            • 13 years ago

            I know, but it’s not even playable for me by the time it hits that limit so I’ll just wait until I’m running 4GB with Vista…

            • swaaye
            • 13 years ago

            CGW inexplicably gave it a 70% in their latest issue. Errr. I mean Games For Windows. I was disturbed by that considering the game seems to be insane fun to play (especially on a LAN w/ friends vs. a few AI). Though, I suppose it is a fairly buggy game! They are patching it rapidly though.

            • Krogoth
            • 13 years ago

            There are a few bugs and exploits; most of the complaints are from users who feel that their epenises are now inadequate. It is safe to say that Supreme Commander is the new Oblivion in terms of pushing systems to their limits. 😉

            Supreme Commander isn’t for the faint-hearted. It can be quick on small maps.

            • derFunkenstein
            • 13 years ago

            My epeen is inadequate. Of course, it’s single-core, too. 😆

            • Freon
            • 13 years ago

            It’s a new game, probably not too unreasonable to give it some time for patches to correct issues.

            But I agree it would be a great benchmark game. It’s very unforgiving, especially on the CPU it seems.

    • Hattig
    • 13 years ago

    It just goes to show that apart from a few scientific and graphical jobs, nobody needs a quad-core system, however the 4 cores are arranged.

    But AMD are hurting in the enthusiast world until Barcelona, and especially the dual-core variants thereof, ship in quantity.

      • Shark
      • 13 years ago

      Like in the Xbox360 and PS3, right?

    • Proesterchen
    • 13 years ago

    And showing just how well its current products are selling, AMD has apparently (though not yet shown on their website) issued a revenue warning for Q1/07:

    http://uk.theinquirer.net/?article=38006 (also reported on CNBC). Or maybe AMD only uses its own products, explaining why the updated forecasts emerge only now, more than 2/3rds through the quarter. ;-)

      • jobodaho
      • 13 years ago

      Why do you care so much about one company? It just doesn’t make sense. Were you around when Netburst was sucking it up? Were you making douchebag fanboy comments then? Go away.

    • mirkin
    • 13 years ago

    Recent reports suggest Al-Gore approved does now include a mansion full of 130W chips – just not your mansion.

    • Proesterchen
    • 13 years ago

    Oh great, /[

      • Krogoth
      • 13 years ago

      Stop complaining.

      The article wasn’t about X2 6000+ versus E6600 comparison.

    • Fighterpilot
    • 13 years ago

    Bottom line, AMD quads are underperforming, power hungry and overpriced... and only work with one very expensive motherboard.
    Hardly the stuff of dreams now, are they...
    Good to see TR has made the switch to Vista 🙂

    • blitzy
    • 13 years ago

    Great article! I really liked the power consumption info, very well thought out and some surprising results (surprising to me probably not so much to the reviewers)

    Can I also suggest Supreme Commander as a test game, apparently it has Quad Core support.

    • crazybus
    • 13 years ago

    1st page
    “…few of us really need for cores”

    should read “four cores”

      • Damage
      • 13 years ago

      Doh, that was caught in editing, yet I failed to fix it. Fixed.

        • Proesterchen
        • 13 years ago

        "...ut we had a hunch the 2.8GHz FX-70 wouldn't be th..." (teaser & page 1)
        FX-70 = 2.6 GHz; 2.8 GHz = FX-72

    • Proesterchen
    • 13 years ago

    q[

      • crazybus
      • 13 years ago

      The Quad FX systems features 4 ie. “quad” cores . . . I don’t get your point.

        • Shintai
        • 13 years ago

        4×4 ain’t quadcore but a standard SMP system like we’ve had for the last, I don’t know, 15-20 years?

        So 2 quadcores compared with 3 SMP systems and 2 dualcores?

        • eloj
        • 13 years ago

        It makes sense to distinguish a quad-core system (a system using CPUs with four cores/CPU) from a four-core system (a system with four cores in total, but potentially fewer than four cores per CPU)

        Since the word “four” is already pretty well known, maybe one could use it for disambiguation.

        I mean, we never spoke of classic 2CPU SMP-systems as “dual-core systems”. It’s just a dumb way to go.

        Edit: 2007 and still no GNU/linux tests? Would at least expect a quick multithread build of some large project to see how these things would do as a dev box or in a compile farm.

          • poulpy
          • 13 years ago

          I second the GNU/Linux tests requests!
          I understand time constraints may not allow such things on the daily but when a deeper analysis of a cpu/arch is done that would be highly appreciated.

          And no need to start FUDing away, any major distro would do it.

          It can get a bit confusing whether you have 4 cores in the box in one, two or four chips (one could also count the number of dies if we want to go that way) but IMO the important thing at the end of the day is that very number of cores and their performance, not really the packaging/name/label you put on it.

      • dragmor
      • 13 years ago

      I’m seeing what you are…

      • toxent
      • 13 years ago

      y[

      • CampinCarl
      • 13 years ago

      keyword in phrase: solutions

      • eitje
      • 13 years ago

      and those just share a package. it’s all an illusion….!

    • Buub
    • 13 years ago

    No big surprises here.

    Will be interesting to see how Barcelona stacks up in this mix once it hits the streets.

      • Damage
      • 13 years ago

      Bump!

    • UberGerbil
    • 13 years ago

    Wow, I wasn’t expecting this kind of review quite so soon in 2007. Vista, 64bit, multithreaded tests… you certainly have been keeping busy.

    Now we just need a /[

      • flip-mode
      • 13 years ago

      So, we should all check back in Q4 (hopefully).
