Intel’s Core 2 Extreme QX9650 processor

We’ve been hearing a lot of doom and gloom about the prospects for microprocessors in the past few years. Pundits have told us that Moore’s Law is destined to hit physical limitations that will bring the incredible every-two-years doubling of CPU power to a screeching halt—and probably sooner rather than later, since we’ve already seen CPUs run into heat and clock speed barriers. They have a point—up to a point. But the constant drumbeat of negativity begins to sound dated as time marches further and further away from Intel’s fiasco with the 90nm Pentium 4.

In fact, such thoughts seem like ancient history today, as we get our first look at the desktop version of Intel’s quad-core processors manufactured on its 45nm fabrication process. This new process not only packs in twice as many transistors as 65nm, but also employs new materials to deliver reductions in electrical current leakage. These changes add up to the sort of generational improvement that transports old codgers like me back to the roaring 1990s, when the horizon for CPU progress seemed limitless.

Of course, these days, Intel has hedged its bets by multiplying the number of cores per processor and ramping up the cadence of design innovations to those cores. The result? The new Core 2 Extreme QX9650 quad-core processor promises big reductions in power consumption and heat production, along with performance increases of up to 20%—at the same 3GHz clock speed as the chip that preceded it. Not that there’s anything wrong with that. In fact, this processor could make the prophets of doom and gloom look like downright fuddy-duddies, if you know what I mean. Keep reading to see whether the QX9650 puts a clown suit on the doubters.


A wafer full of 45nm “Penryn” chips. Source: Intel.

The Penryn lands in the Yorkfield

Those of you sick, sick people who follow CPUs closely are probably already familiar with the bevy of code-names involved here, but I’ll recount the major points for the healthier among us. True to Moore’s Law, Intel’s code names double every 18 to 24 months, so there’s much to track. The most relevant names for our present discussion are Penryn and Yorkfield. Penryn is the name of the basic building block of Intel’s entire 45nm lineup; it is the dual-core 45nm processor design on which most of Intel’s mobile, desktop, and server products will be based. Yorkfield is the first desktop implementation of Penryn, and it’s a two-fer special, situating two dual-core chips together nice and cozy-like in a single LGA775-style package, just as Intel’s Kentsfield quad cores like the QX6850 did before it. The Core 2 Extreme QX9650 will be the first version of Yorkfield to hit the streets.

While we’re dropping names, we should probably enter a couple of others into the discussion. Yorkfield is arriving right on time for a generational battle with its somewhat tardy opponent, AMD’s Phenom processor. The Phenom is based on AMD’s K10 design, and unlike Yorkfield, it incorporates four cores natively onto a single chip—or at least it will when it arrives later this month. We’ve already shown you a preview of this microarchitectural battle in the heavyweight division with our previews of AMD’s K10-based quad-core “Barcelona” Opterons and Intel’s 45nm “Harpertown” Xeons. Now we have a chance to reprise this contest on the desktop, starting with the QX9650.


The QX9650 uses the same LGA775 infrastructure as previous Core 2 processors.

As I’ve mentioned, the key to the QX9650’s advances is Intel’s new 45nm fab process, which represents a fundamental change in the structure of the transistors on a chip. Intel says it’s the biggest advancement in transistor technology since the late 1960s, and this is clearly more than a mere evolutionary step. The new transistor combines a high-k (high-capacitance) gate dielectric, based on hafnium, with a metal gate, and it delivers some eye-popping purported advantages in addition to the customary doubling of transistor density. Among them, Intel claims, are a 30% reduction in switching power, an improvement of over 20% in switching speed, and a more-than-10X reduction in gate oxide leakage. In layman’s terms, that means 45nm chips should be smaller, run faster, and consume less power than Intel’s 65nm parts—which were already quite good.

Each dual-core Penryn chip crams roughly 410 million transistors into just 107 mm². By contrast, the dual-core 65nm Conroe chip fits fewer transistors, 291 million, into a larger 143 mm² die area. Intel has to produce two good chips in order to make one Yorkfield processor, but the small die area involved should make that relatively easy, in terms of avoiding defects and keeping yields high. AMD, on the other hand, has chosen tighter integration and a higher degree of difficulty via a single-chip approach to quad-core processors; each of its upcoming Phenom chips packs 463 million transistors into a 283 mm² die on AMD’s 65nm fab process.


The two cores on a Penryn die mirror each other. Source: Intel.

Penryn isn’t quite so revolutionary on the CPU design front, since it’s based on the same basic microarchitecture as previous Core 2 chips. It ain’t exactly chopped liver, either, since the Core 2 chips are the fastest desktop processors around. What’s more, Intel’s chip architects have endowed Penryn with more than its fair share of new tricks and tweaks. The most visible of those tweaks is a larger (6MB) and smarter (24-way set associative) L2 cache on each chip, shared between the two cores. (That works out to 12MB of total L2 cache in a Yorkfield processor, for my fellow liberal arts degree holders.)
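For the cache-geometry curious, the associativity bump works out neatly. Assuming the usual 64-byte cache lines (my assumption; Intel doesn’t spell it out here), the set count stays constant while capacity and ways grow together:

```python
def l2_sets(size_bytes, ways, line_bytes=64):
    """Sets = (cache size / line size) / associativity."""
    return size_bytes // (ways * line_bytes)

# Penryn: 6MB, 24-way. Conroe: 4MB, 16-way.
print(l2_sets(6 * 1024 * 1024, 24))  # 4096
print(l2_sets(4 * 1024 * 1024, 16))  # 4096
```

Both work out to 4,096 sets, which suggests Intel grew the cache by adding ways rather than sets.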

With the QX9650, Yorkfield begins life riding a 1333MHz front-side bus like older Core 2 CPUs, but that’s not likely to be the limit forever. Penryn-based Xeons will start out on a 1600MHz FSB, and Intel has already demoed a Core 2 Extreme QX9770 with a 1.6GHz bus.

Both the larger cache and faster bus are traditional vehicles for performance gains, but Penryn has some internal execution tweaks, as well. The chip features a new divider, capable of handling both integer and floating-point math. The divider’s radix-16-based design lets it process four bits per cycle (up from two bits in previous chips) and includes an optimized square root function. The divider has an early-out mechanism that can reduce instruction latencies in some cases, too.
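To see why the radix matters, consider a simple latency model for an iterative divider: some fixed setup cost plus one cycle per radix digit of the quotient. The numbers below are illustrative only (the four-cycle overhead is my invention, and real dividers add rounding and early-out logic), but the scaling is the point:

```python
from math import ceil

def divide_cycles(quotient_bits, bits_per_cycle, overhead=4):
    """Toy latency model: fixed overhead plus one iteration per radix digit."""
    return overhead + ceil(quotient_bits / bits_per_cycle)

# A 53-bit double-precision quotient:
print(divide_cycles(53, 2))  # radix-4 (2 bits/cycle): 31 cycles
print(divide_cycles(53, 4))  # radix-16 (4 bits/cycle): 18 cycles
```

The early-out mechanism would trim the iteration count further whenever the operands allow a short quotient.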

Penryn also extends the Core microarchitecture’s 128-bit single cycle SSE capabilities to shuffle operations, potentially doubling execution throughput for certain tasks, including the formatting of data for other SSE-based vector operations.
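If shuffles are a mystery, all they do is rearrange bytes in a register according to a control mask. Here’s a pure-Python model of the byte-shuffle semantics (PSHUFB-style, to the best of my understanding) that Penryn can now execute across a full 128-bit register in a single cycle:

```python
def shuffle_bytes(src, mask):
    """Byte shuffle across a 16-byte 'register': each mask byte selects a
    source byte via its low four bits; a set high bit zeroes that lane."""
    assert len(src) == len(mask) == 16
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)

data = bytes(range(16))
reverse = bytes(15 - i for i in range(16))  # a byte-reversal mask
print(list(shuffle_bytes(data, reverse)))  # [15, 14, ..., 1, 0]
```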

Another common vehicle for performance advances is the addition of tailored instructions for specific uses. Penryn has some of those, too, in the form of SSE4. SSE4 comprises 47 new instructions aimed at HD video acceleration, basic graphics operations (including dot products), and the integration and control of coprocessors over PCI Express links. Developers will have to update their applications and compilers in order to take advantage of these instructions, of course. Fortunately, we’ve been able to include an SSE4-enabled video compression codec in our test suite, as you’ll see.
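As an example of the “basic graphics operations” category, SSE4.1’s DPPS instruction computes a dot product on four packed floats under the control of an immediate byte. A scalar model of its behavior, as I read Intel’s description:

```python
def dpps(a, b, imm):
    """Model of SSE4.1 DPPS: the immediate's high nibble selects which
    element products enter the sum; its low nibble selects which output
    lanes receive the result (the rest become zero)."""
    s = sum(x * y for i, (x, y) in enumerate(zip(a, b)) if imm & (0x10 << i))
    return [float(s) if imm & (1 << i) else 0.0 for i in range(4)]

# Full four-element dot product, broadcast to every lane:
print(dpps([1, 2, 3, 4], [5, 6, 7, 8], 0xFF))  # [70.0, 70.0, 70.0, 70.0]
```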

As the first desktop-oriented derivative of Penryn, the Core 2 Extreme QX9650 is very much a premium product. Like Intel’s other Extreme Editions, the QX9650 has an unlocked upper multiplier and will probably sport a price tag around a grand. Since it drops into LGA775-style sockets, the QX9650 is compatible with many newer Intel-oriented motherboards, especially those based on Intel’s P35 and X38 chipsets, usually with the help of a BIOS update. You’ll want to check with the mobo maker to see whether a particular board supports the QX9650.

As for cooling, Intel officially lists the QX9650’s TDP at 130W, like past Core 2 Extreme processors. I think that’s crazy conservative, like the love-child of Ann Coulter and Pat Buchanan, for reasons that will become clear once you see how it looks on the power meter.

And, as I’ve said, the QX9650 runs at 3GHz on a 1333MHz bus, just like the 65nm Core 2 Extreme QX6850 did before it. The comparison between these two CPUs should give us a nice look at how Penryn/Yorkfield’s architectural tweaks boost clock-for-clock performance.

Before we move on to our results, I should mention that this is an early preview of the QX9650. This product is officially slated to debut, and become available for purchase, on November 12. Intel plans to introduce several 45nm Xeons at the same time, but that will be it for a while. Additional Penryn-based desktop processors, both dual- and quad-core, aren’t expected until early next year.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processors tested:
- Core 2 Quad Q6600 2.4GHz and Core 2 Extreme QX6800 2.93GHz (1066MHz front-side bus)
- Core 2 Duo E6750 2.66GHz, Core 2 Extreme QX6850 3.00GHz, and Core 2 Extreme QX9650 3.00GHz (1333MHz front-side bus)
- Dual Xeon X5365 3.00GHz (1333MHz front-side bus)
- Athlon 64 X2 5600+ 2.8GHz, 6000+ 3.0GHz, and 6400+ 3.2GHz (1GHz HyperTransport)
- Dual Athlon 64 FX-74 3.0GHz (1GHz HyperTransport)

Core 2 systems:
Motherboard: Gigabyte GA-P35T-DQ6 (BIOS F1; F4 for the QX9650)
System bus: 1066MHz (266MHz quad-pumped) or 1333MHz (333MHz quad-pumped), per the CPU
North bridge: P35 Express MCH
South bridge: ICH9R
Chipset drivers: INF Update 8.3.0.1013, Intel Matrix Storage Manager 7.5
Memory: 4GB (4 DIMMs) Corsair TWIN3X2048-1333C9DHX DDR3 SDRAM at 1066MHz (8-8-8-20) or 1333MHz (8-9-9-24)
Audio: Integrated ICH9R/ALC889A with Realtek 6.0.1.5449 drivers

Xeon system:
Motherboard: Intel S5000VXN (BIOS S5000.86B.06.00.0076.0409200070751)
System bus: 1333MHz (333MHz quad-pumped)
North bridge: 5000X MCH
South bridge: 6321ESB ICH
Chipset drivers: INF Update 8.3.0.1013, Intel Matrix Storage Manager 7.5
Memory: 4GB (4 DIMMs) Samsung ECC DDR2-667 FB-DIMMs at 667MHz (5-5-5-15)
Audio: Integrated ALC260 with Realtek 6.0.1.5449 drivers

Athlon 64 X2 system:
Motherboard: Asus M2N32-SLI Deluxe (BIOS 1201)
System bus: 1GHz HyperTransport
North bridge: nForce 590 SLI SPP
South bridge: nForce 590 SLI MCP
Chipset drivers: ForceWare 15.01
Memory: 4GB (4 DIMMs) Corsair TWIN2X2048-8500 DDR2 SDRAM at ~800MHz (4-4-4-18)
Audio: Integrated nForce 590 MCP/AD1988B with SoundMAX 6.10.2.6100 drivers

Quad FX system:
Motherboard: Asus L1N64-SLI WS (BIOS 0505)
System bus: 1GHz HyperTransport
North bridge: nForce 680a SLI
South bridge: nForce 680a SLI
Chipset drivers: ForceWare 15.01
Memory: 4GB (4 DIMMs) Corsair TWIN2X2048-8500C5D DDR2 SDRAM at ~800MHz (4-4-4-18)
Audio: Integrated nForce 680a SLI/AD1988B with SoundMAX 6.10.2.6100 drivers

All systems:
Hard drive: WD Caviar SE16 320GB SATA
Graphics: GeForce 8800 GTX 768MB PCIe with ForceWare 163.11 and 163.71 drivers
OS: Windows Vista Ultimate x64 Edition
OS updates: KB940105, KB929777 (nForce systems only), KB938194, KB938979

Please note that testing was conducted in two stages. Non-gaming apps and Supreme Commander were tested with Vista patches KB940105 and KB929777 (nForce systems only) and ForceWare 163.11 drivers. The other games were tested with the additional Vista patches KB938194 and KB938979 and ForceWare 163.71 drivers.

Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.

Our primary test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket Xeon and Quad FX systems were powered by PC Power & Cooling Turbo-Cool 1KW-SR power supplies. Thanks to OCZ for providing these units for our use in testing.

Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.

The test systems’ Windows desktops were set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

We’ll start, as ever, with some quick synthetic tests of the memory subsystem, which will help give us the lay of the land before we dive into our real-world benchmarks.

The QX9650 easily surpasses the QX6850 here, probably because its larger L2 cache lets it prefetch, and thus effectively transfer, more data. There are clear striations here among the Intel processors based on bus speed, with the CPUs on the 1066MHz bus at the back of the pack. The top spots all go to Athlon 64 processors, whose integrated memory controllers are very tough to beat with a front-side bus-based system architecture.

This useful little test gives us a look at L2 cache bandwidth. You’ll notice that it’s multithreaded, so systems with more cores show up as having higher L2 cache bandwidth; not just one processor or cache is being measured. As a result, the dual-socket, quad-core Xeon X5365 (65nm) soars above everything else. We’ve included this system because it was marketed to enthusiasts as part of Intel’s “V8” media creation platform, an answer of sorts to AMD’s dual-socket Quad FX platform, represented here by the Athlon 64 FX-74. I’m happy to include these systems as curiosities, especially since Quad FX is AMD’s only four-core solution for the desktop, but both have quirky performance drawbacks as well as benefits that I won’t discuss in too much detail, lest they become a distraction. Besides, as I’ve mentioned before, the Xeons are total show-offs.

Back to the QX9650, its L2 cache bandwidth mirrors that of its 65nm predecessor until we reach the 16MB test block size, where its larger L2 cache grants it a slight advantage.

The QX9650’s memory access latencies also mirror those of the QX6850, despite the QX9650’s larger L2 cache. That’s impressive, though perhaps not quite as impressive as the roughly 15ns advantage the Athlon 64 X2’s integrated memory controller gives it.

We can look at this issue in a little more detail. In the graphs below, yellow represents L1 cache, light orange is L2 cache, and dark orange is main memory.

We measured the QX9650’s 6MB L2 cache latency at 15 cycles, just one cycle more than the smaller 4MB L2 cache in the QX6850. Larger caches tend to bring latency penalties with them, but the smarter L2 in Penryn has barely any penalty at all. That helps explain why the QX9650’s memory access latencies are effectively equivalent to the older chip’s.
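Incidentally, latency tests like these usually work by pointer chasing: walking a randomly permuted cycle so that each load depends on the previous one and the prefetchers can’t help. A sketch of the technique follows (in Python, so interpreter overhead swamps the actual memory latency; a real tool does this in tight native code):

```python
import random, time

def chase_ns_per_step(n_nodes, iters=200_000):
    """Build one big random cycle, then time dependent walks through it."""
    order = list(range(n_nodes))
    random.shuffle(order)
    nxt = [0] * n_nodes
    for i in range(n_nodes):
        nxt[order[i]] = order[(i + 1) % n_nodes]  # single cycle, random order
    p = 0
    t0 = time.perf_counter()
    for _ in range(iters):
        p = nxt[p]  # each access depends on the last
    return (time.perf_counter() - t0) / iters * 1e9

print(round(chase_ns_per_step(1 << 16)), "ns per dependent access")
```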

But enough of this CPU geekery! Let’s play some games.

Team Fortress 2

We’ll kick off our gaming tests with some Team Fortress 2, Valve’s class-driven multiplayer shooter based on the Source game engine. In order to produce easily repeatable results, we’ve tested TF2 by recording a demo during gameplay and playing it back using the game’s timedemo function. In this demo, I’m playing as the Heavy Weapons Guy, with a medic in tow, dealing some serious pain to the blue team.

We tested at 1024×768 resolution with the game’s detail levels set to their highest settings. HDR lighting and motion blur were enabled. Antialiasing was disabled, and texture filtering was set to trilinear filtering only. We used this relatively low display resolution with low levels of filtering and AA in order to prevent the graphics card from becoming a primary performance bottleneck, so we could show you the performance differences between the CPUs.

Notice the little green plot with four lines above the benchmark results. That’s a snapshot of the CPU utilization indicator in Windows Task Manager, which helps illustrate how much the application takes advantage of up to four CPU cores, when they’re available. I’ve included these Task Manager graphics whenever possible throughout our results. In this case, Team Fortress 2 looks like it probably only takes full advantage of a single CPU core, although Nvidia’s graphics drivers use multithreading to offload some vertex processing chores.

The QX9650 produces some very nice clock-for-clock performance gains right off the bat. Yow. All of these CPUs are pushing acceptable frame rates for TF2, but the QX9650 is in a class by itself in terms of raw performance. If you want future-proofing, this puppy has it.

Lost Planet: Extreme Condition
Lost Planet puts the latest hardware to good use via DirectX 10 and multiple threads—as many as eight, in the case of our dual quad-core Xeon test rig. Lost Planet‘s developers have built a benchmarking tool into the game, and it tests two different levels: a snow-covered outdoor area with small numbers of large villains to fight, and another level set inside of a cave with large numbers of small, flying creatures filling the air. We’ll look at performance in each.

We tested this game at 1152×864 resolution, largely with its default quality settings. The exceptions: texture filtering was set to trilinear, edge antialiasing was disabled, and “Concurrent operations” was set to match the number of CPU cores available.

As I’ve stated before—and watch me do it again—Lost Planet‘s Cave level is exciting because it puts a cubic assload of flying doodads on the screen and uses multiple threads to control them all. That gives us a nice look at how quad-core processors can speed up a game. Oddly, the QX9650 stumbles just a little bit in the Snow level, for whatever reason, but in the Cave level with all of those doodads, it’s well ahead of the pack—and roughly 10% faster than its 3GHz 65nm counterpart.

BioShock

We tested BioShock by manually playing through a specific point in the game five times while recording frame rates using the FRAPS utility. The sequence? Me trying to fight a Big Daddy, or more properly, me trying not to die for 60 seconds at a pop.

This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
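For the statistically inclined, here’s the arithmetic we’re describing, sketched with made-up frame-rate numbers:

```python
from statistics import mean, median

def summarize_runs(runs):
    """Average FPS across sessions, plus the median of each session's
    lowest frame rate, so one bad run can't skew the 'low' figure."""
    return {"avg_fps": mean(mean(r) for r in runs),
            "median_low": median(min(r) for r in runs)}

# Five hypothetical 60-second sessions:
runs = [[88, 120, 95], [70, 110, 90], [85, 125, 99], [60, 105, 92], [90, 130, 101]]
print(summarize_runs(runs))  # median_low is 85; the lone 60 FPS dip is damped
```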

For this test, we largely used BioShock‘s default image quality settings for DirectX 10 graphics cards, but again, we tested at a relatively low resolution of 1024×768 in order to prevent the GPU from becoming the main limiter of performance.

The QX9650 takes the top spot again, though not by much. Any of the Core 2 processors here can run BioShock more or less optimally, obviously. And while playing, I didn’t notice any real slowdowns or problems, even on the Athlon 64 X2 5600+.

Supreme Commander

We tested performance using Supreme Commander‘s built-in benchmark, which plays back a test game and reports detailed performance results afterward. We launched the benchmark by running the game with the “/map perftest” option. We tested at 1024×768 resolution with the game’s fidelity presets set to “High.”

Supreme Commander’s built-in benchmark breaks down its results into several major categories: running the game’s simulation, rendering the game’s graphics, and a composite score that simply combines the other two. The performance test also reports good ol’ frame rates, so we’ve included those, as well.

We’ve had a heck of a time trying to tease out big performance differences between CPUs in this game. They don’t come easily and obviously aren’t very large. However, the QX9650 again sits atop the field, this time in each of Supreme Commander‘s several performance measurements.

Valve Source engine particle simulation

Next up are a couple of tests we picked up during a visit to Valve Software, the developers of the Half-Life games. They’ve been working to incorporate support for multi-core processors into their Source game engine, and they’ve cooked up a couple of benchmarks to demonstrate the benefits of multithreading.

The first of those tests runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

The QX9650 posts a gain of about 15% over the QX6850 in this test, even surpassing the dual Xeons.

Valve VRAD map compilation

This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to precompute lighting that goes into games like Half-Life 2. This isn’t a real-time process, and it doesn’t reflect the performance one would experience while playing a game. Instead, it shows how multiple CPU cores can speed up game development.

Even when the QX9650 can’t deliver major progress over the QX6850, it wins. Nothing AMD has to offer even comes close.

WorldBench

WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests. Because WorldBench’s tests are entirely scripted, we weren’t able to capture Task Manager plots for them, as you’ll notice.

Niiiiice.

Productivity and general use software

MS Office productivity

This WorldBench component test has a multitasking element, since several Office apps are in use at once. In this case, the QX9650 finishes a tick behind the QX6850, for whatever reason. The Athlon 64 X2 6400+ puts in a relatively strong showing here, as well.

Firefox web browsing

If you want proof positive that an Intel processor will make your Internet faster, here it is. I wouldn’t exactly recommend trading the Pentium 4 and cable modem for a QX9650 and dial-up, though.

Multitasking – Firefox and Windows Media Encoder

Here’s another WorldBench component test with a multitasking bent. This one uses a multithreaded application, Windows Media Encoder, alongside the Firefox web browser. Once more, the QX9650 achieves an impressive per-clock performance improvement. In fact, I’m going to make “impressive per-clock performance improvement” into a keyboard macro.

WinZip file compression

Nero CD authoring

The QX9650 is a tad faster than its predecessor in WinZip, but not in the Nero test, where performance seems to be dictated by (1) disk controller performance and (2) sheer, blind luck.

Image processing

Photoshop

The QX9650 show continues in Photoshop, where the Yorkfield processor will let you cut out pictures of your friends and place them almost-convincingly into incriminating circumstances better than any other CPU.

The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.

Intel’s new baby excels at stitching together multiple pictures to create a panorama, though it only finishes a couple of seconds ahead of the QX6850. If you’re into CPU geekery, the per-clock performance gains shown in the individual operations below may interest you. Looks to me like the majority of the difference comes in the “stitch” operation, which is the heart of the panorama generation process.

picCOLOR image analysis

picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded, and in this latest revision, five of those eight functions use four threads.

Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1 GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.

When the first Core 2 Duo processor debuted in July of 2006, it scored 10.94 in this same test. Today, we’re at 16.46 times the performance of a Pentium III 1GHz, a little over a year later—and only some of picCOLOR’s functions are multithreaded. Not too shabby.

Video encoding and editing

VirtualDub and DivX encoding with SSE4

Here’s a brand-new addition to our test suite that should allow us to get a first look at the benefits of SSE4’s instructions for video acceleration. In this test, we used VirtualDub as a front-end for the DivX codec, asking it to compress a 66MB MPEG2 source file into the higher compression DivX format. We used version 6.7 of the DivX codec, which has an experimental full-search function for motion estimation that uses SSE4 when available and falls back to SSE2 when needed. We tested with most of the DivX codec’s defaults, including its Home Theater base profile, but we enabled enhanced multithreading and, of course, the experimental full search option.
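For those wondering what a “full search” actually does: motion estimation compares a block of the current frame against every candidate position in a search window of the reference frame, keeping the offset with the lowest sum of absolute differences (SAD). SSE4’s MPSADBW instruction exists to churn through exactly these SAD computations. A scalar reference sketch (my own toy code, not the DivX implementation):

```python
import random

def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def full_search(block, ref, block_w, ref_w, radius):
    """Try every (dx, dy) offset in the window; return the best match."""
    rows = len(block) // block_w
    best = (None, float("inf"))
    for dy in range(radius):
        for dx in range(radius):
            cand = [ref[(dy + r) * ref_w + (dx + c)]
                    for r in range(rows) for c in range(block_w)]
            score = sad(block, cand)
            if score < best[1]:
                best = ((dx, dy), score)
    return best

random.seed(1)
ref = [random.randrange(256) for _ in range(64)]                 # 8x8 window
block = [ref[(3 + r) * 8 + (2 + c)] for r in range(4) for c in range(4)]
print(full_search(block, ref, 4, 8, 5))  # should recover the planted offset, SAD 0
```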

Well, this isn’t even fair at all—and that’s sort of the point. A couple of SSE4’s new instructions are specifically targeted to accelerate H.264-style motion estimation, and they seem to do it well. The QX6850 takes nearly 10 seconds longer to process this short video clip, and the Athlon 64 FX-74 takes twice as long as the QX9650.

Windows Media Encoder x64 Edition video encoding

Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

Windows Media Encoder video encoding

Roxio VideoWave Movie Creator

The remainder of our video tests don’t take advantage of SSE4, but the QX9650 still leads in each of them. Some of these are notable performance differences, too, when one CPU finishes processing a short video clip a full 20 or 30 seconds ahead of another one.

LAME MT audio encoding

LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors. You can download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
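The arrangement is a classic two-stage pipeline, and a toy version is easy to sketch with a bounded queue (the worker functions here are stand-ins, obviously, not LAME’s actual analysis or encoding code):

```python
import queue, threading

def analyze(frame):            # stand-in for psycho-acoustic analysis
    return frame * 2

def encode(frame, analysis):   # stand-in for the rest of the encoder
    return (frame, analysis)

def encode_pipelined(frames):
    buf = queue.Queue(maxsize=1)   # analysis runs exactly one frame ahead
    def analysis_thread():
        for f in frames:
            buf.put(analyze(f))
    t = threading.Thread(target=analysis_thread)
    t.start()
    out = [encode(f, buf.get()) for f in frames]  # second stage consumes results
    t.join()
    return out

print(encode_pipelined([0, 1, 2]))  # [(0, 0), (1, 2), (2, 4)]
```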

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.

Yep. Uh huh. Yep. Moving on…

Cinebench rendering

Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores are available.

Rendering is generally compute-bound, not limited by memory bandwidth or the like. In this case, then, the QX9650 is presumably achieving its clock-for-clock performance boost thanks to its faster radix-16 divider or its single-cycle 128-bit SSE shuffle capability.

POV-Ray rendering

We caved in and moved to the beta version of POV-Ray 3.7 that includes native multithreading. The latest beta 64-bit executable is still quite a bit slower than the 3.6 release, but it should give us a decent look at comparative performance, regardless.

3ds max modeling and rendering

The computational performance enhancements in the QX9650 bring benefits in all three of our rendering test apps. In the case of the POV-Ray chess2 scene, the QX9650 shaves 17 seconds off of the QX6850’s render time, vaulting it ahead of the Athlon 64 FX-74.

Folding@Home

Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
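The final arithmetic, in other words, is just this (with invented per-WU numbers):

```python
from statistics import mean

def estimated_ppd(per_wu_ppd, cores):
    """notfred's estimate: average points per day across the WU types,
    then multiply by the core count."""
    return mean(per_wu_ppd) * cores

# Four hypothetical per-core WU results on a quad-core chip:
print(estimated_ppd([180, 220, 350, 250], 4))  # 1000.0
```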

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

Wow. At the very same clock speed, the QX9650 can haul in quite a few more points per day than its QX6850 precursor, and it easily leads all contenders in the single-threaded processing of three of the four WU types.

MyriMatch proteomics

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He recently offered to provide us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.
In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
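As a rough illustration of that scheme, here is a small Python sketch of a job queue feeding worker threads, using the same chunking rule (threads × 10 jobs). The per-protein work is stubbed out, and all names here are illustrative; this is not MyriMatch’s actual code:

```python
# Sketch of the MyriMatch-style job queue described above: split the
# database into (threads * 10) jobs, and let each worker pull the next
# job as soon as it finishes its current one.
import queue
import threading

def run_jobs(proteins, num_threads):
    jobs = queue.Queue()
    num_jobs = num_threads * 10
    chunk = max(1, len(proteins) // num_jobs)  # e.g. 6714 // 40 = 167 proteins/job
    for i in range(0, len(proteins), chunk):
        jobs.put(proteins[i:i + chunk])

    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = jobs.get_nowait()   # claim the next unstarted job
            except queue.Empty:
                return                    # queue drained: this thread exits
            for protein in job:           # "handle" each protein (stubbed)
                with lock:
                    processed.append(protein)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed

done = run_jobs(list(range(6714)), num_threads=4)
print(len(done))  # 6714: every protein handled exactly once
```

Because idle threads grab work from a shared queue instead of being assigned fixed slices up front, a slow job can’t leave the other cores waiting at the end of the run, which is the synchronization-minimizing behavior David describes.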

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.

I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.

Here’s how the processors performed.

Since memory bandwidth is the primary limiter among the very fastest processors here, the QX9650 doesn’t separate itself much from the QX6850, which shares the same bus speed and memory subsystem.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here. (I believe the score you see there at almost 3Hz comes from our eight-core Clovertown test system.)

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.
The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

The Yorkfield processor hits a 15% higher processing frequency than its Kentsfield counterpart, another impressive jump in performance at the same clock speed.

SiSoft Sandra Mandelbrot

Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX, SSE, and SSE2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.
The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.

We’re using the 64-bit version of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations in parallel.
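The column-interlacing Sandra describes is easy to picture in code. The sketch below divides a Mandelbrot render across threads in exactly that way; it is illustrative only (in CPython the GIL would serialize the actual math), so it demonstrates the work division rather than the speedup:

```python
# Interlaced multithreaded Mandelbrot sketch: thread k computes columns
# k, k+N, k+2N, ... so no column is ever touched by two threads.
import threading

MAX_ITER = 255

def escape_count(c):
    z, n = 0j, 0
    while abs(z) <= 2 and n < MAX_ITER:
        z = z * z + c
        n += 1
    return n

def render(num_threads, width=640, height=480):
    image = [[0] * width for _ in range(height)]

    def worker(start):
        for x in range(start, width, num_threads):  # interlaced columns
            for y in range(height):
                c = complex(-2.5 + 3.5 * x / width, -1.25 + 2.5 * y / height)
                image[y][x] = escape_count(c) % 32  # 32 colours, per the FAQ

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image

# Output is identical regardless of thread count, since columns never overlap:
assert render(4, width=32, height=24) == render(1, width=32, height=24)
```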

I keep this test around because it seems to show off the Core 2 chips’ single-cycle SSE2 execution capabilities rather well. However, Penryn’s single-cycle 128-bit SSE shuffle doesn’t help much here.

Power consumption and efficiency

Now that we’ve had a look at performance in various applications, let’s bring power efficiency into the picture. Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which we ran Cinebench’s multithreaded rendering test.

All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile.

Anyhow, here are the results:

If you’re like me, you looked at that raw data on the QX9650 and immediately did a double-take. It’s for real, though.

Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

Surprisingly, our QX9650 system draws substantially less power—34W, to be exact—at idle than the otherwise-identical QX6850 system did. That drops the QX9650 power consumption even below that of the dual-core Core 2 Duo E6750.

Next, we can look at peak power draw by taking an average from the ten-second span from 30 to 40 seconds into our test period, during which the processors were rendering.

The 45nm chip’s reduction in power use under load is even more impressive. The QX9650 system pulls 74W less under load than the QX6850-based one—less than an Athlon 64 X2 5600+, astoundingly enough.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

Obviously, with such low idle and peak power consumption, and its quick render time, the QX9650 doesn’t use much energy over our test period.

We can quantify efficiency even better by considering the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
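The watt-second bookkeeping here is straightforward. Here’s a minimal sketch, assuming a trace of once-per-second power samples like our meter’s log; the numbers in the example are hypothetical, not our measured results:

```python
# Energy in watt-seconds (joules) is just power * time summed over the
# sampling interval; isolating the render period gives task energy.
def energy_joules(samples_w, interval_s=1.0):
    """Total energy for a list of power readings in watts."""
    return sum(samples_w) * interval_s

# Hypothetical trace: 10 s rendering at 250 W, then 5 s idle at 130 W.
trace = [250.0] * 10 + [130.0] * 5
print(energy_joules(trace))       # 3150.0 J over the whole period
print(energy_joules(trace[:10]))  # 2500.0 J for the render alone
```

A faster render shortens the high-power span of the trace, which is why quicker chips can come out ahead on task energy even at similar peak draw.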

The Core 2 Extreme QX9650 combines some of the lowest power consumption of the group with the quickest render times. That means it’s able to render the scene with under half the energy used by the Core 2 Duo E6750. Compared to the 65nm Core 2 Extreme QX6850 running at the same clock speed, the QX9650 brings a 33% reduction in system-level energy use during this task.

Overclocking

I started overclocking the QX9650 by setting its multiplier to 12, which would yield a 4GHz clock speed on a 1333MHz front-side bus (whose base clock is 333MHz). I initially raised the CPU core voltage from the default of 1.25V to 1.2625V, just to help things along. The system came up and immediately began to POST, but then locked in mid-POST.

Doh.

I tried several times, and the problem persisted.

After recovering to the BIOS defaults, I started cranking up the voltage in an attempt to achieve 4GHz. A little extra juice allowed the system to begin booting Windows, but it crashed before completing the boot process. Things got no better as I stepped up to 1.3V and then 1.325V. I could have gone for more voltage, but I figured backing down on the clock speed a little bit would probably be the best path to stability. After several attempts at 3.85GHz and 3.795GHz with a slightly overclocked bus, I finally settled on a stable config: 3.66GHz at 1.2875V on a stock 1333MHz bus. I then took this screenshot:

It’s like a postcard. From a vacation. In megahertz-land.

This setup proved stable while running four instances of Prime95 for quite a while, so I called it good. The QX9650 also ran through a couple of benchmarks flawlessly at this speed.

I’d say that’s an acceptable start for Intel’s 45nm process, although the actual clock speed is only 166MHz faster than what we reached with our 65nm QX6850. If this chip is any indication, Intel easily has clock room to release some Penryn-based parts at 3.2 or 3.4GHz, at least, and it’s still very early in the game for 45nm.
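For reference, the clock speeds in this section all fall out of one simple relationship: core clock = multiplier × FSB base clock. A quick sketch (the multiplier of 11 for our 3.66GHz config is inferred from the arithmetic, not read from the BIOS):

```python
# Core clock arithmetic for LGA775 CPUs: the "1333MHz" bus is quad-pumped
# from a 333MHz base clock, and the core runs at multiplier * base.
def core_clock_mhz(multiplier, fsb_base_mhz=333):
    return multiplier * fsb_base_mhz

print(core_clock_mhz(12))  # 3996 MHz: the ~4GHz target that wouldn't POST
print(core_clock_mhz(11))  # 3663 MHz: the stable 3.66GHz config
print(core_clock_mhz(9))   # 2997 MHz: stock 3GHz
```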

Conclusions

Sometimes we have to craft finely nuanced analyses of our CPU test results in order to summarize the various merits and weaknesses of different processors as fairly as possible. Not so today. Intel was already well ahead in the performance game with its 65nm quad-core processors, and the Core 2 Extreme QX9650 simply extends that lead by anywhere from a few percentage points to nearly 20%. What’s more, it does so on the strength of a handful of key revisions to the chips, including a larger L2 cache and a fast divider, that benefit a startlingly broad range of applications, from games to office apps and scientific computing. In the video encoding application we tested that supports SSE4, we saw even larger performance gains. The Core microarchitecture has always had strong clock-for-clock performance, but Intel’s design team has found ample room for improvement—and delivered it.

Yet the QX9650’s advances in per-clock performance may not even be its best quality. Our power consumption testing confirmed Intel wasn’t just blowing smoke when it claimed big reductions in switching power and leakage current for its 45nm fabrication process. Our QX9650 test system drew 34W less power at idle and 74W less under load than a comparably equipped Core 2 Extreme QX6850-based one. Taken together with the increases in clock-for-clock performance, the QX9650 brought a 33% reduction in the overall system power needed to render a scene. That’s a huge step forward in power-efficient performance.

All of this comes without any increase in clock speed—yet. Intel seems to be holding higher speeds in reserve, since we were easily able to reach 3.66GHz with our QX9650, without having to resort to crazy-insane core voltages. We can probably expect to see both higher core clocks and higher bus speeds from this generation of products as it matures.

Let the prophets of doom-and-gloom stick that in their pipes and smoke it. They may be right about transistor scaling limits eventually—duh—but many of them spoke too much, too soon.

The crazy thing is that the QX9650 may not even be the fastest desktop microprocessor to arrive this year, if AMD somehow manages to hit the right clock speeds with its Phenom. Let’s not kid ourselves. Based on everything we’ve seen from the 45nm Xeons and Barcelona Opterons, Intel appears positioned to hang on to the performance crown for the foreseeable future. But one never knows until the chips arrive, as the Phenom is set to do very soon. Stay tuned.

Comments closed
    • somedude743
    • 12 years ago

    Didn’t AMD demonstrate a 3 Ghz Phenom quad core a few weeks/months ago? Why in the hell can’t they make more of these 3 Ghz Phenoms just like that one?

    It’s gotta be the 65nm process. AMD needs to just phase out 65nm and start ramping up to 45nm …. FAST. I want to see a big announcement no later than March saying that they have 45nm Phenoms available for sale that are just as good or better than the 45nm Intel Core 2 Extreme QX9650 chips.

    Who cares if the 45nm Phenoms are priced at $1,300+ each? Just get it out there for the deep pocket extreme gamers to play around with. AMD needs to get some good P.R. about its chips or the gamers are going to buy QX9650 chips in droves and forget about AMD. I hear you can overclock the QX9650 to 4 Ghz with air cooling … no problem. THAT right there is a BIG problem for AMD.

    How can AMD compete with 4 Ghz Penryns? Imagine the panic that would set in at AMD if Intel released 45 nm Nehalem chips in Summer 2008! If those Nehalems can overclock up to 4 Ghz like the Penryns, AMD is going to be in bigtime trouble. The benchmarks would be unbelievable for a 4 Ghz Nehalem!

    AMD needs to QUICKLY add some high K hafnium into their new Phenom chips, crank up the FSB to 1,333 Mhz+ for DDR3, cram 12 MB of cache in there and get those frequencies UP, UP, UP. Get it to overclock to 4 Ghz like the new QX9650 already can.

    I’d love to see AMD shock the world in late 2008 and maybe crank out 32nm Phenoms with even more real estate for cache and other processor improvements. I’m sure it’s just a fantasy, but everyone just loves to see the little guy (AMD) take on the big guy (Intel) and beat the hell out them. Getting to 32nm first would sure shock Intel.

    • Mr Bill
    • 12 years ago

    Now would be a good time to find ways to make our supposedly multi threaded OS’s actually work that way. Its stupid that every benchmark gets full control of the PC for its test. Stupid that Win3.11 beat OS/2 in one-at-a-time benchmarks when it was clear to those of us using it in the lab that OS/2 beat the tar out of Win3.11 as a true multithreaded OS. Its not multi threaded unless you can bench a game, crunch linpack or PCA, rip a pile of MP3’s, transcode a video, run that big SQL application, and serve up your webpage all at the same time.

      • UberGerbil
      • 12 years ago


    • Damage
    • 12 years ago

    I’ve updated the graphs on the folding page for the Amber and Tinker WUs. The previous graphs had incorrect scores for the Core 2 Quad Q6600.

    • gratuitous
    • 12 years ago
      • UberGerbil
      • 12 years ago

      Do I have to quote this so everyone will know what you said when you delete the comment later?

      As seen on a budget box bumper sticker... “My Other CPU is a QX9650”

        • gratuitous
        • 12 years ago
          • Flandry
          • 12 years ago

          What an amazing show of refinement. I was considering a E6750-based home server/HTPC, but now it appears that if I can wait for a couple months, I can get a quad for the same power bill. 🙂

          Great review as always. My only request would be to see what you are able to achieve in power consumption by undervolting.

            • Usacomp2k3
            • 12 years ago

            Server and HTPC aren’t usually one and the same. Just pick quiet hard drives, I guess.

            • coldpower27
            • 12 years ago

            Or you can get an even more frugal E8400 at 3 GHZ for the same price in January, which would be even more economical.

    • Convert
    • 12 years ago

    It’s rare when I feel giddy over a processor. Sure it was nothing more than a evolutionary step of conroe but dang, I certainly wasn’t expecting anything better. Though I won’t be spending 1k on a processor I anxiously await the cheaper offerings.

    Great job Scott and Intel.

    • dukerjames
    • 12 years ago

    EDIT: you guys missed out on the 95% off sale yesterday, now it’s back to $1000+
    ——————————————-
    “buy it now for $70
    http://www.tristatecamera.com/lookat.php?refid=8&sku=INTBX80562QX6850
    BX80562QX6850 Core 2 Duo QX6850 Extreme
    Item #: INTBX80562QX6850
    Mfr Part #: BX80562QX6850
    Suggested Retail Price: $1,399.99
    Price: $70.87”

    • SS4
    • 12 years ago

    About the Overclocking, i think u got unlucky with the chip u reviewed: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3137&p=7 on stock cooling too!! And im sure itl get pushed harder by serious overclockers. But from all reviews i've read so far, this new chip is pretty awsome.

      • Krogoth
      • 12 years ago

      Well that is because, overclocking’s #1 is YMMV. 😉

    • deathBOB
    • 12 years ago

    Awesome, when can I buy one of these for $200?

      • emi25
      • 12 years ago

      after 3-5 years maybe …

        • SS4
        • 12 years ago

        lol i dont think so, even P4 extreme edition still sell for exorbitant price today. So extreme version of CPU will never become worth it as a performance per dollar imo

        • deathBOB
        • 12 years ago

        I mean Penryns in general, not this particular model.

          • insulin_junkie72
          • 12 years ago

          Latter part of January is when the rest of the line comes out (starting at about $130, I believe)

            • deathBOB
            • 12 years ago

            Really? That’s a huge gap…

            • insulin_junkie72
            • 12 years ago

            They’ll basically be upgrading their whole current lineup with 45nm parts at that time, so you’ll have the 45nm variants of the 4300/4400/4500-type line at around $130, and so on up the line – so they’ll be plenty of choices, depending on budget and needs.

    • krazyredboy
    • 12 years ago

    #1 Awesome ending to your comment Krogoth.

    #2 I think there may be more to this processor than what was tested. We all know that no one processor is the same as the next. I would like to think that the one you received, just may not have been one of the better to come off of the stack. I would imagine (at least, going by past record) that there is probably more overclocking potential with these processors. It just seems like this is one of those unlucky, few processors that could not handle the extra push.

    • Krogoth
    • 12 years ago

    Penyrn’s architect isn’t that impressive. It just a Conroe with a more refined SSE engine and may have more L2 cache. I do admit it is hard to beat already potent and proven Conroe design.

    However, the 45nm process is what helps separates Penyrn from its Conroe predecessor. 25-33% power consumption reduction is nothing to sneeze at.

    The more important question is whatever Intel can get good yields of it. I am sure that Intel has already build a large stock of Penyrn based chips over this past fiscal quarter.

    It is like P2 to P3 Coppermine over all again.

    BTW, very good work Damage. I hope some Greeks don’t steal your stuff again.

    THIS IS TECH-REPORT! [Damage kicks the theif into the pit full of trolls]

    • mczak
    • 12 years ago

    These power consumption figures are seriously impressive. A quad-core cpu perfectly suitable for a silent system.
    Too bad the 999$ cpu is the only one for now, the dual-core penryns should be very, very impressive too on that front (passive cooling anyone?)

    • Ruiner
    • 12 years ago

    Lots of idle cores in those game benches.

      • lyc
      • 12 years ago

      lots of idle neurons in that comment.

        • computron9000
        • 12 years ago

        it’s rare for me to find a comment like that so funny.

        • ew
        • 12 years ago

        Really? I thought it was a good point. Multi-core CPUs don’t seem to be much use for current games.

          • UberGerbil
          • 12 years ago

          Depends on how it was meant. You actually do have a lot of neurons idle at any given time.

        • A_Pickle
        • 12 years ago

        Lots of idle neurons in your comment. As far as I can tell, “multithreading” in games, as promised by developers, is a right joke.

        And I still can’t run Crysis. Hmm.

          • indeego
          • 12 years ago

          Seems to help Crysis. What do you expect, multi-threading wasn’t a design goal for most games until multiple cores were penetrating the market more on the high end.

            • A_Pickle
            • 12 years ago

            Agreed — however, the promises of multithreading have been around… well.. at earliest, since hyperthreading. Fact of the matter is, Quake 4, Call of Duty 2… etc. have all been advertised as being “multithreaded” and/or “optimized for dual-core.”

            Using 90% of one core and 5% of the next three isn’t “multithreading,” and that’s what we’ve been getting.

            Not that I want to be cynical — games, especially FPS games tend to be pretty linear in general — they wait on user input. It’s probably pretty tough to optimize an FPS for multiple cores… so all the effort in the world is by no means going to hurt. I’m just saying.

    • derFunkenstein
    • 12 years ago


    • lyc
    • 12 years ago

    lol, just reading through the first page… damage if the whole pc industry ever does collapse, rest easy in the knowledge that you can switch to doing standup ;D

    ann coulter and pat buchanan, aahahahah…

    edit: typo on page 7, “proof positive”

    • Missile Maker
    • 12 years ago

    Now, if we could just get X48 and more X38 main boards, not to mention nVIDIA’s new chipset main board in the retail channel, we all could see how far Yorkfield can stretch its wings!

    • flip-mode
    • 12 years ago

    Amazing chip. Amazing power.

    • crazybus
    • 12 years ago

    Looking at the various reviews so far TR seems to have got somewhat of a dud overclocker, although Damage didn’t get the voltage too high. Probably most impressive was HotHardware’s 3.9GHz at a CPU-Z indicated 1.14v. It looks like lostcircuits got theirs to boot Windows at 4.4GHz at the same voltage.

    • LiamC
    • 12 years ago

    Can’t wait for the Q6600 equivalent—Q7600? Q9400??

    On a side note, my B3 Q6600 hates voltages higher than 1.3V, and according to Speedfan, when o/ced to 3GHz (1333 FSB) it’s running 1.25V, though the BIOS reports normal, 1.3V.

    • TakkiM
    • 12 years ago

    this chip should really kick AMD’s “performance-per-watt” balls with the improvement of power consumption

      • melvz90
      • 12 years ago

      i agree with you dude!

      maybe with the countless times they have been kicked in their balls… they’re now impotent to produce a kick-ass product…

    • crazybus
    • 12 years ago

    Colour me impressed. I was already sold by the solid performance increase particularly in the rendering and folding departments but the power consumption just blew me away. I can’t wait until I can pick up a cheap Yorkfield quad-core.

    • indeego
    • 12 years ago

    I’d say the best chip Intel ever made.

    Just fantastic.

      • jobodaho
      • 12 years ago

      Yup, dominating in both performance and power efficiency, a very welcome change.

    • MadManOriginal
    • 12 years ago

    I came to TR to see if there was a review of the 8800GT and more so for solid info on the new process 8800GTS. I find this gem of a review instead. Somehwat dissapointed but only because it wasn’t what I was looking for. I wasn’t really considering one of the 45nm-based chips but those power savings are considerable.

    My own random thought – I believe Intel’s engineers may have tweaked the 45nm process for yield over raw clock speed, at least to start. Remember that 65nm was designed for Netburst originally and with the increasing transistor counts that makes sense. Especially since 45nm will be used for Intel’s first monolithic quads when Nehalem comes out. Any way to verify this speculation?

    • Mr Bill
    • 12 years ago

    Surprise, another good review. Looking forward to lower prices on dual quad core systems eventually. Then I’ll buy, thanks to AMD vs Intel for the competition.

    • ssidbroadcast
    • 12 years ago

    Um, not to nit-pick, but it would’ve been nice if you guys included the Crysis Demo in your test suite, even if only amongst other intel C2Ds to cut down on time.

    Crytek stated that the engine plays nice with quad cores, and everyone has been b**ching lately about how resource-intensive it is. Why not throw a QX9650 at it?

    Edit: Wow, talk about “ask and ye shall recieve.” (see Sunday Shortbread)

    • UberGerbil
    • 12 years ago

    http://realworldtech.com/page.cfm?ArticleID=RWT012707024759

    Kudos on the review. Excellent as always. Other than the lucid grammar, you'd hardly guess it was the product of a liberal arts major (especially when you throw down highly technical terminology like "cubic assload of flying doodads").

    • Hamish
    • 12 years ago

    That looks like a really nice chip… looking forward to its less… ehem… premium siblings in January.

    The power consumption stuff is really amazing.

    • Sargent Duck
    • 12 years ago

    Woo Hoo! New Reiview. I should be studying, but bah to that, Yay to this…

    anyways, to reply to #2, yeah, I think Damage did that just the one time to shut all the whiners up. American site = American money. I’m pretty sure if Dissonance ever did a comparison, he’d use a Canadian quarter.

    • Usacomp2k3
    • 12 years ago

    I like how that first wafer picture was taken with a “JVC GZ-MG555” which is a video camera. Haha.

    • crazybus
    • 12 years ago

    The link to the article on the comments page is broken….. it’s http://www.techreport.com/article.x/13470 instead of http://www.techreport.com/articles.x/13470

    • UberGerbil
    • 12 years ago

    Also, I notice the “US coinage as a basis of measurement” is back (even if it’s Intel’s fault).

      • enzia35
      • 12 years ago

      If only a processor cost as much as the coin in the picture…

        • UberGerbil
        • 12 years ago

        Considering that coin is devaluing almost as rapidly as Moore’s Law is shrinking transistors…

        • insulin_junkie72
        • 12 years ago

        Unless Phenom can come through here shortly, you might be able to pick up lower-end AMDs for a few of those pretty soon.

          “AMD 4200+, free with the purchase of an extra value meal”

          • just brew it!
          • 12 years ago

          Yeah, no kidding.

          While cheap mid-range AMD CPUs would be great for us in the short term, it wouldn’t bode well for AMD’s long term health.

    • UberGerbil
    • 12 years ago

    “Extreme” is so five years ago.

      • Master Kenobi
      • 12 years ago

      So is “Athlon” but it took them this long to get rid of it =(
