Those numbers are correct, of course, but devoid of context, they may seem a little overly dramatic. At the heart of all of these product introductions is a core set of technologies, based on the microprocessor known as Clarkdale, and those technologies have been spun into a range of products for different markets. This explosion of new processors and chipsets marks the final major step in the march of the microprocessor architecture known as Nehalem across Intel’s core desktop and mobile lineups. If you’ve been paying attention, you’ll know that’s likely a very good thing, since earlier Core i5 and i7 processors have been excellent in most ways that matter, including performance, power efficiency, and—starting with the introduction of the Lynnfield chips this past fall—value.
Over the ’dale and through the ’field, to grandmother’s house… Wait, what?
From the beginning, the watchword for Nehalem-derived processors has been integration. The first Bloomfield chips brought the memory controller onto the CPU, and the Lynnfield follow-up integrated PCI Express connectivity, as well. Each of those steps has paid dividends in terms of performance, power consumption, and package size. Now, Clarkdale brings graphics into the same package, as part of an unorthodox two-chip solution.
The Clarkdale package with graphics and CPU. Source: Intel.
That solution pairs a fairly traditional-looking CPU with an equally traditional-looking north bridge component, both on one package that fits into a socket. The two chips communicate via a dedicated, high-speed link on the multi-chip package (MCP); you can see the MCP interfaces on both chips in the illustration below.
Clarkdale’s two chips mapped for functional blocks. Source: Intel.
The microprocessor is actually the smaller of the two chips on the Clarkdale package. This is a dual-core processor with 256KB of L2 cache per core and a shared, 4MB L3 cache that is 16-way set associative. Like prior CPUs derived from the Nehalem microarchitecture, this CPU can track dual hardware threads per core, so it exposes a total of four threads to the operating system and applications. (Yes, Nehalem is the name for an architecture now. Intel has evidently given up on trying to call multiple successive generations of its microarchitectures “Core.” In that vein, the CPU portion of Clarkdale is code-named Westmere, and the Westmere name will also refer to a family of 32-nm microprocessors.) Clarkdale inherits all sorts of Nehalem goodness, including a collection of intelligent power-saving measures and the related Turbo Boost feature that allows the CPU to range into higher clock frequencies when additional thermal headroom is available.
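To put the L3 cache’s geometry in concrete terms, here’s a short Python sketch that derives its set count and shows how a simple modulo index would map addresses to sets. It assumes 64-byte cache lines, the line size Nehalem-family cores use. The helper names are hypothetical, and the plain modulo index is an illustrative simplification; the exact address mapping Intel uses isn’t described here.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines, as on Nehalem-family cores

def cache_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (associativity x line size)."""
    assert capacity_bytes % (ways * line_bytes) == 0
    return capacity_bytes // (ways * line_bytes)

def set_index(address: int, num_sets: int, line_bytes: int = LINE_BYTES) -> int:
    """Illustrative mapping: drop the line-offset bits, keep the low index bits."""
    return (address // line_bytes) % num_sets

L3_BYTES = 4 * 1024 * 1024  # Clarkdale's shared 4MB L3
L3_WAYS = 16                # 16-way set associative

sets = cache_sets(L3_BYTES, L3_WAYS)
print(sets)  # 4096

# Hardware threads exposed to the OS: 2 cores x 2 threads per core
threads = 2 * 2
print(threads)  # 4
```

Each of those 4,096 sets can hold 16 lines, so any given address has 16 possible homes in the L3.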
A shot of the Westmere die. Source: Intel.
The Westmere family isn’t just a straight shrink of prior parts, either. Intel has added six new SSE instructions aimed at accelerating encryption and decryption via the AES algorithm. Together, these instructions provide what the company calls “full hardware support” for AES. A seventh new instruction, PCLMULQDQ, enables carry-less multiplication, which is also important for cryptographic work. Westmere processors include a few new tweaks for power savings, too, but as we understand it, that’s about it. This isn’t the sort of overhaul that, say, Penryn Core 2 chips represented over their Conroe predecessors. Then again, Westmere follows its mainstream 45-nm cousins by just a few months. In fact, Intel’s plans originally called for a dual-core, 45-nm processor with integrated graphics code-named Havendale, but the firm canceled that product and pulled forward the introduction of Clarkdale, instead.
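Carry-less multiplication is ordinary long multiplication with the carries thrown away: partial products are combined with XOR instead of addition, which amounts to multiplying polynomials over GF(2). That operation sits at the heart of the GHASH authentication step in AES-GCM, one place where the new carry-less multiply instruction helps. The Python sketch below is a bit-at-a-time software model for illustration only (the function name is hypothetical); the hardware instruction multiplies two 64-bit operands into a 128-bit product in one go.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers (polynomial multiply over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries propagate between bits
        a <<= 1
        b >>= 1
    return result

print(clmul(0b11, 0b11))  # 5, i.e. 0b101: (x + 1)^2 = x^2 + 1 over GF(2)
print(3 * 3)              # 9, ordinary multiplication for contrast
```

The difference shows up as soon as two partial products overlap: ordinary multiplication carries into the next bit, while the carry-less version simply cancels the overlapping bits.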
At the time, Intel cited a very healthy 32-nm fabrication process as one of the primary reasons for the schedule change. The 32-nm process is the second generation to employ high-k + metal gate transistors, an innovation that worked exceptionally well at 45 nanometers. The 32-nm process has nine copper-and-low-k dielectric interconnect layers, and for the first time, Intel is using immersion lithography—employing a liquid medium to better focus light—for the production of critical layers. The dielectric thickness is down from 1.0 nm to 0.9 nm, and gate lengths are down to 30 nm. Intel claims a 22% increase in transistor performance, and it says the process can be tuned to deliver a 5X to 10X reduction in leakage, depending on the transistor type.
|Code name|Key products|Cores|Threads|Last-level cache size|Process node (nm)|Estimated transistors (millions)|Die area (mm²)|
|---|---|---|---|---|---|---|---|
|Penryn|Core 2 Duo|2|2|6 MB|45|410|107|
|Bloomfield|Core i7|4|8|8 MB|45|731|263|
|Lynnfield|Core i5, i7|4|8|8 MB|45|774|296|
|Westmere|Core i3, i5|2|4|4 MB|32|383|81|
|Deneb|Phenom II|4|4|6 MB|45|758|258|
|Propus|Athlon II|4|4|512 KB x 4|45|300|169|
The table above compares Westmere’s die size and transistor count to other current processors from Intel and AMD. At 81 mm², Westmere is a pretty darned small chip. Yes, the 45-nm Penryn had more transistors and wasn’t all that much larger, but Westmere has relatively more logic and less cache than Penryn—and cache is generally denser. The closest Nehalem-based comparison is probably Bloomfield, since Lynnfield has integrated PCI Express logic that adds some die area. Westmere has half the cache and cores of Bloomfield, but occupies under one-third of the area.
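For a rough feel of what those numbers mean, this Python sketch divides each chip’s estimated transistor count by its die area and checks the Westmere-versus-Bloomfield area ratio. Keep in mind that density mixes process node with the logic-to-cache balance of each design, so treat it as a back-of-the-envelope comparison only.

```python
# Figures copied from the table above: (estimated transistors in millions, die area in mm^2)
chips = {
    "Penryn": (410, 107),
    "Bloomfield": (731, 263),
    "Lynnfield": (774, 296),
    "Westmere": (383, 81),
    "Deneb": (758, 258),
    "Propus": (300, 169),
}

for name, (mtrans, area) in chips.items():
    print(f"{name:10s} {mtrans / area:5.2f} M transistors/mm^2")

# Westmere's die area as a fraction of Bloomfield's ("under one-third" in the text)
ratio = chips["Westmere"][1] / chips["Bloomfield"][1]
print(f"Westmere/Bloomfield area: {ratio:.3f}")  # 0.308
```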
The north bridge and IGP
Although Clarkdale’s combined north bridge and integrated graphics chip incorporates only 177 million transistors, it is the larger of the two chips at 114 mm², in part because it’s produced on a 45-nm fab process. This chip sports 16 lanes of PCIe Gen2 connectivity and a dual-channel memory controller capable of supporting DDR3 at up to 1333 MT/s.
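Here’s the arithmetic behind that memory controller’s theoretical peak, as a small Python sketch. It assumes the standard 64-bit (8-byte) DDR3 channel width; sustained throughput in real benchmarks will land well below this ceiling.

```python
def ddr_peak_gb_s(transfers_mt_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for a DDR memory interface."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9

peak = ddr_peak_gb_s(1333, channels=2)  # dual-channel DDR3 at 1333 MT/s
print(round(peak, 1))  # 21.3
```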
The single largest component onboard, though, is the integrated graphics processor. The fact that Intel has integrated graphics into the same package as a 32-nm microprocessor makes us somewhat ambivalent. The company seems to have taken its best technology and arguably its worst technology and made us a sandwich—peanut butter and Vegemite, if you will. We can’t entirely overlook the fact that a firm with a near monopoly in the CPU market has decided to integrate a graphics solution directly into its core products, either. Third-party GPU suppliers like Nvidia can’t be pleased with this development. With that said, when I asked him about this move this past summer, Nvidia’s David Kirk argued persuasively that his company wouldn’t lose any business that it wouldn’t have lost to the IGP in Intel’s chipsets in the past.
For its part, Intel says it has focused on providing an improved user experience for today’s usage models, particularly in Windows 7, with Clarkdale’s IGP. (By the way, as far as I can tell, the IGP’s only name is the very generic Intel HD Graphics.) Although it’s not a world-beater, the IGP’s 3D graphics core has been enhanced in numerous ways. The number of execution units has risen to 12, versus 10 in the G4x-series chipsets. Those execution units are essentially unified shaders compatible with DirectX 10’s Shader Model 4.0 and with OpenGL 2.1. (The G4x supported DX10, but only OpenGL 2.0.)
In answer to our questions about the relatively weak performance of past Intel IGPs, Intel pointed out that some seemingly basic features are new to Clarkdale’s IGP. The vertex processing hardware now supports cull, clip, and setup, for instance. Intel has also added a fast Z clear function and hierarchical Z support. Both should improve the IGP’s efficiency and performance, but these are features Nvidia and ATI were busy adding to their hardware way back in the DirectX 8 generation. Seriously. Being able to discard the contents of a depth buffer and start over is a pretty old trick, as is the quick rejection of occluded objects. Better late than never, though!
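To illustrate the general idea behind hierarchical Z, here’s a minimal Python sketch of the textbook technique, not a description of Intel’s hardware. It assumes a less-than depth test (0.0 = near, 1.0 = far), and the `Tile` class and its methods are hypothetical. Each screen tile keeps a summary of the farthest depth it holds, so an object whose nearest point lies behind that value can be discarded for the whole tile without testing individual pixels.

```python
class Tile:
    """A block of depth-buffer pixels plus a coarse hierarchical-Z summary."""

    def __init__(self, pixels: int = 64, far: float = 1.0):
        self.depth = [far] * pixels  # per-pixel depth, 0.0 = near, 1.0 = far
        self.max_depth = far         # farthest depth stored anywhere in the tile

    def write(self, index: int, z: float) -> None:
        """Per-pixel depth test (less-than), then refresh the tile summary."""
        if z < self.depth[index]:
            self.depth[index] = z
            self.max_depth = max(self.depth)

    def rejects(self, primitive_min_z: float) -> bool:
        """Coarse test: if the primitive's nearest point is no nearer than the
        farthest pixel already stored, every fragment would fail the depth test."""
        return primitive_min_z >= self.max_depth

tile = Tile()
for i in range(64):
    tile.write(i, 0.3)    # an opaque wall at depth 0.3 covers the whole tile
print(tile.rejects(0.5))  # True: an object behind the wall is skipped wholesale
print(tile.rejects(0.2))  # False: an object in front must still be rasterized
```

The payoff is that one comparison per tile replaces dozens of per-pixel depth reads for geometry hidden behind a wall.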
The IGP’s display pipelines and video processing unit have both been substantially upgraded, as well, with near-best-in-class capabilities. The hardware supports dual displays, each at resolutions up to 2560×1600, and it can now drive two displays simultaneously over HDMI, which the GMA 4500 series couldn’t do. Richer colors are on the menu thanks to support for 12-bit-per-channel Deep Color over DisplayPort and HDMI, along with xvYCC capability for an expanded color gamut with wide-gamut displays.
Intel looks to have focused quite a bit on home theater PC-type usage models, and since display standards now carry audio signals, Clarkdale’s IGP incorporates robust support for sound, as well. The IGP can stream up to eight channels of LPCM audio at 24 bits and 96 kHz. Supported standards include the lossless Dolby TrueHD and DTS-HD Master Audio codecs, both of which are used by Blu-ray titles.
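For a sense of scale, the raw bit rate of that maximum LPCM configuration is easy to work out. This Python sketch (the helper name is hypothetical) multiplies channels by sample size by sample rate and ignores any framing overhead in the display link itself.

```python
def lpcm_bitrate(channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Raw (uncompressed) LPCM bit rate in bits per second."""
    return channels * bits_per_sample * sample_rate_hz

bps = lpcm_bitrate(8, 24, 96_000)  # eight channels, 24-bit, 96 kHz
print(bps)        # 18432000
print(bps / 1e6)  # 18.432 (Mbit/s)
```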
Speaking of Blu-ray, the IGP’s video unit provides “full” hardware acceleration for the decoding of the most popular video compression standards: H.264/AVC, VC-1, and MPEG2. This unit can now decode dual video streams simultaneously, helpful for Blu-ray discs that include features like picture-in-picture director’s commentary. To make sure those abilities don’t go unused, Intel has worked with third-party vendors to include support for its acceleration hardware in popular applications like ArcSoft Total Media Theater, CyberLink PowerDVD, and Corel WinDVD.
Core i3-500 and i5-600 series pricing
Now that we’ve properly introduced the technology, let’s take a look at the seven different Clarkdale variants Intel is introducing today for desktop PCs. The pricing and basic feature sets are listed below.
|Model|Cores|Threads|Base clock|Max Turbo clock|L3 cache|IGP clock|TDP|Price|
|---|---|---|---|---|---|---|---|---|
| |2|2|2.8 GHz|–|3MB|533 MHz|73W|$87|
|Core i3-530|2|4|2.93 GHz|–|4MB|733 MHz|73W|$113|
|Core i3-540|2|4|3.06 GHz|–|4MB|733 MHz|73W|$133|
|Core i5-650|2|4|3.20 GHz|3.46 GHz|4MB|733 MHz|73W|$176|
|Core i5-660|2|4|3.33 GHz|3.60 GHz|4MB|733 MHz|73W|$196|
|Core i5-661|2|4|3.33 GHz|3.60 GHz|4MB|900 MHz|87W|$196|
|Core i5-670|2|4|3.46 GHz|3.73 GHz|4MB|733 MHz|73W|$284|
That’s a pretty broad range, and at the high end, it overlaps in interesting ways with the Lynnfield-based Core i5-700 series. Take, for instance, the Core i5-661 processor Intel supplied us for review. Thanks to Hyper-Threading, the i5-661 shows four threads in Task Manager, and thanks to Turbo Boost, its clock speeds range up to 3.6GHz—as high as the peak frequency for the much more expensive Core i7-870. The Core i5-750 is priced the same as the i5-661, also exposes four threads, and has a Turbo Boost peak of 3.2GHz. So, much like the Core 2 lineup, the Core i5 series presents you with a trade-off: you may have two faster cores or four slower ones for the same price.
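One way to read those Turbo Boost numbers is in clock multiplier steps. Here’s a small Python sketch that assumes the roughly 133.33 MHz base clock used on LGA1156 platforms and converts each Turbo-capable model’s base and peak frequencies to the nearest multiplier. By this reckoning, each Core i5-600 model gains two multiplier bins at its Turbo peak.

```python
BCLK_MHZ = 400 / 3  # assumption: ~133.33 MHz LGA1156 base clock

def multiplier(freq_ghz: float) -> int:
    """Nearest whole-number multiple of the base clock for a given frequency."""
    return round(freq_ghz * 1000 / BCLK_MHZ)

# (base GHz, max Turbo GHz) for the Turbo-capable models in the table above
models = {
    "Core i5-650": (3.20, 3.46),
    "Core i5-660/661": (3.33, 3.60),
    "Core i5-670": (3.46, 3.73),
}

bins = {}
for name, (base_ghz, turbo_ghz) in models.items():
    base_x, turbo_x = multiplier(base_ghz), multiplier(turbo_ghz)
    bins[name] = turbo_x - base_x
    print(f"{name}: {base_x}x -> {turbo_x}x ({bins[name]} bins)")
```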
Please do note that the Core i5-661 has a 900MHz IGP frequency and an 87W TDP rating, while the rest of the Clarkdale Core i3/i5 models have a 733MHz IGP and a 73W TDP. Also note that the Core i5-661 is the only Clarkdale processor we received for review ahead of this product launch. One Intel representative we spoke with called the i5-661 “a niche product” intended for premium home theater PCs. In other words, the vast majority of Clarkdale systems will likely have a 733MHz IGP clock, instead, and results from the 900MHz IGP on the Core i5-661 are not likely to reflect their graphics performance. We did request other Core i3/i5 models for review, but Intel declined, leaving us with only this self-described “niche product” to test.
Happily, the BIOS on our Asus H57 motherboard gives us easy control over the IGP clock, so we lowered it to 733MHz and ran a set of tests simulating the performance of the Core i5-660’s IGP. Sadly, we were not able to adjust the IGP voltage, so we couldn’t confidently simulate the power consumption of a 73W Clarkdale product.
Clarkdale chip… sets?
The final piece of the Clarkdale puzzle is a trio of new supporting chips, the H55, H57, and Q57. I believe they’re all just different spins on the same silicon. In fact, I believe they may be based on the same 65nm silicon as the Lynnfield processors’ P55 PCH. This I/O chip augments the Clarkdale package with a range of I/O interfaces for things like SATA and USB.
Block diagram of an Intel H55-based system. Source: Intel.
The most notable new addition in the Clarkdale-focused versions of this chip is support for the Intel Flexible Display Interface (FDI), needed to pipe the IGP’s graphics out to a display. The FDI interconnect is based on DisplayPort. The H55/H57/Q57 have two independent FDI links, one for each display supported, and each one has a 2.7Gbps data rate.
Beyond that, much in these chips will be familiar. The DMI interface handles all other communication between the Clarkdale package and the chipset, with aggregate data rates of up to 1 GB/s in each direction over its four pairs of unidirectional links. The PCIe x1 links on this chip remain Gen2-compliant but with only Gen1 data rates.
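The 1 GB/s DMI figure falls out of simple link math. This Python sketch assumes DMI behaves like a four-lane, PCIe Gen1-class link: 2.5 GT/s per lane with 8b/10b encoding, so 8 of every 10 bits on the wire carry data. The helper name is hypothetical, and packet protocol overhead would trim the usable rate further.

```python
def link_bandwidth_gb_s(lanes: int, gt_per_s: float,
                        data_bits: int = 8, line_bits: int = 10) -> float:
    """Per-direction payload bandwidth in GB/s for a serial link with a block line code."""
    payload_gbit_s = lanes * gt_per_s * data_bits / line_bits
    return payload_gbit_s / 8  # 8 bits per byte

dmi = link_bandwidth_gb_s(4, 2.5)  # assumed: 4 lanes at 2.5 GT/s, 8b/10b
print(dmi)  # 1.0
```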
In fact, the differences between the chipset models are largely just positioning. The H57 includes Intel’s Rapid Storage Technology, evidently the new name for the former Matrix Storage Technology, which includes RAID capability. The H55 lacks this feature, and it has two fewer USB and PCIe x1 ports, although its 12 USB ports and six PCIe ports should suffice in most cases. And the Q57 is simply the commercial version of the H57, intended for corporate desktops. For the record, Intel says the Q57 costs $44, the H57 $43, the H55 $40, and the P55 $40.
We have two examples of motherboards based on these chipsets in house for testing. The first one to arrive was the Intel DH55TC, a microATX board with a pretty standard array of ports and slots.
Unfortunately, our particular copy of this board had some stability issues, which we suspected might have been caused by a memory problem. We tried several different sets of DIMMs, to no avail, and since this board’s BIOS offers zero control over memory voltages and timings, we were helpless to move ahead.
Instead, we dialed up Asus and got them to send out an H57 board, the P7H57D-V EVO. True to Asus form, this board has a full suite of BIOS-level controls over all sorts of variables, along with a robust set of features that includes USB 3.0 and SATA 6Gbps. Best of all, our stability problems were instantly resolved.
In a funny twist, this board has a pair of PCIe x16 slots with support for both SLI and CrossFire. If a single GPU is installed, the board directs all 16 PCIe lanes to slot one. If a second card is installed in slot two, it splits the PCIe lanes into a dual x8 config. That happens in spite of the fact that the Intel specifications say such a bifurcation of the CPU’s PCIe lanes is only supported when the P55 chipset is present. Looks like Asus is bringing outlaw goodness to the masses. We’ll hopefully be able to bring you a full review of this board after CES.
Clarkdale drops into the same LGA1156 sockets used with Lynnfield Core i5 and i7 processors. The motherboards are largely interoperable: one may, for instance, install a Core i5-750 in an H57 motherboard and use it, so long as a discrete graphics card is available. Similarly, we installed our Core i5-661 in a P55 board with discrete graphics, and it worked effortlessly. Just don’t expect that H57 motherboard’s VGA port to do anything when you have a Lynnfield processor installed, and don’t expect to make use of a Clarkdale IGP on a P55 board.
The Core i5-661 in its socket
The bare socket
The Core i7-870 (left) and the Core i5-661 (right).
From left to right: Socket AM3, LGA1156, and LGA775 processors
The stock Core i5-661 cooler is a puny thing, but it’s still fairly quiet
Key match-ups to watch
We’re about to dive into our performance results, but before we do, let’s talk about some competitive scenarios for the various CPUs.
- The battle at 200 bucks — The star of today’s show, the Core i5-661, is priced at $196. That’s a lot to ask for a dual-core CPU these days, and that price tag puts it into direct competition with the Phenom II X4 965, AMD’s fastest processor, and a prior-gen quad-core, the Core 2 Quad Q9400. Now, the Core i5-661 has a much lower TDP than the 125W Phenom II X4 965, but its 87W rating isn’t far from the Q9400’s 95W TDP. The question is whether the i5-661’s quad-threaded performance and nosebleed-inducing 3.6GHz Turbo Boost frequencies are sufficient to justify its price in the presence of some very tough company.
- Duel of the duallies — I’ve also thrown in the top dual-core versions of the Phenom II and Core 2—the X2 550 and the Duo E8600, respectively—to compete against the newcomer. The prices on these products vary quite a bit, but architecturally, we’ll be interested to see how Westmere compares.
- The value slug-fest — We underclocked our i5-661 to make it perform like a Core i3-540, a $133 Clarkdale that lacks Turbo Boost. That product matches up against the Core 2 Duo E7600 (its $133 predecessor) and a potentially quite compelling rival from AMD, the $122 Athlon II X4 630. This value offering, based on the 45-nm Propus core, lacks any L3 cache, but it counters the i3-540 with four real CPU cores at 2.8GHz. Can AMD’s value quad manage to steal a win against Intel’s new hotness?
Incidentally, those match-ups reveal our thinking in processor selection for this review. We’ve also thrown in the cheapest Core i7-900-series processor, the i7-920, just for comparison’s sake.
Testing Clarkdale meant we had to convert our test systems to motherboards with competing integrated graphics solutions. Fortunately, we were able to find a couple of good boards that allowed us to make a pretty direct comparison.
Asus’ P5G43T-M Pro is one of the few G4x-based boards available that supports DDR3 memory, and it only costs $79. We were even able to run the DDR3 memory at 1333MHz for the discrete graphics tests, technically beyond what the board supports, although we had to drop it down to 1066MHz to achieve stability for IGP testing. Otherwise, this board gave us no problems and, as you’ll see, pretty darned low power consumption results.
Not only does the MA785GMPT-UD2H from Gigabyte have a catchy name, but this Socket AM3 board supports DDR3 memory and has 128MB of 1333MHz SidePort memory onboard for use by its Radeon HD 4200 integrated graphics.
One more thing to note. We’ve mentioned that we underclocked the Core i5-661 to 3.06GHz in order to simulate the Core i3-540’s performance. Although we did change the core clock to the proper speed, the processor’s uncore clock remained at the i5-661’s stock frequency of 3.2GHz. We suspect shipping Core i3-540 processors may have a lower uncore clock (although Intel has made a habit of not documenting these things well). If so, our simulated processor may perform slightly better than the real item due to a higher L3 cache speed. The differences are likely to be very minor, based on our experience with Lynnfield parts—the L3 cache is incredibly fast, regardless—but we thought you should know about that possibility. As is our custom, we’ve omitted the simulated processor speed grade from our power consumption testing.
Finally, after consulting with our readers, we’ve decided to enable Windows’ “Balanced” power profile for our desktop processor performance tests, which means power-saving features like SpeedStep and Cool’n’Quiet are operating. (In the past, we only enabled these features for power consumption testing.) Our spot checks demonstrated to us that, typically, there’s no performance penalty for enabling these features on today’s CPUs. If there is a real-world performance penalty to enabling these features, well, we think that’s worthy of inclusion in our measurements, since the vast majority of desktop processors these days will spend their lives with these features enabled. We did disable these power management features to measure cache latencies, but otherwise, it was unnecessary to do so.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we reported the median of the scores produced.
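The median-of-runs approach is simple, but it’s worth seeing why it’s preferred over a plain average. In this Python sketch, the scores are made-up illustrative numbers and `report` is a hypothetical helper: one anomalous run barely moves the median, whereas it would drag a mean noticeably.

```python
from statistics import mean, median

def report(scores):
    """Return the median of repeated benchmark runs (at least three)."""
    if len(scores) < 3:
        raise ValueError("need at least three runs")
    return median(scores)

runs = [41.2, 40.9, 55.0]  # hypothetical scores; the third run is an outlier
print(report(runs))        # 41.2
print(round(mean(runs), 1))  # 45.7, pulled upward by the outlier
```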
Our test systems were configured like so:
Core 2 Duo E7600 3.06 GHz
Core 2 Duo E8600 3.33 GHz
Core 2 Quad Q9400 2.66 GHz
Core i5-750 2.66 GHz
Core i3-540 3.06 GHz
Core i5-661 3.33 GHz
Core i7-920 2.66 GHz
Athlon II X4 630 2.8 GHz
Phenom II X2 550 3.1 GHz
4.0 GT/s (2.0 GHz)
Rapid Storage Technology 188.8.131.527
Rapid Storage Technology 184.108.40.2067
Rapid Storage Technology 220.127.116.117
Rapid Storage Technology 18.104.22.1687
Rapid Storage Technology 22.214.171.1247
|CAS latency (CL)||7||8||8||8||7||8|
|RAS to CAS delay (tRCD)||7||8||8||8||7||8|
|RAS precharge (tRP)||7||8||8||8||7||8|
|Cycle time (tRAS)||20||20||20||20||20||20|
ICH10R/ ALC887 with
Realtek 126.96.36.19995 drivers
Realtek 188.8.131.5295 drivers
P55 PCH/ ALC889 with
Realtek 184.108.40.20695 drivers
H57 PCH/ ALC889 with
Realtek 220.127.116.1195 drivers
ICH10R/ ALC888 with
Realtek 18.104.22.16895 drivers
SB750/ ALC889A with
Realtek 22.214.171.12495 drivers
RE3 WD1002FBYS 1TB SATA
ENGTX260 TOP SP216 (GeForce GTX 260) with ForceWare 195.62 drivers
785G/Radeon HD 4200 with Catalyst 9.12 drivers
Intel G43/GMA X4500 with 126.96.36.1996 drivers
Intel Core i5-661 with 188.8.131.528 drivers
Windows 7 Ultimate x64 Edition RTM
August 2009 update
PC Power & Cooling Silencer 610 Watt
I’d like to thank Asus, Corsair, Gigabyte, OCZ, and WD for helping to outfit our test rigs with some of the finest hardware available. Thanks to Intel and AMD for providing the processors, as well, of course.
The test systems’ Windows desktops were set at 1600×1200 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.
We used the following versions of our test applications:
- SiSoft Sandra 2010.1.16.11
- Stream 5.8 64-bit
- CPU-Z 1.52.2
- WorldBench 6 Gold
- Left 4 Dead 2 184.108.40.206
- DiRT 2
- Modern Warfare 2
- Valve VRAD map build benchmark
- Valve Source Engine particle simulation benchmark
- Cinebench R10 64-bit Edition
- POV-Ray for Windows 3.7 beta 34 64-bit
- picCOLOR 4.0 build 677 64-bit
- 7-Zip 4.65 64-bit
- TrueCrypt 6.3a
- notfred’s Folding benchmark CD generated 8/25/09
- The Panorama Factory 5.3 x64 Edition
- Windows Live Movie Maker 14
- x264 HD benchmark 3.0
- LAME MT 3.97a 64-bit
- ArcSoft Total Media Theatre 220.127.116.11
The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
A look at those IGPs
I’d like to get our tests of Clarkdale’s integrated graphics component out of the way before moving on to our more general CPU performance results. Although Intel has been very cautious in its statements on this front, I believe the company has been quietly hopeful that the Clarkdale IGP would acquit itself reasonably well against competing integrated graphics offerings. To help its chances, Intel has spiffed up its graphics control panel with a clean new interface.
The menus only expose a small subset of the options offered by the Nvidia ForceWare and AMD Catalyst control panels, but this is a welcome improvement, nonetheless.
3D gaming performance
The folks at Intel are pretty adamant that the Clarkdale IGP isn’t meant for use with “enthusiast” games. In fact, they even offered us a marketing slide that explains their position: a pyramid with a broad base of “casual games” like Bejeweled, a healthy middle tier of games like Sims 3 and World of Warcraft, and a teeny little tip at the top labeled “enthusiast games,” of which Resident Evil 5 is the only example. The message: Intel HD Graphics provides a “great experience” with these casual and mainstream games, “where the PC market is growing the most.”
Of course, we have a fierce tradition here at TR of subjecting integrated graphics solutions to abuse at the hands of the latest games, a tradition that has grown stronger as games have pretty much stagnated at console-level graphical requirements. So we’re gonna give some Intel PR guys heartburn and run some tests with the latest games and see what happens.
The first part of this process, of course, was to figure out if the games would even run acceptably on these IGPs. To do so, we loaded up each game on the Core i5-661 system and fiddled with the menus until we got acceptable frame rates. In the case of all of the games below, that meant dropping the graphical quality settings to their absolute minimums across the board. We then had to cut the resolution to 640×480, as well. But, in three of the four games we tried, the best IGPs achieved something close to playable performance, at least.
Remember: in the tests below, the Core i5-660 results are just our i5-661 with the IGP clocked down to the widely used speed of 733MHz. Also, for the processors here, we used similarly priced competitors: the Core i5-661, the Phenom II X4 965, and the Core 2 Quad Q9400. I was unsure whether to test against like-priced CPUs or the fastest dual-cores from the competing product lines, but I had to settle on something. I’m sure some of you will decry my choice, to which I can only answer: I didn’t set Intel’s pricing or product sampling policies.
Anyhow, we tested the 785G with its SidePort memory enabled and again with it disabled. In the case of the results marked “SidePort,” both the SidePort memory and the 785G’s access to main memory were enabled, which is the default config and generally offers the highest performance.
OK, so all of these games look pretty horrible at 640×480 with the lowest detail settings, but they all ran on the Clarkdale IGP. And you can see that Intel’s graphics core has made huge strides from the G43 to Clarkdale, nearly enough to catch up to the 785G. Heck, without SidePort memory, the 785G is slower than the Core i5-661 in Left 4 Dead 2.
The 785G delivered fluid frame rates in Modern Warfare 2. We even tried bumping up the resolution to 1024×768, but the game wasn’t quite playable then.
The one snag we ran into with the Intel IGP was in DiRT 2, where the opening menu was just a black screen, rendering the game inaccessible. Meanwhile, we were somewhat surprised to find that the 785G handled DiRT 2 pretty decently at the lowest quality settings. Heck, the game was even playable after we bumped up the resolution from 640×480 to 800×600. This sort of compatibility with newer games has long been a relative weakness for Intel, since it offers no discrete graphics products and thus has little incentive to work with game developers as diligently as AMD and Nvidia.
On the other hand, there is a whole class of games intended to run well on low-end graphics solutions—those “mainstream games” Intel’s marketing folks like to talk about. Many of those do run well enough on older Intel IGPs, and the Clarkdale IGP should take them to a new level.
Torchlight runs beautifully on the Core i5-661, with all of the detail settings turned up and the screen resolution set to 1280×1024. In my experience, frame rates averaged about 60 FPS. (One nitpick, which may be the fault of either the game or the IGP: I enabled 4X antialiasing to see how well it worked, and the game kept disabling it.)
We also found that the de-tuned mobile version of Clarkdale, known as Arrandale, can handle both Portal and Darwinia competently.
We conducted a few simple power consumption tests for these IGPs, as well. The first set of results comes from when the systems were idling at the Windows desktop, and the second from when they were running the “rthdribl” graphics demo. Since the Asus H57 board appears to exhibit relatively high power draw, we’ve included results from the Intel H55 motherboard, as well.
On the Intel board, the i5-661 system pulls only 42W at the wall socket when idle. Very nice indeed. With the IGP and CPU occupied by our graphics demo, the i5-661’s power draw rises above that of the G43-based system and very close to the 785G rig—which, I should remind you, is equipped with a 125W Phenom II X4 processor. Considering that these entire systems are drawing no more power than the 87W TDP of the Core i5-661, I find it hard to complain.
I also tried a quick-and-dirty test of Blu-ray disc playback on these three IGPs, consisting of playing chapter 9 of The Dark Knight and observing image quality, CPU utilization, and power consumption. I should emphasize that our CPU utilization results are intentionally approximate—we believe Hyper-Threading makes reported CPU utilization numbers a somewhat less than 100% reliable indication of how busy the CPU really is.
Happily, Blu-ray movie playback with ArcSoft’s Total Media Theatre 3 was essentially flawless on all three systems. We noticed no major artifacts, pauses, or dropped frames; motion was entirely fluid. CPU utilization on the G43/Q9400 ran at about 9-16%, while the 785G/Phenom II X4 ranged between 7% and 14%, averaging about 12%. On the Core i5-661, utilization ranged from 2-10%, with an average of about 5%. Evidently, Clarkdale’s ability to offload decode chores from the CPU is reasonably complete.
Here’s a look at power consumption during Blu-ray playback. I had hoped to use the Intel H55 board here, as well, but it wasn’t stable enough to complete the test. Regardless, the H57/i5-661 proves to be more power efficient than the 785G/Phenom II X4 competition.
Power consumption and efficiency
Since we are using the first CPU based on a new process technology, we’ll continue with our power consumption theme as we transition to our broader CPU comparison.
For these tests, we used an Extech 380803 power meter to capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.
We can slice up these raw data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.
You’ll notice that we’ve included two sets of results here for the Core i5-661. Those marked (P55) were measured with the i5-661 installed in our Gigabyte P55 motherboard. We thought the power draw of the Asus H57 board was unusually high, and testing with the P55 gives us an apples-to-apples comparison between the Core i5-661 and the Core i5-750. Turns out that our P55-based system idles at a cool 83W with both processors. That’s intriguing, because the i5-750 has four cores and is based entirely on 45-nm process tech. Looks like Clarkdale’s dual-chip solution requires just as much power at idle.
These numbers are higher than those on the previous page, obviously, since we’re using a discrete graphics card with all of these systems.
Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.
Our two Core i5-661-based test systems draw no more than 130W under load, less even than the Phenom II X2 550, let alone the 198W peak hit by the Phenom II X4 965-based test rig.
We can highlight power efficiency by looking at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.
We can pinpoint efficiency more effectively by considering the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
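The slicing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the power trace is hypothetical, and it assumes the meter logs one reading per second (the logging interval isn’t specified here). Idle and peak power are window averages, while total and task energy are each reading multiplied by the sampling interval and summed, giving watt-seconds (joules).

```python
# Hypothetical 1 Hz wall-socket trace (watts): 10 s idle, 30 s rendering, 20 s idle.
samples = [80.0] * 10 + [130.0] * 30 + [80.0] * 20

def average_power(trace, start_s, end_s):
    """Mean power over the samples in [start_s, end_s), assuming one sample per second."""
    window = trace[start_s:end_s]
    return sum(window) / len(window)

def energy_joules(trace, start_s=0, end_s=None, dt_s=1.0):
    """Energy in watt-seconds (joules): each reading times the sampling interval, summed."""
    return sum(trace[start_s:end_s]) * dt_s

idle_w = average_power(samples, 50, 60)  # trailing edge, after the render
peak_w = average_power(samples, 15, 25)  # 15 to 25 seconds into the test period
total_j = energy_joules(samples)         # whole test period
task_j = energy_joules(samples, 10, 40)  # render period only

print(idle_w, peak_w, total_j, task_j)  # 80.0 130.0 6300.0 3900.0
```

A faster system shortens the render window, so its task energy can come out lower even if its peak draw is higher.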
By both of our measures of power efficiency, the Core i5-661 looks very good. The task energy metric is arguably the best measure of power-efficient performance, and rarely have we seen a dual-core processor challenge the quad-cores for the lead here. By its nature, this very parallel task runs most efficiently on many-core processors. Nevertheless, the Core i5-661 takes second place, behind only the Core i5-750. These two processors reach similar results by different paths—the i5-750 by finishing quickly, and the i5-661 by drawing substantially less power during its longer compute time.
Memory subsystem performance
This rather busy graph shows us bandwidth at different points in the cache hierarchies of these processors. With only two cores, the L1 and L2 caches of Clarkdale can’t match the aggregate bandwidth of the quad-core parts. Still, the Core i5-661’s caches deliver more bandwidth at nearly every point than the Core 2 Duo E8600’s or the Phenom II X2 550’s.
The Clarkdale processors don’t quite achieve the same memory throughput as their Core i5/i7 siblings, although they represent a doubling of throughput over the bus-limited Core 2 series.
Here’s a somewhat unexpected result. Memory access latencies are quite a bit higher for the Clarkdale processors than for, well, anything else with an integrated memory controller. Perhaps pushing off the memory controller to a separate chip is the culprit here, or perhaps Intel simply decided not to focus on optimizing for low access latencies for this solution. The Core 2 processors have certainly performed well enough without the benefit of fast memory access, and Clarkdale’s relatively large L3 cache can probably hide latencies fairly effectively in many situations. Whatever the case, the Core i3-540’s access latencies are the highest of the lot.
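To see why a large L3 cache can hide a slower trip to memory, the textbook average-memory-access-time (AMAT) model helps. Every access that reaches a cache level pays that level's hit time, and only the misses continue outward. The latencies and hit rates below are made-up illustrative numbers, not measurements from our tests.

```python
def amat(levels, memory_latency_ns):
    """Average memory access time for a multi-level cache hierarchy.

    levels: list of (hit_time_ns, hit_rate) pairs, from L1 outward.
    Every access that reaches a level pays that level's hit time;
    only the fraction that misses goes on to the next level.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get this far down the hierarchy
    for hit_time_ns, hit_rate in levels:
        total += reach * hit_time_ns
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency_ns


# Hypothetical L1, L2, L3 parameters.
levels = [(1.0, 0.9), (3.0, 0.5), (10.0, 0.8)]
for mem_ns in (70.0, 90.0):
    print(mem_ns, round(amat(levels, mem_ns), 2))  # about 2.5 ns, then about 2.7 ns
```

With these assumed numbers, a roughly 29% jump in memory latency (70 to 90 ns) raises the average access time by only about 8%, because just 1% of accesses make it past the L3.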
This is my favorite game in a long, long time, so I had to use it in our latest CPU test suite. Borderlands is based on Unreal Engine technology and includes a built-in performance test, which we used here. We tested with the game set to its highest quality settings at a range of resolutions. The results from the lowest resolutions will highlight the differences between the CPUs best, so I’d pay the most attention to them. The higher-resolution results demonstrate what happens when our GeForce GTX 260 graphics card begins to restrict performance.
With these frame rate averages, Borderlands looks to run quite acceptably on all of the processors tested. Those especially low minimum frame rates, which don’t vary by more than a few ticks from 20 FPS, appear to be a quirk of the game’s built-in performance test. Playing the game on one of these systems, you won’t feel such lows often.
The Clarkdale processors perform relatively well here, as the Core i3-540 beats the prior dual-core champ, the Core 2 Duo E8600. Notably, though, the Core i5-750 edges out the i5-661; buying a fast dual-core for gaming might still be a solid option, but at the same price, the Core i5-750 is faster in this game.
Left 4 Dead 2
We tested Left 4 Dead 2 by playing back a custom demo using the game’s timedemo function. Again, we had all of the image quality options cranked, and we tested with 16X anisotropic filtering and 4X antialiasing. The game’s multi-core rendering option was, of course, enabled.
Clearly, any of these CPUs can slice through L4D2 with ease. Relatively speaking, the Core i5-661 trails the Phenom II X4 965 and the Core i5-750. The Core i3-540 is better positioned against its price competitors, with a clear lead over both the Athlon II X4 630 and the rather unfortunate Core 2 Duo E7600.
Notice something here. Despite a clock-speed deficit, the Core 2 Quad Q9400 nearly matches the Core 2 Duo E8600. Likewise, the Athlon II X4 630 outright beats the Phenom II X2 550, despite its lower clock speeds and smaller cache. The performance differences are minor in the grand scheme, but they signal a changing of the guard: no longer is a higher-frequency dual-core processor the superior option in games. We saw quad-core processors perform relatively well in our first round of Windows 7-based CPU tests, and these results further the trend.
Of course, what happens at higher resolutions here is a classic example of a GPU bottleneck taking over. If your graphics card becomes the primary performance constraint, having a faster CPU won’t get you much.
DiRT 2
This excellent new racer packs a nicely scriptable performance test. We tested at the game’s “high” quality presets with 4X antialiasing.
The Core i5-661 takes the top spot at the lowest resolution, which counts as a victory in my book. The field groups up and becomes somewhat jumbled as the GPU bottleneck kicks in. Note, also, that the three slowest processors here all have just two cores and two hardware threads. DiRT 2 appears to benefit pretty straightforwardly from four threads.
Modern Warfare 2
With Modern Warfare 2, we used FRAPS to record frame rates over the course of a 60-second gameplay session. We conducted this gameplay session five times on each CPU and have reported the median score from each processor. We’ve also graphed the frame rates from a single, representative session for each. We tested this game at a relatively low 1024×768 resolution, with no AA, but otherwise using the highest in-game visual quality settings.
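The reporting scheme described above (five FRAPS sessions per CPU, with the median kept) can be sketched like this. We assume each session yields a list of per-second frame rates. The numbers are invented, the helper names are hypothetical, and taking the median of the minimums separately is our own illustrative choice.

```python
from statistics import mean, median


def summarize_session(fps_per_second):
    """Return (average FPS, minimum FPS) for one FRAPS log of per-second frame rates."""
    return mean(fps_per_second), min(fps_per_second)


def median_of_sessions(sessions):
    """Median of the per-session averages and of the per-session minimums."""
    summaries = [summarize_session(s) for s in sessions]
    return (median(avg for avg, _ in summaries),
            median(low for _, low in summaries))


# Five hypothetical (and very short) sessions on one CPU.
sessions = [[90, 100, 110], [80, 95, 110], [85, 105, 95], [100, 100, 100], [70, 120, 110]]
print(median_of_sessions(sessions))  # (100, 85)
```

Using the median rather than the mean keeps one unusually smooth or unusually choppy run from skewing a CPU's score.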
Lesson number one here is familiar: most games just don’t need too terribly much CPU power to run well these days. Heck, our FPS minimums are in the 60s.
Relatively speaking, the i5-661 is just behind the Phenom II X4 965 and a little further behind the i5-750. One wonders what Intel was thinking when it priced the i5-661 opposite these processors. The Core i3-540, meanwhile, looks like dramatic progress from the Core 2 Duo E7600, although that Athlon II X4 630 remains formidable competition.
Source engine particle simulation
Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.
This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.
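A common way to spread a particle update across cores is to split the particle array into contiguous chunks and hand one chunk to each worker, since each particle's step is independent of the others. Valve's actual implementation isn't described here, so this is a generic sketch; the particle fields, the gravity constant, and the chunking scheme are our assumptions. Python threads won't run this arithmetic in parallel because of the global interpreter lock, so treat it as a picture of the work split rather than a speed demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

GRAVITY = -9.8  # assumed acceleration along y, in m/s^2


def step_chunk(particles, start, end, dt):
    """Advance particles[start:end] by one explicit Euler step, in place."""
    for i in range(start, end):
        x, y, vx, vy = particles[i]
        vy += GRAVITY * dt
        particles[i] = (x + vx * dt, y + vy * dt, vx, vy)


def step_all(particles, dt, workers=4):
    """Split the particle list into one contiguous chunk per worker and update each chunk."""
    n = len(particles)
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(step_chunk, particles, lo, hi, dt) for lo, hi in bounds]
        for f in futures:
            f.result()  # re-raise any exception from a worker


# Eight identical particles: (x, y, vx, vy).
particles = [(0.0, 10.0, 1.0, 0.0) for _ in range(8)]
step_all(particles, dt=0.5)
print(particles[0])
```

Because the chunks never overlap, no locking is needed; each worker writes only to its own slice of the list.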
The Clarkdale processors benefit mightily here from Hyper-Threading, as does the Core i7-920.
Productivity and general use software
We have, for quite some time now, used WorldBench in our CPU tests. Over that time, we’ve found that some of WorldBench’s tests can be rather temperamental and periodically refuse to run. We’ve also found that some of the same tests tend to produce inconsistent results that aren’t always influenced much by processor performance. Other applications in WorldBench 6, like the Windows Media Encoder 9 test, make little or no use of multithreading, even though such applications are typically nicely multithreaded these days. As a result, we’ve decided to limit our use of WorldBench to a selection of its applications, rather than the full suite.
MS Office productivity
This MS Office test includes an element of light multitasking, since multiple Office applications are in use at once. Still, this obviously is a case where two fast cores will suffice, judging by these results. Given that, the two Clarkdale processors end up near the top of the heap, with only the Core 2 Duo E8600 ahead of them. I’ll bet that the E8600 is fastest here in part because of its 6MB L2 cache.
Firefox web browsing
Here’s another situation where single- or dual-threaded performance is king, and so the Core i5-661, with its 3.6GHz Turbo Boost peak, takes the crown.
Multitasking – Firefox and Windows Media Encoder
The i5-661 falls just behind its quad-core rivals in this multitasking scenario, while the Core i3-540 enjoys a comfortable lead over its like-priced competition.
WinZip file compression
7-Zip file compression and decompression
Our two compression tests are a study in threading. WorldBench’s version of WinZip uses only one or two threads, and so the high-frequency dual-core processors tend to do well in it. 7-Zip is much more widely multithreaded, so the quad-core CPUs are fastest there. The Core i5-661 demonstrates its adaptability by using the high frequencies enabled by Turbo Boost to take the top spot in the WinZip test and then using its four hardware threads to nearly keep pace with the Core 2 Quad Q9400 in 7-Zip.
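Amdahl's law offers a rough way to reason about why WinZip favors fast dual-cores while 7-Zip favors quads. Only the parallel fraction of the work speeds up with more threads, and the rest runs at single-thread speed. The sketch below crudely treats throughput as clock speed times Amdahl speed-up. The clock speeds and parallel fractions are hypothetical, and the model ignores Turbo Boost, Hyper-Threading, cache, and architectural differences.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Speed-up over one thread when only parallel_fraction of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def relative_throughput(clock_ghz, threads, parallel_fraction):
    """Crude throughput estimate: clock speed times Amdahl speed-up."""
    return clock_ghz * amdahl_speedup(parallel_fraction, threads)


# Hypothetical chips: a 3.6 GHz dual-core vs. a 2.8 GHz quad-core.
for label, p in [("lightly threaded", 0.3), ("heavily threaded", 0.95)]:
    dual = relative_throughput(3.6, 2, p)
    quad = relative_throughput(2.8, 4, p)
    print(f"{label}: dual={dual:.2f}, quad={quad:.2f}")
# With these assumptions, the dual-core leads at p=0.3 and the quad leads at p=0.95.
```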
TrueCrypt disk encryption
Here’s a new addition at our readers’ request. This full-disk encryption suite includes a performance test, for obvious reasons. We tested with a 50MB buffer size and, because the benchmark spits out a lot of data, averaged the results in a couple of different ways. The big news from the overall results is the strength of the AMD quad-cores. Heck, the Athlon II X4 630 leads the Core i5-750, amazingly enough. The Clarkdale processors lead the dual-cores—by quite a bit in the case of the i5-661—but can’t touch the quads.
I’m fairly certain this version of TrueCrypt doesn’t yet support Westmere’s new encryption-related AES instructions. We’ll have to run this test again once the software is updated to make use of them.
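If you want to check whether a given CPU exposes the AES-NI instructions, the Linux kernel lists an `aes` flag on the x86 "flags" line of /proc/cpuinfo. Here's a minimal sketch, assuming an x86 Linux system; elsewhere the file doesn't exist and the check simply reports False. It only detects hardware support and says nothing about whether TrueCrypt actually uses the instructions.

```python
from pathlib import Path


def has_aes_ni(cpuinfo_text):
    """True if any 'flags' line in /proc/cpuinfo-style text lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def local_cpu_has_aes_ni(path="/proc/cpuinfo"):
    """Read the local cpuinfo file, if present, and check it for AES-NI."""
    p = Path(path)
    return p.exists() and has_aes_ni(p.read_text())


print(local_cpu_has_aes_ni())
```

Splitting the flag list into words avoids false matches on longer flag names that merely contain "aes".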
The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.
In the past, we’ve added up the time taken by all of the different elements of the panorama creation wizard and reported that number, along with detailed results for each operation. However, doing so is incredibly data-input-intensive, and the process tends to be dominated by a single, long operation: the stitch. So this time around, we’ve simply decided to report the stitch time, which saves us a lot of work and still gets at the heart of the matter.
The i5-661 may be the top dual-core CPU yet again, but it just manages to match the performance of a much cheaper quad in the Athlon II X4 630.
picCOLOR image processing and analysis
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including SSE extensions, multiple cores, and Hyper-Threading. Many of its individual functions are multithreaded.
Recently, at our request, Dr. Müller graciously agreed to re-tool his picCOLOR benchmark to incorporate some real-world usage scenarios. As a result, we now have four new tests that employ picCOLOR for image analysis. I’ve included explanations of each test from Dr. Müller below.
Particle Image Velocimetry (PIV) is used for flow measurement in air and water. The medium (air or water) is seeded with tiny particles (1 to 5 µm in diameter; smoke or oil fog in air, titanium dioxide in water). The tiny particles follow the flow more or less exactly, except perhaps in very strong sonic shocks or extremely strong vortices. Two images are taken within a very short time interval, for instance 1 µs. Illumination is provided by a very thin laser light sheet. Image resolution is 1280×1024 pixels. The particles will have moved a little with the flow in that short interval, and the resulting displacement of each particle gives information on the local flow speed and direction. The calculation is done with cross-correlation in small sub-windows (32×32 or 64×64 pixels) with some overlap. Each sub-window produces a displacement vector that tells us everything about flow speed and direction. The calculation can easily be done multithreaded and is implemented in picCOLOR with up to 8 threads, and more on request.
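The heart of the PIV step described above is a per-sub-window search for the shift that best lines up the two exposures. Here's a minimal direct cross-correlation sketch in pure Python, assuming integer pixel shifts and tiny synthetic windows. picCOLOR's actual implementation isn't described here and almost certainly differs; production PIV codes typically use FFT-based correlation and sub-pixel peak fitting. The helper names are hypothetical.

```python
def best_displacement(win_a, win_b, max_shift=3):
    """Integer (dx, dy) that maximizes the cross-correlation of two equal-size windows.

    win_a is a sub-window from the first exposure, win_b the same
    sub-window from the second. Direct (non-FFT) correlation over a
    small search range of shifts.
    """
    h, w = len(win_a), len(win_a[0])
    best, best_shift = float("-inf"), (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(h):
                yb = y + dy
                if not 0 <= yb < h:
                    continue
                for x in range(w):
                    xb = x + dx
                    if 0 <= xb < w:
                        score += win_a[y][x] * win_b[yb][xb]
            if score > best:
                best, best_shift = score, (dx, dy)
    return best_shift


def make_window(points, size=8):
    """Synthetic sub-window with a bright 'particle' at each (x, y) point."""
    win = [[0.0] * size for _ in range(size)]
    for x, y in points:
        win[y][x] = 1.0
    return win


seeds = [(1, 1), (3, 2), (2, 4)]
a = make_window(seeds)
b = make_window([(x + 2, y + 1) for x, y in seeds])  # flow moved everything by (2, 1)
print(best_displacement(a, b))  # (2, 1)
```

Each sub-window's search is independent of every other's, which is why this step splits so easily across threads.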
Real Time 3D Object Tracking is used for tracking airplane wing and helicopter blade deflection and deformation in wind tunnel tests. Especially for comparison with numerical simulations, the exact deformation of a wing has to be known. An important application for high-speed tracking is the testing of wing flutter, a very dangerous phenomenon. Here, a measurement frequency of 1000Hz or more is required to resolve the complex and possibly disastrous motion of an aircraft wing. The function first tracks the objects in 2 images using small recognizable markers on the wing and a stereo camera set-up. Then, a 3D reconstruction follows in real time using matrix conversions. . . . This test is single threaded, but will be converted to 3 threads in the future.
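As a simplified illustration of the stereo reconstruction step, here is the standard triangulation for a rectified stereo pair, in which depth is inversely proportional to the horizontal disparity between a marker's two image positions. This assumes ideal, rectified pinhole cameras with a known focal length (in pixels), baseline, and principal point. picCOLOR's matrix-based reconstruction presumably handles general calibrated camera geometry, and all of the numbers below are hypothetical.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """3D point (X, Y, Z) in metres, in the left camera's frame, for a rectified pair.

    x_left, x_right: the marker's column in the left and right images (pixels).
    y: the marker's row (the same in both images once rectified).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("marker must have positive disparity")
    z = focal_px * baseline_m / disparity
    x = (x_left - cx) * z / focal_px
    y3 = (y - cy) * z / focal_px
    return x, y3, z


# Hypothetical rig: 1000 px focal length, 0.5 m baseline, principal point (640, 512).
print(triangulate(740, 490, 612, focal_px=1000, baseline_m=0.5, cx=640, cy=512))
```

At a 1000Hz measurement rate, a calculation like this runs for every marker in every frame pair, so its cost adds up quickly even though each call is cheap.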
Multi Barcodes: With this test, several different bar codes are searched for in a large image (3200×4400 pixels). These codes are simple 2D codes, EAN13 (=UPC), and 2 of 5. They can be in any rotation and can be extremely fine (down to 1.5 pixels for the thinnest lines). To find the bar codes, the test uses several filters (some of them multithreaded). The bar code edge processing is single threaded, though.
Label Recognition/Rotation is used as an important pre-processing step for character reading (OCR). For this test, all possible labels in the large bar code image are detected and rotated to a text rotation of zero degrees. In a real application, these rotated labels would then be passed to an OCR program; there are several good programs available on the market, but all of them can only accept text at zero degrees. The test uses morphology and different filters (some of them multithreaded) to detect the labels, and simple character detection functions to locate the text and determine its rotational angle. . . . This test uses Rotation in the last important step, which is fully multithreaded with up to 8 threads.
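Rotating a detected label back to zero degrees is commonly done by inverse mapping. For every pixel of the output, the code rotates that pixel's coordinates back into the source image and samples there. Each output row is independent, which is why this step parallelizes so well. The sketch below uses nearest-neighbor sampling on a 2D list and splits rows across a thread pool. It's a generic illustration with hypothetical names, not picCOLOR's code, and Python threads won't actually speed up this pure-Python loop because of the global interpreter lock.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rotate_image(src, angle_deg, workers=4, fill=0):
    """Rotate a 2D list about its center by angle_deg, nearest-neighbor sampling.

    Image coordinates have x to the right and y downward, so a positive
    angle turns the image clockwise as displayed. Output is the same
    size as the input; pixels that map outside the source get `fill`.
    """
    h, w = len(src), len(src[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]

    def render_row(y):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on output (x, y).
            dx, dy = x - cx, y - cy
            xs = math.floor(cx + dx * c + dy * s + 0.5)
            ys = math.floor(cy - dx * s + dy * c + 0.5)
            if 0 <= xs < w and 0 <= ys < h:
                out[y][x] = src[ys][xs]

    # Each row writes only to its own output row, so rows can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(render_row, range(h)))
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rotate_image(img, 90))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```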
The Core i5-661 performs exceptionally well overall, taking the top spot in one test and second place in two others, despite having only two cores. Only in the Particle Image Velocimetry test does the i5-661 succumb to its dual-core nature.
picCOLOR still includes a synthetic test of its various functions, and here things get interesting. The Core i3-540, amazingly, matches the Athlon II X4 630 almost exactly, though the two CPUs take very different paths toward performance.
Media encoding and editing
x264 HD benchmark
This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark.
From a CPU architecture standpoint, I’m astounded to see the Core i5-661 nearly matching the performance of the Core 2 Quad Q9400. That’s, frankly, mind-blowing. From a price-competitive standpoint, though, both Clarkdale variants perform poorly here. The Athlon II X4 630 is cheaper and faster than both.
Windows Live Movie Maker 14 video encoding
For this test, I used Windows Live Movie Maker to transcode a 30-minute TV show, recorded in 720p .wtv format on my Windows 7 Media Center system, into a 320×240 WMV video appropriate for mobile devices.
Again, these are amazing results for a dual-core processor. Were it not for the existence of similarly priced quad-cores…
LAME MT audio encoding
LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
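The two-stage pipeline described above can be sketched with one producer thread and a small bounded queue. The `analyze` and `encode` functions are hypothetical placeholders, not LAME's real routines. The point is the structure: one thread runs the analysis ahead and buffers its results while a second consumes them, so the design can keep at most two cores busy. The size-one queue keeps the analysis stage only a frame or so ahead.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def analyze(frame):
    """Hypothetical stand-in for the psycho-acoustic analysis."""
    return max(frame) - min(frame)


def encode(frame, analysis):
    """Hypothetical stand-in for the rest of the encoder."""
    return (sum(frame), analysis)


def pipelined_encode(frames):
    buf = queue.Queue(maxsize=1)  # tiny buffer: analysis runs only slightly ahead

    def analysis_stage():
        for frame in frames:
            buf.put((frame, analyze(frame)))
        buf.put(_DONE)

    producer = threading.Thread(target=analysis_stage)
    producer.start()
    out = []
    while True:
        item = buf.get()
        if item is _DONE:
            break
        frame, analysis = item
        out.append(encode(frame, analysis))
    producer.join()
    return out


print(pipelined_encode([[1, 5, 3], [2, 2, 2], [0, 10]]))  # [(9, 4), (6, 0), (10, 10)]
```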
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.
This is an ideal test case for Clarkdale, and the i5-661 comes out on top.
3D modeling and rendering
The Cinebench benchmark is based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs first with a single thread and then with as many threads as the CPU has cores (or hardware threads, on CPUs with more than one hardware thread per core).
Here’s another nicely multithreaded application where the Clarkdale processors overachieve to great effect: the i5-661 beats the Athlon II X4 630 and almost ties the Core 2 Quad Q9400—in addition to producing the highest single-threaded score by over 400 points.
Our multiprocessor speed-up graph documents the interplay between one thread and many, and between Turbo Boost and Hyper-Threading. Despite its Turbo Boost-driven ability to reach its highest clock speeds with just a single thread, the Core i5-661 gains more from the move to multiple threads than any of the other dual-cores.
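The speed-up figure is just the multi-threaded score divided by the single-threaded one. Turbo Boost complicates the picture, because the single-threaded baseline runs at a higher clock and therefore shrinks the apparent speed-up. The sketch below adds a crude clock adjustment, assuming that score scales linearly with clock speed. The scores and clocks are hypothetical, and the adjustment is our own illustration, not part of Cinebench.

```python
def mp_speedup(multi_thread_score, single_thread_score):
    """Multiprocessor speed-up: multi-threaded score over single-threaded score."""
    return multi_thread_score / single_thread_score


def clock_adjusted_speedup(multi_thread_score, single_thread_score,
                           single_thread_ghz, multi_thread_ghz):
    """Rough speed-up at equal clocks, assuming score scales linearly with clock."""
    raw = mp_speedup(multi_thread_score, single_thread_score)
    return raw * single_thread_ghz / multi_thread_ghz


# Hypothetical dual-core: single-thread run boosted to 3.6 GHz, all-thread run at 3.33 GHz.
print(mp_speedup(10000, 4000))                          # 2.5
print(clock_adjusted_speedup(10000, 4000, 3.6, 3.33))   # a bit above 2.5
```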
We’re using the latest beta version of POV-Ray 3.7 that includes native multithreading and 64-bit support.
3ds max modeling and rendering
Valve VRAD map compilation
This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to pre-compute lighting that goes into games like Half-Life 2.
The story in the rest of our rendering tests remains more or less the same: the Clarkdale processors do amazing things for dual-core parts. They’re just not quite up to keeping pace with quad-cores in the same price range.
Next, we have a slick little Folding@home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.
The Folding@home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@home should be a great example of real-world scientific computing.
notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.
On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
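The final estimate boils down to simple arithmetic: average the points per day measured for each work-unit type, then multiply by the core count. Here's a sketch with made-up numbers. The per-type values are hypothetical, and "other" is a placeholder label for the fourth work-unit type, which isn't named here.

```python
from statistics import mean


def projected_ppd(ppd_by_wu_type, cores):
    """Average points per day across WU types, scaled by the number of cores."""
    return mean(ppd_by_wu_type.values()) * cores


# Hypothetical per-type results; "other" stands in for the unnamed fourth WU type.
ppd = {"Tinker": 200, "Amber": 300, "Gromacs": 500, "other": 400}
print(projected_ppd(ppd, cores=2))  # 700
```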
This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.
The individual results for each unit type tend to get a little wonky because the CPUs with Hyper-Threading are running multiple threads on each core. To get a clear sense of performance, you’ll want to focus on that final graph showing total projected points per day. Here, Clarkdale continues to achieve well beyond what one might expect from a dual-core part.
These Clarkdale processors are complex beasts. To get a handle on what we think of them, we should break things down into several pieces.
As a CPU technology, Clarkdale is excellent. I can’t get over how the Core i5-661 kept nearly matching the Core 2 Quad Q9400 in things like video encoding and rendering with just two cores. We’ve known for a while how potent the Nehalem microarchitecture can be, but seeing a dual-core processor take on a quad-core from the immediately preceding generation is, as I said, pretty mind-blowing. Clarkdale’s power consumption is admirably low at peak, as well, and its idle power draw of only 42W on the Intel H55 board during our IGP tests was quite the statement. There are only two little weaknesses I can name. First, when installed in our P55 motherboard, the Core i5-661 had the same power draw at idle as the Core i5-750. With its two 32-nm cores, I had hoped its idle power draw might be measurably lower. Second, the dual-chip package has apparently produced relatively high memory access latencies. Those higher latencies obviously didn’t hamper performance too much, but they may explain why the Core i5-661 wasn’t much faster than the Core 2 Duo E8600 in a number of lightly threaded applications, including WorldBench’s Office, Firefox, and WinZip tests.
As a consumer product, though, the Core i5-661 is simply overpriced for its performance. I’m not sure why Intel chose to price it comparably to the Core i5-750 and the Phenom II X4 965, very fast quad-core processors. Perhaps they were thinking of the nice premiums commanded by the Core 2 Duo E8600 when its single- and dual-threaded performance was prized for the absolute best gaming experience. With the advent of Windows 7 and a new crop of games, though, those days are long gone. I would choose the Core i5-750 over the i5-661 ten times out of ten. Against the Phenom II X4 965, the i5-661’s only advantage is lower power consumption—much lower, don’t get me wrong, but you’ll be sacrificing performance substantially to get there.
The Core i3-540 is considerably more attractive compared to its competition. The Core 2 Duo E7600 is no match for the processor that succeeds it. Our Athlon II X4 630-based test system pulled 38W more at the wall socket than our Core i5-661-based one, and the i3-540 has an even lower TDP than its sibling. Still, AMD is feisty, and the Athlon II X4 630 has its undeniable merits in multi-threaded applications. I could imagine choosing either product as the basis for a desktop system, depending on one’s needs. Intel undoubtedly has the technology lead, though, and could choose to cut prices until AMD’s products became unattractive, if it wished.
The integrated graphics processor on Clarkdale has, to some extent, managed to exceed my rather low expectations. Intel has obviously taken care to bring its display pipeline and video processing capabilities up to par, and even its 3D graphics performance was something of a pleasant surprise (like finding, perhaps, that you kind of enjoy eating Vegemite, somehow). Intel’s integration of graphics into the processor package and its, err, casual attitude about gaming performance, driver compatibility, and the user experience exacerbate some of my long-held concerns about the future of the PC platform, though. I’m not sure how the average consumer is supposed to tell the difference between a “mainstream” game that will run well on his Clarkdale IGP and an “enthusiast” game that might come up with nothing but a blank screen for the main menu.
Having said that, I can see the potential for the total package of technologies provided by Clarkdale to succeed in many areas—not least of which, given its power draw and improved IGP, is the mobile space. That’s why we’ve also taken a look at Arrandale, Clarkdale’s mobile twin, today. Go see what we found.