Trouble was, the Core i7’s most dramatic performance gains were largely confined to specific types of applications, many of which have little relevance to the average computer user. On top of that, the price of entry for a Core i7-based system was fairly steep. The CPUs themselves weren’t especially cheap, nor were the motherboards, and one had to populate those boards with six memory modules to achieve optimal performance. All of this was a natural consequence of the fact that those first Core i7 products were repurposed silicon mainly intended for servers and workstations in the guise of Nehalem Xeons, roles for which those CPUs are exceptionally well suited.
Thus, the Core i7 has held undisputed technology leadership in desktop processors, but Intel’s older Core 2 technology has remained the bread and butter of its product lineup. Against this less potent opposition, a resurgent AMD has made headway in the middle of the market all year with the steady improvements to the Phenom II. Today, however, Intel is officially taking the wraps off of a new weapon in its arsenal: the chip code-named Lynnfield, which looks to bring the native quad-core Nehalem microarchitecture into the mainstream. Thanks to some clever engineering and integration, this processor promises to enable systems that are faster, cheaper, quieter, smaller, and more energy efficient than prior desktop PCs.
You have, perhaps, heard such claims before, no? The question, then, is whether Intel has really pulled off such a feat with Lynnfield.
A brief Nehalem refresher
Lynnfield is simply a new implementation of the same Nehalem microarchitecture inside the first Core i7 processors, which were code-named Bloomfield. Nehalem is, in turn, an evolutionary step beyond the familiar Core 2, but with a heaping helping of consequential changes, especially to the system architecture. Nehalem consolidates four execution cores onto a single piece of silicon, integrates an on-die memory controller, and eliminates the front-side bus, adopting a system layout familiar from AMD’s Athlon 64 and its descendants, although Intel’s version of the same is intended to be faster and better.
Despite the new plumbing around them, Nehalem’s execution cores are still based on the Core 2’s, but they have been tweaked in ways big and small. For example, changes to instruction decoding and branch prediction bring higher performance per clock. Tweaks to the internal memory subsystem complement the revamping of the whole memory hierarchy, which has been tuned for the freer flow of data and instructions. Each core has 32KB L1 data and instruction caches and a 256KB dedicated L2 cache. The third-level cache is larger, at 8MB, and is shared by all four cores; as a result, the L3 cache is crucial to inter-core communication.
One of the big highlights on Nehalem’s feature list is the return of simultaneous multithreading (SMT), better known in Intel marketing-speak as Hyper-Threading. Each Nehalem core can track and execute two hardware threads at once to make better use of its rich, four-issue-wide execution resources. Although Hyper-Threading proved to be a bit of a mixed blessing in the Pentium 4, Nehalem’s SMT implementation has proven to be a nearly unqualified win in the server and workstation markets, and it shows real promise for the desktop, too, for reasons we’ll soon explore.
Nehalem’s L3 cache, memory controller, and off-chip I/O components are separate from the quad execution cores and together make up what Intel calls the chip’s “uncore.” The uncore is clocked separately from the cores and has its own power states, as well. As you might be gathering, this design is really quite modular, and Intel has played mix-and-match with the uncore elements of the base Nehalem microarchitecture while cooking up Lynnfield. For a deeper look at Nehalem itself, I suggest reading our reviews of the original Core i7 and the Xeon 5500 series.
Romping through Lynnfield
The most pertinent question today is how Intel has adapted the Nehalem microarchitecture to suit the mainstream market. Those changes are oriented toward related goals of cost reduction and integration.
The most obvious change is a move from three memory channels in Bloomfield to two channels in Lynnfield. Dual channels have been the standard in desktop PCs for ages now, and Lynnfield regresses to the mean. Those memory channels are potent, though: twin sets of DDR3 DIMMs, with officially sanctioned speeds up to 1333MHz, although higher frequencies are possible on some models, if you’re willing to be an outlaw. The bump up in memory clocks somewhat offsets the loss of a channel, since Bloomfield is officially limited to 1066MHz RAM.
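How much of the lost channel does the faster RAM buy back? Here’s a back-of-the-envelope sketch of theoretical peak bandwidth. It assumes standard 64-bit DDR3 channels and the nominal transfer rates quoted above, and it ignores the efficiency losses every real memory controller incurs, so treat the numbers as ceilings rather than predictions.

```python
# Theoretical peak DRAM bandwidth = channels x bytes per transfer x transfer rate.
# Assumes standard 64-bit (8-byte) DDR3 channels and the nominal transfer
# rates quoted in the text; measured throughput is always lower.

BYTES_PER_TRANSFER = 8  # width of one 64-bit DDR3 channel


def peak_bandwidth_gbs(channels: int, megatransfers: float) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * BYTES_PER_TRANSFER * megatransfers * 1e6 / 1e9


lynnfield = peak_bandwidth_gbs(2, 1333)       # dual-channel DDR3-1333
bloomfield = peak_bandwidth_gbs(3, 1066)      # triple-channel DDR3-1066
lynnfield_slow = peak_bandwidth_gbs(2, 1066)  # hypothetical: two channels at 1066MHz

print(f"Lynnfield, 2 x DDR3-1333:  {lynnfield:.1f} GB/s")
print(f"Bloomfield, 3 x DDR3-1066: {bloomfield:.1f} GB/s")
print(f"Ratio with faster RAM:     {lynnfield / bloomfield:.0%}")
print(f"Ratio at 1066MHz:          {lynnfield_slow / bloomfield:.0%}")
```

By this crude measure, two channels at 1333MHz offer roughly 83% of the peak of three channels at 1066MHz, versus two-thirds if Lynnfield were held to 1066MHz memory.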
Lynnfield also, somewhat surprisingly, does away with the QuickPath Interconnect used on prior Nehalem incarnations. In its place are 16 lanes of built-in, on-die PCI Express 2.0 connectivity, used to connect the CPU directly to a discrete graphics card. These PCIe lanes can be split, if needed, into twin x8 links for use with dual graphics cards. Although this split arrangement is technically inferior, we’ve found it to be indistinguishable from the 32 lanes of Bloomfield’s X58 chipset in our initial tests with SLI and CrossFireX. On-die PCIe also has the potential to reduce CPU-to-device latency, and fewer lanes with lower latency may turn out to be preferable to more lanes with higher latency once GPU makers have had the opportunity to tune their drivers to take advantage of it.
A block diagram of the Lynnfield system architecture. Source: Intel.
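To put the x16 versus dual-x8 trade-off in raw numbers, here’s a quick sketch of peak PCIe 2.0 link bandwidth, using the spec’s 5 GT/s per-lane signaling rate and its 8b/10b encoding overhead. It says nothing about latency, which is the other half of the argument above, and it assumes the links actually run at full PCIe 2.0 speed.

```python
# Peak per-direction bandwidth of a PCI Express 2.0 link.
# PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding,
# so 8 of every 10 bits on the wire are payload.

GT_PER_LANE = 5.0             # gigatransfers (bits) per second, per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line code


def pcie2_gbs(lanes: int) -> float:
    """Peak bandwidth in GB/s, in each direction, for a PCIe 2.0 link."""
    payload_bits_per_s = lanes * GT_PER_LANE * 1e9 * ENCODING_EFFICIENCY
    return payload_bits_per_s / 8 / 1e9


print(f"One card on x16:   {pcie2_gbs(16):.1f} GB/s per direction")
print(f"Each of two on x8: {pcie2_gbs(8):.1f} GB/s per direction")
```

Splitting the lanes halves each card’s peak link bandwidth, which is why the arrangement is technically inferior on paper even when games don’t notice.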
The other bit of novel I/O in Lynnfield’s uncore is a relatively pedestrian 2GB/s DMI link used to talk to Intel’s single-chip core logic solution, the P55 platform controller hub, or PCH. We have a rather complete review of the P55 online today, with a look at motherboard solutions from three major manufacturers, so I won’t dwell too much on its specifics.
Consider, however, what moving to a single-chip core logic solution saves in terms of space, power, and thus cost. Intel claims this dual-chip (CPU and PCH) solution offers a roughly 40% reduction in package size versus the Core 2 and friends. Intel reckons Nehalem’s power-saving mojo reduces idle power consumption for the CPU alone by up to 50%. Not only that, but the 95W maximum power envelope, or thermal design power (TDP), of Lynnfield processors now encompasses the major PCIe links, and what’s left fits inside the P55 chip, which has a tiny 4.7W TDP. Compare that to the 22W TDP of the P45 north bridge and the 4.5W TDP of the ICH10R south bridge, which accompany a 95W Core 2 processor. Platform power consumption for Lynnfield systems should be down substantially.
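Adding up the TDP figures above makes the comparison concrete. Keep in mind that TDP is a cooling specification rather than measured draw, so this sketch compares worst-case thermal budgets, not what either platform actually pulls from the wall.

```python
# Add up the maximum TDP ratings quoted above, in watts.
# TDP is a cooling specification, not measured draw, so this
# compares worst-case thermal budgets rather than real consumption.

core2_platform = {
    "Core 2 processor": 95.0,
    "P45 north bridge": 22.0,
    "ICH10R south bridge": 4.5,
}
lynnfield_platform = {
    "Lynnfield processor (incl. PCIe)": 95.0,
    "P55 PCH": 4.7,
}

core2_total = sum(core2_platform.values())
lynnfield_total = sum(lynnfield_platform.values())

print(f"Core 2 + P45 + ICH10R: {core2_total:.1f} W")
print(f"Lynnfield + P55:       {lynnfield_total:.1f} W")
print(f"Difference:            {core2_total - lynnfield_total:.1f} W")
```

That works out to roughly 22W less thermal budget for the Lynnfield pairing, before counting any of Nehalem’s idle-power savings.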
One upshot of this integration should be the flourishing of motherboards and systems that pack tremendous power into smaller form factors. Fewer chips, simpler power delivery and cooling, and easier routing all help on this front. Already, major players like Gigabyte have introduced feature-rich mATX P55 motherboards, and I expect Mini-ITX solutions will be forthcoming, as well.
With that said, Lynnfield itself isn’t exactly a small chip. Both Lynnfield and Bloomfield are manufactured on Intel’s 45nm high-k fab process. At 774 million transistors and 296 mm², Lynnfield is actually larger than Bloomfield (731M transistors and 263 mm²). Based on the die shot above, it appears much of Lynnfield’s additional die area is concentrated in its PCIe block, which clearly occupies more area than Bloomfield’s two QPI blocks. Lynnfield also slightly outweighs AMD’s 45-nm Phenom II, which packs 758M transistors into 258 mm².
Meet the Core i5 and the Core i7-800 series
Much has been written already about Intel’s naming schemes for Nehalem-derived products, so I won’t rehash any of those arguments. What you need to know is that Lynnfield introduces a new LGA1156 socket type and is thus incompatible with older Core i7 motherboards. Lynnfield chips will initially slot into two product lines, the Core i5-700 series and the Core i7-800 series. Bloomfield CPUs will coexist as higher-end specimens and members of the Core i7-900 series. Intel has three initial models based on Lynnfield, although I’d expect more to come eventually.
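| Model | Cores | Threads | Base clock | L3 cache | Uncore clock |
|---|---|---|---|---|---|
| Core i5-750 | 4 | 4 | 2.66 GHz | 8MB | 2.13 GHz |
| Core i7-860 | 4 | 8 | 2.8 GHz | 8MB | 2.4 GHz |
| Core i7-870 | 4 | 8 | 2.93 GHz | 8MB | 2.4 GHz |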
Note that the lone Core i5 processor supports only four threads. In other words, Hyper-Threading has been disabled in this chip to differentiate it from the more expensive models. Another way the Core i5 differs from its siblings is its 2.13GHz uncore speed. That clock is important because it contributes to another Core i5-750 limitation: the max memory speed, without overclocking the base system clock, is 1333MHz. With a 2.4GHz uncore, the Core i7-800-series chips can hit 1600MHz memory speeds without extra help. The uncore speed also determines the clock speed of the L3 cache, another little Lynnfield control knob that will impact performance, if not tremendously.
Still, at $199, the Core i5 is squarely in the spotlight as the product with the broadest potential audience of the bunch. The closest competition from AMD is the Phenom II X4 955, which is selling for $189 at Newegg right now, a big discount off list price and a signal that AMD has anticipated Lynnfield’s debut in its pricing. Intel has less of an incentive to make outgoing Core 2 products attractive in the face of the Core i5; the Core 2 Quad Q9550 is selling for $219.99 at Newegg presently, and the firm says it has no plans to reduce Core 2 prices upon the Core i5’s introduction.
At $249, the Phenom II X4 965 is probably the closest competitor to the Core i7-860 from AMD. Other CPUs in its price range include the Core 2 Quad Q9650 at $320 online and the Core i7-920 at $280.
One rung up, well, AMD has no real rival to the Core i7-870, although the i7-870 does match up pretty closely against the Core i7-950 at $570. Keep these match-ups in mind as we move into our test results.
Now, forget what you just read about clock speeds. I’ve been holding out on telling you about one of Lynnfield’s most notable features because I wanted to get that product information into the mix first. Like all Nehalem-derived products, Lynnfield chips have an onboard microcontroller dedicated to power management. This controller governs dynamic clock speed scaling schemes like SpeedStep for power savings, and it can be programmed via firmware to implement different algorithms, to tune for higher performance or lower power consumption, and so on.
This microcontroller contributes to Nehalem-based chips’ impressive low power draw at idle, but it also enables a nifty little feature called Turbo Boost, which may be familiar from the Core i7-900 series. Turbo Boost can opportunistically raise clock speeds beyond their usual peaks, momentarily and dynamically, using the same P-state mechanism as SpeedStep.
The idea here is to take advantage of any additional thermal headroom available when the processor is under load, either partially or fully. Uniquely, Nehalem includes a switch that can shut off power to an idle core entirely, eliminating even the leakage power that core would otherwise consume. Shutting down a core in this way opens up additional thermal headroom, so the remaining, engaged cores can ramp up their clock speeds and boost performance. Even with all four cores active, a chip may have some additional thermal headroom, and Turbo Boost can take advantage.
Bloomfield chips have Turbo Boost, but it’s a relatively conservative version. With one thread active, a Core i7-900-series chip can raise its clock speed by up to two “ticks,” or increments of the 133MHz base clock. With two or more threads loading up cores, the chip can go up to one tick, or 133MHz, above stock. That nets you a little bit more performance (a Core i7-975 Extreme rated at 3.33GHz will spend a lot of its time at 3.46GHz), but it’s not exactly eye-popping.
With its Lynnfield products, Intel has become much more aggressive with Turbo Boost tuning. The table below outlines the clock speeds possible with Turbo Boost doing its thing in the various Lynnfield models.
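| Model | Base clock | Turbo, 4 cores active | Turbo, 3 cores active | Turbo, 2 cores active | Turbo, 1 core active |
|---|---|---|---|---|---|
| Core i5-750 | 2.66 GHz | 2.8 GHz | 2.8 GHz | 3.2 GHz | 3.2 GHz |
| Core i7-860 | 2.8 GHz | 2.93 GHz | 2.93 GHz | 3.33 GHz | 3.46 GHz |
| Core i7-870 | 2.93 GHz | 3.2 GHz | 3.2 GHz | 3.46 GHz | 3.6 GHz |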
The two Core i7-800-series processors have the most aggressive Turbo Boost tuning. What you’re looking at here could amount to a pretty substantial jump in performance for single- and dual-threaded applications, including (yep) games. The Core 2 Quad Q9650 tops out at 3GHz. Between the clock-for-clock performance gain and the jump to 3.46 or 3.6GHz, the Core i7-870 should be markedly faster with such applications.
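Every clock in the Turbo Boost table is just a multiplier times the 133MHz base clock, so each model’s Turbo behavior boils down to how many ticks (or “bins”) it may add at each active-core count. The sketch below reproduces the table that way. The base multipliers and bin counts are our inference from the listed clocks, not figures published here by Intel, and we assume the labels truncate to the nearest 10MHz below (which is why 20 × 133⅓MHz reads as 2.66GHz rather than 2.67GHz).

```python
# Reproduce the Turbo Boost table from base multipliers and turbo "bins".
# Assumptions (inferred from the table, not Intel documentation): a base
# clock of 133 1/3 MHz, the multipliers below, and labels truncated to 10 MHz.

BCLK_MHZ_X3 = 400  # base clock of 133 1/3 MHz, stored as 3x to stay exact

# model: (base multiplier, extra bins with 4, 3, 2, 1 cores active)
MODELS = {
    "Core i5-750": (20, (1, 1, 4, 4)),
    "Core i7-860": (21, (1, 1, 4, 5)),
    "Core i7-870": (22, (2, 2, 4, 5)),
}


def label_ghz(multiplier: int) -> str:
    """Clock for a multiplier, truncated to 10 MHz the way the labels read."""
    tens_of_mhz = multiplier * BCLK_MHZ_X3 // 30  # integer math, no rounding drift
    return f"{tens_of_mhz / 100:.2f} GHz"


def turbo_row(base_multiplier: int, bins: tuple) -> list:
    """Base clock followed by the Turbo clock at 4, 3, 2, and 1 active cores."""
    return [label_ghz(base_multiplier)] + [label_ghz(base_multiplier + b) for b in bins]


for model, (mult, bins) in MODELS.items():
    print(model, turbo_row(mult, bins))
```

The same arithmetic covers the Bloomfield example earlier: a multiplier of 25 gives the Core i7-975’s rated 3.33GHz, and one tick up gives 3.46GHz.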
Interestingly enough, Intel doesn’t guarantee Turbo Boost speeds; they depend on the thermal properties of the individual chip in question and, as I understand it, on good CPU cooling. However, my experience with various Nehalem-derived processors suggests that the clock speed on the product label isn’t the speed at which the CPU will typically run under load. You can probably expect something more, with one tick up as a functional minimum.
This development also means that CPU performance is no longer exactly deterministic, which creates some emotional issues for me as a lab testing guy. One understands that taking as much performance as the thermal headroom will allow is a sensible behavior, especially now that thermals are the primary CPU performance constraint. Still, one senses there is no going back, and things will only become more complicated from here.
Another outgrowth of the Lynnfield Turbo mechanism is that we cannot use higher-end chips to exactly replicate the performance of lower-end products. We found this out when we investigated simulating a Core i7-860 with a Core i7-870. Although we know what the correct Turbo Boost ratios are for the i7-860, the ratios on the i7-870 cannot be modified via the BIOS, either on our Gigabyte P55-UD6 testbed motherboard or on the Intel DP55KG. As a result, we unfortunately don’t have performance results for the Core i7-860 in the following pages. We’ll have to acquire a specimen of the actual product in order to test its performance.
A little special sauce for Windows 7
You may have noticed that Lynnfield and Windows 7 are hitting the market not far from one another, and PCs based on both should be common during the fall buying season. Intel says it has worked with Microsoft on several specific optimizations for Windows 7, the most intriguing of which is a feature called “SMT parking.”
The basic notion behind SMT parking is that the Windows scheduler will attempt to schedule threads so that all physical cores are occupied before any core gets two threads scheduled on its two front-ends (or logical cores). Since Hyper-Threading involves some cache partitioning and other forms of resource sharing, this is a potentially important feature. We’ve seen scheduler quirks cause poor and oddly unpredictable performance on Core i7 processors in the past. Based on our limited experience testing with Windows 7 and a cadre of SMT-enabled processors for this review, our initial impressions of SMT parking are positive. We’ve seen executables that rely on the Windows scheduler for thread allocation match the performance of executables with explicit, SMT-aware thread affinity built in. Our initial sense is that SMT parking blunts some potential disadvantages of Hyper-Threading, making it more of an unqualified win, even on the desktop.
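To illustrate why the placement policy matters, here’s a toy model contrasting a “spread” placement, in the spirit of SMT parking, with a naive “packed” placement that fills both logical slots of one core before moving to the next. This is a hypothetical illustration of the policy only; it is not how the Windows 7 scheduler is actually implemented, and the function names are ours.

```python
# Toy model of "fill every physical core before doubling up," the policy
# described above. An illustration only; not the actual Windows 7 scheduler.


def place_spread(n_threads: int, n_cores: int, smt_ways: int = 2) -> list:
    """(core, logical_slot) per thread, one thread per core before any sibling."""
    if n_threads > n_cores * smt_ways:
        raise ValueError("more threads than logical processors")
    # Thread i goes to core i % n_cores; the slot advances once every core is busy.
    return [(i % n_cores, i // n_cores) for i in range(n_threads)]


def place_packed(n_threads: int, n_cores: int, smt_ways: int = 2) -> list:
    """Naive contrast: fill both logical slots of a core before moving on."""
    if n_threads > n_cores * smt_ways:
        raise ValueError("more threads than logical processors")
    return [(i // smt_ways, i % smt_ways) for i in range(n_threads)]


def busy_cores(placement: list) -> int:
    """Number of distinct physical cores with at least one thread."""
    return len({core for core, _ in placement})


# Four threads on a quad-core, Hyper-Threaded chip like the Core i7-870:
print(place_spread(4, 4), busy_cores(place_spread(4, 4)))
print(place_packed(4, 4), busy_cores(place_packed(4, 4)))
```

With four threads, the packed placement leaves two cores idle while two others split their caches and execution resources between two threads each; the spread placement gives every thread a core to itself.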
A look at the new socket and package
Below are some pictures of the LGA1156 socket and the Core i5-750 processor in its new package. I believe they’re self-explanatory.
The LGA1156 CPU retention mechanism has the socket cover sliding beneath a bolt head
LGA1156 crams 1156 pins into a socket with the same basic dimensions as its LGA775 predecessor
The advent of Windows 7 prompted us to throw out our old CPU test results and start fresh with all new, well, just about everything: new OS, new drivers, new revisions of nearly every application we use in testing. We also took this opportunity to freshen up our CPU test rigs, an endeavor made possible by a number of folks in the industry.
Here’s a look at one of our Lynnfield processors sitting in a Gigabyte P55-UD6 motherboard. Notice the Corsair Dominator modules nestled into those DIMM slots. Those are purpose-tuned for Lynnfield processors, capable of running at 1600MHz with a CAS latency of 8 at only 1.65V, which is the max voltage Intel recommends putting through its integrated memory controllers. We chose to run our RAM at an officially sanctioned 1333MHz for this first test, but I expect we’ll explore performance at higher memory clocks soon.
Asus kicked in some higher-clocked GeForce GTX 260 cards like this one for our test systems. This change was prompted mainly by my desire to aim for lower power draw at idle on our test systems. Right now, Nvidia GPUs draw quite a bit less at idle than comparable Radeons, which is one reason our test systems have finally dipped below the 100W mark, as you’ll see. Upgrading to these from our previous Radeon HD 4870 cards also got us almost twice the video RAM, higher performance, and quieter operation in the confines of Damage Labs.
The biggest noise reduction among our test rig upgrades, though, was easily the move to PC Power & Cooling Silencer 610W PSUs. We’ve downsized a little on wattage here, too, in an attempt to chase more efficiency at idle while retaining the proper connector payload for anything we might wish to do with our test systems.
Last but not least, we have our 1TB WD RE3 hard drives. With much higher transfer rates than our old Caviar SE16 320GBs, these drives have pushed up our WorldBench scores across the board. We chose these drives over SSDs in part because we had some trepidation over new versus used-state performance issues in SSDs, which might lead to inconsistent test outcomes.
Thanks to Gigabyte, Corsair, Asus, OCZ, and WD for making this major upgrade of our test systems possible.
I should note several other things before we go on. First, one of our processor speed grades is simulated; the Core i7-950 is actually a Core i7-975 Extreme underclocked to the appropriate speeds. That’s fine for performance analysis, but it’s not always an exact match on power consumption, so we’ve left the Core i7-950 out of our power tests. Somewhat similarly, the Core 2 Quad Q9550 we used for testing was actually the Q9550S low-power variant, since that was what we had on hand. Its performance should be identical to the regular Q9550, but we’ve excluded it from our power consumption tests, since we’re not focusing on the low-power segment here.
Second, we decided to make a switch from averaging the results of multiple test runs, as we have typically done in the past, to reporting the median. This change was prompted by a number of considerations. We like how the median filters out statistical outliers, which are increasingly common as CPU performance becomes more variable and tests become more complex. We like that the numbers we report are now actual results produced by a single test run, and in cases where we graph performance over time from a single run, we can choose to display the data that corresponds to that median run. We like, uh, several other things I can’t remember right now. Point is: we saw enough upside to give this a try. We’ll see whether it works out long-term.
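Here’s a tiny illustration of the difference, using Python’s standard statistics module. The timings are made-up numbers for illustration, not results from our testing: three runs of a hypothetical benchmark, one of which hit a background hiccup.

```python
from statistics import mean, median

# Hypothetical benchmark timings in seconds; the third run hit a
# background hiccup. These numbers are invented for illustration.
runs = [41.2, 40.9, 48.7]

print(f"mean:   {mean(runs):.2f} s")    # dragged upward by the outlier
print(f"median: {median(runs):.2f} s")  # with an odd run count, one of the actual runs
```

One caveat: with an even number of runs, `statistics.median` averages the two middle values, so the result is no longer an actual run; `statistics.median_low` and `statistics.median_high` always return one of the samples.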
Finally, after consulting with our readers, we’ve decided to enable Windows’ “Balanced” power profile for the first time in a desktop processor performance test, which means power-saving features like SpeedStep and Cool’n’Quiet are operating. (In the past, we only enabled these features for power consumption testing.) Our spot checks demonstrated to us that, typically, there’s no performance penalty for enabling these features on today’s CPUs. If there is a real-world performance penalty to enabling these features, well, we think that’s worthy of inclusion in our measurements, since the vast majority of desktop processors these days will spend their lives with these features enabled. We did disable these power management features to measure cache latencies, but otherwise, it was unnecessary to do so.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we reported the median of the scores produced.
Our test systems were configured like so:
Core 2 Duo E8600 3.33 GHz
Core 2 Quad Q9550 2.83 GHz
Core 2 Quad Q9650 3.00 GHz
Core i5-750 2.66 GHz
Core i7-870 2.93 GHz
Core i7-920 2.66 GHz
Core i7-950 3.06 GHz
Core i7-975 Extreme 3.33 GHz
Phenom II X2 550 3.1 GHz
4.0 GT/s (2.0 GHz)
Matrix Storage Manager 18.104.22.1682
Matrix Storage Manager 22.214.171.1243
Matrix Storage Manager 126.96.36.1993
Matrix Storage Manager 188.8.131.523
|CAS latency (CL)||8||8||7||8||8|
|RAS to CAS delay (tRCD)||8||8||7||8||8|
|RAS precharge (tRP)||8||8||7||8||8|
|Cycle time (tRAS)||20||20||20||20||20|
with Microsoft 6.1.7600.16385 drivers
with Realtek 184.108.40.20619 drivers
with Realtek 220.127.116.1119 drivers
with Realtek 18.104.22.16819 drivers
with Realtek 22.214.171.12419 drivers
WD RE3 WD1002FBYS 1TB SATA
Asus ENGTX260 TOP SP216 (GeForce GTX 260) with ForceWare 190.62 drivers
Windows 7 Ultimate x64 Edition RTM
PC Power & Cooling Silencer 610 Watt
The test systems’ Windows desktops were set at 1600×1200 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.
We used the following versions of our test applications:
- SiSoft Sandra 2009.9.15.124
- Stream 5.8 64-bit
- CPU-Z 1.52.2
- WorldBench 6 Gold
- Left 4 Dead build 3939
- Crysis Warhead 1.1 64-bit
- Far Cry 2 1.03
- Wolfenstein 1.1
- Valve VRAD map build benchmark
- Valve Source Engine particle simulation benchmark
- Cinebench R10 64-bit Edition
- POV-Ray for Windows 3.7 beta 34 64-bit
- 7-Zip 4.65 64-bit
- notfred’s Folding benchmark CD generated 8/25/09
- The Panorama Factory 5.3 x64 Edition
- Windows Live Movie Maker 14
- x264 HD benchmark 2.0 with x264 version 0.59.819
- LAME MT 3.97a 64-bit
The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
This bit of geeky goodness gives us a peek into the cache hierarchy of each chip at various block sizes to see how much bandwidth is available. Not surprisingly, the Lynnfield Core i5-750 and Core i7-870 look very similar to the Core i7-975, only a bit slower. They also look more similar to the Phenom II, with its three-level cache hierarchy, than to the Core 2 Quad Q9650, with its large L2 caches and no L3.
The CPUs sort themselves out pretty clearly by socket type here. Notice how close the Lynnfield processors, with dual channels of 1333MHz memory, come to matching the Core i7-920 and 950, with three channels of 1066MHz RAM. The Phenom II, which also has dual channels of DDR3 at the same clock speed and timings, can’t quite match the Lynnfield chips in measured throughput. The poor old Core 2 is limited by its front-side bus and can’t reach half the bandwidth the Lynnfield chips deliver.
Having only two channels of memory actually gives the Lynnfield processors an advantage in access latencies over the Core i7-900 series. Even the Core i7-975 Extreme, with which we used 1333MHz memory, can’t reach memory as quickly as the Core i5-750 and i7-870. Once again, the Lynnfield memory subsystem outperforms a very similarly configured Phenom II, as well.
Now, we’ll take a closer (and gratuitously indulgent) look at access latencies across the board with our ridiculous 3D graphs.
All told, the Lynnfield chips have some of the lowest access latencies we’ve ever measured in a processor. Interesting fact: we measured the Lynnfield chips’ L1 cache latency at three cycles, whereas the Bloomfield Core i7 L1 caches measured consistently at four cycles. Also, Lynnfield L2’s latency was eight cycles, while Bloomfield’s was 11. I believe L3 latencies are best left in nanoseconds as we’ve reported them in the graphs above, since L3 cache speeds are independent of core clocks.
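For caches that run at the core clock, like L1 and L2, converting a latency between cycles and nanoseconds is a single division by clock speed. The sketch below applies it to the L2 figures above; the clock speeds plugged in are assumptions (the rated base clocks of the Core i7-870 and Core i7-975, ignoring Turbo Boost), chosen only for illustration.

```python
# Convert a cache latency measured in core clock cycles to nanoseconds.
# Only meaningful for caches clocked with the cores (L1 and L2 here).
# The clock speeds below are illustrative rated base clocks; Turbo Boost
# could change the actual core clock during a measurement.


def cycles_to_ns(cycles: int, core_ghz: float) -> float:
    """Latency in ns: cycles divided by cycles-per-nanosecond (= GHz)."""
    return cycles / core_ghz


print(f"Lynnfield L2, 8 cycles @ 2.93 GHz:   {cycles_to_ns(8, 2.93):.2f} ns")
print(f"Bloomfield L2, 11 cycles @ 3.33 GHz: {cycles_to_ns(11, 3.33):.2f} ns")
```

The same trick doesn’t work cleanly for L3, which runs on the separately clocked uncore; a cycle count there depends on which clock you count against, so nanoseconds are the more honest unit.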
We measured Warhead performance using the FRAPS frame-rate recording tool and playing over the same 60-second section of the game five times on each processor. This method has the advantage of measuring real gameplay, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations.
We tested at relatively modest graphics settings, 1024×768 resolution with the game’s “Mainstream” quality settings, because we didn’t want our graphics card to be the performance-limiting factor. This is, after all, a CPU test.
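The text doesn’t spell out exactly how five play sessions are boiled down into one average and one low frame rate. Here’s one plausible way to do it: take each session’s mean and minimum per-second frame rate, then report the median of each across sessions. The per-second sampling, the use of the median, and the sample data are all assumptions for illustration, not a description of the actual procedure.

```python
from statistics import median

# Hypothetical summary of gameplay sessions. Each session is assumed to be
# a list of per-second frame rates; the real procedure may differ.


def session_stats(fps_samples: list) -> tuple:
    """Average and lowest per-second frame rate for one session."""
    return sum(fps_samples) / len(fps_samples), min(fps_samples)


def summarize(sessions: list) -> tuple:
    """Median of the per-session averages and of the per-session lows."""
    stats = [session_stats(s) for s in sessions]
    averages = [avg for avg, _ in stats]
    lows = [low for _, low in stats]
    return median(averages), median(lows)


# Five made-up three-second sessions, for illustration only.
sessions = [
    [60, 55, 48],
    [62, 57, 50],
    [59, 54, 45],
    [61, 58, 51],
    [63, 52, 47],
]
avg_fps, low_fps = summarize(sessions)
print(f"average: {avg_fps:.1f} FPS, low: {low_fps} FPS")
```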
The new Core i5 and i7 processors kick off our real-world performance tests with a flourish, finishing second and third behind a thousand-dollar processor. The Core 2 and Phenom II are quite a bit slower, both in average frame rates and in the lows.
To give you a closer look at things, here are frame rates over time from a single, hopefully representative test run.
That last graph gives us the clearest impression of the competitive situation among some key contenders. The Lynnfield chips produce frame rates that are pretty consistently between 10 and 20 FPS higher than the fastest Core 2 Quad or Phenom II.
Far Cry 2
We decided to try something new with Far Cry 2, as well, and test with its built-in benchmark tool to give us some automation and more repeatable results. This tool should be a little more realistic than your average timedemo, because it’s purported to keep the game’s AI and physics engines active. For this first test, we ran the tool’s “Ranch Small” demo at 1024×768 with DirectX 10 enabled and all of the game’s visual and physical simulation options at their highest settings.
These results pretty closely mirror what we saw in Crysis Warhead, with the Lynnfield processors taking second and third place once again. That’s undeniably impressive, especially the showing from the Core i5-750. We suspect its very low memory access latencies are giving it this unlikely edge over the Core i7-950.
Another trend of note is the relatively poor showing of the high-frequency dual-core processors we’ve included in the group, the Core 2 Duo E8600 and the Phenom II X2 550. We haven’t come to expect higher-clocked dual-cores to fall behind even slower quad-cores like the Core 2 Quad Q9550. We are using newer versions of both of these games, which could have better threading optimizations. I kind of doubt that’s it, though. My stronger suspicions involve Windows 7 and the switch to Nvidia GPUs and graphics drivers. Somewhere along the line, something has changed that’s tipped the balance in favor of higher core counts.
Incidentally, that fact makes me hesitate to credit Turbo Boost for the strong performance of the Lynnfield processors. If more than two cores are occupied, they’ll only be running at one or two ticks up from stock clocks.
Now, some of you have been pestering me to take GPU limitations into account when testing CPU gaming performance. This game’s automated benchmark tool gave us an opportunity to look at that angle, so we tested at two additional resolutions: 1280×1024 with 4X antialiasing, and 1600×1200 with 4X antialiasing. When GPU performance limits came into play, well, some expected things happened. And some unexpected ones.
True to form when a GPU becomes the primary constraint, the pack of CPUs begins to bunch together. Yes, Virginia, it’s true: if your graphics card is too slow for the resolution and quality settings you’re using, a faster CPU won’t do you much good when gaming. Shocking, I know.
Oddly, though, the CPUs don’t quite converge on the same point. Instead, the Core i5 and i7 processors converge on a point about 10 FPS below the other quad-cores. I’m not sure what the story is here; it could be some sort of platform optimization issue. We may have to investigate further when we have time, but we don’t today.
Here’s a shiny new game with a benchmarking function that we can test. We recorded a demo during a multiplayer game on the Hospital map and played it back using the “timeNetDemo” command. The screen resolution was set to 1024×768 with the game’s quality options maxed out. We didn’t enable the game engine’s multithreaded renderer via the CLI, because doing so didn’t produce higher performance; instead, we tested at the game’s default settings.
Left 4 Dead
We also used a custom-recorded timedemo with Valve’s most excellent zombie shooter, Left 4 Dead. Here, we tested at 1280×1024 with 16X anisotropic filtering enabled and all of the game’s quality options cranked, though these settings were quite apparently not GPU limited. The game’s multicore rendering option was enabled by default.
Unlike the two games on the previous page, both of these games run exceptionally, preposterously well on all of the processors we tested. Nevertheless, the new Core i5-750 and i7-870 remain near the top of the charts.
Source engine particle simulation
Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.
This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.
The Core i7-870 gets a nice boost from Hyper-Threading here, allowing it to nearly double the performance of the fastest Phenom II available. Without HT, the i5-750 can’t break from the pack like its stablemate does.
WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.
The Lynnfield processors’ strong showing continues in WorldBench, which is in many ways a world apart from the gaming tests on the previous pages.
Productivity and general use software
MS Office productivity
Firefox web browsing
Multitasking – Firefox and Windows Media Encoder
WinZip file compression
7-Zip compression and decompression
Note that 7-Zip is a new addition to our test suite and not part of WorldBench. As the results indicate, this application is very nicely multithreaded and shows us the true potential of our multicore and multithreaded processors. Here’s another case where the Core i7-870’s Hyper-Threading separates it cleanly from the Core i5-750. Without HT, the i5-750 performs about like the Phenom II X4 955.
Nero CD authoring
We’ve long known the Nero test was limited by disk controller performance as much as anything, and that has long been a problem for AMD chipsets. This time around, rather than use AMD’s problematic AHCI driver, we opted for the Microsoft AHCI driver built into Windows 7, instead. We’ve found that driver to offer higher throughput than AMD’s, at the expense of some additional CPU utilization. True to form, the AMD systems’ Nero scores are much better than they’ve been in the past, relatively speaking, with the exception of the Phenom II X2 550, which doesn’t have an extra core to give to the cause.
This test is also somewhat disk-controller limited on the AMD platform, it seems. The Core i5-750 struggles a little bit here, curiously enough, compared to the typically slower Core 2 Quad processors.
The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.
In the past, we’ve added up the time taken by all of the different elements of the panorama creation wizard and reported that number, along with detailed results for each operation. However, doing so is incredibly data-input-intensive, and the process tends to be dominated by a single, long operation: the stitch. So this time around, we’ve simply decided to report the stitch time, which saves us a lot of work and still gets at the heart of the matter.
Rough day for the Phenom II, eh?
x264 HD benchmark
This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark. These scores come from the newer, faster version 0.59.819 of the x264 executable.
The first pass really only uses four threads effectively, but the second one is more widely multithreaded, which gives the Core i7-870 a nice boost.
Windows Live Movie Maker 14 video encoding
For this test, I used Windows Live Movie Maker to transcode a 30-minute TV show, recorded in 720p .wtv format on my Windows 7 Media Center system, into a 320×240 WMV-format video appropriate for mobile devices.
The Lynnfield chips continue to impress. Not much more I can say about that.
Windows Media Encoder video encoding
Roxio VideoWave Movie Creator
I’ve included these last two video-related tests for the sake of completeness, because they contribute to the WorldBench overall score. They are, however, older applications that obviously aren’t multithreaded, as any video editing app worth its salt would be these days. I wouldn’t read too much into these results.
LAME MT audio encoding
LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
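To make the idea of linear pipelining a bit more concrete, here is a minimal Python sketch. It is not LAME MT’s actual code: `analyze` and `encode` are hypothetical stand-ins for the psycho-acoustic model and the rest of the encoder. One thread runs the analysis ahead of the encoder and hands its buffered results through a small queue to a second thread, which does the encoding.

```python
import queue
import threading

def analyze(frame):
    # Hypothetical stand-in for the psycho-acoustic analysis of one frame.
    return sum(frame) / len(frame)

def encode(frame, analysis):
    # Hypothetical stand-in for the rest of the encoder, which consumes the
    # analysis result computed earlier for this same frame.
    return (len(frame), round(analysis, 3))

def pipelined_encode(frames):
    # A bounded queue buffers analysis results and limits how far the
    # analysis thread can run ahead of the encoder thread.
    handoff = queue.Queue(maxsize=1)
    done = object()  # sentinel marking the end of the stream
    output = []

    def analysis_stage():
        for frame in frames:
            handoff.put((frame, analyze(frame)))
        handoff.put(done)

    def encoder_stage():
        while True:
            item = handoff.get()
            if item is done:
                break
            frame, analysis = item
            output.append(encode(frame, analysis))

    # Exactly two stages, so exactly two threads ever have work to do.
    threads = [threading.Thread(target=analysis_stage),
               threading.Thread(target=encoder_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output
```

Because the pipeline has only two stages, there is never work for more than two threads, which is why a design like this can’t take advantage of a third or fourth core. (In CPython, the global interpreter lock also keeps pure-Python stages like these from truly running in parallel, so the sketch illustrates the structure of the pipeline rather than its speedup.)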
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.
Something interesting to note here: the Core i7-870 ties the Core i7-975 Extreme for the fastest single-threaded encode time with both compiled versions of the app. Those processors share a top Turbo Boost speed of 3.6GHz with one core active.
The Cinebench benchmark is based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs first with just a single thread and then with as many threads as there are CPU cores (or hardware threads, on CPUs with more than one thread per core).
Turbo Boost grants the Core i7-870 the second-best score in the single-threaded rendering test, and then Hyper-Threading gives it the third-best showing in the multithreaded rendering test, right behind the Core i7-950.
Here’s something interesting to think about.
We typically think in terms of performance scaling, from one to many cores, when talking about processors. Generally, a bigger performance boost going from one thread to many is considered a positive sign. Yet the aggressive Turbo Boost function in Lynnfield pulls in the other direction by improving single-threaded performance. In terms of performance gained from one thread to many, the Core i5-750 finishes last among the quad-core processors here. Not that there’s anything wrong with that; we just need to recalibrate our thinking. Conversely, Hyper-Threading opens up the possibility of much better performance scaling than one might expect, as the Core i7-975’s improvement of over 400% indicates. Of course, the bottom line in any case is absolute performance.
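One simple way to put a number on this trade-off is a scaling factor: the multithreaded score divided by the single-threaded score. The short Python sketch below computes it. The two chips and their scores are invented for illustration and are not results from this review.

```python
def scaling_factor(single_thread_score, multi_thread_score):
    """Ratio of multithreaded to single-threaded score (higher = better scaling)."""
    if single_thread_score <= 0:
        raise ValueError("single-threaded score must be positive")
    return multi_thread_score / single_thread_score

# Hypothetical scores, not measurements: two chips that tie in absolute
# multithreaded performance but differ in single-threaded performance.
turbo_chip = {"single": 4000, "multi": 13000}  # extra single-core clock headroom
flat_chip = {"single": 3200, "multi": 13000}   # no extra single-core headroom

turbo_scaling = scaling_factor(turbo_chip["single"], turbo_chip["multi"])
flat_scaling = scaling_factor(flat_chip["single"], flat_chip["multi"])
print(f"turbo chip: {turbo_scaling:.2f}x, flat chip: {flat_scaling:.2f}x")
```

Both hypothetical chips deliver identical multithreaded performance, yet the one with more single-threaded headroom posts the lower scaling factor. That is the effect an aggressive Turbo Boost has on this kind of metric, and it’s why absolute performance is the better yardstick.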
We’re using the latest beta version of POV-Ray 3.7 that includes native multithreading and 64-bit support. Some of the beta 64-bit executables have been quite a bit slower than the 3.6 release, but this should give us a decent look at comparative performance, regardless.
In these rendering tests, the Phenom II X4 processors are giving the Core i5-750 stiff competition.
3ds Max modeling and rendering
Valve VRAD map compilation
This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to pre-compute lighting that goes into games like Half-Life 2.
The Core i7-870 finishes these last two rendering tests in style, separating again from the Core i5-750 thanks to Hyper-Threading.
Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.
The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and AltiVec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.
notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD is a bootable ISO image that boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.
On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
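The scheduling and estimation steps described above can be modeled roughly as follows. This Python sketch is a toy model, not notfred’s code. Each WU’s points and runtime are fixed, hypothetical inputs, so the model ignores the contention that slows individual WUs on a fully loaded or Hyper-Threaded chip. Of the four WU types, only Tinker and Amber are named in the text; the other two names, and all the numbers, are placeholders. For Hyper-Threaded CPUs, `n_cores` is assumed to mean the number of hardware threads the benchmark keeps busy.

```python
import heapq
from itertools import cycle
from statistics import mean

SECONDS_PER_DAY = 86_400

def run_benchmark(wu_types, n_cores):
    """Simulate the benchmark's scheduling and return (PPD estimate, per-type PPD).

    wu_types maps a WU type name to (points, seconds) for one WU on one core.
    """
    if n_cores < 1:
        raise ValueError("need at least one core")
    pending = list(wu_types)    # WU types not yet started
    filler = cycle(wu_types)    # extra WUs that keep otherwise idle cores busy
    running = []                # min-heap of (finish_time, core, name, is_filler)
    ppd = {}                    # points per day measured for each WU type

    def start(core, now):
        if pending:
            name, is_filler = pending.pop(0), False
        else:
            name, is_filler = next(filler), True
        heapq.heappush(running, (now + wu_types[name][1], core, name, is_filler))

    for core in range(n_cores):
        start(core, 0.0)

    # Repeatedly take the core that finishes next and hand it more work
    # until every WU type has a measurement.
    while len(ppd) < len(wu_types):
        now, core, name, is_filler = heapq.heappop(running)
        if not is_filler:
            points, seconds = wu_types[name]
            ppd[name] = points / seconds * SECONDS_PER_DAY
        if len(ppd) < len(wu_types):
            start(core, now)

    # Average PPD across WU types, scaled up by the number of cores.
    return mean(ppd.values()) * n_cores, ppd

# Hypothetical inputs: (points, seconds per WU on one core).
example_types = {
    "Tinker": (100, 3600),
    "Amber": (200, 3600),
    "TypeC": (300, 7200),   # placeholder name
    "TypeD": (50, 1800),    # placeholder name
}
estimate, per_type = run_benchmark(example_types, n_cores=2)
print(round(estimate), {k: round(v) for k, v in per_type.items()})
```

Multiplying a per-core average by the core count assumes that each core sustains that rate when all cores are busy. That assumption is part of why the method is somewhat quirky.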
This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.
The Hyper-Threaded processors don’t fare well in the individual tests since they’re being asked to run eight threads simultaneously throughout. Once we reach our final points per day projection, though, the i7-870 cleans up. The Core i5-750, meanwhile, is only 20 points ahead of its closest Phenom II rival.
Power consumption and efficiency
Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system: the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.
All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows’ “Balanced” power options profile.
We can slice up these raw data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.
This is where Lynnfield’s integration pays off. The Core i5-750 and i7-870 systems idle at roughly 20W lower than the Core 2 Quad or Phenom II X4 systems. The reduction from the Bloomfield system is even more dramatic.
Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.
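Both the idle and peak figures come down to averaging the logged readings over a window of time. Here is a minimal Python sketch of that step. It assumes the log is a list of (seconds, watts) pairs, which is not necessarily the meter’s actual export format. The half-open window [15, 25) picks up exactly ten readings at one sample per second. The length of the trailing idle window is not given in the text, so any value you pass for it is an assumption.

```python
from statistics import mean

def average_power(samples, start_s, end_s):
    """Mean wattage of logged samples with start_s <= timestamp < end_s.

    samples: (seconds_since_start, watts) pairs; the format is an assumption.
    """
    window = [watts for t, watts in samples if start_s <= t < end_s]
    if not window:
        raise ValueError("no samples in the requested window")
    return mean(window)

# Hypothetical one-second log: 50 W idle, 100 W while rendering from 15 s to 25 s.
log = [(t, 100.0 if 15 <= t < 25 else 50.0) for t in range(60)]
peak = average_power(log, 15, 25)   # ten-second load window
idle = average_power(log, 50, 60)   # trailing edge, after the render
print(peak, idle)
```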
The Lynnfield chips’ power consumption under load isn’t quite so tame; there’s quite a bit of dynamic range in these designs in terms of power draw. Still, you can probably already surmise that no competing processor offers as much performance for the power consumed.
We can quantify power efficiency by looking at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.
We can quantify efficiency in an even more focused manner by considering the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
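Both energy figures amount to integrating logged power over time, because watts multiplied by seconds gives joules. The review doesn’t say exactly how the totals were summed, so the trapezoidal rule in this Python sketch is an assumption, as is the (seconds, watts) log format. Omitting the bounds gives total energy over the whole test period; passing a system’s render start and end times isolates the energy it spent on the render.

```python
def energy_joules(samples, start_s=None, end_s=None):
    """Integrate power (W) over time (s) with the trapezoidal rule; result in J.

    samples: time-ordered (seconds, watts) pairs. Optional bounds keep only the
    samples with start_s <= t <= end_s, e.g. one system's render period.
    """
    pts = [(t, w) for t, w in samples
           if (start_s is None or t >= start_s) and (end_s is None or t <= end_s)]
    total = 0.0
    for (t0, w0), (t1, w1) in zip(pts, pts[1:]):
        total += (w0 + w1) / 2 * (t1 - t0)  # average watts x elapsed seconds
    return total

# Hypothetical log: 200 W while rendering from 0 s to 5 s, then 50 W idle.
log = [(t, 200.0 if t <= 5 else 50.0) for t in range(11)]
whole_period = energy_joules(log)      # render plus idle time
render_only = energy_joules(log, 0, 5)  # just the render
print(whole_period, render_only)
```

This method is why a faster chip can win even if it draws more power. With made-up numbers, a system drawing 150 W for a 40-second render uses 6,000 J, while one drawing 100 W for 70 seconds uses 7,000 J.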
The Core i5-750 and Core i7-870 trade places at the top of the leaderboard in these last couple of tests. However you choose to quantify it, though, these Lynnfield processors are the most power-efficient desktop CPUs around (outside of low-power specials, of course), and overall, it’s not even close.
The Lynnfield chips’ combination of price, performance, and power efficiency effectively clears the field in the desktop CPU market, leaving little room for competition from the Phenom II or older, cheaper Core 2 Quad processors, or even faster, pricier Core i7s.
Not only does the Core i5-750 outperform its like-priced would-be competitor, the Phenom II X4 955, but it also beats out the Phenom II X4 965 overall. That reality hit home most acutely, perhaps, in our gaming tests, where the Lynnfield chips simply excelled. Nah, you don’t need the fastest CPU to run most games well these days, but Intel’s new processors have a distinct advantage on this front. AMD will have to slash its prices further to remain competitive from a price-performance standpoint, but even then, the Phenom II X4 965 has a 140W TDP and the i5-750 has a 95W TDP. That 45W difference is reflected almost precisely in our peak system power draw results. At idle, the Phenom II X4 965 draws 22W more, as well. AMD is unlikely to field a truly attractive alternative to this $199 processor without dipping below the $150 mark. Otherwise, how could one avoid the temptation to step up?
Meanwhile, the Core i7-870 performs at least as well as the Core i7-950 overall, and it does so on a cheaper, more power-efficient platform. I could see Intel killing off everything in the Core i7-900 series except for the 975 Extreme, leaving the LGA1366 socket as an ultra-high-end, boutique kind of offering. I doubt anyone would mind. The Core i7-870 is all the processor any enthusiast needs, except for the crazy people with credit card limits much higher than their IQs. (No offense, crazy guys. Just joshing, you know. No stalky-stalky, please.)
Speaking of crazy things, after seeing our test results, I’m puzzled by the fact that Intel didn’t choose to put its best foot forward by offering us a peek at the Core i7-860. I think its higher Turbo Boost speeds, faster uncore clock, 1600MHz memory capability, Hyper-Threading, and $285 price tag are likely to make it the best overall value of the nascent Lynnfield lineup. One way or another, we’ll have to get our hands on one soon. You may have to, as well, if you know what’s good for you.