Intel’s Core i7 processors

Those of us who are conversant with technology are more or less conditioned to accept and even expect change as a natural part of the course of things. New gadgets and gizmos debut regularly, each one offering some set of advantages or refinements over the prior generation. As a result, well, you folks are a rather difficult lot to impress, frankly speaking. But today is a day when one should sit up and take notice. I’ve been reviewing processors for nearly ten years now, and the Core i7 processors we’re examining here represent one of the most consequential shifts in the industry during that entire span.

Intel, as you know, has been leading its smaller rival AMD in the performance sweeps for some time now, with a virtually unbroken lead since the debut of the first Core 2 processors more than two years ago. Even so, AMD has retained a theoretical (and sometimes practical) advantage in terms of basic system architecture throughout that time, thanks to the changes it introduced with its original K8 (Athlon 64 and Opteron) processors five years back. Those changes included the integration of the memory controller onto the CPU die, the elimination of the front-side bus, and its replacement with a fast, narrow chip-to-chip interconnect known as HyperTransport. This system architecture has served AMD quite well, particularly in multi-socket servers, where the Opteron became a formidable player in very short order and has retained a foothold even with AMD’s recent struggles.

Now, Intel aims to rob AMD of that advantage by introducing a new system architecture of its own, one that mirrors AMD’s in key respects but is intended to be newer, faster, and better. At the heart of this project is a new microprocessor, code-named Nehalem during its development and now officially christened the Core i7.

Yeah, I dunno about the name, either. Let’s just roll with it.

The Core i7 design is based on current Core 2 processors but has been widely revised, from its front end to its memory and I/O interfaces and nearly everywhere in between. The Core i7 integrates four cores into a single chip, brings the memory controller onboard, and introduces a low-latency point-to-point interconnect called QuickPath to replace the front-side bus. Intel has modified the chip to take advantage of this new system infrastructure, tweaking it throughout to accommodate the increased flow of data and instructions through its four cores. The memory subsystem and cache hierarchy have been redesigned, and simultaneous multithreading—better known by its marketing name, Hyper-Threading—makes its return, as well. The end result blurs the line between an evolutionary new product and a revolutionary one, with vastly more bandwidth and performance potential than we’ve ever seen in a single CPU socket.

How well does the Core i7 deliver on that potential? Let’s find out.

An overview of the Core i7

The Core i7 modifies the landscape quite a bit, but much of what you need to know about it is apparent in the picture of the processor die below, with the major components labeled.

The Core i7 die and major components. Source: Intel.

What you’re seeing, incidentally, is a pretty good-sized chip—an estimated 731 million transistors arranged into a 263 mm² area via the same 45nm, high-k fabrication process used to produce “Penryn” Core 2 chips. Penryn has roughly 410 million transistors and a die area of 107 mm², but of course, it takes two Penryn dies to make one quad-core product. Meanwhile, AMD’s native quad-core Phenom chips have 463 million transistors but occupy a larger die area of 283 mm² because they’re made on a 65nm process and have a higher ratio of (less dense) logic to (denser) cache transistors. Then again, size is to some degree relative; the GeForce GTX 280 GPU is over twice the size of a Core i7 or Phenom.

Nehalem’s four cores are readily apparent across the center of the chip in the image above, as are the other components (Intel calls these, collectively, the “uncore”) around the periphery. The uncore occupies a substantial portion of the die area, most of which goes to the large, shared L3 cache.

This L3 cache is the last level of a fundamentally reworked cache hierarchy. Although not clearly marked in the image above, inside each core is a 32 kB L1 instruction cache, a 32 kB L1 data cache (the latter of which is 8-way set associative), and a dedicated 256 kB L2 cache (also 8-way set associative). Outside of the cores is the L3, which is much larger at 8 MB and smarter (16-way associative) than the L2s. This basic arrangement may be familiar from AMD’s native quad-core Phenom processors, and as with the Phenom, the Core i7’s L3 cache serves as the primary means of passing data between its four cores. The Core i7’s cache setup differs from the Phenom’s in key respects, though, including the fact that it’s inclusive—that is, it replicates the contents of the higher level caches—and runs at higher clock frequencies. As a result of these and other design differences, including a revamped TLB hierarchy, the Core i7’s cache latencies are much lower than the Phenom’s, even though its L3 cache is four times the size.

One mechanism Intel uses to make its caches more effective is prefetching, in which the hardware examines memory access patterns and attempts to fill the caches speculatively with data that’s likely to be requested soon. Intel claims the Core i7’s prefetching algorithm is both more efficient than Penryn’s—some server admins wound up disabling hardware prefetch in Xeons because it harmed performance with certain workloads, a measure Intel says should no longer be needed—and more aggressive, as well.

The Core i7 can get to main memory very quickly, too, thanks to its integrated memory controller, which eliminates the chip-to-chip “hop” required when going over a front-side bus to an external north bridge. Again, this is a familiar page from AMD’s template, but Intel has raised the stakes by incorporating support for three channels of DDR3 memory. Officially, the maximum memory speed supported by the first Core i7 processors is 1066 MHz, which is a little conservative for DDR3, but frequencies of 1333, 1600, and 2000 MHz are possible with the most expensive Core i7, the 965 Extreme Edition. In fact, we tested it with 1600 MHz memory, since this is a more likely configuration for a thousand-dollar processor.

For a CPU, the bandwidth numbers involved here are considerable. Three channels of memory at 1066 MHz can achieve an aggregate of 25.6 GB/s of bandwidth. At 1333 MHz, you’re looking at 32 GB/s. At 1600 MHz, the peak would be 38.4 GB/s, and at 2000 MHz, 48 GB/s. By contrast, the peak effective memory bandwidth on a Core 2 system would be 12.8 GB/s, limited by the throughput of a 1600MHz front-side bus. With dual channels of DDR2 memory at 1066MHz, the Phenom’s peak would be 17.1 GB/s. The Core i7 is simply in another league. In fact, our Core i7-965 Extreme test rig with 1600MHz memory has the same total bus width (192 bits) and theoretical memory bandwidth as a GeForce 9600 GSO graphics card.

With the memory controller onboard and the front-side bus gone, the Core i7 communicates with the rest of the system via the QuickPath interconnect, or QPI. QuickPath is Intel’s answer to HyperTransport, a high-speed, narrow, packet-based, point-to-point interconnect between the processor and the I/O chip (or other CPUs in multi-socket systems). The QPI link on the Core i7-965 Extreme operates at 6.4 GT/s. At 16 bits per transfer, that adds up to 12.8 GB/s, and since QPI links involve dedicated bidirectional pairs, the total bandwidth is 25.6 GB/s. Lower-end Core i7 processors have 4.8 GT/s QPI links with up to 19.2 GB/s of bandwidth. Obviously, these are both just starting points, and Intel will likely ramp up QPI speeds from here in successive product generations. Still, both are somewhat faster than the HyperTransport 3 interconnects in today’s Phenoms, which peak at either 16 or 14.4 GB/s, depending on the chip.
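
If you’d like to check my math, here’s a quick sketch of the arithmetic behind the peak figures quoted in the last two paragraphs. All of the widths and transfer rates come straight from the text above; these are theoretical maximums, not measured throughput.

```python
# Peak bandwidth arithmetic: transfers/sec x bytes per transfer (x channels).

def peak_gbs(mt_per_sec, channels=1, width_bits=64):
    """Peak bandwidth in GB/s for a 64-bit memory channel or front-side bus."""
    bytes_per_transfer = width_bits / 8
    return mt_per_sec * 1e6 * bytes_per_transfer * channels / 1e9

def qpi_peak_gbs(gt_per_sec, link_width_bits=16, bidirectional=True):
    """Peak QPI bandwidth in GB/s; doubled for the two unidirectional links."""
    one_way = gt_per_sec * 1e9 * (link_width_bits / 8) / 1e9
    return one_way * (2 if bidirectional else 1)

for speed in (1066, 1333, 1600, 2000):
    print(f"3 x DDR3-{speed}:        {peak_gbs(speed, channels=3):5.1f} GB/s")
print(f"1600MHz front-side bus:  {peak_gbs(1600):5.1f} GB/s")
print(f"2 x DDR2-1066 (Phenom):  {peak_gbs(1066, channels=2):5.1f} GB/s")
print(f"QPI at 6.4 GT/s:         {qpi_peak_gbs(6.4):5.1f} GB/s")
print(f"QPI at 4.8 GT/s:         {qpi_peak_gbs(4.8):5.1f} GB/s")
```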

A block diagram of the Core i7 system architecture. Source: Intel.

This first, high-end desktop implementation of Nehalem is code-named Bloomfield, and it’s essentially the same silicon that should go into two-socket servers eventually. As a result, Bloomfield chips come with two QPI links onboard, as the die shot above indicates. However, the second QPI link is unused. In 2P servers based on this architecture, that second interconnect will link the two sockets, and over it, the CPUs will share cache coherency messages (using a new protocol) and data (since the memory subsystem will be NUMA)—again, very similar to the Opteron.

In order to take advantage of this radically modified system architecture, the design team tweaked Nehalem’s processor cores in a range of ways big and small. Although the Core 2’s basic four-issue-wide design and execution resources remain more or less unchanged, almost everything around the execution units has been altered to keep them more fully occupied. The instruction decoder can fuse more types of x86 instructions together and, unlike Core 2, it can do so when running in 64-bit mode. The branch predictor’s accuracy has been enhanced, too. Many of the changes involve the memory subsystem—not just the caches and memory controller, which we’ve already discussed, but inside the core itself. The load and store buffers have been increased in size, for instance.

These modifications make sense in light of the Core i7’s much higher system-level throughput, but they also help make another new mechanism in the chip work better: the resurrected Hyper-Threading, or simultaneous multithreading (SMT). Each core in Nehalem can track two independent hardware threads, much like some other Intel processors, including later versions of the Pentium 4 and, more recently, the Atom. SMT takes advantage of the explicit parallelism built into multithreaded software to keep the CPU’s execution units more fully occupied, and done well, it can be a clear win, delivering solid performance gains at very little cost in terms of additional die area or power use. Intel architect Ronak Singhal outlined how Nehalem’s implementation of Hyper-Threading works at this past Fall IDF. Some hardware, such as the registers, must be duplicated for each thread, but much of it can be shared. Nehalem’s load, store, and reorder buffers are statically partitioned between the two threads, for example, while the reservation station and caches are shared dynamically based on demand. The execution units themselves don’t need to be altered at all.

The upshot of all of this is that a single Core i7 processor supports a total of eight threads, which makes for a pretty wicked looking Task Manager window. Because of the resource sharing involved, of course, Hyper-Threading won’t likely double performance, even in the best-case scenario. We’ll look at its precise impact on performance in the following pages.

The changes to Nehalem’s cores don’t stop there, either. Intel has improved the performance of the synchronization primitives used by multithreaded applications, added a handful of instructions known as SSE 4.2—including some for string handling, cyclic redundancy checks, and popcount—and introduced enhancements for hardware-assisted virtualization. There’s too much to cover here, really. If you want more detailed information, I suggest you check out Singhal’s IDF presentation or David Kanter’s Nehalem overview.

Power management and, uh, forced induction

Like AMD’s native quad-core Phenom, the Core i7 can raise and lower the clock speed of each of its processor cores independently and dynamically in response to demand. Unlike the Phenom, though, the Core i7 doesn’t use separate power planes for the cores and the “uncore.” Instead, Intel has put a switch between each core and the voltage regulator output, and power can be shut off to any individual core that goes into the deepest idle state, C6, transparently to software and to the other cores. Because power to the core is shut off, Intel claims even leakage power is eliminated, making that core’s power consumption approximately zero. In the event that all four cores become idle, then the uncore can go into a C6 state, as well, in which most uncore logic is stopped and I/O drops into a low-power mode.

Controlling all of this wizardry in the Core i7 is a dedicated, on-chip microcontroller for power management. This microcontroller is programmable via firmware and can be made to use different algorithms to optimize for, say, the lowest possible power use or for low latencies when stepping up from low-power states. No doubt Intel will use this capability to tune products for diverse segments, giving mobile processors different behaviors than, say, high-performance desktop parts like the ones we’re reviewing here.

One trick that this microcontroller enables is the oh-so-creatively named “Turbo mode” built into the Core i7. This feature pushes the active cores beyond their baseline clock frequencies when the CPU isn’t at full utilization. Turbo mode operates according to some simple rules. In the event that a single-threaded application is occupying one core while the rest are idle, Turbo mode will raise clock speeds by as much as two full “ticks” beyond the baseline. For instance, for our Core i7-965 Extreme processor, Turbo mode could raise the multiplier from 24 to 26, or the core clocks from 3.2 GHz to 3.46 GHz, since the base clock in Core i7 systems runs at 133 MHz. With two or more threads active, Turbo mode will only raise clock speeds by one tick. All of this happens automatically using the same basic P-state mechanism as SpeedStep.
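
As a rough illustration of the multiplier math, here’s a simplified sketch of how the 965 Extreme’s Turbo frequencies fall out of the one-tick/two-tick rules above. The real microcontroller also checks thermal and current headroom before stepping up, which this ignores.

```python
BASE_CLOCK_MHZ = 133.33   # Core i7 base clock (the text rounds this to 133 MHz)

def turbo_clock_ghz(base_multiplier, active_cores):
    """Simplified Turbo mode rule for the Core i7-965 Extreme: two multiplier
    ticks with a single active core, one tick otherwise. Ignores the TDP and
    current-limit checks the real power management microcontroller performs."""
    bump = 2 if active_cores == 1 else 1
    return (base_multiplier + bump) * BASE_CLOCK_MHZ / 1000

print(f"Baseline (24 x 133 MHz):        {24 * BASE_CLOCK_MHZ / 1000:.2f} GHz")
print(f"One core active (two ticks up): {turbo_clock_ghz(24, active_cores=1):.2f} GHz")
print(f"All cores active (one tick up): {turbo_clock_ghz(24, active_cores=4):.2f} GHz")
# ~3.47 GHz and ~3.33 GHz; the "3.46 GHz" above is the same math with
# slightly different rounding of the base clock.
```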

The additional clock frequency headroom comes from the fact that a less-than-fully-occupied Core i7 may not run up against the limits imposed by its thermal design power, or TDP—the chip’s specified power envelope. We’ve seen a processor running eight instances of Prime95 stay at “one tick up” for a sustained period of time with good cooling. Then again, Intel has set CPU core voltages individually at the factory for some time now, and it’s quite possible that some chips may not be able to sustain Turbo acceleration within their specified power envelopes for any length of time. As I understand it, that may simply be the luck of the draw, with only the baseline clock speed guaranteed.

Interestingly enough, because the Core i7-965 Extreme Edition doesn’t have a locked upper multiplier, the CPU can be overclocked by tweaking the Turbo mode settings in the BIOS. Intel’s DX58SO “Smackover” (uh huh) motherboard exposes control over the maximum clock multipliers for one, two, three, and four occupied cores, as well as the ability to adjust the TDP limit in watts and the current limit in amps. You’ll probably want a good aftermarket cooler if you plan to play with these settings. If that’s too fancy for your tastes, one may also choose to disable Turbo mode and overclock the usual ways instead—either by raising the multiplier on an Extreme Edition or by cranking up the base clock on any Core i7.

Pricing and availability

Although we are publishing our review of the Core i7 today, products won’t be selling to consumers immediately. Instead, Intel has given us the nebulous target of “in November” for product availability. Beyond that, I have no more information than you about when to expect these things in stores. I can, however, give you pricing and model information. Like this:

Model                 Clock speed   North bridge speed   QPI speed   TDP     Price
Core i7-965 Extreme   3.2 GHz       3.2 GHz              6.4 GT/s    130 W   $999
Core i7-940           2.93 GHz      2.13 GHz             4.8 GT/s    130 W   $562
Core i7-920           2.66 GHz      2.13 GHz             4.8 GT/s    130 W   $284

All three of the Core i7 processors coming this month are “Bloomfield” chips, so they all have quad cores, three memory channels, and 8 MB of L3 cache onboard. As you can see, though, they do differ in terms of clock speed, both of the CPU cores and of what I’ve labeled the “North bridge speed.” That’s basically the clock speed of the “uncore,” but things get a little hairy from there. The uncore includes several elements, including the QPI links, the L3 cache, and the memory controller. Each of these elements may run at different multipliers from the base clock. For the Core i7-965 Extreme, the relationship is straightforward: everything runs at 3.2 GHz, including the QPI link, hence its 6.4 GT/s data rate. In the 940 and 920, the cores run at one speed, the QPI link at another (2.4 GHz), and, as I understand it, the memory controller and L3 cache both run at 2.13 GHz.

One of the implications of the slower memory controller frequency in the Core i7-920 and -940 is that, at least on our Intel “Smackover” board, one cannot achieve DDR3 memory speeds beyond 1066 MHz without overclocking the base system clock, which presents a real risk of instability. The multipliers just aren’t available in the BIOS to go beyond that. We’ll have to see how that works out in practice with enthusiast motherboards from the big names that, uh, aren’t Intel, but it appears DDR3-1066 may be a practical limit without overclocking for the 920 and 940, which is a shame.

Expect to see even more variety from Nehalem-derived processors in the future, because the architecture is designed to be modular. Intel may vary the core count, cache sizes, number of QPI links, the presence of Hyper-Threading, and the number of memory channels in future products. We also expect them to integrate a graphics core into some parts. Given what we’ve learned about uncore clocking flexibility, I’d expect some variance there, too. Intel may choose to, say, clock down various parts of the uncore, such as the L3 cache, in lower end or mobile products in order to save on power or to improve yields.

Unfortunately, more affordable variants of Nehalem may be a long time in coming. We know that mainstream desktop Nehalem derivatives are expected to have only two DDR3 memory channels and possibly integrated graphics, but those products may not arrive until well into next year. Until then, the Core i7 may remain a rather pricey option, because even the 920 is wedded to motherboards based on the premium X58 chipset. You may, though, want to check out our review of two of the first X58 boards right here.

A new socket, package, and chipset

Obviously, with all of the changes built into the Core i7, retaining compatibility with Intel’s existing LGA775 socket was out of the question. In its place, Intel has introduced the new LGA1366-style socket with, tada, more pins. Betcha can’t guess how many.

Anyhow, this new chip socket and package demands a few pictures, so here you are…


The Core i7 processor


From left to right: An LGA775-style Core 2 processor, a Core i7, and a Socket AM2-based Phenom

A Core i7 mounted in Intel’s DX58SO motherboard

A close-up of the new LGA1366 socket

As you can see, the Core i7’s new package is relatively large, as these things go. I’d expect a different, smaller socket and package for future mainstream Core i7 derivatives.

Matchups to watch

Before we move on to our test results, we should pause to consider several of the key matchups. The most obvious of those is the battle at 3.2GHz, where we pit the Core i7-965 Extreme against the fastest single-socket Core 2 processor, the Core 2 Extreme QX9770. This is, more or less, the clock-for-clock matchup between old and new generations that you’ll want to watch. Only it’s sort of bogus, since Turbo mode means the Core i7-965 Extreme typically runs at 3.33GHz or more.

Also contending at 3.2GHz: a dual-socket rig, the “Skulltrail” system with repurposed Xeons branded as Core 2 Extreme QX9775 processors. We threw this one in for fun, to see how this “ultimate” and “extreme” system would match up against the fastest Core i7. Of course, it’s not a fair fight, but it sure is a fun one.

One of the most intriguing matchups may be the Core i7 versus itself. We’ve tested the 965 Extreme with and without Hyper-Threading enabled throughout our test suite, to see what difference this feature makes. Watch for the “No HT” results to see what happens when Hyper-Threading is disabled.

Then there’s the face-off of the value quad cores, all of which have, at one time, occupied the basic price point at which the Core i7-920 now debuts. The Core 2 Quad Q6600 is a first-generation 65nm Core 2 processor and a long-time favorite here at TR. The 45nm Core 2 Quad Q9300 essentially supplanted the Q6600 and found its way into several of our system guide recommended configs during its time. I’m intrigued to see how the Core i7-920’s performance and value proposition matches up to these two economical quad-core CPUs.

Another quad contender with a nice, low price is the Phenom X4 9950. It’s also AMD’s current top-of-the-line processor, so we’ve of course included it. However, AMD’s pricing very much reflects its products’ limited performance, so there’s no direct competition right now between even the Core i7-920 and anything AMD has to offer.

Also in the mix for reference are a couple of higher-frequency dual-core processors: the Core 2 Duo E8600, which runs at 3.33GHz and promises to perform very well in lightly threaded applications, and the Athlon 64 X2 6400+. At 3.2GHz, the X2 6400+ is AMD’s highest-frequency desktop processor, and it may even upstage the Phenom in single- or dual-threaded apps. These days, though, AMD has chosen to fight Intel’s high-frequency dual cores with its triple-core Phenom X3 8750, so we’ve included it, as well.

Test notes

We didn’t become entirely aware of the various flexible uncore clock options for Nehalem until the eleventh hour, and as a result, we’ve only just discovered a problem with one of our test setups. You will see in the table below and throughout the review scores for a “Core i7-940” processor that is really just a Core i7-965 Extreme processor underclocked to the 2.93GHz core clock of the 940 model. Generally, simulating a speed grade of a chip like this isn’t a big problem, at least for performance testing if not power consumption. However, it turns out that, in following the guide Intel offered to us for simulating a 940 with a 965, we (and they) missed a key variable: the “uncore” clock. Ours was running at 3.2 GHz when we simulated the 940, when the proper clock speed is 2.13 GHz. That discrepancy potentially made both the memory controller and the L3 cache quicker than they would probably be in the actual product. We’ve decided to leave the numbers for the 940 in the review, but please realize that they may overstate its performance somewhat. We will try to follow up with more exact numbers in a future article or update.

Special thanks to Corsair for equipping us with the all-new memory we used in testing. The most impressive DIMMs they supplied were part of a special Core i7-tailored three-module kit, pictured above. These puppies ran happily at their rated 8-8-8-24 timings, with a 1T command rate, at 1600 MHz and only 1.65V. The Core i7 memory controller apparently may not deal well with higher voltages, but we found they weren’t necessary with these DIMMs.

Also, thanks to Asus for bringing our Phenom testbed up to date with this M3A79-T Deluxe mobo. We sought this one out because it has a 790FX north bridge combined with AMD’s new SB750 south bridge. Oh, yeah, and check out that CPU cooler, which I was too lazy to remove for the picture (and doing so would have decreased its awesomeness). Don’t put your finger in the fan, folks.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Core 2 Quad Q6600 2.4 GHz
  System bus: 1066 MT/s (266 MHz)
  Motherboard: Asus P5E3 Premium (BIOS 0605)
  North bridge: X48 Express MCH; south bridge: ICH9R
  Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032
  Memory: 4GB (2 DIMMs) Corsair TW3X4G1800C8DF DDR3 SDRAM at 1066 MHz (effective), 7-7-7-20, 2T command rate
  Audio: Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers

Core 2 Duo E8600 3.33 GHz and Core 2 Quad Q9300 2.5 GHz
  System bus: 1333 MT/s (333 MHz)
  Motherboard: Asus P5E3 Premium (BIOS 0605)
  North bridge: X48 Express MCH; south bridge: ICH9R
  Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032
  Memory: 4GB (2 DIMMs) Corsair TW3X4G1800C8DF DDR3 SDRAM at 1333 MHz (effective), 8-8-8-20, 2T command rate
  Audio: Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers

Core 2 Extreme QX9770 3.2 GHz
  System bus: 1600 MT/s (400 MHz)
  Motherboard: Asus P5E3 Premium (BIOS 0605)
  North bridge: X48 Express MCH; south bridge: ICH9R
  Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032
  Memory: 4GB (2 DIMMs) Corsair TW3X4G1800C8DF DDR3 SDRAM at 1600 MHz (effective), 8-8-8-24, 2T command rate
  Audio: Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers

Dual Core 2 Extreme QX9775 3.2 GHz
  System bus: 1600 MT/s (400 MHz)
  Motherboard: Intel D5400XS (BIOS XS54010J.86A.1149.2008.0825.2339)
  North bridge: 5400 MCH; south bridge: 6321ESB ICH
  Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032
  Memory: 4GB (2 DIMMs) Micron ECC DDR2-800 FB-DIMM at 800 MHz (effective), 5-5-5-18, 2T command rate
  Audio: Integrated 6321ESB/STAC9274D5 with SigmaTel 6.10.5713.7 drivers

Core i7-920 2.66 GHz and Core i7-940 2.93 GHz
  System bus: QPI 4.8 GT/s (2.4 GHz)
  Motherboard: Intel DX58SO (BIOS SOX5810J.86A.2260.2008.0918.1758)
  North bridge: X58 IOH; south bridge: ICH10R
  Chipset drivers: INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032
  Memory: 6GB (3 DIMMs) Corsair TR3X6G1600C8D DDR3 SDRAM at 1066 MHz (effective), 7-7-7-20, 2T command rate
  Audio: Integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers

Core i7-965 Extreme 3.2 GHz
  System bus: QPI 6.4 GT/s (3.2 GHz)
  Motherboard: Intel DX58SO (BIOS SOX5810J.86A.2260.2008.0918.1758)
  North bridge: X58 IOH; south bridge: ICH10R
  Chipset drivers: INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032
  Memory: 6GB (3 DIMMs) Corsair TR3X6G1600C8D DDR3 SDRAM at 1600 MHz (effective), 8-8-8-24, 1T command rate
  Audio: Integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers

Athlon 64 X2 6400+ 3.2 GHz
  System bus: HT 2.0 GT/s (1.0 GHz)
  Motherboard: Asus M3A79-T Deluxe (BIOS 0403)
  North bridge: 790FX; south bridge: SB750
  Chipset drivers: AHCI controller 3.1.1540.61
  Memory: 4GB (2 DIMMs) Corsair TWIN4X4096-8500C5DF DDR2 SDRAM at 800 MHz (effective), 4-4-4-12, 2T command rate
  Audio: Integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers

Phenom X3 8750 2.4 GHz and Phenom X4 9950 Black 2.6 GHz
  System bus: HT 3.6 GT/s (1.8 GHz) and HT 4.0 GT/s (2.0 GHz), respectively
  Motherboard: Asus M3A79-T Deluxe (BIOS 0403)
  North bridge: 790FX; south bridge: SB750
  Chipset drivers: AHCI controller 3.1.1540.61
  Memory: 4GB (2 DIMMs) Corsair TWIN4X4096-8500C5DF DDR2 SDRAM at 1066 MHz (effective), 5-5-5-15, 2T command rate
  Audio: Integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers

Common to all systems:
  Hard drive: WD Caviar SE16 320GB SATA
  Graphics: Radeon HD 4870 512MB PCIe with Catalyst 8.55.4-081009a-070794E-ATI drivers
  OS: Windows Vista Ultimate x64 Edition
  OS updates: Service Pack 1, DirectX redist update August 2008

Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.

Our single-socket test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket system was powered by a PC Power & Cooling Turbo-Cool 1KW-SR power supply. Thanks to OCZ for providing these units for our use in testing.

Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.

The test systems’ Windows desktops were set at 1600×1200 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

So how does the Core i7’s overhauled cache and memory subsystem perform? We can measure it in various ways to find out. Here are a few synthetic benchmarks designed to do just that.

Whoa. The Core i7, she is fast, no? The Core i7-965 Extreme achieves nearly three times the throughput of the fastest single-socket Core 2 processor, the QX9770. With slower 1066 MHz memory, the Core i7-920 and 940 don’t quite reach the same heights, but they’re still much, much faster than anything else.

The Phenoms aren’t performing quite as well here as one might hope, and part of the reason may be because we ran the Phenom’s memory controller in dual 64-bit “unganged” mode rather than 128-bit mode. The 128-bit mode may produce somewhat higher scores in synthetic tests, but we chose to test with unganged mode because its all-around performance could potentially be superior.

The results from this test visually illustrate the throughput of the various levels of the memory hierarchy, and we find that the Core i7’s caches are all quite fast. Even at the 512 kB and 1 MB test blocks, where presumably we’re well into the L3 cache, the Core i7s achieve considerably more throughput than the Penryn-based QX9770.

The results without Hyper-Threading are curious: higher performance in the L1/L2 cache ranges, but lower performance in the L3 range.

Since it’s difficult to see the results once we get into main memory, let’s take a closer look at the 256 MB block size:

Among the Intel processors, these results are relatively similar to what we saw in Sandra’s first memory bandwidth test at the top of the page, though the numbers are lower. However, not only do the AMD processors perform relatively better, but their measured throughput is actually higher here. Still, the Phenom X4 9950 is not even close to the Core i7-920, let alone the faster options.

These results come from a little cachemem-like latency test program included with earlier versions of CPU-Z, and they give us a sense of what the Core i7’s integrated memory controller and revamped cache hierarchy bring to the table. (I’ve assumed “one tick up” Turbo clock speeds for the Core i7 processors in calculating access times.) Despite having a third cache level and a much larger total cache size, the 965 Extreme gets out to main memory as quickly as an Athlon X2 6400+, our previous champ. Remarkable. The Core i7-920, with its slower “uncore” clocks and 1066 MHz memory, is still quicker than most Core 2 chips.

If you think we’ve already geeked out beyond all reasonable hope, don’t scroll down any further. What you’ll see below are 3D graphs of memory access latencies at various block and step sizes for some of the most interesting processors we tested. We’ve color-coded them just as a guide, although it doesn’t mean much. Yellow roughly corresponds to the chip’s L1 cache size, light orange to the L2 cache, red to the L3 cache, and dark orange to main memory.

Intel seems to have better managed the problem of L3 cache latency than AMD did with the Phenom, especially in the 965 Extreme, which runs its L3 cache at a full 3.2GHz.

Crysis Warhead

We measured Warhead performance using the FRAPS frame-rate recording tool and playing over the same 60-second section of the game five times on each processor. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
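
To make the “median of the five low frame rates” point concrete, here’s a tiny sketch of the per-CPU reduction, assuming the per-session averages are themselves averaged. The numbers are made-up placeholders, not actual test data.

```python
from statistics import median

# Hypothetical (average FPS, lowest FPS) pairs for one CPU across the five
# 60-second FRAPS sessions. Placeholder values only.
sessions = [(58.2, 31.0), (60.1, 29.5), (57.4, 33.2), (59.0, 30.8), (58.8, 28.9)]

avg_fps = sum(avg for avg, _ in sessions) / len(sessions)
low_fps = median(low for _, low in sessions)   # the median damps outliers in the lows

print(f"Reported average FPS: {avg_fps:.1f}")
print(f"Reported low FPS (median of session lows): {low_fps:.1f}")
```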

We tested at relatively modest graphics settings, 1024×768 resolution with the game’s “Mainstream” quality settings, because we didn’t want our graphics card to be the performance-limiting factor. This is, after all, a CPU test.

When I first set out to put together our CPU test suite, I honestly wondered whether we could find any games that are really CPU-limited these days. Many of them are console ports and simply don’t require much CPU power to run well. This game is an exception, obviously. Like most games, however, Warhead doesn’t look to be heavily multithreaded, since our two dual-core processors perform relatively well here compared to their lower-speed quad-core siblings.

The top two spots are occupied by the Core i7-965 Extreme, with the non-Hyper-Threaded config proving to be a little faster—no surprise given this game’s lack of robust multithreading. Turning off HT and doing away with its partitioning of some on-chip resources does seem to offer a bit of a performance boost in the right situation.

Far Cry 2: Far Cry-ier

After playing around with Far Cry 2, I decided to test it a little bit differently by recording frame rates during the jeep ride sequence at the very beginning of the game. I found that frame rates during this sequence were generally similar to those when running around elsewhere in the game, and after all, playing Far Cry 2 involves quite a bit of driving around. Since this sequence was repeatable, I just captured results from three 90-second sessions.

Again, I didn’t want the graphics card to be our primary performance constraint, so although I tested at fairly high visual quality levels, I used a relatively low 1024×768 display resolution and DirectX 9.

The 965 Extreme again takes the top spots, but the Core i7-920 finishes in mid-pack, behind the Core 2 Quad Q9300.

Unreal Tournament 3

As you saw on the preceding page, I did manage to find a couple of CPU-limited games to use in testing. I decided to try to concoct another interesting scenario by setting up a 24-player CTF game on UT3’s epic Facing Worlds map, in which I was the only human player. The rest? Bots controlled by the CPU. I racked up frags like mad while capturing five 60-second gameplay sessions for each processor.

Oh, and the screen resolution was set to 1280×1024 for testing, with UT3’s default quality options and “framerate smoothing” disabled.

We’re looking at playable frame rates with pretty much every processor tested, but we do seem to have sorted out the faster CPUs from the slower ones. Notice that the dual-core processors don’t fare as well here; some degree of multithreading seems to be at work.

All of the Core i7 processors finish strong, even the 920. However, the 940’s victory over the 965 Extreme is a reminder of how much variability is possible when testing in this manner.

Half-Life 2: Episode Two

Our next test is a good, old custom-recorded in-game timedemo, precisely repeatable.

Ok, so we have frame rates well into the hundreds, but at least Episode Two‘s ceiling is high enough to show us the differences between the CPUs. Clock for clock, the Core i7 doesn’t look to be much faster than the Core 2 here.

Source engine particle simulation

Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.

This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

Chalk up a win for Hyper-Threading now that we have a nicely multithreaded application, and consider the Core i7’s dominance here. Even the 920 is faster than the Skulltrail dual-QX9775 system with its eight Penryn cores.

WorldBench

WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.

Here’s a nice indication that the Core i7 offers a fairly general increase in performance. The 965 Extreme beats out the Core 2 Extreme QX9770 by 13 points in WorldBench’s overall index, which is a formidable margin. We’ll look at the individual results in the next few pages to see how the Core i7 did it.

Productivity and general use software

MS Office productivity

Firefox web browsing

Multitasking – Firefox and Windows Media Encoder

WinZip file compression

Nero CD authoring

Two of WorldBench’s tests above, MS Office and the Firefox/Windows Media Encoder combo, are noteworthy because they test a user multitasking scenario, during which multiple applications are running concurrently. In both cases, the Core i7 processors are among the fastest.

Meanwhile, the Nero test leans heavily on the disk controller, and you can see the distinct separation between the different chipsets we used.

Image processing

Photoshop

The Core i7 performs well here, but the Core 2 Duo E8600’s strong showing serves as a reminder that only one or two fast cores are necessary to ace this test.

The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.

The Core i7 stretches into new performance territory here, with the 965 Extreme once more embarrassing the dual-socket Skulltrail rig.

picCOLOR image analysis

picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Many of the individual functions that make up the test are multithreaded.

The Core i7-920 has quietly racked up a string of performances in our image processing tests that place it well ahead of the mid-range quad-core processors, the Q6600 and Q9300, that it supplants.

Media encoding and editing

x264 HD benchmark

This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark. These scores come from the newer, faster version 0.59.819 of the x264 executable.

The Core i7 chips perform well enough during pass one, but it’s during pass two (which seems to use more threads) that they really shine.

Windows Media Encoder x64 Edition video encoding

Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

The Core i7 delivers a bit of a clock-for-clock performance gain over the Core 2 here, even though it’s handicapped by the fact that the app only uses four threads.

Windows Media Encoder video encoding

Roxio VideoWave Movie Creator

Make of these two WorldBench tests what you will. I prefer our other video encoding benchmarks instead.

LAME MT audio encoding

LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors. You can download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
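
The pipelining scheme is simple enough to sketch: one thread runs the psycho-acoustic analysis a frame ahead and hands its results to the encoder thread through a small buffer. The toy Python version below is not LAME MT’s actual code, just the shape of the idea, and it illustrates why the approach tops out at roughly two busy cores.

```python
import threading, queue

frames = range(8)                  # stand-ins for audio frames
results = queue.Queue(maxsize=1)   # one-frame-deep buffer between the two stages

def psycho_acoustic_stage():
    """Stage 1: analysis runs one frame ahead, buffering results for stage 2."""
    for frame in frames:
        analysis = f"analysis({frame})"   # placeholder for the real math
        results.put((frame, analysis))    # blocks if the encoder falls behind
    results.put(None)                     # sentinel: no more frames

def encoder_stage():
    """Stage 2: the rest of the encoder consumes the buffered analysis results."""
    while (item := results.get()) is not None:
        frame, analysis = item
        print(f"encode frame {frame} using {analysis}")

t = threading.Thread(target=psycho_acoustic_stage)
t.start()
encoder_stage()
t.join()
```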

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.

No real performance gains to report here.

3D modeling and rendering

Cinebench rendering

Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores are available.

Now here is a truly impressive performance from the Core i7. Even the Core i7-920 trounces the QX9770, thanks in part to Hyper-Threading. Let’s look at a few more tests, and we’ll discuss the results at the bottom of the page.

POV-Ray rendering

We’re using the latest beta version of POV-Ray 3.7 that includes native multithreading and 64-bit support. Some of the beta 64-bit executables have been quite a bit slower than the 3.6 release, but this should give us a decent look at comparative performance, regardless.

3ds max modeling and rendering

Valve VRAD map compilation

This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to precompute lighting that goes into games like Half-Life 2.

In each of the three fully multithreaded rendering tests above—the POV-Ray chess scene, 3ds max rendering, and Valve VRAD—the Core i7 brings major performance gains over the Core 2. Even the Core i7-920 is consistently faster than the Core 2 Extreme QX9770.

Folding@Home

Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs are finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
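
In rough pseudocode, the CD’s scoring method boils down to the sketch below. The points-per-day values are placeholders, not measured results; Tinker and Amber are WU types named above, and the other two labels are stand-ins.

```python
# Placeholder per-core points-per-day figures for the four WU types the
# benchmark runs; the real values come from timing actual work units.
ppd_per_core_by_wu = {
    "Tinker": 280.0,
    "Amber": 310.0,
    "WU type 3": 540.0,
    "WU type 4": 610.0,
}

def estimated_total_ppd(ppd_by_wu, core_count):
    """Average the per-core points per day across WU types, then scale by
    the number of cores to project the whole CPU's output."""
    per_core_average = sum(ppd_by_wu.values()) / len(ppd_by_wu)
    return per_core_average * core_count

print(f"Quad-core estimate: {estimated_total_ppd(ppd_per_core_by_wu, 4):.0f} points per day")
```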

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

Because of the presence of Hyper-Threading, you have to look at that final graph to make sense of these results. The benchmark keeps eight threads active all of the time on the Core i7, which reduces per-thread performance. Once we get to the end of the road, though, and estimate the total projected points per day, both the Core i7 and Hyper-Threading prove to be winners. Without Hyper-Threading, the Core i7-965 Extreme is only marginally faster than the QX9770, but with it, the contest becomes a rout.

MyriMatch proteomics

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He has provided us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.
In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
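
Here’s a quick sketch of that partitioning scheme, using the figures from the description (6,714 yeast proteins, ten jobs per thread). It’s just an illustration of the queueing idea, not MyriMatch’s actual code.

```python
from math import ceil
from queue import Queue

protein_count = 6714          # S. cerevisiae database size quoted above
threads = 4
jobs = threads * 10           # MyriMatch-style rule: ten jobs per thread

job_size = ceil(protein_count / jobs)   # ~168 proteins per job, as described
work_queue = Queue()
for start in range(0, protein_count, job_size):
    work_queue.put(range(start, min(start + job_size, protein_count)))

print(f"{work_queue.qsize()} jobs of up to {job_size} proteins each")
# Each worker thread would loop: pull a job, score its proteins against the
# spectra, and come back for another, keeping idle time and contention low.
```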

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.

I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.

Here’s how the processors performed.

The Core i7-965 Extreme performs in 60 seconds what the Core 2 Extreme QX9770 requires 100 seconds to complete. This sort of thorny, bandwidth-intensive application benefits greatly from the Core i7’s architectural innovations. Here’s another case where even the dual Core 2 Extreme QX9775 processors in the Skulltrail system can’t keep up with the Core i7.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here.

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.
The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

I believe we have a new world record in this benchmark, and it comes from a single-socket Core i7 system. The dual Core 2 Extreme QX9775 system—essentially a 2P Xeon in disguise—is beaten by even the Core i7-920.

Power consumption and efficiency

Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.

All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile.

Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

Wow, the Core i7’s idle power consumption is very reasonable, especially considering it has a third DIMM in the system that the others don’t.

Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.

The Core i7’s peak power use is definitely up from the quad-core Penryns, as one might expect from a larger chip with a design focused on keeping execution units more fully occupied. Peak power draw is only part of the story, though.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

The mix of reasonably low idle power draw and relatively short render times adds up to a moderate amount of energy consumed by the Core i7 systems over the duration of our test period.

We can quantify efficiency even better by considering specifically the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
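
The joules figure is simply the area under the power curve during each system’s render window. Here’s a minimal sketch of that calculation, with made-up one-second samples standing in for the Extech meter’s log.

```python
# Hypothetical wall-power samples (watts) logged once per second; the real
# data comes from the Extech 380803's log over the full test period.
samples_watts = [210, 284, 286, 285, 283, 282, 284, 212, 208]
render_window = (1, 7)        # sample indices where this system was rendering
sample_interval_s = 1.0

render_samples = samples_watts[render_window[0]:render_window[1]]
energy_joules = sum(p * sample_interval_s for p in render_samples)  # W x s = J

print(f"Energy to render the scene: {energy_joules:.0f} J "
      f"over {len(render_samples) * sample_interval_s:.0f} s")
```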

Although the Core i7 systems tend to consume a little more power at peak than the quad-core Penryns, they also tend to finish rendering much sooner. Their more efficient execution means the Core i7 processors require less energy to complete the task of rendering our sample scene. These results also illustrate why Intel claims Hyper-Threading improves power efficiency—because it can.

Overclocking

Ok, so I haven’t yet gotten around to overclocking my Core i7 processors, but I asked Geoff to give it a try, and here’s what he managed to do with his Core i7-920.

Although the 920’s upper multiplier is locked, he was able to increase the base system clock—which CPU-Z labels as “bus speed”—in order to raise the core speed. Using that method, he made it to 3.3 GHz, which isn’t too shabby.

Then again, I think better things are possible, but I need to play with these chips a little more. Come back later, because I’ll update this page once I have something to report. Shouldn’t take too long.

Conclusions

The Core i7-965 Extreme is, by far, the fastest processor we’ve ever tested, and it seems clear the Core i7 architecture brings with it a general performance increase over the 45nm Core 2 processors it succeeds. We’ve seen that increase in everyday desktop applications, including the WorldBench suite and several of the latest games. In part, the Core i7’s performance gains come from higher clock frequencies due to the “Turbo mode” mechanism. When the Core i7-965 Extreme is operating at 3.33 or 3.46 GHz, it’s going to be somewhat faster than a Core 2 at 3.2GHz. That’s why I’ve been hesitant to talk about clock-for-clock performance gains for the Core i7, as you may have noticed.

Yet in some cases, the Core i7 undeniably delivers clock-for-clock performance increases over Core 2, along with dramatic gains in absolute performance. We saw the biggest improvements in some specific sorts of workloads, including 3D rendering, scientific computing/HPC applications, and nearly any application that could spawn up to eight threads. More than once, a single Core i7-965 Extreme outran our dual-socket “Skulltrail” system by a considerable margin. This new system architecture pushes the performance frontiers forward in places where progress had previously been rather halting.

Such things aren’t exactly the material of everyday futzing around on the PC, but we’re long past the point where Microsoft Office is a prime target for performance optimizations. In fact, for the average guy, the secret hero of our test results was the Core 2 Duo E8600. If your main reason for wanting a fast computer is to surf the web and play games, you’re probably better off getting a fast dual-core like the E8600 than you are picking up a Core i7-920 or any quad-core processor. Game developers keep threatening to really make use of more than two cores, but it just hasn’t happened yet.

Even so, one has to appreciate what Intel has accomplished here. The Core i7 is another solid step beyond its last two product generations, the 45nm and 65nm versions of Core 2. As our power testing showed, the larger Core i7’s power draw at idle is similar to a quad-core Penryn’s. Although its peak power draw is higher, the Core i7 can use less energy to complete a given task, as it did in our Cinebench rendering example. And the new system architecture established by the Core i7 will likely be the basis for Intel systems for the next five years, at least. On all fronts, progress.

One question that remains: Has Intel now built an insurmountable lead over AMD? Almost seems like it. But one never knows. AMD’s 45nm quad cores are coming soon. Perhaps they’ll have a few surprises in store for us.

Comments closed
    • Azmount
    • 11 years ago

    Care to elaborate on why Core i7 isn’t all that when tested by normal people? I would want to know what everyone thinks about the following:

    http://www.amdzone.com/phpbb3/download/file.php?id=86 http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=135855

      • indeego
      • 10 years ago

      haha. The price a year later is 1/3 what it was when this system first came out. Man those early adopters REALLY pay up the nose <.<

    • indeego
    • 11 years ago

    https://techreport.com/articles.x/15818/15 still waiting.... :)

      • 5150
      • 11 years ago

      God you don’t let anything go do you!?

    • lycium
    • 11 years ago

    got mine today, w00t 🙂 now just waiting on the triple-pack memory…

    does anyone know if it’s ok to use high speed, 1.9v ddr3 memory (specifically CM3X2048-1600C7DHX) safely with an asus p6t motherboard? it should default to 1.5v and require setting to 1.65 in the bios.

    • Bensam123
    • 11 years ago

    Keeps threatening indeed…

    So what exactly happened to those things AMD and Intel were promising to break up threads into more than one and help multi-thread things on the hardware level? I remember reading about plans for them here around the AMD64 and P4D time period.

      • IntelMole
      • 11 years ago

      Such a thing would be a logistical and technological nightmare. And I would imagine that testing it would be damn near impossible. We’re already at the stage where we can only guarantee 90-95% fault coverage on a chip. Testing a chip that could split instructions into dynamic threads would take an axe to that figure most likely.

      It would be far more efficient to build an SMT chip that is twice as wide instead, and even that’s a bit of a dubious advantage. Then, you get a wider execution engine for single threaded tasks that could not be split up and all instruction level parallelism can be extracted here, plus the capability of executing a second thread on this doubly wide chip i.e. as fast as a dual core (assuming no contention/coherency issues).

      Given that Intel’s new interconnect essentially gives them a bandwidth nirvana, I wouldn’t be incredibly surprised to see them try to stuff a 5-decoder design before long, perhaps even clock throttling unused decoder slots and execution resources to keep power under control. They’ve got to do something with those transistors now that caches are getting into the tens of MB stage and this would be a relatively cheap way to extract a bit more single threaded performance and add more “cores” without duplicating all the functionality of another core.

      • UberGerbil
      • 11 years ago

      Well, Intel’s been doing speculative execution in Itanium for a while, and Sun’s got “scout” threads going in the Rock prototype. But while those can offer some advantages in the best cases — warming up the cache, getting work done — they are terrible from perf/watt perspective: every time you speculate incorrectly you do a bunch of work that has to be thrown away, so you’re burning a bunch of juice with zero to show for it. Meanwhile Intel is getting a lot of the benefits already with smart pre-fetching, the trace cache, etc., without blowing their power budget.

    • Firestarter
    • 11 years ago

    Hrrm I’d like to see mee some database and webserver (Java/ASP?) benchmarks, in VMs and out. Not that I’d actually know how to interpret them or that I’d want to buy one for those purposes, but I guess that is where all this performance could actually earn some hard (soft?) dollars.

    For instance, how would a single socket i7 with tons of RAM match up against a dual socket 8 core machine from AMD?

    I guess I’ll have to wait for the Xeons 🙂

    • DrDillyBar
    • 11 years ago

    Alright Software Developers, GO!

    • derFunkenstein
    • 11 years ago

    I dunno – I read the article and I was thoroughly non-plussed, but I am optimistic for the future. Developers have had THREE YEARS to get their proverbial stuff together for multi-core CPUs, going back to the first X2 and the Pentium D. They haven’t yet. Hopefully as Intel’s compiler technology (or Microsoft’s) gets better at spawning multiple, useful threads, that’ll change. Right now, it’s just more CPU than I know what to do with. Good thing it’s out of my price range. 😆

      • danny e.
      • 11 years ago

      i’m not sure why everyone always thinks multi-threaded apps should be so easy. there are cases where it is almost impossible.

        • Prototyped
        • 11 years ago

        Because most people aren’t programmers, so they buy into the semiconductor industry’s hype.

        • UberGerbil
        • 11 years ago

        And cases where there’s nothing for additional threads to do. Your word processor is already running a thread in the background to do the spellcheck and grammar check squiggles, as well as repagination. What exactly would you expect additional threads — no matter how easily generated by the developer tools — to do when your app is just sitting there waiting for you to type?

      • boing
      • 11 years ago

      It’s basically the same hype surrounding 64 bit. 5 years have passed and only a handful of games have been released that actually use x64. And even less applications.

        • UberGerbil
        • 11 years ago

        Actually, there are a lot of server applications that use — and benefit from — x64. Even 32bit applications benefit if they are Large Address Aware (this includes Photoshop and Premiere as well as a variety of workstation apps). And of course if you have 4GB of RAM you want x64 to make full use of it, even if you’re just running a bunch of 32bit apps.

    • ludi
    • 11 years ago

    Thread summary:

    (a) i7 spits on the floor and then wipes it shiny with the competition.

    (b) The competition is still capable of wiping the floor shiny on its own time, provided an i7 is unavailable to strawboss.

    (c) …and that’s a very good thing, since the i7 will initially be priced as an accessory for an HM Embody chair.

    (d) In order to maintain the balance of the forum universe, it is essential to madly debate all of the above anyway.

    Carry on!

      • MadManOriginal
      • 11 years ago

      You left something off of d) ….because what else would we do with all the processing power available in our computers!?

    • henfactor
    • 11 years ago

    Makes my 4200+ feel a bit obsolete, and I’m still stuck with the AMD upgrade route 🙁

    • AKing
    • 11 years ago

    Good review. How much difference does the extra 2gb of RAM do though?

    • Prototyped
    • 11 years ago

    This review is the reason I read TR. Best I’ve seen thus far. That said:


    • TravelMug
    • 11 years ago

    What’s the story with the L3 clockspeed? The article says 940 and 920 have it running at 2.13Ghz and Anandtech in their article says it’s running at 2.66Ghz? Any way to get some confirmation from Intel?

    • kilkennycat
    • 11 years ago

    Where are the AMD 45nm benchmarks for comparison? Isn’t AMD due to ship 45nm processors this month?? Core i7/Nehalem benchmarks in one form or another have been available for the past 6 months. Has AMD something to hide? Reminds me of those movies that the distributors won’t allow the critics to see before public release… for very good reasons. There is a word that describes a traditional Thanksgiving dish that adequately fits here…

      • HurgyMcGurgyGurg
      • 11 years ago

      Well AMD was also pretty silent about the 4000 series before it came out. So maybe they do have something to hide, also Scott ends with:

      “AMD’s 45nm quad cores are coming soon. Perhaps they’ll have a few surprises in store for us.”

      So, I don’t know, the good reason as to why they did not compare to them is AMD hasn’t shipped the test samples yet, I think there was a news article a few days ago saying they were almost ready.

        • kilkennycat
        • 11 years ago


        • Silus
        • 11 years ago

        If you think a simple die shrink will save Phenom, you need a reality check. Current Phenoms barely compete with the old Q6600 and have a lot to catch up with current 45 nm Core 2 Duos. 45 nm Phenoms won’t touch Core i7s. At most they’ll match Penryns, and that is still unsure at this point.

    • Palek
    • 11 years ago

    Typo on page 4 in the system summary table:

    Core *[

    • Scrotos
    • 11 years ago

    Scott, I have to disagree with:

    “This sort of thorny, bandwidth-intensive application benefits greatly from the Core i7’s architectural innovations.”

    Architectural innovations? All they have done is really come up to feature parity with the K8, as far as a point-to-point interconnect, NUMA, and an integrated memory controller? Yeah, there’s some other minor tweaks and yeah, Intel has done it BETTER for the current time, but I don’t really see that as an architectural innovation, per se.

    I’d be more inclined to give kudos to AMD for that, or perhaps to DEC for their EV7 (integrated memory controllers, 12 GBps! and NUMA) and cancelled EV8 (4-thread SMT per core) for being ahead of their time and popularizing the concept among the techies and nerds who lusted after these CPUs back in the late 90’s and early 2000’s.

      • spuppy
      • 11 years ago

      I’d say there’s a fair bit more than “feature parity” going on… Otherwise the slowest Core i7 wouldn’t be twice as fast as the fastest Phenom X4…

        • Scrotos
        • 11 years ago

        It’s possible to do things better and faster without innovating. For instance, Intel with their continual process shrinks brought better and better performance, but was it innovative? AMD switching from a shared FSB and separate memory controller to P2P and integrated memory controller was innovative for that market.

        Until now, Intel has kept the same basic architecture they had going back to the PPro. Shared FSB, off-chip memory controller. They’ve tacked on special instructions like SSE2/SSE3, EMT64 (which was spearheaded by AMD, dare I say an INNOVATION by AMD that Intel was forced to match due to IA64 sucking hard) but kept the same basic architecture.

        Sure, they loaded more cache on, dallied a little with hyperthreading, and switched their plumbing around a bit, but nothing like the large changes that the K8 brought for this CPU segment.

        I will not attempt to say that i7 does not do what K8 does, and does it better. It does. It really does damned well. However, I don’t find it innovative. Intel basically took the main performance-enhancing features from their competitor and adopted them. That’s not innovation by any definition of the word.

          • spuppy
          • 11 years ago

          If you don’t think some of the things are innovative, I am guessing that you skipped the first 2 pages of any Core i7 review you might have “read”

            • Scrotos
            • 11 years ago

            Naw, check my replies to both Ludi and MadMan. That should clarify it a bit more. The fact that I quoted from something near the middle-end of the review should have tipped you off that I read past the first two pages, though. For shame!

            BTW: “I’d say there’s a fair bit more than ‘feature parity’ going on… Otherwise the slowest Core i7 wouldn’t be twice as fast as the fastest Phenom X4…”

            To be fair, I didn’t really address that directly. Ok, here’s the short version. I think the Core 2 architecture has been bandwidth starved for YEARS and Intel finally fixed that issue with the integration of the memory controller. So the beast has been unleashed, the potential is finally let out. I think the C2 has always been a stronger CPU than the Phenom/K8/K10, but it has been hampered by the memory system so it has been closer in performance than it needed to be. Now that that issue is sorted out, the strength of the CPU can really shine.

            Now maybe it really is all about the bandwidth and maybe a 45nm Phenom with 3 channels of DDR3 will post comparable benchmarks, but I can’t even begin to guess if that’s the case. I hope we’ll be able to find out in the next few months! Regardless, I’m glad Intel finally got rid of that gimpy FSB and went P2P; the integrated memory controller is just the performance icing on the… um… silicon cake? Yeah, I dunno either. I shouldn’t have run with that one.

            • MadManOriginal
            • 11 years ago

            Thanks for the replies, it was long but I read the whole thing. I think you might agree that large, seemingly out-of-nowhere innovations are rare and most advancement is based upon small steps – it’s built upon the shoulders of preceding giants.

            Bandwidth starving of C2Ds doesn’t seem to be a huge issue based upon testing with higher than 1:1 memory ratios, where throwing bandwidth at the C2D does not give much of an increase. That really speaks to the large L2 caches and Intel’s excellent prefetch algorithms; the much lower L2 of A64s, Phenoms, and now i7s points toward that. I suspect the bandwidth is there for multi-socket systems. On the other hand the lack of increase in C2Ds could be due to inefficiencies with off memory ratios or some other weird and hard to test bottleneck in the memory system. When it comes to dual-die quad cores I’d also say that the FSB system and the need to use it for die-die communication, while it obviously worked, probably held back those CPUs more than it did single die CPUs. That’s certainly one likely bottleneck i7 has eliminated.

          • MadManOriginal
          • 11 years ago

          Your view of innovative isn’t invalid but there’s the matter of perspective. You’re sort of taking the view from the stratosphere by saying that the basic architecture is the same. I could take the view from Mars that they are still using transistor ICs so nothing has really changed since the first IC CPU and wouldn’t be wrong within the confines of that statement either, but it would certainly not reflect the realities of progress.

            • Scrotos
            • 11 years ago

            My reply to Ludi goes into more clarification, but yeah, I don’t mean to take it to extremes. The thing is that the #1 biggest contributor to i7’s great performance is the integrated memory controller, which has been shipping in a competitor’s product for a few years now. That’s not innovation. The #2 biggest contributor to i7’s performance boost is hyperthreading, which Intel already implemented years ago, so I view this as more of a refinement of an existing technique rather than some innovation.

            The power management, the tweaked prefetch and branch prediction, the automatic overclocking, even the P2P replacement for the FSB are all neat and some innovative, but they don’t contribute nearly as much to the i7’s great performance. Now if this were a 2P or 4P system, yeah, I’d give more kudos to the P2P interface but again wouldn’t call it particularly innovative in light of how long hypertransport and the K8’s NUMA system’s been in consumer hands.

            So I’m not trying for a “it’s silicon, people used silicon before so this isn’t new!” kinda thing, more like “the major features that have the most impact on the performance increase on the i7 compared to the C2 have existed for some time and on a competitor’s product, ergo I disagree with Scott’s label that the i7’s innovations are the reason for the performance increase because I dispute that they can be labeled ‘innovations’ to begin with.”

            Man, that was a mouthful!

          • ludi
          • 11 years ago

          Somehow, I don’t think you’re really appreciating the innovation required to bring all of these things under the roof of one die — particularly the new transparent per-core power-management and auto-overclocking that allows the CPU to make the most of its TDP when running lightly-threaded loads — and have it all

            • Scrotos
            • 11 years ago

            Well, the i7 is a beast of a performer. Now I am going to refer to that quote of Scott’s where he talks about how bandwidth-intensive applications really like the i7’s innovations. Looking at the benchmarks and what I know of CPU tech in general, I’d say that bandwidth-intensive applications are probably loving the i7 mostly due to the integrated memory controller. The synthetic benchmarks show how much of a difference in available bandwidth there is, no?

            The per-core power management doesn’t necessarily lend itself to this huge spike in performance, right? And the automatic overclocking, while neat, doesn’t account for the huge gap between the i7 and a 3.2 GHz C2. 3.3 GHz vs. 3.2 GHz is 3.125% more speed in clock alone, but the performance delta is much greater than a few measly percent.

            Hyperthreading? How can that be an innovation? They already had that with their P4 line. Sure, it’s a refinement as they are reintroducing it here, but not an innovation.

            I know I’m nitpicking, and I don’t really mean to come off as a whiner, but I really view innovation as something fresh, something new, something unexpected. i7 took a bunch of concepts that were already implemented and implemented them BETTER. I think it’s danged AWESOME. But I don’t find it particularly innovative. Transmeta’s slow, crappy-performing CPU was innovative with their code morphing stuff. PowerVR’s consumer GPUs with their tile-based rendering scheme were innovative, even though they had a ton of visual problems due to driver issues. That Lucid multi-GPU tech? It doesn’t use SLI or AFR, it’s pretty innovative even if it never sees light of day.

            I hope this clarifies where I’m coming from with this. I know this response is longer than most would want to read, but just know that I’m not knocking i7 or Intel’s work, I just find issue with Scott’s label of getting (basically) feature parity with the K8’s integrated memory controller as innovative. C2 really REALLY likes that bandwidth and Intel has been starving it for years now so I’m glad that Intel’s got a way to keep the CPU fed. However, it’s not a new concept and someone else had been doing it in a shipping product for a while. That’s all.

            • ludi
            • 11 years ago

            That’s a legitimate perspective, but if you’re going that route, the K8 wasn’t especially innovative either, since the whole IMC business (as has been pointed out already) was handed down from DEC, as was HyperTransport, and x86-64 was essentially the same philosophy as the previous 16 to 32 bit transition applied to another product generation.

            In fact, when K8 was introduced, I seem to remember several long discussions on either RealWorldTech or (the now-defunct) Ace’s Hardware arguing that K8’s gains had relatively little to do with the core, because the interesting-but-less-than-innovative IMC was pulling the grunt work! How the wheel turns.

            I figure if a company can pull a whole lot of technology together into one very good product, and not screw it up as there are multiple variables working against each other, it counts for something quite a bit more than “Well, we went to a smaller process on schedule, and took advantage of that to increase the cache again, and we bumped the clock speed another 200-400MHz, and we liked the results, so here you go. Large check, please” — which is fine, but is more progressive than innovative.

            • Scrotos
            • 11 years ago

            <3 Ace’s Hardware

            I know AMD got the EV6 bus from DEC for the Athlon, but I thought that Hypertransport was their own thing? A quick, lazy google doesn’t seem to indicate either way aside from there being a HT consortium to handle licensing and the standard and whatever, I guess like PCI-SIG.

            • NovaX
            • 11 years ago

            Search for “Lightning Data Transport”, which was DEC’s name for the bus before licensing it to AMD.

            • Scrotos
            • 11 years ago

            Aha, thanks!

      • Nitrodist
      • 11 years ago

      in⋅no⋅vate
      /ˈɪnəˌveɪt/ [in-uh-veyt]
      verb, -vat⋅ed, -vat⋅ing.
      –verb (used without object)
      1. to introduce something new; make changes in anything established.
      –verb (used with object)
      2. to introduce (something new) for or as if for the first time: to innovate a computer operating system.
      3. Archaic. to alter.

      Seems to fit.

        • Scrotos
        • 11 years ago

        Is it still innovation if someone else does it first? It seems you’re content to restrain innovation to just Intel’s product line but I’m trying to apply it as a concept to consumer CPUs in general, or at least mass-produced CPUs.

        I’ll give you that. Compared to Intel’s product lineup, i7 innovated by integrating the memory controller and had some of the typical tweaks you get with a new CPU (power management, prefetch, etc.) However, they didn’t innovate with hyperthreading since they had already introduced and released that.

        Compared to other CPU products already on the market, I don’t feel Intel innovated all that much. The integrated memory controller was already being used in shipping products and hyperthreading was already being used by Intel and popularized a bit before by DEC.

        Intel did a damn good job, I just don’t find it all that innovative. And that explanation should jive with your cut-and-paste from dictionary.com or wherever you snagged that.

          • Meadows
          • 11 years ago

          He pointed out that innovating can mean making a change in anything established. They made more than one with Core i7 compared to Core 2 which is, let’s face it, very well established.

    • blubje
    • 11 years ago

    Can’t find this in the review: Does Core i7 support ECC memory?

    Thank you,

    (Apologize for using a bugmenot account, I dont have access to my email at the moment).

      • Krogoth
      • 11 years ago

      I cannot say for certain that Bloomfields have ECC support. It is a safe-bet that Xeon-flavors will.

    • anand
    • 11 years ago

    Any ideas if LGA775 heatsinks will work with these new chips? I see the familiar set of 4 holes at each corner so at least it looks like it will use the same (god forsaken) mounting system.

    I want to buy a new heat sink for my E4500. If I knew I’d be able to carry it over to a future i7 build, it would help me justify getting one of the higher-end ones.

      • Kent_dieGo
      • 11 years ago

      No way. The hole pattern is much larger. You will need special socket 1366 rated HS

      • Forge
      • 11 years ago

      The holes are the same, and the angles are the same, but the holes have shifted from roughly 60mm apart to roughly 80mm apart. Some sinks might get adapter brackets, but the contact patch on most HSFs is too small for i7. The IHS on the new CPU is quite a bit bigger, and needs to be covered with HSF.

    • IntelMole
    • 11 years ago

    Buying this chip to put in a desktop PC would be something of a commitment – you’re going to have to spend quite a while figuring out what to DO with all those cores 🙂

    My dual-core Penryn laptop will do 2-pass H.264 transcodes slightly faster than “real time”. So one of these babies will encode a 2-hour film in about 45 minutes. When you can do that in about as long as it takes you to make a good meal, you need something else to do with your hardware.

      • Forge
      • 11 years ago

      Odd. I’m assuming your encodes are SD and low/mid quality encoder settings?

      I’m doing 1080p->720p encodes on a regular basis (my HTPCs like 720p MKV files better than raw BDroms), and my initial pass is about 2 hours for an average movie (roughly realtime). The second pass which occupies all four cores to 100% throughout is roughly half that speed.

      While it’s an awful lot of CPU time, I can use those same settings to take an average SD movie down to under 700MB with ease. I can even keep 5.1 audio on most flicks while making a 700MB encode with good quality or a 1400MB movie with near-transparency video.

        • IntelMole
        • 11 years ago

        DVD MPEG-2->h.264 transcodes to ~1GB/hour bitrate.

        How long would your six hour job take on this CPU (given reasonable assumptions about threading and per-clock improvements)? I’m going to go for 5 hours.

        Even that job is done in an evening on this CPU. Before long, even us enthusiasts will need a new killer app for all those cores.

        Tens of folding clients maybe 🙂

      • indeego
      • 11 years ago

      Why do people encode films that are already encoded for them?

        • MadManOriginal
        • 11 years ago

        What else is there to do that really pushes and thus justifies having fancy hardware?

        • IntelMole
        • 11 years ago

        Because hard drive space isn’t infinite, so given the choice I’d rather transcode my films to a format that takes up half the space for the same quality.

        Initially it began because I needed external storage for my rather feeble old laptop, and then needed a way to transport my DVDs and other media back home gradually from uni but still be able to use them. Now though, I rather like being able to pick a film (and just the film, no ads, no copyright garbage) without getting up.

        Not so long into the future, I can imagine leaving optical media behind completely (in the UK, we’re not so terribly far away from being capable of streaming full HD media over our Internet connection). Already done that for my music collection – I haven’t bought a CD in about a year. When that happens, I don’t want to still have to cart my DVDs about. Given that they are encoding in real time at the moment… you do the math on putting 80 films plus a few boxsets on there 🙂

        Then I can have all of my films on one hard drive (backed up of course). Not so long ago a friend of mine invited me round to his place, and suggested I bring some films we could watch. Instead of carting DVDs I would probably not use, or a giant case full of CDs for the train journey, I brought an external hard drive (approximately the size of 3 DVD cases) with 80 films on it instead. All of my films are quite happily stored elsewhere.

        That’s my reason anyways. I’m hoping that by the time I’ve filled this drive up (400GB, they’re at, what, 320GB mobile these days?), I can find a 2 1/2″ hard drive to replace it. Then I won’t need an external power source and it becomes even more compact.

    • Crayon Shin Chan
    • 11 years ago

    Shanghai making up the difference? Not likely. We’ll have to wait for Bulldozer to see any potential here.

    • Krogoth
    • 11 years ago

    Geez, it is too bad that you didn’t test Supreme Commander. 😉

    It is perhaps the most CPU-bound game to date. I would hope that Nehalem would be a decent improvement over Core 2.

    I am going to bet that Lynnfield is going to be almost as fast at most stuff, while the platform is a lot more affordable.

    • d0g_p00p
    • 11 years ago

    Any chance of getting a download link for the Damage Labs panorama pics?

      • Flying Fox
      • 11 years ago

      I second that. We need that panoramic Damage Labs pic!

    • spuppy
    • 11 years ago

    Scott you’ll probably come across this later, but the Extreme doesn’t actually have a fully unlocked multiplier… Only the TURBO MODE multiplier can go above 24x. So while OCing is as easy as raising a multiplier (like with Black Edition CPUs and previous EE’s), there is still all that TDP/voltage monitoring going on, and overclocking will only go as far as the CPU allows it based on all that.

      • Meadows
      • 11 years ago

      To me that sounds like intel don’t know what they’re doing and they’re really afraid of having their chips blow up.

        • UberGerbil
        • 11 years ago

        Or Intel knows exactly what they’re doing and they don’t want to allow abusive practices to damage chips and then get the blame for it. More likely, since this is fundamentally intended to be a workstation/small server chip, they’re leaving all the airbags in place. Later on they’ll offer the real “extreme” version that they can use to extract the maximum amount of cash from the OC nutcases.

    • desertfox84
    • 11 years ago

    Does this mean that the current Core 2 Quad processors will drop below $266? I’d love to grab a Q9550 for my P35 motherboard. I don’t think I’ll need to jump on the Core i7 bandwagon until 2010.

    • krazyredboy
    • 11 years ago

    Yeah…well how would it stack up against a Northwood P4, 2.6HT processor…clocked at 2.6ghz? Huh.

    That’s what I thought…

    …that I’m living in the past and need an upgrade badly, that’s what.

      • kitsunegari
      • 11 years ago

      lol…that was awesome….

      …but intentionally or not, I think you’re hitting upon a key point of order regarding the i7 and this homogeneous power drive amongst IT consumers. While we “power users” are gifted with the capacity to appreciate the performance gains, the vast majority (I said Majority now 😉 of even us Tech Report readers potentially don’t REQUIRE much more than the system outlined above.

      Hell my primary rig is still an intel 875p chipset going on 4 1/2 years that I’ve been updating; and one which continues to meet virtually all of my computing needs. While I’m as enthusiastic and anxious to see the latest byproducts of Moore’s law (i.e. technological progression) as any of you, I cannot help but think of the american automotive industry when reading about this irrepressible drive/desire for excessive amounts of power in -what will ultimately become- a consumer level desktop used to surf the web, use office, and play WOW.

      I’m ecstatic to see the efficiency gains yielded by the i7; and intel’s continued R&D will lead to unseen innovation that will trickle down to every market segment but my question is whether or not the future really lies in a “top to bottom approach” of R&D as symbolized by i7 as opposed to the “bottom up” movement as characterized by something like ATOM.

      I guess what I’m trying to say is that

      electricity = oil

      =)

      but at the same time…I can’t wait to build one =b

    • Shining Arcanine
    • 11 years ago

    Intel’s processors have been the highest performing general purpose processors for some time now, so seeing them beating their predecessors is not very interesting.

    However, seeing how competitive Intel’s newest processors are in market segments in which they are not expected to compete would be very interesting. One such example would be software rendering.

    http://www.beyond3d.com/content/news/618

    Seeing a comparison between a GeForce4 Ti 4800 and the Core i7-965 Extreme running SwiftShader would be interesting.

    • PRIME1
    • 11 years ago

    The folding benchmarks were a nice addition. Although I just skimmed the article I need to read it in full along with the Motherboard article.

      • Flying Fox
      • 11 years ago

      They have been doing that for quite a while already.

    • Bombadil
    • 11 years ago

    Nothing to interest me with these. Sounds like the new Apple Mac Pro platform. Lots of performance for stuff most people don’t do on the desktop. Is Intel abandoning mainstream desktop chips?

    • Nitrodist
    • 11 years ago

    Core i7 review… say hello to renewed AMD vs Intel bickering with the introduction of the Core 2 faction.

    AMD fanboys vs Intel fanboys vs Intel Core 2 Fanboys!

      • srg86
      • 11 years ago

      I do think that the “Intel Core 2 fanboys” as you call them do have a valid point in that most people on here play games, which are happy with the Core 2 for now.

    • forthefirsttime
    • 11 years ago

    excellent review, Scott.

    Is the stars euler benchmark ‘higher is better’? I didn’t see that on that graph..

    Also, I just want to echo the requests for seeing how this new platform holds up in beefier graphics setups in typical gaming resolutions.

    • S_D
    • 11 years ago

    Great review.

    I’d really love to see how this CPU benches with Microsoft’s Flight Simulator X though, as I know a number of other sim’ers are too. FSX really is still CPU limited, so any chance of running some tests at some point (with FSX SP2 or Acceleration)

    Thanks!

    • tfp
    • 11 years ago

    Is there a way to put the clock speed in the graphs? It would be nice to know because I don’t remember clock speed vs. model number a few pages after it is listed.

    • srg86
    • 11 years ago

    Is it me, or did whoever designed the placement of the components on the bottom of the packaging try to draw a picture of the Core i7 die shot? Maybe those components need to be placed in those orientation

    • TurtlePerson2
    • 11 years ago

    It would be nice if there was at least one benchmark that showed the difference between using a cheap processor and one of these in a game that wasn’t CPU bound, running at a high resolution. I’m kind of curious to know if there would be any difference in FPS or if it would just be very small.

      • MadManOriginal
      • 11 years ago

      Check the i7 motherboard review. While there aren’t any detailed numbers, they say that at higher resolutions and settings games are GPU bound, as common sense would dictate.

        • Usacomp2k3
        • 11 years ago

        Well they were comparing 2 motherboards with the same processor, not different processors.

    • axeman
    • 11 years ago

    Looks like the re-introduction of HT is still a hit-and-miss thing. I’ve never really been all that impressed with any SMT implementation to be honest.
    That, combined with the high cost of this new platform for now, doesn’t make it all that interesting. Clock for clock it doesn’t look like Intel has done anything revolutionary; the most interesting thing is the turbo mode stuff and the accomplishment of quad-core with clockspeeds above 3.46 GHz, but Intel’s manufacturing process has almost always been top-notch even when the designs weren’t. For once I’ll agree with Meadows, this isn’t impressive enough for a “new” generation product.

      • shank15217
      • 11 years ago

      You cant make something out of nothing. What were you expecting?

      • srg86
      • 11 years ago

      hmmm Well apart from games, Core i7 seems very impressive to me, but for the expensive platform, though that’s hardly surprising as it’s brand new. I would agree that the performance jump is more evolutionary than revolutionary most of the time on the desktop.

      • UberGerbil
      • 11 years ago

      If you’re a gamer, there’s no reason to be impressed (especially if you’re a gamer with any kind of sane budget, when you factor in total platform costs). If you’re a workstation user, this is a very impressive CPU.

    • Jigar
    • 11 years ago

    Impressive, but not enough… I will gladly use my Q6600 @ 3.7 till something really shiny comes along.

      • Fighterpilot
      • 11 years ago

      Apparently the Core i7 really shines with a multi GPU Crossfire or SLI setup.
      http://www.guru3d.com/article/core-i7-multigpu-sli-crossfire-game-performance-review/

        • wingless
        • 11 years ago

        +1! I just edited my comment to include this link too. It is a monster with a couple of 4870X2s (and that makes for one expensive system…).

        • MadManOriginal
        • 11 years ago

        That article doesn’t have a very good test setup for comparison. They test the quad core i7 965 versus a dual core e8400. Maybe the games just scale with more than 2 cores with that much graphics power? Notice how not all the games scale well, or at all, and the difference gets smaller at higher resolutions for most all the games. They really should have compared it to a quad core C2D.

    • wingless
    • 11 years ago

    http://www.guru3d.com/article/core-i7-multigpu-sli-crossfire-game-performance-review/1

    The Core i7 is built to drive multi-gpu configs! If you have a 4870X2 or more then you'll see decent gains with this CPU. I'm really impressed now.

      • srg86
      • 11 years ago

      Wow, well there are definitely going to be some gamers that love the i7, just give it multiple GPUs.

      • zimpdagreene
      • 11 years ago

      WOW, that multi-GPU scaling is something! But the CPU review is awesome here also. I guess, like someone said, what will the new rebadged Vista do with that or any quad core?

    • donkeycrock
    • 11 years ago

    Did I miss the part that said the availability?

      • UberGerbil
      • 11 years ago

      Yes, you did. It was in the first page introduction, and noted Intel’s vagueness.

        • indeego
        • 11 years ago

        Hey, kinda like the paper launch of SSDs

    • bogbox
    • 11 years ago

    Nehalem is more a server part than a desktop (like the Opterons).
    Why test games at 1024x800 or 1280x1024? Do you play games at that resolution? No. Do I play games at 1024? No. Who plays games at 1024?

    I understand that the GPU limits the CPU at high res, but that’s still no problem for me if you use the same GPU. Maybe at high res all the CPUs have the same result. So what? Or is Nehalem weak, like the Phenom?

    The turbo thing was not working or what?

    • flip-mode
    • 11 years ago

    A wonderful CPU for processing images, video, and rendering. If that’s your bag then the i7’s are worth every penny.

    Somewhat less exciting in terms of gaming – prolly stick with an overclocked Q-C2.

    File compression shows no gain.

    It seems like a great processor. I feel that the quad core 2 is still quite compelling due to price and overclocability. The Phenom X4 is woefully inadequate.

    In my mind, the biggest knock against the i7 is that, for home users, processors are already fast enough, and it’s hard to justify spending the extra cash. For content creators, the value of the i7 is undeniable. For typical business desktops, a slow Athlon X2 or Core 2 is more than sufficient.

      • ssidbroadcast
      • 11 years ago

      Pretty much what this guy said. Great review, Scott.

    • Meadows
    • 11 years ago

    Not impressive enough, but good for a next generation.

    While I did piss in my pants after seeing the memory results, the processors were beaten by the previous generation in a few tests, which was weird, not to mention some other tests simply hung over the clock speed deficiency of AMD’s anemic Phenoms, which should receive their competitive update sooner or later.

    The only thing I gathered from this review is that it can’t be too hard for AMD to compete now if they get their next thing right.

      • DaveJB
      • 11 years ago

      “can’t be too hard for AMD to compete?” I’m not doubting that the 45nm Phenoms are gonna be better than the 65nm versions, but the Ci7 XE was

        • Meadows
        • 11 years ago

        It also had over five times the price and a clock speed advantage of at least 600 MHz.

          • DaveJB
          • 11 years ago

          That’s only a 23% clockspeed disadvantage; assuming AMD manage to equal Ci7’s clockspeed and that Intel can’t scale it up until they move to 32nm, that still leaves AMD needing to come up with 20-50% more performance-per-clock depending on the situation. You can see pretty much how much AMD have to make up by comparing the 9950 with the 920.

          As for the price-performance advantage, as I said in my other post I agree that could work out well for AMD… but at the same time, Intel could knock out that advantage by taking the axe to Penryn prices. Really, we need to know if AMD’s 45nm chips are gonna be any good.

    • no51
    • 11 years ago

    I thought these things were 6-core?

      • Meadows
      • 11 years ago

      Somebody hasn’t read any news in half a year.

      • UberGerbil
      • 11 years ago

      Dunnington. Core 2-based.

    • Fighterpilot
    • 11 years ago

    Good God….that thing just demolished all the opposition…including Intel’s finest.
    The Core i7 920 @ $284 performs on a par with the hugely expensive QX9770 that retails at over $1400 right now at New Egg.
    Great power envelope and killer performance, congrats to the boys at Intel.

    • deruberhanyok
    • 11 years ago

    Fantastic article, thanks TR. I figured Nehalem would be a jump in performance but I honestly wasn’t expecting it to be that big!


    • dragmor
    • 11 years ago

    Give me a low power optimised dual core version or a SOC version.

    Of course I think the mainstream versions will be a lot lower performance: fewer memory channels, slower memory, a slower NB (sorry, uncore), less cache. The Core 2 chips will be the better buy for a long time.

      • bogbox
      • 11 years ago

      All the Core 2s will be extinct by the time mainstream Nehalem arrives.

    • Krogoth
    • 11 years ago

    Excellent review as always.

    Nehalem is meant for multi-threaded loads, otherwise a Penryn-based rig will suffice for the majority of users.

    I expect the Xeon flavors of Nehalem to rock the MP world.

    • MadManOriginal
    • 11 years ago

    Impressive CPUs, good review and good job of keeping the conclusion real. Most home users don’t do cinebench and that stuff, video encoding is the closest likely home user application and those older CPUs still do OK. I think I’ll be looking forward to 32nm i7.

    Scott, there’s something I’ve been thinking about for a while but never seen in a good review, and that is CPU speed’s effect on games at resolutions and graphics settings that are more realistic examples of real-world use. Maybe some time you could do an article like that for just a few games or include it as part of the followup to this one.

    • HurgyMcGurgyGurg
    • 11 years ago

    So, so many benchmarks, it’s almost too hard to process all of them to find the overall conclusion :/

    I was a little overwhelmed until I got to the conclusion and was able to make sense of it all.

    I’m sticking with my e8400 at 3.5 GHz though at least until Core i7 hits the value arena. Plus, I could always get a dirt cheap Core 2 Quad a year out if I have to extend my current build to about two years. It’s exciting to see Core i7 is a good improvement, that will probably only get better as developer support comes in.

    So whats next? Will they cram 8 physical cores onto these and make a new socket or do we have to wait till a die shrink to see that?

    On the AMD side of things, at least they have 4000 series graphics cards to foot the bill for the intense R&D needed to catch up, if it’s possible at all.

    • srg86
    • 11 years ago

    Wow, as a non gamer, I’m very excited by this. I still wish TR would have some code compiling benchmarks, especially with GCC.

    • lycium
    • 11 years ago

    i’m just about to buy a p6t+i7+6gb package, except…

    /[

      • srg86
      • 11 years ago

      He means that the lower-end derivatives will have different sockets, rather than these ones changing, I think.

      • Forge
      • 11 years ago

      Intel changes their socket more often than most TR readers change socks. Buying for a long future is a fool’s errand, and srg86 has the right of it, AFAICT. Intel is backing LGA1366 for high end and LGA1160 for mainstream. Intel might even have another socket beyond that coming for the Celeron crowd.

        • MadManOriginal
        • 11 years ago

        LGA775 disagrees with you. If you want to talk VRM standards, which may be what you meant rather than socket, then you might have more of a point; however, even some old (at this point) motherboards from around the time of the C2D launch have an updated BIOS to support the newest CPUs for that socket.

        • UberGerbil
        • 11 years ago

        If you really believe that, I hope not to be in the room when any TR readers take off their shoes.

        • packfan_dave
        • 11 years ago

        How long were we on socket 775, again? (granted, Intel fussed with voltages a lot) Or socket 478?

        • ludi
        • 11 years ago

        I’m with you. The same physical socket is a small panacea if the next wave of products will not be electrically compatible, thereby denying an upgrade path on an existing board.

          • MadManOriginal
          • 11 years ago

          Yea the new VRM standards can be annoying but some mobo makers do keep doing updates. Case in point, which I checked for my previous reply to that post: an Asus P965 P5B Deluxe had an update in August 2008 to support 45nm CPUs. So that mobo, from early on in the C2D era, can support all LGA775 CPUs. I guess the only mobos that don’t have a chance are pre-975 ones but at this point they’d be quite old anyway.

    • ish718
    • 11 years ago

    My god…
    I hope AMD’s 45nm is at least better than the Core 2s

    • moloch
    • 11 years ago

    Wait for the 45nm Phenoms/Opterons to even it up a bit, though obviously they will still be in a similar situation as today compared to the C2D and C2Q.
    At least they can scale up the clockspeed now, and lucky for them the Radeon 4XXX series covers a good portion of the market ($100 and up, pretty much) and has the best cards, so their GPU business will essentially keep them afloat till they have a competitive CPU. Must be interesting for the former ATI employees…

      • Silus
      • 11 years ago

      The graphics cards account for a small part of AMD’s business. They are important, obviously, but they cannot keep AMD “afloat” just by themselves.

    • AMD Damo
    • 11 years ago

    AMD will probably just give up :'(

    Now, what I want to know is when these things arrive in Australia.

      • wingless
      • 11 years ago

      LOL WTF?! AMD won’t give up. There’s nothing to give up about. You think a few frames per second in games are worth giving up? Intel made the Core i7 to compete against AMD’s superiority in the multi-CPU server market. With what we know about how AMD’s L3 cache scaling works on the Phenom, the 45nm enhancements may very well give 20% performance increases where it counts. We’ll see real soon…

      Change your screen name NOW.

      • ludi
      • 11 years ago

      Yeah, I think I hear a car running in the garage. Somebody should check on that, it might be AMD trying to give up again.

      • Lord.Blue
      • 11 years ago

      AMD won’t be giving up any time soon. Wait until we see Shanghai, Bulldozer, and Fusion to make your decisions.

    • esterhasz
    • 11 years ago

    Wow, so they’re out. Great review! It would have been interesting though to have the Atom in the benchmarks. We’re currently in the strange situation that the performance delta between the top end and the low end in the x86 landscape has expanded to roughly x20 (guessing) which has never been the case before, not even close. Even as a relative power user, I find myself asking “what would I use an i7 for?” And if Windows 7 will really use less resources than Vista I really can’t see the user space (server’s a different matter) drive the high end for very much longer…

      • Forge
      • 11 years ago

      I haven’t heard anyone with a clue saying that Win7 will use less resources than Vista. What I have heard is that it’s using ‘relatively less’. That means that it’s not heavier than Vista the way that Vista is heavier than XP.

      I firmly believe that Win7 is Vista Reloaded. Any non-superficial changes are likely to be minor and subtle. What you will get is a massively respun UI, since that’s what Joe Average sees, and it’s often ALL that Joe sees.

    • ub3r
    • 11 years ago

    poor amd 🙁

      • Ruiner
      • 11 years ago

      Poor Phenom. The Brisbane passed it in more than a couple of benches.

        • poulpy
        • 11 years ago

        The same could apply to Intel’s lineup when raw clockspeed is more important than the number of cores in benchmarks.
        K10 does have better IPC than K8, so it can sometimes make up for the missing MHz, sometimes not.
