A closer look at the Core i7-940

When we first reviewed the Core i7 processor, we had two chips on hand: the high-end Core i7-965 Extreme and the more affordable Core i7-920. Sandwiched in between them in Intel’s product lineup is the Core i7-940. Since we didn’t have one of those to test, we employed a trick we sometimes use and turned down the clock speed on our Core i7-965 Extreme from its native 3.2GHz to the 940’s 2.93GHz frequency. Given the breadth of CPU model ranges these days, we find ourselves using this trick fairly often. In fact, in this case, Intel even recommended that reviewers use this method to test Core i7-940 performance and provided the media with instructions for setting the proper clock speeds.

But such an approach always comes with caveats. Although we’re usually confident that the performance results of a “simulated” product model, using the same silicon and the same clock speed, will be true to the original, we’re less than confident that the power consumption results will match the actual chip. In fact, power consumption can vary from one individual chip to the next, along with the voltage, which is set at the factory. Beyond that, there is the slight possibility that our simulated product might somehow not match the original in terms of its configuration—and thus performance.

We discovered such a problem with our “simulated” Core i7-940 just ahead of the publication of our initial Core i7 review. Turns out that both our test configuration and Intel’s reviewer’s guide had overlooked an important characteristic of the Core i7: it has multiple clock domains that govern its processing cores and what Intel calls its “uncore” elements, such as the memory controller, QuickPath interconnect, and L3 cache. We had set the proper CPU core and QPI link speeds for our simulated Core i7-940, but the memory controller and “uncore” clock were incorrect—they were set to run at 3.2GHz, not the 940’s default speed of 2.13GHz for both. Not only that, but unlike the “unlocked” Extreme edition, the 940’s “uncore” clocks are limited to 2.13GHz and cannot go any higher (at least, not without overclocking).
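If you like to see these relationships spelled out, here’s a minimal sketch in Python of how Nehalem’s clock domains all derive from the roughly 133MHz base clock. The multiplier values are our illustration of the stock settings described above, not readouts from any particular chip:

```python
# Nehalem derives every major clock from a ~133 MHz base clock (BCLK).
# These multipliers illustrate the stock Core i7-940 settings described
# in the text; they are not readouts from any particular chip.
BCLK_MHZ = 133.33

def domain_clocks(core_mult, uncore_mult, qpi_mult, mem_mult):
    return {
        "core MHz": round(BCLK_MHZ * core_mult),               # CPU cores
        "uncore MHz": round(BCLK_MHZ * uncore_mult),           # L3 cache + memory controller
        "QPI GT/s": round(BCLK_MHZ * qpi_mult * 2 / 1000, 1),  # double-pumped link
        "memory MT/s": round(BCLK_MHZ * mem_mult),             # effective DDR3 rate (~1066)
    }

# Stock Core i7-940: 2.93 GHz core, 2.13 GHz uncore, 4.8 GT/s QPI, DDR3-1066
print(domain_clocks(core_mult=22, uncore_mult=16, qpi_mult=18, mem_mult=8))

# Our mistaken "simulated" 940 kept the 965's 3.2 GHz uncore (24x) instead
print(domain_clocks(core_mult=22, uncore_mult=24, qpi_mult=18, mem_mult=8))
```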

After the nice EMT people used the paddles to revive me from my minor coronary event, I decided to go ahead with the publication of our Core i7 review with the simulated Core i7-940 performance numbers intact, but I promised in the text of the review to follow up with numbers from a properly configured Core i7-940.

You see, what we didn’t know at that time was the precise impact of the fast L3 cache speed on the Core i7-940’s performance. But we knew from experience with similar architectures, such as the AMD Phenom, that raising the L3 cache speed of the Core i7 from 2.13GHz to 3.2GHz was potentially what we call in the industry A Very Big Deal; it would affect memory latency, cache bandwidth, and even power consumption. Put plainly, our original review of the Core i7 overstated the 940’s performance by some amount, possibly five to ten percent.

To rectify this situation, we got our thermal-paste-stained hands on the genuine article, a retail boxed Core i7-940 processor, so we could see how the real thing performs. Here’s a nice picture of the soothing blue box:

We popped this puppy (the processor, not the box) into the exact same test rig we’d used for our initial Core i7 testing and ran it through the entirety of our CPU test suite.

In the end, the differences between our original, simulated Core i7-940 and the retail copy of the same weren’t huge, but they were measurable and sometimes fairly significant. Rather than hit you with a full-on deluge of performance data, I’ve selected some of our key tests—including synthetic memory benchmarks, productivity applications, and games—and provided results from them on the following pages. The scores from the two key contenders in this matchup are colored red in the graphs for easy identification. Our simulated Core i7-940 with the too-fast uncore clocks is marked as (sim), while the real McCoy simply goes by its proper name. Read on to see how much (and sometimes how little) difference the actual product’s slower L3 cache and memory controller can make.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Asus P5E3 Premium (X48 Express MCH north bridge, ICH9R south bridge)
- Processors (system bus): Core 2 Quad Q6600 2.4 GHz (1066 MT/s, 266 MHz); Core 2 Duo E8600 3.33 GHz and Core 2 Quad Q9300 2.5 GHz (1333 MT/s, 333 MHz); Core 2 Extreme QX9770 3.2 GHz (1600 MT/s, 400 MHz)
- BIOS revision: 0605
- Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032
- Memory: 4GB (2 DIMMs) Corsair TW3X4G1800C8DF DDR3 SDRAM
- Memory speed (effective) and timings: 1066 MHz, 7-7-7-20, 2T (Q6600); 1333 MHz, 8-8-8-20, 2T (E8600 and Q9300); 1600 MHz, 8-8-8-24, 2T (QX9770)
- Audio: integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers

Intel D5400XS (5400 MCH north bridge, 6321ESB ICH south bridge)
- Processors (system bus): dual Core 2 Extreme QX9775 3.2 GHz (1600 MT/s, 400 MHz)
- BIOS revision: XS54010J.86A.1149.2008.0825.2339
- Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032
- Memory: 4GB (2 DIMMs) Micron ECC DDR2-800 FB-DIMM
- Memory speed (effective) and timings: 800 MHz, 5-5-5-18, 2T
- Audio: integrated 6321ESB/STAC9274D5 with SigmaTel 6.10.5713.7 drivers

Intel DX58SO (X58 IOH north bridge, ICH10R south bridge)
- Processors (QPI): Core i7-920 2.66 GHz and Core i7-940 2.93 GHz (4.8 GT/s, 2.4 GHz); Core i7-965 Extreme 3.2 GHz (6.4 GT/s, 3.2 GHz)
- BIOS revision: SOX5810J.86A.2260.2008.0918.1758
- Chipset drivers: INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032
- Memory: 6GB (3 DIMMs) Corsair TR3X6G1600C8D DDR3 SDRAM
- Memory speed (effective) and timings: 1066 MHz, 7-7-7-20, 2T (i7-920 and i7-940); 1600 MHz, 8-8-8-24, 1T (i7-965 Extreme)
- Audio: integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers

Asus M3A79-T Deluxe (790FX north bridge, SB750 south bridge)
- Processors (HyperTransport): Athlon 64 X2 6400+ 3.2 GHz (2.0 GT/s, 1.0 GHz); Phenom X3 8750 2.4 GHz (3.6 GT/s, 1.8 GHz); Phenom X4 9950 Black 2.6 GHz (4.0 GT/s, 2.0 GHz)
- BIOS revision: 0403
- Chipset drivers: AHCI controller 3.1.1540.61
- Memory: 4GB (2 DIMMs) Corsair TWIN4X4096-8500C5DF DDR2 SDRAM
- Memory speed (effective) and timings: 800 MHz, 4-4-4-12, 2T (Athlon 64 X2); 1066 MHz, 5-5-5-15, 2T (Phenom X3 and X4)
- Audio: integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers

Common to all systems:
- Hard drive: WD Caviar SE16 320GB SATA
- Graphics: Radeon HD 4870 512MB PCIe with Catalyst 8.55.4-081009a-070794E-ATI drivers
- OS: Windows Vista Ultimate x64 Edition with Service Pack 1 and the August 2008 DirectX redist update

Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.

Our single-socket test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket system was powered by a PC Power & Cooling Turbo-Cool 1KW-SR power supply. Thanks to OCZ for providing these units for our use in testing.

Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.

The test systems’ Windows desktops were set at 1600×1200 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

Right off the bat, we see that the simulated 940’s faster L3 cache and memory controller speed can make a difference. The retail 940 scores almost identically to the Core i7-920, which shares its 2.13GHz L3 and memory controller clocks.

There’s not much separation here overall, but look closely at the 4MB block size. This is right where the L3 cache on the Core i7 will be getting a workout, and the bandwidth difference between the two 940s is substantial: 55 GB/s for the simulated part, and 37 GB/s for the real deal.

Since the differences are even harder to see once we get into main memory, let’s take a closer look at the 256MB block size:

In this test, once main memory becomes the limiting factor, the simulated and retail 940s perform the same.

Memory access latency is another story, though. The simulated 940 shaves eight nanoseconds off the access time of the real thing, a fact that could have real-world performance implications.

WorldBench

For a good look at general desktop application performance, we’ll focus on WorldBench’s overall score and then its individual components.

The simulated 940 comes out four points ahead of the retail product in WorldBench’s overall index. We can see which tests contributed most to this gap by looking through the results below.

Surprisingly enough, the faster L3 cache of the simulated 940 is good for a slight but broad performance boost in nearly every application here. The biggest differences come in the Firefox web browser test and in the multitasking test that runs Firefox concurrently with Windows Media Encoder.

Crysis Warhead

We measured Warhead performance using the FRAPS frame-rate recording tool and playing over the same 60-second section of the game five times on each processor. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
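For the curious, here’s a minimal sketch in Python of how that median-of-lows reporting works, with made-up frame rates standing in for real FRAPS logs:

```python
import statistics

# Five FRAPS sessions per CPU: each yields an average FPS and a low FPS.
# The numbers below are made up for illustration; they are not our data.
sessions = [
    {"avg": 62.1, "low": 38},
    {"avg": 60.8, "low": 41},
    {"avg": 63.4, "low": 22},  # an outlier low, e.g. a one-off hitch
    {"avg": 61.9, "low": 40},
    {"avg": 62.6, "low": 39},
]

avg_fps = statistics.mean(s["avg"] for s in sessions)
low_fps = statistics.median(s["low"] for s in sessions)  # resists the outlier

print(f"average FPS: {avg_fps:.1f}, reported low FPS: {low_fps}")
# The median low is 39, so the single 22 FPS hitch doesn't skew the report.
```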

We tested at relatively modest graphics settings, 1024×768 resolution with the game’s “Mainstream” quality settings, because we didn’t want our graphics card to be the performance-limiting factor. This is, after all, a CPU test.

Far Cry 2

After playing around with Far Cry 2, I decided to test it a little bit differently by recording frame rates during the jeep ride sequence at the very beginning of the game. I found that frame rates during this sequence were generally similar to those when running around elsewhere in the game, and after all, playing Far Cry 2 involves quite a bit of driving around. Since this sequence was repeatable, I just captured results from three 90-second sessions.

Again, I didn’t want the graphics card to be our primary performance constraint, so although I tested at fairly high visual quality levels, I used a relatively low 1024×768 display resolution and DirectX 9.

Performance in both of these games is also affected by the slower cache in the retail chip. We’re not talking about the sort of delta you’re likely to feel by the seat of your pants as you play, but the effect is real, regardless.

Unreal Tournament 3

As you saw on the preceding page, I did manage to find a couple of CPU-limited games to use in testing. I decided to try to concoct another interesting scenario by setting up a 24-player CTF game on UT3’s epic Facing Worlds map, in which I was the only human player. The rest? Bots controlled by the CPU. I racked up frags like mad while capturing five 60-second gameplay sessions for each processor.

Oh, and the screen resolution was set to 1280×1024 for testing, with UT3’s default quality options and “framerate smoothing” disabled.

Testing games manually can be a rather variable affair, especially in a more random scenario like this one, and we cited our simulated 940’s scores in UT3 as proof of that fact in our initial Core i7 review. Here, with the proper L3 cache and memory controller speeds, the retail 940 performs more or less as expected, on average.

Half Life 2: Episode Two

Our next test is a good, old-fashioned custom-recorded in-game timedemo, which is precisely repeatable.

Again, a minor but consistent difference.

Source engine particle simulation

Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.

This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

More of the same here.

Power consumption and efficiency

Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.

All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile.

Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

The lower “uncore” clocks of the real Core i7-940 grant it the advantage of a couple of watts less power consumption at idle—makes sense, when you think about it.

Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.

Here’s the true surprise, though: even with its slower L3 cache and memory controller frequencies, the system equipped with the retail Core i7-940 draws about 15W more than the one with our simulated 940 from our original review. Why? Likely because Intel sorts its chips for the various models based on their characteristics, and the Core i7-965 Extreme chips are the best ones—most able to reach higher clock speeds and to do so with less voltage. Underclocking a 965 Extreme doesn’t give us the best handle on the Core i7-940’s true power consumption because the 965 Extreme is a sweetheart of a chip. The more pedestrian retail Core i7-940 requires more voltage to reach similar clock speeds, and it thus draws more power.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

We can quantify efficiency even better by considering specifically the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
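For those who like to check the math, here’s a minimal sketch in Python of the calculations we’ve just described. The wattage samples and the render start/end markers are invented for illustration; they are not our actual Extech logs:

```python
# Hypothetical 1 Hz power log: (seconds, watts) pairs for one system.
samples = [(t, 120.0) for t in range(0, 15)]    # idle lead-in
samples += [(t, 220.0) for t in range(15, 60)]  # rendering in Cinebench
samples += [(t, 121.0) for t in range(60, 90)]  # idle tail after the render

def avg_watts(samples, t0, t1):
    vals = [w for t, w in samples if t0 <= t < t1]
    return sum(vals) / len(vals)

peak_w = avg_watts(samples, 15, 25)   # average over the 15-25 s window
idle_w = avg_watts(samples, 80, 90)   # trailing edge, after the render

# With one sample per second, each reading approximates one second of
# draw, so summing watts gives watt-seconds, i.e. joules.
total_j = sum(w for _, w in samples)                   # whole test period
render_j = sum(w for t, w in samples if 15 <= t < 60)  # render period only

print(f"peak {peak_w:.0f} W, idle {idle_w:.0f} W, "
      f"total {total_j:.0f} J, render {render_j:.0f} J")
```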

The retail 940’s higher peak power draw and slightly lower performance team up to reduce its overall power efficiency somewhat in our final two tests. The 940 is still very power efficient, of course, but not quite what we first thought.

Conclusions

Now that you’ve seen the performance results, this little exercise may seem like a whole lot of work for a relatively minor issue. But we do strive for accuracy in our testing, and as you can see, our first attempt at measuring the performance of the Core i7-940 did overstate that processor’s performance slightly—but fairly consistently. The peak power consumption numbers for our simulated 940 were also a little tame compared to the real thing. We’ll be following up with additional CPU reviews based on this same performance data, and we’ll soon publish an article with a complete set of results from the retail Core i7-940 included.

Meanwhile, we have learned to be especially wary of the many clock domains inside of new CPUs like the Core i7 and the Phenom. Those secondary clocks can make a difference in overall performance (especially the L3 cache speed), and we expect this issue to become more prominent with time. In fact, we expect Intel to vary the L3 cache, QPI, and memory controller speeds of its Nehalem-derived processors quite a bit as it tailors them for different markets. As a result, it’s possible that a mobile version of the Core i7 (or whatever it’s called once it gets there) simply may not perform as well as the desktop equivalent, even if the core clock speeds are identical.

Overclockers will want to pay attention to these matters, as well. We’ve found that faster memory can make a measurable difference in Core i7 performance in many cases. But some of those performance gains may have been the result of the higher L3 cache frequencies we used (see the table on page 1 there). To some extent, you’re simply forced to turn up the uncore clocks if you want to run faster memory with the Core i7. But if you own a Core i7 and want to tune it for absolutely optimal performance, you might want to push the uncore clock as high as possible, even if it means using lower QPI and memory multipliers to keep things stable.
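To make that coupling concrete, here’s a minimal sketch in Python. It assumes the commonly cited Nehalem rule of thumb that the uncore multiplier must be at least twice the memory multiplier; treat the exact figures as approximate:

```python
# Assumed rule of thumb for Core i7 (Nehalem): the uncore multiplier
# must be at least twice the memory multiplier, so faster DIMMs force
# a faster uncore clock.
BCLK_MHZ = 133.33

def min_uncore_mhz(mem_mult):
    return BCLK_MHZ * (mem_mult * 2)

for name, mem_mult in [("DDR3-800", 6), ("DDR3-1066", 8),
                       ("DDR3-1333", 10), ("DDR3-1600", 12)]:
    print(f"{name}: uncore must run at {min_uncore_mhz(mem_mult):.0f} MHz or higher")

# DDR3-1066 implies a 2133 MHz uncore (the 920/940's stock arrangement),
# while DDR3-1600 implies a 3200 MHz uncore, as on our Core i7-965 setup.
```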

Comments closed
    • VILLAIN_xx
    • 11 years ago

    I dunno about anyone else, but I just don't feel that same impressed feeling that I did with the Core2 generation on their first debut. I'm not knocking that these i7s are the fastest to date (cause they are), but the Core2s were leaps and bounds beyond the generation before them. This just felt like a minor tweak to stay ahead.

    • brucect
    • 11 years ago

    A closer look at the Core i7

    Intel Core i7 920 Nehalem 299$
    Intel Core i7 940 Nehalem 569$
    Intel Core i7 Extreme Edition 965 Nehalem 1029$ (….)

    Intel Core 2 Duo E8200 Wolfdale 159$
    DDR2 would be fine
    prices from Newegg

    • jonw
    • 11 years ago

    For 3D rendering, I have got my 940 running within almost 15% of 2 x E5450.

    • travbrad
    • 11 years ago

    Thanks for this write-up and benchmarks. I knew there were some differences between the internals of the 965 and the 940, but I didn’t expect the differences to be this big. This definitely makes the 920 the most attractive option IMO. It’s hard to justify the extra cost of the 940, with such a small performance difference.

    Just a little note/correction: the idle power consumption chart has the E8600 using a green bar (AMD's color). It doesn't affect the article at all, obviously; just wanted to point that out.

    • aleckermit
    • 11 years ago

    The 920 is a MUCH better deal…

    A slightly overclocked 920 will be able to match/exceed the 940 easily, and save you ~$300.

    • CapnBiggles
    • 11 years ago

    Wow, now I don't feel so hard-pressed with my o/c E8400. Not that I should, given the pricing of i7 right now, but I feel that I can hold on to this chip for a wee bit longer in regards to my predominant gaming usage.

    • Prototyped
    • 11 years ago

    Thank you for doing this. It’s good to know just how much the difference in the uncore speeds as well as the difference between extreme and performance segment silicon affects performance and power consumption.

    • Anonymous Coward
    • 11 years ago

    I find it interesting that the main memory latency ends up being right in with AMD's K10-based stuff on the 2.13GHz “uncore” i7 models. I guess AMD didn't do so bad after all. Actually, latency tracks uncore clock speed pretty well, right up the chart.

    Why doesn’t AMD take the uncore to full core clock on BE models? Can their design not handle it? It sure looks like it would pay off. Sometimes 5%, sometimes even 10%… they only wish they could increase the core clock by that much!

      • DaveJB
      • 11 years ago

      I think that K10 is limited to 2-2.2GHz in its uncore section simply because AMD can't take it any higher. With Core i7, on the other hand, the uncore can run at full speed; it's just that Intel artificially cripples it on anything but the highest-end version.

        • Anonymous Coward
        • 11 years ago

        It's just cache and memory controllers, right? What can be so hard about clocking it up?

        I find it especially interesting how the high-clocked AMD X2 has about the same latency as i7 at 3.2GHz, while the low-clocked AMD X3/X4 are hanging out with the low-uncore-clocked i7s. Perhaps all the people screaming about AMD's crappy L3 & mem controller (including myself) were entirely wrong, and it's all about clock speeds.

      • sdack
      • 11 years ago

      Low latencies are important for running an architecture at low clock speeds and keep it fit for fast-changing workload types. Architectures with high clock speeds will always have high latencies, too, making them less flexible but more powerful for straightforward workload types.

      AMD has an advantage in its design, as it focuses less on the straightforward workload types that are typical of multimedia applications. The more GPUs take over these tasks, the more Intel needs to rethink parts of its design.

      It has been said that the next Phenom II will have lower cache latency. That makes sense and will help AMD stay on Intel's tail, but they will not come close to Intel in any of the straightforward workload types.

    • dragmor
    • 11 years ago

    Thanks for clearing that up Scott.

    • Xaser04
    • 11 years ago

    The article states on the first page the clock speed of the i7 940 is 2.66ghz. This should read 2.93ghz.

    Also, I may be slightly confused here, but to simulate an i7 940, wouldn't it have been easier (and possibly more accurate) to overclock the i7 920 instead of downclocking the i7 965? As far as I can tell, the only difference between the 920 and the 940 is the core clock speed.

      • DaveJB
      • 11 years ago

      That would only have worked if the 920 had an unlocked multiplier. Since it doesn’t, you’d have to increase the QPI bus speed, which would in turn overclock the memory and invalidate the comparison.

        • Xaser04
        • 11 years ago

        Thanks, I completely forgot about the memory implications.

        I wonder though if it would be closer to the actual performance of the i7 940 than the downclocked i7 965. I am merely curious here.

        May have to do some testing of my own on my 920.

    • ssidbroadcast
    • 11 years ago

    You know… sometimes I forget just how truly awesome the …

      • bogbox
      • 11 years ago

      Or what those extra MHz are doing for you.
      Maybe a 5GHz dual core would be faster than i7 even in multitasking.
      Or maybe applications are not well optimized to handle more than 2-4 cores.

      • Meadows
      • 11 years ago

      Indeed, it highlights how pointless Core i7 is unless you invest a fortune in it.

        • ssidbroadcast
        • 11 years ago

        After researching prices on the Egg, I've determined the best route would be to save nearly $100 and buy an E8500 and then overclock that a bit to simulate the E8600.

          • flip-mode
          • 11 years ago

          Why not the E8400? It can probably OC just as high.

            • Fighterpilot
            • 11 years ago

            Doesn't the 8600 have a 10x multiplier, compared to the max 9.5 multi on the 84 and 85?
            That enables FSB speeds to remain reasonable and makes the 8600 the numero uno choice for high overclocking.
            It's a badass chip 🙂

            • MadManOriginal
            • 11 years ago

            9x multi on the E8400, and you kinda-sorta have a point, although the math doesn't work out in your point's favor. With RAM sooo dirt cheap now that DDR2-1000 or -1066 kits are cheap, and P45 mobos easily able to do 500+ FSB, you're looking at 4.5GHz+ on a 9x multiplier, or 5GHz on a 10x, and getting from the former to the latter is not a given and would usually take at least watercooling. The only other difference then is binning, but when you're looking at one CPU sample it's luck of the draw anyway; overall, for typical overclocks, a 9x multi is enough.

    • crazybus
    • 11 years ago

    Ooooh….pretty box 🙂

    • Pachyuromys
    • 11 years ago

    I just assembled a Ci7 940 system, so I was particularly pleased to see this excellent supplementary report. However, what really stands out is not how much the simulated 940 differed from the real thing, but how much the real thing …

    • moose17145
    • 11 years ago

    Thank you for the updated results! It’s very much appreciated!

    Sure makes me glad I got the 920 model now… I was seriously considering spending the money for the 940… but these tests definitely showed I did the right thing by sticking with the 920. Now all I need to do is wait for the egg to deliver it… **starts twiddling thumbs**

    • BoBzeBuilder
    • 11 years ago

    Thanks for the update. IMO, simulations shouldn't be done in the future, since performance can vary quite a bit in some benchmarks and consumers base their purchase decisions on these numbers.

      • axeman
      • 11 years ago

      I don't know if that's practical in all cases, but it does prove yet again that the internets is not gospel.

      • eitje
      • 11 years ago

      Simulations aren’t perfect, but they do reflect real-world values: What if I buy a top-of-the-line proc, and then opt to downclock it? 🙂

        • Saribro
        • 11 years ago

        You would be reading SPCR, for one :).

      • sdack
      • 11 years ago

      The original i7 review mentions the simulation:

      …
