Exploring the impact of memory speed on Core i7 performance

One of the defining features of Intel’s Core i7 processor is its integrated memory controller. There’s certainly nothing revolutionary about moving this logic onto the processor die—AMD’s been doing it for years, ever since the first Opteron launched way back in 2003. However, with support for three channels of DDR3 memory at speeds up to 1600MHz and beyond, the Core i7’s integrated memory controller is clearly a cut above what’s available in even AMD’s latest Phenom processors.

With three channels of lowly DDR3-1066, you’re looking at a whopping 25.6GB/s of memory bandwidth. Crank the memory clock up to 1600MHz, and bandwidth jumps to an even more impressive 38.4GB/s. This fat pipe, combined with the low access latencies inherent to on-die memory controllers, gives the Core i7 a formidable memory subsystem. It also poses an interesting question: how bound by memory speed is Intel’s new processor architecture?
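The bandwidth figures above follow from simple arithmetic, sketched here in Python. The sketch assumes a 64-bit (8-byte) data path per DDR3 channel and treats the "MHz" ratings as nominal transfer rates, with 1066 and 1333 being rounded forms of 1066.67 and 1333.33 million transfers per second. The function and variable names are illustrative only.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (decimal, 1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8  # assumed 64-bit DDR3 channel
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Nominal DDR3 transfer rates in millions of transfers per second.
NOMINAL_RATES_MTS = {
    "DDR3-1066": 3200 / 3,
    "DDR3-1333": 4000 / 3,
    "DDR3-1600": 1600.0,
}

for name, rate in NOMINAL_RATES_MTS.items():
    dual = peak_bandwidth_gbs(rate, channels=2)
    triple = peak_bandwidth_gbs(rate, channels=3)
    print(f"{name}: {dual:.1f} GB/s dual-channel, {triple:.1f} GB/s triple-channel")
```

These are theoretical peaks; measured bandwidth in a synthetic benchmark will land below them.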

That might seem like a purely academic question on the surface, the sort of thing an especially geeky hardware reviewer would address to satisfy their own obsessive technical curiosity. But it’s an issue that leads to many more pertinent questions that should be on the minds of prospective Core i7 system builders. For example, is there any tangible performance benefit—beyond higher scores in synthetic memory benchmarks—to pairing a Core i7 with fancy DIMMs with lower access latencies, or are you better off saving a few bucks with more pedestrian DIMMs that run at looser timings? What about memory frequency? Does the Core i7’s performance scale up if you drop a little extra coin on memory capable of running at 1333 or 1600MHz? And while we’re at it, does performance really suffer if you drop down to just two memory channels? Join us as we throw multiple memory configurations at a pair of Core i7 processors to find out.

A multitude of memory configurations

We’ve split testing between a Core i7-920 and 965 Extreme due to the former’s lack of official support for memory faster than 1066MHz. Core i7-920 and 940 processors have a maximum memory speed of 1066MHz that motherboard makers haven’t yet found a way to circumvent, but the 965 Extreme is free to use multipliers that run its memory bus at 1333, 1600, 1866, and even 2133MHz. Finding DIMMs capable of running at those higher speeds may prove difficult, though. Intel’s maximum recommended memory voltage for Core i7 processors is just 1.65V. While that’s a smidgen above the 1.5V called for by the DDR3 spec, it’s well below the voltage required by most high-speed DDR3 modules already on the market. Speedy DDR3 chips that can get by with only 1.65V appear to be in short supply, but Kingston was able to hook us up with a triple-channel trio of KHX14400D3K3/3GX modules rated for operation at up to 1800MHz at 1.65V.

With a fistful of DDR3, we’ve run a number of different configurations on each CPU. We’ll start with straight memory scaling on the 965 Extreme, since that’s the most straightforward. Here we’ve run the system’s memory at 1066, 1333, and 1600MHz. The Kingston DIMMs are rated for 7-7-7-20 timings at 1066 and 1333MHz, and 8-8-8-24 timings at 1600MHz, and we’ve stuck with those values. These modules are also capable of running at 1800MHz with 9-9-9-27 timings, but our Core i7 motherboard only has the necessary multipliers for an 1866MHz memory clock, which proved a little too fast for our modules. Even topping out at 1600MHz, these configurations should give us a good look at how the flagship Core i7 responds to faster RAM.

Next, we turn our attention to the 920 to answer several additional questions about Core i7 memory performance. The first and perhaps most important of these is whether you can get away with budget DIMMs that have looser memory timings. To find out, we’ve tested the 920 with 1066MHz memory at 7-7-7-20 and 9-9-9-27 timings. We’ve also addressed the ultimate cheapskate question of whether you lose much performance pairing the Core i7 with only two DIMMs, leaving one of its memory channels on the table. For this dual-channel config, we’ve stuck with 1066MHz memory at 9-9-9-27 timings.

Despite the 920’s lack of official support for faster memory, we can push its memory clock higher by overclocking the processor’s base clock speed. We’ve done just that, dialing our Core i7-920’s base clock up from 133 to 167 and 200MHz, which allows us to run the memory at 1333 and 1600MHz, respectively. Since we’re focusing on memory performance, we lowered the 920’s core multiplier to 16X at 167MHz and 13X at 200MHz. That gives us the same 2.66GHz core clock with a 167MHz base clock and 2.6GHz at 200MHz, which is close enough. As we did with the 965 Extreme, we’ve stuck with our DIMMs’ default latencies of 7-7-7-20 at 1333MHz and 8-8-8-24 at 1600MHz. These results should let us know what happens to the i7’s performance when you push both its base and memory clocks.
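The multiplier choice described above can be sketched as a small calculation: pick the whole-number core multiplier that lands closest to the stock core clock at each raised base clock. The sketch assumes the Core i7-920's stock setting of 20 × 133MHz and treats 133 and 167MHz as rounded forms of 133.33 and 166.67MHz. The helper name is hypothetical.

```python
STOCK_BASE_MHZ = 400 / 3   # nominal "133MHz" base clock
STOCK_CORE_MULT = 20       # assumed stock multiplier: 20 x 133MHz = 2.66GHz
TARGET_CORE_MHZ = STOCK_BASE_MHZ * STOCK_CORE_MULT


def closest_core_multiplier(base_mhz: float, target_mhz: float) -> int:
    """Whole-number multiplier whose core clock is nearest the target."""
    return max(1, round(target_mhz / base_mhz))


for base in (500 / 3, 200.0):  # the "167MHz" and 200MHz base clocks
    mult = closest_core_multiplier(base, TARGET_CORE_MHZ)
    print(f"base {base:.0f}MHz -> {mult}X = {mult * base / 1000:.2f}GHz core")
```

At 200MHz no whole multiplier hits 2.66GHz exactly, which is why the 13X setting ends up slightly short at 2.6GHz.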

Since we’re trying to isolate the impact of memory performance, we disabled the Core i7’s Turbo mode for testing. As a result, the 920’s core clock won’t exceed 2.66GHz, and our 965 Extreme won’t tick up from 3.2GHz. However, the changes we make to memory and base clock speeds will affect other elements of the Core i7 processor, notably the speed of its memory controller, L3 cache, and QuickPath Interconnect. These so-called uncore elements of the processor can impact performance, so we’ve listed the various speeds in a nifty table below.

| Configuration | Core | Memory | Memory controller | L3 cache | QPI |
|---|---|---|---|---|---|
| 965 – 1066 | 3.2GHz | 1066MHz | 2.66GHz | 2.66GHz | 3.2GHz |
| 965 – 1333 | 3.2GHz | 1333MHz | 2.66GHz | 2.66GHz | 3.2GHz |
| 965 – 1600 | 3.2GHz | 1600MHz | 3.2GHz | 3.2GHz | 3.2GHz |
| 920 – 1066 | 2.66GHz | 1066MHz | 2.13GHz | 2.13GHz | 2.4GHz |
| 920 (16×167) – 1333 | 2.66GHz | 1333MHz | 2.66GHz | 2.66GHz | 3.0GHz |
| 920 (13×200) – 1600 | 2.6GHz | 1600MHz | 3.2GHz | 3.2GHz | 3.6GHz |

Our memory controller clocks come from CPU-Z, which lists the speed of the Core i7’s memory controller as the “NB frequency.” Based on discussions we’ve had with Intel, we believe the L3 cache’s clock speed is equal to the memory controller speed in Core i7-920 and 965 Extreme processors. The higher memory controller speeds required to run faster memory must therefore also increase the speed of the L3 cache. Higher L3 speeds are likely to improve performance with data sets that fit within the Core i7’s 8MB of available cache. They can potentially reduce memory access latencies, as well.
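To see why raising the 920's base clock drags the uncore along with it, here is a rough sketch that scales fixed multipliers by the base clock. The multipliers are not stated anywhere in our testing notes; they are inferred by dividing the stock 920 clocks in the table by the nominal 133MHz base clock, so treat them as assumptions.

```python
# Nominal base clocks: "133MHz", "167MHz", and 200MHz.
BASE_CLOCKS_MHZ = (400 / 3, 500 / 3, 200.0)

# Inferred from the stock 920 row (assumption, not a published spec):
# 1066 / 133 = 8, 2133 / 133 = 16, 2400 / 133 = 18.
INFERRED_920_MULTIPLIERS = {
    "memory": 8,
    "memory controller / L3 cache": 16,
    "QPI": 18,
}


def uncore_clocks_mhz(base_clock_mhz: float) -> dict:
    """Scale each inferred multiplier by the given base clock."""
    return {name: mult * base_clock_mhz
            for name, mult in INFERRED_920_MULTIPLIERS.items()}


for base in BASE_CLOCKS_MHZ:
    summary = ", ".join(f"{name} {mhz:.0f}MHz"
                        for name, mhz in uncore_clocks_mhz(base).items())
    print(f"base {base:.0f}MHz: {summary}")
```

The results line up with the 920 rows in the table once you allow for rounding (2667MHz is listed as 2.66GHz).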

Our testing methods

We’ve color-coded our results to make them a little easier to read, putting our 965 configurations in blue and the 920 in gold.

All tests were run three times, and their results were averaged.

Per-configuration settings:

| Processor | Memory size | Memory bus speed | CAS latency (CL) | RAS to CAS delay (tRCD) | RAS precharge (tRP) | Cycle time (tRAS) | Command rate |
|---|---|---|---|---|---|---|---|
| Intel Core i7-920 2.66GHz | 2GB (2 DIMMs) | 1066MHz | 9 | 9 | 9 | 27 | 1T |
| Intel Core i7-920 2.66GHz | 3GB (3 DIMMs) | 1066MHz | 9 | 9 | 9 | 27 | 1T |
| Intel Core i7-920 2.66GHz | 3GB (3 DIMMs) | 1066MHz | 7 | 7 | 7 | 20 | 1T |
| Intel Core i7-920 2.66GHz (16×167MHz) | 3GB (3 DIMMs) | 1333MHz | 7 | 7 | 7 | 20 | 1T |
| Intel Core i7-920 2.6GHz (13×200MHz) | 3GB (3 DIMMs) | 1600MHz | 8 | 8 | 8 | 24 | 1T |
| Intel Core i7-965 Extreme 3.2GHz | 3GB (3 DIMMs) | 1066MHz | 7 | 7 | 7 | 20 | 1T |
| Intel Core i7-965 Extreme 3.2GHz | 3GB (3 DIMMs) | 1333MHz | 7 | 7 | 7 | 20 | 1T |
| Intel Core i7-965 Extreme 3.2GHz | 3GB (3 DIMMs) | 1600MHz | 8 | 8 | 8 | 24 | 1T |

Common components:

System bus: QPI 4.8GT/s (2.4GHz)
Motherboard: Asus P6T Deluxe
BIOS revision: 0703
North bridge: Intel X58 Express
South bridge: Intel ICH10R
Chipset drivers: Chipset 9.1.0.1007, AHCI 8.1.5.0.1032
Memory type: Kingston KHX14400D3K3/3GX DDR3 SDRAM
Audio codec: Analog Devices AD2000B with 6.10.1.6520 drivers
Graphics: 2 x Nvidia GeForce GTX 280 1GB in SLI with ForceWare 180.44 drivers
Hard drive: Western Digital Raptor WD1500ADFD 150GB SATA
OS: Windows Vista Ultimate x64 with Service Pack 1

Our test system was powered by a BFG Tech ES-800 800W power supply unit. The ES-800 took home a TR Recommended award in our last PSU round-up, and it has enough power and all the right connectors for our dual GeForce GTX 280 graphics cards in SLI.

Finally, we’d like to thank Western Digital for sending Raptor WD1500ADFD hard drives for our test rigs.

We used the following versions of our test applications:

The test systems’ Windows desktop was set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory performance

Performance in synthetic memory benchmarks doesn’t always carry over into the real world, but given our focus today, it’s good to get an idea of the bandwidth and latency associated with each configuration.

It should come as no great surprise that faster memory yields greater bandwidth in a synthetic memory test. With the 965 Extreme, there’s a healthy jump in bandwidth as we step up the memory speed ladder. With an overclocked base clock, our Core i7-920 keeps up with the Extreme when running memory at 1333 and 1600MHz, too. However, the 920 is notably slower when both processors use 1066MHz RAM, no doubt because the 965 Extreme’s memory controller runs a little faster than that of the 920, as does its L3 cache. This difference in memory controller speed appears to be negated when we overclock the 920’s base clock to 200MHz, which brings the processor’s memory controller and L3 cache up to the same speed as the Extreme’s with 1600MHz memory.

Focusing our attention on the 920 with 1066MHz memory, moving from 7-7-7-20 to 9-9-9-27 timings costs about 1GB/s of bandwidth. Lopping off a memory channel drops bandwidth by a further 4GB/s, which doesn’t bode well for our dual-channel config. However, it is worth noting that our dual-channel config delivers three quarters the memory bandwidth of a triple-channel setup with just two thirds the number of channels.

As you might expect, faster memory also yields lower access latencies. Here our 920’s overclocked base clock isn’t enough to catch the 965 Extreme, despite the fact that both should be running their memory controllers and L3 caches at the same clock speed. The difference in performance between 1066MHz DIMMs with 7-7-7-20 and 9-9-9-27 timings is quite a bit larger here than it was in the bandwidth test, too.
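Part of the reason faster memory can show lower latency even with numerically looser timings is that timings are counted in clock cycles, and those cycles get shorter as the memory clock rises. The sketch below converts CAS latency to nanoseconds for the timings we tested. It assumes the usual DDR relationship of two transfers per I/O clock, and it covers only the CAS component, not the full access latency a benchmark measures. The function name is illustrative.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """Time spent on CAS latency alone, in nanoseconds."""
    io_clock_mhz = transfer_rate_mts / 2  # DDR: two transfers per clock
    return cas_cycles / io_clock_mhz * 1000


# (label, nominal transfer rate, CAS latency in cycles) as tested.
TESTED_TIMINGS = [
    ("DDR3-1066 CL9", 3200 / 3, 9),
    ("DDR3-1066 CL7", 3200 / 3, 7),
    ("DDR3-1333 CL7", 4000 / 3, 7),
    ("DDR3-1600 CL8", 1600.0, 8),
]

for label, rate, cl in TESTED_TIMINGS:
    print(f"{label}: {cas_latency_ns(rate, cl):.3f} ns")
```

By this measure, CL8 at 1600MHz works out a bit quicker than CL7 at 1333MHz, and both are well ahead of either 1066MHz setting.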

By far the most curious result in this lot is that of our dual-channel config, whose access latencies are quite a bit lower (a nanosecond is a relative eternity in the GHz world of the modern PC) than those of our triple-channel equivalent. There might yet be hope for our dual-channel config.

STARS Euler3d computational fluid dynamics

Few folks run fluid dynamics simulations on their desktops, but we’ve found this multi-threaded test to be particularly demanding of memory subsystems, making it a good link between our memory and application performance tests. We ran this test with eight threads and five iterations.

The Core i7-965 Extreme’s performance scales up nicely with faster memory. The looser timings of our 1600MHz memory config appear to exact a performance penalty here, but it’s a small one at most.

Like its faster sibling, the 920 also makes good use of faster memory, although it doesn’t do so quite as effectively. For example, the 920 is only a little bit faster with 1600MHz memory than it is with DDR3-1333. Both of those configs are quicker than the 920 paired with 1066MHz memory, though. There, the 920 does better with lower access latencies and a full triple-channel config. Again, though, it’s worth noting that the performance of our dual-channel configuration isn’t as low as one might expect, given that we’ve essentially reduced theoretical memory bandwidth (and total system memory) by one third.

MyriMatch proteomics

This MyriMatch benchmark simulates protein analysis with multiple threads, and according to its developers, performance is at least partially bound by memory bandwidth. Perfect. We’ve stuck with an eight-thread run for this test.

Processor speed matters more here than in Euler3D, but we still see consistent gains from faster memory configurations. The 965 Extreme, for example, shaves off two seconds with each step up the memory speed ladder. Those gains aren’t quite as pronounced for the Core i7-920, which sees a jump in performance with our overclocked DDR3-1333 config, but not much of an improvement when we bump the system up to DDR3-1600.

That said, the 920 is more than three seconds faster with 1066MHz memory running at tighter 7-7-7-20 timings than it is with 9-9-9-27 latencies. And it’s one second faster with three memory channels than it is with only two—an admittedly small margin, all things considered.

Cinebench 10

We’ve tapped two Cinebench tests here. The first is a rendering test that should be largely CPU-bound, but the second is an OpenGL modeling test that might benefit from a faster memory subsystem.

There isn’t much to see in the rendering tests, where our various configurations line up according to processor speed.

The results of the modeling test are more mixed, with the 965 Extreme scoring higher with faster memory. Scores for the 920 don’t universally favor quicker memory, though. While our low-latency DDR3-1066 config beats looser timings and our dual-channel setup, the Core i7-920 is actually slower with an overclocked base clock and both 1333 and 1600MHz memory. It is worth noting, though, that the performance gaps we’re seeing here are much smaller than those we observed in MyriMatch and Euler3d.

WorldBench

WorldBench uses scripting to step through a series of tasks in common Windows applications. It then produces an overall score. WorldBench also spits out individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.

Now that’s interesting. Memory speed appears to have only a muted impact in WorldBench, where just a single point separates most configurations running at the same processor speed. The only exception is the dual-channel Core i7-920 setup, which is a full six points slower than the other 920-based systems.

Scores are very close through WorldBench’s multimedia editing and encoding tests, particularly when we look at the Extreme, whose performance doesn’t scale up meaningfully with faster memory. Our 920-based configurations are a little more spread out, but really too close to call, especially given that our 13×200 setup has a slightly lower processor clock than the others.

WorldBench’s office and multitasking tests show a little more preference for faster memory. Both the Core i7-920 and the 965 Extreme turn in quicker times with higher RAM speeds here, although we’re only talking about differences of a few seconds. With 1066MHz memory, there isn’t much to be gained from an extra memory channel or tighter timings.

Much as we saw in Cinebench, Core i7 rendering performance is entirely bound by processor speed. The configs with faster memory perform a little better in the DirectX modeling test, but again, we’re looking at differences of only a few seconds in a test that takes more than five minutes.

WorldBench’s Nero test has proven itself to be somewhat unreliable, possibly in part thanks to Vista’s propensity to aggressively pre-fetch and reorganize data on a system’s hard drive. It’s hard to imagine why our Core i7-920 configurations should be faster here, especially considering that we used a fresh image for each setup.

Fortunately, the WinZip results appear to be more reliable. Here we see processor speed playing a much larger role than memory, with only one exception. Our dual-channel DDR3-1066 setup is by far the slowest in this test, trailing its triple-channel equivalent by more than a minute. Keep in mind that this dual-channel config only has access to 2GB of system memory while the rest get a full 3GB.

Gaming

Everyone knows that performance in today’s games is largely bound by a system’s graphics processor. Or at least everyone should know that. But time and time again, I see fanboys and forum dwellers espousing the benefits of fancy-pants memory modules. To find out if faster memory makes games run better on the Core i7, we’ve run a couple of sets of gaming tests. For the first set, we’ve stuck to a relatively modest resolution of 1024×768 with high in-game detail levels. This should be child’s play for our duo of GeForce GTX 280 1GB graphics cards in SLI.

The performance impact of memory speed varies from one game to the next, but faster memory clearly makes a difference, particularly in Far Cry 2. Most of these games produce frame rates well into the hundreds here, so it’s really only with Far Cry 2 and Crysis that we’re dealing with differences in performance that you might be able to actually see when playing. Both games show steady (if modest) frame rate increases with faster memory configurations. The only odd exception is our dual-channel DDR3-1066 config, which is quicker than expected in Far Cry 2.

Apart from competitive gamers obsessed with imperceptible levels of smoothness, few people actually play at low resolutions with modest in-game detail levels on a high-end system. To see what sort of impact memory speed might have with more realistic settings, we cranked the resolution to the highest level supported by the game with our 19″ CRT monitor and pushed all in-game detail levels to their maximum. We also turned on 16X anisotropic filtering where available and enabled 4X antialiasing.

We’re still looking at frame rates over 100 FPS with three of the five games we used for testing, but the performance picture has changed a little. The gaps are much smaller this time around, and in Crysis and Call of Duty 4, they’re virtually nonexistent.

With both the Core i7-920 and the 965 Extreme, frame rates still improve when moving to faster memory in some games. These improvements are largely incremental, though, so I wouldn’t get too excited.

Conclusions

Although three channels of DDR3 memory might seem excessive, the Core i7 really does seem to make good use of faster memory, at least in synthetic tests. But that was to be expected. The real question is whether those gains translate to real-world applications, and it’s here that the results are more mixed. Certainly, in scientific computing tests like Euler3d and MyriMatch, which we already know to be sensitive to memory subsystem performance, faster memory can provide tangible performance perks. However, common desktop applications like those highlighted by WorldBench don’t benefit much from higher memory clocks or tighter timings. Neither do most games, which at best show minor frame rate improvements that aren’t significant enough for most folks to even notice, let alone appreciate.

Based on these results, the question of whether it’s worth shelling out for fancy memory for a Core i7 build depends largely on which processor you’re going to use and what applications you’ll be running. If you have your heart set on a Core i7-965 Extreme and will be running applications capable of taking advantage of additional memory bandwidth, by all means splurge on some DDR3-1600 DIMMs. Sure, you’ll pay twice the price of pedestrian DDR3-1066 modules, but anyone considering a 965 Extreme clearly doesn’t have budget constraints.

Most enthusiasts do have budgets, though, and we’re generally disinclined to swallow the exorbitant premiums associated with flagship gear, even if it is faster. I suspect most folks rolling their own Core i7 systems will stick with the Core i7-920. Unless you’re going to push the base clock, which requires some cooperation from your motherboard, the 920 is essentially limited to DDR3-1066. At that speed, DIMMs with 7-7-7-20 timings are actually quite affordable and widely available, so there’s really no need to settle for budget modules with looser timings. Our testing shows that you can even get away with a dual-channel config if you happen to already have a couple of DDR3 DIMMs lying around. I wouldn’t skimp on that third memory module if I were building a Core i7 system from scratch, though.

Of course, many enthusiasts who spring for a Core i7-920 will be looking to overclock by increasing the processor’s base clock speed. Turning up the base clock allows for faster memory speeds, and given that the 920’s memory bus multipliers are limited to 6X or 8X, you’ll actually need DIMMs capable of running at faster than 1066MHz if you intend to push the base clock above 178MHz. Whether it’s worth springing for DDR3-1600 modules over more affordable DDR3-1333 DIMMs will depend entirely on the applications you use and just how high you intend to push the base clock. You’ll need to reach at least 200MHz to run 1600MHz DIMMs at full speed, and not every Core i7 motherboard is up to that task.
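For anyone planning an overclock, the relationship between base clock, memory multiplier, and DIMM rating can be sketched as below. It assumes memory speed is simply the multiplier times the base clock and uses the 6X and 8X multipliers mentioned above; the function names are made up for illustration.

```python
MEMORY_MULTIPLIERS = (6, 8)  # the 920's memory multipliers, per the text


def memory_speed_mhz(base_clock_mhz: float, multiplier: int) -> float:
    """Effective memory speed for a given base clock and multiplier."""
    return base_clock_mhz * multiplier


def max_base_clock_mhz(dimm_rating_mhz: float, multiplier: int) -> float:
    """Highest base clock that keeps the DIMMs at or below their rating."""
    return dimm_rating_mhz / multiplier


# Nominal DIMM ratings; "1066" and "1333" are rounded figures.
DIMM_RATINGS_MHZ = {"DDR3-1066": 3200 / 3, "DDR3-1333": 4000 / 3, "DDR3-1600": 1600.0}

for name, rating in DIMM_RATINGS_MHZ.items():
    limits = ", ".join(f"{m}X up to {max_base_clock_mhz(rating, m):.0f}MHz"
                       for m in MEMORY_MULTIPLIERS)
    print(f"{name}: {limits}")
```

With the 6X multiplier, DDR3-1066 runs out of rated headroom at roughly a 178MHz base clock, and 8X needs a full 200MHz to reach 1600MHz.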

In the end, Core i7 processors will certainly achieve higher levels of performance when paired with faster memory, but you don’t lose all that much—particularly with games and common desktop applications—by running slower, more affordable DIMMs. That’s good to know for folks looking at the relatively high prices of fancy triple-channel DDR3-1600 kits. However, if you’re going to overclock, it’s worth having the extra headroom that faster modules can provide.
