Lower latencies are a good thing, of course, but how much can they really improve system performance? Are exotic, low-latency DIMMs worth the price premium? Read on as we explore the effects of memory latency on Athlon 64 performance in synthetic memory benchmarks, games, and real-world applications.
Before diving into our benchmark results, it’s worth taking a moment to go over how memory access works and where the various latencies come into play. Memory is organized like a spreadsheet, with data stored in cells that can be identified by a corresponding column and row. Spreadsheets can also be made up of multiple sheets, and similarly, memory can be made up of multiple banks. If we want to access a specific cell of memory, the system must first activate the sheet, or bank, containing the desired row. Next, the system sends an active command to the desired row. Once the row is activated, the system can issue read or write commands to specific columns in the row. When reading or writing has been completed, a precharge command is sent to close the row.
There are delays between each of the steps in memory access. These delays are referred to as latencies and expressed as a number of clock cycles. Here’s a brief explanation of some of the most common, and important, memory timing parameters that affect access latencies:
- RAS-to-CAS delay (tRCD) The RAS-to-CAS delay occurs between the time a row is activated and when the first read or write operation is performed.
- CAS latency (CL) CAS latency refers to the delay between when a read operation is issued and when the data returned by that read is considered valid.
- RAS precharge (tRP) The RAS precharge is the delay between when a precharge command is issued to close a row and when the next active command can be issued.
- Active-to-precharge delay (tRAS) This latency actually spans several steps in the memory access process. The active-to-precharge delay refers to the minimum number of cycles that must elapse between an active command and the precharge command that closes the same row.
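To make those definitions concrete, here’s a minimal Python sketch of how the timings add up for a single read. It’s a simplification of our own rather than a model of any particular memory controller: it assumes tRAS has already been satisfied whenever a row has to be closed, and it ignores the command rate, burst length, and controller overhead. The `Timings` class, the `read_latency_cycles` function, and the three row-state names are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """Memory timings, all expressed in clock cycles."""
    cl: float    # CAS latency
    trcd: float  # RAS-to-CAS delay
    trp: float   # RAS precharge
    tras: float  # active-to-precharge delay (not used in this simple model)


def read_latency_cycles(t: Timings, row_state: str) -> float:
    """Cycles from the first command issued until read data is valid.

    row_state is one of:
      "open"     - the desired row is already active, so only CL applies
      "closed"   - no row is active, so the row must be activated first
      "conflict" - a different row is active and must be precharged first
                   (assumes tRAS has already elapsed for that row)
    """
    if row_state == "open":
        return t.cl
    if row_state == "closed":
        return t.trcd + t.cl
    if row_state == "conflict":
        return t.trp + t.trcd + t.cl
    raise ValueError(f"unknown row state: {row_state!r}")


# The two timing sets compared in this article.
tight = Timings(cl=2, trcd=2, trp=2, tras=5)
relaxed = Timings(cl=2.5, trcd=4, trp=4, tras=8)

for state in ("open", "closed", "conflict"):
    print(state, read_latency_cycles(tight, state), read_latency_cycles(relaxed, state))
```

Under these assumptions, the worst case (a row conflict) costs 6 cycles with the tighter timings and 10.5 cycles with the more relaxed ones, while a read from an already-open row differs by only half a cycle.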
Of course, no discussion of memory latency would be complete without mentioning the DRAM command rate. The command rate is the delay between when a memory chip is selected and when the first active command can be issued, and it’s usually set to either one cycle (1T) or two cycles (2T). Many factors determine whether a memory subsystem can tolerate a 1T command rate, including the number of memory banks, the number of DIMMs present, and the quality of the DIMMs. Some memory manufacturers claim that their DIMMs are rated for operation with a 1T command rate.
Since latencies refer to delays, lower is better. That doesn’t mean you should hop into your motherboard’s BIOS and set each memory timing option to its lowest possible value, though. Memory modules are rated for a specific set of latencies at a given clock speed, and they’re generally not stable with lower latencies. A DIMM’s latencies are usually expressed as a series of four hyphenated numbers corresponding to the CAS latency, RAS-to-CAS delay, RAS precharge, and active-to-precharge delay. Low-latency DDR400, for example, is generally rated for 2-2-2-5 timings at 400MHz. That refers to two cycles each of CAS latency, RAS-to-CAS delay, and RAS precharge, and five cycles of active-to-precharge delay.
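Because timings are counted in clock cycles, it can help to translate them into time. The sketch below splits a hyphenated timing string into its four fields and converts each to nanoseconds. It assumes the timings are counted in cycles of the 200MHz memory clock that sits behind DDR400’s 400MHz effective data rate, which works out to 5ns per cycle. The function names `parse_timings` and `cycles_to_ns` are our own, made up for illustration.

```python
def parse_timings(spec: str) -> dict:
    """Split a 'CL-tRCD-tRP-tRAS' string into named cycle counts."""
    cl, trcd, trp, tras = (float(field) for field in spec.split("-"))
    return {"CL": cl, "tRCD": trcd, "tRP": trp, "tRAS": tras}


def cycles_to_ns(cycles: float, clock_mhz: float = 200.0) -> float:
    """Convert clock cycles to nanoseconds.

    The 200MHz default is an assumption: it is the actual clock of DDR400,
    which transfers data twice per cycle for a 400MHz effective rate.
    """
    return cycles * 1000.0 / clock_mhz


for spec in ("2-2-2-5", "2.5-4-4-8"):
    in_ns = {name: cycles_to_ns(c) for name, c in parse_timings(spec).items()}
    print(spec, in_ns)
```

Seen this way, the gap between CAS 2 and CAS 2.5 is 10ns versus 12.5ns at DDR400 speeds.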
Our testing methods
We’ve tested several different memory configurations to illustrate the performance impact of the key memory timing settings, including DRAM command rate. Tests were conducted with a set of low-latency OCZ DIMMs rated for 2-2-2-5 timings at 400MHz. We also tested with 2.5-4-4-8 timings to simulate the performance of more affordable “value” memory. Some budget memory is rated with CAS latencies as high as three cycles, but since CAS 2.5 memory is already quite affordable, we’ve limited our testing to CAS 2 and 2.5. In addition to testing system performance with 2-2-2-5 and 2.5-4-4-8 memory timings, we’ve also tested each configuration with both 1T and 2T command rates.
All tests were run three times, and their results were averaged, using the following test system.
|Processor||AMD Athlon 64 FX-53 2.4GHz|
|System bus||HyperTransport 16-bit/1GHz|
|Motherboard||DFI LANParty UT NF4 Ultra-D|
|North bridge||NVIDIA nForce4 Ultra|
|Chipset drivers||ForceWare 6.66|
|Memory size||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)|
|Memory type||OCZ PC3200 EL Platinum Rev 2 DDR SDRAM at 400MHz|
|CAS latency (CL)||2||2||2.5||2.5|
|RAS to CAS delay (tRCD)||2||2||4||4|
|RAS precharge (tRP)||2||2||4||4|
|Active-to-precharge delay (tRAS)||5||5||8||8|
|Hard drives||Western Digital Raptor WD360GD 37GB SATA|
|Audio driver||Realtek 3.75|
|Graphics||NVIDIA GeForce 6800 GT with ForceWare 77.77 drivers|
|OS||Microsoft Windows XP Professional|
|OS updates||Service Pack 2, DirectX 9.0c|
We used the following versions of our test applications:
- WorldBench 5.0
- Far Cry v1.3
- trdemo2 demos
- Quake 4 with trhangar1 and trtram demos
- Far Cry 1.30 with tr1-volcano demo
- Splinter Cell: Chaos Theory 1.05 with trpenthouse demo
- Battlefield 2 1.03
- FutureMark 3DMark05 Build 120
- FRAPS 2.6.4
- Cinebench 2003
- Sphinx 3.3
- SiSoft Sandra Standard 2005 SR2a
The test systems’ Windows desktop was set at 1280×1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
We begin with some synthetic memory subsystem benchmarks that should easily expose any performance differences between the various settings.
Command rate has a profound impact on memory bandwidth in both Sandra and Cachemem. The difference in performance between 2-2-2-5 and 2.5-4-4-8 timings is much more subtle, though.
Moving to Cachemem’s latency test, we see that the command rate again has a bigger impact on performance than the other memory timing options.
Memory latency doesn’t have much of an impact on Cinebench 2003 rendering performance, but our tighter 2-2-2-5 timings are a little faster in the shading tests. This time around, the command rate’s impact on performance is less than that of the other memory timings, although 2-2-2-5 isn’t all that much faster than 2.5-4-4-8.
Sphinx is a sucker for fast memory subsystems, so it’s no surprise that quicker latencies translate into better overall performance. Here, our 2-2-2-5 timings are close to 13% faster than more relaxed 2.5-4-4-8 latencies. Moving from a 2T to 1T command rate only improves performance by about 7%, though.
WorldBench overall performance
WorldBench uses scripting to step through a series of tasks in common Windows applications. It then produces an overall score. WorldBench also spits out individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.
Only a single point separates our overall WorldBench scores, with the 2-2-2-5 configurations just edging out our 2.5-4-4-8 timings. Let’s break down WorldBench’s overall score into individual test results to see if we can find a breakout performance.
Multimedia editing and encoding
Windows Media Encoder
VideoWave Movie Creator
Tighter 2-2-2-5 timings improve performance in several of WorldBench’s multimedia editing and encoding tests, but never by more than a couple of percentage points.
WorldBench’s image processing tests don’t see much benefit from either lower memory latencies or a more aggressive command rate.
Multitasking and office applications
Mozilla and Windows Media Encoder
Mozilla does show a difference between the settings, both on its own and when paired with Windows Media Encoder. Still, the differences in performance between 2-2-2-5 and 2.5-4-4-8 timings, and between the 1T and 2T command rates, are only a couple of percentage points.
Neither Nero nor WinZip shows much preference for quicker timings or a 1T command rate.
We conducted our gaming tests with two sets of in-game quality settings. First, we tested at low resolutions with medium quality levels and with antialiasing and anisotropic filtering disabled. We then tested at higher resolutions and detail levels, with antialiasing and anisotropic filtering enabled, to better reflect how most users would play games on a system of this caliber. The latter settings may bottleneck performance at the graphics card, but that’s how things are with the vast majority of today’s games.
Since you won’t find anyone playing 3DMark05, we’ve limited our testing to the app’s default settings.
Although 3DMark05’s overall score is GPU-bound on our test system, the CPU rendering tests show some preference for a faster command rate and tighter timings.
Lower memory latencies give Far Cry a nice boost at a low resolution and detail level, but there’s virtually no difference in performance at higher resolutions and detail levels.
As we saw in Far Cry, the measurable performance impact of memory timings and the DRAM command rate in DOOM 3 seems to be limited to lower resolutions and detail levels, where the graphics card isn’t a bottleneck.
Timedemos in Quake 4 don’t appear to render all of the game’s eye candy effects, but since we’re only changing memory latencies and command rates, that shouldn’t impact our results.
Again, we only see memory latency and command rates having an impact on performance at lower resolutions and detail levels. Here, CAS latency and its cousins have a more significant impact on performance than the DRAM command rate.
Unreal Tournament 2004
Unreal Tournament favors lower latencies ever so slightly at higher resolutions and detail levels, but the biggest performance impact remains at lower resolutions and detail levels. Even then, there’s only about a 7% difference in frame rates between 2-2-2-5-1T and 2.5-4-4-8-2T.
Splinter Cell: Chaos Theory
Memory latency doesn’t have much of an impact on Splinter Cell: Chaos Theory performance, even at a lower resolution and detail level.
Battlefield 2 performance was tested with FRAPS, and the differences in performance are only a couple of frames per second. I suspect that’s within the margin of error of our manual tests, which were conducted at least five times before the results were averaged.
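For readers who want to run this sort of sanity check on their own FRAPS logs, here’s a rough sketch of one way to do it. It treats a difference as noise when the gap between two configurations’ average frame rates is no larger than the run-to-run standard deviation of either set. That’s a crude rule of thumb rather than a formal statistical test, and the function name `within_noise` is hypothetical. The frame rates below are placeholder values invented for the example, not our Battlefield 2 results.

```python
from statistics import mean, stdev


def within_noise(runs_a, runs_b) -> bool:
    """Return True when the gap between the two averages is no larger than
    the bigger of the two sample standard deviations.

    A crude heuristic, not a formal significance test.
    """
    gap = abs(mean(runs_a) - mean(runs_b))
    noise = max(stdev(runs_a), stdev(runs_b))
    return gap <= noise


# Placeholder frame rates (fps) for five manual runs per configuration.
# These are made-up numbers for illustration only.
config_a = [60.0, 63.0, 58.0, 61.0, 62.0]
config_b = [59.0, 61.0, 57.0, 62.0, 60.0]

print("difference within run-to-run noise:", within_noise(config_a, config_b))
```

Manual play-throughs vary more from run to run than scripted timedemos, which is why averaging at least five runs matters more here than in the other game tests.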
Although tighter memory timings and a 1T command rate can certainly improve the performance of the Athlon 64’s memory subsystem, that improvement doesn’t always translate to better application performance. In fact, with the exception of the Sphinx speech recognition engine, moving to tighter memory timings or a more aggressive command rate generally didn’t improve performance by more than a few percentage points, if at all, in our tests. Lower latencies only improved WorldBench’s overall score by a single point, and performance gains in games were generally limited to lower resolutions and detail levels.
So how much does the modest performance improvement brought by tighter memory latencies cost? Close to twice as much. As I write, a single 512MB stick of OCZ Value DDR400 memory rated at 2.5-4-4-8 sells for between $45 and $52 online, while a 512MB Platinum Rev 2 2-2-2-5 DDR400 module sells for between $81 and $94. Looking at dual-channel kits, a pair of 512MB OCZ Value DDR400 DIMMs rated for 2.5-4-4-8 timings sells for between $91 and $103 online, while a pair of 512MB Platinum Rev 2 sticks rated for 2-2-2-5 costs between $155 and $191.
OCZ isn’t the only DIMM maker charging that sort of premium for ultra-low-latency modules. In fact, it’s common. To cite another example, a pair of 512MB Corsair Value DDR400 DIMMs rated for 2.5-4-4-8 will set you back between $80 and $159, while a couple of the company’s 512MB TWINX1024-3200XL 2-2-2-5 DDR400 modules run from $189 all the way up to $325.
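To put a number on that premium, the short sketch below compares the price ranges quoted above. Taking the midpoint of each range as a representative price is our own simplifying assumption, and the `midpoint` and `premium_ratio` helpers are hypothetical names used only for this example.

```python
def midpoint(low: float, high: float) -> float:
    """Middle of a quoted price range, used as a stand-in for a typical price."""
    return (low + high) / 2


def premium_ratio(value_range, low_latency_range) -> float:
    """How many times more the low-latency part costs, midpoint to midpoint."""
    return midpoint(*low_latency_range) / midpoint(*value_range)


# Price ranges in US dollars, as quoted in the text above:
# (value 2.5-4-4-8 range, low-latency 2-2-2-5 range)
quoted_prices = {
    "OCZ single 512MB": ((45, 52), (81, 94)),
    "OCZ 2x512MB kit": ((91, 103), (155, 191)),
    "Corsair 2x512MB kit": ((80, 159), (189, 325)),
}

for product, (value_range, low_latency_range) in quoted_prices.items():
    ratio = premium_ratio(value_range, low_latency_range)
    print(f"{product}: {ratio:.2f}x the price of value memory")
```

By this measure, the OCZ low-latency parts cost roughly 1.8 times as much as their value counterparts, and the Corsair kit a little over twice as much.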
For most users, the price premium associated with exotic 2-2-2-5 memory won’t be worth the relatively modest performance gains that it offers. Low-latency memory does have an ace up its sleeve for overclockers, though. Most low-latency modules are capable of running at much higher clock speeds if you back off on their latencies a little. We’ve had our OCZ Platinum Rev 2 DIMMs, which are rated for 2-2-2-5 latencies at 400MHz, cranked all the way up to 560MHz with more relaxed 2.5-4-4-8 timings. Overclocking success is never guaranteed, of course, but low-latency memory modules tend to use higher quality chips that respond better to overclocking.
At the end of the day, the appeal of low-latency memory modules may be limited to overclockers and enthusiasts intent on squeezing every last drop of performance from a system. More pedestrian “value” memory should be plenty fast enough for everyone else, especially since you can afford nearly twice as much of it for the same money.