Meanwhile, the dual-channel Athlon 64 variants, the FX-51 and FX-53, have been absolutely laying waste to our entire benchmark suite. Those chips are expensive, however, and have a few liabilities. The Athlon 64 FX processors inherit their dual-channel memory capabilities from the Opteron line, and they slide into the same 940-pin socket as the Opteron. Socket 940 processors require registered DIMMs, which are a little odd, a little pricey, and have slightly higher memory access latencies than regular ol’ unbuffered DIMMs. Also, the motherboard options for the Athlon 64 FX have been adequate, but not particularly spectacular, varied, or affordable.
Socket 939 promises to address these problems by allowing motherboard manufacturers to make dual-channel DDR400 motherboards with cheaper four-layer PCBs, and by allowing the use of unbuffered DIMMs. Also, AMD has officially blessed the use of a 5X multiplier on the HyperTransport links connecting Socket 939 processors to the rest of the system, raising the peak effective bandwidth for this link to 8.0GB/s.
Today, AMD lifts the curtains on four new processors, three of them for Socket 939. We’ve had two of those CPUs, the Athlon 64 3800+ and Athlon 64 FX-53 (939-pin edition), in Damage Labs for review. Naturally, we’ve run them through our gauntlet of benchmarks and compared them to nearly 20 different competing processors. Keep reading to find out whether Socket 939 lives up to its promise.
Socket 939 up close
Both of the new AMD CPUs we have for review run at 2.4GHz and drop into a 939-pin socket. In fact, the main difference between the Athlon 64 3800+ and Athlon 64 FX-53 is 512K of on-chip cache. Like its 940-pin counterpart, the Socket 939 Athlon 64 FX has 1MB of L2 cache on chip, while the Athlon 64 3800+ has 512K of L2 cache. Before we get worked into a fit of hysterics trying to explain AMD’s model number schemes, let’s have a look at one of the new Socket 939 chips.
The 939-pin Athlon 64 doesn’t look wildly different from its predecessors, and at first glance, one might think that the only difference between the 940- and 939-pin chips was that single missing pin at the lower left corner of the picture above. However, a closer look reveals that AMD has moved the location of some of the “gaps” in the pin configuration where no pins exist. Socket 939 is well and truly incompatible with Socket 940, so there’s no hope of running a 939-pin chip in a 940-pin motherboard. The worlds of the Athlon and the Opteron will be a little more separate from now on.
In addition to the Athlon 64 3800+ and FX-53 (939-pin edition), AMD is introducing two other chips: the 939-pin Athlon 64 3500+ running at 2.2GHz with 512K of L2 cache, and the Athlon 64 3700+ intended for Socket 754, with a 1MB L2 cache and a 2.4GHz clock speed.
Confused yet? If so, you’re probably not alone. AMD’s model numbering methods continue to elude me. Prior to the introduction of these chips, I had thought that AMD was using clock speed increments and cache sizes in a fairly predictable way: adding or removing 200MHz in clock frequency would move the model number up or down by 200, as would the transition from 512K of L2 cache to 1MB (or vice versa). Thus, the Athlon 64 3000+ runs at 2.0GHz with 512K of L2 cache, while the Athlon 64 3200+ runs at 2.0GHz with 1MB of L2 cache. The Athlon 64 3400+ also packs 1MB of L2, but clocks in at 2.2GHz. All neat and logical, right?
In fact, early speculation about Socket 939 chips suggested AMD would use the addition of a second memory channel interchangeably with the cache size and clock speed increments, so that an Athlon 64 clocked at 2.2GHz with 512K of L2 cache and dual-channel memory would earn the rank of 3400+. That would be the neat and logical thing to do, but such is not the case. Instead, AMD has called that product the 3500+. And just to make things more confusing, the new Socket 754 chip that differs from the Athlon 64 3400+ in nothing but 200MHz of clock frequency has been dubbed the 3700+!
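For the curious, here is a minimal sketch of the "neat and logical" rule described above, written out so you can see where the new chips break it. The function name and the idea of treating the second memory channel as one more 200-point step are my own framing of the speculation, not anything AMD has published; the clock speeds, cache sizes, and model numbers come from the text.

```python
# Hypothetical reconstruction of the "+200 per step" rule; not AMD's formula.
BASE_MODEL = 3000       # Athlon 64 3000+: 2.0GHz, 512K L2, single-channel
BASE_CLOCK_MHZ = 2000
STEP = 200              # model-number points per step


def predicted_model(clock_mhz, l2_kb, dual_channel=False):
    """Model number the old rule would predict for a given configuration."""
    model = BASE_MODEL
    model += (clock_mhz - BASE_CLOCK_MHZ) // 200 * STEP  # each 200MHz is one step
    if l2_kb >= 1024:                                    # 512K -> 1MB is one step
        model += STEP
    if dual_channel:                                     # speculated: one more step
        model += STEP
    return model


# (name, clock in MHz, L2 in KB, dual-channel?, actual model number)
CHIPS = [
    ("Athlon 64 3200+", 2000, 1024, False, 3200),
    ("Athlon 64 3400+", 2200, 1024, False, 3400),
    ("Athlon 64 3500+", 2200, 512, True, 3500),
    ("Athlon 64 3700+", 2400, 1024, False, 3700),
    ("Athlon 64 3800+", 2400, 512, True, 3800),
]

for name, clock, l2, dual, actual in CHIPS:
    guess = predicted_model(clock, l2, dual)
    print(f"{name}: predicted {guess}+, actual {actual}+, off by {actual - guess}")
```

The rule fits the older chips exactly and misses every one of the new arrivals, which is the whole complaint.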
What’s changed? Well, perhaps that mythical Athlon “Thunderbird” processor, against whose performance at a given clock speed the model numbers for all subsequent Athlon products are purportedly determined, hit a performance scaling problem at 3.5GHz in AMD’s secret labs. Har har. But the more obvious culprit is Intel’s Pentium 4 “Prescott” processor, which delivers less performance, clock for clock, than previous Pentium 4 chips. No doubt AMD has tweaked its numbering formula to aid comparisons with the latest Pentium 4s.
To help you navigate these troubled waters, let me present to you a partial guide to AMD “Hammer” model numbers, which I cooked up one night while trying to make sense of it all.
It all looks sensible when laid out that way. Until you really think about it.
So what kind of performance can we expect from these new chips? Well, we’ve already reviewed the Athlon 64 FX-53 in its Socket 940 form, and we know that it’s stinking fast. The move to Socket 939 shouldn’t change the math too much, but a couple of factors may work in its favor to some degree. Switching to unbuffered DIMMs should cut memory access latencies ever so slightly. That’s always a good thing.
As for the 5X multiplier on the HyperTransport link, its benefits may be difficult to spot. Juicing up HyperTransport to a new peak of 8.0GB/s, at an effective 2GHz clock rate, is a forward-looking move that will probably pay off when PCI Express arrives. This link serves as the Athlon 64’s connection to the rest of the system, but with the memory controller onboard the CPU, the HyperTransport link isn’t particularly burdened in current systems. Our review of the VIA K8T800 Pro chipset showed us few instances where faster HyperTransport seemed to register in the benchmark scores.
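If you want to see where the 8.0GB/s figure comes from, the arithmetic can be sketched in a few lines. This assumes a 200MHz HyperTransport reference clock, double-data-rate signaling, and a 16-bit link in each direction (the link width matches our test-system table); the function and constant names are mine.

```python
# Back-of-the-envelope HyperTransport peak bandwidth; names are illustrative.
HT_BASE_CLOCK_MHZ = 200   # assumption: reference clock the multiplier applies to
LINK_WIDTH_BITS = 16      # per direction, as listed in the test-system table


def ht_peak_gb_per_s(multiplier, width_bits=LINK_WIDTH_BITS):
    """Peak aggregate bandwidth (both directions) in GB/s."""
    clock_mhz = HT_BASE_CLOCK_MHZ * multiplier
    transfers_per_s = clock_mhz * 1e6 * 2           # two transfers per clock (DDR)
    bytes_per_direction = transfers_per_s * width_bits / 8
    return 2 * bytes_per_direction / 1e9            # upstream plus downstream


for mult in (4, 5):
    clock = HT_BASE_CLOCK_MHZ * mult
    print(f"{mult}X ({clock}MHz): {ht_peak_gb_per_s(mult):.1f}GB/s peak")
```

Under those assumptions the older 4X setting works out to 6.4GB/s and the 5X setting to 8.0GB/s. Both are theoretical peaks summed across the two directions, not numbers any benchmark will actually report.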
You can compare the Socket 940 version of the Athlon 64 FX-53 to the 939-pin version when reading our benchmark results, but you need to know a secret code to do it. That secret code is this:
Opteron 150 = Athlon 64 FX-53 (940-pin version)
These two products are essentially the same thing with different names, with the notable exceptions that the Opteron name is more respectable in proper computing circles, while the FX-53 has an unlocked multiplier for us disreputable overclockers to tweak. The performance is identical, so long as there’s no overclocking at work. I’ve left the Opteron 150 label on our graphs because, well, that’s what I used in testing. The Athlon 64 3800+ is just an Athlon 64 FX-53 minus 512K of L2 cache. Sometimes, its smaller cache will slow it down a little bit, but other times, the 3800+ should run neck-and-neck with the FX-53. All depends on whether the application is able to use the FX’s extra L2 cache to its advantage.
But enough gabbing. Let’s see the numbers.
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test systems were configured like so:
|Processor||Athlon XP-M ‘Barton’ 2500+ 2.4GHz||Athlon XP ‘Barton’ 3200+ 2.2GHz||Athlon XP ‘Barton’ 3000+ 2.167GHz||Athlon 64 3000+ 2.0GHz; Athlon 64 3200+ 2.0GHz; Athlon 64 3400+ 2.2GHz||Opteron 146 2.0GHz; Opteron 148 2.2GHz; Opteron 150 2.4GHz||Athlon 64 3800+ 2.4GHz; Athlon 64 FX-53 2.4GHz||Pentium 4 2.8’C’GHz; Pentium 4 3.2GHz; Pentium 4 3.4GHz; Pentium 4 3.2GHz Extreme Edition; Pentium 4 3.4GHz Extreme Edition; Pentium 4 2.8’E’GHz; Pentium 4 3.0’E’GHz; Pentium 4 3.2’E’GHz; Pentium 4 3.4’E’GHz|
|Front-side bus||400MHz (200MHz DDR)||400MHz (200MHz DDR)||333MHz (166MHz DDR)||HT 16-bit/800MHz downstream; HT 16-bit/800MHz upstream||HT 16-bit/800MHz downstream; HT 16-bit/800MHz upstream||HT 16-bit/1GHz downstream; HT 16-bit/1GHz upstream||800MHz (200MHz quad-pumped)|
|Motherboard||Abit AN7||Asus A7N8X Deluxe v2.0||Asus A7N8X Deluxe v2.0||MSI K8T Neo||MSI 9130||MSI MS-6702E||Abit IC7-G|
|North bridge||nForce2 SPP||nForce2 SPP||nForce2 SPP||K8T800||K8T800||K8T800 Pro||82875P MCH|
|South bridge||nForce2 MCP-T||nForce2 MCP-T||nForce2 MCP-T||VT8237||VT8237||VT8237||82801ER ICH5R|
|Chipset drivers||ForceWare 3.13||ForceWare 3.13||ForceWare 3.13||4-in-1 v.4.51||4-in-1 v.4.51||4-in-1 v.4.51||INF Update 5.1.1002|
|Memory size||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)||1GB (2 DIMMs)|
|Memory type||Corsair TwinX XMS4000 DDR SDRAM at 400MHz||Corsair TwinX XMS4000 DDR SDRAM at 400MHz||Corsair TwinX XMS4000 DDR SDRAM at 333MHz||Corsair TwinX XMS4000 DDR SDRAM at 400MHz||Corsair CMX512RE-3200LL PC3200 registered DDR SDRAM at 400MHz||Corsair TwinX XMS3200LL DDR SDRAM at 400MHz||Corsair TwinX XMS4000 DDR SDRAM at 400MHz|
|RAS to CAS delay||3||3||3||3||3||3||4|
|Hard drive||Seagate Barracuda V 120GB ATA/100||Seagate Barracuda V 120GB ATA/100||Seagate Barracuda V 120GB ATA/100||Seagate Barracuda V 120GB SATA 150||Seagate Barracuda V 120GB SATA 150||Seagate Barracuda V 120GB SATA 150||Seagate Barracuda V 120GB SATA 150|
|Audio||Creative SoundBlaster Live!|
|Graphics||Radeon 9800 Pro 256MB with CATALYST 4.1 drivers|
|OS||Microsoft Windows XP Professional|
|OS updates||Service Pack 1, DirectX 9.0b|
All tests on the Intel systems were run with Hyper-Threading enabled.
Thanks to Corsair for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, Corsair’s RAM is definitely worth considering.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- Cachemem 2.65MMX
- SiSoft Sandra 2004 (9.89)
- Compiled binary of C Linpack port from Ace’s Hardware
- Discreet 3ds max 5.1 SP1
- NewTek Lightwave 7.5
- Cinebench 2003
- POV-Ray for Windows v3.5
- PICCOLOR v4.0 build 472
- SPECviewperf 7.1.1
- ScienceMark 2.0 beta (23SEP03 build)
- Sphinx 3.3
- LAME 3.95.1 (build from mitiok.cjb.net)
- Xmpeg 5.0.3 with DivX Video 5.11
- FutureMark 3DMark03 build 340
- Comanche 4 demo
- Quake III Arena v1.31
- Serious Sam SE v1.07
- Splinter Cell v1.2
- Unreal Tournament 2003 demo v.2206
- Wolfenstein: Enemy Territory v2.55
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
With unbuffered DIMMs, the Socket 939 processors are able to achieve higher bandwidth in our synthetic memory benchmarks than the Opteron 150/A64 FX-53. The margin of difference is small, but we have a new champ in Sandra’s bandwidth test.
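For context on those bandwidth scores, here is a rough sketch of the theoretical ceiling for single- and dual-channel DDR400. It assumes a standard 64-bit DDR channel; the function name is made up for illustration, and measured results such as Sandra's will land below these peaks.

```python
# Theoretical peak DRAM bandwidth; an illustration, not a measurement.
def ddr_peak_gb_per_s(transfers_mt_s, channels, channel_width_bits=64):
    """Peak bandwidth in GB/s for a DDR memory subsystem.

    transfers_mt_s: effective transfer rate in MT/s (400 for DDR400)
    channels: number of independent memory channels
    channel_width_bits: assumption, 64 bits per channel
    """
    bytes_per_transfer = channel_width_bits // 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


print(f"Socket 754, single-channel DDR400: {ddr_peak_gb_per_s(400, 1):.1f}GB/s")
print(f"Socket 939/940, dual-channel DDR400: {ddr_peak_gb_per_s(400, 2):.1f}GB/s")
```

Unbuffered versus registered DIMMs don't change this ceiling at all; the gains we measured come from how close the platform gets to it.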
Linpack gives us a nice visual of the difference between the Athlon 64 FX-53 and the 3800+. The red line for the 3800+ starts to dip at around 512K, as its L1 data and L2 caches begin to fill up. The, er, lavender line for the FX-53 keeps chugging along to about 900K before it begins to slow down. The Pentium 4 Extreme Edition, meanwhile, is just showing off.
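To put those knee points in perspective, the sketch below converts a data footprint into the size of a square matrix. It assumes the graph's horizontal axis is the matrix's memory footprint and that elements are 8-byte doubles; both are my assumptions about this Linpack port, and the helper name is hypothetical.

```python
import math

BYTES_PER_ELEMENT = 8  # assumption: double-precision elements


def largest_square_matrix_order(footprint_kb):
    """Largest N such that an N x N matrix fits in footprint_kb kilobytes."""
    elements = footprint_kb * 1024 // BYTES_PER_ELEMENT
    return math.isqrt(elements)


# 512K and 1MB are the two L2 sizes; ~900K is where the FX-53 curve bends.
for kb in (512, 900, 1024):
    n = largest_square_matrix_order(kb)
    print(f"{kb}K holds roughly a {n} x {n} matrix of doubles")
```

On those assumptions, a 256 x 256 matrix is exactly 512K, so anything much larger starts spilling out of the 3800+'s L2 while the FX-53 still has room to spare.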
Here we can see the latency benefits of unbuffered DIMMs. The Socket 939 processors match their single-channel Socket 754 brethren in memory access latency, shaving a few nanoseconds off the access times recorded for the Socket 940 chips. This is but one sample point, of course, but we can get a fuller picture…
Not only are our 3D graphs indulgent, but they’re useful, too. I’ve arranged them manually in a very rough order from worst to best, for what it’s worth. I’ve also colored the data series according to how they correspond to different parts of the memory subsystem. Yellow is L1 cache, light orange is L2 cache, and orange is main memory. The red series, if present, represents L3 cache. Of course, caches sometimes overlap, so the colors are just an interesting visual guide.
Yow. That is one squat graph for the FX-53. Compare the orange bars, which represent accesses to main memory, against the Opteron 150, and you can see clearly how using unbuffered DIMMs can help. Again, it’s not a blockbuster improvement, but the change is tangible.
And the beatings begin. Only the Pentium 4 Extreme Edition keeps Intel in the game here. As expected, the 3800+ runs a little behind the FX-53.
Quake III Arena
The move to Socket 939 is just enough to give AMD the lead in Quake III performance, amazingly enough.
Wolfenstein: Enemy Territory
Wolf: ET doesn’t seem to benefit much from the FX’s extra 512K of cache, but both of the S939 chips perform very well here, too.
More of the same in Comanche 4…
Serious Sam SE
…and in Serious Sam, as well, where the Pentium 4s have traditionally not fared too well.
3DMark03
3DMark03’s overall score is so heavily constrained by our Radeon 9800 Pro 256MB card’s performance that the processor speed barely matters.
However, 3DMark’s CPU tests are another story. A familiar story, in which the Socket 939 processors rank near the top of the charts.
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine that needs the latest computer hardware to run at speeds close to real-time processing. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.
There are two goals with Sphinx. The first is to run it faster than real time, so real-time speech recognition is possible. The second, more ambitious goal is to run it at about 0.8 times real time, where additional CPU overhead is available for other sorts of processing, enabling Sphinx-driven real-time applications.
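Those two goals can be expressed as a simple ratio. The sketch below reads "0.8 times real time" as processing time divided by audio duration, which is my interpretation; the function names and the sample timings are hypothetical, not results from our testing.

```python
def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def classify(rtf, headroom_target=0.8):
    """Map a real-time factor onto the two goals described in the text."""
    if rtf <= headroom_target:
        return "meets the 0.8x goal, with CPU time left over"
    if rtf < 1.0:
        return "faster than real time, but with little headroom"
    return "slower than real time"


# Hypothetical example: 60 seconds of speech processed in 45 seconds.
rtf = real_time_factor(45.0, 60.0)
print(f"{rtf:.2f}x real time: {classify(rtf)}")
```

Lower is better here, which is why the results in this test are reported as times rather than rates.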
Sphinx has always responded well to faster memory subsystems, and the lower latencies of the dual-channel Socket 939 systems are no exception. The Pentium 4 Prescott had been the old king in Sphinx, but the new Athlon 64s just edge it out.
LAME MP3 encoding
We used LAME to encode a 101MB 16-bit, 44KHz audio file into a very high-quality MP3. The exact command-line options we used were:
lame --alt-preset extreme file.wav file.mp3
DivX video encoding
This new version of XMPEG includes a benchmark feature, so we’re reporting scores in frames per second now.
The Socket 939 processors hold their own, but media encoding is still the Pentium 4’s greatest strength.
We begin our 3D rendering tests with Discreet’s 3ds max, one of the best known 3D animation tools around. 3ds max is both multithreaded and optimized for SSE2. We rendered a couple of different scenes at 1024×751 resolution, including the Island scene shown below. Our testing techniques were very similar to those described in this article by Greg Hess. In all cases, the “Enable SSE” box was checked in the application’s render dialog.
The new Athlons keep up with the top Pentium 4s, but they don’t overpower the P4s like the model numbers would suggest.
NewTek’s Lightwave is another popular 3D animation package that includes support for multiple processors and is highly optimized for SSE2. Lightwave can render very complex scenes with realism, as you can see from the sample scene, “A5 Concept,” below.
We’ve tested the Hyper-Threaded processors with one, two, and four rendering threads. For non-Hyper-Threaded processors, we just tested with one and two threads. For the line graphs, we’ve tried to pick results from the optimal number of threads to represent each processor.
You may have noticed that cache size differences between the FX-53 and 3800+ didn’t matter a whit in 3ds max. Lightwave is another example of the same. Once more, AMD’s new processors perform well, but don’t exactly overpower the Pentium 4 chips running at 3.4GHz.
POV-Ray is the granddaddy of PC ray-tracing renderers, and it’s not multithreaded in the least, because it’s designed to be a cross-platform application. POV-Ray also relies more heavily on x87 FPU instructions to do its work, and it contains only minor SIMD optimizations.
Cache size differences don’t do a thing for POV-Ray, but the Athlon 64’s FPU alone is plenty sufficient.
Cinebench is based on Maxon’s Cinema 4D modeling, rendering, and animation app. This revision of Cinebench measures performance in a number of ways, including 3D rendering, software shading, and OpenGL shading with and without hardware acceleration. Cinema 4D’s renderer is multithreaded, so it takes advantage of Hyper-Threading, as you can see in the results.
The Pentium 4 wins one outright in Cinebench, in part thanks to its excellent performance with Hyper-Threading in use.
The new Athlon 64s breeze through the rest of the Cinebench tests.
SPECviewperf simulates the graphics loads generated by various professional design, modeling, and engineering applications.
Wow, who could have seen that coming?
ScienceMark is optimized for SSE, SSE2, and 3DNow!, and it is multithreaded, as well. In the interest of full disclosure, I should mention that Tim Wilkens, one of the originators of ScienceMark, now works at AMD. However, Tim has sought to keep ScienceMark independent by diversifying the development team and by publishing much of the source code for the benchmarks at the ScienceMark website. We are sufficiently satisfied with his efforts, and impressed with the enhancements to the 2.0 beta revision of the application, to continue using ScienceMark in our testing.
The difference between the Athlon 64 3800+ and the FX-53, 512K of L2 cache versus 1MB, doesn’t affect ScienceMark’s pure number-crunching simulations much at all. Still, the new Athlon 64s are both very fast.
The Pentium 4 excels in ScienceMark’s matrix multiplication tests, given the right SSE code path and vectorized data. The Socket 939 processors both deliver even performance, and decent scores, with SSE and with x87 assembly. They also outperform the Pentium 4s with the compiled C code path.
We thank Dr. Reinert Muller with the FIBUS Institute for pointing us toward his picCOLOR benchmark. This image analysis and processing tool is partially multithreaded, and it shows us the results of a number of simple image manipulation calculations.
The Pentium 4 Prescott shows up big in picCOLOR, outpacing the Athlon 64 3800+ by a hair.
Of picCOLOR’s individual tests, only AddressMem is affected significantly by the cache size difference between the FX-53 and 3800+. Even then, the difference is fairly minor.
Well, you’ve seen the benchmark results, and there’s not much I can add. The Athlon 64 FX-53 was already brutally fast, and the new 939-pin version is a tad bit faster still. The Athlon 64 3800+ is darn near as fast as the FX-53, despite having half the L2 cache. If you have the means, I highly suggest picking one up. The question of means, however, is no small thing. AMD obviously has the performance lead over Intel, and the fastest Athlon 64s are priced accordingly. The FX-53 will list for $799, which is cheaper than the Pentium 4 Extreme Edition, but more than, say, a GeForce 6800 Ultra or Radeon X800 XT Platinum Edition, or perhaps a tropical vacation. At $720, the Athlon 64 3800+ won’t be much cheaper, nor will the Socket 754-based 3700+ at $710. The price for the Socket 939-based Athlon 64 3500+ gets closer to reasonable territory at $500. I only wish AMD had kicked one in for review, because I might be recommending it right now.
Unfortunately, that’s the sum total of AMD’s Socket 939 lineup, as far as I know. I would expect AMD to introduce lower cost options as time passes, but for now, Socket 754 looks to have some life left in it as a mainstream platform. Socket 939 is the future, but alas, so are many things that haven’t arrived yet.
First and foremost among them is PCI Express, the new PC expansion standard that will cast yet another Osborne Effect field over the early Socket 939 motherboards and systems. Intel is preparing to unleash PCI-E and a whole gaggle of related standards, including High Definition Audio and the BTX form factor. These things may give Intel the technology lead without the performance lead, creating an unusual dilemma for the would-be system buyer or builder. However, chipset manufacturers are already preparing PCI Express core logic for the Athlon 64 that should support many of these standards, and I’d expect to see the first of these motherboards becoming available in the next few months.
For now, if Doom 3 is looming and you just can’t wait, going Socket 939 will get you a very fast system that should last quite a while, despite more new goodies on the horizon.