AMD’s Athlon 64 3800+ and FX-53 processors

I KNOW FOR A FACT that more than one PC enthusiast has been eyeing the Athlon 64 with anticipation, but has been paralyzed by the Osborne Effect field surrounding the Socket 754 versions of the processor. You see, back when the Athlon 64 was launched, AMD promised the world—right out of the gate—a new, 939-pin socket that would bring dual-channel memory to every Athlon 64. Since then, the single-channel versions of the Athlon 64 have been handing out beatings to the Pentium 4 in most gaming benchmarks. Enthusiasts have been torn between grabbing for the extra performance now or holding out for the big prize, the Socket 939 versions of the Athlon 64.

Meanwhile, the dual-channel Athlon 64 variants, the FX-51 and FX-53, have been absolutely laying waste to our entire benchmark suite. Those chips are expensive, however, and have a few liabilities. The Athlon 64 FX processors inherit their dual-channel memory capabilities from the Opteron line, and they slide into the same 940-pin socket as the Opteron. Socket 940 processors require registered DIMMs, which are a little odd, a little pricey, and have slightly higher memory access latencies than regular ol’ unbuffered DIMMs. Also, the motherboard options for the Athlon 64 FX have been adequate, but not particularly spectacular, varied, or affordable.

Socket 939 promises to address these problems by allowing motherboard manufacturers to make dual-channel DDR400 motherboards with cheaper four-layer PCBs, and by allowing the use of unbuffered DIMMs. Also, AMD has officially blessed the use of a 5X multiplier on the HyperTransport links connecting Socket 939 processors to the rest of the system, raising the peak effective bandwidth for this link to 8.0GB/s.

Today, AMD lifts the curtains on four new processors, three of them for Socket 939. We’ve had two of those CPUs, the Athlon 64 3800+ and Athlon 64 FX-53 (939-pin edition), in Damage Labs for review. Naturally, we’ve run them through our gauntlet of benchmarks and compared them to nearly 20 different competing processors. Keep reading to find out whether Socket 939 lives up to its promise.

Socket 939 up close
Both of the new AMD CPUs we have for review run at 2.4GHz and drop into a 939-pin socket. In fact, the main difference between the Athlon 64 3800+ and Athlon 64 FX-53 is 512K of on-chip cache. Like its 940-pin counterpart, the Socket 939 Athlon 64 FX has 1MB of L2 cache on chip, while the Athlon 64 3800+ has 512K of L2 cache. Before we get worked into a fit of hysterics trying to explain AMD’s model number schemes, let’s have a look at one of the new Socket 939 chips.


The Athlon 64 3800+


The 939-pin underside of the Athlon 64 3800+


The 940-pin belly of the Opteron 150

The 939-pin Athlon 64 doesn’t look wildly different from its predecessors, and at first glance, one might think that the only difference between the 940- and 939-pin chips was that single missing pin at the lower left corner of the picture above. However, a closer look reveals that AMD has moved the location of some of the “gaps” in the pin configuration where no pins exist. Socket 939 is well and truly incompatible with Socket 940, so there’s no hope of running a 939-pin chip in a 940-pin motherboard. The worlds of the Athlon and the Opteron will be a little more separate from now on.

Performance prospects and model numbers
In addition to the Athlon 64 3800+ and FX-53 (939-pin edition), AMD is introducing two other chips: the 939-pin Athlon 64 3500+ running at 2.2GHz with 512K of L2 cache, and the Athlon 64 3700+ intended for Socket 754, with a 1MB L2 cache and a 2.4GHz clock speed.

Confused yet? If so, you’re probably not alone. AMD’s model numbering methods continue to elude me. Prior to the introduction of these chips, I had thought that AMD was using clock speed increments and cache sizes in a fairly predictable way: adding or removing 200MHz in clock frequency would move the model number up or down by 200, as would the transition from 512K of L2 cache to 1MB (or vice versa). Thus, the Athlon 64 3000+ runs at 2.0GHz with 512K of L2 cache, while the Athlon 64 3200+ runs at 2.0GHz with 1MB of L2 cache. The Athlon 64 3400+ also packs 1MB of L2, but clocks in at 2.2GHz. All neat and logical, right?

In fact, early speculation about Socket 939 chips suggested AMD would use the addition of a second memory channel interchangeably with the cache size and clock speed increments, so that an Athlon 64 clocked at 2.2GHz with 512K of L2 cache and dual-channel memory would earn the rank of 3400+. That would be the neat and logical thing to do, but such is not the case. Instead, AMD has called that product the 3500+. And just to make things more confusing, the new Socket 754 chip that differs from the Athlon 64 3400+ in nothing but 200MHz of clock frequency has been dubbed the 3700+!
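The "neat and logical" rule of thumb is easy to write down, which makes the mismatch easier to see. What follows is only a sketch of my old guess, not AMD's actual formula. The function name `naive_rating` and the 3000 baseline for a 2.0GHz, 512K, single-channel chip are assumptions taken from the examples above.

```python
def naive_rating(clock_ghz, l2_kb, dual_channel=False):
    """Guess a model number with the +200-per-step rule of thumb.

    Assumed baseline (from the article's examples): 2.0GHz, 512K L2,
    single-channel memory = 3000+. This is NOT AMD's real formula.
    """
    steps = round((clock_ghz - 2.0) / 0.2)  # each 200MHz counts as one step
    if l2_kb >= 1024:                       # 1MB of L2 counts as one step
        steps += 1
    if dual_channel:                        # speculated: second channel = one step
        steps += 1
    return 3000 + 200 * steps


# (name, clock in GHz, L2 in KB, dual-channel?, actual model number)
chips = [
    ("Athlon 64 3000+", 2.0, 512, False, 3000),
    ("Athlon 64 3200+", 2.0, 1024, False, 3200),
    ("Athlon 64 3400+", 2.2, 1024, False, 3400),
    ("Athlon 64 3500+", 2.2, 512, True, 3500),
    ("Athlon 64 3700+", 2.4, 1024, False, 3700),
    ("Athlon 64 3800+", 2.4, 512, True, 3800),
]

for name, clock, l2, dual, actual in chips:
    guess = naive_rating(clock, l2, dual)
    print(f"{name}: naive guess {guess}+, actual {actual}+, gap {actual - guess}")
```

The three older chips come out with a gap of zero. The 3500+ and 3700+ each land 100 points above the guess, and the 3800+ lands 200 above it.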

What’s changed? Well, perhaps that mythical Athlon “Thunderbird” processor, against whose performance at a given clock speed the model numbers for all subsequent Athlon products are purportedly determined, hit a performance scaling problem at 3.5GHz in AMD’s secret labs. Har har. But the more obvious culprit is Intel’s Pentium 4 “Prescott” processor, which delivers less performance, clock for clock, than previous Pentium 4 chips. No doubt AMD has tweaked its numbering formula to aid comparisons with the latest Pentium 4s.

To help you navigate these troubled waters, let me present to you a partial guide to AMD “Hammer” model numbers, which I cooked up one night while trying to make sense of it all.

It all looks sensible when laid out that way. Until you really think about it.


Socket 939 gets the extreme close-up treatment

So what kind of performance can we expect from these new chips? Well, we’ve already reviewed the Athlon 64 FX-53 in its Socket 940 form, and we know that it’s stinking fast. The move to Socket 939 shouldn’t change the math too much, but a couple of factors may work in its favor to some degree. Switching to unbuffered DIMMs should cut memory access latencies ever so slightly. That’s always a good thing.

As for the 5X multiplier on the HyperTransport link, its benefits may be difficult to spot. Juicing up HyperTransport—to a new peak of 8.0GB/s at an effective 2GHz clock rate—is a forward-looking move that will probably pay off when PCI Express arrives. This link serves as the Athlon 64’s connection to the rest of the system, but with the memory controller onboard the CPU, the HyperTransport link isn’t particularly burdened in current systems. Our review of the VIA K8T800 Pro chipset showed us few instances where faster HyperTransport seemed to register in the benchmark scores.
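For the curious, the 8.0GB/s figure falls out of some simple arithmetic. The sketch below assumes a 200MHz reference clock, double-data-rate signaling, and a 16-bit link in each direction; the function name `ht_peak_gb_per_s` is just an illustrative label. These are theoretical peaks, not measured throughput.

```python
def ht_peak_gb_per_s(multiplier, base_mhz=200, width_bits=16, directions=2):
    """Theoretical peak HyperTransport bandwidth in GB/s.

    Assumptions: 200MHz reference clock, two transfers per clock (DDR),
    and one 16-bit link in each direction, both counted in the total.
    """
    link_mhz = multiplier * base_mhz           # 5 x 200MHz = 1000MHz
    transfers_per_s = link_mhz * 1e6 * 2       # DDR: effective 2GHz at 5X
    bytes_per_transfer = width_bits / 8        # 16 bits = 2 bytes
    return transfers_per_s * bytes_per_transfer * directions / 1e9


print(ht_peak_gb_per_s(4))  # 4X multiplier, 800MHz link
print(ht_peak_gb_per_s(5))  # 5X multiplier, 1GHz link
```

Under those assumptions, the older 4X setting works out to 6.4GB/s and the new 5X setting to 8.0GB/s, counting both directions together.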

You can compare the Socket 940 version of the Athlon 64 FX-53 to the 939-pin version when reading our benchmark results, but you need to know a secret code to do it. That secret code is this:

Opteron 150 = Athlon FX-53 (940-pin version)

These two products are essentially the same thing with different names, with the notable exceptions that the Opteron name is more respectable in proper computing circles, while the FX-53 has an unlocked multiplier for us disreputable overclockers to tweak. The performance is identical, so long as there’s no overclocking at work. I’ve left the Opteron 150 label on our graphs because, well, that’s what I used in testing. The Athlon 64 3800+ is just an Athlon 64 FX-53 minus 512K of L2 cache. Sometimes, its smaller cache will slow it down a little bit, but other times, the 3800+ should run neck-and-neck with the FX-53. All depends on whether the application is able to use the FX’s extra L2 cache to its advantage.

But enough gabbing. Let’s see the numbers.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.

Our test systems were configured like so:

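Processor: Athlon XP-M ‘Barton’ 2500+ 2.4GHz
Front-side bus: 400MHz (200MHz DDR)
Motherboard: Abit AN7
BIOS revision: 1.4
North bridge: nForce2 SPP
South bridge: nForce2 MCP-T
Chipset drivers: ForceWare 3.13
Memory size: 1GB (2 DIMMs)
Memory type: Corsair TwinX XMS4000 DDR SDRAM at 400MHz
CAS latency: 2
Cycle time: 6
RAS to CAS delay: 3
RAS precharge: 3
Hard drive: Seagate Barracuda V 120GB ATA/100

Processor: Athlon XP ‘Barton’ 3200+ 2.2GHz
Front-side bus: 400MHz (200MHz DDR)
Motherboard: Asus A7N8X Deluxe v2.0
BIOS revision: C1007
North bridge: nForce2 SPP
South bridge: nForce2 MCP-T
Chipset drivers: ForceWare 3.13
Memory size: 1GB (2 DIMMs)
Memory type: Corsair TwinX XMS4000 DDR SDRAM at 400MHz
CAS latency: 2
Cycle time: 6
RAS to CAS delay: 3
RAS precharge: 2
Hard drive: Seagate Barracuda V 120GB ATA/100

Processor: Athlon XP ‘Barton’ 3000+ 2.167GHz
Front-side bus: 333MHz (166MHz DDR)
Motherboard: Asus A7N8X Deluxe v2.0
BIOS revision: C1007
North bridge: nForce2 SPP
South bridge: nForce2 MCP-T
Chipset drivers: ForceWare 3.13
Memory size: 1GB (2 DIMMs)
Memory type: Corsair TwinX XMS4000 DDR SDRAM at 333MHz
CAS latency: 2
Cycle time: 5
RAS to CAS delay: 3
RAS precharge: 2
Hard drive: Seagate Barracuda V 120GB ATA/100

Processor: Athlon 64 3000+ 2.0GHz; Athlon 64 3200+ 2.0GHz; Athlon 64 3400+ 2.2GHz
Front-side bus: HT 16-bit/800MHz downstream, HT 16-bit/800MHz upstream
Motherboard: MSI K8T Neo
BIOS revision: 1.1
North bridge: K8T800
South bridge: VT8237
Chipset drivers: 4-in-1 v.4.51, ATA 5.1.2600.220
Memory size: 1GB (2 DIMMs)
Memory type: Corsair TwinX XMS4000 DDR SDRAM at 400MHz
CAS latency: 2
Cycle time: 5
RAS to CAS delay: 3
RAS precharge: 3
Hard drive: Seagate Barracuda V 120GB SATA 150

Processor: Opteron 146 2.0GHz; Opteron 148 2.2GHz; Opteron 150 2.4GHz
Front-side bus: HT 16-bit/800MHz downstream, HT 16-bit/800MHz upstream
Motherboard: MSI 9130
BIOS revision: 1.31
North bridge: K8T800
South bridge: VT8237
Chipset drivers: 4-in-1 v.4.51, ATA 5.1.2600.220
Memory size: 1GB (2 DIMMs)
Memory type: Corsair CMX512RE-3200LL PC3200 registered DDR SDRAM at 400MHz
CAS latency: 2
Cycle time: 6
RAS to CAS delay: 3
RAS precharge: 2
Hard drive: Seagate Barracuda V 120GB SATA 150

Processor: Athlon 64 3800+ 2.4GHz; Athlon 64 FX-53 2.4GHz
Front-side bus: HT 16-bit/1GHz downstream, HT 16-bit/1GHz upstream
Motherboard: MSI MS-6702E
BIOS revision: 3.0B10
North bridge: K8T800 Pro
South bridge: VT8237
Chipset drivers: 4-in-1 v.4.51, ATA 5.1.2600.220
Memory size: 1GB (2 DIMMs)
Memory type: Corsair TwinX XMS3200LL DDR SDRAM at 400MHz
CAS latency: 2
Cycle time: 6
RAS to CAS delay: 3
RAS precharge: 3
Hard drive: Seagate Barracuda V 120GB SATA 150

Processor: Pentium 4 2.8’C’GHz; Pentium 4 3.2GHz; Pentium 4 3.4GHz; Pentium 4 3.2GHz Extreme Edition; Pentium 4 3.4GHz Extreme Edition; Pentium 4 2.8’E’GHz; Pentium 4 3.0’E’GHz; Pentium 4 3.2’E’GHz; Pentium 4 3.4’E’GHz
Front-side bus: 800MHz (200MHz quad-pumped)
Motherboard: Abit IC7-G
BIOS revision: IC7_21.B00
North bridge: 82875P MCH
South bridge: 82801ER ICH5R
Chipset drivers: INF Update 5.1.1002
Memory size: 1GB (2 DIMMs)
Memory type: Corsair TwinX XMS4000 DDR SDRAM at 400MHz
CAS latency: 2
Cycle time: 6
RAS to CAS delay: 4
RAS precharge: 4
Hard drive: Seagate Barracuda V 120GB SATA 150

Common to all systems:
Audio: Creative SoundBlaster Live!
Graphics: Radeon 9800 Pro 256MB with CATALYST 4.1 drivers
OS: Microsoft Windows XP Professional
OS updates: Service Pack 1, DirectX 9.0b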

All tests on the Intel systems were run with Hyper-Threading enabled.

Thanks to Corsair for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, Corsair’s RAM is definitely worth considering.

The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Benchmark results
Memory performance

With unbuffered DIMMs, the Socket 939 processors are able to achieve higher bandwidth in our synthetic memory benchmarks than the Opteron 150/A64 FX-53. The margin of difference is small, but we have a new champ in Sandra’s bandwidth test.
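As a point of reference for these scores, here is a rough sketch of the theoretical ceiling for DDR400 memory. It assumes the standard 64-bit DDR channel width; the helper name `ddr_peak_gb_per_s` is made up for illustration. Measured results in a synthetic test will come in below these peaks.

```python
def ddr_peak_gb_per_s(mega_transfers_per_s, channels, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s.

    Assumes each channel is 64 bits (8 bytes) wide, as on standard DDR DIMMs.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


print(ddr_peak_gb_per_s(400, channels=1))  # single-channel DDR400 (Socket 754)
print(ddr_peak_gb_per_s(400, channels=2))  # dual-channel DDR400 (Socket 939/940)
```

That works out to 3.2GB/s for one channel and 6.4GB/s for two, so the dual-channel chips have twice the headroom on paper.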

Linpack gives us a nice visual of the difference between the Athlon 64 FX-53 and the 3800+. The red line for the 3800+ starts to dip at around 512K, as its L1 data and L2 caches begin to fill up. The, er, lavender line for the FX-53 keeps chugging along to about 900K before it begins to slow down. The Pentium 4 Extreme Edition, meanwhile, is just showing off.
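To put those cache sizes in matrix terms, the sketch below estimates the largest square matrix that fits entirely in a cache of a given size. It assumes 8-byte double-precision elements and treats one matrix as the whole working set, which is a simplification; `largest_square_matrix` is an illustrative name, not part of any Linpack tool.

```python
import math


def largest_square_matrix(cache_kb, bytes_per_element=8):
    """Largest N such that an N x N matrix of doubles fits in cache_kb.

    Simplification: the matrix is assumed to be the only data in the cache.
    """
    elements = cache_kb * 1024 // bytes_per_element
    return math.isqrt(elements)


for label, cache_kb in [("512K L2 (3800+)", 512), ("1MB L2 (FX-53)", 1024)]:
    n = largest_square_matrix(cache_kb)
    print(f"{label}: roughly a {n} x {n} matrix")
```

By this rough estimate, the 512K cache holds about a 256 x 256 matrix and the 1MB cache about 362 x 362. Past those sizes, more of the work spills out to main memory.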

Here we can see the latency benefits of unbuffered DIMMs. The Socket 939 processors match their single-channel Socket 754 brethren in memory access latency, shaving a few nanoseconds off the access times recorded for the Socket 940 chips. This is but one sample point, of course, but we can get a fuller picture…

Memory performance (continued)
Not only are our 3D graphs indulgent, but they’re useful, too. I’ve arranged them manually in a very rough order from worst to best, for what it’s worth. I’ve also colored the data series according to how they correspond to different parts of the memory subsystem. Yellow is L1 cache, light orange is L2 cache, and orange is main memory. The red series, if present, represents L3 cache. Of course, caches sometimes overlap, so the colors are just an interesting visual guide.

Yow. That is one squat graph for the FX-53. Compare the orange bars, which represent accesses to main memory, against the Opteron 150, and you can see clearly how using unbuffered DIMMs can help. Again, it’s not a blockbuster improvement, but the change is tangible.

Unreal Tournament 2003

And the beatings begin. Only the Pentium 4 Extreme Edition keeps Intel in the game here. As expected, the 3800+ runs a little behind the FX-53.

Quake III Arena

The move to Socket 939 is just enough to give AMD the lead in Quake III performance, amazingly enough.

Wolfenstein: Enemy Territory

Wolf: ET doesn’t seem to benefit much from the FX’s extra 512K of cache, but both of the S939 chips perform very well here, too.

Comanche 4

More of the same in Comanche 4…

Serious Sam SE

…and in Serious Sam, as well, where the Pentium 4s have traditionally not fared too well.

3DMark03

3DMark03’s overall score is so heavily constrained by our Radeon 9800 Pro 256MB card’s performance that the processor speed barely matters.

However, 3DMark’s CPU tests are another story. A familiar story, in which the Socket 939 processors rank near the top of the charts.

Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine that needs the latest computer hardware to run at speeds close to real-time processing. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance.

There are two goals with Sphinx. The first is to run it faster than real time, so real-time speech recognition is possible. The second, more ambitious goal is to run it at about 0.8 times real time, where additional CPU overhead is available for other sorts of processing, enabling Sphinx-driven real-time applications.
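Those two goals boil down to a ratio of processing time to audio length. The sketch below shows the idea; the function names and the sample timings are hypothetical, not taken from our results.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Ratio of time spent recognizing to the length of the audio."""
    return processing_seconds / audio_seconds


def describe(rtf):
    """Map a real-time factor onto the two goals described above."""
    if rtf <= 0.8:
        return "headroom for other processing"
    if rtf < 1.0:
        return "faster than real time"
    return "slower than real time"


# Hypothetical timings for a 10-second audio clip.
for seconds in (7.5, 9.0, 12.0):
    rtf = real_time_factor(seconds, 10.0)
    print(f"{seconds}s of work: {rtf:.2f}x real time, {describe(rtf)}")
```

Lower is better here: anything under 1.0 keeps up with live speech, and anything at or under 0.8 leaves CPU time to spare.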

Sphinx has always responded well to faster memory subsystems, and the lower latencies of the dual-channel Socket 939 systems are no exception. The Pentium 4 Prescott had been the old king in Sphinx, but the new Athlon 64s just edge it out.

LAME MP3 encoding
We used LAME to encode a 101MB 16-bit, 44KHz audio file into a very high-quality MP3. The exact command-line options we used were:

lame --alt-preset extreme file.wav file.mp3

DivX video encoding
This new version of XMPEG includes a benchmark feature, so we’re reporting scores in frames per second now.

The Socket 939 processors hold their own, but media encoding is still the Pentium 4’s greatest strength.

3ds max rendering
We begin our 3D rendering tests with Discreet’s 3ds max, one of the best known 3D animation tools around. 3ds max is both multithreaded and optimized for SSE2. We rendered a couple of different scenes at 1024×751 resolution, including the Island scene shown below. Our testing techniques were very similar to those described in this article by Greg Hess. In all cases, the “Enable SSE” box was checked in the application’s render dialog.

The new Athlons keep up with the top Pentium 4s, but they don’t overpower the P4s like the model numbers would suggest.

Lightwave rendering
NewTek’s Lightwave is another popular 3D animation package that includes support for multiple processors and is highly optimized for SSE2. Lightwave can render very complex scenes with realism, as you can see from the sample scene, “A5 Concept,” below.

We’ve tested the Hyper-Threaded processors with one, two, and four rendering threads. For non-Hyper-Threaded processors, we just tested with one and two threads. For the line graphs, we’ve tried to pick results from the optimal number of threads to represent each processor.

You may have noticed that cache size differences between the FX-53 and 3800+ didn’t matter a whit in 3ds max. Lightwave is another example of the same. Once more, AMD’s new processors perform well, but don’t exactly overpower the Pentium 4 chips running at 3.4GHz.

POV-Ray rendering
POV-Ray is the granddaddy of PC ray-tracing renderers, and it’s not multithreaded in the least, because it’s designed to be a cross-platform application. POV-Ray also relies more heavily on x87 FPU instructions to do its work, and it contains only minor SIMD optimizations.

Cache size differences don’t do a thing for POV-Ray, but the Athlon 64’s FPU alone is plenty sufficient.

Cinebench 2003 rendering and shading
Cinebench is based on Maxon’s Cinema 4D modeling, rendering, and animation app. This revision of Cinebench measures performance in a number of ways, including 3D rendering, software shading, and OpenGL shading with and without hardware acceleration. Cinema 4D’s renderer is multithreaded, so it takes advantage of Hyper-Threading, as you can see in the results.

The Pentium 4 wins one outright in Cinebench, in part thanks to its excellent performance with Hyper-Threading in use.

The new Athlon 64s breeze through the rest of the Cinebench tests.

SPECviewperf workstation graphics
SPECviewperf simulates the graphics loads generated by various professional design, modeling, and engineering applications.

Wow, who could have seen that coming?

ScienceMark
ScienceMark is optimized for SSE, SSE2, and 3DNow!, and it is multithreaded as well. In the interest of full disclosure, I should mention that Tim Wilkens, one of the originators of ScienceMark, now works at AMD. However, Tim has sought to keep ScienceMark independent by diversifying the development team and by publishing much of the source code for the benchmarks at the ScienceMark website. We are sufficiently satisfied with his efforts, and impressed with the enhancements to the 2.0 beta revision of the application, to continue using ScienceMark in our testing.

The difference between the Athlon 64 3800+ and the FX-53, 512K of L2 cache versus 1MB, doesn’t affect ScienceMark’s pure number-crunching simulations much at all. Still, the new Athlon 64s are both very fast.

The Pentium 4 excels in ScienceMark’s matrix multiplication tests, given the right SSE code path and vectorized data. The Socket 939 processors both deliver even performance, and decent scores, with SSE and with x87 assembly. They also outperform the Pentium 4s with the compiled C code path.

picCOLOR image analysis
We thank Dr. Reinert Muller with the FIBUS Institute for pointing us toward his picCOLOR benchmark. This image analysis and processing tool is partially multithreaded, and it shows us the results of a number of simple image manipulation calculations.

The Pentium 4 Prescott shows up big in picCOLOR, outpacing the Athlon 64 3800+ by a hair.

Of picCOLOR’s individual tests, only AddressMem is affected significantly by the cache size difference between the FX-53 and 3800+. Even then, the difference is fairly minor.

Conclusions
Well, you’ve seen the benchmark results, and there’s not much I can add. The Athlon 64 FX-53 was already brutally fast, and the new 939-pin version is a tad bit faster still. The Athlon 64 3800+ is darn near as fast as the FX-53, despite having half the L2 cache. If you have the means, I highly suggest picking one up.

The question of means, however, is no small thing. AMD obviously has the performance lead over Intel, and the fastest Athlon 64s are priced accordingly. The FX-53 will list for $799, cheaper than the Pentium 4 Extreme Edition, but more than, say, a GeForce 6800 Ultra or Radeon X800 XT Platinum Edition, or perhaps a tropical vacation. At $720, the Athlon 64 3800+ won’t be much cheaper, nor will the Socket 754-based 3700+ at $710. The price for the Socket 939-based Athlon 64 3500+ gets closer to reasonable territory at $500. I only wish AMD had kicked one in for review, because I might be recommending it right now.

Unfortunately, that’s the sum total of AMD’s Socket 939 lineup, as far as I know. I would expect AMD to introduce lower cost options as time passes, but for now, Socket 754 looks to have some life left in it as a mainstream platform. Socket 939 is the future, but alas, so are many things that haven’t arrived yet.

First and foremost among them is PCI Express, the new PC expansion standard that will cast yet another Osborne Effect field over the early Socket 939 motherboards and systems. Intel is preparing to unleash PCI-E and a whole gaggle of related standards, including High Definition Audio and the BTX form factor. These things may give Intel the technology lead without the performance lead, creating an unusual dilemma for the would-be system buyer or builder. However, chipset manufacturers are already preparing PCI Express core logic for the Athlon 64 that should support many of these standards, and I’d expect to see the first of these motherboards becoming available in the next few months.

For now, if Doom 3 is looming and you just can’t wait, going Socket 939 will get you a very fast system that should last quite a while, despite more new goodies on the horizon.
