
AMD's Athlon XP 3000+ processor


Barton caches in
— 12:00 AM on February 10, 2003

AMD'S POSITION IN the desktop processor performance race has been tenuous in the past few months. Last October, we found the Athlon XP 2800+ to be a very competent performer, but even today, those chips are as rare as a GeForce FX. Since then, Intel has introduced its Pentium 4 3.06GHz chip, which is fast all by itself, and gets a further boost from Intel's Hyper-Threading technology. We declared the P4 3.06GHz "the fastest PC processor available" upon its introduction in November.

"Not to worry," AMD fans told us, "Hammer is coming." The desktop version of AMD's eighth-generation processor core, known as Athlon 64, was due to arrive Any Day Now, and this new CPU would help AMD fight off Intel's challenge with the Pentium 4. Or so we thought. AMD announced recently that the Athlon 64 is delayed again, this time until September of 2003. AMD will still introduce a Hammer-based CPU soon, but only the Opteron variant, which is destined for servers.

Until September, good ol' Athlon XP will have to fend off the Pentium 4 all by itself. In order to aid in this mission, AMD has outfitted the newest incarnation of the Athlon XP, code-named Barton, with twice the level 2 cache of previous models—up from 256K to 512K. All other things being equal, raising cache size from 256K to 512K can provide a healthy performance boost. We saw it happen when Intel introduced the "Northwood" variant of the Pentium 4; since then, the P4 has been a better, more balanced performer.

Getting to know Barton
Thanks to its larger cache, Barton is a bigger chip than the "Thoroughbred" core that precedes it. Thoroughbred was made up of 37.6 million transistors, and Barton is a heftier 54.3 million. The Pentium 4 "Northwood" is 55 million transistors, so the Athlon XP has finally caught up with the P4 in transistor count. (The newest graphics processors are well over 100 million transistors, to give you some perspective.) Size-wise, Barton is just a little longer than the T-bred, which raises its total surface area from 84 mm² to 101 mm². A lot of cache transistors can really get packed into a small space, so it's no surprise Barton isn't a huge chip. The P4 Northwood is still the big kid on the block at 145 mm².

All Athlons have 128K of L1 cache memory onboard, split evenly between instruction and data caches. Because the Athlon's L2 cache is exclusive (that is, it doesn't replicate the contents of the L1 cache), the effective total cache size is a little higher than the Pentium 4's. Taken together, the Athlon Thoroughbred's 64K L1 data cache and 256K L2 cache team up to offer an effective 320K of on-chip cache. Similarly, Barton's 64K L1 data cache and 512K L2 cache give it an effective cache size of 576K.
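For the arithmetic-minded, here is a minimal Python sketch of that sum. The function name and the inclusive branch are our own illustration, not anything from AMD; the inclusive case is there only for contrast and assumes a strictly inclusive design in which everything in L1 is duplicated in L2.

```python
def effective_cache_kb(l1_data_kb, l2_kb, exclusive):
    """Capacity available for unique data, in KB (hypothetical helper)."""
    if exclusive:
        # Exclusive: L2 holds nothing that is already in L1, so the sizes add.
        return l1_data_kb + l2_kb
    # Assumed strictly inclusive design: L1 contents are also in L2,
    # so L2 alone bounds the amount of unique data held on chip.
    return l2_kb

thoroughbred = effective_cache_kb(64, 256, exclusive=True)
barton = effective_cache_kb(64, 512, exclusive=True)
print(thoroughbred, barton)  # 320 576
```

Like the figures in the text, this counts only the L1 data cache, not the separate 64K instruction cache.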

Exciting, huh?

The thing is, cache size alone isn't a great indicator of anything much. We can't compare the cache sizes of the Pentium 4 and Athlon XP against one another and draw too many conclusions about performance on that basis alone. The Athlon XP and Pentium 4 are very different animals.

However, we can probably predict how more cache will affect the Athlon XP's performance, in very general terms. Let me oversimplify it for you. The point of having a cache is to eliminate slow and costly accesses to main memory. Main memory has less bandwidth than an on-chip cache, and more importantly, accessing memory takes, as they say in the industry, a Very Long Time. Cache reduces the use of main memory by storing data or instructions likely to be used soon. Modern processors use algorithms to determine what gets stored in a cache, and the newest x86 processors (including the P4 and Athlon XP) attempt to anticipate what data will be needed next and "pre-fetch" this data into their L2 caches. When all goes as planned, the data and instructions for many common program loops can run in cache without having to access main memory. Not all of them will fit, however, into a relatively small 320K cache.

Barton's larger cache will likely allow some new program loops to run without hitting main memory so often. In those cases, Barton will outrun Thoroughbred. You'll see it in the benchmarks. In other cases, Barton's 576K of cache won't help at all. You'll see that in the benchmarks, too.
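To make that concrete, here is a toy Python model of a loop sweeping repeatedly over a fixed working set. It is a rough sketch under loud assumptions, not a model of the real chips: it treats each processor's effective cache as one fully associative pool with least-recently-used replacement and 64-byte lines, and it ignores prefetching entirely. The function name and the working set sizes are ours, picked purely for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size for this toy model

def hit_rate(cache_kb, working_set_kb, passes=4):
    """Fraction of accesses served from a fully associative LRU cache
    when a loop sweeps the same working set `passes` times."""
    capacity = cache_kb * 1024 // LINE_BYTES   # lines the cache can hold
    lines = working_set_kb * 1024 // LINE_BYTES  # lines the loop touches
    cache = OrderedDict()  # keys are line numbers, oldest entry first
    hits = 0
    for _ in range(passes):
        for line in range(lines):
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / (passes * lines)

# Hypothetical working sets: one that fits both caches, one that fits
# only the larger cache, and one that fits neither.
for ws in (256, 448, 1024):
    print(ws, hit_rate(320, ws), hit_rate(576, ws))
```

In this model, the 256K loop runs from cache on both chips after the first pass, the 448K loop does so only with 576K of cache, and the 1024K loop misses every time on both. That is the pattern described above: a big win for loops that newly fit, and nothing at all for loops that were already small enough or are still too large.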