As I've said, the Athlon 64 X2 is the same basic chip as the dual-core Opteron, so it shares the same internal architecture. Here is a fancy-looking but wildly simplified block diagram of that design.
The X2's two CPU cores share a single, unified system request queue and a crossbar that connects them to the on-chip memory controller and HyperTransport link for I/O. This arrangement should allow the processor's two cores to make optimal use of available resources without too much contention. The cores themselves are able to communicate with one another through the system request interface. Cache coherency updates and any data transfers between the two cores' caches will happen over this high-speed, on-chip data path.
Despite what you see in the diagram above, the Athlon 64 X2 has only one HyperTransport link, because it will only be used in single-socket systems. The pricier Opterons get more than one link for use in multi-socket configs. That leaves the Athlon 64 X2 with 6.4GB/s of peak theoretical memory bandwidth and 8GB/s of peak theoretical I/O throughput. At 14.4GB/s total, that's well more than the 6.4GB/s peak throughput of Intel's 800MHz front-side bus.
Because Intel's dual-core Smithfield chip has no internal data links between its two cores, all memory accesses, system I/O, and cache coherency updates must happen over its shared front-side bus. That leaves the Athlon 64 X2 with a sizeable bandwidth advantage, at least in theory.
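For readers who want to check the math, here is a minimal sketch of where those peak numbers come from. The bus widths and transfer rates in it are my assumptions rather than figures quoted above: dual-channel DDR400 memory, a 16-bit HyperTransport link running at 2000 MT/s and counted in both directions, and a 64-bit front-side bus at 800 MT/s. The function name `peak_gb_per_s` is purely illustrative.

```python
# Peak theoretical bandwidth = transfers per second * bytes per transfer * links.
# All bus parameters below are assumptions, not figures quoted in the article.

def peak_gb_per_s(mega_transfers_per_s, bytes_per_transfer, links=1):
    """Return peak bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * links / 1e9

# Assumed: dual-channel DDR400, 64 bits (8 bytes) per channel.
memory = peak_gb_per_s(400, 8, links=2)

# Assumed: 16-bit (2-byte) HyperTransport link at 2000 MT/s,
# counting both directions of the link as two "links".
hypertransport = peak_gb_per_s(2000, 2, links=2)

# Assumed: 64-bit (8-byte) front-side bus at 800 MT/s.
intel_fsb = peak_gb_per_s(800, 8)

x2_total = memory + hypertransport

print(f"X2 memory:         {memory:.1f} GB/s")
print(f"X2 HyperTransport: {hypertransport:.1f} GB/s")
print(f"X2 total:          {x2_total:.1f} GB/s")
print(f"Intel 800MHz FSB:  {intel_fsb:.1f} GB/s")
print(f"Ratio:             {x2_total / intel_fsb:.2f}x")
```

Bear in mind that these are peak theoretical rates. The HyperTransport figure adds up both directions of the link, and nothing here says how much of that bandwidth a real workload can actually use.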
The two chips are very comparable in terms of size and transistor count, though. With 1MB of L2 cache per core, the X2 (code-named "Toledo" on AMD's roadmaps) packs roughly 230 million transistors into a die that's 199 mm². Intel's Smithfield is strikingly similar at about 233 million transistors and 206 mm².
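To put a rough number on that similarity, the short sketch below divides the approximate transistor counts by the die areas given above. The function name is illustrative, and since the inputs are rounded estimates, the result is only a ballpark comparison.

```python
# Rough transistor density from the approximate figures quoted above.

def density_millions_per_mm2(transistors_millions, die_area_mm2):
    """Return transistor density in millions of transistors per square mm."""
    return transistors_millions / die_area_mm2

toledo = density_millions_per_mm2(230, 199)      # Athlon 64 X2 "Toledo"
smithfield = density_millions_per_mm2(233, 206)  # Intel "Smithfield"

print(f"Toledo:     {toledo:.2f} M transistors/mm^2")
print(f"Smithfield: {smithfield:.2f} M transistors/mm^2")
print(f"Difference: {(toledo / smithfield - 1) * 100:.1f}%")
```

Both chips land a little above 1.1 million transistors per square millimeter, within a few percent of each other.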
The Athlon 64 X2's two cores are both endowed with all of the updates that AMD included in its recent revision E of the K8 architecture. According to AMD, the changes in the E-step chips include the addition of SSE3 instructions, the ability to host mismatched DIMMs on a memory channel with little performance penalty, better memory loading so that a full house of DIMMs won't be a drag on performance, and improved memory mapping.
I don't think that's the whole story, however.
In our testing, we've found that AMD's 90nm chips have faster L2 caches, as demonstrated here. We've also found that the revision E cores perform quite a bit better clock for clock, especially in memory-intensive tasks. That leads me to believe that AMD has implemented some of the other features expected to come along with SSE3 support, perhaps including enhanced data prefetch, additional write-combining buffers, and the ability to convert the LEA instruction to an ADD in certain cases. I tried to shake some more details about these changes out of AMD but wasn't able to get many specifics. You'll see the effect in our benchmark results, though, when the older-rev Athlon 64 FX-55, with a markedly faster memory subsystem, struggles to keep pace with the revision E Opteron 152. The X2 chips also perform relatively better clock for clock in single-threaded apps than one might otherwise expect.
Like the revision E chips, the Athlon 64 X2 is manufactured with AMD's 90nm process using silicon-on-insulator (SOI) technology. In addition, AMD has tweaked its manufacturing techniques so that its (much larger) dual-core chips consume no more power and generate no more heat than its single-core parts. The relatively lower power consumption comes at the expense of clock speed, but obviously the tradeoff isn't huge, since the X2 tops out at just 200MHz less than the Athlon 64 FX series currently does. Between the rev-E performance increases, the power optimizations, and the presence of two cores on one chip, the Athlon 64 X2's performance per clock and per watt should be a sizeable advance over the CPUs AMD was selling just months ago.