We've spent the past few weeks testing Hammer-based chips, including the Athlon 64 and its new big brother, the mighty Athlon 64 FX-51. Not only that, but Intel slipped us a Pentium 4 3.2GHz Extreme Edition processor at the last minute, and we've benchmarked it as well, 2MB L3 cache and all. We can honestly say we were blown away by the performance of these new chips. The future is now. Read on to see how good it can be.
Hammer comes to the desktop
The Hammer CPU core is an evolutionary design based on AMD's K7 microarchitecture. Nonetheless, Hammer is revolutionary, not so much because of what goes on inside the chip itself, but because of how it talks to the rest of the computer. We have dedicated most of our time and effort in preparing this review to empirical testing, so we're not able to cover Hammer's architectural innovations in as much detail as we'd like. Still, we'll hit some of the major points that make AMD's new processor distinctive. Among them:
Beyond the basic performance benefits, the movement of the memory controller on die has implications for the organization of the entire Hammer platform. Core logic chipsets no longer need to provide memory controllers, and the Hammer, strictly speaking, has no traditional front-side bus. Even more mind-bendingly, multiprocessor Hammer systems have individual banks of memory for each processor, so they should scale very well as processors are added.
Hammer systems use HyperTransport for several things. In all Hammer systems, one of the CPUs (or the only CPU) talks to the rest of the system over a HyperTransport link. Traditional chipset services like AGP, PCI, and south bridge I/O are delivered over this link much like VPN tunnels are delivered over TCP/IP connections in a computer network. Done right, HyperTransport should simplify motherboard design by replacing slower and wider connections that require more traces to achieve similar results. In multiprocessor implementations, HyperTransport links between processors allow for inter-chip communications, as well.
Hammer's L1 cache sizes are unchanged from K7 at 64K for instructions and 64K for data. AMD's caches tend to be exclusive, and that's the case with Hammer: the L2 cache doesn't replicate the contents of the L1 cache. With the 64K L1 data cache and the 1024K L2 cache combined, the Hammer chips' total effective data cache size is 1088K.
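To make the 1088K figure concrete, here is a minimal sketch of the arithmetic. The 1024K L2 size is inferred from the 1088K total quoted above, and the function name and the simplified inclusive-cache model (every L1 line is also held in L2) are our own assumptions for illustration, not AMD's terminology.

```python
# Cache sizes in KB. The L1 data size comes from the text; the L2 size is
# inferred from the 1088K total (1088 - 64 = 1024).
L1_DATA_KB = 64
L2_KB = 1024


def effective_data_cache_kb(l1_data_kb, l2_kb, exclusive):
    """Unique data capacity of an L1-data + L2 hierarchy (simplified model)."""
    if exclusive:
        # Exclusive: a line lives in L1 or L2, never both, so capacities add.
        return l1_data_kb + l2_kb
    # Inclusive (simplified): L1 contents are duplicated in L2, so the L2
    # size bounds the amount of unique data the hierarchy can hold.
    return l2_kb


if __name__ == "__main__":
    print("exclusive:", effective_data_cache_kb(L1_DATA_KB, L2_KB, True), "KB")
    print("inclusive:", effective_data_cache_kb(L1_DATA_KB, L2_KB, False), "KB")
```

The gap between the two results is exactly the L1 data cache size, which is why exclusivity mattered more on older chips with small L2 caches than it does here.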
We've seen many times before the impact larger caches have on performance. Generally, more cache is better, but many tasks pull through too much data to derive any benefit from extra cache, so the benefits are uneven.
AMD's move to 64 bits accomplishes several things. First, it eliminates the barrier of 4GB of addressable memory in 32-bit systems. 4GB may sound like a lot today, but as an upper limit, 4GB could become a nasty constraint, even on common desktop systems, in the next few years.
Second, by adding 64-bit extensions to the x86 ISA, AMD has created an evolutionary alternative to Intel's Itanium chips, which break almost entirely with the industry-standard x86 software infrastructure. Naturally, code will have to be recompiled for AMD64, but AMD64 is familiar enough that retooling compilers for it should be relatively painless.
Finally, AMD64's additional registers, which are present in Hammer, promise better performance on recompiled code. (Registers are essentially temporary local storage slots on a processor. More of them means less storing data in cache or memory.) Addressing memory in 64-bit chunks won't, by itself, necessarily improve performance. The Hammer has eight new 64-bit integer registers and eight new 128-bit SSE/SSE2 registers to help.
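The 4GB ceiling mentioned above falls straight out of pointer width, as this short sketch shows. The helper name is hypothetical, and the 64-bit figure is the architectural maximum for a full 64-bit address; actual processors typically implement fewer address bits than that.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Bytes reachable with a flat, byte-granular address of the given width."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (32, 64):
        total = addressable_bytes(bits)
        print(f"{bits}-bit addresses: {total // GIB:,} GB")
```

Each extra address bit doubles the reachable memory, so going from 32 to 64 bits multiplies the theoretical limit by 2^32.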
The move to silicon-on-insulator (SOI) fabrication is crucial because AMD's enhancements to Hammer add up to a whole lot more transistors per chip than the K7. The last revision of the Athlon XP, code-named Barton, had 54.3 million transistors and a die size of 101 square millimeters. The Northwood Pentium 4 has 55 million transistors on a die that's 145 square millimeters. By contrast, the Athlon 64 packs 105.9 million transistors onto a 192 square millimeter die.
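For a rough sense of scale, the figures above can be turned into transistor densities and relative die areas. This is only a back-of-the-envelope sketch using the numbers quoted in this article; the dictionary layout and function name are ours.

```python
# (transistors in millions, die area in square millimeters), as quoted above.
CHIPS = {
    "Athlon XP (Barton)": (54.3, 101.0),
    "Pentium 4 (Northwood)": (55.0, 145.0),
    "Athlon 64": (105.9, 192.0),
}


def density(transistors_millions, die_mm2):
    """Millions of transistors per square millimeter of die."""
    return transistors_millions / die_mm2


if __name__ == "__main__":
    barton_area = CHIPS["Athlon XP (Barton)"][1]
    for name, (transistors, area) in CHIPS.items():
        print(
            f"{name}: {density(transistors, area):.2f}M transistors/mm^2, "
            f"{area / barton_area:.2f}x Barton's die area"
        )
```

By these numbers, the Athlon 64 has nearly twice Barton's transistor count and nearly twice its die area, so density is roughly flat and the bigger budget translates directly into a bigger, costlier chip.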