Tick-tock. Tick-tock. The sound of Intel's ongoing CPU development cycle has been constantly in the backdrop for its biggest competitor, AMD, ever since the world's largest chipmaker set an aggressive cadence for itself more than five years ago. Since then, Intel has turned over new manufacturing technologies followed by extensively revised CPU architectures in relentless succession. The introduction of Sandy Bridge processors at the beginning of this year put Intel firmly in the lead in terms of overall performance, power efficiency, and the value proposition offered to consumers.
Being the perennial number-two CPU maker in such a competitive context can't be easy, but AMD hasn't taken the challenge lightly. In fact, the firm has been working for several years on a brand-new breed of PC processors based on a fresh microarchitecture, code-named "Bulldozer," that aims to restore some competitive balance. Nearly every CPU AMD has made for the past decade-plus (with the exception of the low-power Ontario/Zacate E-series APUs) has been derived from the original K7, the chip first known as Athlon. Bulldozer draws on that tradition in various ways, but it is a novel, clean-sheet design intended to take AMD processors into their next era.
To that end, Bulldozer introduces some unorthodox concepts into the PC processor space. The first of those is a dual-core "module" as a fundamental building block. To date, we've seen x86-compatible CPU cores capable of tracking and executing two threads via simultaneous multithreading (SMT), better known by its Intel marketing name, Hyper-Threading. We've also had a number of chips with multiple cores onboard in a chip-level multiprocessor (CMP) configuration, tracing back to the original Athlon 64 X2. The Bulldozer module is a mid-point between those two familiar arrangements. AMD says the module has "two tightly coupled integer cores" with some sharing of resources, notably including the FPU. The idea is to save space on the silicon die by pooling resources where possible while still offering "robust" performance on both threads, with fewer of the performance hazards created by SMT or Hyper-Threading.
At the same time, Bulldozer resurrects a concept that's fallen out of favor in PC processors in recent years: it's a "speed demon," optimized for higher clock frequencies rather than maximum instruction throughput in each clock cycle. The Pentium 4 "Netburst" microarchitecture, particularly in its troubled "Prescott" incarnation, gave frequency-optimized designs a reputation for high power draw and iffy performance. Yet Chief Architect Mike Butler told us the engineering team's goal with Bulldozer was to "hold the line" on instructions per clock (presumably at about the same rate as the Phenom II) and to "aggressively pursue higher frequencies." Speed demons have typically reduced the amount of work done at each stage of the pipeline in order to simplify logic and thus enable higher operating frequencies, but this approach can also theoretically help manage power consumption. The rationale, if we understand Butler correctly, is that a design with a relatively low number of gates flipping at each pipeline stage may require less voltage to operate at a given frequency. Chip power consumption has three main determinants: clock speed, the number of transistors flipping, and the square of the voltage. Because power scales with the square of voltage but only linearly with the other two terms, voltage is the single biggest factor in the power equation, so a design capable of keeping voltage in check could make some sense for today's power-constrained world.
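To put rough numbers on that power relationship, here is a minimal sketch using the textbook approximation for switching power, P = C × V² × f, where C stands in for the effective capacitance of the transistors flipping each cycle. The function name and every figure below are our own illustrative assumptions, not AMD specifications or measurements of any real chip.

```python
def dynamic_power_watts(switched_capacitance_f, voltage_v, frequency_hz):
    """Approximate switching power: P = C * V^2 * f.

    switched_capacitance_f is the effective capacitance switched per
    cycle, a stand-in for "the number of transistors flipping."
    """
    return switched_capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical values chosen only for illustration.
CAPACITANCE_F = 1.0e-8   # assumed effective switched capacitance
BASE_VOLTAGE_V = 1.40    # assumed core voltage
BASE_CLOCK_HZ = 3.6e9    # assumed clock frequency

baseline_w = dynamic_power_watts(CAPACITANCE_F, BASE_VOLTAGE_V, BASE_CLOCK_HZ)

# Trim voltage by 10% at the same clock speed.
lower_voltage_w = dynamic_power_watts(CAPACITANCE_F, 1.26, BASE_CLOCK_HZ)

# Trim clock speed by 10% at the same voltage.
lower_frequency_w = dynamic_power_watts(CAPACITANCE_F, BASE_VOLTAGE_V, 3.24e9)

print(f"10% less voltage:   {lower_voltage_w / baseline_w:.2f}x baseline power")
print(f"10% less frequency: {lower_frequency_w / baseline_w:.2f}x baseline power")
```

Under these assumptions, a 10% voltage cut brings power down to about 81% of the baseline, while a 10% clock cut only brings it down to about 90%. The model ignores leakage and other static power, so treat it as a way to see the shape of the trade-off rather than a prediction for any particular processor.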
We don't know precisely how aggressively AMD has pursued the speed-demon approach. When we asked, AMD declined to tell us the number of stages in Bulldozer's main pipeline. This behavior seems unusually guarded. We've been writing about these things for over a decade, and an outright refusal to disclose pipeline depth in a major x86 processor is very rare. Our sense is that it's somewhere between the 12-14 stages of contemporary Core- and Phenom-branded chips and the astounding 31 stages in Prescott. We expect we'll learn more about Bulldozer's inner workings as time passes.
At any rate, here's the big picture. The first incarnation of the Bulldozer architecture is a formidable chip, with four modules onboard. That gives it a total of eight integer cores, with four floating-point units. Each module has 2MB of L2 cache, and there's a shared third-level cache of 8MB. This chip retains compatibility with AMD's existing system architecture, so it has an integrated memory controller with support for dual channels of DDR3. Also present are four HyperTransport links, only one of which will be used in desktop products.
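The resource math behind that layout is simple enough to sketch in a few lines. The class and field names here are hypothetical labels of our own; only the per-module figures (two integer cores, one shared FPU, 2MB of L2) and the 8MB shared L3 come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipConfig:
    """Hypothetical tally of a module-based chip's resources."""
    modules: int
    int_cores_per_module: int = 2   # "two tightly coupled integer cores"
    fpus_per_module: int = 1        # one FPU shared within each module
    l2_mb_per_module: int = 2       # L2 cache per module
    l3_mb_shared: int = 8           # L3 cache shared across the chip

    @property
    def integer_cores(self) -> int:
        return self.modules * self.int_cores_per_module

    @property
    def fpus(self) -> int:
        return self.modules * self.fpus_per_module

    @property
    def total_l2_mb(self) -> int:
        return self.modules * self.l2_mb_per_module

    @property
    def total_cache_mb(self) -> int:
        # Counts L2 plus L3 only; L1 caches are not covered here.
        return self.total_l2_mb + self.l3_mb_shared


four_module_chip = ChipConfig(modules=4)
print(four_module_chip.integer_cores, "integer cores")
print(four_module_chip.fpus, "floating-point units")
print(four_module_chip.total_l2_mb, "MB of L2 in total")
print(four_module_chip.total_cache_mb, "MB of L2 plus L3")
```

With four modules, the tally comes to eight integer cores, four FPUs, and 8MB of L2 spread across the modules, which together with the shared L3 makes 16MB of L2 and L3 cache on the chip.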