Somewhere around mid-morning this past Friday, a rather large package made its way into the depths of Damage Labs.
Inside was a server containing something very special: a pair of AMD's new quad-core Opteron processors. The chip code-named "Barcelona" has been something of an enigma during its development, thanks to questions about both exactly when it would arrive and how it would perform when it did. After a long, hot weekend of non-stop testing, we have some answers to those questions. AMD is formally introducing its Barcelona-based Opteron 2300-series processors today, so the time is now. As for the performance, well, keep reading to see exactly how the new Opterons compare to Intel's quad-core Xeons.
Introducing the Opteron 2300 series
As I said, we received AMD's new Opterons just this past Friday. I've concentrated my efforts since then on testing the heck out of them, so you're going to be spared my attempts to summarize this new CPU architecture in any kind of depth. If you're unfamiliar with AMD's K10 architecture and want an in-depth look at how it works, let me suggest reading David Kanter's excellent overview of Barcelona. I will give you some basics, though.
Barcelona is a single-chip, native quad-core design. Each of those cores has been substantially revised to improve performance per clock cycle through a variety of tweaks, some big and some small. The cores now have a wider, 32-byte instruction fetch, and the floating-point units can execute 128-bit SSE operations in a single clock cycle (including the Supplemental SSE3 instructions Intel included in its Core-based Xeons). Accordingly, the Barcelona core has more bandwidth throughout in order to accommodate higher throughput: internally between units on the chip, between the L1 and L2 caches, and between the L2 cache and the north bridge/memory controller.
AMD has also added an L3 cache to the chip. That results in a cache hierarchy that includes 64KB of dedicated L1 cache and 512KB of dedicated L2 cache per core, bolstered by a 2MB L3 cache that's shared dynamically between all four cores. The total effective cache size is still much smaller than Intel's Xeons, but AMD claims its mix of dedicated and shared caches can avoid contention problems that Intel's large, shared L2 might have.
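To put those cache numbers in one place, here's a quick back-of-the-envelope sketch. It takes the per-core figures above at face value (64KB of L1 and 512KB of L2 per core, plus a 2MB shared L3), and the function names are mine, purely for illustration.

```python
# Cache sizes in KB, taken at face value from the paragraph above.
L1_PER_CORE_KB = 64
L2_PER_CORE_KB = 512
L3_SHARED_KB = 2048
CORES = 4


def total_cache_kb(cores=CORES):
    """Sum of all dedicated L1/L2 caches plus the single shared L3."""
    dedicated = cores * (L1_PER_CORE_KB + L2_PER_CORE_KB)
    return dedicated + L3_SHARED_KB


def reachable_by_one_core_kb():
    """Cache a single core can use: its own L1 and L2, plus the shared L3."""
    return L1_PER_CORE_KB + L2_PER_CORE_KB + L3_SHARED_KB


print(total_cache_kb())            # total on-chip cache, in KB
print(total_cache_kb() / 1024)     # the same figure in MB
print(reachable_by_one_core_kb())  # what any one core can draw on, in KB
```

By this simple accounting, the chip carries 4.25MB of cache in total, of which any single core can reach a bit over 2.5MB.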
Behind this L3 cache sits an improved memory controller, still integrated into the CPU as with previous Opterons. AMD claims this memory controller is better able to take advantage of the higher bandwidth offered by DDR2 memory thanks to a number of enhancements, including buffers that are between 2X and 4X the size of those in previous Opterons and an improved prefetch mechanism. Perhaps most notably, the new controller can access each 64-bit memory channel independently, reading from one while writing to another, instead of just treating dual memory channels as a single 128-bit device.
Throughout Barcelona, from this memory controller to the CPU cores, AMD has made revisions with power-efficiency in mind. That starts with clock gating, whereby portions of the chip not presently in use are temporarily deactivated. AMD says it has improved its clock gating on both coarse- and fine-grained scales, combining the ability to turn off, say, the entire floating-point unit when running integer-heavy code with the ability to put smaller logic blocks on the chip to sleep when they're not needed. Even the memory controller will turn off its write logic during reads and vice-versa.
Clock gating is a commonly used technique these days, but some of Barcelona's tricks are more novel. Unlike other x86 multicore processors, each of Barcelona's CPU cores is clocked independently, so that each one can raise and lower its clock speed (via PowerNow) dynamically in response to demand. (In Intel's current Xeons, one core at high utilization means the other core on that chip must run at a higher clock speed, as well.) Barcelona's CPU voltage is still dependent on the power state of the core with the highest utilization, but AMD has separated the power plane for the chip's CPU cores from the power plane for its memory controller. As a result, the memory controller and CPU cores can each draw only the power they need.
All told, these modifications led to a chip composed of approximately 463 million transistors. As manufactured on AMD's 65nm SOI process, Barcelona measures 285mm².
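For a rough sense of scale, those two figures can be turned into an average transistor density. This is only a sketch using the approximate numbers quoted above; the helper name is made up for the example, and real density varies widely between cache and logic.

```python
# Approximate figures quoted in the text above.
TRANSISTORS = 463_000_000
DIE_AREA_MM2 = 285


def density_millions_per_mm2(transistors, area_mm2):
    """Average transistor density, in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1_000_000


print(round(density_millions_per_mm2(TRANSISTORS, DIE_AREA_MM2), 2))
```

That works out to roughly 1.6 million transistors per square millimeter, averaged over the whole die.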
The obvious goals for Barcelona included several key things: doubling the number of CPU cores per socket, raising the number of instructions each core can execute per clock, keeping power use relatively low by taking advantage of opportunities for dynamic scaling, and in doing so, achieving vastly improved performance per watt. AMD also sought to extend its excellent HyperTransport-based system architecture, although many of those improvements will have to wait for platform and chipset updates. The most urgent overarching goal, though, was undoubtedly restoring AMD's competitive position compared to Intel's Xeons based on the formidable Core microarchitecture.
The nuts and bolts of the quad-core Opterons
AMD continues its tradition with these new Opterons of making them drop-in replacements for the existing infrastructure. In this case, that infrastructure involves Socket F-class servers and workstations. With only a BIOS update, these systems can move from dual-core to quad, without need for a change in motherboards, cooling solutions, or power supplies. Not a bad proposition at all. That upgrade proposition does come with a caveat, though: older motherboards that don't support Barcelona's split power planes will suffer a performance hit with certain Opteron 2300 models. For example, the Opteron 2350's default memory controller clock is 1.8GHz. Without separate voltage domains, though, the 2350's memory controller drops to 1.6GHz. That matters quite a bit more than you might think, in part because the L3 cache uses the same clock.
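To size up that caveat, here's a small sketch that computes how much clock speed the memory controller and L3 cache give up on an older board. It uses only the Opteron 2350 figures mentioned above; the function name is hypothetical, and a drop in clock speed isn't the same thing as a drop in delivered performance.

```python
def clock_drop_percent(default_ghz, fallback_ghz):
    """Percentage reduction in clock speed when falling back to the lower clock."""
    return (default_ghz - fallback_ghz) / default_ghz * 100.0


# Opteron 2350: 1.8GHz with split power planes, 1.6GHz without.
drop = clock_drop_percent(1.8, 1.6)
print(f"{drop:.1f}% lower memory controller/L3 clock")
```

By that math, the 2350's memory controller and L3 cache run about 11% slower on a board without split power planes.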
AMD is introducing another innovation of sorts with Barcelona in the form of a new power rating, dubbed ACP for "average CPU power." Differences in describing a processor's maximum power and thermal envelope, known as Thermal Design Power, have long been a source of contention between Intel and AMD. For ages, AMD has argued that its TDP ratings are an absolute maximum while Intel's are something less than that, and, hey, not fair! At the same time, AMD hasn't had the same class of dynamic thermal throttling that Intel's chips have, so it's had to make do with more conservative estimates. The problem, according to AMD, is that its numbers were being compared directly to Intel's, which could be misleading, particularly since its processors incorporate a north bridge, as well.
At long last, AMD is looking to sidestep this issue by creating a new power rating for its CPUs. Despite the name, ACP is not so much about "average" power use but about power use during high-utilization workloads. AMD has a methodology for defining a processor's ACP that involves real-world testing with such workloads, and the company will apparently be using ACP as the primary way to talk about its CPUs' power use going forward, though it will still disclose max power, as well. To give you a sense of things, standard Opterons with a 95W max power rating will have a 75W ACP. This move may be controversial, but personally, I think it's probably justifiable given the power draw profiles we've seen from Opterons. I'm not especially excited about it one way or another since we spend hours measuring CPU power use around here. We'll show you numbers shortly, and you can decide what to think about them.
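If you'd like to see how far apart the two ratings sit, the arithmetic is simple enough to sketch. This uses only the 95W max power and 75W ACP figures quoted above for standard Opterons; the helper names are mine, and nothing here reflects AMD's actual methodology for deriving ACP.

```python
def acp_fraction(acp_watts, max_watts):
    """ACP expressed as a fraction of the max power rating."""
    return acp_watts / max_watts


def rating_gap_watts(acp_watts, max_watts):
    """Difference in watts between the max power rating and ACP."""
    return max_watts - acp_watts


# Standard Opterons, per the text: 95W max power, 75W ACP.
print(round(acp_fraction(75, 95), 3))
print(rating_gap_watts(75, 95))
```

For the standard parts, ACP comes in at a little under 79% of the max power rating, a gap of 20W.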
Now that you know what ACP means, here's a look at the initial Opteron 2300 lineup, complete with ACP and TDP numbers for each part.
|Model|Core clock|Memory controller/L3 clock|ACP|TDP|Price|
|---|---|---|---|---|---|
|Opteron 2347 HE|1.9GHz|1.6GHz|55W|68W|$377|
|Opteron 2346 HE|1.8GHz|1.6GHz|55W|68W|$255|
|Opteron 2344 HE|1.7GHz|1.4GHz|55W|68W|$209|
These chips fit into the same basic power envelopes as current Opterons, obviously, and AMD continues to offer HE models with higher power efficiency for a slight price premium. These first chips run at rather low clock frequencies, with even lower memory controller/L3 cache speeds. Fortunately, AMD does plan to ramp up clock speeds. To demonstrate that, they shipped us a pair of 2.5GHz Barcelona engineering samples at the eleventh hour, which they later christened as the Opteron 2360 SE. These higher-frequency products won't be available until some time in the fourth quarter of this year, but we can give you a preview of their performance today. Naturally, we've run them through our full gamut of tests, along with a pair of 2GHz Opteron 2350s. We also have a pair of Opteron 2347 HEs, but we've had to defer that review to another day due to time constraints.