The recent advent of Intel's "Nehalem" Xeons had a bit of an apocalyptic feeling to it, when one considered the implications for AMD. Despite strong showings from the past few generations of Xeons and some unfortunate problems for the first quad-core Opterons, Intel never really seemed to open up an insurmountable lead in the two-socket server and workstation spaces. The Opteron's power efficiency was consistently strong, at least, and its outright performance wasn't too far behind the curve. The Nehalem-based Xeons, though, reached dizzying new performance heights with comparatively modest power consumption. One was left to wonder how on earth AMD would respond.
Now we have an answer, and it's an interesting one, to say the least. The newest Opteron, code-named Istanbul, packs not four but six cores on a single die, giving it a considerable boost in performance potential. Not only that, but it's hitting the market early. AMD had originally planned to introduce this product in the October time frame, but the first spin of Istanbul silicon came back solid, so the firm pulled the launch forward into June. Even with the accelerated schedule, of course, Istanbul comes not a moment too soon, now that Nehalem Xeons are out in the wild. We've had a pair of Istanbul chips humming away in our labs for the past week. Let's have a look at whether they can restore the Opteron's competitiveness.
The hexapod cometh
In the wake of Intel's introduction of a radically new platform, AMD is emphasizing buttoned-down continuity for its new Opterons. In fact, this continuity may be Istanbul's defining feature. Istanbul is essentially a quad-core "Shanghai" processor with two additional cores added to the die. It is compatible with the existing Socket F infrastructure, so it's an easy drop-in upgrade for existing servers. So long as your Socket F motherboard supports dual power planes, all that's required for an Istanbul upgrade is a quick BIOS flash and a chip swap. (In fact, that's exactly how we prepared our test system for this review.) To get the six-core chips to fit into existing power envelopes, AMD has dialed back clock frequencies slightly, which is why the company cites a general performance boost of around 30% when going from a Shanghai Opteron to an Istanbul, rather than the 50% the core count alone might suggest. That figure depends, of course, on the workload.
Although AMD expresses hope of in-place server upgrades becoming a healthy portion of its business in a down economy, the more likely payoff for Istanbul is with AMD's largest customers: system vendors, who ought to be able to refresh their Opteron-based product lineups with relatively minimal validation efforts. In fact, I'd expect to see quite a few vendors unveiling Istanbul-based systems in the coming weeks, starting today, even though they've just introduced new Xeon-based offerings, as well.
Despite all of this sleepy talk about continuity, Istanbul does have a few new tricks up its sleeve. For one thing, the north bridge and HyperTransport clocks in Istanbul are decoupled, so higher HyperTransport frequencies are possible. The Opterons introduced today all have a HyperTransport clock of 2.4GHz, resulting in a 4.8 GT/s transaction rate. The north bridge clock, which also governs the speed of the L3 cache, runs at 2.2GHz.
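The relationship between the HyperTransport clock and the quoted transfer rate is simple arithmetic, since the link moves data on both clock edges. The short sketch below works it through. The 16-bit link width is our assumption for illustration and is not stated above, and the function names are hypothetical.

```python
def ht_transfer_rate_gt(clock_ghz: float, transfers_per_clock: int = 2) -> float:
    """Transfers per second (in GT/s) for a double-pumped link."""
    return clock_ghz * transfers_per_clock


def link_bandwidth_gbs(rate_gt: float, link_width_bits: int = 16) -> float:
    """Peak one-way bandwidth in GB/s. The 16-bit width is an assumption."""
    return rate_gt * link_width_bits / 8


rate = ht_transfer_rate_gt(2.4)      # 2.4GHz HyperTransport clock from the text
bandwidth = link_bandwidth_gbs(rate)  # per direction, under the width assumption

print(f"{rate:.1f} GT/s, {bandwidth:.1f} GB/s per direction")
```

These are peak figures only. Real-world throughput will be lower once protocol overhead and probe traffic are counted.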
The most notable change, though, is probably the addition of a feature AMD calls HT Assist. HT Assist is essentially a probe filter intended to reduce the overhead required for the synchronization of cached data across CPUs in multiple sockets. HT Assist reserves space in each processor's L3 cache, in which it stores an index of where that CPU's cache lines are being used system-wide. The CPU then becomes "host" of the cache lines stored in its directory. If any CPU needs an update about a particular cache line, it will often know which CPU is the correct host to probe for that information. AMD says HT Assist can replace broadcast probe requests (sent to all sockets) with directed requests in 8 of 11 typical CPU-to-CPU transactions. This reduction in probe traffic can yield big gains in available system bandwidth, as we reported when we saw AMD demo a 4P system whose Stream bandwidth increased from roughly 25GB/s to 42GB/s with the addition of Istanbul processors with HT Assist.
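To make the probe-filter idea concrete, here is a toy model of a directory kept by a cache line's host CPU. It is only a sketch of the general technique, not AMD's actual coherence protocol: the class name, its methods, and the simple "set of holders per line" bookkeeping are all our own illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HostNode:
    """Toy host for a range of cache lines (hypothetical, for illustration)."""
    n_sockets: int
    use_directory: bool = True
    # line address -> sockets believed to hold a copy of that line
    directory: Dict[int, Set[int]] = field(default_factory=dict)

    def record_fill(self, line_addr: int, socket: int) -> None:
        """Note that `socket` has cached the given line."""
        self.directory.setdefault(line_addr, set()).add(socket)

    def probe_targets(self, line_addr: int, requester: int) -> List[int]:
        """Sockets that must be probed when `requester` asks for the line."""
        if not self.use_directory:
            # No filter: broadcast to every socket except the requester.
            return [s for s in range(self.n_sockets) if s != requester]
        # With the filter: probe only the sockets recorded as holders.
        holders = self.directory.get(line_addr, set())
        return sorted(s for s in holders if s != requester)


filtered = HostNode(n_sockets=4)
filtered.record_fill(0x40, socket=2)
broadcast = HostNode(n_sockets=4, use_directory=False)

print(filtered.probe_targets(0x40, requester=1))   # directed probe
print(broadcast.probe_targets(0x40, requester=1))  # broadcast probes
```

In this toy 4P setup, the directed request touches one socket where the broadcast touches three. That gap is the kind of saved probe traffic that frees up link bandwidth.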
Back then, AMD talked of user-configurable HT Assist index sizes that could be set in the BIOS. Since that time, the firm has instead settled on a static index size of 1MB, which it considers the best tradeoff between cache size and index granularity. To keep things simple, including Istanbul validation for system vendors, the index size will not be user-configurable. AMD has also decided not to enable HT Assist by default on 2P systems, because the reduction in probe traffic on a 2P box isn't worth the loss of 1MB of L3 cache per processor. For what it's worth, our 2P SuperMicro H8DMU+ motherboard does expose a BIOS option to enable this feature, and we found that enabling it produced no appreciable increase in Stream bandwidth.
Like Shanghai before it, Istanbul is produced by GlobalFoundries on its 45nm SOI fabrication process. Istanbul weighs in at 904 million transistors, and its six-core die is 346 mm². Compare that to Shanghai, which is 758 million transistors and 258 mm². Although its core count is up by half, from four to six, Istanbul isn't 50% larger by either measure, because a 6MB L3 cache occupies a large portion of both chips. Intel's Nehalem Xeons, of course, are also 45nm chips, and have dimensions very similar to Shanghai, with roughly 731 million transistors in a 263 mm² die. In other words, even if AMD does match Nehalem with Istanbul, it will be doing so with a considerably larger chip.
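For the curious, the growth from Shanghai to Istanbul can be worked out directly from the transistor counts and die areas quoted above. The snippet below is just that arithmetic, using a hypothetical helper name.

```python
def pct_increase(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


# Figures quoted in the text: transistors in millions, die area in mm².
core_growth = pct_increase(6, 4)            # Shanghai -> Istanbul core count
transistor_growth = pct_increase(904, 758)  # Shanghai -> Istanbul transistors
area_growth = pct_increase(346, 258)        # Shanghai -> Istanbul die area
area_vs_nehalem = pct_increase(346, 263)    # Nehalem -> Istanbul die area

print(f"cores +{core_growth:.0f}%, transistors +{transistor_growth:.0f}%, "
      f"area +{area_growth:.0f}%, area vs. Nehalem +{area_vs_nehalem:.0f}%")
```

The result is roughly 19% more transistors and 34% more die area for 50% more cores, with Istanbul's die coming out nearly a third larger than Nehalem's.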
The comparison to Nehalem is instructive for many reasons, not least of which is the very different approaches AMD and Intel have taken with their latest CPU architectures. From a certain way of looking at things, they reach similar destinations by different paths. Istanbul, of course, has six execution cores, each of which can issue three instructions per clock. Nehalem has four cores, but they are true four-issue cores, capable of issuing, executing, and retiring four instructions per clock. Chip-wide, then, Istanbul can issue 18 instructions per clock, while Nehalem can issue 16, which is closer than one might think when just considering core counts. Also, thanks to simultaneous multithreading, Nehalem can track eight hardware threads, to Istanbul's six, for greater thread-level parallelism. Perhaps most decisive for many of today's workloads is the fact that Nehalem has three channels of DDR3 memory per socket, versus Istanbul's two channels of DDR2. Despite its larger die size and higher core count, Istanbul isn't necessarily far and away superior to Nehalem, even in theory.
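The chip-wide totals fall straight out of the per-core figures, as this small tally shows. The `Chip` record and its field names are our own shorthand. These are peak, on-paper numbers and say nothing about sustained throughput.

```python
from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    cores: int
    issue_width: int       # instructions issued per core per clock
    threads_per_core: int  # 2 with simultaneous multithreading
    memory_channels: int   # per socket


def chip_issue_width(chip: Chip) -> int:
    """Peak instructions issued per clock across the whole chip."""
    return chip.cores * chip.issue_width


def hardware_threads(chip: Chip) -> int:
    """Hardware threads the chip can track at once."""
    return chip.cores * chip.threads_per_core


istanbul = Chip("Istanbul", cores=6, issue_width=3, threads_per_core=1, memory_channels=2)
nehalem = Chip("Nehalem", cores=4, issue_width=4, threads_per_core=2, memory_channels=3)

for chip in (istanbul, nehalem):
    print(chip.name, chip_issue_width(chip), hardware_threads(chip), chip.memory_channels)
```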
That's the match-up in the 2P space, but 4P and larger servers may be more hospitable ground for the time being. The Xeon 7400-series processors, better known as Dunnington, have six cores, but are based on Intel's older microarchitecture. AMD expects Istanbul to give it a clear lead in this space, at least until Nehalem-EX arrives later this year with eight native cores and four memory channels per socket.