AUSTIN, TEXAS — With Intel’s Nehalem-based Xeons gathering like a storm on the horizon, AMD today gave the first working demonstration of its potential counterpunch: a six-core Opteron processor code-named “Istanbul.” Istanbul is a fairly straightforward upgrade over current “Shanghai” Opterons: a 45nm processor with 6MB of L3 cache that fits into Socket F-style motherboards, only with six cores rather than four. As a result, the upcoming Istanbul-based Opterons will serve as drop-in upgrades for existing Socket F systems. The chips will take advantage of the same 2P, 4P, and 8P infrastructure as today’s Opterons, with HyperTransport and two channels of DDR2 memory per socket.
AMD has previously stated that Istanbul processors will become available in the second half of this year, and the firm hasn’t yet provided any more specific guidance about when to expect Istanbul-based systems. However, the presence of working silicon would seem to indicate that Istanbul Opterons could be introduced much earlier in that broad “second half” time-frame than originally anticipated.
AMD showed us several demonstrations of Istanbul silicon in action. The first was a simple showing of Task Manager on the Windows Server 2008 desktop, in which the utility showed activity indicators for each of the 24 cores in a quad-socket system.
Simple, yet impressive for what it indicated. The second demo was conducted on a dual-socket system with 12 cores. The main OS was Windows Server 2008, but the system also hosted three separate virtual machines: one each for Windows Server 2003, Red Hat Linux, and SLES 11 x64. Each VM had four cores dedicated to it.
The third demo was the most interesting for a couple of reasons. First, it was intended to show how Istanbul can serve as a drop-in upgrade for Socket F systems. The only requirements: the system must support split power planes, and it must have a BIOS upgrade to operate with the new processors. Second, it included a performance test. Two otherwise-identical systems were situated side by side: one with a quartet of Shanghai Opterons, the other with four Istanbul chips. Both systems were running with HyperTransport 3 active, a capability coming soon to Shanghai Opterons but not yet available in current products. To illustrate the performance difference between the two boxes, the AMD tech ran a Stream benchmark. The 16-core Shanghai system produced throughput numbers in the range of 25,000 MB/s. The 24-core Istanbul box, by contrast, hit about 42,000 MB/s. The tech then swapped the processor-and-memory daughtercards between the two boxes, and of course, the performance characteristics moved with them.
That’s one heck of an in-place upgrade, but the bigger question may be: Why the huge performance gain with the addition of more cores, given that Stream is typically considered, at least partially, a bandwidth-bound benchmark? And why the magnitude of the gain, with only 50% more cores and (although they were not disclosed) likely lower per-core clock frequencies for Istanbul?
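To put numbers on that question, the short Python sketch below works through the arithmetic using only the approximate figures quoted in the demo. The variable and function names are our own, and the per-core values are simple division, not anything AMD reported.

```python
# Approximate figures as quoted in the demo (quad-socket systems).
shanghai = {"cores": 16, "stream_mb_s": 25_000}
istanbul = {"cores": 24, "stream_mb_s": 42_000}


def relative_gain(new: float, old: float) -> float:
    """Fractional increase of `new` over `old` (0.5 means +50%)."""
    return new / old - 1.0


core_gain = relative_gain(istanbul["cores"], shanghai["cores"])
throughput_gain = relative_gain(istanbul["stream_mb_s"], shanghai["stream_mb_s"])

# Throughput divided evenly across cores; a rough yardstick only,
# since Stream is largely limited by memory and interconnect bandwidth.
per_core_shanghai = shanghai["stream_mb_s"] / shanghai["cores"]
per_core_istanbul = istanbul["stream_mb_s"] / istanbul["cores"]

print(f"Core count gain:      {core_gain:.0%}")
print(f"Stream gain:          {throughput_gain:.0%}")
print(f"Per-core (Shanghai):  {per_core_shanghai:.1f} MB/s")
print(f"Per-core (Istanbul):  {per_core_istanbul:.1f} MB/s")
```

In other words, throughput rose by roughly 68% on 50% more cores, so the quoted figures imply more bandwidth per core on Istanbul even though the memory configuration is nominally the same.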
Part of the answer, it seems, may be a feature new to Istanbul that AMD calls HT assist (presumably for HyperTransport assist). The company describes the feature as a probe filter (it may more commonly be called a snoop filter). It reduces traffic on socket-to-socket HyperTransport links by storing an index of all caches and preventing unnecessary coherency synchronization requests. Current Opteron systems use a broadcast-based probe protocol, sending probe requests to all sockets. Istanbul, instead, either knows that no probes are required or is able to do a directed probe to a single socket, although it may still use broadcasts in certain specific situations. Istanbul’s probe filter stores its data in the processor’s L3 cache. The amount of cache space dedicated to probe filter storage, AMD says, will be configurable in the BIOS, and the more space dedicated to probe filter storage, the more granular its operation will be.
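The difference between broadcast probes and filtered probes can be illustrated with a toy model. The Python sketch below is our own simplification, not AMD's implementation: the class name `ToyCoherencyModel`, the single-holder-per-line bookkeeping, and the access pattern are all assumptions made for illustration. It also ignores the filter's limited capacity and the cases where Istanbul would still fall back to a broadcast.

```python
from typing import Dict


class ToyCoherencyModel:
    """Counts probe messages sent between sockets under two policies.

    Assumption: each cache line is tracked as held by at most one socket,
    which is far simpler than a real coherency protocol.
    """

    def __init__(self, sockets: int, use_filter: bool) -> None:
        self.sockets = sockets
        self.use_filter = use_filter
        self.holder: Dict[int, int] = {}  # cache line -> socket holding it
        self.probes_sent = 0

    def access(self, requester: int, line: int) -> None:
        if self.use_filter:
            current = self.holder.get(line)
            if current is not None and current != requester:
                # The filter knows exactly who holds the line: one directed probe.
                self.probes_sent += 1
            # Otherwise the filter knows no other socket holds it: no probe.
        else:
            # Broadcast protocol: ask every other socket on every access.
            self.probes_sent += self.sockets - 1
        self.holder[line] = requester


def run(use_filter: bool, sockets: int = 4, lines_per_socket: int = 100) -> int:
    model = ToyCoherencyModel(sockets, use_filter)
    # Each socket streams through its own private block of lines.
    for s in range(sockets):
        for line in range(s * lines_per_socket, (s + 1) * lines_per_socket):
            model.access(s, line)
    # One genuinely shared line, touched by socket 0 and then socket 1.
    model.access(0, 9999)
    model.access(1, 9999)
    return model.probes_sent


print("broadcast probes:", run(use_filter=False))
print("filtered probes: ", run(use_filter=True))
```

With mostly private data, as in this made-up pattern, the broadcast policy sends three probes per access while the filtered policy sends a probe only for the one shared line. Real workloads and real hardware will land somewhere in between.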
AMD didn’t quantify the exact performance impact of HT assist for us, but the quad-socket Stream test may have been an extreme case. The probe filter capability will be unique to the six-core Istanbul and will not be incorporated in a future revision of Shanghai. However, with fewer cores, Shanghai Opterons will be able to reach higher clock speeds within the same power envelopes as Istanbul, and AMD expects the clock frequency advantage to somewhat offset the lack of a probe filter. Regardless, one would expect Istanbul to be especially popular in systems with four or more sockets, where coherency traffic is a thornier issue and HyperTransport bandwidth is at a premium.