In conjunction with today's announcement of the Opteron 4100 series of CPUs, AMD has also raised the curtains on a pair of FireStream cards. The FireStream 9370 and 9350 are both based on the Cypress GPU first deployed in the Radeon HD 5870 graphics card, but they're geared specifically for use in servers as high-performance GPU computing engines.
The FireStream 9370, for instance, fits into a PCI Express x16 slot and has a total of 4GB of GDDR5 memory onboard. It has only a single DisplayPort output, and that may go unused in most cases. New in this generation of FireStreams is a wholesale move to passive coolers, to allow for tighter integration into servers. The cards will rely on chassis-based airflow to keep them cool. Like a high-end graphics card, the 9370's cooler occupies the width of two expansion slots.
The adoption of a Cypress GPU gives the 9370 more than double the peak throughput of the 9270 it replaces, with a single-precision floating-point peak of 2.64 teraflops and a double-precision peak of 528 gigaflops. At least as importantly, Cypress has a handful of new compute-specific features built in, including higher precision computation, better support of atomic operations, and improved data sharing and thread synchronization. We've covered these features in more detail here. The upshot is a GPU that's well suited to the requirements of OpenCL and other emerging tools for GPU computing.
The 9370 is very much a premium product, with a power requirement of "under 225W," according to AMD, and a suggested price just one dollar shy of two grand.
The FireStream 9350 may prove to be more popular; it's a scaled-down variant with 2GB of GDDR5 memory, a single-slot passive cooler, and a power requirement of "under 150W." Lower clock speeds yield a peak SP arithmetic rate of 2.0 teraflops and a peak DP rate of 400 gigaflops. The price is nicer, too, at $799.
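To put some numbers behind that hunch, here is a minimal Python sketch that works out peak single-precision throughput per dollar and per watt from the figures AMD has quoted. The `Card` class and helper names are ours, purely for illustration. The $1,999 price is our reading of "one dollar shy of two grand," and because AMD gives power only as a ceiling ("under 225W," "under 150W"), the per-watt results are lower bounds on peak efficiency, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Quoted specs for one FireStream card (illustrative container, not an AMD API)."""
    name: str
    sp_gflops: float   # peak single-precision rate, gigaflops
    dp_gflops: float   # peak double-precision rate, gigaflops
    max_watts: float   # quoted power ceiling ("under N W")
    price_usd: float   # suggested price


CARDS = [
    Card("FireStream 9370", 2640.0, 528.0, 225.0, 1999.0),
    Card("FireStream 9350", 2000.0, 400.0, 150.0, 799.0),
]


def gflops_per_dollar(card: Card) -> float:
    """Peak single-precision gigaflops per dollar of suggested price."""
    return card.sp_gflops / card.price_usd


def min_gflops_per_watt(card: Card) -> float:
    """Lower bound on peak SP gigaflops per watt, using the quoted power ceiling."""
    return card.sp_gflops / card.max_watts


if __name__ == "__main__":
    for card in CARDS:
        print(
            f"{card.name}: {gflops_per_dollar(card):.2f} SP GFLOPS per dollar, "
            f"at least {min_gflops_per_watt(card):.1f} SP GFLOPS per watt"
        )
```

By these peak numbers, the 9350 delivers roughly 2.5 gigaflops per dollar to the 9370's 1.3, and it holds a modest edge per watt as well. Peak rates say nothing about delivered performance, of course, and the 9370's larger 4GB memory may matter more than either ratio for some workloads.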
AMD expects both products to be available in the third quarter of this year, "a few weeks to a couple of months from now," according to Patti Harrell, AMD's Director of Stream Computing.
The system pictured above will give you a sense of the sort of compute density upcoming integrated FireStream-based solutions could provide. This 1U system from Supermicro packs in two FireStream 9370 compute accelerators, dual Opteron 6100-series (12-core) CPUs, 16 DDR3 DIMM slots with (if our math is right) eight memory channels, and three hot-swappable SATA/SAS drive bays. The power supply is rated for a staggering 1.4kW. In the back is room for a low-profile PCIe x8 add-on card, possibly an InfiniBand interconnect. The GPUs alone should reach over 5.2 teraflops.
Other vendors are working on larger systems with four FireStream 9370s or eight 9350s; an eight-way 9350 system would peak at over 16 teraflops.
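The aggregate figures for these multi-GPU boxes are simple multiplication, which the short Python sketch below spells out. It assumes peak rates scale linearly with card count and ignores any host or interconnect limits; the function and dictionary names are our own.

```python
# Peak single-precision rates in teraflops, as quoted by AMD for each card.
PEAK_SP_TFLOPS = {
    "FireStream 9370": 2.64,
    "FireStream 9350": 2.0,
}


def aggregate_peak_tflops(card: str, count: int) -> float:
    """Combined peak SP teraflops for `count` identical cards (linear scaling assumed)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return PEAK_SP_TFLOPS[card] * count


if __name__ == "__main__":
    for card, count in [
        ("FireStream 9370", 2),  # the 1U system described above
        ("FireStream 9370", 4),
        ("FireStream 9350", 8),
    ]:
        total = aggregate_peak_tflops(card, count)
        print(f"{count} x {card}: {total:.2f} peak SP teraflops")
```

With the rounded 2.0-teraflop figure, eight 9350s come out at exactly 16 teraflops; the "over 16" claim presumably reflects an unrounded per-card rate slightly above 2.0, though that is our inference.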
The most obvious competitors for the FireStream 9300 series are Nvidia Tesla cards based on the Fermi GPU architecture. AMD cites the Tesla C/M2000-series as a point of comparison, noting that the Tesla tops out at one teraflop for single-precision math and 514 gigaflops for double-precision. Both figures are below the FireStream 9370's peak rates. The Teslas require power from an 8-pin connector and a 6-pin one, while the 9370 needs only dual 6-pin power inputs. This comparison won't be unfamiliar to those who know the desktop graphics cards based on these same GPUs. Nvidia has a much larger chip but no real edge in performance.
In the GPU computing space, though, Nvidia has some noteworthy advantages, including ECC protection throughout its memory hierarchy, a true L2 cache, and what is probably a more robust set of development tools than those produced so far by AMD's Stream initiative. ECC support will likely open doors for Nvidia in large supercomputing clusters that AMD can't yet open, and the Fermi architecture's additional computing features could allow it to achieve higher performance and superior efficiency when running certain algorithms.
When we asked Harrell how she would address the tricky question of ECC support with potential customers, she said that AMD tests both software and hardware reliability, down to neutron beam testing in its labs. Those tests have revealed that the memory interface is the most vulnerable point in the system, and GDDR5's error correction adds a measure of protection at that point.
On the topic of software development tools for Stream-capable GPUs, Harrell sounded more confident than in our past forays into this area. She claimed AMD has a substantial investment in tools with third parties now, and she expects it to bear fruit in the form of product announcements later this year.