ATI’s stream computing kickoff
Last Friday, ATI invited a number of journalists and analysts to a short but information-packed event devoted to its new stream computing initiative. ATI is using the phrase "stream computing" for the class of applications more commonly grouped under the GPGPU label, an acronym for general-purpose computing on a graphics processing unit. CEO Dave Orton explained that ATI chose the term because the computing problems the GPU handles well are primarily about data flow, a characteristic that separates them from the types of computation at which CPUs have traditionally excelled.
Orton identified a number of specific areas where ATI sees opportunities for GPUs to accelerate computation, including medical research, analysis of video and audio data for security applications (such as facial recognition), financial analysis, seismic modeling for oil and gas exploration, media search applications, physics simulations in video games, and media encoding.
In these areas, he said, the GPU has the potential to be “orders of magnitude” faster than CPUs due to its nature as a highly parallel floating-point processor. Orton pegged the floating-point power of today’s top Radeon GPUs with 48 pixel shader processors at about 375 gigaflops, with 64 GB/s of memory bandwidth. The next generation, he said, could potentially have 96 shader processors and will exceed half a teraflop of computing power.
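Orton's numbers hang together if you assume per-shader throughput stays roughly constant between generations. A quick back-of-the-envelope check, using only the figures from the talk:

```python
# Sanity check on Orton's scaling claim, using the numbers he gave:
# 375 gigaflops from 48 pixel shader processors today, 96 shaders next gen.
current_gflops = 375.0
current_shaders = 48
per_shader = current_gflops / current_shaders   # ~7.8 GFLOPS per shader

# Assumes similar clock speeds and shader design in the next generation.
next_gen_gflops = per_shader * 96
print(round(per_shader, 1), round(next_gen_gflops))  # 7.8 750
```

A straight doubling lands at 750 gigaflops, comfortably past the half-teraflop mark Orton cited.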
Orton was quick to emphasize that ATI is not looking to compete directly with CPUs, just to find and address a set of problems that map especially well to the GPU. He described the CPU-GPU relationship as complementary and symbiotic. He also made it clear that the day’s events were not part of a new product launch. ATI is just inaugurating a new direction in seeking out this business, he said, and showcasing some actual applications where the GPU has been fruitfully applied.
Much of the rest of the event was devoted to speakers who had actual stream computing applications to discuss or demo.
First among them was Vijay Pande of Stanford University, Professor of Chemistry and Director of the Folding@home project. TR readers should be very much familiar with Folding, since we field one of the top ten Folding teams in the world. Pande was there to talk about the new beta Folding client that uses the GPU. Currently, it only runs on newer Radeons, where it shows big performance increases, between 20 and 40 times the speed of a CPU. Pande said the client is presently achieving around 100 gigaflops per GPU. To give some perspective, he then demonstrated the graphical versions of the CPU and GPU clients side by side; the GPU version showed constant motion, while the CPU one chunked along at a few frames per second.
This particular implementation of stream computing has now gone live. The FAH project released the first beta of the client to the public earlier this week.
I talked with Pande about the possibility of a Folding client for Nvidia GPUs, and he had some interesting things to say. The Folding team has obviously been working with Nvidia, as well as ATI. In fact, Pande said Nvidia has their code and is running it internally. At present, though, ATI's GPUs are about eight times as fast as Nvidia's. He was hopeful Nvidia could close that gap, but noted that even a 4X gap is pretty large, and ATI is getting faster all of the time.
The bottom line for Pande and his colleagues, of course, is how Folding on a GPU can further research about diseases like Parkinson’s and Alzheimer’s. Pande characterized the move to GPU Folding as one that opens new possibilities.
Next up was Michael Mullany, VP of Marketing for PeakStream. This brand-new company has built a set of software tools to serve the high-performance computing (HPC) market, which is where big, high-margin players like oil and gas companies, automakers, and aerospace firms reside. PeakStream believes GPUs can bring strong outright performance, solid performance per watt, and good performance per square foot of space in the data center.
To capitalize on that opportunity, PeakStream’s software platform plugs into standard development tools like gcc and the Intel compilers to allow applications nearly transparent access to GPU computational power. PeakStream’s profiler determines whether the code being executed is a good fit for a particular type of processor, and their virtual machine provides a layer of abstraction from the execution hardware. Code that’s been profiled and fed into the VM may wind up being executed on an x86 processor, Sony’s Cell processor, or a GPU, depending on its needs.
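The profile-then-dispatch flow Mullany described can be sketched in miniature. To be clear, none of the names below come from PeakStream's actual API, which the company did not detail; this is just an illustration of the architecture as described, with a hypothetical `profile` step and backend table:

```python
# Hypothetical sketch of PeakStream's profile-and-dispatch idea.
# All names here are invented for illustration, not PeakStream's API.

def profile(kernel):
    """Stand-in for the profiler: call a kernel stream-friendly when it
    applies the same operation across many independent data elements."""
    return "stream" if kernel["independent_elements"] > 10_000 else "serial"

def dispatch(kernel, data, backends):
    """Stand-in for the virtual machine: route stream-friendly work to
    the GPU backend, everything else to the host x86 processor."""
    target = "gpu" if profile(kernel) == "stream" and "gpu" in backends else "x86"
    return backends[target](data)

backends = {
    "gpu": lambda xs: [x * x for x in xs],  # pretend this runs on the GPU
    "x86": lambda xs: [x * x for x in xs],  # fallback path on the CPU
}
kernel = {"independent_elements": 1_000_000}
print(dispatch(kernel, [1, 2, 3], backends))  # [1, 4, 9]
```

The point of the abstraction layer is exactly this: the application never chooses the hardware, so the same code could land on x86, Cell, or a GPU.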
Mullany showed a demo that PeakStream developed while working with Hess, a large U.S. oil and gas producer, on a seismic analysis algorithm. This algorithm analyzes the echoes created by controlled explosions on the surface of the earth in order to determine the shape of the rock layers and other features beneath the ground. The analysis ran about 15 times as fast with the GPU as it did on the CPU alone, which Mullany explained would allow for new levels of resolution or new types of analysis.
Mullany said PeakStream is working with customers on a range of applications, from financial firms wishing to price derivatives to academics simulating fluid dynamics. In one instance, he said, PeakStream stepped into a project where a defense contractor was using a GPU to do signal processing in a mobile application. With its software, PeakStream was able to deliver a five-fold performance improvement.
That example perhaps best illustrates the potential value of PeakStream’s product. ATI has documented some of the workings of its GPUs for developers to use, but doesn’t really provide a robust set of tools that will allow developers to write programs in high-level languages and then compile them for the GPU. Partners like PeakStream will be very important if ATI is to make its stream computing push a success.
Microsoft pledges support
Speaking of important partners, Microsoft sent a rep to the event, as well. Chas Boyd, an Architect in Microsoft’s Graphics Platform Unit, spoke briefly about Microsoft’s support for non-traditional uses of GPUs in Windows. Boyd showed off a Windows Vista image editor that handles image processing operations on the GPU rather than the CPU, making photo editing a much quicker task. He also talked about using GPUs to handle graphical problems in a non-graphical way.
You’ll note that the demo scene above has lots of dense grass in it. This kind of detailed vegetation can cause problems for renderers, because determining which blade of grass is in front of the others is notoriously difficult. Boyd said that by using a prefix sort algorithm running on the GPU, this app is able to determine quickly and correctly the proper polygon depths and render the image correctly. The result is higher image quality, but it comes by using the GPU as a general-purpose processor.
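Boyd didn't spell out the algorithm's details, but the underlying problem is easy to illustrate: overlapping translucent fragments must be composited in depth order for the image to come out right. A minimal sketch of that ordering step (this is a generic painter's-algorithm illustration, not Microsoft's actual prefix-sort implementation):

```python
# Illustrative only: overlapping grass fragments at one pixel must be
# ordered by depth before compositing, or blades render in the wrong order.
fragments = [
    {"depth": 0.8, "color": "dark_green"},   # far blade
    {"depth": 0.2, "color": "light_green"},  # near blade
    {"depth": 0.5, "color": "green"},        # middle blade
]

# Composite farthest-first so nearer blades end up drawn on top.
ordered = sorted(fragments, key=lambda f: f["depth"], reverse=True)
print([f["color"] for f in ordered])  # ['dark_green', 'green', 'light_green']
```

Doing this sort per pixel for a field of grass is exactly the kind of heavily parallel, non-graphical workload that suits a GPU.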
Boyd said more parts of Microsoft are becoming engaged with GPUs as these sorts of uses expand. The entire Vista Aero user interface now runs on a GPU, and he noted that physics interactions, particles, fluids, and the like are being mapped successfully to GPUs using DirectX. Over time, he claimed, Microsoft will be evolving the DirectX API to facilitate such things, from DirectX 10 forward.
Boyd’s talk of physics being successfully mapped to the GPU using DirectX was surely a reference to Havok’s GPU-based physics engine, Havok FX. Jeff Yates, Havok’s VP of Product Management, followed Boyd on stage with a demo of that physics engine. Havok has shown demos of basic rigid-body physics acceleration running on GPUs in the past, but Yates also showed off a nice demo of cloth or fabric, which tends to require more computing power.
Then he produced a real surprise: Havok FX with "gameplay physics" (that is, physics interactions that affect gameplay rather than just serving as eye candy) running on the GPU. I wasn't even aware they had truly interactive GPU-based physics in the works, but here was a working demo.
The demo game, Brick War, is based on a simple premise. Each side has a castle made out of Lego-like snap-together bricks, and the goal is to knock down all of the soldiers in the other guy's castle by hurling cannonballs into it.
The game includes 13,500 objects, with full rigid-body dynamics for each. Havok had the demo running on a dual-GPU system, with graphics being handled by one GPU and physics by the other.
As the player fired cannonballs into his opponent’s castle, the bricks broke apart and portions of the structure crumbled to the ground realistically. Yates pointed out that the GPU-based physics simulation in Brick War is fully interactive, with the collision detection driving the rest of the rigid-body dynamics and also driving sound in the game.
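The feedback loop Yates described, with collision detection driving both the rigid-body response and the sound, can be sketched in a few lines. This is a toy single-body illustration of the concept, not Havok FX code:

```python
# Toy sketch of collision detection feeding back into gameplay, as in
# Brick War: a falling brick hits the ground, stops, and triggers a sound.
dt, gravity = 0.016, -9.8     # ~60 Hz timestep, gravity in m/s^2
y, vy = 1.0, 0.0              # brick starts 1 m up, at rest
events = []

for _ in range(200):
    vy += gravity * dt        # integrate velocity
    y += vy * dt              # integrate position
    if y <= 0.0:              # collision detected against the ground
        y, vy = 0.0, 0.0      # rigid-body response: come to rest
        events.append("thud") # collision also drives the sound system
        break

print(events)  # ['thud']
```

In the real demo this runs for 13,500 bodies at once, which is why offloading it to a second GPU pays off.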
Havok seems to have made quite a bit of progress on Havok FX in the past few months. According to Yates, the product is approaching beta and will soon be in the hands of game developers. When that happens, he said, game developers will need to change the way they think about physics, because the previous limits will be gone.
Yates's presentation was the last of the formal ones, and a quick Q&A session followed.
I came away from the ATI event most impressed with the quality and relative maturity of the applications shown by the presenters. Each of them emphasized in his own way that the GPU’s much higher performance in stream computing applications opens up new possibilities for his field, and each one had a demonstration to back it up. Obviously, it’s very early in the game, but ATI has identified an opportunity here and taken the first few steps to make the most of it. As they join up with AMD, the prospects for technology sharing between the two companies look bright.
ATI still faces quite a few hurdles in meeting the needs of non-graphics markets with its GPUs, though. Today's GPUs, for instance, don't fully support IEEE-compliant floating-point datatypes, so getting the same results users have come to expect from CPUs may sometimes be difficult or impossible. ATI also hasn't provided the full range of tools that developers might want (things like BLAS libraries or even GPU compilers for common high-level languages) and so will have to rely on partners like PeakStream to make those things happen. I'm just guessing here, but I'd bet a software provider that focuses on oil and gas companies doesn't license those tools for peanuts. If stream computing is to live up to its potential, ATI will eventually have to make some of these programming tools more accessible to the public, as it has done in graphics.
One other interesting footnote. On the eve of ATI’s stream computing event, Nvidia’s PR types arranged a phone conference for me with Andy Keane, one of Nvidia’s GPGPU honchos. (Hard to believe, I know, but Nvidia was acting aggressively.) The purpose of the phone call was apparently just to plant a marker in the ground signaling Nvidia’s intention to do big things in stream computing, as well. Keane talked opaquely about how the current approach to GPGPU is flawed, because people are trying to twist a device into doing something for which it wasn’t designed. They’re using languages like OpenGL and Cg in unintended ways. Very soon, he claimed, Nvidia will be talking about new technology that will change the way people program the GPU, something that is “beyond the current approach.”
That was apparently all he really wanted to say on the subject, but I stepped through several of the possibilities with him, from providing better low-level documentation on the GPU’s internals to providing BLAS libraries and the like. Keane wasn’t willing to divulge exactly what Nvidia is planning, but if I had to guess, I’d say they are working on a new compiler, perhaps a JIT compiler, that will translate programs from high-level languages into code that can run on Nvidia GPUs. If so, and if they deliver it soon, ATI’s apparent lead in this field could evaporate.
For now, though, ATI is playing nice and simply letting its partners speak for it. Based on what those partners have said, the Radeon X1000 series seems better suited to non-graphics applications than Nvidia’s GeForce 7 series for a range of technical reasons, from finer threading granularity to more register space on the chip. I expect we won’t hear too much more from Nvidia on this front until after its next generation of GPUs arrives.