AMD refines its approach to Stream Computing

Just like Nvidia, AMD provides developers with a high-level application programming interface (API) to tap into its latest graphics processors for non-graphics compute applications. Unlike Nvidia, though, AMD doesn’t make very much noise about what it’s doing in that area. Curious, we got on the phone with AMD Stream Computing Director Patti Harrell and asked her to shed a little light on AMD’s Stream Computing initiative.

Harrell started from the beginning, explaining the evolution of AMD’s general-purpose GPU computing APIs:

Initially we came out with Close-to-Metal, which was a very low level interface. You had to know a fair bit about the GPU to use it, [but] you could get very good performance out of it.
What we’ve done in the two years since we came out with CTM is come out with higher-level tools. So, last November we launched Brook+, and what that is [is] a C-level interface that is quite similar to [Nvidia’s] CUDA and the OpenCL standard that’s been proposed by the Khronos working group. So, it’s kind of the same level, and the effort of Khronos is to try [to create] a standard API so that people don’t get locked into one hardware platform or another based on initial software investment.

The other high-level tools that we have in the stack are a version of our AMD Computation and Math Library that is targeted to GPUs for those functions that can take advantage of them. So initially things like SGEMM [single-precision general matrix multiply] and DGEMM [double-precision general matrix multiply] and some of the other standard . . . functions are implemented on the GPU in ACML. And that provides scientists with a tool to program high-level and not even worry about the API; they can use this library and run their functions. And that seems to work very well; we were just in a meeting with a university professor who did some testing on that and came away really happy about the ease of use of that methodology.

And then there are the third-party tools. We work with about half a dozen different companies on third-party development tools that give you either an alternative path or in some way augment the software stack you see here. The idea of course is to take a really open approach to this and let people approach the hardware in whatever way they like, but provide as many easy-to-use and highly performing tools for people to get good results.
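
For readers unfamiliar with the GEMM routines Harrell mentions, here is a minimal sketch in plain C of what SGEMM computes: C = alpha*A*B + beta*C on single-precision matrices. It is a naive CPU reference only, not ACML's interface or its GPU implementation. The function name `sgemm_naive`, its argument order, and the row-major layout are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Naive single-precision general matrix multiply (hypothetical helper,
 * not part of ACML): C = alpha * A * B + beta * C.
 * A is m x k, B is k x n, C is m x n, all stored row-major.
 */
void sgemm_naive(size_t m, size_t n, size_t k,
                 float alpha, const float *a, const float *b,
                 float beta, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            /* Scale the product and blend in the existing C entry. */
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

Each output element is computed independently of the others, which is the kind of data-parallel work that maps well onto a GPU's many execution units. DGEMM is the same operation on double-precision values.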

Close-to-Metal came out a few months before Nvidia’s CUDA GPGPU API, and the Stream SDK in its current form followed a few months later. So, I asked, what’s the difference between CUDA and the Stream SDK? Harrell explained:

At their core, they’re essentially a very similar idea. Brook+ was based on a graduate project out of Stanford called Brook, which has been around for years and is designed to target various architectures with a high-level API. And in this case there’s back-ends for GPUs . . . What our engineering team did was take that project and bring that in-house, clean it up, write a new back-end that talks to our lower-level interface, and post the thing back out to open-source in keeping with our open systems philosophy.
Brook looks like C. . . . Function calls go to the GPU very much like CUDA. In fact, the guy who was one of the core designers on Brook went to Nvidia and did CUDA. . . . And another guy who recently got his doctorate at Stanford and worked extensively on Brook at Stanford is one of the core Brook+ architects now at AMD. So, they were both born out of the same idea.

In terms of what we do differently, the one thing we’ve tried to do is publish all of our interfaces from top to bottom so that developers can access the technology at whatever level they want. So, underneath Brook we have what we call CAL, Compute Abstraction Layer, which you can think of as an evolution of the original CTM. It provides a run-time, driver layer, as well as an intermediate language. Think of it as analogous to an assembly language. So Brook has a back-end that targets CAL, basically, as does ACML and some of the other third-party tools that we’re working on. . . . From the beginning we published the API for CAL as well as for Brook so people could program at either level. We also published the instruction set architecture . . . so [people] can essentially tune low-level performance however they want. And Brook+ itself is open-source.

To put things in perspective, here’s an AMD slide showing how the different Stream SDK elements fit together:

Source: AMD.

Since CUDA runs on GeForce 8 and GeForce 9-series GPUs, and Nvidia has shown mainstream general-purpose apps like video transcoding running on those cards, I asked Harrell whether AMD was doing anything similar. She replied that the Stream SDK supports Radeon HD 2900- and 3000-series graphics cards, and that mainstream apps are indeed in the pipeline:

We do have people who use [the Stream SDK] for some mainstream applications. Video encode is a really good example. And I think we’re gonna see much more of that in mainstream consumer applications. Video encode, video game physics is another example, and some other consumer-level image processing would all be really well-accelerated on GPUs, and it stands to reason you want to use the GPU you already have in the system on your desk.

Last, but not least, I asked how the Stream SDK fit into AMD’s Fusion initiative, which aims to integrate graphics cores into desktop and mobile microprocessors. Would we see mainstream GPGPU apps run on the integrated GPU core?

[That’s] absolutely the direction you should expect, and in fact one of the big reasons for AMD to invest in this technology… Well, [there are] three reasons. One, we believe that this will be a fundamental requirement for mainstream graphics in the next few years. As we move towards a programming standard, you’re gonna see more and more mainstream application developers and ISVs [independent software vendors] want to take advantage of this capability, so we believe it’s gonna be a requirement. Just baseline. On top of that, there certainly is [a] market for incremental GPUs in technical markets and some of the high-end professional markets and we’d certainly like to play in that space.
And, finally, this really helps to set us up and take us down a technology path to prepare for the Fusion program—our Accelerated Computing programs—and there are a couple of ways in which that happens. The software stack, which you see represented here, basically evolves into a more comprehensive and higher-level set of tools that we think the company needs to get to and the industry needs to get to, to enable developers to take advantage of heterogeneous architectures without having to be early adopters who can program anything. So we think this toolkit is a good beginning for what it ultimately needs to be to handle much broader heterogeneous architectures like those that we’ll implement in the Fusion project.

Also, we learn a lot dealing with the Stream Computing customers today about the sorts of workloads that accelerate well and the directions we need to move in to design graphics cores that would be part of these future processors. And we’re also working with software partners and ISVs today, who we would expect would want to take advantage of those integrated architectures when they come out. So there’s a lot that we’re doing today that seeds directly into the Accelerated Computing project, and in fact, we are in communication all the time because . . . they are very interested in what we’re doing now that they can then grab and take advantage of in future architectures.

In other words, while AMD may not be anywhere near as chatty about its GPGPU endeavors as its competitor, it definitely isn’t twiddling its thumbs on that front. The company doesn’t seem as tied as Nvidia is to the notion of having its own, semi-proprietary API, though, and it has high hopes for the proposed OpenCL standard. If all goes well, OpenCL might allow developers to write GPGPU code that can run on both AMD and Nvidia GPUs, among others.
