AMD refines its approach to Stream Computing

Just like Nvidia, AMD provides developers with a high-level application programming interface (API) to tap into its latest graphics processors for non-graphics compute applications. Unlike Nvidia, though, AMD doesn’t make very much noise about what it’s doing in that area. Curious, we got on the phone with AMD Stream Computing Director Patti Harrell and asked her to shed a little light on AMD’s Stream Computing initiative.

Harrell started from the beginning, explaining the evolution of AMD’s general-purpose GPU computing APIs:

Initially we came out with Close-to-Metal, which was a very low level interface. You had to know a fair bit about the GPU to use it, [but] you could get very good performance out of it.
What we’ve done in the two years since we came out with CTM is come out with higher-level tools. So, last November we launched Brook+, and what that is is a C-level interface that is quite similar to [Nvidia’s] CUDA and the OpenCL standard that’s been proposed by the Khronos working group. So, it’s kind of the same level, and the effort of Khronos is to try to establish a standard API so that people don’t get locked into one hardware platform or another based on initial software investment.

The other high-level tools that we have in the stack are a version of our AMD Core Math Library (ACML) that is targeted to GPUs for those functions that can take advantage of them. So initially things like SGEMM [single-precision general matrix multiply] and DGEMM [double-precision general matrix multiply] and some of the other standard . . . functions are implemented on the GPU in ACML. And that provides scientists with a tool to program at a high level and not even worry about the API; they can use this library and run their functions. And that seems to work very well; we were just in a meeting with a university professor who did some testing on that and came away really happy about the ease of use of that methodology.

And then there are the third-party tools. We work with about half a dozen different companies on third-party development tools that give you either an alternative path or in some way augment the software stack you see here. The idea of course is to take a really open approach to this and let people approach the hardware in whatever way they like, but provide as many easy-to-use, high-performing tools as possible so people can get good results.

Close-to-Metal came out a few months before Nvidia’s CUDA GPGPU API, and the Stream SDK in its current form followed a few months later. So, I asked, what’s the difference between CUDA and the Stream SDK? Harrell explained:

At their core, they’re essentially a very similar idea. Brook+ was based on a graduate project out of Stanford called Brook, which has been around for years and is designed to target various architectures with a high-level API. And in this case there are back-ends for GPUs . . . What our engineering team did was take that project and bring that in-house, clean it up, write a new back-end that talks to our lower-level interface, and post the thing back out to open-source in keeping with our open systems philosophy.
Brook looks like C. . . Function calls go to the GPU very much like CUDA. In fact, the guy who was one of the core designers on Brook went to Nvidia and did CUDA. . . . And another guy who recently got his doctorate at Stanford and worked extensively on Brook at Stanford is one of the core Brook+ architects now at AMD. So, they were both born out of the same idea.

In terms of what we do differently, the one thing we’ve tried to do is publish all of our interfaces from top to bottom so that developers can access the technology at whatever level they want. So, underneath Brook we have what we call CAL, Compute Abstraction Layer, which you can think of as an evolution of the original CTM. It provides a run-time and driver layer, as well as an intermediate language. Think of it as analogous to an assembly language. So Brook has a back-end that targets CAL, basically, as does ACML and some of the other third-party tools that we’re working on. . . . From the beginning we published the API for CAL as well as for Brook so people could program at either level. We also published the instruction set architecture . . . so [people] can essentially tune low-level performance however they want. And Brook+ itself is open-source.

To put things in perspective, here’s an AMD slide showing how the different Stream SDK elements fit together:

Source: AMD.

Since CUDA runs on GeForce 8- and 9-series GPUs, and Nvidia has shown mainstream general-purpose apps like video transcoding running on those cards, I asked Harrell whether AMD was doing anything similar. She replied that the Stream SDK supports Radeon HD 2900- and 3000-series graphics cards, and that mainstream apps are indeed in the pipeline:

We do have people who use [the Stream SDK] for some mainstream applications. Video encode is a really good example. And I think we’re gonna see much more of that in mainstream consumer applications. Video encode, video game physics is another example, and some other consumer-level image processing would all be really well-accelerated on GPUs, and it stands to reason you want to use the GPU you already have in the system on your desk.

Last, but not least, I asked how the Stream SDK fit into AMD’s Fusion initiative, which aims to integrate graphics cores into desktop and mobile microprocessors. Would we see mainstream GPGPU apps run on the integrated GPU core?

[That’s] absolutely the direction you should expect, and in fact one of the big reasons for AMD to invest in this technology… Well, [there are] three reasons. One, we believe that this will be a fundamental requirement for mainstream graphics in the next few years. As we move towards a programming standard, you’re gonna see more and more mainstream application developers and ISVs [independent software vendors] want to take advantage of this capability, so we believe it’s gonna be a requirement. Just baseline. On top of that, there certainly is a market for incremental GPUs in technical markets and some of the high-end professional markets, and we’d certainly like to play in that space.
And, finally, this really helps to set us up and take us down a technology path to prepare for the Fusion program—our Accelerated Computing programs—and there are a couple of ways in which that happens. The software stack, which you see represented here, basically evolves into a more comprehensive and higher-level set of tools that we think the company needs to get to and the industry needs to get to, to enable developers to take advantage of heterogeneous architectures without having to be early adopters who can program anything. So we think this toolkit is a good beginning for what it ultimately needs to be to handle much broader heterogeneous architectures like those that we’ll implement in the Fusion project.

Also, we learn a lot dealing with the Stream Computing customers today about the sorts of workloads that accelerate well and the directions we need to move in to design graphics cores that would be part of these future processors. And we’re also working with software partners and ISVs today, who we would expect would want to take advantage of those integrated architectures when they come out. So there’s a lot that we’re doing today that feeds directly into the Accelerated Computing project, and in fact, we are in communication all the time because . . . they are very interested in what we’re doing now that they can then grab and take advantage of in future architectures.

In other words, while AMD may not be anywhere near as chatty about its GPGPU endeavors as its competitor, it definitely isn’t twiddling its thumbs on that front. The company doesn’t seem as tied to the notion of having its own, semi-proprietary API as Nvidia, though, and it has high hopes for the proposed OpenCL standard. If all goes well, OpenCL might allow developers to write GPGPU code that can run on both AMD and Nvidia GPUs, among others.

Comments closed
    • axeman
    • 11 years ago

    Oooh, pretty pictures… what’s a stream computor?

    • Krogoth
    • 11 years ago

    Oh boy, GPGPU is really the new name for something old.

    Math co-processor!

    Seriously, a GPU is nothing more than a massively paralleled and super-scaled FPU.

      • srg86
      • 11 years ago

      Maybe we could come up with a new name for these new Math Copros. Maybe the “Massively Paralleled Math CoProcessor” or something like that.

      GPGPU: Revenge of the Math CoProcessor.

        • willyolio
        • 11 years ago

        the mathemagical coprocessor.

    • mako
    • 11 years ago

    AMD needs to revamp its stream computing website. Nvidia has their “CUDA zone” but googling doesn’t turn up anything similar for ATI.

    • wingless
    • 11 years ago

    I would like to see OpenCL be like the DirectX of GPGPU computing. In the future we may see GPU benchmarks with synthetic, game, and GPGPU workload tests comparing AMD, Nvidia, and maybe even Intel GPUs.

    I imagine we will see a lot of applications for this new technology in about 2 years.

      • Hattig
      • 11 years ago

      I imagine it would be more like OpenGL, i.e., cross platform. Good to see AMD is working here and achieving things, especially accelerating their mathematical libraries because that will have a noticeable effect on some benchmarking results.

      I wonder when AMD will eventually integrate GPU / OpenCL functionality onto the CPU…

        • Helmore
        • 11 years ago

        AMD will basically integrate these features into CPUs with Fusion. Fusion will initially start as a way to make cheap and efficient notebook chips/systems, but this will change over time. There will be quad/octo cores in the future with multiple other non-general-purpose cores; GPU cores would be the most viable for consumers, and this kind of integration will continue. The discrete graphics chip will never disappear, though, which is a good thing, as I don’t want one 300-watt CPU just to get a decent system that runs Crysis 2. So these extra cores will be used for other purposes, physics and video encoding for example, while a discrete GPU does the graphics rendering; low-end systems would not even need a discrete GPU this way.

    • ssidbroadcast
    • 11 years ago

    GPGPU news is neat, but… it’s hard to get excited about it since its applications are of limited use.

      • mortifiedPenguin
      • 11 years ago

      I suspect the excitement would be more evident in the non-consumer industries (scientific research and the like) where things like Folding are important. Still, the implications for game physics are somewhat exciting, despite the fact that calculating physics at the same time as rendering would drop framerates…

        • ssidbroadcast
        • 11 years ago

        Mm, I dunno. Sounds like a pretty heavy burden on the game engine and the GPU to switch from the OpenGL/DirectX API to a totally different API just to integrate physics into a game, when typically physics has been integrated into the runtime already with some success.

          • Flying Fox
          • 11 years ago

          Underneath it all is still C/C++. Dealing with yet another additional library is one definition of a developer’s work.

          • BobbinThreadbare
          • 11 years ago

          One graphics card for each purpose would fix that.

      • Helmore
      • 11 years ago

      “Applications are of limited use”? You must be out of your mind. We are only at the start of this kind of thing, and we haven’t even begun to imagine what it will bring in the future. But some recently cited applications are video encoding, physics calculations, maybe AI (although I have no idea what kind of computations that involves), Photoshop-style image processing that will get a huge boost, and so on.
      Video encoding on a video card, for example, can easily be 5 times faster than on a CPU. So instead of letting your C2Q @ 4GHz do video encoding for 25 minutes for a video that is 10 minutes long, you can now let your GPU do the same thing in 5 minutes.
      The possibilities for physics calculations are also mind-blowing; this could create a whole new breed of games. We have games with a little bit of physics, like Half-Life 2 and Crysis, but that is nothing compared to what’s possible with physics calculations on a GPU.

    • DrCR
    • 11 years ago

    I’m really liking the desire of AMD to keep things open. Better for the coders, ultimately better for the end-users.

      • TheEmrys
      • 11 years ago

      My only beef about CUDA is that it is proprietary.
