Microsoft brings GPU computing to C++ with C++ AMP

Look out, OpenCL. Microsoft has set its sights on the democratization of development for heterogeneous systems, and it’s pulled out the big guns. At the AMD Fusion Developer Summit today, Microsoft’s Herb Sutter announced an extension to the C++ language designed to let programmers tap into any DirectCompute-capable graphics hardware for general-purpose tasks. Microsoft calls the new extension C++ Accelerated Massive Parallelism, or C++ AMP for short, and the company aims to make it an open spec that can be implemented on non-Microsoft platforms with non-Microsoft compilers.

Sutter presented C++ AMP as a way to cut through what he called the “jungle of heterogeneity.” The PowerPoint slide below illustrates the full extent of that jungle as Sutter sees it, with processors in increasing order of specialization on the Y axis and memory systems in increasing order of non-uniformity and disjointedness mapped to the X axis. It’s definitely not pretty:

C++ currently gives developers free rein in the bottom-left corner of that jungle, Sutter said, but C++ AMP expands the roaming area dramatically. Not only that, but Microsoft hopes to support other specialized processors with future releases of C++ AMP, thus extending its domain over time.

So, what does C++ AMP entail? Sutter bills it as “minimal,” and indeed, the list of additions is a short one.

The additions on which Sutter dwelt the most are array_view and restrict(). The former was described in the PowerPoint presentation as a “portable view that works like an N-dimensional ‘iterator range’” and billed as a way to deal with memory that may not be uniform. The restrict() modifier was easier to grasp for a non-coder like me. If I understand correctly, it simply lets developers mark functions that execute exclusively on DirectCompute devices; all you have to do is tag the function with “restrict(direct3d)”, like so:

(The code on the left is regular C++, while the code on the right is C++ AMP. That man in the bottom right corner is Microsoft’s Daniel Moth, who got into the nitty-gritty details of C++ AMP during a technical session after Sutter’s keynote.)
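
In rough terms, the C++ AMP side of that comparison amounts to something like the following sketch. It uses the preview-era spelling shown at the summit (array_view, parallel_for_each, and the restrict(direct3d) modifier), which could still change before the spec is finalized:

    #include <amp.h>
    using namespace concurrency;

    // Add two arrays element by element, letting the runtime offload the
    // loop body to a DirectCompute-capable accelerator when one is present.
    void AddArrays(int n, int* pA, int* pB, int* pSum)
    {
        array_view<int, 1> a(n, pA);      // wrap existing host memory
        array_view<int, 1> b(n, pB);
        array_view<int, 1> sum(n, pSum);

        parallel_for_each(sum.grid, [=](index<1> i) restrict(direct3d)
        {
            sum[i] = a[i] + b[i];         // this body runs on the accelerator
        });
    }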

Now, there will be restrictions on what can go inside, er, restrict()-ed functions, since DirectCompute-capable GPUs can only support a subset of the C++ language. Nevertheless, programs written with C++ AMP will be compiled as single executables capable of making use of DirectCompute-capable hardware if it’s there. (I’m guessing developers will be able to include fallback code paths so systems without DirectCompute GPUs can just use the CPU to do the work.)
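
If that guess pans out, a fallback path might look something like the speculative sketch below. The accelerator-enumeration calls (accelerator::get_all() and the is_emulated flag) are assumptions on my part and may not match the preview bits exactly:

    #include <amp.h>
    #include <vector>
    using namespace concurrency;

    // Speculative sketch: take the GPU path only when a real (non-emulated)
    // DirectCompute accelerator is present; otherwise fall back to the CPU.
    bool HasHardwareAccelerator()
    {
        std::vector<accelerator> accs = accelerator::get_all();
        for (const accelerator& acc : accs)
            if (!acc.is_emulated)            // skip the software/reference device
                return true;
        return false;
    }

    void AddArraysAnywhere(int n, int* pA, int* pB, int* pSum)
    {
        if (HasHardwareAccelerator()) {
            AddArrays(n, pA, pB, pSum);      // the C++ AMP sketch shown earlier
        } else {
            for (int i = 0; i < n; i++)      // plain C++ path for older systems
                pSum[i] = pA[i] + pB[i];
        }
    }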

Microsoft said it will include C++ AMP support in the next version of Visual Studio. Of course, since C++ AMP is an open spec, Visual Studio won’t be the only way to write C++ AMP code and compile it. In fact, Sutter said Microsoft and AMD are already working together on non-Windows compilers. (Lest you think Nvidia is being left out, a post on Nvidia’s blog says the firm “continues to work closely with Microsoft to help make C++ AMP a success.”)

Comments closed
    • CBHvi7t
    • 8 years ago

    What is an extension?
    What is the difference between a bunch of defines, a library, an API, an extension, and a new compiler?

    Did you notice http://ispc.github.com/ ? Yet another way.

    • StashTheVampede
    • 8 years ago

    Introducing these into their compiler is really tipping their hat to where their “next” console will be geared.

    • ronch
    • 8 years ago

    So, can I use this to output ‘Hello world’?

      • thesmileman
      • 8 years ago

      I don’t believe they support the printf function. CUDA does though.

        • ronch
        • 8 years ago

        Well, if it’s gonna be general purpose it should let me say hello world. ;p
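
        A “hello world” of sorts should still be doable without printf on the device: have the accelerator fill a buffer with character codes and let the host do the printing. A rough sketch in the same preview syntax (the +1 shift is just a contrived excuse to touch the GPU):

            #include <amp.h>
            #include <iostream>
            using namespace concurrency;

            int main()
            {
                // Each value is one less than the character we actually want.
                int codes[] = { 'G', 'd', 'k', 'k', 'n', 31, 'v', 'n', 'q', 'k', 'c' };
                array_view<int, 1> av(11, codes);

                // Bump every code by one on the accelerator...
                parallel_for_each(av.grid, [=](index<1> i) restrict(direct3d)
                {
                    av[i] += 1;
                });

                // ...then print on the host, where std::cout and printf work fine.
                for (int i = 0; i < 11; i++)
                    std::cout << static_cast<char>(av[i]);
                std::cout << std::endl;
                return 0;
            }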

    • SebbesApa
    • 8 years ago

    The best driver is no driver!

    • ronch
    • 8 years ago

    It’s interesting how VLIW has been relegated to small market niches while RISC got the upper hand over CISC. Now VLIW is finally making a comeback, as AMD’s GPU is a VLIW architecture, letting software do all the work utilizing all available compute resources. That’s what Itanium did in concept.

    Also interesting to note that after the announcement that AMD and ARM are about to collaborate on OpenCL, here is MS promoting its own C++ extension. It gets better. Imagine if you could harness the full power of thousands of ALUs for general computing tasks.

      • sschaem
      • 8 years ago

      ATI has used VLIW since the R300; they’ve shipped hundreds of millions of VLIW chips.
      Saying VLIW is not popular and relegated to small market niches doesn’t add up.
      In the Xbox 360 alone, AMD has shipped over 50 million VLIW processors.

      And it’s not making a ‘comeback’; Nvidia’s and Intel’s latest and greatest architectures are not using VLIW, and even ATI found that less is more. Newer ATI GPUs use SLIW (Somewhat Long Instruction Word).

      SIMT/SIMD is what’s IN for parallel computing. And CISC has been ruling the general computing market for decades now.

      If you anticipate ARM replacing x86 in servers, desktops, laptops, etc., I think it’s a wild prediction.

      Microsoft realizes that DirectCompute is too low-level; AMP is an elegant answer to that. OpenCL and CUDA have been here for a while, yet little is using them. AMP will change that. (It’s also what many have been looking for as an alternative to SIMD intrinsics.)

        • ronch
        • 8 years ago

        VLIW has made a relatively recent comeback as far as computer architectures go. R300 was fairly recent, considering VLIW has been touted by Intel since Itanium came out in the 1990s, with no widespread adoption even today. And from R300 onward, it’s been seeing wide adoption. It may not have Intel and Nvidia pushing it to do graphics, but you can go down to your local parts shop and grab something that does have VLIW in it. That’s a far cry from Itanium. R300 was when, 2007? That’s a fairly recent comeback in my book, if we’re talking something as radical as VLIW, which hadn’t enjoyed much adoption before R300. And even with R300, ATI was having trouble making the chips compelling, perhaps not because of VLIW per se, but because of power draw.

        VLIW = Very Long Instruction Word.

        As for CISC ruling the market, x86 processors today support CISC x86 instructions, but internally they are RISC processors with CISC-to-RISC decoding front ends and back ends that translate the results back to CISC. So yes, we are all using RISC nowadays with a coating of CISC. It all started to happen with Intel P6 (1995), AMD K5 (1996) and Nexgen Nx586 (1994). And you won’t find a purely CISC processor still being sold today (as new, that is), except perhaps in very old designs for embedded applications.

        And no, just because I said RISC has overtaken CISC doesn’t mean I expect ARM to overtake x86 in servers or desktops. I was referring to the paragraph before this one.

        My point about AMP is that MS chooses to unveil it just in time when AMD and ARM are talking about working on a competing standard.

          • Namarrgon
          • 8 years ago

          Recent Poulson announcements show Itanium moving *away* from the original VLIW plan, and breaking down the VLIW instructions into individual single instructions before execution.

          http://realworldtech.com/page.cfm?ArticleID=RWT051811113343

    • bcronce
    • 8 years ago

    Here’s the current difference.

    OpenCL requires a driver to be made. nVidia currently has poor OpenCL support and AMD isn’t a whole lot better. You can use *one* OpenCL driver at a time.

    C++ AMP uses DirectCompute and Windows PPL, which is 100% supported by everything (on the Windows platform). Not only that, AMP can run on all devices at the same time for a single task, unlike CL.

    OpenCL is awesome because of cross-platform, but relying on driver support is a PITA.

    Edit: AMP allows you to make use of EVERY number crunching device in your system _at the same time_. From your multi-core CPU, to your GPU, to your future co-processor.

      • thesmileman
      • 8 years ago

      You’re wrong, and this simply is not true. We have products in the field which support using AMD’s, Nvidia’s, and Intel’s drivers. Also, we run on the CPU and the GPU at the same time.

    • thesmileman
    • 8 years ago

    YES!!! Another extension to C which allows for GPU computing. </sarcasm>

    Does anyone want GPU computing to be mainstream? If so, stop creating new formats and help the other formats move forward. OpenCL 2.0 could add all these features. This is all going to map back to CUDA at a lower level anyway, so I really see this as a dumb move that is going to hurt mass acceptance of GPU computing for the general consumer.

      • mentaldrano
      • 8 years ago

      Yeah, why is C++ AMP necessary? If they wanted certain features, why not build them on an existing platform like OpenCL?

      The whole industry has a “not invented here, so I’m not using it” problem, which leads to orphaned code and duplication of effort. I mean, I know Microsoft hates open standards (want to bet how “open” C++ AMP will really be?), so this doesn’t surprise me at all.

      What a waste.

        • thesmileman
        • 8 years ago

        I see why Microsoft is doing this, but it still pisses me off. They want control over the standard. OpenCL is a standard, and that means it will be slightly behind. CUDA is a great example of the advancements which can happen when you only have to support yourself. But really, Microsoft is going to have a hard time with this because they are trying to support everyone. When you support everyone you make compromises, and if you just want raw power you can’t afford to do that. DX10 support is a clear indicator that this is their strategy. DX10 GPUs (other than Nvidia’s) are not designed to be executing non-specific code. Anyway, they should have just thrown all their support behind OpenCL. Every other manufacturer has: AMD, Nvidia, Intel, even the mobile GPU guys have supported it.

      • bcronce
      • 8 years ago

      OpenCL requires full control of the GPU and has a high context-switch cost on both Linux and Windows. C++ AMP runs on top of DirectCompute, which means it avoids that switch. It’s less of an issue for scientific computing and more of an issue for gaming.

      I guess MS could try to make an OpenCL driver that runs on top of DirectCompute, but that would be a driver on top of a driver instead of a library on top of a driver.

      Overall, OpenCL is designed for high-throughput, high-latency scientific computing; C++ AMP is for fairly high throughput with low latency.

      Different languages for different targets.

      Kind of like OpenGL not having any public, formal plan to add multi-threading support. OpenGL is meant for professional rendering, not video games.

        • thesmileman
        • 8 years ago

        Again, you are wrong on a number of these points. OpenCL does not require full control of the GPU. You can display graphics while running OpenCL kernels. With the newer GPUs, Fermi and I think the newest ATI cards can run kernels at the same time as other tasks, including rendering in both DirectX and OpenGL. If you are using DirectX you have to copy the buffer from OpenCL, but games already do that for a number of tasks, and it is ridiculously fast.

        DirectCompute is no better here, except that you avoid the in-GPU memory copy, which you have to do anytime you move a buffer.

        OpenCL is not designed for high-latency scientific computing. It can benefit in a large number of different ways, and latency is often easily compensated for.

        “Different languages for different targets.”

        Apparently you have never heard of the PS3, because it uses OpenGL for graphics (with some slight customization).

        Have you even worked with OpenCL? It seems you are reading articles and not understanding things like basic optimization and actually working with hardware.

    • ish718
    • 8 years ago

    Ok, so this is supposed to make GPGPU computing easier by using an extension of C++, unlike specialized C-like languages such as CUDA, DirectCompute, or OpenCL. O_O

    And it requires at least a DX10 GPU.

      • thesmileman
      • 8 years ago

      OpenCL is an extension of the C language, but there are official C++ bindings for OpenCL, so I don’t believe this is any easier.
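
      For comparison, the host side of the vector add shown earlier in the article looks roughly like this with the official OpenCL C++ bindings (cl.hpp); a sketch with all error handling omitted:

          #include <CL/cl.hpp>
          #include <cstring>
          #include <utility>
          #include <vector>

          // OpenCL compiles the kernel from source at runtime.
          static const char* kSource =
              "__kernel void vadd(__global const int* a,"
              "                   __global const int* b,"
              "                   __global int* sum) {"
              "    int i = get_global_id(0);"
              "    sum[i] = a[i] + b[i];"
              "}";

          void AddArraysCL(int n, const int* pA, const int* pB, int* pSum)
          {
              // Pick a platform and create a context on its GPU devices.
              std::vector<cl::Platform> platforms;
              cl::Platform::get(&platforms);
              cl_context_properties props[] = {
                  CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0 };
              cl::Context context(CL_DEVICE_TYPE_GPU, props);
              std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

              // Build the kernel.
              cl::Program::Sources source(1, std::make_pair(kSource, std::strlen(kSource)));
              cl::Program program(context, source);
              program.build(devices);
              cl::Kernel kernel(program, "vadd");

              // Allocate device buffers and copy the inputs over.
              size_t bytes = n * sizeof(int);
              cl::Buffer a(context, CL_MEM_READ_ONLY, bytes);
              cl::Buffer b(context, CL_MEM_READ_ONLY, bytes);
              cl::Buffer sum(context, CL_MEM_WRITE_ONLY, bytes);
              cl::CommandQueue queue(context, devices[0]);
              queue.enqueueWriteBuffer(a, CL_TRUE, 0, bytes, pA);
              queue.enqueueWriteBuffer(b, CL_TRUE, 0, bytes, pB);

              // Run one work-item per element, then read the result back.
              kernel.setArg(0, a);
              kernel.setArg(1, b);
              kernel.setArg(2, sum);
              queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(n), cl::NullRange);
              queue.enqueueReadBuffer(sum, CL_TRUE, 0, bytes, pSum);
          }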

    • Meadows
    • 8 years ago

    There’s a lot of talk about AMD and an honorable mention of Nvidia, but Intel is eerily left out.

      • Kurotetsu
      • 8 years ago

      Well, yeah. The entire presentation is about running code on GPUs. Intel doesn’t have GPUs to run code on.

        • wibeasley
        • 8 years ago

        Plus they’re competitors in the compiler business. I wonder if AMP will get established enough that it’s eventually supported by the Intel compiler.

      • thesmileman
      • 8 years ago

      While this only requires a DX10 GPU it isn’t going to be tremendously efficient unless you are using DX11 hardware. Does Intel even make a DX11 chip?

        • [+Duracell-]
        • 8 years ago

        The GPU on Sandy Bridge is DX10.1 compatible and Ivy Bridge will bring DX11 compatibility.

          • ronch
          • 8 years ago

          Dude, a bit off topic, but do you have to type the brackets when logging in? 😀

      • ronch
      • 8 years ago

      Well, apart from QuickSync Intel has, as far as I can recall (enlighten us if you know better), not made any announcements regarding general purpose computing with their GPUs. Besides, before they make that jump they might as well make fast GPUs first. One thing at a time.

      Besides, QuickSync isn’t even a general purpose compute engine, being primarily intended for video transcode operations. So, no general purpose, highly parallel compute initiative from Intel at this point.

        • thesmileman
        • 8 years ago

        They support OpenCL for their CPUs, and I might <wink wink> have heard that it will be running on their GPUs soon.

    • DancinJack
    • 8 years ago

    I think this is sweet. Being able to use C++ as a base should help a lot of people get started too.

      • thesmileman
      • 8 years ago

      You can use OpenCL in both C and C++

    • sschaem
    • 8 years ago

    AMD Fusion finally makes sense. But it might take a while for AMP software to be available…

    I think a few AMP sample benchmarks will help the Fusion branding.

      • thesmileman
      • 8 years ago

      OpenCL already runs on the Fusion platform. This will not be faster (or at least not by much) than OpenCL.

        • Arag0n
        • 8 years ago

        EASY TO USE, that’s the key… if these extensions are included in normal C++, it doesn’t matter which compiler you use; it becomes a de-facto standard for creating parallelized code for GPUs…

          • thesmileman
          • 8 years ago

          This syntax is slightly more complicated than OpenCL, but it is doing the exact same thing.

            • sschaem
            • 8 years ago

            You missed the boat… the issue is not about syntax but easy access.

            A C++ developer will be able to test parallel computing with no effort, and is guaranteed that the code will run no matter what.
            You can’t say that with OpenCL. OpenCL is a pain to work with, and code execution is not guaranteed unless you are on the latest version of OS X.

            There is no reason to use OpenCL on the Windows platform, and if the AMP extensions are supported by Nvidia/AMD in GCC, OpenCL is dead.

            • thesmileman
            • 8 years ago

            “…and is guaranteed that the code will run no matter what.” HAHHHH!!!

            Okay sure, sure.

            • thesmileman
            • 8 years ago

            “OpenCL is a pain to work with, and code execution is not guaranteed unless you are on the latest version of OS X.”

            Hmmm. We don’t have any problems getting our code to run and we support Linux and Windows and Mac. Not sure why OpenCL code would have this problem where CUDA wouldn’t.

            Also OpenCL supports dynamic compilation which is very handy for generic optimizations.

    • ltcommander.data
    • 8 years ago

    Maybe I’m missing something but C++ AMP requires DirectCompute and DirectCompute is a part of DirectX. So isn’t Microsoft’s talk of non-Windows compiler support pretty useless in terms of actually running these programs on non-Windows platforms since OS X or Linux for example don’t support DirectX? I suppose it’s useful if you like using other platforms to write code and compile and then want to transfer it over to Windows to run.

    Unless Microsoft is actually planning on developing DirectX, or at the very least DirectCompute APIs/frameworks/drivers, to support OS X or Linux, or plans to produce full documentation and free licenses to DirectCompute so third parties can implement the API on non-Windows platforms?

      • sschaem
      • 8 years ago

      ?? AMP can use DirectCompute on Windows, but who’s stopping someone from using CUDA to implement the C++ extensions?
      CUDA runs on all the platforms that matter: Windows, Linux, and OS X.

        • thesmileman
        • 8 years ago

        CUDA supports C fully and most of the C++ spec, and this AMP stuff only supports a small subset of the language, so it would make little sense to implement it as extensions to CUDA. I really don’t understand why they don’t just build better support for OpenCL.

          • bcronce
          • 8 years ago

          OpenCL doesn’t leverage Windows-specific tweaks that AMP can. If people want to use OpenCL, they can, but CL support is overall sketchy, as it needs driver support from the video card maker. AMP will work no matter what, because it uses features that are required for Windows to work.

            • thesmileman
            • 8 years ago

            Did you read the article? It isn’t running just on Windows; it is an open spec. Also, Windows isn’t able to leverage anything related to the GPU.

            OpenCL support isn’t sketchy. Even ARM is releasing a driver in the next couple of months.

      • codedivine
      • 8 years ago

      “Maybe I’m missing something but C++ AMP requires DirectCompute and DirectCompute is a part of DirectX.”

      C++ AMP does not require DirectCompute. C++ AMP is just a spec for a language extension to C++. Microsoft’s implementation currently uses DirectCompute, but there is nothing to stop anyone else from implementing it using any other backend. For example, it should be possible for a compiler to generate OpenCL kernels from the provided C++ AMP code.

      Herb Sutter mentioned that AMD is already working on an implementation for non-Windows platforms, and as per the above article, Nvidia is working on it too. Nvidia will likely just extend their CUDA compiler to generate PTX code from C++ AMP, while AMD will likely generate either OpenCL or FSAIL.

      However, I do think the restrict(direct3d) tag will change name to something more general. It’s likely a placeholder for now.

        • thesmileman
        • 8 years ago

        “AMD will likely generate either OpenCL or FSAIL.”

        OpenCL is dynamically compiled so it would be very silly to compile to a language which needs to be compiled again. Also it would be pretty darn hard to debug.
