Khronos’ President talks OpenCL, DX Compute Shader, and more

We recently had the opportunity to speak with Neil Trevett, who fills positions as both the Khronos Group’s President and Nvidia’s VP of Embedded Content. Consumers might not hear the Khronos name too often, but the organization is responsible for setting and updating a number of key standards: among them OpenGL, OpenGL ES, and most recently, OpenCL.

It was that last standard we wanted to talk about. In December, Khronos completed the first version of OpenCL, and all major players in the graphics market—Intel, Nvidia, and AMD—ratified it.

Once all of those companies release compliant drivers, developers will be able to write apps that tap into the parallel computing resources of any compliant GPU from any vendor. That’s a pretty major departure from older GPU compute application programming interfaces (APIs) like C for CUDA and Brook+, which are each tied to a particular vendor’s hardware (Nvidia for the former, AMD for the latter).

To break the ice, we asked Mr. Trevett to update us on what’s going on with OpenCL. Is Khronos doing anything new with the API? Here’s what he told us:

As you know, OpenCL 1.0 was released back in Siggraph Asia last year, so actually it’s only been around six months since the 1.0 specification was announced. You’ve probably seen the announcements that Apple made around their WWDC event. They’re beginning to explain how the Snow Leopard OS is going to use OpenCL to unleash the power of the GPU for a wide range of applications inside Snow Leopard.
At Nvidia, we are the first GPU company to ship beta OpenCL drivers. Actually, now we’re shipping fully conformant OpenCL drivers for our range of GPUs. So Nvidia is committed to timely shipment of . . . OpenCL implementations on our GPUs.

We are working, of course, on the next OpenCL specification. Because OpenCL is so new, we are in the mode of taking input from the developer community before we make any final decisions on what’s going to be included in the next generation of OpenCL and the precise timing. We’re not going to wait too long, but we do need to let the developer community kick the tires on OpenCL 1.0 before we head off with a next generation. That’s going to happen over the next few months. Siggraph is a good opportunity to get interaction with the developer community.

Will we see many OpenCL-enabled consumer apps from major application vendors?

Yeah, absolutely. I think it’s interesting; you can split the types of apps down to their individual categories. But I think as GPU compute becomes more widely available, I think over time you’re gonna see these historical categories begin to break down. I think you’re gonna see a very innovative ebb and flow between the different application categories, and see new types of applications emerge that weren’t possible before they could tap into the parallel computing inside GPUs.
So right now, these traditional parallel computing communities are coming to OpenCL. We had the high-end [high-performance computing]—the labs and engineering departments—doing large compute projects. They’re using OpenCL all the way down to consumer applications. The most obvious parallelization opportunity is of course with images and video. So I think you’ll see a wide range of imaging applications plugging into the parallel GPU. You can see the beginnings of that with things like Photoshop that have traditionally used CUDA. You can see a wide range of imaging applications tapping into OpenCL; video even more so—different transcoding, video enhancements, quality enhancements, even image-recognition types of applications. So your videos will be auto-metadata-tagged eventually with image recognition algorithms running on the GPU.

I think [this is] the first wave of making supercomputing performance available on every desktop and laptop, and it’s gonna take more than six months for the developer community to really get a feel of what’s possible. And I think it’s going to unleash a wave of innovation that we haven’t seen before.

What about the short term? We’ve recently seen video transcoders from Elemental and Cyberlink that use GPU computing through proprietary APIs. Are those apps going to be ported to OpenCL? Will we see other players join in?

I shouldn’t put words in vendors’ mouths. There are a lot of vendors using CUDA today. Some of them might stick with CUDA, a large number of them I think will move to OpenCL so they can tap into GPU compute across a broad range of platforms. From Nvidia’s point of view, we’re happy for them to use CUDA or OpenCL; we’re giving the choice to the application developers. It all taps down to the CUDA architecture running on our GPUs. So, it’s just a choice of different programming techniques that we can offer to the developer community.
I think having a standard API that is portable across multiple vendors’ silicon will grow the total market for applications that use GPU compute. I think it’s a necessary evolutionary step to making parallel computation just pervasively available everywhere. Of course, it’s gonna happen first on the desktop, but you might’ve noticed that OpenCL also has an embedded profile—OpenCL “ES” if you like—in the 1.0 specification. So, over the next few years, you’re gonna see OpenCL embedded profiles used alongside OpenGL ES. So it’s not just high-end servers and high-end desktops; it’s gonna be laptops, netbooks, and mobile devices over the next few years that tap into parallel computation.

So, we’ll see OpenCL in cell phones. Would that involve, say, the graphics portion of a device’s system-on-a-chip?

Yeah. It’s not here today, it’s definitely— We’re preparing for the future here, but I think it is inevitable. You can look at the evolution of mobile graphics silicon. It is tracking the desktop silicon, so at some point in the not-too-distant future, the GPUs will be programmable enough to support CUDA or OpenCL programmability. And that’s going to enable another wave of innovation, having the power of a supercomputer in the palm of your hand in a device that has multiple sensors, such as video and still cameras, and will be always connected. [It] is going to enable new classes of applications that you haven’t seen before.

To sum up, OpenCL use may grow slowly at first, and initial applications might not necessarily be groundbreaking. As developers get acquainted with the API and Khronos keeps improving it, though, Trevett thinks we can look forward to exciting new things (like automatic metadata-tagging of videos) and a spread into the world of handheld devices.

That’s all well and good, but OpenCL isn’t the only API in town. We just mentioned C for CUDA and Brook+, and Microsoft is also cooking up DirectX 11 Compute Shader—a vendor-independent API that also promises GPU computing for all. At Computex in June, AMD and Nvidia both demonstrated an automatic, profile-based video transcoding feature in Windows 7 that used DirectX Compute Shader. Let’s find out what Khronos thinks about all of these APIs.


OpenCL vs. other APIs, multi-core CPUs

We didn’t beat around the bush. We asked Trevett how the different APIs for graphics processor computing—C for CUDA, Brook+, DirectX Compute Shader—are going to co-exist with OpenCL. Here’s how he responded:

That’s actually interesting. The graphics APIs have been roughing it out for over a decade now. . . . It’s actually not as hard as people think to move from one API to the other, but people do care quite a lot about the APIs that they use. I think it’s actually less of a big decision for the parallel programming community, and there are already multiple languages for programming the CPUs—C, C++, C#, Java, [etc]—and that’s fine. People have the choice to pick a language that best suits their particular situation and their technical requirements.

So, I think it’s actually not a problem. I actually think it’s a positive and healthy thing that there are multiple programming languages out there for people to choose from to tap into parallel programming. For some application developers, platform portability will be the key driver, others with more specifications, they might choose to go with a vendor-specific language like C for CUDA. It doesn’t matter, actually, as long as they’re enabled to tap into parallel-compute goodness. That’s sort of what really matters at the end.

But the other interesting dynamic, though, and something that might factor into the choice that these individual developers might make—you’ve probably had this conversation with our CUDA team—is that OpenCL and C for CUDA are actually at very different levels. OpenCL is the typical Khronos API. Khronos likes to build the API as close as possible to the silicon. We call it the foundation-level API that everyone is going to need. Everyone who’s building silicon needs to at some point expose their silicon capability at the lowest and most fundamental, and in some ways the most powerful, level because we’ve given the developer pretty close access to the silicon capability—just high enough abstraction to enable portability across different vendors and silicon architectures. And that’s what OpenCL does. You have an API that you have control over the way stuff runs. It gives you that level of control.

Whereas C for CUDA, it takes all of that low-level decision making and automates it. So you just write a C program, and the C for CUDA architecture will figure out how to parallelize. Now, some developers will love that, because it’s much easier, and the system is doing a lot more figuring out for you. Other developers will hate that, and they will want to get down to bits and bytes and have a more instant level of control. But again, it’s all good, and as long as the developers are educated as to what are the various approaches that the different programming languages are taking, and are enabled to pick the one that best suits their needs, I think that’s a healthy thing.
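To make the "level of control" point a little more concrete, here is a minimal sketch of the execution model that low-level compute APIs expose: the developer writes a small kernel that handles one element, then explicitly chooses how many work-items to launch and how they are grouped. This is a plain-Python emulation written for illustration only. The names `vector_add_kernel` and `enqueue_range` are hypothetical and are not real OpenCL calls, and the inner loops run in order, whereas real hardware would run the work-items of a group concurrently.

```python
# Plain-Python emulation of an OpenCL-style kernel launch.
# All names here are hypothetical; none of them are real OpenCL API calls.

def vector_add_kernel(global_id, a, b, out):
    """Body executed once per work-item; global_id selects the element."""
    out[global_id] = a[global_id] + b[global_id]


def enqueue_range(kernel, global_size, local_size, *args):
    """Run `kernel` for every index in [0, global_size), one work-group at a time.

    Returns the number of work-groups launched.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    groups = global_size // local_size
    for group in range(groups):
        for local_id in range(local_size):
            # Each work-item derives its global index from group and local ids.
            kernel(group * local_size + local_id, *args)
    return groups


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
out = [0.0] * 8

# The caller, not the runtime, decides on 8 work-items in groups of 4.
groups = enqueue_range(vector_add_kernel, 8, 4, a, b, out)
print(groups, out)
```

The explicit `global_size` and `local_size` arguments stand in for the decisions a low-level API leaves to the developer; a higher-level approach would hide those choices.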

But, perhaps more importantly, how does OpenCL compare with DirectX 11 Compute? Trevett addressed the subject twice, noting the following at the beginning of our interview:

It’s interesting to compare and contrast DirectX Compute Shaders with OpenCL. The approach we’ve taken with OpenCL is that you don’t have to use OpenCL with OpenGL obviously if you were using compute in a visual application. But the advantage of having OpenCL as a standalone compute solution is that you can get portability across a lot more different types of silicon architectures, CPUs as well as GPUs. . . . OpenCL is a very robust compute solution rather than compute within the context of the graphics pipeline, which is more the approach that DX 11 Compute Shaders have taken.

When we pressed him for details later on, he added the following:

I think DirectX 11 Compute is still under NDA, so I don’t want to go into that yet. Other than the obvious thing we mentioned before, which is that OpenCL is a standalone, complete compute solution you can use for protein folding and particle analysis never touching the pixel, and you have the option of interopping it very closely with OpenGL, so you can use it for image processing and feeding into and feeding out of the OpenCL pipeline.

Versus the approach that DirectX 11 Compute takes, which is . . . “super shaders”, which are like general-purpose C shaders. But those shaders exist within the context of the DX graphics pipeline, so it’s intended to soup up your graphics applications but you’d probably find it more difficult to write, you know, a general-purpose animation package. There’s a difference in approach.

DirectX 11 Compute Shader in action.

Finally, we were curious about OpenCL and GPU computing in general versus the CPU. Let’s imagine a system with four CPU cores and a relatively slow integrated GPU: for a task like video transcoding, would it be better to use the GPU through OpenCL or the CPU? Will consumers have to face that trade-off, needing to choose between the GPU and CPU to get the best performance in certain apps, or will it be so clear-cut that they’ll want to use the GPU every time?

It depends on a number of things. The high-order bit is that it depends on the application and the amount and type of parallel processing that’s available within an application. And imaging applications and video applications and other applications where you’re just dealing with large parallel data sets—not necessarily pixels, but for consumers, images and videos are the obvious big parallel data sets that people deal with every day—there’s a degree of parallelism there that is easily distributed over the hundreds of cores that you get in a GPU.

If you have a different type of application, where the parallelism is either not present, meaning there’s simply nothing happening in parallel, or the parallelism is a lot more difficult to extract—regardless of the API or programming language you’re using, it’s just hard to parallelize—then that application will have more affinity to running on a CPU.

Now over time, the two will begin to merge. We’re getting multi-core CPUs and the GPUs are getting more and more programmable. So over time, applications in the middle will have a grown choice. They could run essentially on either. So, again, we’re in the pretty early stages of this market developing, so I think the first wave of OpenCL applications, we’re probably gonna find applications that choose one or the other, probably. You will find some applications with not too much parallelism that will want to run on four-core or eight-core CPUs. Applications like imaging and video, it’s obvious that it’s gonna get a pretty big-time speedup running on hundreds of cores on a GPU.

So, the first roll of applications will make that hard choice at programming time. But as the silicon architectures get more advanced, and the APIs evolve and get more querying capabilities, so the application can tell dynamically what’s in the machine and what the machine’s already doing. I mean, if the GPU’s hard at work playing a video game and then the user wants to kick off video transcoding, some dynamic balancing decisions will be made. And over time, the APIs will begin to enable the application in real time to figure out where they can best run on a machine. And over time you will find applications that do dynamically decide where they’re gonna run and make best use of the resources as they are available in real time on a device. Most developers and APIs aren’t quite there yet, with maybe that level of dynamic load-balancing, but I think that’s the ideal that everyone will be working towards.
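The trade-off Trevett describes can be sketched with a rough Amdahl's-law estimate: the serial part of a job runs on one core, the parallel part is spread over all cores, and a device that is already busy has less capacity to offer. The snippet below is only an illustration under stated assumptions. The `Device` class, the `pick_device` helper, the core counts, the relative per-core speeds, and the `busy` fractions are all made up for the example; they are not measurements and not part of any real API.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    cores: int
    core_speed: float  # speed of one core relative to one CPU core (assumed)
    busy: float = 0.0  # fraction of the device already in use, 0.0 to 1.0


def estimated_speedup(parallel_fraction, device):
    """Amdahl-style speedup estimate versus one idle reference CPU core."""
    effective_speed = device.core_speed * (1.0 - device.busy)
    if effective_speed <= 0.0:
        return 0.0  # a fully busy device cannot help
    serial_time = (1.0 - parallel_fraction) / effective_speed
    parallel_time = parallel_fraction / (device.cores * effective_speed)
    return 1.0 / (serial_time + parallel_time)


def pick_device(parallel_fraction, devices):
    """Return the device with the highest estimated speedup."""
    return max(devices, key=lambda d: estimated_speedup(parallel_fraction, d))


# Hypothetical machine: a quad-core CPU and a GPU with many slower cores.
cpu = Device("cpu", cores=4, core_speed=1.0)
gpu = Device("gpu", cores=240, core_speed=0.1)
gaming_gpu = Device("gpu", cores=240, core_speed=0.1, busy=0.9)

print(pick_device(0.99, [cpu, gpu]).name)         # highly parallel work
print(pick_device(0.50, [cpu, gpu]).name)         # half of the work is serial
print(pick_device(0.99, [cpu, gaming_gpu]).name)  # GPU mostly occupied by a game
```

With these made-up numbers, the sketch picks the GPU for the highly parallel job, the CPU for the half-serial job, and the CPU again when the GPU is 90% occupied, which mirrors the video-game-plus-transcoding scenario in the answer above.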

Here, Trevett’s answer was especially interesting in light of Nvidia’s latest PR campaign, which has involved talking down the importance of the CPU and hailing the GPU as a sort of computing panacea. Khronos and Trevett seem to be taking a more pragmatic view, hoping OpenCL can dynamically tap into the computing resources of any capable processor. With the line between CPU and GPU likely to blur only further in the future, that approach probably makes sense. (Just in case you forgot, Intel is just months away from releasing its first x86 CPUs with built-in graphics cores, and we expect to see the chipmaker launch Larrabee, an x86-derived GPU, next year.)

With all that said, OpenCL looks to have a bright future ahead of it. Trevett suggested that DirectX Compute Shader is more limited, especially since Microsoft has tied it to Windows, so developers could flock mostly to Khronos’ API for their GPU compute needs. That would give us a wealth of general-purpose apps that can get a boost from Intel, Nvidia, and AMD GPUs and run across different operating systems. Down the line, developers should also be able to get their GPU-compute-enabled apps running on handhelds and cell phones. Exciting stuff. Now, all we have to do is wait for developers to make some cool things with these new tools.

    • Paulomat
    • 10 years ago

    For all these gaming guys who argue that OpenGL is dead or irrelevant: gaming is not everything. It’s a mass market, but it isn’t as important, as it’s too cheap to make much money.

    You buy gaming software for $100-500 p.A. and a $300 GPU every now and then? This is peanuts compared to real industries.

    Nvidia is earning more money with workstation graphics in industry than with consumer products. Yes, all these super expensive QuadroFX cards get sold. Mostly in high-end engineering workstations by HP, DELL, etc. (not their consumer stuff).

    About software: all CAD, DCC, scientific vis tools are based on OpenGL. No-one is interested in Windows gaming stuff with a graphics API that is source code incompatible after every major revision. They don’t have time to fix this every 2-3 years. CAD systems like Catia or NX have been around for much longer than playable windows 3D games (first on Unix of course). And a single license for high end product visualisation software like Showcase or RTT Deltagen is in the $20000-50000 range.

    This is where the money is. Games are an unimportant market – high volume but very expensive to build, low price, high piracy, no big money any more.

    But all these cars, planes, consumer toys, every bottle and every paper package must be constructed, designed, evaluated, reviewed by dozens and hundreds of engineers, product designers, managers, marketing gurus, etc. before mass production. Most of this is done with virtual prototypes, designed, visualized and rendered with software based on OpenGL before it hits the real world.

    Open your eyes: hundreds of new car models every year, hundreds of new/renewed factories with a lot of new machinery, robots, etc. All is constructed, planned, tested on OpenGL CAx software. Don’t play too many computer games and forget about the real world, a world full of buildings, supermarkets and malls full of all kinds of products, stuff that is really built. Not just funny little polygon guys running amok in funny little polygon worlds.

    But even if you don’t like reality, think of Hollywood:
    All this nice CGI VFX in your favourite popcorn flick is surely designed on OpenGL workstations (often running OS X and Linux – no MS Windows) using Maya, Softimage XSI, Houdini, Lightwave, Cinema 4D, ZBrush, inhouse tools etc.

    Still: no DirectX, all platform independent, running with OpenGL on OS X or Linux.

    I know just one professional DCC tool that is using DirectX (besides OpenGL): 3ds max. It isn’t too popular in the movie industry for being Windows only.

    What do you think your Mobile uses for 3D? Or Playstation and Wii?
    No DirectX, all OpenGL (ES).

    So everyone saying that DirectX rules the world and OpenGL is a niche API is just fooled by MS marketing and lives in a world distorted by playing too many PC games.

    • Sniper
    • 10 years ago

    For everyone else scratching their head, Open CL stands for Open Computing Language.

    • sativa
    • 10 years ago

    is techreport at siggraph right now?

    • moritzgedig
    • 10 years ago

    what is this “supercomputer” BS ?
    it is a onetime step that will just make up for the lack of progress in the last and future years on the CPU side.
    To utilize it, we will have to buy new software.
    Luckily the GPUs will be able to swallow more transistors as they get available.
    Also I don’t care for a mobile phone that has 3D and the power of a “supercomputer”, I care for usability and “location based services”.
    phone+email+”GPS navigation”+PDA+IRC+”limited browser functionality for weather, time-table and booking”+VPN

    • 10 years ago

    I think a very interesting question to ask nVidia is whether they have plans to port PhysX to OpenCL. If I’m not mistaken, PhysX acceleration is done through CUDA while an OpenCL port of Havok has already been demoed. It seems that PhysX hardware acceleration adoption could really stall if they don’t make it cross-vendor.

    And on the question of whether it’s better to have OpenCL running on the CPU or the GPU for a certain application/circumstance and Neil’s comments on possible API limitations right now, we’ll have to see whether Apple actually has a solution to this problem. After all, OpenCL was originally developed by Apple and even Neil confirms it’ll be deeply integrated into Snow Leopard. One of the key components of Snow Leopard is Grand Central, which is supposed to be able to help split tasks up into blocks and dynamically allocate them to available cores. With Apple’s work on LLVM and JIT compilers, of which parts of OS X’s OpenGL pipeline are already JIT and I believe OpenCL has been rumored to be implemented this way as well, it may be possible for Grand Central to dynamically compile the code for either CPUs or GPUs on the fly depending on what is available. This would certainly make it simpler for programmers to multithread and take advantage of any hardware combination.

      • Freon
      • 10 years ago

      How much can we trust NV to give AMD access to PhysX and have it run at full speed?

    • A_Pickle
    • 10 years ago


    • murtle
    • 10 years ago

    WoW! Only one FUBBY comment on this article! Where is everyone? This is a very important point to start something really new.

    * To the staff: thanks for this brief review, and please keep digging beyond Neil. I think you sense that he is hiding something and creating a diversion with this OpenCL thing. How does DX11 really relate to OCL? How closely are they related? Wait! Who invented OCL in the first place? And why did they have to do something like this?

    • FubbHead
    • 10 years ago

    So Microsoft will do their own thing… again. And I don’t see the death of DirectX anytime soon, so I guess it’s goodbye OpenCL as well, then.

      • Aphasia
      • 10 years ago

      If they do it right and not like OpenGL, then it might survive just fine. That is, actually add and expand it with regards to the market, and not with regards to a committee.

      Not to mention it’s probably easier to use for non-DirectX apps that don’t have to go via the graphics pipeline. So for that reason alone I can see this becoming more prevalent in standalone apps. For physics acceleration within games, OpenCL or DirectX extension, who cares as long as it works.

        • Kurotetsu
        • 10 years ago


          • 10 years ago

          What’s more, given the popularity and growth of embedded/handheld gaming and OpenGL ES’s dominance compared to DirectX in this area, OpenCL should remain relevant and continue to grow.

            • FubbHead
            • 10 years ago

            Yes, much like OpenGL, relevant in the embedded and handheld segment, as well as some (still) obscure systems. But that isn’t exactly the forefront of graphics technology, so is it really so much of a feat to remain relevant there? And if that is the main driving segment for this technology, I fear this will keep OpenCL, like OpenGL, 2-3 steps behind. But hopefully not, of course.

          • Aphasia
          • 10 years ago

          And please do tell, how large of a market share do these non-windows games entail?

          As for the scientific part, it has no correlation at all with gaming and is a separate discussion.

      • Freon
      • 10 years ago

      I think a lot of the utility it will provide will be attractive to non-Windows (read non-DX capable) systems, such as “scientific computing” projects. So it might have a decent shot at least getting a good foothold there. How that will hold over to a larger consumer base is probably still seriously questionable.
