OpenCL vs. other APIs, multi-core CPUs
We didn't beat around the bush. We asked Trevett how the different APIs for graphics processor computing—C for CUDA, Brook+, DirectX Compute Shader—are going to co-exist with OpenCL. Here's how he responded:
That's actually interesting. The graphics APIs have been roughing it out for over a decade now. . . . It's actually not as hard as people think to move from one API to the other, but people do care quite a lot about the APIs that they use. I think it's actually less of a big decision for the parallel programming community, and there are already multiple languages for programming the CPUs—C, C++, C#, Java, [etc]—and that's fine. People have the choice to pick a language that best suits their particular situation and their technical requirements.
So, I think it's actually not a problem. I actually think it's a positive and healthy thing that there are multiple programming languages out there for people to choose from to tap into parallel programming. For some application developers, platform portability will be the key driver, others with more specifications, they might choose to go with a vendor-specific language like C for CUDA. It doesn't matter, actually, as long as they're enabled to tap into parallel-compute goodness. That's sort of what really matters at the end.
But the other interesting dynamic, though, and something that might factor into the choice that these individual developers might make—you've probably had this conversation with our CUDA team—is that OpenCL and C for CUDA are actually at very different levels. OpenCL is the typical Khronos API. Khronos likes to build the API as close as possible to the silicon. We call it the foundation-level API that everyone is going to need. Everyone who's building silicon needs to at some point expose their silicon capability at the lowest and most fundamental, and in some ways the most powerful, level because we've given the developer pretty close access to the silicon capability—just high enough abstraction to enable portability across different vendors and silicon architectures. And that's what OpenCL does. You have an API that you have control over the way stuff runs. It gives you that level of control.
Whereas C for CUDA, it takes all of that low-level decision making and automates it. So you just write a C program, and the C for CUDA architecture will figure out how to parallelize. Now, some developers will love that, because it's much easier, and the system is doing a lot more figuring out for you. Other developers will hate that, and they will want to get down to bits and bytes and have a more instant level of control. But again, it's all good, and as long as the developers are educated as to what are the various approaches that the different programming languages are taking, and are enabled to pick the one that best suits their needs, I think that's a healthy thing.
But, perhaps more importantly, how does OpenCL compare with DirectX 11 Compute? Trevett addressed the subject twice, noting the following at the beginning of our interview:
It's interesting to compare and contrast DirectX Compute Shaders with OpenCL. The approach we've taken with OpenCL is that you don't have to use OpenCL with OpenGL obviously if you were using compute in a visual application. But the advantage of having OpenGL as a standalone compute solution is that you can get portability across a lot more different types of silicon architectures, CPUs as well as GPUs. . . . OpenCL is a very robust compute solution rather than compute within the context of the graphics pipeline, which is more the approach that DX 11 Compute Shaders have taken.
When we pressed him for details later on, he added the following:
I think DirectX 11 Compute is still under NDA, so I don't want to go into that yet. Other than the obvious thing we mentioned before, which is that OpenCL is a standalone, complete compute solution you can use for protein folding and particle analysis never touching the pixel, and you have the option of interopping it very closely with OpenGL, so you can use it for image processing and feeding into and feeding out of the OpenCL pipeline.
Versus the approach that DirectX 11 Compute takes, which is . . . "super shaders", which are like general-purpose C shaders. But those shaders exist within the context of the DX graphics pipeline, so it's intended to soup up your graphics applications but you'd probably find it more difficult to write, you know, a general-purpose animation package. There's a difference in approach.
Finally, we were curious about OpenCL and GPU computing in general versus the CPU. Let's imagine a system with four CPU cores and a relatively slow integrated GPU: for a task like video transcoding, would it be better to use the GPU through OpenCL or the CPU? Will consumers have to face that trade-off, needing to choose between the GPU and CPU to get the best performance in certain apps, or will it be so clear-cut that they'll want to use the GPU every time?
It depends on a number of things. The high-order bit is that it depends on the application and the amount and type of parallel processing that's available within an application. And imaging applications and video applications and other applications where you're just dealing with large parallel data sets—not necessarily pixels, but for consumers, images and videos are the obvious big parallel data sets that people deal with every day—there's a degree of parallelism there that is easily distributed over the hundreds of cores that you get in a GPU.
If you have a different type of application, where the parallelism is either not present, meaning there's simply nothing happening in parallel, or the parallelism is a lot more difficult to extract—regardless of the API or programming language you're using, it's just hard to parallelize—then that application will have more affinity to running on a CPU.
Now over time, the two will begin to merge. We're getting multi-core CPUs and the GPUs are getting more and more programmable. So over time, applications in the middle will have a grown choice. They could run essentially on either. So, again, we're in the pretty early stages of this market developing, so I think the first wave of OpenCL applications, we're probably gonna find applications that choose one or the other, probably. You will find some applications with not too much parallelism that will want to run on four-core or eight-core CPUs. Applications like imaging and video, it's obvious that it's gonna get a pretty big-time speedup running on hundreds of cores on a GPU.
So, the first roll of applications will make that hard choice at programming time. But as the silicon architectures get more advanced, and the APIs evolve and get more querying capabilities, so the application can tell dynamically what's in the machine and what the machine's already doing. I mean, if the GPU's hard at work playing a video game and then the user wants to kick off video transcoding, some dynamic balancing decisions will be made. And over time, the APIs will begin to enable the application in real time to figure out where they can best run on a machine. And over time you will find applications that do dynamically decide where they're gonna run and make best use of the resources as they are available in real time on a device. Most developers and APIs aren't quite there yet, with maybe that level of dynamic load-balancing, but I think that's the ideal that everyone will be working towards.
Here, Trevett's answer was especially interesting in light of Nvidia's latest PR campaign, which has involved talking down the importance of the CPU and hailing the GPU as a sort of computing panacea. Khronos and Trevett seem to be taking a more pragmatic view, hoping OpenCL can dynamically tap into the computing resources of any capable processor. With the line between CPU and GPU likely to blur only further in the future, that approach probably makes sense. (Just in case you forgot, Intel is just months away from releasing its first x86 CPUs with built-in graphics cores, and we expect to see the chipmaker launch Larrabee, an x86-derived GPU, next year.)
With all that said, OpenCL looks to have a bright future ahead of it. Trevett suggested that DirectX Compute Shader is more limited, especially since Microsoft has tied to Windows, so developers could flock mostly to Khronos' API for their GPU compute needs. That would give us a wealth of general-purpose apps that can get a boost from Intel, Nvidia, and AMD GPUs and run across different operating systems. Down the line, developers should also be able to get their GPU-compute-enabled apps running on handhelds and cell phones. Exciting stuff. Now, we all we have to do is wait for developers to make some cool things with these new tools.
16 comments — Last by Paulomat at 9:46 AM on 08/21/09
|We discuss the GeForce GTX 970 memory controversyDissecting the issue||29|
|Nvidia: the GeForce GTX 970 works exactly as intendedA look inside the card's unusual memory config||200|
|Nvidia's GeForce GTX 960 graphics card reviewedThe green team brandishes a sawed-off shotgun||235|
|Catalyst Omega driver adds more than 20 features, 400 bug fixes...and some performance improvements, to boot||162|
|TR's 2014 Christmas gift guideThe best techie-friendly items for under your tree||52|
|GeForce GTX 970 cards from MSI and Asus reviewedMaxwell's silver hammer comes in two stylish models||55|
|GeForce GTX 980 cards from Gigabyte and Zotac reviewedThe Gigabyte G1 Gaming takes on the Zotac AMP! Omega||47|
|Maxwell's Dynamic Super Resolution explored4K resolutions on smaller displays? Hmm||115|
|We discuss the GeForce GTX 970 memory controversy||29|
|WSJ: Microsoft to back Cyanogen with $70M investment||41|
|You've goat to check out Silicon Power's new thumb drive||47|
|The TR Podcast 169 video: Win10, Elon's musk, and the gimpy GTX 970||0|
|In the lab: Dell's Venue 8 7000 tablet||30|
|Qualcomm posts record revenue, loses high-profile design||23|
|Intel refreshes high-endurance server SSDs with 20-nm NAND||15|
|The TR Podcast is live on Twitch right now||1|