GTC 2010: GPU computing grows up

Nvidia’s GPU Technology Conference had its beginnings two years ago, as part of Nvision, a broadly focused trade show and technical conference centered around all things Nvidia, or as the firm put it at the time, the “visual computing ecosystem.” Nvision encompassed jarringly disparate elements, from the gamer-oriented GeForce LAN party to an “emerging companies summit” intended to pair start-ups using GPU-based technology with venture funding. That event was even open to the public, and featured appearances from notable rent-a-celebs such as Battlestar Galactica siren Tricia Helfer and the Mythbusters guys.

Turns out that mix of things, though interesting, didn’t mesh terribly well, so last year, the firm killed off Nvision and opted to host the more narrowly focused GPU Technology Conference instead, with an emphasis on GPU computing, in a much smaller venue. This year, GTC was more tightly focused than ever, but it was back in the San Jose Convention Center. In spite of the larger venue, the halls and session rooms were frequently packed with attendees, and conference attendance seemed to be up substantially.

The bottom line is that GTC is growing even as it specializes in just one aspect of Nvidia’s business, the CUDA platform for GPU computing. That’s just one of many signals that point to an undeniable trend: the use of GPUs for non-graphics computation is on the rise, led largely by Nvidia’s efforts.

Those who are familiar primarily with the consumer side of Nvidia’s GPU business, headlined by GeForce graphics cards, may scoff at the notion that CUDA is gaining real traction. For years now, Nvidia has touted the benefits of GPU computing to potential GeForce owners while software developers have delivered precious few consumer applications that truly take advantage of its power. The consumer PC market seems to be waiting for a non-graphics application that really requires GPU-class computational throughput and for a cross-platform programming language—something like OpenCL, perhaps—to make it ubiquitous.

However, some folks don’t have the patience for such things, and they already have applications that require absolutely all of the processing power they can get. Those folks come from the halls of major research universities, from the R&D arms of major corporations, from motion-picture animation studios, and from the fields of oil and gas exploration. These are the people who build and use supercomputers, after all, and they can easily tax the capabilities of today’s best computers by attempting to simulate, say, the interactions of particles at the nanomolecular level. If the attendance and participation in GTC’s many technical sessions and keynotes are any indication, Nvidia appears to have taken the spark of a nascent market for GPUs four years ago and nurtured it into a healthy flame today.

GPU computing comes of age
The story starts four years ago with the introduction of the G80 graphics processor, Nvidia’s first DirectX 10-class GPU and the first chip into which the firm built notable provisions for GPU computing. At that time, Nvidia also introduced CUDA, its architecture model for harnessing the prodigious floating-point processing power and streaming memory throughput of the GPU for non-graphics applications. Since then, Nvidia has made GPU computing and Tesla-branded expansion cards one of the four key pillars of its business, investing substantial effort and money in building the software and development tools required to make GPU computing widespread.

Those efforts are beginning to pay off in tangible ways, as evidenced by a series of major announcements that Nvidia CEO Jen-Hsun Huang made in the GTC opening keynote speech.

The first of those was the revelation that compiler maker PGI plans to create a product called CUDA-x86. That name may lead to some confusion in certain circles, but the compiler won’t let PC-compatible programs run on a GPU (such a beast isn’t likely to work very well, even if it would answer some persistent questions about Nvidia’s business challenges). Instead, it’s the inverse: the PGI product will allow programs developed for the CUDA parallel programming platform to be compiled and executed on x86-compatible processors. As the PGI press release states, “When run on x86-based systems without a GPU, PGI CUDA C applications will use multiple cores and the streaming SIMD (Single Instruction Multiple Data) capabilities of Intel and AMD CPUs for parallel execution.” The availability of such a tool should assuage the concerns of application developers about being locked into a proprietary software solution capable of running on only one brand of hardware. Also, it could allow supercomputing clusters with heavy investments in x86 processors to develop CUDA applications that take advantage of all available FLOPS.
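To make the CUDA-on-x86 idea concrete, here is a minimal Python sketch of one plausible way a grid of CUDA-style threads could be executed on a CPU: the outer loop over blocks is spread across worker threads (standing in for cores), and the inner loop over a block's threads is the part a compiler could vectorize with SIMD. This is an illustration only, not a description of how PGI's compiler works. The names saxpy_kernel, run_block, and launch_on_cpu are hypothetical, and Python threads here show the structure rather than real parallel speed.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Work done by one logical CUDA-style thread (hypothetical example kernel)."""
    i = block_idx * block_dim + thread_idx  # global index, as in a CUDA kernel
    if i < len(x):                          # bounds guard for the last, partial block
        out[i] = a * x[i] + y[i]


def run_block(kernel, block_idx, block_dim, args):
    # Inner loop over the threads of one block: a candidate for SIMD on a CPU.
    for thread_idx in range(block_dim):
        kernel(block_idx, thread_idx, block_dim, *args)


def launch_on_cpu(kernel, grid_dim, block_dim, args, workers=4):
    # Outer loop over blocks: distributed across worker threads (stand-ins for cores).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_block, kernel, b, block_dim, args)
            for b in range(grid_dim)
        ]
        for future in futures:
            future.result()  # surface any exception raised inside a worker


n = 1000
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 128
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements

launch_on_cpu(saxpy_kernel, grid_dim, block_dim, (2.0, x, y, out))
```

The kernel body never mentions the hardware it runs on, which is why the same source can, in principle, target either a GPU or a multi-core CPU.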

No less significant was the string of technical software packages that will be incorporating CUDA support, including the MATLAB accelerated development environment, the Ansys physical product simulation tool, and the Amber molecular dynamics simulator. These announcements extend GPU computing’s reach in the worlds of engineering and research without requiring users to have an in-depth understanding of parallel programming.

To further bolster the case that GPU computing uptake is on the rise, Huang cited a trio of new server solutions that will integrate Tesla GPUs, most notably an IBM BladeCenter offering. In fact, he claimed every major server maker (or OEM) in the world now offers Tesla-based products. Huang’s goal for the keynote was no doubt to highlight the growing momentum for GPU computing, but the evidence he was able to cite was undeniably impressive.

Huang then unveiled what may have been the most widely reported slide from the keynote, offering a brief glimpse at future Nvidia GPU architectures and their projected performance per watt in double-precision math. He also noted that some additional computing-centric features will make their way into Nvidia’s GPUs between now and 2013, including pre-emption (for multitasking), virtual memory with page faulting, and the ability for GPU threads to be non-blocking. (The middle item there came out only as “virtual memory,” but we confirmed the page faulting feature later, on a hunch, since basic virtual memory is already a part of the Fermi chips’ feature set.) It wasn’t much information, but this quick look at the path ahead was unprecedented for Nvidia or any GPU maker, since they typically hold their cards very close to the vest. This deviation from the usual practice, Huang later explained, came because he wanted to assure developers that additional computational power is coming and to encourage them to plan to make use of it. Public roadmap disclosures are traditionally the work of processor companies like Intel, and in a sense, that’s what Nvidia intends to become.

‘ARM is our CPU strategy’
Jen-Hsun’s infamous declaration that Nvidia would “open up a can of whoop-ass” on Intel remained in the backdrop as we moved from the opening keynote to his subsequent press conference. Persistent and difficult questions about the future have haunted Nvidia since it first became clear that CPUs and GPUs were on a collision course. Those questions grew more urgent as Intel and AMD moved toward integration of additional components into the CPU. Already, integration has cost Nvidia its chipset business, and early next year, GPUs will make their way into mainstream CPU silicon for the first time. Last year at GTC, Huang was peppered with questions about Intel’s Larrabee project—aimed at producing a discrete GPU/parallel processor—and any possible Nvidia plans to produce an x86-compatible CPU.

This year, the tone was subtly different. There were fewer Larrabee questions, given that project’s unceremonious demise, but Huang didn’t dance on Larrabee’s grave, either. The “can of whoop-ass” bravado was replaced with a more subdued, truer confidence and an admission of the obvious fact that GPUs and CPUs must coexist. There were also fewer questions about whether Nvidia is secretly producing an x86-compatible CPU (although rumors persist), perhaps because any forays into that territory were always met with the same response: “ARM is our CPU strategy.”

That mantra refers to another Nvidia product, the mobile device-oriented Tegra system-on-a-chip that includes CPU cores licensed from ARM. Just as he did last year, Huang used the rise of smart phones and tablets as a sort of shield against questions about threats to his core PC graphics business. He pointed out that ARM-based operating systems have the fastest growing user bases and software ecosystems, and Huang even claimed that “the enthusiasts” have moved to mobile devices, citing the fact that outlets like Tom’s Hardware Guide and AnandTech are now reviewing smart phones. The message: the PC business is stagnant and poised to shrink. In an attempt to underscore his point, Huang then observed that as he looked around the room, all of the “sexy” and interesting computer devices present were mobile ones.

This assertion raised two questions. One, did he expect this room full of journalists to have toted along three-foot-tall tower cases and dual 30″ displays? We have some dead sexy computers that won’t fit into a travel bag. And two, if PCs are dying, what does that mean for Nvidia’s core and utterly crucial GeForce business? We suspect Huang overstated his case in order to draw attention away from the 800-lb gorilla in the corner known as Intel. Still, one wonders whether, in the hype-driven world of tech, such repeated declarations of the poor prospects for the PC market won’t eventually become self-fulfilling prophecy, to the detriment of all involved.

With that said, the prospects for Tegra are indeed fascinating, because it is the only one of Nvidia’s four tentpole businesses not based directly on the firm’s leading-edge GPU silicon. The others—GeForce, Quadro, and Tesla—rely on the same chips. Tegra is also the tentpole most conspicuously struggling to find customers. Since its announcement at CES in early 2010, the second-generation Tegra hasn’t shipped in a single consumer product, as far as we know. Huang did foreshadow some possible successes on that front by talking about “superphones,” or leading-edge smart phones, coming into their own in the near future. We heard whispers all week about possible Android-based Tegra phones debuting in the CES 2011 time frame. Huang also confirmed Nvidia’s commitment to Tegra going forward by proclaiming that Tegra 3 is “almost done” and Tegra 4 is “being built,” so the investment in new designs will apparently continue, even without current customers.

One intriguing question rattling around in the back of our minds is what advantages, exactly, the second-gen Tegra might have to offer compared to smart phones and tablets (or “tablet,” since the iPad is nearly alone) already on the market. Nvidia had some impactful demos at CES showing modern 3D game engines running on Tegra, but we now have similar examples quite literally in our pockets. Nvidia revealed precious little about the latest Tegra’s graphics architecture at CES, although we understand further details may be forthcoming as the first products, at long last, make their way to store shelves.

We can’t help but think Tegra’s struggles to date mirror the travails of another mobile system-on-a-chip (SoC) from a major PC chipmaker: Intel’s Atom, and particularly the Moorestown platform. Both Intel and Nvidia, kings of the traditional PC market, have had very limited success in the burgeoning market for mobile devices—a place where the major players like Apple and Samsung seem quite happy to license CPU cores from ARM and graphics cores from the likes of Imagination Tech and design their own SoCs. We expect the offerings from Intel and Nvidia will have to be dramatically better than their home-cooked recipes in order to persuade one of these major players to eschew vertical integration and buy into a pre-baked total platform solution. So far, we’ve seen nothing to indicate that’s likely to happen any time soon.

Getting technical
Beyond the opening keynote and press conference, GTC broke out into several days’ worth of tightly focused technical sessions on a range of topics related to GPU computing. Huang said Nvidia had received 67 submissions of papers for possible GTC presentations last year. In 2010, that number climbed to 334.

Posters showing off the research papers submitted to GTC lined the halls of the show

Not every paper turned into a technical session, but the selection of sessions was broad and eclectic, though focused almost entirely in the realms of scientific research, engineering and design, content creation, and quantitative finance. A sampling of session titles: “Binary Black Holes Simulations using CUDA,” “CU-LISP: GPU-based Spectral Analysis of Unevenly Sample Data,” “CUDA-FRESCO: An Efficient Algorithm for Mapping Short Reads,” and the somewhat unnerving “Bridging GPU Computing and Neuroscience to Build Large-Scale Face Recognition on Facebook.” The sessions we checked out were well attended, and most of them consisted of smart people with ridiculous amounts of specialized, domain-specific knowledge sharing information about GPU computing within and across disciplines.

Some of the equations were a little hard to follow, though.

Many of the discussions centered on how the presenters had struggled and eventually succeeded in mapping key parts of their computational problems to the CUDA programming model. Often, doing so required the combined efforts of a team of smart people and a willingness to let go of traditional algorithms and approaches. Not every problem and data set maps well to the GPU; the CPU remains the best place to do some work. Time and again, though, we heard claims of big performance gains when GPU optimization efforts paid off—from lows of around 2X the performance of a pair of Xeons to highs in the 40X range. We’d say reports of gains on the order of 8-10X the speed of a CPU-only solution were common—and those reports were coming from independent academics and researchers with supreme credibility.

Inevitably, when you have such a collection of folks using your processors for their work, there are decidedly cool applications among the bunch. In his day-two keynote, Klaus Schulten of the University of Illinois at Urbana-Champaign presented six examples of how researchers are using GPU clusters to simulate the interactions of objects at the atomic level, the sort of computational biology work that many readers should know from the Folding@home project. One of Schulten’s examples was an effort to understand how photosynthesis works. What must be simulated is a process that involves the electrostatic charging of a spherical structure consisting of roughly 10 million atoms. This simulation can be broken down into a series of operations performed on a grid, which maps very well to the GPU programming model. As a result, the team was able to conduct a simulated step on a trio of G80-class GPUs in 90 seconds. On a single CPU core, that same simulation takes an hour and 10 minutes. As Schulten noted, that’s a 46X speed-up, and such gains open up new frontiers for research.
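For readers who want to check the arithmetic behind that figure, here is a short Python sketch using only the two timings quoted above. The helper name speedup is ours, and the per-GPU number assumes perfectly even scaling across the three GPUs, which the keynote did not claim.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of the slower run time to the faster one."""
    return baseline_seconds / accelerated_seconds


cpu_seconds = 70 * 60  # "an hour and 10 minutes" on a single CPU core
gpu_seconds = 90       # one simulated step on three G80-class GPUs

ratio = speedup(cpu_seconds, gpu_seconds)

# Naive per-GPU share; assumes ideal linear scaling over 3 GPUs (an assumption).
per_gpu = ratio / 3

print(f"overall speed-up: {ratio:.1f}x")
print(f"per GPU, if scaling were ideal: {per_gpu:.1f}x")
```

The ratio works out to roughly 46.7, which matches the quoted 46X once the fraction is dropped. Note that the comparison is against a single CPU core, not a full multi-core processor.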

Other mind-blowing applications included a doctor pioneering the field of robotically assisted, minimally invasive heart surgery. One of his goals was to achieve a motion compensation algorithm for a beating human heart, which would allow a surgeon to make remote incisions on what appears to be a static organ—while the robotics compensate for the motion and cut into the moving tissue as the heart pumps.

Nvidia is unquestionably getting real traction with the people on the frontiers of scientific computing. Meanwhile, the firm continues to invest in the development tools, math libraries, and partnerships necessary to build a robust GPU computing ecosystem. Many of those tools, like the Parallel Nsight debugging and profiling plug-in for Microsoft Visual Studio, are available for download, free of charge. Sanford Russell, Nvidia’s GM of CUDA & GPU computing, insists Nvidia’s main interest lies in fostering the growth of GPU computing and thus selling chips, not in growing a software business or a closed, proprietary development platform.

So what’s next?
What we don’t know yet is whether—or how much—this traction for GPU computing in certain quarters will turn into growth for Nvidia’s Tesla business. The company is addressing a new market here, with nothing but cursory competition from rival GPU makers like AMD. We’ve seen some sky-high projections from Nvidia in the past about the size of Tesla’s potential market, but the dominoes haven’t fallen that way just yet. They may be poised to do so soon.

Another vexing question is how these advances in GPU computing will translate into consumer applications. So far, all we’ve seen is a handful of video encoding apps, Nvidia’s proprietary PhysX game enhancements, and a Folding@home client. That must-have killer application for GPU computing remains elusive, and may be so for years to come.

The area with the largest potential for GPU computing, oddly enough, may be gaming, where GPUs already reign supreme. We’ve become increasingly convinced that pushing toward further visual realism in real-time graphics will require the use of physical simulations in place of artist-tweaked animations. During the GTC opening keynote, Nvidia’s Tony Tamasi presented an amazing real-time demo of fluid flow around a lighthouse, inspired by a project at Stanford that won an Academy Award. In such cases, work being done in universities and at movie studios may translate fairly directly into similar effects for video games.

Whatever happens next, it seems increasingly clear that we’ll be talking about Nvidia as a processor company, and not just a graphics company, in the years to come.

Responses to “GTC 2010: GPU computing grows up”

  1. On the subject of folding (as mentioned in the article), did anything at all come from folding so far? And don’t say BSE ;/

  2. Interesting article, thanks. I was aware of most of those points in general, I just take every opportunity I get to poke finance in the eye (they are one of the few “industries” that richly deserve it).

  3. The quants were using far more powerful hardware already; GPUs just allow them to (potentially) pack it into smaller area. Finance has always been one of the “big six” areas for HPC, along with geophysics (petroleum exploration), pharma research, medical imaging, 3D rendering for Hollywood, and defense.

    In fact, to a large extent finance has been in the driver's seat for the past decade, because they had the deepest pockets. Well, until recent events, anyway.

  4. GPUs are not flat SIMD arrays; there’s no performance penalty for branches that don’t diverge within a SIMD vector. On an Nvidia card, that means as long as your threads branch together in groups of 32, you’re fine. On AMD, your threads should branch together in groups of 64 to avoid a performance penalty.

    For example, on Nvidia, if one thread branches differently from the rest, you’ll stall 31 threads. Since you can have 23040 threads in flight (GTX 480), it’s negligible: the remaining 23008 threads will keep running without penalty.

  5. As far as I know, AI is very branchy. GPUs don’t like branches (if/else/etc.).

    GPUs are like SIMD, except instead of applying the same instruction to the same data, the same instruction is applied to multiple cores. If one core has a branch, all other cores must stall and wait for the branch to converge with the current logic flow.

    Imagine 1799 of your 1800 shader units coming to a halt because one core decided to have a branch.

  6. I’d like to see GPUs used to do much more intelligent AI for real-time strategy games and to allow much larger numbers of units.

    I’d love to see something like Age Of Empires II with 200+ units a side where the AI could play as smart as a human player without cheating (it periodically gets a resource bonus).
    At the very least, GPUs could accelerate unit pathing.
    I’d also like to see techniques like Antiobjects (Wikipedia it) to cause units to avoid areas where other units have died (a bit like how the smell of a crushed ant alerts other ants).

  7. Damn. Articles like that are why I love TR. I’m not as smart as 4/5ths of you eggheads. 😉 But that was an easy and informative read indeed.

  8. See the original post, LG has announced plans for Tegra 2 smartphones in Q4.
    edit: Failed. Was reply to Voldenuit 🙁

  9. I would also like to comment that the credibility of academic researchers isn’t always that great. I am myself an academic (in GPGPU compilers, to be specific), and I have often seen papers published that claim 10x speedups over a CPU, only to find that the comparison was against a non-optimized single CPU core or that the algorithm being compared is non-optimal on the CPU.

    Academics also want to claim big numbers in papers because it makes it more likely that their paper will be accepted. Reviewers from a physics journal for example may not be experts in GPU computing so often will accept papers that look good enough and claim big speedups using GPUs even though an expert programmer can probably spot issues with their implementation. And in academics, buzzwords and hype are just as much of a force as they are in consumer electronics and nowadays GPGPU is a buzzword for scientific computing so getting papers about GPGPU accepted is not very hard.

    An old but funny paper about performance claims in HPC: §[<<]§ However, standards have certainly improved recently, and it is fair to say that many scientific problems do benefit greatly and that Nvidia has done a great job of providing and continuously improving programming tools for GPUs as well as spreading the word and providing education to the community. Gains in many fields are definitely real, and good job, Nvidia.

  10. Interesting that these are all tablets. I wonder if this means Tegra 2’s power consumption/heat is too high for smartphones?

    While the ARM tablets are pretty interesting, I do wish for an x86 tablet (maybe Bobcat) so I won’t have to rebuy all my software. Although the catch-22 is that Windows apps are horribly suited to touch UIs, so are somewhat pointless on a tablet.

  11. Second-generation Tegra is shipping in several products, though they might not be high-profile or high-volume products. For example, Toshiba’s AC100 Android smartbook, already shipping in Japan and Europe, is based on Tegra 2. A keyboardless tablet version of the same product is also coming soon.

    Several other products have been announced. The Elocity A7 is a tablet with Tegra 2 available for preorder on Amazon. LG has also announced that it will introduce Tegra 2-based smartphones in Q4.

    There have also been rumors that the Blackberry Playbook is based on Tegra 2 and that HTC and Motorola tablets will also use it but no confirmation on any of these.
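The disagreement between comments 4 and 5 about branching comes down to group size, and the numbers in comment 4 are easy to reproduce. The Python sketch below is a toy model, not a hardware simulator: it assumes threads are grouped consecutively into fixed-size groups (32 on Nvidia, 64 on AMD, per comment 4) and counts how many threads sit in a group where at least one thread took a different branch. The function name count_divergence is hypothetical.

```python
NVIDIA_GROUP = 32  # threads that branch together on Nvidia, per comment 4
AMD_GROUP = 64     # threads that branch together on AMD, per comment 4


def count_divergence(branch_taken, group_size):
    """Return (divergent_groups, threads_in_divergent_groups).

    branch_taken holds one boolean per thread; a group diverges when its
    threads do not all agree on the branch.
    """
    divergent_groups = 0
    affected_threads = 0
    for start in range(0, len(branch_taken), group_size):
        group = branch_taken[start:start + group_size]
        if len(set(group)) > 1:  # mixed True/False inside one group
            divergent_groups += 1
            affected_threads += len(group)
    return divergent_groups, affected_threads


threads_in_flight = 23040  # GTX 480 figure quoted in comment 4
branch_taken = [False] * threads_in_flight
branch_taken[1000] = True  # a single thread goes its own way

groups_nv, affected_nv = count_divergence(branch_taken, NVIDIA_GROUP)
groups_amd, affected_amd = count_divergence(branch_taken, AMD_GROUP)

print(groups_nv, affected_nv, threads_in_flight - affected_nv)    # 1 32 23008
print(groups_amd, affected_amd, threads_in_flight - affected_amd)  # 1 64 22976
```

In this model a lone divergent thread slows only its own group of 32 (itself plus 31 neighbors), leaving 23008 threads untouched, which is the figure given in comment 4. The scenario in comment 5, where one branch halts every other shader unit, would correspond to a single group spanning the whole chip.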