Have you ever stayed awake at night wondering about the intricacies of Nvidia's general-purpose GPU computing implementation? Well, maybe not. But the few who have are in luck: David Kanter at Real World Technologies has strung together a surprisingly detailed article about how Nvidia's GT200 graphics processor does all its GPGPU magic.
Kanter starts off with a discussion of CUDA, the programming interface that lets developers write general-purpose apps for Nvidia GPUs. He covers the way the API lays out tasks for the GPU, how it handles memory, and how everything looks from the coder's perspective. Kanter then dives headfirst into the GT200's hardware architecture, from the nooks and crannies of the stream processors to the memory pipeline.
Along the way, Kanter provides some interesting insight into how current GPGPU implementations compare, how they relate to more conventional parallel CPUs (like Cell and Sun's Niagara II), and what they don't handle well (namely workloads that require "complex data structures" such as trees, linked lists, and so on). According to Kanter, Nvidia's monolithic GPU design is better suited to GPGPU applications than AMD's smaller, more multi-GPU-friendly RV770 chip. Kanter also believes CUDA is currently "the only game in town for parallel computing."
As with pretty much all Real World Technologies pieces, we advise non-technically minded readers to stay well away, unless they feel like scooping their brains up off the floor.