GF100 graphics architecture unveiled

Nvidia gives us a Fermi
— 9:26 PM on January 20, 2010

I have to admit, long-delayed graphics chips can be kind of fun. Instead of a single, overwhelming burst of information, complete with performance data and hands-on impressions from the actual product, we get to partake in a slow drip-drip-drip of details about a new GPU architecture. That's certainly been the case with Nvidia's DirectX 11-class GPU, known variously as Fermi (the overarching GPU architecture) and GF100 (the first chip to implement that architecture). We've already had a look at the compute-specific bits of the Fermi architecture, and we've engaged in deep, informed speculation on its graphics capabilities, as well. We know exactly what the competition looks like, and gosh darn it, we'd like to lay hands on the GPU itself soon.

The first cards based on the GF100 aren't quite ready yet, though, and we have one more stop to make before we get that chance. After the Consumer Electronics Show in Las Vegas, Nvidia invited various members of the press, including your humble correspondent, to get a closer look at the particulars of the GF100's graphics hardware. We now know that a great deal of Rys's speculation about the GF100's graphics particulars was correct, but we also know that he was off in a few rather notable places. We've filled in quite a few details in surprising ways, as well. Keep reading as we round out our knowledge of the GF100's graphics architecture and explain why this GPU just might be worth the wait.

A graphics architecture overview
First things first, I suppose. The GF100 is late, and Nvidia made no bones about it in this most recent briefing about the chip. Drew Henry, head of the company's GeForce business, told us forthrightly that he'd prefer to have a product in the market now, but said of the situation, "It is what it is." At present, the message about the GF100's status is equal parts straightforward and cautious: GF100 chips are "in full production at TSMC" and we can expect to see products in "Q1 2010." If the chip is in production, we can probably assume the main sources of the product delays have been rectified in the latest silicon spin. Beyond that, we have very little: no product names, prices, clock speeds, or more precise guidance on ship dates.

By making the window extend throughout the first quarter of the year, Nvidia has given itself ample leeway. Products could ship as late as March 31st without missing that target. If I were to narrow it down, though, I'd probably expect to see products somewhere around the first of March, give or take a week or two.

Time will tell on that front, but we now have a trove of specifics about the operation of the GPU from Nvidia itself. We've covered the computational capabilities of the GF100 quite thoroughly in our two prior pieces on the architecture, so we'll focus most of our attention here on its graphics features. Let's begin, as we often do, with a high-altitude overview.

A functional block diagram of GF100. Source: Nvidia.

As GPUs become more complex, these diagrams become ever more difficult to read from this distance. However, much of what you see above is already familiar, including the organization of the GPU's execution resources into 16 SMs, or shader multiprocessors. Those SMs house an array of execution units capable of executing, at peak, 512 arithmetic operations in a single clock cycle. Nvidia would tell you the GF100 has 512 "CUDA cores," and in a sense, they might be right. But the more we know about the way this architecture works, the less we're able to accept that definition, any more than we can say that AMD's Cypress has 1600 "shader cores." The "cores" proper are really the SMs, in the case of the GF100, and the SIMD engines, in the case of Cypress. Terminology aside, though, the GF100 does have a tremendous amount of processing power on tap. Also familiar are the six 64-bit GDDR5 memory interfaces, which hold the potential to deliver as much as 50% more bandwidth than Cypress or Nvidia's prior-generation part, the GT200.
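For readers who want to check the arithmetic behind those figures, here is a minimal sketch in Python. The SM count, peak operation count, and memory interface count come from the paragraph above. The 256-bit bus width for Cypress is our own assumption, not something stated here, and because Nvidia hasn't disclosed memory clocks, the comparison assumes both chips run their memory at the same per-pin data rate. The function names are purely illustrative.

```python
# Figures taken from the text: 16 SMs, 512 peak arithmetic operations
# per clock, and six 64-bit GDDR5 memory interfaces.
SM_COUNT = 16
PEAK_OPS_PER_CLOCK = 512
MEM_INTERFACES = 6
MEM_INTERFACE_BITS = 64

# Assumption (not stated in the text): Cypress has a 256-bit memory bus.
CYPRESS_BUS_BITS = 256


def ops_per_sm(total_ops: int, sm_count: int) -> int:
    """Peak arithmetic operations per clock contributed by each SM."""
    return total_ops // sm_count


def bus_width_bits(interfaces: int, bits_each: int) -> int:
    """Aggregate memory bus width across all interfaces."""
    return interfaces * bits_each


def relative_bandwidth(bus_a_bits: int, bus_b_bits: int) -> float:
    """Bandwidth ratio of bus A to bus B.

    Assumes both buses run at the same per-pin data rate, which is
    hypothetical, since no GF100 memory clocks have been announced.
    """
    return bus_a_bits / bus_b_bits


gf100_sm_ops = ops_per_sm(PEAK_OPS_PER_CLOCK, SM_COUNT)
gf100_bus = bus_width_bits(MEM_INTERFACES, MEM_INTERFACE_BITS)
vs_cypress = relative_bandwidth(gf100_bus, CYPRESS_BUS_BITS)

print(f"Ops per clock per SM: {gf100_sm_ops}")
print(f"GF100 memory bus width: {gf100_bus} bits")
print(f"Bandwidth vs. Cypress at equal data rate: {vs_cypress:.2f}x")
```

Under those assumptions, the "50% more bandwidth" potential falls straight out of the wider bus. Actual shipping products could land above or below that mark depending on memory clocks.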

The first hint we have of something new is the presence of four "GPCs" (or graphics processing clusters, I believe, although I thought that name was taken by Gary Phelps' Choice, as we used to call our Dean of Students' preferred smokes back in college). Nvidia Senior VP of GPU Engineering Jonah Alben called the GPCs "almost complete, independent GPUs" when he first described them to us. As you can see, each one has its own rasterization engine, which points toward an intriguing departure from the norm.

Each GPC contains four SMs, and we'll have to zoom in on a single SM in order to get a closer look at the rest of the GF100's graphics-focused hardware.

A functional block diagram of a GF100 shader multiprocessor. Source: Nvidia.

Now we can see that each SM has four texture units associated with it. More unconventionally, each SM also hosts a geometry unit, which Nvidia has creatively dubbed a "polymorph engine." Since the GF100 has four GPCs and 16 SMs, it has a total of 64 texture units and 16 polymorph engines. The Fermi architecture detailed here is scalable along several lines: variants could be made to have fewer GPCs and fewer SMs within each GPC. We can surely expect to see smaller chips based on this architecture that have been scaled down in one or both ways.
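The way those unit counts follow from the hierarchy can be sketched in a few lines of Python. The per-SM and per-GPC figures are the ones described above (four texture units and one polymorph engine per SM, one rasterizer per GPC). The `FermiConfig` class and the scaled-down variant are hypothetical illustrations of the two scaling axes, not announced products or anything Nvidia has described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FermiConfig:
    """Hypothetical model of a Fermi-style chip layout."""
    gpcs: int                          # graphics processing clusters
    sms_per_gpc: int                   # shader multiprocessors per GPC
    texture_units_per_sm: int = 4      # per the SM diagram
    polymorph_engines_per_sm: int = 1  # one geometry unit per SM
    rasterizers_per_gpc: int = 1       # one raster engine per GPC

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def texture_units(self) -> int:
        return self.sms * self.texture_units_per_sm

    @property
    def polymorph_engines(self) -> int:
        return self.sms * self.polymorph_engines_per_sm

    @property
    def rasterizers(self) -> int:
        return self.gpcs * self.rasterizers_per_gpc


# The full GF100 as described: four GPCs with four SMs apiece.
gf100 = FermiConfig(gpcs=4, sms_per_gpc=4)

# An invented scaled-down variant, for illustration only.
half_chip = FermiConfig(gpcs=2, sms_per_gpc=4)

for name, chip in (("GF100", gf100), ("hypothetical half chip", half_chip)):
    print(f"{name}: {chip.sms} SMs, {chip.texture_units} texture units, "
          f"{chip.polymorph_engines} polymorph engines, "
          f"{chip.rasterizers} rasterizers")
```

Because texturing and geometry hardware live inside the SM while rasterization lives at the GPC level, trimming SMs and trimming GPCs would reduce different mixes of resources in this model.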

We should also note the GF100's ROP units, of which there are 48 ringing the L2 cache in the full-chip diagram above. With that bit added, we have sketched in full the general outlines of the GF100. What remains is to fill in some detail in several areas, starting, of course, with the curiously quad rasterizers and 16 geometry units.