
Nvidia's GeForce GTX 580 graphics processor


Another spin 'round a very big block
— 8:00 AM on November 9, 2010

As you may know, the GeForce GTX 480 had a troubled childhood. The GF100 chip that powered it was to be Nvidia's first DirectX 11-class graphics processor, based on the ambitious new Fermi architecture. But the GF100 was famously tardy, hitting the market over six months after the competition's DX11-capable Radeon HD 5000-series chips. When it did arrive aboard the GTX 470 and 480, the GF100 had many of the hallmarks of a shaky semiconductor product: clock speeds weren't as fast as we'd anticipated, power consumption and heat production were right at the ragged edge of what's acceptable, and some of the chip's processing units were disabled even on the highest-end products. Like Lindsay Lohan, it wasn't living up to its potential. When we first tested the GTX 480 and saw that performance wasn't much better than the smaller, cooler, and cheaper Radeon HD 5870, we were decidedly underwhelmed.

Yet like Miss Lohan, the GF100 had some rather obvious virtues, including formidable geometry processing throughput and, as we learned over time, quite a bit of room for performance increases through driver updates. Not only that, but it soon was joined by a potent younger sibling with a different take on the mix of resources in the Fermi architecture, the GF104 chip inside the very competitive GeForce GTX 460 graphics cards.


Little did we know at the time, but back in February of this year, before the first GF100 chips even shipped in commercial products, the decision had been made in the halls of Nvidia to produce a new spin of the silicon known as GF110. The goal: to reduce power consumption while improving performance. To get there, Nvidia engineers scoured each block of the chip, employing lower-leakage transistors in less timing-sensitive logic and higher-speed transistors in critical paths, better adapting the design to TSMC's 40-nm fabrication process.

At the same time, they made a few targeted tweaks to the chip's 3D graphics hardware to further boost performance. The first enhancement was also included in the GF104, a fact we didn't initially catch. The texturing units can filter 16-bit floating-point textures at full speed, whereas most of today's GPUs filter this larger format at half their peak speed. The additional filtering oomph should improve frame rates in games where FP16 texture formats are used, most prominently with high-dynamic-range (HDR) lighting algorithms. HDR lighting is fairly widely used these days, so the change is consequential. The caveat is that the GPU must have the bandwidth needed to take advantage of the additional filtering capacity. Of course, the GF110 has gobs of bandwidth compared to most.

The second enhancement is unique to GF110: an improvement in Z-culling efficiency. Z culling is the process of ruling out pixels based on their depth; if a pixel won't be visible in the final, rendered scene because another pixel is in front of it, the GPU can safely neglect lighting and shading the occluded pixel. More efficient Z culling can boost performance generally, although the Z-cull capabilities of current GPUs are robust enough that the impact of this tweak is likely to be modest.
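To make the idea concrete, here's a minimal software analogue of that depth test, written as a CUDA kernel. The shade_pixel() routine and the buffer layout are our own hypothetical stand-ins, and real hardware Z culling rejects whole tiles of occluded pixels before shading rather than testing fragments one at a time, but the early exit is the essential trick:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the expensive lighting and shading work.
__device__ uchar4 shade_pixel(int i)
{
    return make_uchar4(255, 255, 255, 255);
}

// One incoming fragment per thread. If the depth buffer already holds a
// nearer value, we bail out before shading; that early exit is the whole
// point of Z culling.
__global__ void depth_tested_shade(const float *fragment_z, float *depth_buffer,
                                   uchar4 *color_buffer, int n_pixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pixels) return;

    // Occluded: another fragment is closer to the camera, so lighting and
    // shading this pixel can safely be skipped.
    if (fragment_z[i] >= depth_buffer[i]) return;

    depth_buffer[i] = fragment_z[i];  // record the new nearest depth
    color_buffer[i] = shade_pixel(i); // only now do the costly work
}

int main()
{
    const int n = 640 * 480;
    float *fz, *db;
    uchar4 *cb;
    cudaMalloc(&fz, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&cb, n * sizeof(uchar4));
    // A real renderer would have filled fz and db in earlier passes.
    depth_tested_shade<<<(n + 255) / 256, 256>>>(fz, db, cb, n);
    cudaDeviceSynchronize();
    cudaFree(fz); cudaFree(db); cudaFree(cb);
    return 0;
}
```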

The third change is pretty subtle. In the Fermi architecture, each shader multiprocessor (SM) has 64KB of local data storage that can be partitioned either as 16KB of L1 cache and 48KB of shared memory or vice versa. When the GF100 is in a graphics context, the SM storage is partitioned as a 16KB L1 cache and 48KB of shared memory; the 48KB/16KB configuration is only available in GPU computing contexts. The GF110 is capable of running with a 48KB L1 cache/16KB shared memory split for graphics, which Nvidia says "helps certain types of shaders."
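Graphics applications don't choose that split themselves; on the GF110, the driver makes the call. CUDA does expose the equivalent compute-side knob per kernel, though, which gives a feel for the trade-off. A minimal sketch, with a placeholder kernel:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel; stands in for whatever compute work you'd tune.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    // Each Fermi SM has 64KB of local storage split between L1 cache and
    // shared memory. Request the 48KB L1 / 16KB shared partition for this
    // kernel; cudaFuncCachePreferShared asks for the reverse split.
    cudaFuncSetCacheConfig(scale_kernel, cudaFuncCachePreferL1);

    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    scale_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```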

Now, barely nine months since the chip's specifications were set, the GF110 is ready to roll aboard a brand-new flagship video card, the GeForce GTX 580. GPU core and memory clock speeds are up about 10% compared to the GTX 480—the GPU core is 772MHz, shader ALUs are double-clocked to 1544MHz, and the GDDR5 memory now runs at 4.0 GT/s. All of the chip's graphics hardware is enabled, and Nvidia claims the GTX 580's power consumption is lower, too.
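For reference, the GTX 480's corresponding clocks were 700MHz on the core, 1401MHz on the shaders, and 3.7 GT/s on the memory, so the increases work out to:

\[
\frac{772}{700} \approx 1.10, \qquad \frac{1544}{1401} \approx 1.10, \qquad \frac{4.0}{3.7} \approx 1.08
\]

Call it 10% on the core and shaders and 8% on the memory.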

|                              | Peak pixel fill rate (Gpixels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) | Peak shader arithmetic (GFLOPS) | Peak rasterization rate (Mtris/s) |
|------------------------------|------|------|-------|------|------|
| GeForce GTX 460 1GB (810MHz) | 25.9 | 47.6 | 124.8 | 1089 | 1620 |
| GeForce GTX 480              | 33.6 | 21.0 | 177.4 | 1345 | 2800 |
| GeForce GTX 580              | 37.1 | 49.4 | 192.0 | 1581 | 3088 |
| Radeon HD 5870               | 27.2 | 34.0 | 153.6 | 2720 | 850  |
| Radeon HD 5970               | 46.4 | 58.0 | 256.0 | 4640 | 1450 |

On paper, the changes give the GTX 580 a modest boost over the GTX 480 in most categories that matter. The gain in FP16 filtering throughput, though, is obviously more prodigious. Add in the impact of the Z-cull improvement, and the real-world performance could rise a little more.
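The arithmetic behind that filtering jump is simple enough. The GTX 480 has 15 of the GF100's 16 SMs enabled, leaving 60 texture units that filter FP16 at half rate, while the GTX 580 runs all 64 of its units at full rate:

\[
60 \times 0.5 \times 0.700\,\text{GHz} = 21.0\ \text{Gtexels/s} \qquad \text{vs.} \qquad 64 \times 1 \times 0.772\,\text{GHz} = 49.4\ \text{Gtexels/s}
\]

That's well over twice the peak FP16 throughput.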

A question of balance

|         | ROP pixels/clock | Textures filtered/clock (int/FP16) | Shader ALUs | Rasterized triangles/clock | Memory interface width (bits) |
|---------|----|-------|------|---|-----|
| GF104   | 32 | 64/64 | 384  | 2 | 256 |
| GF100   | 48 | 64/32 | 512  | 4 | 384 |
| GF110   | 48 | 64/64 | 512  | 4 | 384 |
| Cypress | 32 | 80/40 | 1600 | 1 | 256 |
| Barts   | 32 | 56/28 | 1120 | 1 | 256 |

Notably, other than the increase in FP16 filtering rate, the GF110 retains the same basic mix of graphics resources as the GF100. We'll raise an eyebrow at that fact because the GF104 is arguably the more efficient design, yet it hits some very different notes. Versus the GF100/110, the GF104's ROP rate and memory interface width are lower by a third and its shader ALU count by a quarter; its rasterization rate is cut in half, yet its texture filtering rate is just as high.

Not reflected in the tables above is another element: the so-called polymorph engines in the Fermi architecture, dedicated hardware units that handle a host of pre-rasterization geometry processing chores (including vertex fetch, tessellation/geometry expansion, viewport transform, attribute setup, and stream output). The GF104 has eight such engines, while the GF100 and GF110 have 16. Only 15 of the 16 are enabled on the GTX 480, but the GTX 580 uses them all, so it possesses even more geometry processing capacity than anything that has come before. (If you want to more fully understand the configuration of the different units on the chip and their likely impact on performance, we'd refer you to our Fermi graphics architecture overview for the requisite diagrams and such. Nearly everything we said about the GF100 still applies to the GF110.)
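Assuming geometry throughput scales with engine count and clock speed, the GTX 580's edge over the GTX 480 compounds like so:

\[
\frac{16 \times 772\,\text{MHz}}{15 \times 700\,\text{MHz}} \approx 1.18
\]

or roughly 18% more peak polymorph capacity.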

Also still kicking around inside of the GF110 are the compute-focused features of the GF100, such as ECC protection for on-chip storage and the ability to handle double-precision math at half the rate of single-precision. These things are essentially detritus for real-time graphics, and as a consequence of product positioning decisions, GTX 580's double-precision rate remains hobbled at one-quarter of the chip's peak potential.
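If we read "one-quarter of the chip's peak potential" against that half-of-single-precision capability, the GTX 580's double-precision throughput pencils out to one-eighth of its single-precision rate:

\[
1581\ \text{GFLOPS} \times \tfrac{1}{2} \times \tfrac{1}{4} \approx 198\ \text{GFLOPS}
\]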


The GF110 hides under a metal cap that's annoyingly tough to remove

We detected some trepidation on Nvidia's part about the reception the GF110 might receive, given these facts. After all, there was a fairly widespread perception that GF100's troubles were caused at least in part by two things: its apparent dual focus on graphics and GPU computing, and its clear emphasis on geometry processing power for graphics. The GF104's singular graphics mission and improved efficiency in current games only fed this impression.

Nvidia's counter-arguments are worth hearing, though. The firm contends that any high-end GPU like this one has plenty of throughput to handle today's ubiquitous console ports, with their Unreal engine underpinnings and all that entails. The GF110's relative bias toward geometry processing power is consistent with Nvidia's vision of where future games based on DirectX 11 should be headed—with more complex models, higher degrees of tessellation, and greater geometric realism. In fact, Drew Henry, who runs Nvidia's GeForce business, told us point blank that the impetus behind the GF110 project was graphics, not GPU computing products. That's a credible statement, in our estimation, because the GF100-based Tesla cards have essentially zero competition in their domain, while the GeForces will face a capable foe in AMD's imminent Cayman GPU.

Our sense is that, to some extent, the GF110's success will depend on whether game developers and end users are buying what Nvidia is selling: gobs and gobs of polygons. If that's what folks want, the GF110 will deliver in spades. If not, well, it still has 50% more memory bandwidth, shader power, and ROP throughput than the GF104, making it the biggest, baddest GPU on the planet by nearly any measure, at least for now.