ROPs and antialiasing
![]() A ROP pipeline logical diagram. Source: NVIDIA.
Like everything else on the G80, these ROPs are substantially improved from the G71. Each ROP partition can process up to 16 color samples and 16 Z samples per clock, or alternately 32 Z samples per clock (which is useful for shadowing algorithms and the like). That adds up to a total peak capacity of 96 color + Z samples per clock or 192 Z samples per clock. An individual ROP partition can only write four pixels per clock to memory, but the overload of sample capacity should help with antialiasing. Nvidia has also endowed the G80's ROPs with improved color and Z compression routines that it claims are twice as effective as the G71's.
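The peak figures follow from the per-partition rates. As a quick sanity check, the sketch below works backward from the quoted totals. The partition count is inferred from 96 / 16 rather than stated in this section, and the variable names are purely illustrative.

```python
# Per-partition rates quoted in the text.
COLOR_Z_PER_CLOCK = 16         # color + Z samples per clock, per ROP partition
Z_ONLY_PER_CLOCK = 32          # Z-only samples per clock, per ROP partition
PIXELS_WRITTEN_PER_CLOCK = 4   # pixels written to memory per clock, per partition

# Chip-wide color + Z peak quoted in the text.
TOTAL_COLOR_Z_PER_CLOCK = 96

# Inferred, not stated in this section: the number of ROP partitions.
partitions = TOTAL_COLOR_Z_PER_CLOCK // COLOR_Z_PER_CLOCK

# Z-only peak across the whole chip.
total_z_only = partitions * Z_ONLY_PER_CLOCK

# Samples a partition can process for every pixel it can write.
samples_per_written_pixel = COLOR_Z_PER_CLOCK // PIXELS_WRITTEN_PER_CLOCK

print(partitions, total_z_only, samples_per_written_pixel)
```

The last value is the ratio of sample throughput to pixel write rate: each partition can process four samples for every pixel it writes, which is one way to read the claim that the extra sample capacity should help with antialiasing.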
Most notably, perhaps, the G80's ROPs can handle blending of high-precision data formats in conjunction with multisampled antialiasing; gone is the G71's inability to do AA along with HDR lighting. The G80 can handle both FP16 and FP32 formats with multisampled AA.
Speaking of multisampled antialiasing, you are probably familiar by now with this AA method, because it's standard on both ATI and Nvidia GPUs. If you aren't, let me once again recommend reading this article for an overview. Multisampled AA is too complex a subject for me to boil down into a nutshell easily, but its essence is like that of many antialiasing methods; it captures multiple samples from inside the space of a single pixel and blends them together in order to determine the final color of the pixel. Compared to brute-force methods like supersampling, though, multisampling skips several steps along the way, performing only one texture read and shader calculation per pixel, usually sampled from the pixel center. The GPU then reads and stores a larger number of color and Z samples for each pixel, along with information about which polygon covers each sample point. MSAA uses the Z, color, and coverage information to determine how to blend to the pixel's final color.
Clear as mud? I've skipped some steps, but the end result is that multisampled AA works relatively well and efficiently. MSAA typically only modifies pixels on the edge of polygons, and it makes those edges look good.
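For readers who prefer code to prose, here is a toy model of the multisampling idea described above. It is only a sketch under simplifying assumptions, not how any GPU implements it: the resolve is a plain average, coverage is reduced to a list of covered sample indices, and every name (`Sample`, `write_fragment`, `resolve_pixel`) is hypothetical.

```python
from dataclasses import dataclass

Color = tuple[float, float, float]


@dataclass
class Sample:
    color: Color   # color stored for this sample point
    depth: float   # Z value stored for this sample point


def write_fragment(samples: list[Sample], covered: list[int],
                   color: Color, depth: float) -> None:
    """Store ONE shaded color in every covered sample that passes the Z test.

    The color is computed once per pixel (the multisampling shortcut),
    then replicated to the sample points the polygon covers.
    """
    for i in covered:
        if depth < samples[i].depth:
            samples[i] = Sample(color, depth)


def resolve_pixel(samples: list[Sample]) -> Color:
    """Blend the stored samples into the final pixel color (simple average)."""
    n = len(samples)
    return tuple(sum(s.color[i] for s in samples) / n for i in range(3))


# A 4X pixel that starts as black background at the far plane.
pixel = [Sample((0.0, 0.0, 0.0), 1.0) for _ in range(4)]

# A white polygon's edge covers two of the four sample points.
write_fragment(pixel, covered=[0, 1], color=(1.0, 1.0, 1.0), depth=0.5)

edge_color = resolve_pixel(pixel)
print(edge_color)  # half white, half black
```

Interior pixels, where one polygon covers every sample, resolve to exactly the shaded color, which is why the technique mostly affects polygon edges.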
For the G80, Nvidia has cooked up an extension of sorts to multisampling that it calls coverage sampling AA, or CSAA for short. John Montrym, an Nvidia graphics architect formerly of SGI, introduced this mode at the G80 press event. Montrym explained that multisampling has a problem with higher sample rates. Beyond four samples, he asserted, "the storage cost increases faster than the image quality improves." This problem is exacerbated with HDR formats, where storage costs are higher. Yet "for the vast majority of edge pixels," Montrym said, "two colors are enough." The key to better AA, he argued, is "more detailed coverage information," or information about how much each polygon covers the area inside of a pixel.
CSAA achieves that goal without increasing the AA memory footprint too drastically by calculating additional coverage samples but discarding the redundant color and Z information that comes along with them. Montrym claimed CSAA could offer the performance of 4X AA with roughly 16X quality. This method works well generally and has some nice advantages, he said, but has to fall back to the quality of the stored color/Z sample count in a couple of tough cases, such as on shadow edges generated by stencil shadow volumes.
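Montrym's "two colors are enough" argument can be illustrated with a conceptual model. Nvidia's actual hardware format is not described here, so everything below is an assumption made for illustration: the pixel keeps a short list of stored colors, each coverage sample simply records which stored color it belongs to, and the resolve weights each color by its share of coverage samples. The function name `csaa_resolve` is hypothetical.

```python
from collections import Counter


def csaa_resolve(stored_colors, coverage_to_slot):
    """Blend stored colors, weighting each by its share of coverage samples.

    stored_colors:    the few full colors kept for the pixel
    coverage_to_slot: for each coverage sample, the index of the stored
                      color it belongs to (assumed bookkeeping, not
                      Nvidia's documented format)
    """
    counts = Counter(coverage_to_slot)
    total = len(coverage_to_slot)
    return tuple(
        sum(stored_colors[slot][i] * n for slot, n in counts.items()) / total
        for i in range(3)
    )


# An edge pixel that only needs two colors: white polygon over black.
colors = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]

# Sixteen coverage samples, five of which fall inside the white polygon.
coverage = [0] * 5 + [1] * 11

blended = csaa_resolve(colors, coverage)
print(blended)

# Number of distinct edge intensities each approach can express here.
levels_4_samples = 4 + 1
levels_16_coverage = 16 + 1
```

With only four samples, this edge would have to round to a multiple of one quarter; sixteen coverage samples allow finer gradations without storing sixteen full color and Z values.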
Nvidia has added a number of new AA modes to the G80, three of which use a form of CSAA. Here's a look at the information stored in each mode:

The 8X, 16X, and 16xQ modes are CSAA variants, with a smaller number of stored color/Z samples than traditional multisampling modes. 16X is the purest CSAA mode, with four times the coverage samples compared to color/Z. Note, also, that Nvidia has added a pure 8X multisampled mode to the G80, dubbed 8xQ.
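The relationship between the modes can be summarized in a few lines of code. The sample counts below are the ones Nvidia has documented for these modes, and they match the description above. The bytes-per-sample figure is an assumption chosen purely for illustration (an FP16 RGBA color at 8 bytes plus a 4-byte Z value), and the estimate ignores compression and whatever storage the coverage samples themselves require.

```python
# (stored color/Z samples, coverage samples) for each G80 AA mode.
AA_MODES = {
    "4X":   (4, 4),
    "8X":   (4, 8),
    "8xQ":  (8, 8),
    "16X":  (4, 16),
    "16xQ": (8, 16),
}

# Assumed for illustration only: FP16 RGBA color (8 bytes) + 4 bytes of Z.
BYTES_PER_COLOR_Z_SAMPLE = 8 + 4


def is_csaa(mode):
    """A mode is a CSAA variant when it has more coverage than color/Z samples."""
    color_z, coverage = AA_MODES[mode]
    return coverage > color_z


def color_z_bytes_per_pixel(mode):
    """Rough color + Z storage per pixel under the assumption above."""
    return AA_MODES[mode][0] * BYTES_PER_COLOR_Z_SAMPLE


for mode, (color_z, coverage) in AA_MODES.items():
    kind = "CSAA" if is_csaa(mode) else "MSAA"
    print(f"{mode:5} {kind}  color/Z={color_z}  coverage={coverage}  "
          f"~{color_z_bytes_per_pixel(mode)} bytes/pixel of color+Z")
```

Under these assumptions, 16X carries the same color and Z footprint as 4X while quadrupling the coverage information, whereas 8xQ and 16xQ double that footprint.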
We can see the locations of the G80's AA sample points by using a simple FSAA test application. This app won't show the location of coverage-only sample points for CSAA, unfortunately.
| | GeForce 7900 GTX | GeForce 8800 GTX | Radeon X1950 XTX |
| 2X | ![]() | ![]() | ![]() |
| 4X | ![]() | ![]() | ![]() |
| 6X | | | ![]() |
| 8X | | ![]() | |
| 8xS/8xQ | ![]() | ![]() | |
| 16X | | ![]() | |
| 16xQ | | ![]() | |
Lo and behold, Nvidia's 8xQ multisampled mode introduces a new, non-grid-aligned sample pattern with a quasi-random distribution, much like ATI's 6X pattern. I've tried to shake a map of the 16-sample pattern out of Nvidia, but without success to date.
The big question is: does CSAA work, and if so, how well? For that, we have lots of evidence, but we'll start with a quick side-by-side comparison of 4X multisampling with the three CSAA modes. Here's a small example with some high-contrast, near-vertical edges, straight out of Half-Life 2. I've resized these images to precisely four times their original size to make them easier to see, but they are otherwise unretouched.
| Coverage Sampling AA Quality | | | |
| 4X | 8X | 16X | 16xQ |
| ![]() | ![]() | ![]() | ![]() |
To my eye, CSAA works, and works well. In fact, 16xQ looks no more effective to me than the 16X mode, despite having twice as many stored color and Z samples. You can see direct comparisons of all of the G80's modes to the G71 and R580+ in the larger image comparison table on the next page, but first, let's look at how CSAA impacts performance.

This isn't quite 16X quality at 4X performance, but CSAA 8X and 16X both have less overhead than the 8xQ pure multisampled mode. In fact, CSAA 16X may be the sweet spot for image quality and performance together.
These numbers from Half-Life 2 are also our first look at G80 performance in a real game. Perhaps you're getting excited to see more, in light of these numbers? Before we get there, we have a couple of quick AA image quality comparisons between the GPUs to do.

















