
Render back-ends and antialiasing


R600 render back-end logical diagram
Source: AMD.

To the right is a logical diagram of one of the R600's render back-ends. (Nvidia calls these ROPs, if you're wondering.) The R600 packs four of these units, and they work pretty much as you'd expect from the diagram. Each can output four pixels per clock to the frame buffer and can process depth and stencil tests at twice that rate. Among the improvements from R580 are higher peak rates of Z and stencil compression, some improvements to common Z-buffer optimizations, and the ability to use FP32-format Z buffers for higher depth precision.
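For a back-of-the-envelope feel for those rates, here is a minimal Python sketch. It assumes the four-pixels-per-clock figure applies to each of the four back-ends. The core clock is not stated in this section, so the 742 MHz default is an assumption used purely for illustration, and the function name is hypothetical.

```python
def peak_backend_rates(num_backends=4, pixels_per_clock_each=4, core_clock_mhz=742.0):
    """Theoretical peak render back-end throughput.

    num_backends and pixels_per_clock_each follow the text above;
    core_clock_mhz is an assumed value, not a figure from this section.
    """
    pixels_per_clock = num_backends * pixels_per_clock_each
    # Depth/stencil tests run at twice the pixel output rate.
    z_stencil_per_clock = 2 * pixels_per_clock
    return {
        "pixels_per_clock": pixels_per_clock,
        "z_stencil_per_clock": z_stencil_per_clock,
        "gpixels_per_sec": pixels_per_clock * core_clock_mhz / 1000.0,
        "gz_stencil_per_sec": z_stencil_per_clock * core_clock_mhz / 1000.0,
    }


if __name__ == "__main__":
    for name, value in peak_backend_rates().items():
        print(f"{name}: {value}")
```

These are theoretical peaks only; real fill rates depend on memory bandwidth, blending, and the AA mode in use.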

The render back-ends are also traditionally the place where the resolve process for multisampled antialiasing happens. AMD has carried over all of the previous antialiasing goodness of its prior chips in R600, including gamma-correct blends, programmable sample patterns, temporal AA, and Super AA modes for CrossFire load balancing. The R600 trades the older GPU's 6X multisampling mode for a new 8X mode that, duh, offers higher quality by virtue of more samples. I've added the R600's default sample patterns to my Giant Chart of AA Sample Patterns, producing the following glorious cornucopia of colored dots. As always, the green dots represent texture/shader samples, pink dots represent color/Z and coverage samples, and teensy red dots represent coverage samples alone (for Nvidia's CSAA modes).

I've included the Radeon HD 2900 XT's CrossFire SuperAA mode in a separate column, although SuperAA is presently limited to a single mode on the 2900 XT. I've also included composite sample patterns for the 2900 XT's temporal AA modes. These sample patterns actually occur in two halves over the course of two frames whenever frame rates go above 60 FPS. My current assessment of temporal AA: meh. It sounded like a good idea at the time, but AMD could spike it and I'd be happy.

[Chart: Giant Chart of AA Sample Patterns. Columns: GeForce 7900 GTX; GeForce 7900 GTX SLI; GeForce 7950 GX2 SLI; GeForce 8800 GTX; Radeon X1950 XTX; Radeon X1950 XTX CrossFire; Radeon HD 2900 XT; Radeon HD 2900 XT Temporal; Radeon HD 2900 XT CrossFire. Rows: 2X, 4X, 6X, 8x, 8xS/8xQ/8X/10X, 12X, 14X, 16X, 16xQ, 32X.]

And so the grand table adds the R600's distinctiveness to its own. As ever, AMD has used a nice quasi-random pattern in the R600's new 8X multisampled mode.

So that's part of the story. After seeing Nvidia's very smart coverage sampled antialiasing technique in the G80, I had doubts about whether AMD could answer with something as good and innovative itself. To recap in a nutshell, coverage sampled AA does what it appears to do in the table above: stores more samples to determine polygon coverage while discarding color/Z samples it doesn't necessarily need. That keeps its memory footprint and performance overhead low, yet it generally produces good results, as you'll see in the examples on the following pages.

AMD's answer to coverage sampled AA is made possible by the fact that the render back-ends in the R600 can now quickly pass data back to the shaders, and that leads to AMD's latest innovation: custom filter antialiasing. The essence of CFAA is that R600 can run a multitude of antialiasing filters, with a programmable resolve stage, allowing for all kinds of new and different AA voodoo. That voodoo starts with a couple of new filters AMD has included with the first round of Radeon HD drivers: a pair of tent filters. Unlike the traditional box filter, these tent filters reach outside of the bounds of the pixel to grab extra samples. Here are a couple of examples, with narrow and wide tent filters using the Radeon HD's 8X sample pattern, from AMD.

The narrow tent grabs a single sample from each neighboring pixel, while the wide tent grabs two. That leads to an effective sample size of 12X for the narrow tent and 16X for the wide tent. The HD 2900 XT can also combine narrow and wide tent filters with its 2X and 4X AA modes for effective sample sizes of 4X, 6X, 6X again, and 8X.

Those of you who are old-school PC graphics guys like me may be having some serious, gut-wrenching flashbacks right now to Nvidia's screen-blurring Quincunx mode from GeForces of old. These tent filters are fairly smart about how they go about their business, though; they compute a weighted average of the samples based on a linear function that decreases the weight of samples further from the pixel center. Tent filters do introduce a measure of blurring across the whole screen, but the effect is very subtle, as you can see in the example below. The base AA mode is 8X multisampled.
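To make the box-versus-tent distinction concrete, here is a minimal Python sketch of the two resolve styles. It is not AMD's implementation: the radial distance metric, the filter radius, the sample positions, and all function names are assumptions chosen for illustration. The only thing taken from the description above is that the weight falls off linearly with distance from the pixel center and that the tent reaches past the pixel's bounds.

```python
import math


def tent_weight(dx, dy, radius):
    """Linear falloff: 1.0 at the pixel center, 0.0 at `radius` and beyond."""
    distance = math.hypot(dx, dy)
    return max(0.0, 1.0 - distance / radius)


def tent_resolve(samples, radius):
    """Weighted average of (dx, dy, value) samples.

    Offsets are in pixel units from the pixel center, so samples with
    |dx| or |dy| > 0.5 belong to neighboring pixels.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for dx, dy, value in samples:
        w = tent_weight(dx, dy, radius)
        weighted_sum += w * value
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("no samples fall inside the filter radius")
    return weighted_sum / total_weight


def box_resolve(samples):
    """Traditional resolve: equal weights, only samples inside the pixel."""
    inside = [v for dx, dy, v in samples if abs(dx) <= 0.5 and abs(dy) <= 0.5]
    return sum(inside) / len(inside)


if __name__ == "__main__":
    # Three white samples inside the pixel, one black sample from a neighbor.
    samples = [(0.0, 0.0, 1.0), (0.25, 0.25, 1.0),
               (-0.25, -0.25, 1.0), (0.75, 0.0, 0.0)]
    print("box :", box_resolve(samples))
    print("tent:", tent_resolve(samples, radius=1.0))
```

In this toy case the box filter ignores the neighboring sample entirely, while the tent filter lets it pull the result slightly toward black. That small contribution from outside the pixel is where both the extra edge smoothing and the mild full-scene blur come from.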

Box - 8X MSAA | Narrow tent - 12X | Wide tent - 16X

The blurring is most obvious in the text, but it is in fact a full-scene affair. Look at the leaves on the sidewalk below the park bench, the bricks and windowpanes of the building behind, or the cobblestone texture on the street. The tent filters blur all of these things subtly, which leads to a tradeoff: images aren't as sharp, but high-frequency "pixelation" is reduced throughout the scene.

Frankly, I was all set not to like CFAA's tent filters when I first heard about them. They make things blurry, don't involve clever tricks like Nvidia's coverage sampling, and hey, Quincunx sucked. But here's the thing: I really like them. It's hard to argue with results, and CFAA's tent filters do some important things well. Have a look at this example shot from Oblivion.

GeForce 8800 GTS, CSAA 8X | GeForce 8800 GTS, CSAA 16X | Radeon HD 2900 XT, CFAA 8X - 4X MSAA + Wide tent

This CFAA mode with 8 samples produces extremely clean edges and does an excellent job of resolving very fine geometry, like the tips of the spires on the cathedral. Even 16X CSAA can't match it. Also, have a look at the tree leaves in these shots. They use alpha transparency, and I don't have transparency AA enabled, so you see some jagged edges on the GeForce 8800. The wide tent filter's subtle blending takes care of these edges, even without transparency AA.

You may not be convinced yet, and I don't blame you. CFAA's tent filters may not be for everyone. I would encourage you to try them, though, before writing them off. There is ample theoretical backing for the effectiveness of tent filters, and as with any AA method, much of their effectiveness must be seen in full motion in order to be properly appreciated. I prefer the 4X MSAA + wide tent filter to anything Nvidia offers, in spite of myself. I've found that it looks great on the 30" wide-screen LCD attached to my GPU test rig. The reduction in high-frequency pixel noise is a good thing on a sharp LCD display; it adds a certain solidity to objects that just... works. Oblivion has never looked better on the PC than it does on the Radeon HD 2900 XT.

How does this AA voodoo perform, you ask? Here's a test using one of 3DMark's HDR tests, which uses FP16 texture formats.

Another feature of CFAA tent filters is that they have no additional memory footprint or sampling requirements, and in this case, that translates to almost no performance overhead. OK, my graph here is hard to read, but if you look closely, you'll see that CFAA's narrow and wide tent filters don't slow down the 2X and 4X MSAA modes on which they're based. There is a performance penalty involved when they're combined with 8X MSAA, but it's not too punishing.

In its current state, then, the R600's CFAA is an impressive answer to Nvidia's CSAA, the, er, Quincunx smear aside. The thing about custom filters is that they can do many things, and AMD has big plans for them. They're talking about a custom filter that runs an edge-detect pass on the entire image and then goes back and applies AA selectively. In fact, they even delivered a driver to us late in our testing along with a separate executable to enable this filter. Unfortunately, I wasn't able to get it working in time to try it out. We'll have to look at it later.
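The general idea of "detect edges first, then filter only there" can be sketched in a few lines of Python. This is purely illustrative and not AMD's filter, whose details aren't described here: the gradient test, the threshold, the 3x3 averaging stand-in for the actual AA resolve, and the function names are all assumptions. It also works on a plain grayscale image rather than on multisample data.

```python
def edge_mask(img, threshold):
    """Mark pixels whose horizontal or vertical luminance gradient is large."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbor lookups at the image border.
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            gradient = max(abs(right - left), abs(down - up))
            mask[y][x] = gradient > threshold
    return mask


def selective_smooth(img, threshold=0.25):
    """Average a 3x3 neighborhood, but only at pixels flagged as edges."""
    h, w = len(img), len(img[0])
    mask = edge_mask(img, threshold)
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # non-edge pixels pass through untouched
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    total += img[ny][nx]
            out[y][x] = total / 9.0
    return out


if __name__ == "__main__":
    # A hard vertical edge: black on the left, white on the right.
    image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in selective_smooth(image):
        print([round(v, 3) for v in row])
```

The appeal of this approach is that flat interior regions keep their full sharpness, so the full-scene softening of the tent filters would be avoided in principle.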

Oh, and it is possible that Nvidia could counter CFAA with some shader-based custom AA filters of its own, completely stealing AMD's thunder. For the record, I'd wholeheartedly endorse that move.