ROPs and antialiasing
Nvidia has traditionally closely associated its ROP hardware, which converts shaded fragments into pixels and writes them to memory, with its L2 cache and memory controllers. With the move to GDDR5 memory, the GF100 promises to have as much as 50% higher memory bandwidth than the GT200, but it now has only six 64-bit memory controllers onboard, down from eight in the prior-gen chip. To keep the right balance of ROP hardware, Nvidia has reworked its ROP partitions: each one now houses eight ROP units, for a total of 48 ROP units across the chip. At peak, then, the GF100 can output 48 pixels per clock in a 32-bit integer format, a straightforward increase of 50% over the GT200 or Cypress. GF100's ROPs require two cycles to process pixels in FP16 data formats and four for FP32.
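For readers who like to see the arithmetic, the peak rates implied by those figures can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not anything Nvidia supplied: the constant and function names are made up for the example, it assumes one ROP partition per memory controller (which is what makes six partitions of eight ROPs add up to 48), and it assumes that a format needing N cycles per pixel simply divides the peak rate by N.

```python
# Back-of-the-envelope peak ROP output, using only the figures quoted above.
# Assumption: one ROP partition per 64-bit memory controller.
MEMORY_CONTROLLERS = 6   # GF100 memory controllers (from the text)
ROPS_PER_PARTITION = 8   # ROP units per partition (from the text)

# Cycles a ROP needs per pixel in each format (from the text).
CYCLES_PER_PIXEL = {"32-bit integer": 1, "FP16": 2, "FP32": 4}


def total_rops() -> int:
    """Total ROP units across the chip."""
    return MEMORY_CONTROLLERS * ROPS_PER_PARTITION


def peak_pixels_per_clock(cycles_per_pixel: int) -> float:
    """Peak pixels per clock, assuming throughput scales as 1/cycles."""
    return total_rops() / cycles_per_pixel


if __name__ == "__main__":
    for fmt, cycles in CYCLES_PER_PIXEL.items():
        print(f"{fmt}: {peak_pixels_per_clock(cycles):g} pixels/clock")
```

Under those assumptions, the FP16 and FP32 peaks work out to half and a quarter of the 32-bit integer rate, respectively. Real-world throughput will also depend on memory bandwidth and clock speeds, which this sketch ignores.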

Source: Nvidia.
Not only are the GF100's ROPs more numerous, but they've also been modified to handle 8X multisampled antialiasing without taking a big performance hit, mainly due to improved color compression speed. GeForce GPUs have been at a disadvantage in 8X multisampling performance since the introduction of the Radeon HD 4800 series, but the GF100 should rectify the situation, as indicated by the Nvidia-supplied numbers above. (I should caution, however, that HAWX supports DirectX 10.1, which also accelerates antialiasing performance on newer GPUs. We'll want to test things ourselves before being fully confident on this point.)

Source: Nvidia.
One saving grace for the GT200's antialiasing performance has been Nvidia's coverage sampled AA modes, which store larger numbers of coverage samples than color samples and offer nicely improved edge quality with little performance cost. Now that true 8X multisampling is more comfortable, Nvidia has added a coverage sampled AA mode based on it. The new 32X CSAA mode stores eight full coverage-plus-color samples and an additional 24 coverage-only samples.

CSAA 32X: blue positions are full samples, and gray positions are coverage only. Source: Nvidia.


Alpha-to-coverage on foliage: 8 coverage samples versus 32. Source: Nvidia.
Not only will 32X CSAA provide higher fidelity antialiasing on traditional object edges, but Kilgariff pointed out that many games use a technique called alpha-to-coverage to render dense grass or foliage with soft edges, in which alpha test results contribute to a coverage mask. This method produces better results than a simple alpha test, but it relies on coverage samples to work its magic. Sometimes four or eight samples will be insufficient to prevent aliasing. In such cases, 32X CSAA can produce markedly superior results, with a total of 33 levels of transparency. Also, Nvidia's transparency multisampling mode, a driver feature that promotes simple alpha-test transparency to alpha-to-coverage, should benefit from the additional coverage samples in 32X CSAA.
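To see where the 33 levels come from, here is a minimal Python sketch of the alpha-to-coverage idea. It is a simplification under stated assumptions rather than a description of what the GF100 hardware or Nvidia's driver actually does: the function names are hypothetical, the number of covered samples is assumed to be alpha scaled by the sample count and rounded, and samples are switched on in plain index order, whereas real implementations typically use dithered or otherwise scrambled patterns.

```python
def coverage_mask(alpha: float, num_samples: int) -> list:
    """Map an alpha value in [0, 1] to a per-sample coverage mask.

    Simplified model: the first round(alpha * num_samples) samples are
    marked as covered. Real hardware picks sample positions differently.
    """
    alpha = min(max(alpha, 0.0), 1.0)            # clamp out-of-range alpha
    covered = int(alpha * num_samples + 0.5)     # round half up
    return [i < covered for i in range(num_samples)]


def distinct_levels(num_samples: int, steps: int = 1000) -> int:
    """Count distinct coverage amounts seen while sweeping alpha from 0 to 1."""
    return len({sum(coverage_mask(s / steps, num_samples))
                for s in range(steps + 1)})


if __name__ == "__main__":
    for n in (4, 8, 32):
        print(f"{n} coverage samples: {distinct_levels(n)} transparency levels")
```

With N coverage samples, anywhere from zero to N of them can be covered, so the mask can express N + 1 distinct levels: nine with eight samples, and 33 with the 32 samples of the new CSAA mode.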