Z-buffer optimizations — continued

Mechanism of Hierarchical Z pixel rejection
While the exact mechanism of R200's HZ remains sketchy, what is known is that it takes place during rasterization, when triangles are being scan-converted into pixels (Figure 2). Each triangle is conceptually divided into blocks of pixels. Preferably, at least two depth values (the nearest and furthest depth values within each block) are assigned to each block of pixels, accompanied by a flag that indicates whether the triangle covers the block completely or only partially (Figure 3). Effectively, only one or two depth values per block are tested for visibility against the contents of the HZ. This contrasts with a traditional Z-buffer, where the depth of each and every pixel within a triangle is evaluated for visibility.


Figure 3: Pictorial representation of tile coverage
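
The per-block summary described above (nearest depth, furthest depth, coverage flag) can be illustrated with a minimal sketch. This is not R200's actual logic, which is not publicly documented. It assumes that depth varies linearly across the triangle as z = a*x + b*y + c, that a smaller z is nearer, and that a block counts as fully covered when all four of its corners lie inside the triangle (valid because a triangle is convex). The names `summarize_block`, `inside` and `edge` are hypothetical, and the sketch does not check whether a block misses the triangle entirely.

```python
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area test: the sign says which side of edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri: Triangle, p: Point) -> bool:
    """True if p is inside or on the boundary of tri, for either winding order."""
    e = [edge(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(v >= 0 for v in e) or all(v <= 0 for v in e)


def summarize_block(tri: Triangle, plane: Tuple[float, float, float],
                    x0: float, y0: float, size: float) -> Tuple[float, float, bool]:
    """Return (z_near, z_far, fully_covered) for a square block of pixels.

    plane = (a, b, c) is the assumed depth plane z = a*x + b*y + c.
    Depth is linear, so its extremes over the block occur at the corners.
    For partially covered blocks the range is conservative (it may be wider
    than the depths the triangle actually produces inside the block).
    """
    a, b, c = plane
    corners = [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]
    depths = [a * x + b * y + c for x, y in corners]
    fully_covered = all(inside(tri, p) for p in corners)
    return min(depths), max(depths), fully_covered


if __name__ == "__main__":
    tri = ((0.0, 0.0), (16.0, 0.0), (0.0, 16.0))
    plane = (0.03125, 0.0, 0.25)
    print(summarize_block(tri, plane, 0, 0, 4))  # (0.25, 0.375, True)
    print(summarize_block(tri, plane, 8, 8, 4))  # (0.5, 0.625, False)
```

Evaluating only the four corners keeps the cost per block constant, which is in the spirit of testing "one or two pixels per block" rather than every pixel.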

There are generally two outcomes of the 'all or none' type:

  1. The tile is totally obscured. The entire block of pixels may be rejected if determined to be occluded by a previous tile with 100% coverage. This creates the greatest bandwidth savings: texture, framebuffer and Z-buffer.

  2. The entire tile is in the foreground with respect to corresponding contents within the Hierarchical Z-buffer. Where an entire block of pixels with 100% coverage has been determined to completely occlude the previous tile, a full Z-buffer comparison is dispensed with and a portion of Z-buffer bandwidth is saved. This is the second best scenario.
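
The two outcomes above, plus the per-pixel fallback, can be sketched as a three-way classification. This is an illustration under stated assumptions, not ATI's implementation: it assumes a smaller depth value is nearer, and that the HZ keeps, for each tile, the nearest and furthest depth of what was drawn before along with its coverage flag. `TileSummary`, `Outcome` and `classify_tile` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REJECT = "reject"        # outcome 1: tile totally obscured
    ACCEPT = "accept"        # outcome 2: tile entirely in the foreground
    PER_PIXEL = "per-pixel"  # neither: fall back to the ordinary Z-buffer test


@dataclass
class TileSummary:
    z_near: float        # nearest depth in the tile (assumed: smaller is nearer)
    z_far: float         # furthest depth in the tile
    full_coverage: bool  # True if the triangle covers every pixel of the tile


def classify_tile(incoming: TileSummary, stored: TileSummary) -> Outcome:
    """Compare an incoming tile against the (assumed) HZ entry for that tile."""
    # Outcome 1: the previous tile covers every pixel and even its furthest
    # point is in front of the incoming tile's nearest point.
    if stored.full_coverage and incoming.z_near > stored.z_far:
        return Outcome.REJECT
    # Outcome 2: the incoming tile covers every pixel and even its furthest
    # point is in front of the previous tile's nearest point.
    if incoming.full_coverage and incoming.z_far < stored.z_near:
        return Outcome.ACCEPT
    # Depth ranges overlap, or coverage is partial: test pixel by pixel.
    return Outcome.PER_PIXEL


if __name__ == "__main__":
    stored = TileSummary(z_near=0.25, z_far=0.5, full_coverage=True)
    print(classify_tile(TileSummary(0.625, 0.75, True), stored))    # Outcome.REJECT
    print(classify_tile(TileSummary(0.0625, 0.125, True), stored))  # Outcome.ACCEPT
    print(classify_tile(TileSummary(0.375, 0.75, True), stored))    # Outcome.PER_PIXEL
```

Both early exits need only two comparisons per tile, which is where the bandwidth saving over a per-pixel comparison would come from.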

When neither of the above two criteria is met, the tile requires a pixel-by-pixel Z-buffer comparison.

Performance impact of Z-buffer optimizations
To assess the impact of Z-buffer optimizations, we take a look at the results obtained from the 'Evolva Rolling Demo' (Figure 4). The baseline graph is devoid of Z-buffer optimizations. We then enable Z-compression, 'Fast Z-clear' and HZ, strictly in that order, measuring the results with each setting. First, we note that Z-compression provides a maximum of 13.8% improvement over the baseline, understandably lower than ideal (18%) because of the increased polygon count. Next, we note a hefty 24% increase with 'Fast Z-clear'. Lastly, HZ provides a small but noticeable boost.


Figure 4: Fill rate graphs with varying levels of Z-buffer optimizations

Table 3: Evolva 32-bit framerates

Setting          640x480   800x600   1024x768   1280x1024   1600x1200
Baseline         150.4     147.3     134.3      97          65.7
Z-compression    155.4     152       142.4      110.2       76.8
Fast Z-clear     155.4     152       145.2      126.6       95.3
Hierarchical Z   155.3     151.8     142.5      127.7       98.3
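
The step-by-step gains can be recomputed directly from Table 3. The short script below is a sketch that assumes each setting is enabled on top of the previous one, as described in the text, and reports the percentage change of each row over the row before it. The helper names `percent_gain` and `step_gains` are illustrative only.

```python
RESOLUTIONS = ["640x480", "800x600", "1024x768", "1280x1024", "1600x1200"]

# Framerates copied from Table 3, in the order the settings were enabled.
FPS = {
    "Baseline":       [150.4, 147.3, 134.3, 97.0, 65.7],
    "Z-compression":  [155.4, 152.0, 142.4, 110.2, 76.8],
    "Fast Z-clear":   [155.4, 152.0, 145.2, 126.6, 95.3],
    "Hierarchical Z": [155.3, 151.8, 142.5, 127.7, 98.3],
}


def percent_gain(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


def step_gains(fps: dict) -> dict:
    """Gain of each setting over the setting enabled just before it."""
    names = list(fps)
    return {
        cur: [percent_gain(n, o) for n, o in zip(fps[cur], fps[prev])]
        for prev, cur in zip(names, names[1:])
    }


if __name__ == "__main__":
    for name, gains in step_gains(FPS).items():
        row = "  ".join(f"{res}: {g:+.1f}%" for res, g in zip(RESOLUTIONS, gains))
        print(f"{name:<15} {row}")
```

At 1600x1200 this gives roughly +24% for 'Fast Z-clear' over Z-compression and roughly +3% for HZ over 'Fast Z-clear', in line with the figures quoted above.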