Texturing and memory
Cypress' memory hierarchy has been massaged, too. Doubling the number of SIMDs on the chip means twice as many 8KB L1 caches onboard, so the total L1 size doubles to 160KB. All told, these caches give the GPU as much as one terabyte per second of bandwidth for L1 texture fetches, according to Demers, a staggering number. The four L2 caches associated with the memory controllers have grown from 64KB to 128KB each, and they deliver up to 435 GB/s of bandwidth between the L1 and L2 caches.
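The capacity figures fall out of simple multiplication. Here's a minimal sketch, assuming 20 SIMDs on Cypress (implied by 160KB / 8KB) and 10 on the RV770; the helper name `total_kb` is purely illustrative.

```python
# Aggregate cache capacity from the per-block sizes quoted above.
# Assumption: 20 SIMDs on Cypress and 10 on RV770 (the "doubling" in the text).
L1_PER_SIMD_KB = 8

def total_kb(count: int, per_unit_kb: int) -> int:
    """Total capacity in KB for `count` identical cache blocks."""
    return count * per_unit_kb

cypress_l1 = total_kb(20, L1_PER_SIMD_KB)  # one 8KB L1 per SIMD
rv770_l1 = total_kb(10, L1_PER_SIMD_KB)
cypress_l2 = total_kb(4, 128)              # one L2 per memory controller
rv770_l2 = total_kb(4, 64)

print(cypress_l1, rv770_l1, cypress_l2, rv770_l2)  # 160 80 512 256
```

So Cypress carries 512KB of L2 in total, double the RV770's 256KB.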

AMD has held steady at four 64-bit memory controllers, yielding an aggregate 256-bit path to memory. GDDR5 data rates are up from 3.6 Gbps on the Radeon HD 4870 to 4.8 Gbps on the 5870, an increase of about 33%. This is one potential weakness of a chip that has doubled in nearly every other department, but AMD contends that the RV770 was, relatively speaking, rich in memory bandwidth and poor in compute, and thus not taking full advantage of the bandwidth available to it. Cypress, the firm claims, will be more balanced.
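Peak memory bandwidth is bus width times per-pin data rate, divided by eight bits per byte. This sketch (the function name is hypothetical) uses decimal gigabytes and integer megabit rates to avoid rounding noise, and it gives 115.2 GB/s for the 4870 and 153.6 GB/s for the 5870.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mbps: int) -> float:
    """Peak bandwidth in decimal GB/s: bus width x per-pin rate / 8 bits per byte."""
    megabytes_per_s = bus_width_bits * data_rate_mbps // 8
    return megabytes_per_s / 1000

bus_width = 4 * 64  # four 64-bit controllers = 256 bits

hd4870 = peak_bandwidth_gbs(bus_width, 3600)  # 3.6 Gbps GDDR5
hd5870 = peak_bandwidth_gbs(bus_width, 4800)  # 4.8 Gbps GDDR5
increase_pct = (4800 - 3600) / 3600 * 100

print(f"{hd4870} {hd5870} {increase_pct:.1f}%")  # 115.2 153.6 33.3%
```

Because the bus width is unchanged, bandwidth scales exactly with the data rate, so the 5870 gets the same one-third bump.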
AMD has made other provisions to make the best use of the bandwidth available to Cypress. Chief among them are new block texture compression modes, contributed by AMD to the DirectX 11 spec and now available to all GPU makers. These compression modes are purported to offer higher quality than prior standards, with better signal-to-noise ratios and better handling of transparency. The basic technology has been adapted to work with both standard 8-bit-per-channel integer and FP16 HDR texture formats, with compression ratios up to 6:1 possible. Texture sizes of up to 16k by 16k are now supported, as well.
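To see where a 6:1 ratio can come from, here's a back-of-the-envelope sketch. It assumes each 4x4 block of texels is stored in 128 bits, which is how Direct3D 11's BC6H (HDR) and BC7 formats lay out their data; the constants and function name are ours, not AMD's. It also sizes a maximum 16k by 16k texture.

```python
# Assumption: one 128-bit block encodes a 4x4 tile of texels (BC6H/BC7 layout).
BLOCK_BITS = 128
TEXELS_PER_BLOCK = 4 * 4

def compression_ratio(source_bits_per_texel: int) -> float:
    """Uncompressed bits per texel divided by compressed bits per texel."""
    compressed_bits_per_texel = BLOCK_BITS / TEXELS_PER_BLOCK  # 8.0
    return source_bits_per_texel / compressed_bits_per_texel

fp16_rgb = compression_ratio(3 * 16)  # three FP16 channels: 48 bits
rgba8 = compression_ratio(4 * 8)      # four 8-bit channels: 32 bits

side = 16 * 1024
texels = side * side
uncompressed_mib = texels * 4 // 2**20  # RGBA8 at 4 bytes per texel
compressed_mib = texels * (BLOCK_BITS // TEXELS_PER_BLOCK) // 8 // 2**20

print(fp16_rgb, rgba8, uncompressed_mib, compressed_mib)  # 6.0 4.0 1024 256
```

The 6:1 figure applies to three-channel FP16 data; 8-bit formats start from fewer bits per texel, so the same block size yields a smaller ratio.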
Another tweak that gets my IQ-junkie juices flowing is the move to a new anisotropic filtering algorithm that does not vary the level of detail according to the angle of the surface to which it is being applied. This is a hardware-level change to the texture filtering units. AMD claims it has implemented an algorithm that achieves the same results as the Microsoft Direct3D reference rasterizer, but does so more efficiently, with no additional performance cost compared to its prior GPUs.
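Why would a filter's level of detail depend on surface angle at all? One common cause is cutting corners when measuring a pixel's footprint in texture space. The sketch below is a generic illustration, not AMD's hardware algorithm: it estimates the footprint's major and minor axes from texture-coordinate derivatives, once with exact vector lengths and once with a hypothetical cheaper max-of-components shortcut, then rotates the same 4:1 footprint by 45 degrees. Real hardware also rounds the anisotropy ratio to a whole number of samples, which this sketch skips.

```python
import math

def footprint_axes(dudx, dvdx, dudy, dvdy, exact=True):
    """Return (major, minor) lengths of the pixel footprint in texel units."""
    if exact:
        # True Euclidean length of each screen-axis derivative vector.
        px = math.hypot(dudx, dvdx)
        py = math.hypot(dudy, dvdy)
    else:
        # Hypothetical shortcut: largest component instead of true length.
        px = max(abs(dudx), abs(dvdx))
        py = max(abs(dudy), abs(dvdy))
    return max(px, py), min(px, py)

def aniso_lod(dudx, dvdx, dudy, dvdy, max_aniso=16.0, exact=True):
    """Return (mip LOD, anisotropy ratio) for one pixel."""
    major, minor = footprint_axes(dudx, dvdx, dudy, dvdy, exact)
    ratio = min(major / minor, max_aniso)
    # Samples are spread along the major axis, each covering major/ratio texels.
    return math.log2(major / ratio), ratio

def rotated_footprint(theta, major=4.0, minor=1.0):
    """Derivatives for a major x minor footprint rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return major * c, major * s, -minor * s, minor * c

for exact in (True, False):
    lods = [aniso_lod(*rotated_footprint(a), exact=exact)[0] for a in (0.0, math.pi / 4)]
    print("exact" if exact else "shortcut", [round(x, 3) for x in lods])
```

With exact lengths, the LOD comes out the same at both angles; the shortcut shifts it by half a mip level at 45 degrees, which is the kind of angle dependence the tunnel test makes visible.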
We can see the impact of this change using the infamous tunnel test, pictured below. The idea here is that you're looking down a 3D-rendered cylinder, and the mip-maps are colored differently to show you where one ends and the next begins. The GPU also blends between them; that's trilinear filtering. The closer the colored shape is to a circle, the less the aniso filtering algorithm varies according to the angle of inclination. In other words, rounder is better. The smoother the blending between the colors, the more trilinear filtering is being applied. Smoother is better.
[Image: Anisotropic texture filtering and trilinear blending. Panels: Radeon HD 4870, Radeon HD 5870, GeForce GTX 285, GeForce GTX 285 HQ]
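Trilinear filtering itself is simple to sketch: pick the two mip levels that bracket the computed level of detail and blend between them by the fractional part. In this minimal version, each mip level is reduced to a single flat color (a stand-in for the bilinear sample a real sampler would take from that level), mirroring the tunnel test's solid-colored mip-maps; the function names are illustrative.

```python
import math

def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a (t=0) to b (t=1)."""
    return a + (b - a) * t

def trilinear_sample(mip_colors, lod: float):
    """Blend the two mip levels bracketing `lod`.

    mip_colors: one RGB tuple per mip level, standing in for the
    bilinear sample a real texture unit would fetch at that level.
    """
    lod = min(max(lod, 0.0), len(mip_colors) - 1)  # clamp to available mips
    lo = math.floor(lod)
    hi = min(lo + 1, len(mip_colors) - 1)
    t = lod - lo  # fractional LOD is the blend weight
    return tuple(lerp(a, b, t) for a, b in zip(mip_colors[lo], mip_colors[hi]))

# Red, green, blue mip levels, like the colored tunnel test.
mips = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(trilinear_sample(mips, 0.25))  # (0.75, 0.25, 0.0)
print(trilinear_sample(mips, 1.5))   # (0.0, 0.5, 0.5)
```

As the LOD sweeps smoothly with distance down the tunnel, the output color fades gradually from one mip's color to the next, which is the smooth gradient you want to see in the screenshots.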
Up to now, as you can see, Nvidia has performed better on this test than AMD. I don't want to overstate the importance of that; the reality is that these things are much easier to spot in a contrived test like this one than in a real game, where the differences are very tough to see. Still, the 5870 aces this test in pixel-perfect fashion, setting a new standard for anisotropic filtering.
Not only that, but normally we'd be handing you a caveat right now about trilinear filtering on the Radeons, because AMD has long used an adaptive trilinear algorithm that applies more or less blending depending on the contrast between the textures involved. In a test like this one, that algorithm always does its best work, because the mip-maps are entirely different colors. In games, it applies less filtering and may not always achieve ideal results. For Cypress, however, AMD has decided to stop using that adaptive algorithm. Instead, the company says, the Radeon HD 5870 applies full trilinear filtering all of the time by default, so the buttery smooth transitions between mip-map colors you're seeing in the image above are genuine.
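The difference between full trilinear and a reduced-blending scheme comes down to the weight given to the second mip level. The sketch below compares full trilinear with a hypothetical fixed-width blend band. It is a generic illustration of "less blending," not a model of AMD's contrast-adaptive algorithm, and the 0.25 band width is an arbitrary assumption.

```python
def full_trilinear_weight(frac: float) -> float:
    """Weight of the coarser mip: a linear ramp across the whole interval."""
    return frac

def reduced_trilinear_weight(frac: float, band: float = 0.25) -> float:
    """Hypothetical reduced blend: nearest-mip outside a band centered on 0.5."""
    lo, hi = 0.5 - band / 2, 0.5 + band / 2
    if frac <= lo:
        return 0.0  # use the finer mip only
    if frac >= hi:
        return 1.0  # use the coarser mip only
    return (frac - lo) / (hi - lo)  # steep ramp inside the band

for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(frac, full_trilinear_weight(frac), reduced_trilinear_weight(frac))
```

Outside the narrow band, the reduced scheme snaps to a single mip level, which saves work but compresses the transition into a thin strip; with full trilinear, the weight ramps across the entire interval.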
In games, the impact of these image quality improvements is subtle, but you can expect to see less high-frequency noise in the form of things like texture crawling and sparkle on the 5870. I need to play with it some more, frankly, to find good examples of the differences. I can tell you now that they'll likely be very difficult to capture in a static screenshot. We'll try to look into this topic more when we have time, though.
|  | Peak pixel fill rate (Gpixels/s) | Peak bilinear texel filtering rate (Gtexels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) |
|---|---|---|---|---|
| GeForce 9800 GT | 9.8 | 34.3 | 17.1 | 57.6 |
| GeForce GTS 250 | 12.3 | 49.3 | 24.6 | 71.9 |
| GeForce GTX 285 | 21.4 | 53.6 | 26.8 | 166.4 |
| GeForce GTX 295 | 32.3 | 92.2 | 46.1 | 223.9 |
| Radeon HD 4850 | 10.9 | 27.2 | 13.6 | 67.2 |
| Radeon HD 4870 | 12.0 | 30.0 | 15.0 | 115.2 |
| Radeon HD 4890 OC | 14.4 | 36.0 | 18.0 | 124.8 |
| Radeon HD 4870 X2 | 24.0 | 60.0 | 30.0 | 230.4 |
| Radeon HD 5850 | 23.2 | 52.2 | 26.1 | 128.0 |
| Radeon HD 5870 | 27.2 | 68.0 | 34.0 | 153.6 |
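The peak numbers in the table are clock speed multiplied by per-clock unit throughput. This sketch assumes clocks and unit counts that this section doesn't list: 850 MHz, 32 pixels per clock from the ROPs, and 80 texture units for the Radeon HD 5870; 750 MHz, 16 pixels, and 40 texture units for the HD 4870. It also assumes FP16 filtering runs at half the integer rate, as the table's numbers imply.

```python
def peak_rates(core_mhz: int, pixels_per_clock: int, texels_per_clock: int):
    """Return (Gpixels/s, Gtexels/s, FP16 Gtexels/s) at the given core clock."""
    pixel = core_mhz * pixels_per_clock / 1000
    texel = core_mhz * texels_per_clock / 1000
    fp16 = core_mhz * texels_per_clock / 2 / 1000  # assumed half-rate FP16
    return pixel, texel, fp16

print(peak_rates(850, 32, 80))  # Radeon HD 5870: (27.2, 68.0, 34.0)
print(peak_rates(750, 16, 40))  # Radeon HD 4870: (12.0, 30.0, 15.0)
```

The ratio of the two integer texel rates, 68 / 30, is about 2.27.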
One reason AMD was able to make these image quality improvements is this GPU's embarrassment of riches in the texture filtering department, where it more than doubles the peak theoretical capacity of the Radeon HD 4870. Cypress also has twice as many render back-ends or ROPs as the RV770, with two attached to each memory controller, so it has substantially more peak pixel fill rate and antialiasing oomph.

This color fill rate test usually ends up being memory-bandwidth limited. The Radeon HD 5870 has a little bit less memory bandwidth, in theory, than the GeForce GTX 285, and it works out that way in practice, too.

This test measures filtering rates with standard integer texture formats. The 5870 falls a little shy of its theoretical peak of 68 bilinear filtered Gtexels/s, but most of these GPUs do. Interestingly enough, the 5870 also falls behind the Radeon HD 4870 X2 and the GeForce GTX 285 when we get to the higher levels of anisotropy. Those other cards both have more memory bandwidth than the 5870, which may play a part. But remember, also, that they're producing lower-quality results than the 5870. Notice that in its high-quality filtering mode, which still can't match the 5870's output, the GTX 285's performance drops below the 5870's.

This is a test of FP16 texture filtering, which is probably where we want to focus more of our attention, since this is the hard stuff. However, I still don't know what the heck is going on with the units here. At the very least they're off by a factor of 100, since the 5870's peak theoretical FP16 filtering speed is 34 Gtexels/s and 3DMark is reporting 1868. This has been a long-standing problem with 3DMark Vantage, and the folks at Futuremark have stopped answering my emails about it. I'm open to suggestions for alternate FP16 texture filtering tests.
In the meantime, we're going to assume the relative differences here are meaningful, at least, and notice that the Radeon HD 5870 is alone at the top of the charts. This is likely one of the places where most GPUs are interpolation limited, and the 5870's shader-based interpolation allows it to outpace even two Radeon HD 4870s on an X2 card.