On the edge: better antialiasing
The render back-ends haven't been overlooked in Cayman's wide-ranging overhaul. Several new capabilities should raise performance and image quality.
Among those is native support in the ROP units for some additional color formats, including 16-bit integer (snorm/unorm) and 32-bit floating-point. AMD claims antialiasing with these color formats should be 2-4X faster than before, largely because the ROPs now handle those formats natively. Previously, they were handled in software, that is, in the shader core.
The biggest news, though, is the introduction of a new antialiasing capability known as EQAA (which I believe stands for enhanced quality antialiasing). The intriguing thing here is that EQAA is more or less a clone of the coverage sampling AA (CSAA) feature Nvidia first introduced in the G80, its first-gen DX10 GPU. At that time, AMD was touting its custom-filter antialiasing (CFAA) modes as an alternative to CSAA. Now, CFAA has all but disappeared, with both the wide and narrow tent filters from prior generations having been excised from the 6800/6900-series drivers. Only the edge-detect filter remains, although it is an interesting option.
Sadly, we don't have time to explain multisampled antialiasing (or quantum physics, for that matter) in this space, but for those who are familiar, EQAA simply stores fewer color samples than coverage samples, which increases edge accuracy (and thus image quality) with only a small increase in memory footprint and performance cost. We've found Nvidia's corresponding feature, CSAA, to deliver visibly superior edge AA quality without slowing frame rates much at all. Cayman's ROPs can be programmed to store different numbers of color and coverage samples, so many things are possible, but AMD has largely replicated Nvidia's CSAA modes, with one notable addition. Also, AMD's naming scheme for the different EQAA modes is a little more modest, since it's based on the number of color samples rather than coverage samples. I've mapped the names and sample counts to clear up any confusion, with the traditional multisampled AA modes included for reference.
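| Radeon mode | Texture/shader samples | Color samples | Coverage samples | GeForce equivalent |
|---|---|---|---|---|
| 2X MSAA | 1 | 2 | 2 | 2X MSAA |
| 2X EQAA | 1 | 2 | 4 | none |
| 4X MSAA | 1 | 4 | 4 | 4X MSAA |
| 4X EQAA | 1 | 4 | 8 | 8X CSAA |
| 8X MSAA | 1 | 8 | 8 | 8xQ CSAA |
| 8X EQAA | 1 | 8 | 16 | 16xQ CSAA |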
AMD's new mode, as you can see, is 2X EQAA, which captures only two color samples but four coverage samples. This mode could be a nice choice, especially in situations where performance is marginal. That's perhaps less likely to be an issue with Cayman than with a smaller derivative, but you get the picture.
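For readers who want the color-versus-coverage idea in concrete terms, here's a minimal Python sketch of a simplified model; it is not a description of how Cayman's ROPs actually encode samples. It assumes 32-bit (RGBA8) color samples and assumes each coverage sample stores only a small index naming which stored color covers that position. The helpers `index_bits`, `footprint_bits`, and `resolve_pixel` are hypothetical, written just for this illustration.

```python
# Simplified model of EQAA/CSAA-style storage and resolve (illustrative only).
# Assumption: each coverage sample stores a small index naming which stored
# color sample covers that position; this is NOT Cayman's documented encoding.

BITS_PER_COLOR = 32  # assumed RGBA8 color samples


def index_bits(color_samples):
    """Bits needed to name one of `color_samples` stored colors, i.e. ceil(log2 n)."""
    return (color_samples - 1).bit_length()


def footprint_bits(color_samples, coverage_samples):
    """Per-pixel storage: full color samples plus one small index per coverage sample."""
    return color_samples * BITS_PER_COLOR + coverage_samples * index_bits(color_samples)


def resolve_pixel(colors, coverage):
    """Average the stored colors, weighting each by how many coverage samples point at it."""
    counts = [0] * len(colors)
    for idx in coverage:
        counts[idx] += 1
    total = len(coverage)
    return tuple(
        sum(color[ch] * n for color, n in zip(colors, counts)) / total
        for ch in range(3)
    )


# Modes from the table above: (name, color samples, coverage samples).
MODES = [
    ("2X MSAA", 2, 2), ("2X EQAA", 2, 4),
    ("4X MSAA", 4, 4), ("4X EQAA", 4, 8),
    ("8X MSAA", 8, 8), ("8X EQAA", 8, 16),
]

if __name__ == "__main__":
    for name, color, cov in MODES:
        print(f"{name}: {footprint_bits(color, cov)} bits per pixel")

    # Edge pixel: a red triangle over a blue background. With two sample
    # positions, 2X MSAA sees the triangle at one of them; with four coverage
    # positions, 2X EQAA sees it at three.
    red, blue = (255, 0, 0), (0, 0, 255)
    print("2X MSAA:", resolve_pixel([red, blue], [0, 1]))
    print("2X EQAA:", resolve_pixel([red, blue], [0, 0, 0, 1]))
```

In this toy model, 2X EQAA costs just two extra bits per pixel over 2X MSAA, yet its extra coverage positions let the resolve weight the edge 75/25 instead of 50/50, which is the kind of accuracy gain described above.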
The EQAA sample patterns from the AMD presentation above are apparently only for illustrative purposes. We've captured the texture/shader (green dots), color (gray dots), and coverage (small red dots) sample patterns from Cayman and the GF110 with some simple tools, and they don't really correspond with AMD's presentation.
[Figure: captured sample patterns, Cayman vs. GF110: 4X EQAA vs. 8X CSAA; 8X EQAA vs. 16xQ CSAA]
In reality, AMD's sample patterns are quite a bit funkier. In 8X EQAA, one color and coverage sample is taken from the very top left corner of the pixel space. In the bottom right corner, you can see that same color sample point intruding from the pixel below.
EQAA's effects are very evident in this simple test pattern. You have to like that 2X EQAA mode, which looks nearly as good as 4X multisampling.
I had hoped to include a lot more information on EQAA, including robust image quality comparisons with Nvidia's CSAA and some performance data, but we'll have to circle back and do that at a later date. We're quite pleased to see AMD adding this feature, because it offers the possibility of direct performance comparisons between GeForces and Radeons in high-quality AA modes like 4X EQAA/8X CSAA. In fact, since we tend to prefer the image quality and performance of these AA methods, they may soon become our new de facto standard for testing, supplanting 4X multisampling.
Cayman does retain one other interesting antialiasing option, the morphological AA (MLAA) capability introduced with the Radeon HD 6800 series. MLAA is a post-process filter that lacks sub-pixel accuracy, so it's a decidedly lower-quality option than multisampling or EQAA, especially in motion, where its deficiencies are more evident than in static screen captures. It does, however, have the great virtue of working properly with a wide range of games, including those that use deferred shading methods, which don't play well with MSAA and its derivatives. Again, this feature deserves more attention than we can give it presently, but we have it on our hit list for later.
PowerTune, somehow, isn't for electric guitars
Speaking of features that deserve more attention than we can give them, Cayman introduces a novel power containment scheme known as PowerTune, whose stated goal is to keep the GPU from exceeding its maximum power rating (or TDP) in "outlier" applications that are much more power-intensive than the typical game. Nvidia added a similar feature in its GeForce GTX 580 and 570 graphics cards just recently, but AMD claims its approach is better on several fronts. For one, Cayman contains an integrated power control processor that monitors power draw constantly. This processor then algorithmically adjusts clock speeds for various logic blocks on the GPU in order to enforce the product's stated TDP limit.
Any such mechanism that reduces clock speeds has the potential to reduce performance. The picture gets more complicated very quickly from there, though. PowerTune is, in a sense, the inverse of the Turbo Boost capability built into the latest Intel CPUs: Turbo Boost opportunistically raises clock speeds to grab more performance when headroom is available, whereas PowerTune limits clock frequencies when the chip draws too much power. AMD tells us PowerTune generally shouldn't kick in during normal use, especially with antialiasing in the mix. Of course, antialiasing isn't always in use, and PowerTune will reduce performance in some measurable ways, not just in FurMark and the like. Even 3DMark Vantage's Perlin Noise test, which has lots of shader arithmetic, will cause PowerTune to kick in.
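To illustrate the general idea of a controller that clamps clocks only when power exceeds a limit, here's a toy Python loop. AMD hasn't described PowerTune's algorithm at this level of detail, so the step size, clock floor, linear power model, and every number below are assumptions, and `powertune_step` and `simulate` are hypothetical names.

```python
# Hypothetical power-capping loop in the spirit of PowerTune (illustrative only).
# Step size, clock floor, power model, and all numbers are assumptions.


def powertune_step(clock_mhz, power_w, limit_w, default_mhz, step_mhz=10, floor_mhz=500):
    """Return the next clock: step down while over the limit, else recover toward default."""
    if power_w > limit_w:
        return max(floor_mhz, clock_mhz - step_mhz)
    return min(default_mhz, clock_mhz + step_mhz)


def simulate(watts_per_mhz, limit_w=200.0, default_mhz=800, intervals=100):
    """Run the loop against a toy linear power model; return the clock history."""
    clock = default_mhz
    history = []
    for _ in range(intervals):
        power = watts_per_mhz * clock  # toy model: power rises linearly with clock
        clock = powertune_step(clock, power, limit_w, default_mhz)
        history.append(clock)
    return history


if __name__ == "__main__":
    game = simulate(watts_per_mhz=0.2)     # ~160 W at default clock: never clamped
    outlier = simulate(watts_per_mhz=0.3)  # ~240 W at default clock: clamped
    print("typical game, final clock:", game[-1])
    print("outlier app, final clock:", outlier[-1])
```

With these made-up numbers, the lighter workload never touches the limit and holds its default clock, while the outlier settles into a narrow band around the clock where it draws about 200 W, mirroring the "outlier" case above.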
AMD is very open about the implications of this feature, even going so far as to point out that default GPU clocks for its products will no longer have to be constrained by "outlier" applications. Read another way, that's a straightforward admission that GPU clock frequencies will be set higher and allowed to bump up against the TDP limits. That's a departure from the usual approach, say in the CPU world, in which buying a certain product generally guarantees the user a certain level of performance and the invocation of throttling generally means a cooling problem has occurred. Intel has struck a very different compromise by offering its users some extra, non-guaranteed performance in the form of Turbo Boost. The question, we suppose, is how far AMD will push on binning and power capping its products over time, and whether users will decide to push back.
AMD tells us its PowerTune algorithm for each video card model will be tuned for the worst-case scenario, to accommodate the leakiest, most power-hungry chips that fall into that particular product bin. As a result, performance should not vary substantially from one, say, Radeon HD 6950 to the next, even if ASIC quality does. AMD claims this steadiness from chip to chip is a contrast to Nvidia's power-limiting scheme, which is based directly on power draw at the 12V rail. Since Nvidia claims its cards shouldn't clamp power during normal use, though, we're unsure whether (or how much) that distinction matters.
The presence of the PowerTune controller opens up some tweaking options, which AMD has decided to expose to the end user. A slider in the Catalyst Control Center will allow users to raise or lower their video cards' TDP limits by up to 20%. The possibilities here are several. The user could raise the TDP limit alone to get less frequency clamping and higher performance in some cases. He could overclock his GPU but leave the TDP clamp in place, capturing additional performance where possible while ensuring his video card's power consumption doesn't exceed its limits. He might choose to raise both clock speeds and power limits to achieve maximum performance. Or he might decide to lower the TDP limit in, say, a home-theater PC to ensure modest noise levels and power draw.
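Those slider options boil down to simple arithmetic on the power limit. Here's a steady-state Python sketch assuming the ±20% range described, a fixed-voltage linear power model, and made-up TDP, clock, and per-workload power figures; `power_limit` and `sustained_clock` are hypothetical helpers, not anything exposed by Catalyst.

```python
# Hypothetical steady-state view of the PowerTune slider (illustrative only).
# Assumptions: +/-20% slider range, fixed-voltage linear power model, and
# made-up TDP, clock, and workload numbers.


def power_limit(tdp_w, slider_pct):
    """Board power limit after applying the slider, clamped to +/-20%."""
    slider_pct = max(-20, min(20, slider_pct))
    return tdp_w * (100 + slider_pct) / 100


def sustained_clock(requested_mhz, tdp_w, slider_pct, watts_per_mhz):
    """Clock the card can hold: the requested clock unless it would exceed the limit."""
    return min(requested_mhz, power_limit(tdp_w, slider_pct) / watts_per_mhz)


TDP_W, STOCK_MHZ = 200, 800  # hypothetical card
SCENARIOS = [
    ("stock", STOCK_MHZ, 0),
    ("raise TDP limit only", STOCK_MHZ, 20),
    ("overclock, keep clamp", 900, 0),
    ("overclock and raise limit", 900, 20),
    ("HTPC: lower limit", STOCK_MHZ, -20),
]

if __name__ == "__main__":
    for name, mhz, slider in SCENARIOS:
        game = sustained_clock(mhz, TDP_W, slider, watts_per_mhz=0.2)
        outlier = sustained_clock(mhz, TDP_W, slider, watts_per_mhz=0.3)
        print(f"{name:28s} game {game:6.1f} MHz   outlier {outlier:6.1f} MHz")
```

In this toy model, raising the limit alone helps only the power-hungry outlier, an overclock under the stock limit helps only the lighter workload, and lowering the limit caps the outlier's draw while leaving the lighter workload at its stock clock.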
I suppose one could also overclock the snot out of the thing and plunge the PowerTune slider to negative 20% just to create confusion about how the card will perform in any given situation. Whee!
With that said, we're about ready to close the book on Cayman's architectural enhancements and move on to the specifics of the new Radeon cards. Before we do so, though, we should point out that Cayman inherits all of the display and multimedia goodness already familiar from the Radeon HD 6800 series, including DisplayPort 1.2, a considerable array of display outputs compatible with the Eyefinity multi-monitor gaming scheme, and AMD's UVD3 video processing block.