GCN learns even more tricks
For years, AMD has used the letters "GCN" to refer to its core graphics technology, but that tech has gotten consistent updates that have expanded its feature set in little ways over time.
Fiji inherits all of the goodness from the Tonga GPU that popped up last fall, including delta-based color compression, faster tessellation, and a video-processing block revamped for 4K resolutions. Apart from obvious additions like HBM, Fiji has a handful of tweaks that go beyond Tonga. Many of those changes have to do with power efficiency, an obvious target given AMD's desire to cram 4096 stream processors into a power envelope similar to the 290X's.
Fiji's big shader array has more extensive power gating than prior GCN parts, so portions of the shader units not currently in use will be turned off more aggressively in order to avoid wasted energy. The Fiji team imported the dynamic voltage and frequency scaling algorithm (usually branded PowerTune in AMD's GPUs) from the APU side of the business, too. As a result, the GPU now avoids seeking the maximum possible clock frequency in cases where the workload doesn't require it.
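To make the idea concrete, here's a minimal sketch of demand-driven clock selection. It is not AMD's PowerTune algorithm, whose details aren't public. The operating points and the function names are hypothetical, and the only physics assumed is the usual rule of thumb that dynamic power scales with voltage squared times frequency.

```python
# Hypothetical operating points: (clock in MHz, core voltage in volts).
# These values are illustrative only, not AMD's actual DVFS tables.
OPERATING_POINTS = [(300, 0.90), (600, 1.00), (850, 1.10), (1050, 1.20)]


def relative_dynamic_power(clock_mhz, voltage):
    """Dynamic power scales roughly with V^2 * f (capacitance held constant)."""
    return voltage ** 2 * clock_mhz


def pick_operating_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-clocked point that still satisfies the workload."""
    for clock, voltage in sorted(points):
        if clock >= required_mhz:
            return clock, voltage
    # Demand exceeds every point, so fall back to the fastest one.
    return max(points)


light = pick_operating_point(500)    # a workload that needs ~500 MHz
heavy = pick_operating_point(1000)   # a workload that needs ~1000 MHz
savings = 1 - relative_dynamic_power(*light) / relative_dynamic_power(*heavy)
print(f"light: {light}, heavy: {heavy}, dynamic power saved: {savings:.0%}")
```

Because voltage drops along with frequency, settling for a lower operating point saves more power than the clock reduction alone would suggest.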
Furthermore, AMD has changed how it chooses the voltage and frequency operating points for its GPUs. The new method introduces some card-to-card variability, which AMD has avoided in the past, but it allows for more efficiency overall. (I believe Nvidia's GPU Boost first adopted some tolerance of card-to-card variability in the Kepler generation.) AMD Graphics CTO Raja Koduri told us AMD was willing to accept this change in part because it realized variability was part and parcel of the modern PC ecosystem, given the sheer variety of cases, cooling solutions, and the like.
One other power optimization in the Fury X really isn't a GCN improvement, but it helps explain why Fiji is able to run at ~1GHz with less board power than the 290X. It has to do with that liquid cooler. AMD has cited operating temperatures around 52C for the GPU on this card, and operating at such low temperatures tamps down on leakage power in pretty dramatic fashion. The transistors on a warmer chip will leak more and thus require more power. By cooling Fiji aggressively, the Fury X likely saves a non-trivial amount of wattage that would otherwise be wasted. This fact is noteworthy in this context because it suggests the power-oriented improvements in Fury X aren't all related to more efficient GPU architecture per se.
Another set of changes in Fiji is meant to further improve geometry and tessellation throughput. That news is a bit of a counter to the peak rates we've discussed above, where the Fury X matches the R9 290X. That said, even if the Fury X delivers on its full potential very consistently, it still won't match the polygon throughput we've measured for the GeForce GTX 980 Ti in synthetic tests.
Two other units in Fiji are notable for different reasons.
The UVD block responsible for video decoding has gotten an important upgrade: it can now accelerate the decoding of H.265/HEVC-encoded video in hardware, which is crucial for 4K video in particular. This hardware can also decode Google's VP8 format, but AMD says it's still investigating the software side of that equation. Full acceleration of the latest video formats gives Fiji a leg up on its competition.
That advantage comes with a downside, though. The display outputs in this GPU are not HDMI 2.0-capable, so Fury cards will not be able to drive 4K TVs at 60Hz over HDMI. AMD has a bit of a handicap here that its competition doesn't share.
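The arithmetic behind that limitation is easy to sketch. The numbers below assume the standard CTA-861 timing for 3840x2160 (a 4400x2250 total raster including blanking), 8-bit RGB output where the TMDS clock equals the pixel clock, and outputs that top out at the 340MHz rate of HDMI 1.4-class hardware. The function names are my own.

```python
# Standard CTA-861 timing for 3840x2160: total raster including blanking.
H_TOTAL, V_TOTAL = 4400, 2250

# Maximum TMDS clock rates in MHz for each HDMI generation.
HDMI_1_4_MAX_MHZ = 340
HDMI_2_0_MAX_MHZ = 600


def pixel_clock_mhz(refresh_hz):
    """Pixel clock needed to scan out the full raster at a given refresh rate."""
    return H_TOTAL * V_TOTAL * refresh_hz / 1e6


def fits(refresh_hz, limit_mhz):
    """True if the required pixel clock is within the link's limit."""
    return pixel_clock_mhz(refresh_hz) <= limit_mhz


for hz in (30, 60):
    print(
        f"4K at {hz}Hz needs {pixel_clock_mhz(hz):.0f}MHz: "
        f"HDMI 1.4 ok={fits(hz, HDMI_1_4_MAX_MHZ)}, "
        f"HDMI 2.0 ok={fits(hz, HDMI_2_0_MAX_MHZ)}"
    )
```

Under those assumptions, 4K at 30Hz squeezes under the older limit, while 60Hz needs nearly the full HDMI 2.0 rate.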
Performance expectations and a strange parity
We don't yet have a Fury X card to test, but AMD has released some of its own home-brewed performance numbers comparing the new Radeon to the GeForce GTX 980 Ti in some popular games at 4K resolution. Sadly, these numbers are only FPS averages, so they communicate more about GPU potential than they do about a true experience of smooth gaming. Also, AMD has almost surely cherry-picked the games and image quality settings that best suit its cause. These numbers are, in big, hairy, important ways, no substitute for a properly conducted independent review.
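Here's a quick illustration of why FPS averages alone can mislead. The two frame-time traces below are synthetic, invented purely for this example and not measured from any card. Both work out to the same average FPS, but a 99th-percentile frame time (computed here with a simple nearest-rank method) tells them apart.

```python
import math


def average_fps(frame_times_ms):
    """Average frames per second over the whole trace."""
    return 1000 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct=99):
    """Nearest-rank percentile of per-frame render times, in milliseconds."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic traces (made up for illustration), each 100 frames long.
smooth = [20.0] * 100                # every frame takes 20 ms
spiky = [15.0] * 90 + [65.0] * 10    # mostly fast, with occasional hitches

for name, trace in (("smooth", smooth), ("spiky", spiky)):
    print(
        f"{name}: {average_fps(trace):.0f} FPS average, "
        f"99th percentile frame time {percentile_frame_time(trace)} ms"
    )
```

Both traces average 50 FPS, yet the second one spends a tenth of its frames on 65-ms hitches that an average simply hides.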
With those caveats in mind, here's what AMD has offered:
These results look like generally good news for the red team. However, the contest is close in many places, and it seems possible, perhaps even likely, that the GTX 980 Ti will have its own share of victories in a more neutral selection of games and settings.
Once I had a sense that the Fury X might not deliver a resounding victory for AMD in overall performance, I asked AMD's Koduri about the situation. After all, HBM is some seriously impressive tech, and the Fury X has a massive advantage in terms of both memory bandwidth and shader processing power. Why doesn't it mop the floor with the competition?
Koduri answered by stating my question another way: why didn't AMD build a bigger engine? That's an astute way to view things, because a bigger GPU engine would have taken fuller advantage of HBM's considerable bandwidth.
The reason why Fiji isn't any larger, he said, is that AMD was up against a size limitation: the interposer that sits beneath the GPU and the DRAM stacks is fabricated just like a chip, and as a result, the interposer can only be as large as the reticle used in the photolithography process. (Larger interposers might be possible with multiple exposures, but they'd likely not be cost-effective.) In an HBM solution, the GPU has to be small enough to allow space on the interposer for the HBM stacks. Koduri explained that Fiji is very close to its maximum possible size, within something like four square millimeters.
He also noted that the question of workloads, meaning how the GPU is used by game developers, is key to the performance question. Fiji has a lot more shader computing power than the GM200. Some workloads are better suited to Fiji's balance of resources, while others are not. That fact will likely add fuel to the disputes around vendor-specific optimizations in games, as I've already noted.
Of course, AMD could have endowed Fiji with a different mix of resources—with more ROP throughput, more geometry-processing power, and a somewhat smaller shader array, for example. Doing so might not have worked for important reasons, though. Shader ALUs can be packed densely onto a chip and tend to be relatively power efficient compared to other sorts of resources. Going with a different mix might have caused Fiji to run up against its power budget in a bad way, or it might not have been a terribly effective use of the available die space.
At the end of the day, what's happened is that we appear to have arrived at a strange sort of parity between the two major GPU players in these high-end chips. Both built GPUs that are about 600 square millimeters. AMD invested in HBM and carried over the GCN architecture with relatively minor tweaks. Nvidia invested in a truly new GPU microarchitecture with Maxwell but stuck with legacy GDDR5 memory. The resulting GPUs have dramatically different strengths, and early indicators point to them being pretty closely matched in current games.
Rough parity in the GPU race isn't anything new. Long ago, ATI's David Nalasco explained this dynamic to me by pointing out that both of the major GPU firms have "a lot of smart people" and that the constraints they're up against tend to be shared. It's logical to expect a tight race.
What's remarkable about 2015's version of that race is that AMD and Nvidia have taken very different paths to a similar destination. I suspect AMD's bet on HBM would have paid off much bigger had TSMC been able to deliver a compelling 20- or 16-nm fabrication process in time for this generation of products. That didn't happen, though, and both firms had to adapt. The ensuing contest between products based on Fiji and the GM200 should be wonderfully weird and more fun than anything we've seen in a while.