Tonga: it's a magical place
Turns out my worries were misplaced, because Tonga is not just a smaller version of Hawaii. A year after the release of that bigger GPU, AMD has slipstreamed some significant new technology into Tonga—and has done so rather quietly, without a branding change or any of the usual fanfare. In fact, I had to prod AMD a little bit in order to understand what's new in Tonga. I don't yet have a clear picture of how everything works, but I'll share what I know.
By far the most consequential innovation in Tonga is a new form of compression for frame buffer color data. GPUs have long used various forms of compression in order to store color information more efficiently, but evidently, the method Tonga uses for frame buffer data is something novel. AMD says the compression is lossless, so it should have no impact on image quality, and "delta-based." Tonga's graphics core knows how to read and write data in this compressed format, and the compression happens transparently, without any special support from applications.
We don't have many details on exactly how it works, but essentially, "delta-based" means the compression method keys on change. My best bet is that whenever a newly completed frame is written to memory, only the pixels whose color has changed from the prior frame are updated. ARM does something along those lines with its Mali mobile GPUs, and I expect AMD has taken a similar path.
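To make that guess concrete, here's a toy sketch in Python of frame-to-frame delta updates. This illustrates only the speculation above, not AMD's actual hardware scheme, which hasn't been disclosed. The function names and the flat list-of-colors frame layout are made up for the example.

```python
def encode_delta(prev_frame, new_frame):
    """Return (index, color) pairs for pixels that differ from the prior frame."""
    if len(prev_frame) != len(new_frame):
        raise ValueError("frames must have the same number of pixels")
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(prev_frame, new_frame))
        if old != new
    ]


def apply_delta(prev_frame, delta):
    """Rebuild the new frame from the prior frame plus the changed pixels."""
    frame = list(prev_frame)
    for i, color in delta:
        frame[i] = color
    return frame


# Hypothetical four-pixel frames with packed 24-bit RGB colors.
prev_frame = [0x101010, 0x101010, 0xFF0000, 0x00FF00]
new_frame = [0x101010, 0x202020, 0xFF0000, 0x00FF00]

delta = encode_delta(prev_frame, new_frame)
restored = apply_delta(prev_frame, delta)

print(f"pixels written: {len(delta)} of {len(new_frame)}")
print(f"lossless round trip: {restored == new_frame}")
```

The round trip reproduces the new frame exactly, which is what "lossless" means here. Any savings depend entirely on how few pixels change from one frame to the next.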
The payoff is astounding: AMD claims 40% higher memory bandwidth efficiency. I'm not quite sure what the basis of comparison is for that claim, nor am I clear on whether 40% is the best-case scenario or just the general case. But whatever; we can measure these things.
3DMark Vantage's color fill test has long been gated primarily by memory bandwidth, rather than the GPU's raw pixel fill rate. Here's how Tonga fares in it.
Compare the R9 285 to the Radeon HD 7950 Boost, which we used in place of the Radeon R9 280. (Only 8MHz of clock speed separates them.) The 7950 Boost has 240 GB/s of memory bandwidth to Tonga's 176 GB/s, yet the new Radeon maintains a substantially higher pixel fill rate. That's Tonga magic in action.
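For a rough sense of scale, here's the arithmetic behind that comparison as a short Python sketch. Treating AMD's 40% efficiency claim as a straight multiplier on raw bandwidth is my assumption, not something AMD has specified, so read the "effective" figure as a back-of-the-envelope number only.

```python
# Raw memory bandwidth figures quoted above, in GB/s.
HD_7950_BOOST_GBS = 240.0
R9_285_GBS = 176.0

# AMD's claimed efficiency gain. Applying it as a simple multiplier
# is an assumption made for illustration.
CLAIMED_EFFICIENCY_GAIN = 0.40


def effective_bandwidth(raw_gbs, efficiency_gain):
    """Scale raw bandwidth by a claimed efficiency improvement."""
    return raw_gbs * (1.0 + efficiency_gain)


raw_ratio = R9_285_GBS / HD_7950_BOOST_GBS
effective_gbs = effective_bandwidth(R9_285_GBS, CLAIMED_EFFICIENCY_GAIN)

print(f"R9 285 raw bandwidth vs. 7950 Boost: {raw_ratio:.0%}")
print(f"R9 285 'effective' bandwidth at +40%: {effective_gbs:.1f} GB/s")
```

Under that assumption, the R9 285 would land slightly above the 7950 Boost's 240 GB/s, which is at least consistent with the fill rate result.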
Perhaps my concerns about Tonga's memory bandwidth were premature. We'll have to see how well this compression mojo works in real games, but it certainly has my attention.
That's not all. Tonga has inherited a new front-end and internal organization from Hawaii that grants it more potential for polygon throughput. The triangle setup rate has doubled from two primitives per clock in Tahiti to four per clock in Tonga. Beyond that, Tonga adds some of its own provisions to improve geometry and tessellation performance, including a larger parameter cache that spills into the L2 cache when needed. The division of work between the geometry front-end units has been improved, and these units can better re-use vertices, which AMD says should help performance in cases where "many small triangles" are present.
These architectural modifications more than bring the R9 285 up to par with its nearest rival, the GeForce GTX 760, in terms of geometry throughput. Tonga also surpasses the Hawaii-based Radeon R9 290X in this synthetic test of tessellation performance.
Between the new color compression method and the geometry performance gains, Tonga could plausibly claim to usher in a new generation of Radeon technology. The use of the GCN or "Graphics Core Next" label has proven incredibly flexible inside the halls of AMD, but what we're seeing here sure feels like a fundamental shift.
That's not the full extent of the changes, either. AMD has revamped Tonga's media processing capabilities in order to ensure fluid performance and high-quality images in the era of 4K video. That starts with a new hardware image-scaling block in the display pipeline. This scaler is capable of upscaling to and downscaling from 4K video targets in real time.
In a related move, the graphics core has gained some new instructions for 16-bit integer and floating-point media and compute processing at reduced power levels. Also, both the video decode (UVD) and encode (VCE) engines on Tonga have been upgraded to allow for higher throughput. The UVD block now supports the MJPEG standard and can decode high-frame-rate 4K video compliant with the H.264 High Profile Level 5.2 spec. The beefier VCE block can encode 1080p video at 12X the real-time rate and is capable of encoding 4K video, as well.
We've had limited time to test Tonga, so we haven't been able to scrutinize its video processing chops yet. Above are some encoding performance results that AMD supplied to reviewers showing the R9 285 outperforming the GeForce GTX 760. Make of them what you will.
But wait, there's more!
One of the strange things about Tonga's introduction to the world is that it's debuting in a product where it's at less than full strength. AMD hasn't provided a ton of info about the full GPU, perhaps as a result of that fact, but below is my best guess at how Tonga looks from 10,000 feet.
The image above shows eight compute units per shader engine, with four shader engines across the chip. AMD has confirmed to us that Tonga is indeed hiding four more compute units than are active in the R9 285, so the diagram above ought to be accurate in that regard. Here's my best estimate of how Tonga stacks up in terms of key metrics versus its closest competition.
The die size and transistor count for Tonga above come directly from AMD. What fascinates me about these figures is that Tonga is barely any smaller than Tahiti. The idea that Tonga is a cost-reduced version of Tahiti pretty much goes out of the window right there.
Look at the transistor count, though. Tonga packs in roughly five billion transistors, while Tahiti is less complex, at 4.3 billion. Both chips are made at TSMC on a 28-nm process. How is it that Tonga's not quite as large as Tahiti yet has more transistors?
Since the chips are separated by three years, I suspect GCN compute units in Tonga are more densely packed than those in Tahiti. AMD has had more time to refine them. That said, we know that the two GPUs have the same number of compute units, so presumably Tonga doesn't get its much higher transistor count from its shader core. All of the other additions we've talked about, including the TrueAudio DSP block, the color compression capability, and video block enhancements, add some complexity. I doubt they're worth another 700 million transistors, though.
My best guess is that most of the additional transistors come from cache, perhaps a larger L2. SRAM arrays can be very dense, and a larger L2 cache would be a natural facilitator for Tonga's apparently quite efficient use of memory bandwidth. I've pinged AMD about the size of Tonga's L2 cache but haven't heard back yet.
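For anyone who wants to play with the numbers, here's a rough Python sketch of what SRAM costs in transistors. It assumes a classic six-transistor (6T) cell per bit and counts only the data array, ignoring tags, redundancy, and control logic. The cache sizes in the loop are hypothetical, since AMD hasn't told us Tonga's actual L2 capacity.

```python
TRANSISTORS_PER_BIT = 6          # assumption: standard 6T SRAM cell
BITS_PER_MIB = 8 * 1024 * 1024   # bits in one mebibyte


def sram_data_transistors(size_mib):
    """Transistors in the data array of an SRAM of the given size (6T cells only)."""
    return size_mib * BITS_PER_MIB * TRANSISTORS_PER_BIT


# Transistor gap between Tonga (~5.0 billion) and Tahiti (4.3 billion).
extra_transistors = 5.0e9 - 4.3e9

# Hypothetical cache sizes, chosen only for illustration.
for size_mib in (1, 2, 4):
    count = sram_data_transistors(size_mib)
    share = count / extra_transistors
    print(f"{size_mib} MiB of SRAM: ~{count / 1e6:.0f}M transistors "
          f"({share:.0%} of the ~700M gap)")
```

Swap in whatever L2 size AMD eventually confirms to see how much of the gap cache alone could plausibly explain.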
Another question these numbers raise is whether Tonga natively has a 256-bit memory interface. Generally, the size of a chip like this one is dictated by the dimensions of the I/O ring around its perimeter. Since Tonga occupies almost the same area as Tahiti, it's got to have room to accommodate a 384-bit GDDR5 interface. Surely we'll see a Radeon R9 285X card eventually with a fully-enabled Tonga GPU clocked at 1GHz or better. If I were betting, I'd put my money on that card having a 384-bit path to memory.
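Here's what a wider bus would mean for raw bandwidth, sketched in Python. The per-pin data rate is derived from the R9 285's 176 GB/s over a 256-bit interface. Assuming a hypothetical 384-bit card would run its memory at the same speed is my guess, not anything AMD has announced.

```python
def gddr5_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8


# Per-pin rate implied by the R9 285: 176 GB/s across 256 bits.
r9_285_gbps_per_pin = 176.0 * 8 / 256

# Hypothetical fully enabled card: same memory speed, 384-bit bus (assumption).
wide_bus_gbs = gddr5_bandwidth_gbs(384, r9_285_gbps_per_pin)

print(f"Implied per-pin rate: {r9_285_gbps_per_pin} Gbps")
print(f"384-bit bus at that rate: {wide_bus_gbs} GB/s")
```

That's a 50% increase in raw bandwidth before any compression gains enter the picture.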