2014 has been a strange year for graphics chips. Many of the GeForce and Radeon graphics cards currently on the market are based on GPUs over two years old. Rather than freshening up their entire silicon lineups top-to-bottom as in the past, AMD and Nvidia have chosen to take smaller, incremental steps forward.
Both firms introduced larger chips based on existing GPU architectures last year. Then, two weeks ago, the Tonga GPU in the Radeon R9 285 surprised us with formidable new technology that's still somewhat mysterious. Before that, this past spring, Nvidia unveiled its next-gen "Maxwell" graphics architecture on a single, small chip aboard the GeForce GTX 750 Ti. We could tell by testing that card's GM107 GPU that Maxwell was substantially more power-efficient than the prior-gen Kepler architecture. However, no larger Maxwell-based chips were forthcoming.
Until today, that is.
At long last, a larger Maxwell derivative is here, powering a pair of new graphics cards: the flagship GeForce GTX 980 and its more affordable sibling, the GTX 970. These cards move the needle on price, performance, and power efficiency like only a new generation of technology can do.
The middle Maxwell: GM204
The chip that powers these new GeForce cards is known as the GM204. Although the Maxwell architecture is bursting with intriguing little innovations, the GM204 is really about two big things: way more pixel throughput and vastly improved energy efficiency. Most of what you need to know about this chip boils down to those two things—and how they translate into real-world performance.
Here's a look at the basic specs of the GM204 versus some notable contemporaries:
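| GPU | ROP pixels/clock | Texels filtered/clock (int/fp16) | Shader processors | Rasterized triangles/clock | Memory interface width (bits) | Estimated transistor count (millions) | Die size (mm²) | Fab process |
|---|---|---|---|---|---|---|---|---|
| GM204 | 64 | 128/128 | 2048 | 4 | 256 | 5200 | 416 (398) | 28 nm |
| Tonga | 32 (48) | 128/64 | 2048 | 4 | 256 (384) | 5000 | 359 | 28 nm |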
Nvidia did well to focus on energy efficiency with Maxwell, because foundries like TSMC, which makes Nvidia's GPUs, have struggled to move to smaller process geometries. Like the entire prior generation of GPUs, GM204 is built on a 28-nm process. (Although TSMC is apparently now shipping some 20-nm silicon, Nvidia tells us the 28-nm process is more cost-effective for this chip, and that assessment is consistent with what we've heard elsewhere.) Thus, the GM204 can't rely on the goodness that comes from a process shrink; it has to improve performance and power efficiency by other means.
Notice that the GM204 is more of a middleweight fighter, not a heavyweight like the GK110 GPU in the GeForce GTX 780- and Titan-series cards. Nvidia considers the GM204 the successor to the GK104 chip that powers the GeForce GTX 680 and 770, and I think that's appropriate. The GM204 and the GK104 both have a 256-bit memory interface and the same number of texture filtering units, for instance.
Size-wise, the GM204 falls somewhere in between the GK104 and the larger GK110. Where exactly is an interesting question. When I first asked, Nvidia told me it wouldn't divulge the new chip's die area, so I took the heatsink off of a GTX 980 card, pulled out a pair of calipers, and measured it myself. The result: almost a perfect square of 20.4 mm by 20.4 mm. That works out to 416 mm². Shortly after I had my numbers, Nvidia changed its tune and supplied its own die-size figure: 398 mm². I suppose they're measuring differently. Make of that what you will.
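For the curious, the arithmetic behind those two die-size figures can be sketched in a few lines of Python. The only inputs are the caliper measurement and Nvidia's number quoted above; the variable names are mine, and the "equivalent square edge" is just an illustration of how small a measurement difference would account for the gap, not a claim about how Nvidia measures.

```python
import math

# Caliper measurement from the text: die edges in millimeters.
width_mm = 20.4
height_mm = 20.4

# Nvidia's officially supplied die area, in square millimeters.
nvidia_area_mm2 = 398.0

measured_area_mm2 = width_mm * height_mm
gap_pct = (measured_area_mm2 - nvidia_area_mm2) / nvidia_area_mm2 * 100

# Edge length of a perfect square with Nvidia's stated area (illustrative only).
nvidia_edge_mm = math.sqrt(nvidia_area_mm2)

print(f"measured area:      {measured_area_mm2:.1f} mm^2")
print(f"gap vs. Nvidia:     {gap_pct:.1f}%")
print(f"equivalent edge:    {nvidia_edge_mm:.2f} mm")
```

A difference of less than half a millimeter per edge is enough to cover the discrepancy, which fits the idea that the two numbers come from measuring slightly different things.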
The GM204's closest competition from AMD is the new Tonga GPU that powers the Radeon R9 285. We know for a fact that not all of Tonga's capabilities are enabled on the 285, though, and I have my own crackpot theories about how the full Tonga looks. I said in my review that I think it has a 384-bit memory interface, and after more noodling on the subject, I strongly suspect it has 48 pixels per clock of ROP throughput waiting to be enabled. Mark my words so you can mock me later if I'm wrong!
One reason I suspect Tonga has more ROPs is that it just makes sense to increase a GPU's pixel throughput in the era of 4K and high-PPI displays. I believe the GM204's ROPs are meant to be represented by the deep blue Chiclets™ surrounding the L2 cache in the fakey diagram above. At 64 pixels per clock, the GM204 has one-third more per-clock ROP throughput than the big Kepler GK110 chip, which manages 48 pixels per clock, and double that of the GK104. That's a sizeable commitment, an enormous increase over the previous generation, and it means the GM204 is ready to paint lots of pixels.
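To put rough numbers on that, here is a small Python sketch comparing per-clock ROP rates. The GM204 and Tonga figures come from the spec table, with Tonga's 48 being my own speculation rather than a confirmed spec; the GK104's 32 is inferred from the "double" comparison above. The 4K and 1080p resolutions are the standard 3840x2160 and 1920x1080, which I've added to show why pixel throughput matters more now.

```python
# Per-clock ROP throughput in pixels. "Tonga (speculated)" is a guess from the
# article, not a confirmed specification.
rops_per_clock = {
    "GM204": 64,
    "GK104": 32,
    "Tonga (as shipped)": 32,
    "Tonga (speculated)": 48,
}

# How many times more pixels per clock the GM204 can write than each chip.
gm204_advantage = {
    name: rops_per_clock["GM204"] / rate
    for name, rate in rops_per_clock.items()
    if name != "GM204"
}

# Pixels per frame at common resolutions (standard values, not from the article).
pixels_1080p = 1920 * 1080
pixels_4k = 3840 * 2160

for name, ratio in gm204_advantage.items():
    print(f"GM204 vs. {name}: {ratio:.2f}x")
print(f"4K has {pixels_4k / pixels_1080p:.0f}x the pixels of 1080p")
```

Per-clock rates are only half the story, since delivered fill rate also depends on clock speed and memory bandwidth, but the ratios show how much extra headroom the ROP count alone provides.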
At 2048KB, the GM204's L2 cache is relatively large, too. The GK104 has only a quarter of the cache, at 512KB, and even the GK110's 1536KB L2 cache is smaller. Caches are growing by leaps and bounds in recent graphics architectures, as a means of both amplifying bandwidth and improving power efficiency (since memory access burns a lot of power).
The larger cache is just one way Nvidia has pursued increased efficiency in the Maxwell architecture. Many of the other gains come from the new Maxwell core structure, known as the shader multiprocessor or SMM. The GM204 has a total of 16 SMMs. Each of them is broken into four "quads," and each of those has a single 32-wide vector execution unit with its own associated control logic. Threads are still scheduled in "warps," or groups of 32 threads, with one thread per "lane" executing sequentially on each vec32 execution unit. Nvidia says the SMM's new structure makes scheduling tasks on Maxwell simpler and more efficient, which is one reason this architecture uses less energy per instruction than Kepler. Maxwell's efficiency improvements come from several sources, though, and I hope to have time to explore them in more depth in a future article.
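The SMM layout described above accounts for the shader processor count in the spec table, as this quick Python sketch shows. The constant names are mine, and treating each vec32 unit as working on one warp at a time is a simplification for illustration, not a description of Nvidia's actual scheduler.

```python
# Structure of the GM204 as described in the text.
SMM_COUNT = 16          # shader multiprocessors on the chip
QUADS_PER_SMM = 4       # "quads" per SMM, each with its own control logic
LANES_PER_QUAD = 32     # width of each vector execution unit
WARP_SIZE = 32          # threads per warp

# Total ALU lanes, which should match the table's shader processor count.
shader_processors = SMM_COUNT * QUADS_PER_SMM * LANES_PER_QUAD

# Simplifying assumption: one warp occupies one vec32 unit at a time,
# with each thread mapped to its own lane.
vec32_units = SMM_COUNT * QUADS_PER_SMM
threads_per_unit = WARP_SIZE // LANES_PER_QUAD * LANES_PER_QUAD

print(f"shader processors: {shader_processors}")
print(f"vec32 execution units: {vec32_units}")
print(f"threads in flight per unit per warp: {threads_per_unit}")
```

Because the warp width matches the execution unit width, a whole warp fits on one unit with no lanes left over, which is part of why this arrangement is simpler to schedule.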
For now, let's look at the new GeForce cards.