Way, way back in the fall of 2006, I put together my first PC in my dorm room. I picked out a Core 2 Duo E6400 and a proper motherboard, guided by a friendly-sounding article from some PC hardware site I'd found while Googling around. My elation with that PC—Dual cores! 4GB of memory! A graphics card that can run Half-Life 2! Free Windows Vista!—probably wasn't shared by the AMD boardroom at the time.
The Conroe cores in that E6400 and its friends helped touch off an Intel CPU performance lead that AMD hasn't much challenged since. 2007's Phenom family of chips suffered from a performance-robbing TLB erratum, and the Phenom II series only duked it out with Intel chips from prior generations in its time. Famously, 2011's ambitious Bulldozer architecture trailed Intel's seminal Sandy Bridge CPUs substantially when it launched aboard the FX-8150, and the Piledriver refresh of that architecture in 2012 didn't help much. Our move to frame-time benchmarking between Bulldozer and Piledriver made the refreshed architecture's shortcomings especially clear for gaming performance. Then-AMD CEO Rory Read eventually conceded that Bulldozer "was not the game-changing part [sic] when it was introduced three years ago," but the 'dozer's derivatives have had to soldier on in various forms in AMD's CPUs ever since.
It didn't help that AMD's bet on the fusion of Radeon graphics and traditional CPU cores over seven generations of APUs didn't find many takers in the lower end of the market. We can't forget the company's long slide into irrelevance in the data center, either. That's an attractive, high-margin business that Intel basically has to itself these days.
So, yeah. After 10 years and change, the Zen microarchitecture that's launching this morning aboard AMD's Ryzen CPUs has a lot riding on its shoulders. The entire company's future, if I had to guess. No biggie.
Not to spoil things too much, but Zen is solid. Go ahead and breathe a sigh of relief now. We've had three Ryzen CPUs in the TR labs this past week: the Ryzen 7 1800X, the Ryzen 7 1700X, and the Ryzen 7 1700. We've spent nearly every waking hour of the past few days turning every knob and dial we can to make our Ryzen CPUs sweat. Before we see whether or how the first Zen chips live up to the deafening hype that AMD has drummed up over the past few months, it's worth taking a peek under the hood to see just how the company fulfilled the promises it's made about Ryzen's performance.
From the ground up
The Zen microarchitecture is a complete re-imagining of what an AMD x86 processor should look like. The company's engineers have tossed the tightly-coupled "module" concept of Bulldozer and friends on the scrap heap. Instead, Zen is a sleek, shiny new chassis that looks a bit like Sandy Bridge and its derivatives if you squint a bit. AMD has consistently touted a "40% IPC speedup" in its discussions of Zen from the beginning, and I'll do the best I can to briefly explain how AMD got there with its latest and greatest.
To start, I want to draw your attention to two clusters of rectangles in this high-level block diagram of the Zen CPU core. The first point of interest is that each core has its own integer and floating-point units to work with. This layout is quite a bit different from the dual-integer-unit and shared-floating-point-unit structure of the Bulldozer module. Another new AMD trick for Zen is simultaneous multithreading, or SMT (better known as Hyper-Threading in Intel parlance), which takes advantage of otherwise idle execution resources. Much of the Zen core can be competitively shared between multiple threads of execution, and only a few resources are statically partitioned: a new structure for AMD chips called the op cache, the store queue, and the retire queue.
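To make the difference between competitive sharing and static partitioning concrete, here's a toy Python sketch. It assumes a single queue with a fixed number of entries and two hardware threads; the `SharedQueue` class and its fields are made up for illustration and don't describe AMD's actual hardware.

```python
class SharedQueue:
    """Toy model of one core resource shared by two SMT threads (hypothetical)."""

    def __init__(self, size, static):
        self.size = size        # total entries in the structure
        self.static = static    # True: each thread owns a fixed half
        self.used = [0, 0]      # entries currently held by thread 0 and 1

    def allocate(self, thread):
        """Try to claim one entry for `thread`; return True on success."""
        if self.static:
            # Statically partitioned: a thread can never exceed its own half,
            # even if the other thread is idle.
            ok = self.used[thread] < self.size // 2
        else:
            # Competitively shared: first come, first served until full.
            ok = sum(self.used) < self.size
        if ok:
            self.used[thread] += 1
        return ok


def grab(queue, thread, attempts):
    """Count how many of `attempts` allocations succeed for `thread`."""
    return sum(queue.allocate(thread) for _ in range(attempts))
```

In this model, a busy thread can fill a competitively shared queue and starve its sibling, while a statically partitioned queue caps each thread at half the entries no matter how idle the other one is. That trade-off between utilization and fairness is the general reason a design might mix both schemes.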
The op cache is one of the biggest improvements to Zen's fetch-and-decode stage. This structure first made its appearance in Intel's Sandy Bridge architecture, and it serves as a temporary home for the internal micro-ops generated as part of the decode stage. This bit of cache is important because it can let the core leave its power-hungry fetch-and-decode hardware spun down. Instead, recently-decoded micro-ops can be dispatched straight into the maw of the core's execution units for processing if they're needed again. That shortcut has benefits for both latency and power consumption. You can read more about the benefits of op-caching in David Kanter's incredible Sandy Bridge deep-dive. (David's deep-dives have been indispensable in laying the foundations for this article, and they're required reading for anyone with even the slightest curiosity about modern CPU architectures. Do go check them out.)
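The idea behind an op cache can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of Zen's or Sandy Bridge's real structure: the `decode` function is a stand-in for the expensive x86 decoder, the instruction strings are invented, and the cache here is a small least-recently-used map keyed by instruction address, where real hardware uses a set-associative structure.

```python
from collections import OrderedDict


def decode(instruction):
    """Stand-in for the costly decoder: split an instruction into micro-ops."""
    return tuple(f"uop:{part}" for part in instruction.split(";"))


class OpCache:
    """Toy op cache: remembers decoded micro-ops by instruction address."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> micro-ops, oldest first
        self.hits = 0                 # fetches served without decoding
        self.decodes = 0              # fetches that needed the decoder

    def fetch(self, address, instruction):
        if address in self.entries:
            # Hit: skip the decoder entirely and reuse the stored micro-ops.
            self.hits += 1
            self.entries.move_to_end(address)
            return self.entries[address]
        # Miss: pay for a decode, then remember the result.
        self.decodes += 1
        uops = decode(instruction)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return uops


def run_loop(cache, program, iterations):
    """Fetch every (address, instruction) pair in `program` repeatedly."""
    for _ in range(iterations):
        for address, instruction in program:
            cache.fetch(address, instruction)
```

Running a short loop through `run_loop` shows why the structure matters: once the loop body fits in the cache, only the first pass touches the decoder, and every later pass is served from stored micro-ops.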
Zen also features an improved hashed-perceptron branch predictor compared to its predecessors. AMD (accurately) calls this a "neural network" instead, because neural networks are cool right now. It's also not a new concept for AMD chips: Bulldozer, Piledriver, and Jaguar have all used similar technology in their predictors. AMD didn't share many details of what it changed in the Zen predictor relative to its prior architectures.
In any case, better branch prediction is critical for allowing the chip to speculatively execute instructions without choosing the wrong path in the instruction stream. Get it wrong, and you have to flush the pipeline, an extraordinarily wasteful and performance-degrading operation in most cases. You can read more about the hashed-perceptron predictor in Daniel Jiménez's introductory paper on the subject.
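For a rough feel of how a perceptron predictor works, here's a minimal Python sketch of the basic scheme from Jiménez's original paper. To be clear about the assumptions: this is the plain perceptron predictor, not the hashed variant, and it has nothing to do with AMD's undisclosed implementation. The table size, history length, weight limit, and modulo indexing are arbitrary choices for illustration.

```python
class PerceptronPredictor:
    """Basic perceptron branch predictor (illustrative parameters only)."""

    def __init__(self, history_len=8, table_size=64, weight_limit=127):
        self.table_size = table_size
        self.weight_limit = weight_limit
        # Training threshold suggested in the original paper.
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes: +1 taken, -1 not taken.
        self.history = [-1] * history_len

    def _clamp(self, value):
        return max(-self.weight_limit, min(self.weight_limit, value))

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train only on a misprediction or a low-confidence output.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] = self._clamp(w[0] + t)
            for i, hi in enumerate(self.history, start=1):
                w[i] = self._clamp(w[i] + t * hi)
        self.history = self.history[1:] + [t]


def run(predictor, pc, outcomes):
    """Return a list of booleans: was each prediction correct?"""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results
```

Feeding `run` a branch that strictly alternates between taken and not-taken shows the appeal: the weights quickly learn that the next outcome is the opposite of the most recent one, which a simple always-taken guess can never do.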
Intel has trumpeted better branch predictor accuracy in virtually every one of its recent microarchitectures, but it's been quite reluctant to share any details of what it's changed to get there. That guardedness suggests the company's branch-prediction secret sauce is a major competitive advantage. Given Haswell CPUs' uncanny accuracy in branch prediction, for example, there's good reason for that secrecy.
Zen features a ton of other architectural improvements that contribute to its impressive performance gains over prior generations of AMD CPUs. We'd love to cover them all in depth for you, but we've been running tests on Ryzen right up until the NDA lift this morning and beyond. If you'd like to know more, be sure to check out David's Zen write-up at the Microprocessor Report for more detail than we can possibly offer.