AMD's Zen architecture has proven impressively scalable. From four cores and four threads to eight cores and sixteen threads and everything in between, the basic eight-core Zen die (often referred to as a Zeppelin) has made a name for itself in PCs ranging from budget builds to the high end. Most impressively, Ryzen 7 CPUs matched Intel's Haswell and Broadwell high-end desktop parts for productivity performance.
Although the power of Ryzen 7 chips proved impressive, they lacked some of the features that the most demanding users have come to expect from Intel's platforms. Quad-channel memory and gobs of PCIe lanes have, until recently, remained the calling card of Intel's X99 and X299 platforms. The Ryzen Threadripper CPU family changes all of that.
|Cores|Threads|Base clock|Boost clock|XFR headroom|L2 cache|L3 cache|PCIe 3.0 lanes|Memory channels|Price|
|16|32|3.4 GHz|4.0 GHz|200 MHz|8MB|32MB|64|4|$999|
Thanks to its Infinity Fabric on-die and inter-die interconnect, AMD can join multiple Zeppelins on one package to scale past eight cores and two channels of memory. We've already seen that approach used to great effect in the Epyc family of server processors, and it's the technique AMD is using to make the Ryzen Threadripper 1920X and Ryzen Threadripper 1950X. If you'd like to read more about the Zen architecture's fine details, David Kanter's run-down of Zen is a fine place to start. We'll just be covering the broad strokes of how Zeppelins become Threadrippers.
The massive, Epyc-esque Threadripper package sews two Zeppelins together in a diagonal Infinity Fabric topology. AMD then taps two channels of DDR4 memory and all 32 lanes of PCIe 3.0 connectivity from each die to create a formidable multi-chip module. Of the 64 PCIe 3.0 lanes this stitching-together makes available, 60 will be available for motherboard makers to distribute across the various slots and ports of their products. The remaining four are reserved for the connection between CPU and chipset. The other two "dies" under Threadripper's massive heatspreader are dummies that serve to maintain the integrity of the package. Overclockers will appreciate that AMD continues to solder the heatspreader to Threadripper CPUs, a practice that Intel recently abandoned with its Skylake-X chips.
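The resource math behind that paragraph is simple enough to sketch. The snippet below is only an illustration of the arithmetic described above; the function and constant names are our own and do not come from AMD.

```python
# Per-die resources as described in the text above.
LANES_PER_DIE = 32      # PCIe 3.0 lanes tapped from each Zeppelin
CHANNELS_PER_DIE = 2    # DDR4 memory channels tapped from each Zeppelin
CHIPSET_LANES = 4       # lanes reserved for the CPU-to-chipset link


def package_resources(active_dies: int) -> dict:
    """Sum the per-die resources for a package with `active_dies` live dies."""
    total_lanes = active_dies * LANES_PER_DIE
    return {
        "pcie_lanes_total": total_lanes,
        "pcie_lanes_usable": total_lanes - CHIPSET_LANES,
        "memory_channels": active_dies * CHANNELS_PER_DIE,
    }


# Threadripper joins two live Zeppelins; the other two sites are dummies.
threadripper = package_resources(active_dies=2)
print(threadripper)
```

With two live dies, this works out to the 64 total lanes, 60 lanes for motherboard makers, and four memory channels quoted above.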
Because each eight-core Zeppelin brings two channels of memory to the table that are connected across the Infinity Fabric, Threadripper's memory-access characteristics are inherently non-uniform. Non-uniform memory access is a familiar concept in the server world, but regular folk haven't had to worry about NUMA on the desktop at least since AMD's own Quad FX platform put up a fight against the Core 2 Extreme QX6700 a decade and change ago.
Basically, when a naive application performs a memory operation on a NUMA system, it may find that some accesses are serviced more quickly than others. Some applications might not care about this difference, but other, more latency-sensitive software could experience reduced performance. Operating system support for NUMA should generally serve to cushion these bumps, but it doesn't change the basic fact that not every application will tolerate inconsistent memory access characteristics well.
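For readers who want to see how an operating system exposes NUMA nodes, here is a minimal sketch. It assumes a Linux system, where the kernel publishes each node's CPUs under `/sys/devices/system/node`; that is our assumption for illustration, since AMD's mode switches discussed in this article live in the Windows-based Ryzen Master utility. The helper names are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-3,8-11' into individual CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its CPU ids; empty if no node info is found."""
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for node_dir in sorted(root.glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            # Directory names look like "node0", "node1", and so on.
            nodes[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node {node}: {len(cpus)} logical CPUs -> {cpus}")
```

A system that presents itself as a single memory node should report one entry here, while a system running as separate NUMA nodes should report one entry per node.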
As a result, AMD offers Threadripper owners two choices for memory-access modes in its Ryzen Master utility. By default, the CPU will present itself as a uniform memory access node, which AMD calls "Distributed Mode." Despite what the uniform-memory-access label might suggest, this mode does not offer uniform memory access latencies. Instead, Distributed Mode accepts the latency differences inherent to the Threadripper multi-chip module as a tradeoff for delivering maximum bandwidth from all four memory channels. AMD says applications with unknown or unpredictable threading behavior will, on average, still experience a performance benefit from distributing memory accesses over all of the available channels.
In its testing, AMD also found that the performance of some applications, games especially, can benefit from running the Threadripper MCM as two separate NUMA nodes, each with what is essentially dual-channel memory. In this "Local Mode," the operating system will have free rein to keep applications running on cores closest to the memory where their data resides. Local Mode will let the operating system work as hard as possible to keep an application's workload and memory within one node before assigning work and data to the other node. For the most part, then, Local Mode should present an application with the lowest memory latencies possible at the cost of potentially lower bandwidth.
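The latency-versus-bandwidth tradeoff between the two modes can be captured in a toy model. Everything in the sketch below is an assumption made for illustration: latencies are in arbitrary units rather than measured figures, we assume Distributed Mode spreads accesses evenly so that about half land on the far die, and we treat Local Mode's best case as every access staying on the near die.

```python
def average_latency(local_fraction: float, local_cost: float, remote_cost: float) -> float:
    """Weighted average access cost when `local_fraction` of accesses stay local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_cost + (1.0 - local_fraction) * remote_cost


# ARBITRARY UNITS, not measurements: a remote access is assumed to cost
# twice as much as a local one purely to make the arithmetic visible.
LOCAL_COST = 1.0
REMOTE_COST = 2.0

# Distributed Mode (assumption): accesses spread evenly over both dies,
# so half are remote, but all four memory channels are in play.
distributed = {
    "avg_latency": average_latency(0.5, LOCAL_COST, REMOTE_COST),
    "channels_in_use": 4,
}

# Local Mode, best case (assumption): the OS keeps the workload and its
# data on one node, so every access is local but only two channels serve it.
local = {
    "avg_latency": average_latency(1.0, LOCAL_COST, REMOTE_COST),
    "channels_in_use": 2,
}

print("Distributed:", distributed)
print("Local:", local)
```

The point of the model is only the shape of the tradeoff: the more accesses the OS can keep local, the lower the average latency, at the cost of drawing on fewer memory channels.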
AMD further discovered that presenting some mainstream applications, including some DiRT and Far Cry titles, with 32 threads caused application-breaking problems. For those applications, the company is offering a third switch in its Ryzen Master utility called "Legacy Compatibility Mode." The goal of this mode is both to reduce the number of active threads on the system and to lower memory access latencies. AMD achieves this by leaving simultaneous multi-threading on, turning off the cores and threads on one Zeppelin, and putting the system in NUMA mode. In this mode, the Threadripper 1950X will behave like a single Ryzen 7 1800X, while the Threadripper 1920X will behave like a Ryzen 5 1600X. Even though all of the cores and threads on one Zeppelin are disabled in this mode, the sleeping die will still leave its memory controller powered up in case the system needs to access the memory pool connected to the remote node.
If this is all too much to keep track of, AMD's Ryzen Master utility offers two pre-built profiles that are pretty foolproof. "Creator Mode" sets Legacy Compatibility Mode to off, SMT to on, and the memory mode to Distributed. "Game Mode" turns on Legacy Compatibility Mode, and that's it. After a profile is applied in Ryzen Master, the system will restart with the chosen mode in effect.
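To make the two profiles concrete, here is a small sketch that encodes the settings described above and works out what the operating system would see. The class, field, and function names are our own invention for illustration; they are not Ryzen Master's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    legacy_compatibility: bool  # True turns off the cores on one die
    smt: bool                   # simultaneous multi-threading on or off
    memory_mode: str            # "distributed" (UMA) or "local" (NUMA)


# Settings as described in the text; the names here are illustrative only.
CREATOR_MODE = Profile(legacy_compatibility=False, smt=True, memory_mode="distributed")
# Legacy Compatibility Mode leaves SMT on and puts the system in NUMA mode.
GAME_MODE = Profile(legacy_compatibility=True, smt=True, memory_mode="local")


def visible_topology(total_cores: int, profile: Profile) -> tuple[int, int]:
    """Return (cores, threads) the OS sees for a two-die part under `profile`."""
    cores = total_cores // 2 if profile.legacy_compatibility else total_cores
    threads = cores * 2 if profile.smt else cores
    return cores, threads


for name, cores in (("Threadripper 1950X", 16), ("Threadripper 1920X", 12)):
    for label, profile in (("Creator Mode", CREATOR_MODE), ("Game Mode", GAME_MODE)):
        c, t = visible_topology(cores, profile)
        print(f"{name} in {label}: {c} cores / {t} threads, {profile.memory_mode} memory")
```

In Game Mode, the 16-core part comes out at eight cores and 16 threads and the 12-core part at six cores and 12 threads, which lines up with the Ryzen 7 1800X and Ryzen 5 1600X comparisons above.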
The inclusion of a Game Mode in Ryzen Master's settings might lead some to expect that they'll need to flip between these settings as they switch between work and play. That's probably not the case. AMD says that the average performance improvement it observed across all 60 of the games and all three of the resolutions it tested with Game Mode (read: Legacy Compatibility Mode) was 4%. Some games benefited more, while others actually experienced performance reductions. The company says the following applications are especially helped by Game Mode:
- Civilization VI
- Call of Duty: Modern Warfare Remastered
- Heroes of the Storm
- Gears of War Ultimate Edition
- DOTA 2
- Watch Dogs
- Hitman: Absolution
- Fallout 4
AMD says it offers Game Mode as an option for Threadripper customers who simply can't tolerate getting anything but the best performance possible from their systems. Honestly, I don't think most people are going to want to deal with profile fiddling and restarts in exchange for a few more FPS here and there. Do check whether any games you play are on the list above, but we felt the most appropriate way to test Threadripper was in the mode AMD ships it in by default.