Just over a year ago, AMD's Ryzen Threadripper CPUs delivered more cores to demanding users for less money than the competition, and that formula proved successful in re-establishing AMD as a player in high-end desktop systems. As a matter of fact, AMD says its 16-core, 32-thread Threadripper 1950X was its best-selling high-end desktop part. Despite the copious compute resources on offer from the top-end Threadripper, the company heard from customers who wanted even more from their high-end desktops.
AMD listened to those folks, and this morning, the company is unleashing even more multithreaded performance from the same X399 platform that underpinned first-generation Threadrippers in the form of the $1799 Ryzen Threadripper 2990WX. As we've known it would for some time, the 2990WX will let builders put 32 cores and 64 threads in any X399 motherboard with nothing more than a firmware update. AMD recently flew us all the way to Maranello, Italy, home of a little racing team called Scuderia Ferrari, to give us a look under the 2990WX's hood and to make the point that this chip is one heck of a fast CPU.
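| Model | Cores/threads | Base clock (GHz) | Boost clock (GHz) | L2 cache (MB) | L3 cache (MB) | TDP | Price |
|---|---|---|---|---|---|---|---|
| Threadripper 2990WX | 32/64 | 3.0 | 4.2 | 16 | 64 | 250 W | $1799 |
| Threadripper 2950X | 16/32 | 3.5 | 4.4 | 8 | 32 | 180 W | $899 |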
Adding more cores and threads to a Threadripper would seem to invite scaling out other parts of its platform, as well, but AMD wanted to remain within the bounds of first-generation X399 motherboards to keep an upgrade path for owners of those boards intact—much as it plans to for Socket AM4 builders through 2020. As the company tells it, that meant no changes to the pinout of the TR4 socket and no new memory channels that would require a move to single-socket-server-like motherboards.
The Threadripper 2990WX soldiers on with the same quad-channel memory arrangement as the 16-core Threadripper 1950X—and thus the same potential memory bandwidth—despite having double the number of active cores under its heat spreader. Those constraints posed challenges that AMD had to work around as it figured out how to cram even more into the same socket.
The key to scaling out Threadripper, as with most AMD projects of late, is the use of the Infinity Fabric on-die and on-package interconnect. The Infinity Fabric let AMD join together two eight-core Ryzen dies (also known as Zeppelins) to form the first Threadrippers. Two-die Threadripper multi-chip modules (or MCMs) enjoy a 50 GB/s bi-directional link over the Infinity Fabric, or roughly the equivalent of two channels of DDR4-3200 RAM.
Achieving a die-to-die connection for every Zeppelin on the Threadripper 2990WX multi-chip module comes at a cost to that inter-die bandwidth. Each of the four dies on the 2990WX MCM has a 25 GB/s bi-directional connection to every other die on the package, assuming DDR4-3200 in the motherboard's memory slots. That bandwidth is roughly equivalent to a single channel of memory, and it's quite a bit lower than the 42 GB/s die-to-die bandwidth that the company specifies for inter-die communication on fully connected Epyc multi-chip modules.
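The "roughly equivalent" comparisons above come down to simple arithmetic: a DDR4 channel is 64 bits (8 bytes) wide, so DDR4-3200 peaks at about 25.6 GB/s per channel. The sketch below is our own back-of-the-envelope check using the link figures quoted in the text; the function name is ours, and every number it produces is a theoretical peak rather than measured throughput.

```python
def ddr4_channel_gbps(mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth (decimal GB/s) of one 64-bit DDR4 channel."""
    return mt_per_s * 1e6 * bus_bytes / 1e9

one_channel = ddr4_channel_gbps(3200)   # about 25.6 GB/s
two_channels = 2 * one_channel          # about 51.2 GB/s
quad_channel = 4 * one_channel          # about 102.4 GB/s for a quad-channel setup

# Die-to-die Infinity Fabric figures quoted in the text, in GB/s.
links = {
    "Threadripper 1950X (2 dies)": 50,
    "Threadripper 2990WX (4 dies)": 25,
    "Epyc (4 dies)": 42,
}

# Express each link as a multiple of one DDR4-3200 channel.
for name, gbps in links.items():
    print(f"{name}: {gbps} GB/s = {gbps / one_channel:.2f} DDR4-3200 channels")
```

By this measure, the two-die link is just under two channels' worth of bandwidth and the 2990WX's link is just under one.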
AMD concedes that saturating this die-to-die link could hurt performance in applications that care about bandwidth above all else. In our experience, though, applications that saturate memory bandwidth are rare in single-user computing. Users seriously shopping for a 32-core, 64-thread CPU probably have one foot in client workloads and one in the data center or high-performance computing center, but even so, AMD's bet is likely a safe one for the 2990WX's target audience.
Only two of the 2990WX's four dies have their memory controllers and PCIe lanes connected to the outside world; the other two are compute-only and must reach memory and attached devices, such as graphics cards, over the Infinity Fabric. That asymmetry makes it harder for the operating system to pair programs with resources for the best performance. To help out the OS, the 2990WX always runs in a non-uniform memory access, or NUMA, topology. In contrast, the Threadripper 1950X and Threadripper 2950X give their owners the option of running in a local (or NUMA) memory-access mode or in a distributed mode that presents the entire MCM as one uniform memory access domain.
Each of the 2990WX's four dies, whether I/O-capable or compute-only, is its own NUMA node. Because of that always-on NUMA topology, the operating system will first try to schedule threads on the dies where their associated memory resides before spilling them over to the compute-only dies, where memory latency is always worst-case thanks to the round trip over the Infinity Fabric to an I/O die, out to memory, and back.
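To make that fill-then-spill behavior concrete, here is a toy model of the 2990WX's four NUMA nodes. It is a hypothetical illustration, not how Windows actually schedules threads: the node numbering (which node IDs carry memory), the `Die` class, and the `place_threads` policy are our own assumptions. Only the shape of the topology comes from the text: four eight-core, 16-thread dies, two with memory attached and two that need an extra Infinity Fabric hop to reach it.

```python
from dataclasses import dataclass

@dataclass
class Die:
    node: int
    has_memory: bool      # True for the two I/O-capable dies
    threads: int = 16     # 8 cores x 2 SMT threads per Zeppelin

# Hypothetical node numbering; which IDs carry memory is an assumption.
DIES = [Die(0, True), Die(1, False), Die(2, True), Die(3, False)]

def fabric_hops_to_memory(die: Die) -> int:
    """0 hops if the die has local memory, 1 Infinity Fabric hop to an I/O die otherwise."""
    return 0 if die.has_memory else 1

def place_threads(n: int) -> dict[int, int]:
    """Toy policy: fill memory-attached nodes first, then spill to compute-only nodes."""
    # sorted() is stable, so node order is preserved within each group.
    order = sorted(DIES, key=fabric_hops_to_memory)
    placement: dict[int, int] = {}
    remaining = n
    for die in order:
        take = min(remaining, die.threads)
        if take:
            placement[die.node] = take
        remaining -= take
    if remaining:
        raise ValueError("more threads than the 64 hardware threads available")
    return placement

print(place_threads(24))  # {0: 16, 2: 8}
print(place_threads(64))  # {0: 16, 2: 16, 1: 16, 3: 16}
```

In this model, a 24-thread job never touches the compute-only dies, while a fully loaded chip inevitably puts half its threads one hop away from memory.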
AMD says the performance implications of this unevenness on a fully-loaded 2990WX are less of a concern than they might seem at first blush. The company says workloads that scale up to 64 threads are generally less concerned with memory latency than they are with bandwidth, a characteristic the 2990WX's scale-out design is well positioned to exploit. AMD says it's also worked with Microsoft to increase awareness of the unusual memory-access topology of the Threadripper 2990WX multi-chip module, and that it's continuing to work with Redmond to refine the way the chip and OS work together for better performance in the future.
Just because the Threadripper 2990WX always operates as a group of NUMA nodes doesn't mean users are out of luck when it comes to maximizing application performance. Not all software is NUMA-aware, and some programs may be overwhelmed by having 32 cores and 64 threads at their disposal. To accommodate those applications, AMD will give owners the option to power down two Zeppelins through its Ryzen Master utility, leaving the chip with 16 cores and 32 threads. If that's not enough, Ryzen Master can disable as many as three Zeppelins to leave the 2990WX with eight cores and 16 threads.
As a second-generation Ryzen CPU, the Threadripper 2990WX benefits from three major refinements. First, AMD's Precision Boost 2 technology lets Threadrippers respond gracefully to changing load conditions, resulting in what AMD fellow Joe Macri called "a much more usable machine." First-generation Threadrippers relied on a four-cores-active boost speed before dropping down to their all-cores-active speed, a performance characteristic that Macri likened to going over a cliff. Precision Boost 2 instead scales boost speeds along a continuous, more linear curve from one-core loads to all-core loads, and it can adjust clock speeds in 25-MHz increments to support that mission.
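The difference between the old "cliff" and the new curve can be sketched in a few lines. This is a simplified illustration under stated assumptions, not AMD's algorithm: the real Precision Boost 2 weighs more inputs than the active-core count, and the 3.4-GHz all-core figure below is hypothetical. Only the 4.2-GHz single-core peak (from the spec table) and the 25-MHz step size come from the text.

```python
STEP_MHZ = 25          # Precision Boost 2 clock granularity, per the text
PEAK_MHZ = 4200        # 2990WX single-core peak from the spec table
ALL_CORE_MHZ = 3400    # hypothetical all-core clock, for illustration only
CORES = 32

def first_gen_style(active: int) -> int:
    """Step model: boost clock with up to four cores active, then drop to the all-core clock."""
    return PEAK_MHZ if active <= 4 else ALL_CORE_MHZ

def pb2_style(active: int) -> int:
    """Curve model: interpolate from 1 to CORES active cores, rounded down to a 25-MHz step."""
    if not 1 <= active <= CORES:
        raise ValueError("active core count out of range")
    fraction = (active - 1) / (CORES - 1)
    target = PEAK_MHZ - fraction * (PEAK_MHZ - ALL_CORE_MHZ)
    return int(target // STEP_MHZ) * STEP_MHZ

for active in (1, 4, 5, 8, 16, 32):
    print(f"{active:2d} cores active: step {first_gen_style(active)} MHz, "
          f"curve {pb2_style(active)} MHz")
```

With these made-up endpoints, the step model loses 800 MHz the moment a fifth core wakes up, while the curve gives up only a few 25-MHz steps at a time.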
Second, Extended Frequency Range 2 (XFR 2) allows the 2990WX to take advantage of ambient conditions and beefy cooling hardware to deliver better sustained performance under multi-threaded workloads. Unlike the first generation of XFR, which applied a fixed offset to both single-core and all-core clock speeds when conditions allowed, XFR 2 only affects multithreaded speeds.
Finally, the Threadripper 2990WX incorporates the Zen+ microarchitecture, fabricated on GlobalFoundries' 12LP process. 12LP allowed AMD to use better-performing transistors in critical parts of the Ryzen die, resulting in better cache and memory latencies. In the case of the Ryzen Threadripper 2950X (for which we'll have a separate review shortly), the 12LP process also allowed AMD to raise the peak single-core clock speed to 4.4 GHz. By a hair, that's the fastest stock single-core clock that AMD has shipped on a Ryzen CPU so far. The Threadripper 2990WX only tops out at a single-core clock speed of 4.2 GHz, though.
That deficit might seem strange at first, but AMD says finding enough dies with similar enough performance characteristics to allow for the same single-core clock speeds on both the 2950X and 2990WX was a challenge. That makes sense when you consider that AMD continues to select only the top five percent of Ryzen dies for use in Threadrippers to start with. The company likely needs to leave itself enough 4.4-GHz-capable silicon to make 2950Xes, and statistics probably favor assembling sets of four 4.2-GHz-capable dies in sufficient quantities to make 2990WXes.
For more information on the second-generation Ryzen Threadripper lineup, as well as an unboxing video that goes through some of the hardware we're testing with today, be sure to check out our second-generation Threadripper unveiling. Otherwise, let's get to testing.