Just over a year ago, AMD’s Ryzen Threadripper CPUs delivered more cores to demanding users for less money than the competition, and that formula proved successful in re-establishing AMD as a player in high-end desktop systems. As a matter of fact, AMD says its 16-core, 32-thread Threadripper 1950X was its best-selling high-end desktop part. Despite the copious compute resources on offer from the top-end Threadripper, the company heard from customers who wanted even more from their high-end desktops.
AMD listened to those folks, and this morning, the company is unleashing even more multithreaded performance from the same X399 platform that underpinned first-generation Threadrippers in the form of the $1799 Ryzen Threadripper 2990WX. As we’ve known it would for some time, the 2990WX will let builders put 32 cores and 64 threads in any X399 motherboard with nothing more than a firmware update. AMD recently flew us all the way to Maranello, Italy, home of a little racing team called Scuderia Ferrari, to give us a look under the 2990WX’s hood and to make the point that this chip is one heck of a fast CPU.
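|Model||Cores/threads||Base clock (GHz)||Boost clock (GHz)||L2 cache (MB)||L3 cache (MB)||TDP||Price|
|Threadripper 2990WX||32/64||3.0||4.2||16||64||250 W||$1799|
|Threadripper 2950X||16/32||3.5||4.4||8||32||180 W||$899|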
Adding more cores and threads to a Threadripper would seem to invite scaling out other parts of its platform, as well, but AMD wanted to remain within the bounds of first-generation X399 motherboards to keep an upgrade path for owners of those boards intact—much as it plans to for Socket AM4 builders through 2020. As the company tells it, that meant no changes to the pinout of the TR4 socket and no new memory channels that would require a move to single-socket-server-like motherboards.
The Threadripper 2990WX soldiers on with the same quad-channel memory arrangement as the 16-core Threadripper 1950X—and thus the same potential memory bandwidth—despite having double the number of active cores under its heat spreader. Those constraints posed challenges that AMD had to work around as it figured out how to cram even more into the same socket.
The Infinity Fabric topology of the Threadripper 2950X (and 1950X). Source: AMD
The key to scaling out Threadripper, as with most AMD projects of late, is the use of the Infinity Fabric on-die and on-package interconnect. The Infinity Fabric let AMD join together two eight-core Ryzen dies (also known as Zeppelins) to form the first Threadrippers. Two-die Threadripper multi-chip modules (or MCMs) enjoy a 50 GB/s bi-directional link over the Infinity Fabric, or roughly the equivalent of two channels of DDR4-3200 RAM.
The Infinity Fabric topology of the Threadripper 2990WX. Source: AMD
Achieving a die-to-die connection for every Zeppelin on the Threadripper 2990WX multi-chip module comes at a cost to that inter-die bandwidth. Each of the four dies on the 2990WX MCM has a 25 GB/s bi-directional connection to every other die on the package, assuming DDR4-3200 in the motherboard’s memory slots. That bandwidth is roughly equivalent to a single channel of memory, and it’s quite a bit lower than the 42 GB/s die-to-die bandwidth that the company specifies for inter-die communication on fully-connected Epyc multi-chip modules.
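The memory-channel comparisons for both topologies come down to simple arithmetic. Here is a minimal sketch, assuming a standard 64-bit (8-byte) DDR4 channel and peak theoretical transfer rates; the function name is ours, chosen for illustration.

```python
def ddr4_channel_bandwidth_gbs(transfer_rate_mts, bus_width_bytes=8):
    """Peak theoretical bandwidth of one DDR4 channel, in GB/s.

    Assumes a 64-bit (8-byte) channel, which is standard for DDR4.
    """
    return transfer_rate_mts * 1e6 * bus_width_bytes / 1e9


one_channel = ddr4_channel_bandwidth_gbs(3200)   # single channel of DDR4-3200
two_channels = 2 * one_channel                   # compare with the 50 GB/s two-die link
quad_channel = 4 * one_channel                   # total for the X399 platform

print(f"one channel:   {one_channel:.1f} GB/s")
print(f"two channels:  {two_channels:.1f} GB/s")
print(f"quad channel:  {quad_channel:.1f} GB/s")
```

By this math, a 25 GB/s link sits just under one DDR4-3200 channel and a 50 GB/s link sits just under two, which matches the "roughly equivalent" framing above.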
AMD concedes that saturating this die-to-die link could have a negative effect on performance for applications that care about bandwidth above all else. That said, our experience suggests applications that saturate memory bandwidth are rare in single-user computing, although users seriously shopping for a 32-core, 64-thread CPU probably have one foot in client workloads and one in the data center or high-performance computing center. Even so, AMD is likely safe to take this bet for the 2990WX’s target audience.
The fact that some dies on the 2990WX have access to resources like memory and connected graphics cards, while others do not, creates a challenge for the operating system in pairing programs with resources for the best performance. To help out the OS, the 2990WX will always run in a non-uniform memory access, or NUMA, topology. In contrast, the Threadripper 1950X and Threadripper 2950X give their owners the options of running in a local (or NUMA) memory-access mode as well as a distributed mode that presents the entire MCM as one uniform memory access domain.
Each I/O-capable die on the 2990WX is its own NUMA node, and each compute-only die is its own NUMA node. Because of the 2990WX’s always-on NUMA topology, the operating system will attempt to schedule threads on the die where their associated memory resides first before spilling threads over to the compute dies (where memory latency is always worst-case thanks to the round trip over the Infinity Fabric to an I/O die, all the way out to memory and back).
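The fill-local-first behavior described above can be pictured with a toy placement model. This is only a sketch: it is not how any operating system's scheduler is actually implemented, and the `Die` class, the die names, and the `place_threads` function are hypothetical. It assumes four dies with 16 hardware threads each, two of which have memory attached.

```python
from dataclasses import dataclass


@dataclass
class Die:
    name: str
    has_local_memory: bool
    hw_threads: int = 16  # 8 cores x 2 threads per die


def place_threads(dies, n_threads):
    """Fill dies that have local memory first, then spill to compute-only dies."""
    # sorted() is stable, so dies keep their listed order within each group.
    ordered = sorted(dies, key=lambda d: not d.has_local_memory)
    placement = {}
    remaining = n_threads
    for die in ordered:
        take = min(remaining, die.hw_threads)
        placement[die.name] = take
        remaining -= take
    return placement


dies = [
    Die("io0", True),
    Die("compute0", False),
    Die("io1", True),
    Die("compute1", False),
]

light = place_threads(dies, 24)   # fits entirely on the two I/O dies
heavy = place_threads(dies, 48)   # spills 16 threads onto a compute die
print(light)
print(heavy)
```

In this model, only workloads with more than 32 threads ever land on a compute die, where every memory access pays the extra Infinity Fabric hop.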
AMD says the performance implications of this unevenness on a fully-loaded 2990WX are less of a concern than they might seem at first blush. The company says workloads that scale up to 64 threads are generally less concerned with memory latency than they are with bandwidth, a fact that the 2990WX’s design is well-positioned to take advantage of in its mission to scale out. AMD says it’s also worked with Microsoft to increase awareness of the unusual memory-access topology of the Threadripper 2990WX multi-chip module, and that it’s continuing to work with Redmond to refine the way the chip and OS work together for better performance in the future.
Just because the Threadripper 2990WX always operates as a group of NUMA nodes doesn’t mean users are out of luck when it comes to maximizing application performance. Not all software is NUMA-aware, and other programs may be overwhelmed by having 32 cores and 64 threads at their disposal. To accommodate those applications, AMD will give owners the option to power down two Zeppelins through its Ryzen Master utility, leaving the chip with 16 cores and 32 threads. If that’s not good enough, Ryzen Master can disable as many as three Zeppelins to leave the 2990WX with eight cores and 16 threads.
As a second-generation Ryzen CPU, the Threadripper 2990WX benefits from three major refinements. First, AMD’s Precision Boost 2 technology lets Threadrippers respond gracefully to changing load conditions, resulting in what AMD fellow Joe Macri called “a much more usable machine.” First-generation Threadrippers relied on a four-cores-active boost speed before dropping down to their all-cores-active speed, a performance characteristic that Macri likened to going over a cliff. Precision Boost 2 manages boost speeds in a more linear fashion in a continuous curve from one-core loads to all-core loads, and it can adjust clock speeds in 25-MHz increments to support that mission.
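To picture the difference between the "cliff" and the continuous curve, here is a toy model of both behaviors. It is a sketch under loud assumptions: the real Precision Boost 2 curve depends on temperature, current, and workload, and its shape isn't given here. We assume a straight line from the 2990WX's 4.2-GHz peak down to its 3.0-GHz base clock, rounded down to 25-MHz steps, and we reuse the same two clocks for the step-function model purely for comparison.

```python
def cliff_boost_mhz(active_cores, peak=4200, floor=3000, boost_limit=4):
    """Step-function model: full boost up to `boost_limit` active cores, then the floor."""
    return peak if active_cores <= boost_limit else floor


def smooth_boost_mhz(active_cores, total_cores=32, peak=4200, floor=3000, step=25):
    """Linear model: interpolate from peak to floor, quantized to `step` MHz."""
    frac = (active_cores - 1) / (total_cores - 1)
    target = peak - frac * (peak - floor)
    return int(target // step) * step  # round down to the clock granularity


for cores in (1, 4, 5, 8, 16, 32):
    print(cores, cliff_boost_mhz(cores), smooth_boost_mhz(cores))
```

The point of the comparison is the five-core row: the step model has already fallen to its floor, while the linear model has given up only a couple hundred megahertz.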
Second, Extended Frequency Range 2 (XFR 2) allows the 2990WX to take advantage of ambient conditions and beefy cooling hardware to deliver better sustained performance under multi-threaded workloads. Unlike the first generation of XFR, which applied a fixed offset to both single-core and all-core clock speeds when conditions allowed, XFR 2 only affects multithreaded speeds.
Finally, the Threadripper 2990WX incorporates the Zen+ microarchitecture, born from GlobalFoundries’ 12LP process. 12LP allowed AMD to use better-performing transistors in critical parts of the Ryzen die, resulting in better cache and memory latencies. In the case of the Ryzen Threadripper 2950X—for which we’ll have a separate review shortly—the 12LP process also allowed AMD to raise the peak single-core clock speed to 4.4 GHz. By a hair, that’s the fastest stock single-core clock that AMD has shipped on a Ryzen CPU so far. The Threadripper 2990WX only tops out at a single-core clock speed of 4.2 GHz, though.
That deficit might seem strange at first, but AMD says finding enough dies with similar enough performance characteristics to allow for the same single-core clock speeds on both the 2950X and 2990WX was a challenge. That makes sense when you consider that AMD continues to select only the top five percent of Ryzen dies for use in Threadrippers to start with. The company likely needs to leave itself enough 4.4-GHz-capable silicon to make 2950Xes, and statistics probably favor assembling sets of four 4.2-GHz capable dies in sufficient quantities to make 2990WXes.
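The supply argument is easy to sketch with arithmetic. The numbers below are invented purely for illustration, since AMD hasn't shared how its dies are distributed across clock-speed bins; the function name is hypothetical as well. The only facts taken from the text are that a 2950X uses two active dies and a 2990WX uses four.

```python
def packages_from_pool(pool_size, qualifying_fraction, dies_per_package):
    """How many packages can be built from the dies in a pool that hit a target clock."""
    qualifying_dies = round(pool_size * qualifying_fraction)
    return qualifying_dies // dies_per_package


POOL = 1000  # hypothetical number of Threadripper-grade dies

# Hypothetical bin fractions: fewer dies reach 4.4 GHz than reach 4.2 GHz.
as_2950x_at_4400 = packages_from_pool(POOL, 0.10, dies_per_package=2)
as_2990wx_at_4400 = packages_from_pool(POOL, 0.10, dies_per_package=4)
as_2990wx_at_4200 = packages_from_pool(POOL, 0.40, dies_per_package=4)

print(as_2950x_at_4400, as_2990wx_at_4400, as_2990wx_at_4200)
```

Whatever the real fractions are, a four-die part eats qualifying silicon twice as fast as a two-die part, so relaxing the clock target is the obvious way to keep volumes up.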
For more information on the second-generation Ryzen Threadripper lineup, as well as an unboxing video that goes through some of the hardware we’re testing with today, be sure to check out our second-generation Threadripper unveiling. Otherwise, let’s get to testing.
Introducing the Mechanical TuRk
This review marks a major change in our CPU testing methods—possibly the biggest change in our approach to any hardware testing since we began collecting and digesting frame-time data a while back. You see, every CPU review we’ve produced over the past couple years has involved manually running every single benchmark on every single system under test, manually recording those results, and manually entering them into Excel for sorting and graphing.
That method simply doesn’t scale when you have large numbers of CPUs to benchmark, multi-thousand-word reviews to write, and tight deadlines to deliver it all. It also leads to torturous all-nighters in the lead-up to reviews, and those are exceedingly bad for long-term human health and mental performance. All told, our methods weren’t serving us or you, the audience, any longer.
Over the past few weeks, our code monkey Bruno Ferreira has been exploring how to automate our CPU tests so that we can balance the competing demands of performing in-depth testing, getting launch-day coverage out the door, and respecting the limits of the human body. That work has been an overwhelming success, and the end product is a little utility called the Mechanical TuRk. We’ve been able to wrap up the vast majority of our CPU tests into a single easy-to-install package that requires one button click to run and no human involvement to log or collect the results.
Even better, Bruno went above and beyond by creating a utility for us called Suleiman that automates the process of collecting data from each test system, transferring it into Excel, sorting it, and graphing it—all over the local network in the TR labs. That means most of our productivity testing is only limited by the number of systems I can build and run simultaneously, not the amount of time I can stay awake before falling asleep on my feet. Watching test results flow through Suleiman and into Excel automagically is downright intoxicating.
For all its automation, the TuRk doesn’t alter a key value behind our testing: the availability and reproducibility of the benchmarks behind our work. All of the utilities and tests it runs are still free to download and run individually, or they reside on public websites. Through head-to-head testing, we’ve confirmed that the overhead of the tool is vanishingly small and has no material effect on the benchmark numbers it collects.
We won’t be making the TuRk package itself available to the community, but interested parties can still verify our work independently or compare the performance of their own systems by downloading and running any particular benchmark or benchmarks of interest.
All told, the Mechanical TuRk will ultimately let us give you more of what you want from TR: more in-depth tests on more systems in more high-quality reviews. The value of this tool can’t be overstated for our work. Three cheers for Bruno for putting this tool together for us in a short timeframe and making it production-ready.
For the moment, however, our work in testing and refining the TuRk has left us short on time to perform some of the manual benchmarking we still need to do in order to make a complete TR review of the Ryzen Threadripper 2990WX. The most prominent victim of our time crunch is the DAW Bench suite, which still requires quite a bit of active time for every chip we need to test. We apologize for the omission of those numbers from this review, and we’ll be collecting them ASAP for a follow-up article (along with a number of other interesting content-creation workloads that weren’t natural fits for this piece). Thanks in advance for your patience—the long-term payoff will be worth it.
Our testing methods
As always, we did our best to deliver clean benchmarking numbers. We ran each benchmark at least three times and took the median of those results. Our test systems were configured as follows:
|Processor||Intel Core i7-8086K|
|CPU cooler||Corsair H110i 280-mm closed-loop liquid cooler|
|Motherboard||Gigabyte Z370 Aorus Gaming 7|
|Memory size||16 GB|
|Memory type||G.Skill Flare X 16 GB (2x 8 GB) DDR4 SDRAM|
|Memory speed||3200 MT/s (actual)|
|Memory timings||14-14-14-34 2T|
|System drive||Samsung 960 Pro 512 GB NVMe SSD|
|Processor||AMD Ryzen Threadripper 2990WX||AMD Ryzen Threadripper 2950X|
|CPU cooler||Enermax Liqtech TR4 240-mm closed-loop liquid cooler|
|Motherboard||Gigabyte X399 Aorus Xtreme|
|Memory size||32 GB|
|Memory type||G.Skill Flare X 32 GB (4x 8 GB) DDR4 SDRAM|
|Memory speed||3200 MT/s (actual)|
|Memory timings||14-14-14-34 1T|
|System drive||Samsung 970 EVO 500 GB NVMe SSD|
|Processor||AMD Ryzen Threadripper 1950X||AMD Ryzen Threadripper 1920X|
|CPU cooler||AMD Wraith Ripper|
|Motherboard||Gigabyte X399 Aorus Gaming 7|
|Memory size||32 GB|
|Memory type||G.Skill Flare X 32 GB (4x 8 GB) DDR4 SDRAM|
|Memory speed||3200 MT/s (actual)|
|Memory timings||14-14-14-34 1T|
|System drive||Samsung 960 EVO 500 GB NVMe SSD|
|Processor||Core i9-7980XE||Core i9-7960X||Core i9-7900X||Core i7-7820X|
|CPU cooler||Corsair H150i Pro 360-mm closed-loop liquid cooler|
|Motherboard||Gigabyte X299 Designare EX|
|Memory size||32 GB|
|Memory type||G.Skill Flare X 32 GB (4x 8 GB) DDR4 SDRAM|
|Memory speed||3200 MT/s (actual)|
|Memory timings||14-14-14-34 1T|
|System drive||Intel 750 Series 400 GB NVMe SSD|
Our test systems shared the following components:
|Graphics card||Nvidia GeForce GTX 1080 Ti Founders Edition|
|Graphics driver||GeForce 398.82|
|Power supply||Thermaltake Grand Gold 1200 W (Intel X299); Seasonic Prime Platinum 1000 W (AMD Threadripper 2950X/2990WX); Corsair RMx 850 W (AMD Threadripper 1950X/1920X); Seasonic SS660-XP2 660 W (Core i7-8086K)|
Some other notes on our testing methods:
- All test systems were updated with the latest firmware, graphics drivers, and Windows updates before we began collecting data, including patches for the Spectre and Meltdown vulnerabilities where applicable. As a result, test data from this review should not be compared with results collected in past TR reviews. Similarly, all applications used in the course of data collection were the most current versions available as of press time and cannot be used to cross-compare with older data.
- Our test systems were all configured using the Windows Balanced power plan, including AMD systems that previously would have used the Ryzen Balanced plan. AMD’s suggested configuration for its CPUs no longer includes the Ryzen Balanced power plan as of Windows’ Fall Creators Update, also known as “RS3” or Redstone 3.
- Unless otherwise noted, all productivity tests were conducted with a display resolution of 2560×1440 at 60 Hz. Gaming tests were conducted at 1920×1080 and 144 Hz.
Our testing methods are generally publicly available and reproducible. If you have any questions regarding our testing methods, feel free to leave a comment on this article or join us in the forums to discuss them.
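For readers reproducing our numbers at home, the run-at-least-three-times-and-take-the-median rule mentioned above is simple to apply. Here is a minimal sketch using Python's standard library; the helper name and the sample scores are hypothetical, and this is not code from the TuRk itself.

```python
from statistics import median


def summarize_runs(results):
    """Return the median of a benchmark's repeated runs (at least three required)."""
    if len(results) < 3:
        raise ValueError("expected at least three runs")
    return median(results)


runs = [101.2, 99.8, 100.5]  # hypothetical scores from three runs
score = summarize_runs(runs)
print(score)
```

The median is a sensible choice here because a single outlier run, say from a background task waking up, can't drag the reported figure the way it would with a mean.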
Memory subsystem performance
The AIDA64 utility includes some basic tests of memory bandwidth and latency that will let us peer into the differences in behavior among the memory subsystems of the processors on the bench today, if there are any.
With the same memory speeds and timings, the Skylake-X chips run away in AIDA64’s memory read test, but the results for writes and copies are much more closely matched among our quad-channel contenders. The Threadripper 2950X proves an especially eager writer and copier in these tests.
AIDA64’s memory latency test is especially interesting given that we have X-series and WX-series Threadrippers in our stable. On the Threadripper 2990WX, Windows seems to be working to keep the memory latency test scheduled on a core with “near” access, so we get the lowest possible latency figure we would expect from the part. As more and more threads are scheduled on the chip and spill onto its compute dies, we might expect latencies to creep upwards, but we can’t test that expectation with AIDA64 alone.
The 2950X, on the other hand, doesn’t seem to benefit from that same NUMA-awareness, so we get a result more in line with Threadrippers past.
Some quick synthetic math tests
AIDA64 also includes some useful micro-benchmarks that we can use to flush out broad differences among CPUs on our bench. The PhotoWorxx test uses AVX2 and AVX-512 on compatible CPUs. The CPU Hash integer benchmark uses AVX and Ryzen CPUs’ Intel SHA Extensions support, while the single-precision FPU Julia and double-precision Mandel tests use AVX2 with FMA.
While PhotoWorxx uses AVX-512, FinalWire suggests that it’s also quite sensitive to memory bandwidth. That might explain the negative scaling in performance we see as core counts climb among our Skylake-X chips—there isn’t a concurrent increase in bandwidth to feed the i9-7960X or i9-7980XE. Despite being limited to AVX2 instructions and having less theoretical SIMD throughput, the Threadrippers don’t trail that far behind the Skylake-X pack.
As we mentioned, CPU Hash uses Intel’s SHA Extensions on compatible chips to accelerate certain cryptographic functions. As the Threadrippers’ disproportionately large results in this test show, they support those extensions. Intel’s chips do not.
FinalWire says both the single-precision Julia and double-precision Mandel tests use AVX-512, as well, and as the plucky Core i7-7820X shows, those instructions can let compatible chips punch far above their weight class when SIMD throughput is asked for. The Core i9-7960X and i9-7980XE lead the pack by no small margin in this synthetic, too. However, the Threadripper 2990WX plants itself among the Core i9-7900X, the Core i9-7960X, and i9-7980XE by dint of its sheer core count.
We don’t normally include AIDA64’s ray-tracing benchmarks in our selection of synthetics, but our slate of high-end desktop CPUs warrants pulling this one out. These tests support AVX-512, as well, and they allow the Skylake-X chips on the bench to achieve some truly jaw-dropping throughput—especially from the i9-7960X and i9-7980XE. Only the Threadripper 2990WX’s core count allows it to stay in the mix here.
Now that we’ve oohed and aahed at the theoretical mathematical prowess of our test subjects, let’s see how they handle real-world workloads.
On the whole, Chrome seems to be kinder to AMD CPUs than Edge. The Threadripper 2990WX can generally outpace the Core i9-7960X and Core i9-7980XE, and it tends to hang right with the Core i9-7900X. It’s impressive that we don’t have to sacrifice much, if any, single-threaded responsiveness to get 32 cores in an X399 motherboard.
The WebXPRT 3 benchmark is meant to simulate some realistic workloads one might encounter in web browsing. It’s here primarily as a counterweight to the more synthetic microbenchmarking tools above.
Compiling code with GCC
Our resident code monkey, Bruno Ferreira, helped us put together this code-compiling test. Qtbench records the time needed to compile the Qt SDK using the GCC compiler. The number of jobs dispatched by the Qtbench script is configurable, and we set the number of threads to match the hardware thread count for each CPU.
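Matching the job count to the hardware thread count can be sketched in a few lines. The Qtbench script itself isn't reproduced in this article, so the following is an assumption-laden illustration: it supposes a `make`-driven build and uses GNU make's `-j` flag, with a hypothetical helper name.

```python
import os


def build_command(jobs=None):
    """Return a make invocation with one job per hardware thread by default."""
    if jobs is None:
        # os.cpu_count() reports logical processors; fall back to 1 if unknown.
        jobs = os.cpu_count() or 1
    return ["make", f"-j{jobs}"]


print(build_command())      # uses this machine's logical processor count
print(build_command(64))    # what a fully enabled 2990WX would get
```

The resulting list can be handed to `subprocess.run` from inside a build directory.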
Qtbench does scale with thread count, but its returns quickly diminish around 16 cores. Still, the Threadripper 2990WX ekes out the barest win here.
File compression with 7-Zip
The free and open-source 7-Zip archiving utility has a built-in benchmark that occupies every core and thread of the host system.
So here’s the first bump in the road for the Threadripper 2990WX. Despite what should be a class-leading complement of cores and threads in this benchmark, the chip falls to the back of the pack in 7-Zip’s compression test. We asked AMD whether it had observed similar behavior with 7-Zip compression, and the company does believe that some refinement may be required to see full performance from the 2990WX in this benchmark.
I’ve also asked some other reviewers about their experiences with the chip in this benchmark, and it does appear that we’re looking at a Windows-specific performance issue. In at least one benchmark of 7-Zip compression under Linux that the reviewers at Techgage shared with me, the 2990WX leads the pack, as we would expect.
It’ll be interesting to see what happens if AMD does manage to help the 7-Zip developers straighten out compression performance, because the 2990WX’s decompression speeds are eye-popping.
Just to drive home the point that the 2990WX’s compression performance under 7-Zip is abnormal, witness its performance in AIDA64’s benchmark of the Zlib compression algorithm. Presuming 7-Zip’s developers can uncork this Threadripper’s full capabilities with some tuning, we could be in for some great compression numbers to go with the chip’s incredible decompression performance in the future.
Disk encryption with Veracrypt
May as well get the bad results out of the way early. The Veracrypt benchmark also doesn’t seem to like something about the 2990WX, at least in its accelerated AES portion.
Move to the pure number-crunching demands of the unaccelerated Twofish algorithm, however, and the 2990WX stretches its incredible integer muscle once again.
The evergreen Cinebench benchmark is powered by Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. The test runs with a single thread and then with as many threads as possible.
One doesn’t run Cinebench for its single-threaded mode, though. This benchmark is all about stretching every core and thread possible, and the Threadripper 2990WX does just that in this test.
Blender is a widely-used, open-source 3D modeling and rendering application. The app can take advantage of AVX2 instructions on compatible CPUs. We chose the “bmw27” test file from Blender’s selection of benchmark scenes to put our CPUs through their paces.
Our Cinebench result wasn’t a fluke. The Threadripper 2990WX shaves an incredible 50 seconds off the Core i9-7980XE’s render time. We don’t often see step-function increases in performance like this.
Corona, as its developers put it, is a “high-performance (un)biased photorealistic renderer, available for Autodesk 3ds Max and as a standalone CLI application, and in development for Maxon Cinema 4D.”
The company has made a standalone benchmark with its rendering engine inside, so it’s a no-brainer to give it a spin on these CPUs.
Corona is another out-of-the-park home run for the Threadripper 2990WX.
Indigo Bench is a standalone application based on the Indigo rendering engine, which creates photo-realistic images using what its developers call “unbiased rendering technologies.”
Despite its general willingness to devour every core and thread we can throw at it, Indigo’s “Bedroom” scene doesn’t favor the Threadripper 2990WX like our other rendering benchmarks have. Perhaps this is another software-related rough spot that can be polished out with future updates.
The “Supercar” scene is kinder to the Threadripper 2990WX, but its performance is no better than that of its 32-threaded stablemates. The Core i9-7960X and i9-7980XE demonstrate that further scaling is still possible, so perhaps the Threadrippers still have some performance waiting to be tapped.
Handbrake is a popular video-transcoding app that recently hit version 1.1.1. To see how it performs on these chips, we converted a roughly two-minute 4K source file from an iPhone 6S into a 1920×1080, 30 FPS MKV using the HEVC algorithm implemented in the x265 open-source encoder. We otherwise left the preset at its default settings.
Handbrake proves another case where our Threadrippers hit a scaling wall that the Core i9-7960X and Core i9-7980XE do not.
Computational fluid dynamics is an interesting and CPU-intensive benchmark. For years and years, we’ve used the Euler3D benchmark from Oklahoma State University’s CASElab, but that benchmark has become more and more difficult to continue justifying in today’s newly-competitive CPU landscape thanks to its compilation with Intel tools (and hence a baked-in vendor advantage).
Ahead of this review, we set out to find a more vendor-neutral and up-to-date computational fluid dynamics benchmark than the wizened Euler3D. As it happens, the SPECwpc benchmark includes a CFD test constructed with Microsoft’s HPC Pack, the OpenFOAM toolkit, and the XiFoam solver. More information on XiFoam is available here. SPECwpc allows us to yoke every core and thread of our test systems for this benchmark. The result is reported as a multiple of SPEC’s reference system performance, namely a Lynnfield Xeon X3430 with 8 GB of dual-channel RAM.
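A score expressed as a multiple of a reference system is just a ratio. The sketch below assumes the ratio is taken between completion times, which is our reading rather than something SPEC's documentation is quoted on here; the function name and the timings are hypothetical.

```python
def relative_score(reference_seconds, measured_seconds):
    """Score as a multiple of the reference system's performance (higher is better)."""
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    return reference_seconds / measured_seconds


# Hypothetical timings: the reference box takes 600 s, the system under test 150 s.
print(relative_score(600.0, 150.0))
```

Under that assumption, a score of 4.0 means the system under test finished the job in a quarter of the reference machine's time.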
As our long experience with Euler3D has suggested, CFD workloads tend to bottleneck on memory bandwidth, perhaps explaining the large cluster of Threadrippers and Skylake-X CPUs in the middle of our chart. Somehow, though, the Threadripper 2950X and 2990WX break through to take the top spots in this benchmark.
The SPECwpc benchmark also includes a Windows-ready implementation of NAMD. As its developers describe it, NAMD “is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations.” Our ambitions are considerably more modest, but NAMD seems an ideal benchmark for our many-core single-socket CPUs.
As its developers suggest, NAMD has no trouble scaling with every core and thread we can throw at it. The Threadripper 2990WX is far and away the fastest chip here.
That’s it for our first round of productivity testing with the 2990WX. While many applications in our test suite have no trouble yoking every core and thread the 2990WX has to offer, others delivered performance far below the chip’s apparent potential. AMD may have some work ahead of it to ensure that even multithreaded software can take full advantage of its uberchip.
Next up, we’re going to do something completely ill-advised for the heck of it by running our CPU-bound gaming stress tests on the 2990WX. Buckle up.
Even as it passes six years of age, Crysis 3 remains one of the most punishing games one can run. With an appetite for CPU performance and graphics power alike, this title remains a great way to put the performance of any gaming system in perspective.
Crysis 3 ought to be a best-case scenario for the 2990WX, but even so, the chip trails the pack. At least the game seems to handle 32 cores and 64 threads gracefully, as cutting over to the 2990WX’s 16-core, 32-thread mode actually results in a minor decrease in performance.
These “time spent beyond X” graphs are meant to show “badness,” those instances where animation may be less than fluid—or at least less than perfect. The formulas behind these graphs add up the amount of time our graphics card spends beyond certain frame-time thresholds, each with an important implication for gaming smoothness. Recall that our graphics-card tests all consist of one-minute test runs and that 1000 ms equals one second to fully appreciate this data.
The 50-ms threshold is the most notable one, since it corresponds to a 20-FPS average. We figure if you’re not rendering any faster than 20 FPS, even for a moment, then the user is likely to perceive a slowdown. 33.3 ms correlates to 30 FPS, or a 30-Hz refresh rate. Go lower than that with vsync on, and you’re into the bad voodoo of quantization slowdowns. 16.7 ms correlates to 60 FPS, that golden mark that we’d like to achieve (or surpass) for each and every frame.
To best demonstrate the performance of these systems with a powerful graphics card like the GTX 1080 Ti, it’s useful to look at our three strictest graphs. 8.3 ms corresponds to 120 FPS, the lower end of what we’d consider a high-refresh-rate monitor. We’ve recently begun including an even more demanding 6.94-ms mark that corresponds to the 144-Hz maximum rate typical of today’s high-refresh-rate gaming displays.
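The thresholds and the "time spent beyond X" totals described above can be computed from a list of frame times. This is a minimal sketch rather than our production tooling: it assumes that only the portion of each frame time above the threshold counts toward the total, and the function names and sample data are hypothetical.

```python
def threshold_ms(fps):
    """Frame-time threshold, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


def time_spent_beyond(frame_times_ms, threshold):
    """Sum the time, in ms, that frames spent past the threshold.

    Only the excess over the threshold is counted for each slow frame.
    """
    return sum(t - threshold for t in frame_times_ms if t > threshold)


for fps in (20, 30, 60, 120, 144):
    print(f"{fps} FPS -> {threshold_ms(fps):.2f} ms")

frame_times = [7.0, 9.3, 16.7, 58.3, 6.5]  # hypothetical frame times in ms
print(f"{time_spent_beyond(frame_times, 8.3):.1f} ms beyond 8.3 ms")
print(f"{time_spent_beyond(frame_times, 50.0):.1f} ms beyond 50 ms")
```

Counting only the excess means a frame that barely misses the mark adds almost nothing, while a single long hitch dominates the total, which is exactly the kind of "badness" these graphs are meant to expose.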
While the fact that the 2990WX only holds our GTX 1080 Ti up for about three seconds of our one-minute test run at the 8.3-ms mark might sound impressive, that’s not a great result in this company. Even the Threadripper 1920X makes our graphics card spend about half as long past that mark, and the rest of our chips manage a second or less of hold-up time at this threshold.
Deus Ex: Mankind Divided
Thanks to its richly detailed environments and copious graphics settings, Deus Ex: Mankind Divided can punish graphics cards at high resolutions and make CPUs sweat at high refresh rates.
Turn Deus Ex: Mankind Divided loose on every one of the 2990WX’s cores and threads, and you will have a bad time. Cutting the core count in half leads to better performance, but the 2990WX is still overshadowed by two-die Threadrippers, not to mention every Skylake-X part in our stable.
At the 8.3-ms mark, our time-spent-beyond-X analysis throws the 2990WX’s struggles into sharp relief. Running a 1920×1080 monitor with a massively powerful graphics card like the GTX 1080 Ti is a stress test, to be sure, but the 2990WX just wilts under the pressure. Choosing settings that force a game to become CPU-bound on the 2990WX is simply not advisable.
Grand Theft Auto V
Grand Theft Auto V’s lavish simulation of Los Santos and surrounding locales can really put the hurt on a CPU, and we’re putting that characteristic to good use here.
Grand Theft Auto V is no better equipped to take advantage of the 2990WX’s thread count than Deus Ex: Mankind Divided, but at least switching off half the chip leads to a major increase in performance for both average frame rates and 99th-percentile frame times.
Our time-spent-beyond-X measurements show just how large an improvement the 2990WX’s 16-core mode provides in GTA V. The chip spends less than a quarter of the time holding up our GTX 1080 Ti with half of its cores active than with every cylinder firing.
Assassin’s Creed Origins
Assassin’s Creed Origins isn’t just striking to look at. It’ll happily scale with CPU cores, and that makes it an ideal case for our test bench.
Our Assassin’s Creed Origins numbers don’t paint a pretty picture for the 2990WX or 2950X at first glance. However, the Denuvo DRM baked into Assassin’s Creed Origins prevented me from performing retesting to confirm or exclude a performance problem with these parts. That said, I am comfortable in saying that Origins isn’t terribly happy on the 2990WX, especially in its full 32-core, 64-thread glory. A sub-60-FPS average frame rate is not something the unsuspecting gamer wants to see from their GTX 1080 Ti. Flipping on the 2990WX’s 16-core mode helps, but it still can’t help the GTX 1080 Ti run as quickly as it does with the Skylake-X chips or the first-generation Threadrippers. All told, we likely need to re-run some tests to confirm the behavior we’re seeing in this title from the 2950X and 2990WX.
Far Cry 5
Far Cry 5’s Montana mountain setting has plenty to say about the end times, but that doesn’t mean one should have to suffer apocalyptic performance from their gaming PC in the bargain. Weirdly enough, the low frame rates and high 99th-percentile frame times we saw in this title didn’t go away when we started bumping up the resolution to shift our performance bottleneck onto the graphics card. Perhaps Far Cry 5 just isn’t expecting to be run on a CPU with a server-like core and thread count. Indeed, shifting our 2990WX into its 16-core form greatly improved the gameplay experience.
Our time-spent-beyond-X measurements prove that it’s essential to cut down the 2990WX’s core and thread count if you’re unwisely attempting to push Far Cry 5’s frame rates to their limits on this chip.
Overall, we wouldn’t put too much stock in our gaming tests on the way to a final judgment of the Threadripper 2990WX or any other CPU architecture that descends from the data center to the desktop. High-end desktop chips and high-refresh-rate gaming generally do not go together well, and we knew that going into these tests. We forged ahead just to see what would happen. Even so, we weren’t expecting the 2990WX to perform as weakly as it did in many of our CPU-bound games.
It’s nice that the Ryzen Master utility offers users the ability to cut down the chip’s core and thread counts to combat potential performance issues like the ones we saw. Even so, having to deactivate cores and threads in order to achieve even decent baseline performance at lower resolutions with fast graphics cards could limit the 2990WX’s appeal for framerate-hungry single-PC streamers who might want every bit of the chip’s processing power at their disposal. AMD did demonstrate the chip encoding a stream of a 4K game at its presentation in Italy, and perhaps that mostly-GPU-bound scenario is how flush-with-cash streamers enamored of AMD CPUs will want to plan around this part.
Speaking of which, we wanted to run our usual single-PC gaming and streaming tests with the fully-enabled Threadripper 2990WX to see how it stacked up against the Core i9-7980XE, but Far Cry 5 is our title of choice for that testing right now, and you can imagine how that might turn out given our test results above. We’re exploring alternative titles that we can perform that testing with, and we’ll gather performance results and share them as soon as we’re able.
A quick look at power consumption and efficiency
To get a sense of just how the chips we’re testing today balance power consumption and performance, we’re going to run our usual back-of-the-napkin numbers using each part’s performance in the Blender “bmw27” benchmark and some handy observations from our Watts Up power meter.
First, we’ll compare the instantaneous power draw of our test systems under load while they run the Blender “bmw27” benchmark. If we stopped our analysis here, we might call the Threadripper 2990WX a huge power hog, but that wouldn’t be correct.
Recall that the 2990WX completes our Blender benchmark in record time—far faster than any other chip on the bench. Take that time-to-completion in seconds and multiply it by the instantaneous power draw of the chip, and we get an estimate of the total energy each chip on the bench needs to complete the “bmw27” test scene.
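The arithmetic behind that estimate can be sketched in a few lines of Python. The chip names and figures below are hypothetical placeholders rather than our measured results, and the calculation treats the power-meter reading as constant for the duration of the render, which is the same simplification our back-of-the-napkin method makes.

```python
# Task-energy estimate: energy (J) = power under load (W) * time to completion (s).
# Every number here is a hypothetical placeholder, not a measurement from this review.


def task_energy_kj(power_watts, seconds):
    """Estimate the energy, in kilojoules, used to finish a fixed workload."""
    return power_watts * seconds / 1000.0


# Hypothetical systems: (power draw under load in watts, render time in seconds)
samples = {
    "chip_a": (400.0, 100.0),  # draws more power but finishes sooner
    "chip_b": (250.0, 200.0),  # draws less power but takes longer
}

estimates = {name: task_energy_kj(w, s) for name, (w, s) in samples.items()}

# List the systems from least to most energy consumed for the task.
for name, kj in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{name}: {kj:.1f} kJ")
```

In this made-up case, the chip with the higher power draw still uses less total energy because it finishes the job in half the time.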
The 2990WX may have the highest instantaneous power draw of any high-end desktop chip on our test bench by a wide margin, but its startlingly quick completion of that workload lets it consume the least energy of any chip on the bench by our estimate. Not only is the 2990WX the winner on sheer performance, it’s also the most energy-efficient when we ask it to do something that scales well across all of its cores.
This scatter plot might help visualize the balance between power consumption and performance for the chips we’re testing. The best results on this chart will tend toward the lower-left corner, where energy consumption is lowest and time-to-completion is fastest. While there’s a cluster of chips that take about the same amount of time and expend the same amount of energy to finish our test workload, the Threadripper 2990WX just stands out all the more here.
AMD’s Ryzen Threadripper 2990WX often proves to be the best-performing high-end desktop CPU on the planet, but its potential isn’t fully realized yet.
When this chip rips, it really rips. We’ve never seen the kind of speed the 2990WX delivers in Blender and Corona rendering before. The 2990WX also turns in chart-topping performances in HPC workloads like the SPECwpc NAMD benchmark. Handbrake transcoding with the x265 encoder and the Indigo “Supercar” scene are about the only tests we’ve yet run where Intel’s i9-7980XE (and i9-7960X) take large leads over AMD’s uberchip.
In other cases, software doesn’t seem to know quite what to make of the 2990WX, as evidenced by our 7-zip compression results, the Indigo benchmark’s “Bedroom” test scene, and our Veracrypt AES test. AMD acknowledges that the 2990WX’s hardware might be ahead of the software in some cases, so some patience might be required until developers can tune their applications to get the most from the chip. I’m fairly confident devs will be able to smooth out those wrinkles with time, but we have to judge the Threadripper 2990WX as it stands today.
One might be tempted to think a chip with this many cores and threads can handle both work and play, and that might be true for high-resolution gaming at sub-100-Hz refresh rates, but we’d still advise caution with the 2990WX after hours. We ran our high-refresh-rate 1920×1080 gaming tests on the 2990WX just to see what would happen, and it trailed far behind the rest of the high-end desktop pack. On top of that, its sheer thread count and NUMA-ness seem capable of causing show-stopping playability problems with some titles, as our Far Cry 5 tests demonstrated.
To get around those issues, AMD lets owners shut off the 2990WX’s compute dies or even reduce the chip to just one eight-core, 16-thread die through the Ryzen Master utility, and those measures do work. Still, those are Band-Aids, not panaceas. If for some reason you primarily envision yourself running this chip in its 16-core mode, the Threadripper 2950X is a much better choice and costs much less.
We’re not holding gaming performance against the 2990WX, to be clear—high-end desktop platforms are rarely the right choice for gamers trying to push the most frames possible. Given the behavior we saw, though, developers may need to begin thinking about NUMA nodes and single-socket-server-like core counts as they refine today’s titles and prototype tomorrow’s.
If AMD works with software developers to address the issues we observed with some of our multithreaded benchmarks, we’ll immediately crown the Threadripper 2990WX as the chip to get in the vast majority of ultra-high-end desktops. For the moment, though, I’m holding off on the full-throated endorsement a TR Editor’s Choice award would imply.
That said, if your workload already scales well on the 2990WX and your time is money, I’d still recommend this chip today. Stay tuned as we finish even more testing on our stable of CPUs to get a complete picture of the newly reshaped high-end desktop landscape the Threadripper 2990WX heralds.