AMD’s Zen architecture has proven impressively scalable. From four cores and four threads to eight cores and sixteen threads and everything in between, the basic eight-core Zen die (often referred to as a Zeppelin) has made a name for itself in PCs ranging from budget builds to the high end. Most impressively, Ryzen 7 CPUs matched Intel’s Haswell and Broadwell high-end desktop parts for productivity performance.
Although the power of Ryzen 7 chips proved impressive, they lacked some of the features that the most demanding users have come to expect from Intel’s platforms. Quad-channel memory and gobs of PCIe lanes have, until recently, remained the calling cards of Intel’s X99 and X299 platforms. The Ryzen Threadripper CPU family changes all of that.
|Cores||Threads||Base clock||Boost clock||XFR||L2 cache||L3 cache||PCIe 3.0 lanes||Memory channels||Price|
|16||32||3.4 GHz||4.0 GHz||200 MHz||8MB||32MB||64||4||$999|
Thanks to its Infinity Fabric on-die and inter-die interconnect, AMD can join multiple Zeppelins on one package to scale past eight cores and two channels of memory. We’ve already seen that approach used to great effect in the Epyc family of server processors, and it’s the technique AMD is using to make the Ryzen Threadripper 1920X and Ryzen Threadripper 1950X. If you’d like to read more about the Zen architecture’s fine details, David Kanter’s run-down of Zen is a fine place to start. We’ll just be covering the broad strokes of how Zeppelins become Threadrippers.
The massive, Epyc-esque Threadripper package sews two Zeppelins together in a diagonal Infinity Fabric topology. AMD then taps two channels of DDR4 memory and all 32 lanes of PCIe 3.0 connectivity from each die to create a formidable multi-chip module. Of the 64 PCIe 3.0 lanes this stitching-together makes available, 60 will be available for motherboard makers to distribute across the various slots and ports of their products. The remaining four are reserved for the connection between CPU and chipset. The other two “dies” under Threadripper’s massive heatspreader are dummies that serve to maintain the integrity of the package. Overclockers will appreciate that AMD continues to solder the heatspreader to Threadripper CPUs, a practice that Intel recently abandoned with its Skylake-X chips.
Because each eight-core Zeppelin brings two channels of memory to the table that are connected across the Infinity Fabric, Threadripper’s memory-access characteristics are inherently non-uniform. Non-uniform memory access is a familiar concept in the server world, but regular folk haven’t had to worry about NUMA on the desktop at least since AMD’s own Quad FX platform put up a fight against the Core 2 Extreme QX6700 a decade and change ago.
Basically, when a naive application performs a memory operation on a NUMA system, it may find that some accesses are serviced more quickly than others. Some applications might not care about this difference, but other, more latency-sensitive software could experience reduced performance. Operating system support for NUMA should generally serve to cushion these bumps, but it doesn’t change the basic fact that not every application will tolerate inconsistent memory access characteristics well.
As a result, AMD offers Threadripper owners two choices for memory-access modes in its Ryzen Master utility. By default, the CPU will present itself as a uniform memory access node, which AMD calls “Distributed Mode.” Despite what its topology might suggest, this mode does not offer uniform memory access latencies. Instead, Distributed Mode accepts the latency differences inherent to the Threadripper multi-chip module as a tradeoff for delivering maximum bandwidth from all four memory channels. AMD says applications with unknown or unpredictable threading behavior will, on average, still experience a performance benefit from distributing memory accesses over all of the available channels.
In its testing, AMD also found that the performance of some applications, games especially, can benefit from running the Threadripper MCM as two separate NUMA nodes, each with what is essentially dual-channel memory. In this “Local Mode,” the operating system will have free rein to keep applications running on cores closest to the memory where their data resides. Local Mode will let the operating system work as hard as possible to keep an application’s workload and memory within one node before assigning work and data to the other node. For the most part, then, Local Mode should present an application with the lowest memory latencies possible at the cost of potentially lower bandwidth.
AMD further discovered that presenting some mainstream applications, including some DiRT and Far Cry titles, with 32 threads caused application-breaking problems. For those applications, the company is offering a third switch in its Ryzen Master utility called “Legacy Compatibility Mode.” The goal of this mode is both to reduce the number of active threads on the system and to lower memory access latencies. AMD achieves this by leaving simultaneous multi-threading on, turning off the active cores and threads on one Zeppelin, and putting the system in NUMA mode. In this mode, the Threadripper 1950X will behave like a single Ryzen 7 1800X, while the Threadripper 1920X will behave like a Ryzen 5 1600X. Even though all of the active cores and threads on one Zeppelin are disabled in this mode, the sleeping die will still leave its memory controller powered up in case the system needs to access the memory pool connected to the remote node.
If this is all too much to keep track of, AMD’s Ryzen Master utility offers two pre-built profiles that are pretty foolproof. “Creator Mode” sets Legacy Compatibility Mode to off, SMT to on, and the memory mode to Distributed. “Game Mode” turns on Legacy Compatibility Mode, and that’s it. After a given profile is applied in Ryzen Master, the system will restart with the chosen mode applied.
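To make the effect of these profiles concrete, here is a minimal Python sketch of how many hardware threads the operating system would see under each one. The `Profile` class and its field names are hypothetical (this is not a Ryzen Master API), and the per-die core counts are assumptions drawn from the comparison above: eight cores per die for the 1950X and six for the 1920X.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical model of a Ryzen Master profile (names are illustrative)."""
    name: str
    legacy_compatibility: bool  # when True, the cores on one die are disabled
    smt: bool                   # simultaneous multi-threading
    memory_mode: str            # "Distributed" (UMA) or "Local" (NUMA)


# Settings as described in the text. Legacy Compatibility Mode is described
# as putting the system in NUMA mode, modeled here as "Local".
CREATOR_MODE = Profile("Creator Mode", legacy_compatibility=False,
                       smt=True, memory_mode="Distributed")
GAME_MODE = Profile("Game Mode", legacy_compatibility=True,
                    smt=True, memory_mode="Local")


def visible_threads(cores_per_die: int, profile: Profile, dies: int = 2) -> int:
    """Return the hardware thread count the OS would see under a profile."""
    active_dies = 1 if profile.legacy_compatibility else dies
    threads_per_core = 2 if profile.smt else 1
    return cores_per_die * active_dies * threads_per_core


if __name__ == "__main__":
    # Assumed per-die core counts: 8 for the 1950X, 6 for the 1920X.
    for chip, cores_per_die in (("1950X", 8), ("1920X", 6)):
        for profile in (CREATOR_MODE, GAME_MODE):
            print(chip, profile.name, visible_threads(cores_per_die, profile))
```

Under these assumptions, Game Mode leaves the 1950X with the same thread count as a Ryzen 7 1800X, which matches AMD’s description of the mode.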
The inclusion of a Game Mode in Ryzen Master’s settings might lead some to expect that they’ll need to flip between these settings as they switch between work and play. That’s probably not the case. AMD says that the average performance improvement it observed across all 60 of the games and all three of the resolutions it tested with Game Mode (read: Legacy Compatibility Mode) was 4%. Some games benefited more, while others actually experienced performance reductions. The company says the following applications are especially helped by Game Mode:
- Civilization VI
- Call of Duty: Modern Warfare Remastered
- Heroes of the Storm
- Gears of War Ultimate Edition
- DOTA 2
- Watch Dogs
- Hitman: Absolution
- Fallout 4
AMD says it made Game Mode as an option for Threadripper customers who simply couldn’t tolerate getting anything but the best performance possible from their systems. Honestly, I don’t think most people are going to want to deal with profile fiddling and restarts in exchange for a few more FPS here and there. By all means check whether any games you play are on the list above, but we felt the most appropriate way to test Threadripper was in the mode AMD ships it in by default.
Threadripper in the flesh
Any discussion of Threadripper cannot overlook the fact that these are enormous chips, quite unlike anything that’s graced a desktop system in recent memory. Each Threadripper CPU ships in a plastic protective shell that’s part of an elaborate and (in theory) durable box.
While my business-major instincts are hitting on words like “economies of scale with Epyc” when thinking about the sheer size of the Threadripper multi-chip module itself, it’s also ingenious marketing. Most people have an inescapable lizard-brain instinct that bigger things are better, and even compared to Intel’s LGA 2066 CPUs, the massive Threadripper chips evoke an undeniable reptilian satisfaction when you handle them.
Perhaps because of the risks of dropping a ZIF chip this large onto a socket the wrong way, Threadripper CPUs rest in a semi-permanent plastic mounting frame. The assembly pops, locks, and drops into an enormous new socket called TR4. Unlike the pin grid array of Socket AM4, TR4 is a 4094-pin land grid array that’s like a much larger version of Intel’s modern sockets. TR4 is outwardly similar (if not identical) to AMD’s Socket SP3, which plays host to Epyc server chips. Despite that similarity, the two chip families can’t be switched between sockets. You’ll still see artifacts of this shared lineage in the “SP3 SAM” stamp on the TR4 retention bracket and pin protector, though.
Installing a CPU in Socket TR4 will be a new experience for anybody not accustomed to data-center hardware. AMD includes a torque wrench (that’s also a Torx driver) in its Threadripper packaging. Use of this wrench should be considered mandatory for installing and uninstalling Threadripper CPUs. Should you lose yours, the torque spec is 1.5 newton-meters.
The first step is to loosen each Torx screw using the order printed on the retention bracket itself (3-2-1 to loosen, 1-2-3 to tighten). Once the bracket is open, it reveals a spring-loaded receiver frame that can be released by pulling gently on the two blue tabs toward the top of the socket.
Once the frame is vertical, builders will need to gently slide out a clear plastic external cap before sliding the Threadripper assembly into the guide rails on the frame. The carrier will slide into the receiver until it gently clicks into place at the bottom of its travel.
Once you feel the click, make sure to take the gray plastic pin protector off the socket before swinging the receiver frame and CPU down onto the socket. The receiver frame will also click once it’s locked back into place. Finally, lower the retainer bracket back onto the CPU and work around the three Torx screws with the torque wrench. I found that doing a half-turn on each screw in sequence was the best approach. The torque wrench will click once you’ve tightened each screw adequately.
Overall, this sequence sounds more intimidating than it is. Just watch an instructional video a couple of times before proceeding, and whatever you do, don’t plop the CPU directly onto the pins or remove it from the plastic carrier frame. I’d also keep the external cap and pin protectors stored away in your motherboard box, since you won’t ever want to leave a socket this large exposed for any length of time. Repositioning bent pins on this baby will likely not be possible without damaging others in the socket.
Applying thermal paste to a heat spreader this large also requires more planning than the usual “grain-of-rice-and-squash-it” method. The fine folks at GamersNexus have detailed several different methods of paste application and their effects on performance. The short version is that AMD recommends applying five dots of compound on the heat spreader, but I found it easiest to apply a generous blob of compound (about three times as much as one might normally apply for LGA 115x CPUs) smack in the middle of the “Z” of the Ryzen logo on the heat spreader. Regardless of the method you use, you will need much more thermal compound than with smaller CPUs to achieve full pasting of your liquid cooler’s cold plate.
Once the CPU is installed and pasted, the next step is getting the right cooler on top. Although air coolers for Threadripper are in the pipe, we imagine most will want to use liquid coolers for the most socket clearance and best performance. AMD includes a Socket TR4 mounting bracket in the box for Asetek-based liquid coolers from a wide range of manufacturers, including Corsair, NZXT, Thermaltake, and Arctic Cooling. This bracket installs easily on unadorned Asetek coolers like the Thermaltake Water 3.0 Ultimate AMD sent us for testing. Other, fancier coolers, like the Corsair H115i that I often use in testing, have preinstalled brackets that will pop off the pump head with a firm counterclockwise twist.
The Threadripper cooler bracket has asymmetric lugs that are narrower at the top of the socket, so builders will want to make sure the top of the pump head and the top of the bracket are in agreement. Once you have the bracket on your liquid cooler of choice, follow the tightening order near each screw on the bracket until they’re all snug, and you’re set.
Touring the X399 platform with Gigabyte’s X399 Aorus Gaming 7 motherboard
Ryzen Threadripper CPUs may be impressive in their own right, but a CPU is nothing without a motherboard to go with it. AMD’s vehicle for its high-end platform is the X399 chipset, which bristles with USB and PCIe lanes of its own to go with the 60 available from every Ryzen Threadripper SoC.
A great deal of connectivity comes from the Threadripper package beyond PCIe lanes. Eight USB 3.0 ports are tied to the CPU itself. The X399 chipset provides eight lanes of PCIe 2.0, eight SATA ports, two USB 3.1 Gen2 ports, another USB 3.0 port for rear-panel connectors, four internal USB 3.0 headers, and six USB 2.0 headers.
We performed our Threadripper testing using the Gigabyte X399 Aorus Gaming 7. This beastly board taps most every resource X399 has to offer, and its black-and-gray color scheme serves as a neutral canvas for today’s RGB LED-bedecked builds. More conservative builders can turn the onboard LEDs off, but honestly, on high-end systems like this, the extensive illumination the Gaming 7 offers will help communicate that you don’t have just any old motherboard in your mid-tower.
The enormous TR4 socket gets flanked with eight DIMM slots on the Gaming 7. These slots are all RGB LED-illuminated, and they use my favored one-clip design for easy insertion and removal of DIMMs. The board offers memory multipliers for DDR4 DIMMs ranging past 3600 MT/s, but I expect most will be happier to hear that ECC memory is supported by this board.
Its back panel offers a whopping eight USB 3.0 ports powered by the Ryzen SoC itself, plus USB 3.1 Gen2 Type-A and Type-C ports powered by the X399 chipset. Audio output comes courtesy of a Realtek ALC1220 codec, and an Intel wireless adapter offers 802.11ac and Bluetooth connectivity right from the board. Killer’s E2500 Gigabit Ethernet adapter handles wired networking duties.
The Gaming 7 distributes a Threadripper CPU’s 60 PCIe 3.0 lanes across four PCIe slots and three M.2 connectors. The first and fourth slots from the left in the picture above offer a full 16 lanes to the CPU, while the second and fifth tap eight of those lanes. The third slot provides four lanes of PCIe 2.0 from the X399 chip.
Two M.2 22110 slots with four lanes of PCIe 3.0 hooked up nestle between these physical x16 slots. Both connectors are shrouded with heatsinks backed by pre-applied thermal tape. I was initially concerned that one would only be able to install rare M.2 22110 devices in these slots, but Gigabyte helpfully includes a bag of M.2 standoffs in the box that can be added to the board for use with shorter drives. Once an M.2 2280 drive is secured to a standoff, one can simply peel off the protective plastic on the heatsink’s thermal pad and screw the heatsink back into the M.2 22110 standoff. Handy.
A third M.2 2280 slot with its own dedicated heatsink sits beneath the chipset heatsink. One doesn’t need a separate standoff for use with this slot unless the plan is to install a shorter drive than the typical 80-mm gumstick. I’d use this slot as the default location for an M.2 2280 drive if I were building with the Gaming 7, since it’s located a ways from any hardware that might cause heat-soaking issues with the heatsink above.
The nice thing about all of these PCIe and M.2 slots is that not a one shares its lanes with any other device on the motherboard. What you see is what you get, and that should be the case with every Ryzen Threadripper CPU. Even better, only data from the eight SATA connectors and some assorted peripherals should have to traverse the four PCIe 3.0 lanes from the chipset to the CPU. All of the M.2 devices and PCIe slots could, in theory, operate at full bandwidth without risk of a bottleneck.
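The lane budget described above is easy to check with a little arithmetic. The following Python sketch tallies it up; the dictionary keys are just descriptive labels, and the x4 width of the third M.2 slot is an assumption (the text only states four lanes for the two M.2 22110 slots, though a 60-lane total implies it).

```python
LANES_PER_DIE = 32       # PCIe 3.0 lanes exposed by each Zeppelin die
DIES = 2                 # active dies on a Threadripper package
CHIPSET_LINK_LANES = 4   # reserved for the CPU-to-X399 connection

# Hypothetical labels; lane widths follow the board description above.
# The x4 width of the M.2 2280 slot is an assumption.
GAMING_7_CPU_LANES = {
    "PCIe slot 1": 16,
    "PCIe slot 2": 8,
    "PCIe slot 4": 16,
    "PCIe slot 5": 8,
    "M.2 22110 (first)": 4,
    "M.2 22110 (second)": 4,
    "M.2 2280": 4,
}


def total_lanes() -> int:
    """All PCIe 3.0 lanes provided by the package."""
    return LANES_PER_DIE * DIES


def usable_lanes() -> int:
    """Lanes left for slots and ports after the chipset link is reserved."""
    return total_lanes() - CHIPSET_LINK_LANES


def unallocated_lanes(allocation: dict) -> int:
    """Lanes still free after an allocation; raises if it is oversubscribed."""
    used = sum(allocation.values())
    if used > usable_lanes():
        raise ValueError("allocation exceeds the usable lane budget")
    return usable_lanes() - used


if __name__ == "__main__":
    print("total:", total_lanes())
    print("usable:", usable_lanes())
    print("unallocated:", unallocated_lanes(GAMING_7_CPU_LANES))
```

With these figures every usable CPU lane is spoken for and none is shared, which is consistent with the what-you-see-is-what-you-get layout described above.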
I cannot overstate how big of a relief this arrangement is compared to the complicated lane-sharing that can arise on today’s Intel motherboards. Intel’s X299 chipset can be tapped for up to 24 PCIe 3.0 lanes on top of the 28 or 44 lanes direct from an LGA 2066 CPU, but those lanes will all have to traverse the DMI 3.0 connection from chipset to CPU, and it’s possible that adding M.2 devices to a board will disable random SATA connectors or cause other minor headaches.
The beefy heatsink on Aorus’ beefy power-delivery circuitry
Other nice features on the Gaming 7 include eight four-pin fan headers with automatic three-pin or four-pin fan detection, nine separate temperature sensors, gold-plated ATX and EPS power connectors, and a front-panel USB 3.1 Gen2 header.
All told, the X399 Aorus Gaming 7 has practically everything one could ask for in order to take advantage of a Ryzen Threadripper CPU’s impressive resources. Aside from one minor early teething issue that the company explained how to work around from the get-go, my experience with the Gaming 7 was flawless. At $389.99, this is not a cheap board, but it lands about in the middle of the range for X399 mobos right now. I’d heartily recommend it to anybody looking for a reasonably-priced foundation for their Threadripper CPU.
Now that we’ve seen the X399 platform in its totality, let’s get to our performance testing.
Our testing methods
As always, we did our best to deliver clean benchmarking numbers. We ran each benchmark at least three times and took the median of those results. Our test systems were configured as follows:
|Processor||AMD Ryzen Threadripper 1950X||AMD Ryzen Threadripper 1920X|
|Motherboard||Gigabyte X399 Aorus Gaming 7|
|Memory type||G.Skill Trident Z DDR4-3600 (rated) SDRAM|
|Memory speed||3200 MT/s (actual)|
|Memory timings||15-15-15-35 1T|
|System drive||Intel 750 Series 400GB NVMe SSD|
|Processor||Intel Core i9-7900X||Intel Core i7-7820X|
|Motherboard||Asus Prime X299-Deluxe|
|Memory type||G.Skill Trident Z DDR4-3600 (rated) SDRAM|
|Memory speed||3200 MT/s (actual), 3600 MT/s (actual)|
|Memory timings||15-15-15-35 2T (DDR4-3200), 16-16-16-36 2T (DDR4-3600)|
|System drive||Samsung 850 Pro 512GB|
|Processor||Intel Core i7-6950X||Intel Core i7-7740X|
|Motherboard||Gigabyte GA-X99-Designare EX||Gigabyte X299 Aorus Gaming 3|
|Chipset||Intel X99||Intel X299|
|Memory type||G.Skill Trident Z DDR4-3200 (rated) SDRAM||G.Skill Trident Z DDR4-3866 (rated) SDRAM|
|Memory speed||3200 MT/s (actual)||3200 MT/s (actual)|
|Memory timings||16-18-18-38 2T||15-15-15-35 2T|
|System drive||Samsung 960 EVO 500GB|
They all shared the following common elements:
|Storage||2x Corsair Neutron XT 480GB SSD, 1x HyperX 480GB SSD|
|Discrete graphics||Nvidia GeForce GTX 1080 Ti Founders Edition|
|Graphics driver version||GeForce 384.94|
|OS||Windows 10 Pro with Creators Update|
|Power supply||Seasonic Prime Platinum 1000W|
Our thanks to AMD, Intel, Gigabyte, Corsair, and G.Skill for helping us to outfit our test rigs with some of the finest hardware available. Some additional notes on our testing methods:
- Unless otherwise noted, we ran our gaming tests at 2560×1440 at a refresh rate of 144 Hz. V-sync was disabled in the driver control panel.
- For our Intel test system, we used the Balanced power plan, as we have for many years. Our AMD test bed was configured to use the Ryzen Balanced power plan that ships with AMD’s chipset drivers.
Our testing methods are generally publicly available and reproducible. If you have questions, feel free to post a comment on this article or join us in the forums.
Memory subsystem performance
To get a sense of how Threadripper’s quad-channel memory architecture affects performance in the move from the AM4 platform to X399, we rely on AIDA64’s built-in memory benchmark suite.
Compared to the Ryzen 7 1800X and its two channels of memory, the Threadrippers both nearly double the 1800X’s performance in writes and copies, but fall a bit short of that increase in reads. It’s interesting to observe that bandwidth generally doesn’t scale with core count for Threadripper, as it does to some degree for Skylake-X. Still, applications that found themselves memory-bandwidth-constrained on the AM4 platform get plenty more throughput to play with on X399.
We also tested memory latency using AIDA64’s built-in benchmark. It should be noted that the above results are a worst-case scenario for latency, thanks to our choice to run the chip in its default Distributed Mode, or as a UMA node. AMD officially says memory access latency will be around 78 ns when a given application reaches memory attached to the near die and around 133 ns when it reaches the far die, for an average latency of about 105.5 ns. Our use of DDR4-3200 with fairly typical 15-15-15-35 1T timings cuts a few nanoseconds off that figure, but on average, it appears applications should expect considerably higher memory latency from Threadrippers in their default Distributed Mode versus Skylake-X and its mesh architecture.
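AMD’s 105.5-ns average is simply the midpoint of its near-die and far-die figures, which assumes accesses are split evenly between the two dies. A small Python sketch makes that assumption explicit; the `near_fraction` parameter is our own illustrative addition, not something AMD specifies.

```python
def average_latency_ns(near_ns: float, far_ns: float,
                       near_fraction: float = 0.5) -> float:
    """Weighted mean latency.

    near_fraction is the share of accesses served by the near die
    (an illustrative parameter; 0.5 models an even split).
    """
    if not 0.0 <= near_fraction <= 1.0:
        raise ValueError("near_fraction must be between 0 and 1")
    return near_fraction * near_ns + (1.0 - near_fraction) * far_ns


if __name__ == "__main__":
    # AMD's quoted figures: about 78 ns near, about 133 ns far.
    print(average_latency_ns(78.0, 133.0))        # even split
    print(average_latency_ns(78.0, 133.0, 1.0))   # every access stays local
```

Pushing `near_fraction` toward 1.0 is a rough way to picture what Local Mode is trying to achieve: the closer an application’s accesses stay to its own die, the closer its average latency gets to the near-die figure.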
Some quick synthetic math tests
AIDA64 offers a useful set of built-in directed benchmarks for assessing the performance of the various subsystems of a CPU. The PhotoWorxx benchmark uses AVX2 on compatible CPUs, while the FPU Julia and Mandel tests use AVX2 with FMA.
Normally, we would let these results pass without comment, but AIDA64’s CPU Hash test gets a curious (and massive) speedup on Ryzen CPUs. That’s because the Zen architecture has what seems to be little-publicized support for Intel’s SHA Extensions. These extensions permit hardware acceleration of some of the SHA family of algorithms, and CPU Hash uses SHA-1 as its algorithm of choice. SHA-1 isn’t particularly useful in practice any longer, but SHA-256 is, and the folks at SiSoft report similar speedups for that algorithm. AVX implementations of other SHA versions might help Intel processors close the gap, though.
The Threadripper 1950X’s 16 cores seem to allow it to go toe-to-toe with the wider-but-less-numerous AVX FMA units in the i9-7900X in the Julia and Mandel tests. The 1920X is more or less on par with the i7-6950X and i7-7820X here, as well. If there’s one spot where throwing more cores at the problem seems to have helped Threadrippers, this is it.
Now that we’ve seen how these chips stack up on a synthetic playing field, it’s time to let them out of the corral and see how they chew through real-world work.
Doom’s Vulkan renderer might not put the most stress on a single core, but when a game runs this fast, interesting things can still happen. We tested Doom with all of its settings maxed at 2560×1440 using the GeForce GTX 1080 Ti Founders Edition graphics card.
Vulkan is usually the great equalizer for Doom performance, so there’s not a ton of difference between the slowest and fastest CPUs in this test. Having more cores on tap is still good for a small boost in average frame rates for our many-threaded chips, though. Surprisingly, the Core i7-7820X delivers the worst 99th-percentile frame times in this test by a wide margin. In absolute terms, a 14-ms 99th-percentile frame time isn’t the worst thing in the world, but it’s a bit lackluster in this company.
Our “time-spent-beyond-X” graphs can be a bit tricky to interpret, so bear with us for just a moment before you go rocketing off to our productivity results or the conclusion. We set a number of crucial thresholds (or bins) in our data-processing tools—50 ms, 33.3 ms, 16.7 ms, 8.3 ms, and 6.94 ms—and determine how long the graphics card spent on frames that took longer than those times to render. Any time over the limit ends up aggregated in the graphs above. Those thresholds correspond to instantaneous frame rates of 20 FPS, 30 FPS, 60 FPS, 120 FPS, and 144 FPS, and “time spent beyond X” means time spent beneath those respective frame rates. We usually talk about these results as a proportion of the one-minute test runs we use to collect our data. If that’s still too much to bear, just understand that more time spent in these graphs means worse performance.
If even a handful of milliseconds make it into our 50-ms bucket, we know that the system is struggling to run a game smoothly, and it’s likely that the end user will notice severe roughness in their gameplay experience. Too much time spent on frames that take more than 33.3 ms to render means that a system running with traditional v-sync on will start running into equally ugly hitches and stutters. Ideally, we want to see a system spend as little time as possible past 16.7 ms rendering frames, and too much time spent past 8.3 ms or 6.94 ms is starting to become an important consideration for gamers with high-refresh-rate monitors and powerful graphics cards.
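For readers who want to reproduce these metrics on their own frame-time captures, here is a minimal Python sketch. It assumes frame times are available as a list of milliseconds, counts only the portion of each frame that exceeds the threshold (our reading of “any time over the limit”), and uses the nearest-rank method for the 99th percentile. Our actual data-processing tools may differ in their details.

```python
import math

# Thresholds from the text, in milliseconds.
THRESHOLDS_MS = (50.0, 33.3, 16.7, 8.3, 6.94)


def threshold_to_fps(threshold_ms: float) -> float:
    """Instantaneous frame rate that corresponds to a frame time."""
    return 1000.0 / threshold_ms


def time_spent_beyond(frame_times_ms, threshold_ms: float) -> float:
    """Sum the part of each frame time that exceeds the threshold (ms)."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


def percentile_99(frame_times_ms) -> float:
    """99th-percentile frame time using the nearest-rank method."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    ordered = sorted(frame_times_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [5.0, 7.0, 9.0, 20.0, 40.0]
    for threshold in THRESHOLDS_MS:
        fps = round(threshold_to_fps(threshold))
        beyond = time_spent_beyond(sample, threshold)
        print(f"beyond {threshold} ms (~{fps} FPS): {beyond:.2f} ms")
    print("99th-percentile frame time:", percentile_99(sample), "ms")
```

Dividing a time-spent-beyond-X total by the length of the test run gives the proportion of the run spent below the corresponding frame rate.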
While all of our chips spend a little time holding up the graphics card beyond 16.7 ms, those holdups should be practically invisible. For the fast-running Doom, the more interesting results can be found past 8.3 ms and 6.94 ms. The Threadripper 1950X and the Ryzen 7 1800X lead the pack here, while the i9-7900X, the i7-6950X, and the Threadripper 1920X trade blows. The i7-7820X spends over a second under 120 FPS overall, thanks to a strange (and repeatable) pattern of stuttering that’s absent from the Core i9-7900X. These stutters mean a gamer could have a potentially less smooth experience on the i7-7820X than any other chip here.
The same story continues past 6.94 ms, although our contenders are more evenly matched here. The Threadrippers and the Core i9-7900X all spend about four seconds of our one-minute test run on tough frames that take more than 6.94 ms to render, but the i7-7820X’s curious performance means it spends another second yet on that hard work—and it’s therefore the least smooth chip here. Just goes to show that even in this largely GPU-bound test, the CPU can still matter.
Watch Dogs 2
With the right settings, Watch Dogs 2 can be a CPU benchmark. It can also be a graphics benchmark. It’s the combination CPU and graphics benchmark. For this review, we enabled the game’s temporal filtering support, dialed up the graphics quality, and set extra details to 50% to produce the most punishing CPU test we could muster.
Even with these CPU-melting settings, Watch Dogs 2 remains mostly GPU-bound at 2560×1440. In both average FPS and our 99th-percentile metric of smoothness, the Threadrippers and the Core i9s are quite closely matched. A couple more FPS on average isn’t worth worrying about that much. Strangely, the i7-7820X bookends our results. This is one title where faster memory seems to help the chip, for whatever reason.
Our time-spent-beyond-X metrics reveal a wide gulf between the 16.7-ms and 8.3-ms buckets. The best thing that can be said for our test CPUs is that none of the Core i9s or Threadrippers spend an appreciable amount of time holding up the graphics card at the critical 16.7-ms mark. This should come as a relief for AMD, since Watch Dogs 2 has historically underperformed on Ryzen chips at lower resolutions. Any of these uber-expensive CPUs are a fine companion for the GTX 1080 Ti at 2560×1440.
Deus Ex: Mankind Divided (DX11)
Deus Ex: Mankind Divided’s geometrically rich environments pose a challenge for CPUs at 1920×1080, so we were curious what would happen with one of the world’s most powerful graphics cards at 2560×1440 with our usual test settings.
As we saw with Watch Dogs 2, cranking the resolution of Deus Ex: Mankind Divided produces largely GPU-bound results. The Core i7-7820X and i9-7900X demonstrate ever-so-slightly higher performance potential, but the differences just aren’t that large. Both Threadrippers turn in better 99th-percentile frame times than the i7-7820X, though.
The most meaningful results in our time-spent-beyond-X graphs for DXMD arise at the 8.3-ms mark. Perhaps because it packs fewer cores into the same TDP, the Threadripper 1920X spends about a second less under 120 FPS than its bigger brother. The Core i9-7900X spends about a second less still under 120 FPS, and the i7-7820X with DDR4-3200 continues its frustratingly inconsistent performance by coming in just behind the i7-6950X. All of these chips will provide a satisfying gaming experience with the GTX 1080 Ti at 2560×1440 in this title, but the Intel chips are just a bit better.
Grand Theft Auto V
Grand Theft Auto V can still put the hurt on CPUs as well as graphics cards, so we ran through our usual test run with the game’s settings turned all the way up at 2560×1440. Unlike most of the games we’ve tested so far, GTA V favors a single thread or two heavily, and there’s no way around it with Vulkan or DirectX 12. In that way, it’s a perfect test of whether a CPU can keep the graphics card fed.
GTA V doesn’t play well with Ryzens in general, and that’s still evident at 2560×1440. Despite their lower average frame rates versus the Intel competition, the Threadrippers both deliver a fine 99th-percentile frame time, so it’s not all bad. The weird stutter problem we observed in Doom returns in GTA V for the i7-7820X, though, causing its 99th-percentile frame time to trail the rest of the pack significantly. These stutters are hard to see in-game, but they are present.
Although the i7-7820X’s stuttery performance is evident at the 16.7-ms mark, the bulk of its time is aggregated beyond 8.3 ms. Even so, the i7-7820X spends about three-and-a-half seconds less under 120 FPS compared to the Threadripper 1920X, and the Threadripper 1950X trails further behind yet. The fine performance of the Ryzen 7 1800X suggests these results might be mitigated by flipping the Threadrippers into Game Mode, but that’s an extra bit of fiddling that’s simply not necessary with the single-chip solutions here.
Even among the Intel chips, the i9-7900X performs slightly worse than the i7-6950X, core-for-core. I doubt it’s a noticeable difference, but it just goes to show that progress is not always forward.
Hitman (DirectX 12)
Hitman’s DX12 mode can take advantage of every core and thread we can throw at it. We used the same max settings that we usually do for graphics-card reviews, but keeping a GTX 1080 Ti fed is a different task than handling a GTX 1080 at 1920×1080. Let’s see how these CPUs manage with it.
Here’s an example of why it’s essential to consider both average FPS and 99th-percentile frame times together. The Ryzen CPUs trail the Intel chips in the measure of overall performance potential that average FPS affords, but the Threadrippers and the Ryzen 7 1800X turn in better 99th-percentile frame times than the Skylake-X chips do.
Our time-spent-beyond-16.7-ms graph would seem to favor the Threadrippers, but none of these chips would be distinguishable at this threshold in the real world. The 8.3-ms mark is more meaningful here once again, and the Core i9 CPUs spend somewhat less time holding up the graphics card at this threshold. That result should translate to a somewhat smoother and more fluid experience with the Intel CPUs overall.
It may be getting up there in years, but Crysis 3’s lush “Welcome to the Jungle” level is still one of the most punishing tests of CPU gaming performance we’re aware of. We tested the game at 2560×1440 with its “Very High” preset.
With these settings, Crysis 3 seems to take advantage of every thread it can get, but the net result of more than 20 threads seems to be a wash at 2560×1440. The only outlier, once again, is the i7-7820X, and only with DDR4-3600 memory.
Even with that weird 99th-percentile frame time, neither the i7-7820X nor any other chip here spends an appreciable amount of time holding up the graphics card past the 16.7-ms mark. The 8.3-ms threshold shows that all of these parts are about equally matched here: the best and worst chips in this test are separated by about a second (except for the i7-7740X, which we should expect to lag in this heavily-threaded test).
Crysis 3 closes out our gaming results, and the numbers are largely positive for Threadrippers. In fact, I’m most worried about an Intel chip after seeing these numbers. The i7-7820X exhibits bizarre stuttering that the i9-7900X simply doesn’t, for the most part. We have a hunch why this might be, and it could involve the speed of the chip’s on-die fabric versus that of its main memory. We’ll need to examine that hunch in a separate review and with more CPU-directed game testing, though.
With only a couple of exceptions, AMD’s many-core chips deliver a similar gaming experience to that of the Core i9-7900X, even without Game Mode enabled. We’d be equally happy pairing a GTX 1080 Ti and a high-refresh 2560×1440 monitor with the X399 platform as we would be with X299. We figure if you’re a gamer with the scratch for a $2000-or-more PC before a graphics card enters the equation, a $500-or-more monitor and a $700 graphics card probably aren’t much of a stretch. The flip side is that if you’re thinking about 1920×1080 gaming, these server-class chips with complex on-die interconnects will probably perform worse at that resolution than cheaper, simpler chips with fewer cores and higher clocks. Spending $2500 or more on one of these machines for 1920×1080 gaming alone isn’t just unnecessary, it’s daft.
Now that we know Threadrippers have game, let’s give them plenty of threads to rip in our productivity tests.
No surprises here: a Zen core is a Zen core, and even the 4.2 GHz XFR boost on the Threadripper CPUs doesn’t move the needle in these benchmarks compared to the Ryzen 7 1800X. Still, the distance between the Threadrippers and Skylake-X chips in this test isn’t that large: about 11% to 12% at most. Since the Core i7-7740X is basically a Core i7-7700K, it exhibits that chip’s world-beating single-threaded performance here.
Compiling code with GCC
Our resident code monkey, Bruno Ferreira, helped us put together this code-compiling test. Qtbench records the time needed to compile the Qt SDK using the GCC compiler. The number of jobs dispatched by the Qtbench script is configurable, and we set the number of threads to match the hardware thread count for each CPU.
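As a rough illustration of what matching the job count to the hardware thread count means, here is a minimal sketch. It is not the Qtbench script itself, whose internals we aren't reproducing here. The `build_command` helper is hypothetical, and we assume a build driven by GNU make, whose `-j` option sets the number of parallel jobs.

```python
import os

def build_command(jobs):
    """Return a make invocation that dispatches `jobs` parallel jobs.

    Hypothetical helper: stands in for whatever the benchmark script does.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", f"-j{jobs}"]

# os.cpu_count() reports logical CPUs (hardware threads), or None if unknown.
hardware_threads = os.cpu_count() or 1
print(build_command(hardware_threads))
```

On a 16-core, 32-thread Threadripper 1950X with SMT enabled, this sketch would request 32 jobs.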
The Threadripper chips come out of the multithreaded performance gate with a win over the Core i9-7900X. Once you start throwing 20 threads or more at this problem, though, bottlenecks seem to appear elsewhere. Regardless, Threadrippers are the fastest things going overall.
File compression with 7-zip
7-zip’s compression test is another example where the battle between more cores and faster cores generally comes out in the wash. Decompress some files, though, and the Threadrippers scorch the i9-7900X, never mind the rest of the contenders here.
Disk encryption with Veracrypt
Encryption is another task that benefits from many cores, and in the accelerated AES portion of our Veracrypt testing, the Threadrippers enjoy a healthy win (although the difference between the 1920X and 1950X isn’t large). Use an algorithm that can’t be accelerated, though, and the 1920X and 1950X pull even farther ahead of the pack.
The Cinebench benchmark is powered by Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. The test runs with a single thread and then with as many threads as possible.
Intel’s chips take a clear win in the single-threaded portion of Cinebench, but that’s not why we’re here. Multi-threaded performance is where it’s at for CPU rendering, and the Threadrippers easily take the top spots in this test.
Blender is a widely-used, open-source 3D modeling and rendering application. The app can take advantage of AVX2 instructions on compatible CPUs. We chose the “bmw27” test file from Blender’s selection of benchmark scenes to put our CPUs through their paces.
Blender is another workload that heavily favors Threadrippers. The Threadripper 1950X shaves an amazing 22% off the i9-7900X’s time to completion, while the 1920X matches the Intel chip. Just goes to show that if your app can take advantage of as many cores as possible, Threadrippers could allow you to do the same task in less time or more work for a given amount of time. We also call that “more bang for the buck.”
Handbrake is a popular video-transcoding app that recently hit version 1.0. To see how it performs on these chips, we’re switching things up from some of our past reviews. Here, we converted a roughly two-minute 4K source file from an iPhone 6S into a 1920×1080, 30 FPS MKV using the HEVC algorithm implemented in the x265 open-source encoder. We otherwise left the preset at its default settings.
Although the x265 encoder should take advantage of AVX2 instructions, that alone isn’t enough to let the Core i7-7820X and the Core i9-7900X outpace the Threadripper competition. The 1950X comes very close to matching the i9-7900X, while the 1920X is in a dead heat with the i7-7820X. Should the x265 developers add AVX-512 support to the encoder, this performance picture could change, but the Threadrippers are as good as the best chips out there with today’s software.
CFD with STARS Euler3D
Euler3D tackles the difficult problem of simulating fluid dynamics. It tends to be very memory-bandwidth intensive. You can read more about it right here. We configured Euler3D to use every thread available from each of our CPUs.
The extra bandwidth and cores afforded by Threadrippers help elevate their Euler3D performance beyond that of the Ryzen 7 1800X, but even the i7-7820X can open a significant lead over the Threadripper 1950X. The i9-7900X and i7-6950X are even faster still.
It should be noted that the publicly-available Euler3D benchmark is compiled using Intel’s Fortran tools, a decision that its originators discuss in depth on the project page. Code produced this way may not perform at its best on Ryzen CPUs as a result, but this binary is apparently representative of the software that would be available in the field. A more neutral compiler might make for a better benchmark, but it may also not be representative of real-world results with real-world software, and we are generally concerned with real-world performance. Within those constraints, Skylake-X chips still seem to be the superior platform for this kind of work.
Digital audio workstation performance
One of the neatest additions to our test suite of late is the duo of DAWBench project files: DSP 2017 and VI 2017. The DSP benchmark tests the raw number of VST plugins a system can handle, while the complex VI project simulates a virtual instrument and sampling workload.
We used the latest version of the Reaper DAW for Windows as the platform for our tests. To simulate a demanding workload, we tested each CPU with a 24-bit depth and 96-kHz sampling rate, and at two ASIO buffer depths: a punishing 64 and a slightly-less-punishing 128. We then added VSTs or notes of polyphony to each session until we started hearing popping or other audio artifacts. We used Focusrite’s Scarlett 2i2 audio interface and the latest version of the company’s own ASIO driver for monitoring purposes.
A very special thanks is in order here for Native Instruments, who kindly provided us with the Kontakt licenses necessary to run the DAWBench VI project file. We greatly appreciate NI’s support—this benchmark would not have been possible without the help of the fine folks there. Be sure to check out their many fine digital audio products.
In the DSP test, the Core i9-7900X and Threadripper 1950X are neck-and-neck at our most demanding settings, and the i7-7820X and 1920X are also locking horns. Relax the buffer depth, and the Threadrippers take the lead. One quirk of this test was that the Threadrippers were competitive with SMT on, but they performed best with SMT off. Some reading around suggests that SMT or Hyper-Threading can often be a source of reduced performance in audio applications, so the SMT-off numbers are the results we’re presenting for the DSP benchmark. We note this as something to be aware of if you’re considering a Threadripper for this kind of work. The Intel chips didn’t seem to mind either way.
The VI test isn’t as rosy for the AMD corner. While it’s hard to say for sure, we suspect this test tends to favor low memory latency, high cache bandwidth, single-threaded performance, and raw clock speed at our test settings. Skylake-X chips have higher performance in all of those areas, and it seems to show in our results. The dominating performance of the Core i9-7900X is no fluke in these tests, either. We retested it several times to be sure, but the gap remained.
Readers will rightly perk up at a gap like this one, and when we see such a gap between two otherwise closely-matched parts, we do our best to get to the bottom of it. We tried overclocking the Threadrippers, to no effect. We flipped SMT off, to no effect. At the suggestion of one reader, we even played with the number of active cores in Reaper, also to no effect.
More testing may be required to pinpoint the source of this bottleneck, but we aren’t alone in seeing it. Scan Pro Audio observed similar DAW performance with Threadripper, and the folks there have published an exhaustive analysis of why the chips might perform the way they do. The crux of the matter seems to be that if you can tolerate higher buffer depths (greater than 256), lower sampling rates, and ultimately higher latency, Threadripper may be competitive with Skylake-X chips for this type of workload. Using our demanding settings, however, the monolithic design of the Intel chips makes them superior—and in the case of the i9-7900X, far superior—for the most demanding virtual instrument fanatics.
Power consumption and efficiency
We now know that Ryzen Threadripper CPUs offer a ton of performance, but that’s not much good if their energy consumption causes their owners to run up a ton of kilowatt-hours on their power bills. We can get a rough idea of how efficient these chips are by monitoring system power draw in Blender. Our observations have shown that Blender consumes about the same amount of wattage at every stage of the bmw27 benchmark, so it’s an ideal guinea pig for this kind of calculation.
Since a joule is simply one watt sustained for one second, we can multiply each system’s power draw by its render time to estimate the total task energy in kilojoules expended over the course of each CPU’s Blender run. So that this data can be more readily understood, we’ve plotted it using one of our famous scatter charts. The most efficient chips complete the task in as little time as possible while expending the least energy, so the winner will be toward the origin of the chart.
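The arithmetic behind that estimate is simple enough to sketch. The function name and the sample figures below are hypothetical and are not measurements from our test bench. The sketch also assumes power draw stays roughly constant for the whole run, as we observed with Blender.

```python
def task_energy_kj(avg_system_watts, seconds):
    """Estimate task energy in kilojoules from average power and run time.

    One watt sustained for one second is one joule, so W * s = J.
    """
    return avg_system_watts * seconds / 1000.0

# Hypothetical systems: (average watts under load, seconds to finish the render)
fast_but_hungry = task_energy_kj(250, 200)   # 50.0 kJ
slow_but_frugal = task_energy_kj(150, 400)   # 60.0 kJ

print(fast_but_hungry, slow_but_frugal)
```

In this made-up case, the system that draws more power at the wall still uses less total energy because it finishes the job in half the time.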
Slicing up the data this way, the Threadripper 1950X finishes our Blender workload far faster than any other chip here. Despite its relatively high system power draw, the chip gets done fast, so its total task energy makes it the most efficient among our contenders. The Threadripper 1920X and Core i9-7900X both need a little longer to complete the job, but they’re consuming just as much power under load as the 1950X, so their efficiency is less favorable. The Core i7-7820X may take longer than the 1920X and i9-7900X to finish the job, but its total power consumption isn’t any higher than theirs. In fact, the i7-7820X is the efficiency winner compared to the Ryzen 7 1800X despite the Skylake chip’s higher power draw.
In any case, the Threadripper 1950X boasts impressive efficiency to go with its impressive performance. Its high power draw may be a minor concern for folks whose workloads are continuous, like gamers and streamers. Programmers and artists will find that their compiles and renders will finish both quickly and frugally, though.
Before we issue a verdict on the Ryzen Threadripper 1920X and Threadripper 1950X, it’s time once more to condense our results using our famous value scatter plots. We take the geometric mean of all of our real-world test results and plot that figure against the retail prices of the chips at hand.
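For the curious, a geometric mean can be sketched in a few lines. The scores below are hypothetical. We assume each result has already been normalized against a baseline chip so that higher is better, with time-based results inverted first. The exact normalization behind our index isn't spelled out here.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical normalized scores (baseline chip = 1.0, higher is better).
scores = [1.0, 4.0]
print(geometric_mean(scores))
```

Unlike an arithmetic mean, the geometric mean keeps one lopsided result from dominating the index: the two scores above average to 2.5 arithmetically but come out at about 2.0 here.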
The Ryzen Threadripper 1950X and Ryzen Threadripper 1920X are remarkable capstones for AMD’s CPU renaissance. For the same price as the Core i9-7900X, the 1950X can usually match and sometimes handily outperform the Skylake-X CPU in our real-world productivity benchmarks. In our measure of estimated task energy using the Blender bmw27 benchmark, the 1950X was both much faster and more frugal than the Skylake-X chip. That’s a heck of a way to break back into high-end desktops.
As the 1950X’s index in our full breadth of tests indicates, however, its sheer performance potential isn’t enough to let it take the absolute performance crown. That’s because of its somewhat split personality in digital audio workstation tasks. The 1950X performed quite well in my DAWBench DSP testing once I turned off SMT, but SMT or no, the 1950X seemed to hit a wall well before Skylake-X parts in our DAWBench VI testing. Even so, this is one of the only tests in our suite where the $1000 Core i9-7900X was clearly superior to the AMD competition. Take DAWBench VI out of the equation (as we did in an alternate scatter above) and the results fall more where you might expect.
The Ryzen Threadripper 1920X, on the other hand, packs a more inconsistent performance wallop against the eight-core Core i7-7820X. In tasks like rendering and compiling code, the 1920X takes the lead. In other tasks, like transcoding and digital audio, the i7-7820X holds its own or beats out the 1920X. Whether those wins justify the 33% higher price tag of the 1920X will depend on your individual needs. Our overall performance index hides these nuances.
Gaming performance in our test configuration is another bright spot for Threadripper. Going by the performance potential suggested by average frames per second, Intel’s chips pull a bit ahead of Ryzen CPUs across the board, even at 2560×1440. In our 99th-percentile FPS measure of delivered smoothness, though, Threadrippers are on par with the Core i9-7900X. The frustratingly inconsistent i7-7820X should be considered an outlier at this stage. We’ll need to perform further testing on that chip to try to pinpoint why its 99th-percentile frame times don’t match its considerable performance potential.
Threadripper CPUs are only one half of the X399 platform, of course. Motherboards are equally important. Unlike my bumpy experience with the AM4 platform in its infancy, though, Socket TR4 and the X399 chipset seem quite mature already. Once I learned how to install a Threadripper CPU, I got up and running without any instability or unpleasantness from my Gigabyte X399 Aorus Gaming 7 mobo. I also didn’t experience any drama getting my 32GB quad-channel kit of DDR4-3200 RAM up and running on that board, another marked improvement in user-friendliness.
The virtues of X399 don’t stop there, either. It’s refreshing not to have to think about mind-bending PCIe lane-routing diagrams and ports going dark with different Threadrippers, since every TR4 CPU will function the same way with every X399 motherboard. Lights and bling and gaming pretenses of those boards aside, that ease of use will be important for semi-professional or pro users without precious time to waste.
Even if multi-GPU configurations are on the wane for gaming, X399 offers a unique canvas to folks who need gobs of PCIe lanes for multiple graphics cards, storage devices, capture cards, or fast network cards. Even if I have trouble imagining a use for every one of them, 60 PCIe 3.0 lanes from the CPU on an ATX motherboard isn’t a figure Intel’s Core X CPUs will be able to match this generation. The X299 platform can offer more chipset PCIe lanes to close the gap, to be sure, but those chipset lanes will be bottlenecked by the relatively paltry bandwidth of the DMI 3.0 link between CPU and PCH. Threadripper has enough PCIe lanes direct from the CPU to avoid this bottleneck.
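To put rough numbers on that bottleneck, here is a back-of-the-envelope sketch. It assumes PCIe 3.0’s 8 GT/s per-lane signaling rate with 128b/130b encoding, and it treats DMI 3.0 as equivalent to a four-lane PCIe 3.0 link. Neither figure comes from our testing, and both are theoretical one-direction peaks that ignore protocol overhead.

```python
GT_PER_SECOND = 8e9      # PCIe 3.0 signaling rate per lane (transfers/s)
ENCODING = 128 / 130     # 128b/130b line encoding efficiency

def pcie3_bandwidth_gbs(lanes):
    """Approximate one-direction peak bandwidth in GB/s for a PCIe 3.0 link."""
    return lanes * GT_PER_SECOND * ENCODING / 8 / 1e9

cpu_lanes = pcie3_bandwidth_gbs(60)   # Threadripper's usable CPU lanes
dmi_link = pcie3_bandwidth_gbs(4)     # assumed DMI 3.0 equivalent (x4)

print(f"60 CPU lanes: ~{cpu_lanes:.1f} GB/s")
print(f"DMI 3.0 link: ~{dmi_link:.2f} GB/s")
```

Under those assumptions, everything hanging off the chipset shares roughly 4 GB/s to the CPU, while devices on CPU lanes each get their own dedicated link.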
Ultimately, recommending high-end CPUs isn’t an all-or-nothing process. First and foremost, folks just shouldn’t buy $1000 CPUs for gaming or lightweight desktop tasks. I imagine the people who know they need a $1000 chip are already scribbling out the necessary return-on-investment calculations for their particular workloads. Others will want to carefully consider whether their needs are best served by AMD’s copious PCIe lanes, high core count, and prodigious memory bandwidth, or by the slightly higher per-thread performance, the future potential of AVX-512 support, and the equally prodigious memory bandwidth of Intel’s Skylake-X parts.
If the particular applications you depend on do catch the waves Threadripper is making in high-end desktop performance, AMD has put together a platform that’s nearly flawless for surfing them. The cherries on top for enthusiasts and professionals alike are touches like ECC RAM support and soldered heat spreaders that simply aren’t available on Intel’s high-end desktop chips for now. The Ryzen Threadripper 1950X generally offers more for the money than the Intel competition, and that makes it a TR Editor’s Choice.