Memory subsystem performance
To get a sense of how Threadripper's quad-channel memory architecture affects performance in the move from the AM4 platform to X399, we rely on AIDA64's built-in memory benchmark suite.
Compared to the Ryzen 7 1800X and its two channels of memory, both Threadrippers nearly double the 1800X's throughput in writes and copies, but they fall a bit short of that increase in reads. It's interesting to observe that bandwidth generally doesn't scale with core count for Threadripper, as it does to some degree for Skylake-X. Still, applications that found themselves memory-bandwidth-constrained on the AM4 platform get plenty more throughput to play with on X399.
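For a rough sense of why a near-doubling is about the best one could hope for, peak theoretical DRAM bandwidth scales linearly with channel count. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes standard 64-bit DDR4 channels and DDR4-3200 on both platforms, and the function name is ours.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    transfer_rate_mts: memory speed in megatransfers per second
                       (3200 for DDR4-3200).
    channels:          number of populated memory channels.
    bus_width_bits:    width of one channel; 64 bits is the standard
                       DDR4 channel width (assumed here).
    """
    bytes_per_transfer = bus_width_bits // 8
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    dual = peak_bandwidth_gbs(3200, channels=2)   # AM4-style dual channel
    quad = peak_bandwidth_gbs(3200, channels=4)   # X399-style quad channel
    print(f"dual-channel peak: {dual:.1f} GB/s")
    print(f"quad-channel peak: {quad:.1f} GB/s")
    print(f"ratio: {quad / dual:.1f}x")
```

Measured figures always land below these ceilings, so what matters in the comparison is the ratio between the two platforms rather than the absolute numbers.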
We also tested memory latency using AIDA64's built-in benchmark. It should be noted that these results are a worst-case scenario for latency, thanks to our choice to run the chips in their default Distributed Mode, in which memory is presented as a single UMA node. AMD officially says memory access will take around 78 ns when an application hits memory attached to the near die and around 133 ns when it hits memory attached to the far die, for an average latency of about 105.5 ns. Our use of DDR4-3200 with fairly typical 15-15-15-35 1T timings cuts a few nanoseconds off that figure, but on average, it appears applications should expect considerably higher memory latency from Threadrippers in their default Distributed Mode versus Skylake-X and its mesh architecture.
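AMD's 105.5 ns average is simply the midpoint of its near-die and far-die figures, which implies an even split of accesses between the two dies. A minimal sketch of that arithmetic follows; the `near_fraction` parameter is our own hypothetical knob for illustrating how the average would shift if more accesses stayed local, not something AMD specifies.

```python
def average_latency_ns(near_ns: float, far_ns: float,
                       near_fraction: float = 0.5) -> float:
    """Weighted average memory latency across two dies.

    near_fraction is the share of accesses served by memory attached
    to the near die (hypothetical parameter; 0.5 models an even
    interleave across both dies).
    """
    if not 0.0 <= near_fraction <= 1.0:
        raise ValueError("near_fraction must be between 0 and 1")
    return near_fraction * near_ns + (1.0 - near_fraction) * far_ns


if __name__ == "__main__":
    # AMD's quoted figures: about 78 ns near, about 133 ns far.
    print(average_latency_ns(78, 133))        # even split
    print(average_latency_ns(78, 133, 1.0))   # every access stays on the near die
```

With the default even split, the function reproduces the 105.5 ns figure; pushing `near_fraction` toward 1.0 approaches the 78 ns near-die number.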
Some quick synthetic math tests
AIDA64 offers a useful set of built-in directed benchmarks for assessing the performance of the various subsystems of a CPU. The PhotoWorxx benchmark uses AVX2 on compatible CPUs, while the FPU Julia and Mandel tests use AVX2 with FMA.
Normally, we would let these results pass without comment, but AIDA64's CPU Hash test gets a curious (and massive) speedup on Ryzen CPUs. That's because the Zen architecture has what seems to be little-publicized support for Intel's SHA Extensions. These extensions permit hardware acceleration of some of the SHA family of algorithms, and CPU Hash uses SHA-1 as its algorithm of choice. SHA-1 isn't particularly useful in practice any longer, but SHA-256 is, and the folks at SiSoft report similar speedups for that algorithm. AVX implementations of other SHA versions might help Intel processors close the gap, though.
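Readers who want a rough feel for hashing throughput on their own machines can time SHA-1 and SHA-256 with Python's standard `hashlib` module. This is a sketch only and is not how AIDA64's CPU Hash test works: it is single-threaded, the helper name and default sizes are ours, and whether the underlying library actually uses the SHA Extensions depends on how it was built and on the CPU it runs on.

```python
import hashlib
import time


def hash_throughput_mbs(algorithm: str, payload_size: int = 1 << 20,
                        rounds: int = 32) -> float:
    """Return approximate single-threaded hashing throughput in MB/s.

    algorithm:    a name accepted by hashlib.new(), e.g. "sha1" or "sha256".
    payload_size: bytes hashed per round (illustrative default of 1 MiB).
    rounds:       number of times the payload is hashed.
    """
    payload = bytes(payload_size)  # zero-filled buffer; contents don't matter
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return payload_size * rounds / elapsed / 1e6


if __name__ == "__main__":
    for name in ("sha1", "sha256"):
        print(f"{name}: {hash_throughput_mbs(name):.0f} MB/s")
```

On a CPU whose hashing library takes advantage of the SHA Extensions, both figures should come out well ahead of a software-only code path, though the exact margin will vary by library build.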
The Threadripper 1950X's 16 cores seem to allow it to go toe-to-toe with the wider-but-less-numerous AVX FMA units in the i9-7900X in the Julia and Mandel tests. The 1920X is more or less on par with the i7-6950X and i7-7820X here, as well. If there's one spot where throwing more cores at the problem seems to have helped Threadrippers, this is it.
Now that we've seen how these chips stack up on a synthetic playing field, it's time to let them out of the corral and see how they chew through real-world work.