Single page Print

Memory subsystem performance
Since Skylake-X has such a drastically different cache allocation versus its predecessors, we simply had to run it through SiSoft's bandwidth and latency tests. We'll start with the Sandra utility's cache and memory bandwidth test.

To examine the effect of the increased L2 cache size in an intuitive way, we started with the single-threaded test. To interpret these results, remember that the block size identifies which level of cache the test is exercising. From 2KB to 32KB on all of these CPUs, we're in the L1 cache. Past that point, we're generally in L2 for most of these CPUs out to 256KB for Haswell, Broadwell, and Kaby Lake, 512KB for Zen, and 1MB for Skylake-X. Finally, we spill over into L3. 

The most interesting result in the chart above unsurprisingly ocurs between the 512KB mark and 1MB, where we're testing most of these chips' L2 caches. Thanks to their 1MB L2s, each Skylake-X core can sustain much higher bandwidths at the 1MB block size than the competition. Save for the Zen core, which is still in its L2 at 512KB, all of the older Intel chips are already into their L3 caches.

Sandra's multithreaded cache and memory bandwidth test shows the combined bandwidth from every core on each chip. In turn, we have to account for the combined size of each cache level on each chip when mapping block sizes to the cache that we're in at a given size. That's because the same source block is being split among multiple cores. If that sounds complicated, it is, and there are a lot of moving parts to account for here between core counts and clock speeds. Still, this measurement is a good way of showing just how much data these chips can move around in total.

Because Sandra supports AVX-512 instructions, and because Skylake-X doubles the L1 cache load bandwidth available per core, we get much more throughput compared to older Intel CPUs at most block sizes. Since the test is still hitting the i9-7900X's L1 cache from 64KB out to 256KB, for example, the increased load bandwidth for that cache lets the 7900X turn in incredible throughput increases across the chip. That trend continues as the test moves into the 7900X's L2 cache, where it can move roughly 50% more data compared to the Core i7-6950X, itself no slouch. All told, the i9-7900X sets a high new bar for cache throughput—not an easy task in this company.

Next, let's look at some tests of main memory bandwidth from the popular AIDA64 utility.

Interesting. Though the i9-7900X takes a commanding lead over the i7-6950X in AIDA64's read tests with both DDR4-2666 and DDR4-3200 RAM, the Skylake-X chip falls slightly behind Broadwell-E in write bandwidth, and it needs DDR4-3200 to match the i7-6950X in the copy test. None of these chips are slouches, to be sure, but the results for the i9-7900X aren't quite as eye-popping as they were in Sandra's cache tests. Time to talk latency.

Sandra also offers a detailed benchmark for cache and memory latencies. To reduce the effect of prefetching on this result, we use the "in-page random" access pattern. Like the single-threaded cache bandwidth test above, this test isn't multithreaded, so it's easy to keep track of what cache is being measured at each block size. 

As Intel suggested it would, the L2-L3 rebalancing on Skylake-X offers more bandwidth from its L2 caches at the same latency as before, at the cost of slightly higher latencies in the L3. Seems like a fine tradeoff to us, given the apparent benefits of the larger L2.

Compared to past Intel architectures, Skylake-X lags in main-memory access latencies. That increase may be offset somewhat by the potentially much higher hit rate in the L2 cache, though.

Some quick synthetic math tests
AIDA64 offers a useful set of built-in directed benchmarks for assessing the performance of the various subsystems of a CPU. The PhotoWorxx benchmark uses AVX2 on compatible CPUs, while the FPU Julia and Mandel tests use AVX2 with FMA.


PhotoWorxx doesn't seem to benefit from the improvements in the Core i9-7900X, but the Skylake-X CPU enjoys a commanding lead in our tests of floating-point prowess. At least in these synthetics, the i9-7900X seems to justify its more-than-twice-the-price tag of the Ryzen 1800X. Let's see if this strong opening translates into real-world performance now.