Intel’s Core X CPUs are here, and we’re kicking off this new era with the highest-end chip in the lineup so far: the Core i9-7900X. As it traditionally does for its high-end desktop platform, the company is repurposing silicon from its upcoming Skylake Xeons to serve as Skylake-X chips. That means some unusually large changes are in store for us enthusiasts as Skylake makes its transition from mainstream desktops to the data center.
The Core X family of CPUs needs a new socket, LGA 2066, and a new platform, called X299. We’ve already covered the Core X lineup and the X299 platform, as well as the entry-level Kaby Lake-X CPUs for that platform, in a dedicated article. The short take is that X299 is an evolutionary step forward from the X99 platform. It keeps that platform’s quad-channel memory support (at least some Skylake-X dies have six channels in total, but only four are exposed here) and pairs it with a chipset that shares a lot of DNA with the Z270 platform. If you need to brush up on Core X before reading on, feel free.
Now, back to Skylake-X. The fundamental pipeline of this chip isn’t much different from the various Skylake and Kaby Lake desktop parts that we’ve known and loved for almost two years now. We never did a deep dive into the Skylake architecture, but compared to Haswell and Broadwell, the basic Skylake integer pipeline is wider and can have more going on at once. To prevent this wider engine from burning power on execution of the wrong instructions, Skylake also features a better branch predictor than Haswell and Broadwell, according to Intel.
That’s a gross oversimplification, of course, but it’s generally how Intel has improved its chips over the past few years. Let’s have a look at the big differences between mainstream and server Skylake CPUs now.
AVX-512 unhinges its jaw
The first big change in Skylake-X is support for the AVX-512 instruction set. These new instructions add important new capabilities to Intel’s SIMD implementation, including scatter-gather support, dedicated state and mask registers, and much, much more. To support this generation of AVX, the chip’s vector data registers are now twice as wide, and there are twice as many of them. These wider registers are fed with more load and store bandwidth. Skylake-X can now handle two 64-byte loads and one 64-byte store per cycle, compared to a single 64B load and a single 32B store per cycle in mainstream Skylake.
On top of its wider and more numerous registers, the Skylake-X core also has a dedicated AVX-512 fused multiply-add unit (FMA) on top of the pair of 256-bit-wide AVX FMA units in Skylake-S. This unit can only handle AVX-512 FMAs, and it resides on port five of the Skylake-X unified scheduler. Since the pair of 256-bit FMAs can execute a single AVX-512 FMA in parallel alongside the dedicated AVX-512 FMA unit, throughput for that common instruction is effectively doubled in the best case compared to mainstream Skylake.
While those performance improvements may sound impressive, real-world performance of AVX-512 has caveats. Firing up those monster SIMD units requires large amounts of power (and therefore produces more heat), so Skylake-X CPUs might be forced to clock below Intel’s specified Turbo speeds (or not Turbo at all) when executing AVX-512 instructions. That clock-speed tradeoff might result in lower-than-expected performance from AVX-512 code, and Intel says developers will need to be able to amortize the expected performance gain of their AVX-512 applications over time versus the clock-speed drop their code might incur. Mixed workloads with a small proportion of AVX-512 instructions in the overall mix are apparently not an ideal case for speedups from that SIMD hardware, either.
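To make that amortization argument concrete, here’s a minimal sketch of the tradeoff. The function name, the 2x per-clock vector speedup, and the 0.85 clock ratio are illustrative assumptions of ours, not figures from Intel, and the model pessimistically applies the lower clock to the entire run.

```python
def effective_speedup(vector_fraction, vector_speedup, clock_ratio):
    """Overall speedup versus running everything at full clocks without AVX-512.

    vector_fraction: share of baseline run time spent in code AVX-512 can accelerate
    vector_speedup:  per-clock speedup of that code from AVX-512 (assumed value)
    clock_ratio:     AVX-512 clock divided by the normal clock, at most 1.0 (assumed value)
    Pessimistic assumption: the reduced clock applies to the whole run.
    """
    other_time = (1.0 - vector_fraction) / clock_ratio
    vector_time = vector_fraction / (vector_speedup * clock_ratio)
    return 1.0 / (other_time + vector_time)


# A workload dominated by vector code still comes out well ahead...
print(round(effective_speedup(0.9, 2.0, 0.85), 2))  # 1.55
# ...while a mix with only a little AVX-512 ends up slower than before.
print(round(effective_speedup(0.1, 2.0, 0.85), 2))  # 0.89
```

Under these made-up inputs, the lightly vectorized mix loses more to the clock drop than it gains from the wider units, which is the case Intel is warning developers about.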
What’s more, not every Core X chip in the lineup will enjoy the same boost in SIMD performance from AVX-512. Only the Core i9 series of CPUs will ship with the dedicated AVX-512 FMA. The Core i7-7800X and Core i7-7820X will still have the wider registers for AVX-512, but they’ll only execute instructions using the pair of 256-bit AVX units common to all Skylake chips. This exercise in segmentation might surprise people expecting a uniform performance increase from AVX-512 across all the CPUs that support it. (The Kaby Lake-X Core i5-7640X and Core i7-7740X won’t support AVX-512 at all.)
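The segmentation is easier to see as peak-throughput arithmetic. The sketch below is our own back-of-the-envelope model, not an Intel formula: it assumes 32-bit floats and counts each fused multiply-add as two floating-point operations per lane.

```python
def peak_flops_per_cycle(vector_bits, fmas_issued_per_cycle, element_bits=32):
    """Peak FLOPs per cycle per core: lanes x FMAs issued per cycle x 2 (multiply + add)."""
    lanes = vector_bits // element_bits
    return lanes * fmas_issued_per_cycle * 2


# Mainstream Skylake: two 256-bit FMA units.
skylake_s = peak_flops_per_cycle(256, 2)
# Core i9 Skylake-X: the 256-bit pair handles one 512-bit FMA while the
# dedicated port-five unit handles a second.
core_i9_x = peak_flops_per_cycle(512, 2)
# Core i7 Skylake-X: no dedicated unit, so only one 512-bit FMA per cycle.
core_i7_x = peak_flops_per_cycle(512, 1)

print(skylake_s, core_i9_x, core_i7_x)  # 32 64 32
```

In this simple model, the Core i7 Skylake-X parts end up with the same peak FMA rate per clock as mainstream Skylake, even though they accept AVX-512 code.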
Because of those caveats, we may be waiting a while for mainstream desktop applications that can really take advantage of all the extra parallelism on offer from these new instructions. Scientific-computing, deep-learning, and financial-services folks will probably be drooling for AVX-512, but regular Joes and Janes probably won’t see any major speedups until companies recompile their software (at the very least). That assumes AVX-512 is coming to mainstream Intel CPUs, as well.
Bigger L2 caches for better performance
Skylake-X also has a much different cache allocation per core compared to its mainstream counterparts. Instead of the relatively small 256KB L2 cache (or mid-level cache, in Intel parlance) in Skylake-S and Broadwell-E, each Skylake-X core enjoys a whopping 1MB of private L2. In support of AVX-512, the bandwidth between the L1 data cache and the L2 cache has been increased to 128 bytes per cycle. On top of that size increase, Intel quadrupled the associativity of the cache, from four ways in Skylake-S to 16 ways in Skylake-X. Intel says the move to a larger private cache lets programmers keep usefully large data structures close to the core, and the result is higher performance. Pretty cut and dried.
Intel says it undertook this change because it felt its older architectures placed too much emphasis on data sharing through the L3 caches. In turn, Skylake-X’s architects reduced the shared last-level cache allocation to as much as 1.375MB per core, compared to as much as 2.5MB per core for Broadwell-E chips. This last-level cache isn’t inclusive of the L2 caches, and it serves as a victim cache for the L2. The tradeoff for this rebalancing of cache allocations is a higher L3 cache access latency, according to Intel.
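To put the rebalancing in chip-level terms, the sketch below totals the per-core figures for a 10-core part. The 10-core count (which matches both the i9-7900X and the i7-6950X) and the use of the "as much as" maximums for L3 are our assumptions for illustration.

```python
def cache_totals_mb(cores, l2_per_core_mb, l3_per_core_mb):
    """Return (total L2, total L3, combined) in MB for a chip with uniform per-core caches."""
    l2 = cores * l2_per_core_mb
    l3 = cores * l3_per_core_mb
    return l2, l3, l2 + l3


# Assumed 10-core configurations using the per-core figures quoted above.
skylake_x = cache_totals_mb(10, 1.0, 1.375)
broadwell_e = cache_totals_mb(10, 0.25, 2.5)

print(skylake_x)    # (10.0, 13.75, 23.75)
print(broadwell_e)  # (2.5, 25.0, 27.5)
```

On those assumptions, Skylake-X quadruples the private L2 on the chip while giving up a bit less than half of the shared L3.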
Finally, Intel is abandoning the ring topology it’s used to connect CPU cores in its many-core CPUs for several generations. In place of its ring, the company is introducing a new (or at least new outside of the Knights Landing accelerator) mesh interconnect topology that promises several improvements. First off, Intel says its mesh interconnect delivers lower latency and higher bandwidth than the ring bus, all while operating at a lower frequency and voltage. Those last two characteristics are important, because they should result in less power consumption from the interconnect portion of the chip as it scales up.
Intel also says that the mesh design allows it to include units like I/O, inter-socket interconnects, and memory controllers in a modular, scalable way as core counts increase. The company claims the distribution of these elements across the chip using the mesh minimizes undesirable “hot spots” of activity that might ultimately constrain cores’ access to those critical resources, limiting performance.
The mesh design should also offer a boon for applications that need to do a lot of inter-core communication. The last-level cache in Skylake-X is distributed across the cores, and thanks to the more uniform access characteristics of the mesh, Intel claims that application developers no longer have to worry about non-uniform latencies when accessing data in those caches. Cores should also enjoy more uniform access characteristics when accessing the die’s I/O and memory controller.
Previously, the shared L3 caches on a chip might have resided on different rings, requiring cores to communicate across the buffered switches formerly used to join discrete rings on the die. These switches added latency on top of that incurred by traversing the ring bus in the first place—something that Intel gave customers the opportunity to avoid in past chips with a “cluster-on-die” mode that turned each ring into something resembling a NUMA domain of its own. The mesh topology in Skylake-X should make headaches from the non-uniform distribution and access latencies of resources among rings a thing of the past.
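For a rough feel of why a mesh scales better than a ring, the toy model below compares average hop counts between stops on the two topologies. It is purely a graph-distance sketch under our own assumptions (a single bidirectional ring, a rectangular grid, and an arbitrary 24 stops); it says nothing about Intel’s actual floorplan, clocks, or latencies.

```python
from itertools import product


def ring_avg_hops(stops):
    """Average shortest-path hops between distinct stops on a bidirectional ring."""
    total = sum(min(d, stops - d) for d in range(1, stops))
    return total / (stops - 1)


def mesh_avg_hops(rows, cols):
    """Average Manhattan distance between distinct stops on a rows x cols mesh."""
    nodes = list(product(range(rows), range(cols)))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1) in nodes
                for (r2, c2) in nodes)
    ordered_pairs = len(nodes) * (len(nodes) - 1)
    return total / ordered_pairs


# Hypothetical example: 24 stops arranged as one ring versus a 4 x 6 grid.
print(f"ring: {ring_avg_hops(24):.2f} hops, mesh: {mesh_avg_hops(4, 6):.2f} hops")
```

In this idealized comparison, the grid needs fewer hops on average for the same number of stops, and the gap widens as the stop count grows.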
As for characteristics of the Skylake-X silicon itself, Intel honchos clammed up when we asked about die size and transistor count. The company believes that disclosing this information would lead its competitors to unfounded conclusions about how their chip designs and process technologies compare to Intel’s. Only the paranoid survive, we suppose.
The Core i9-7900X and LGA 2066 in the flesh
We’ve already touched on the Core X CPU family and the X299 platform in depth, but it’s good to get an up-close look at the Core i9-7900X and Asus Prime X299-Deluxe motherboard that Intel has provided us for testing.
Outwardly, little has changed that would help us identify the Core i9-7900X at first glance. The eagle-eyed will note more rounded corners on the edges of the integrated heat spreader, but that’s about it.
Flipping these chips over reveals a dense forest of surface-mounted components, but it’s otherwise hard to notice the extra 55 lands on the Core i9-7900X. If you want to count, we’ll wait.
That outward similarity might lead one to believe that LGA 2011 and LGA 2066 chips are interchangeable among X99 and X299 motherboards, and the LGA 2066 socket doesn’t help. The dimensions of the socket are the same as those of LGA 2011, but chips for that socket absolutely will not work with LGA 2066. Don’t make an expensive mistake by eyeballing it. The only things that builders can carry over from LGA 2011 systems are their DDR4 kits and cooling hardware.
Intel sent us home with Asus’ ultra-ritzy Prime X299-Deluxe motherboard to host the Core i9-7900X. This board boasts everything one might want out of a high-end platform: four USB 3.1 Gen 2 ports, built-in 802.11ac and 802.11ad Wi-Fi, tasteful RGB LEDs, and a bevy of PCIe x16 slots. Asus also offers two M.2 slots, one of which allows gumstick SSDs to stand up vertically for better cooling. Even if both M.2 slots are occupied, the horizontal M.2 SSD can still enjoy plenty of heat-dissipation potential thanks to an integrated heatsink for the slot and chipset. This mobo even comes with a Thunderbolt 3 card and an add-on fan control board with several extra headers.
Our testing methods
As always, we did our best to collect clean test numbers. We ran each of our benchmarks at least three times, and we’ve reported the median result. Our test systems were configured like so:
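In code form, that reporting rule looks something like the sketch below. The helper name and the sample timings are hypothetical; only the "at least three runs, report the median" rule comes from our methods.

```python
from statistics import median


def report_result(runs):
    """Return the median of a benchmark's runs, insisting on at least three of them."""
    if len(runs) < 3:
        raise ValueError("expected at least three runs")
    return median(runs)


# Made-up timings in seconds; the one slow outlier doesn't skew the reported figure.
print(report_result([41.8, 40.9, 47.2]))  # 41.8
```

Using the median rather than the mean keeps a single hiccup in one run from dragging the reported number around.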
|Processor|Ryzen 7 1800X|
|Motherboard|Gigabyte Aorus AX370-Gaming 5|
|Memory size|16 GB (2 DIMMs)|
|Memory type|G.Skill Trident Z DDR4 SDRAM|
|Memory speed|3866 MT/s (rated), 3200 MT/s (actual)|
|Storage|Intel 750 Series 400GB NVMe SSD|

|Processor|Intel Core i7-5960X; Intel Core i7-6950X; Intel Core i9-7900X; Intel Core i9-7900X; Intel Core i7-7700K|
|Motherboard|Gigabyte GA-X99-Designare EX; Asus Prime X299-Deluxe; Gigabyte Aorus GA-Z270X-Gaming 8|
|Memory type|G.Skill Trident Z; G.Skill Trident Z; Crucial Ballistix Elite; G.Skill Trident Z; G.Skill Trident Z|
|Memory speed|3600 MT/s (rated), 2666 MT/s (actual); 3866 MT/s (rated), 3200 MT/s (actual)|
|Storage|Corsair Neutron XT 480GB SATA SSD; Samsung 960 EVO 500GB NVMe SSD; Samsung 850 Pro 512GB SATA SSD; Samsung 960 EVO 500GB NVMe SSD|
They all shared the same common elements:
|2x Corsair Neutron XT 480GB SSD, 1x Kingston HyperX 480GB SSD
|Nvidia GeForce GTX 1080 Ti Founders Edition
|Graphics driver version
|Windows 10 Pro with Creators Update
|Seasonic Prime Titanium 1000W
Thanks to Intel, Corsair, Kingston, Asus, Gigabyte, Cooler Master, G.Skill, and AMD for helping us to outfit our test rigs with some of the finest hardware available.
Some further notes on our testing methods:
- The test systems’ Windows desktops were set at a resolution of 3840×2160 in 32-bit color. Vertical refresh sync (vsync) was disabled in the graphics driver control panel.
- For our Ryzen systems, we used the AMD Ryzen Balanced power plan included with the company’s most recent chipset drivers. We left our Intel systems on Windows’ default Balanced power plan.
The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Memory subsystem performance
Since Skylake-X has such a drastically different cache allocation versus its predecessors, we simply had to run it through SiSoft’s bandwidth and latency tests. We’ll start with the Sandra utility’s cache and memory bandwidth test.
To examine the effect of the increased L2 cache size in an intuitive way, we started with the single-threaded test. To interpret these results, remember that the block size identifies which level of cache the test is exercising. From 2KB to 32KB on all of these CPUs, we’re in the L1 cache. Past that point, we’re generally in L2 for most of these CPUs out to 256KB for Haswell, Broadwell, and Kaby Lake, 512KB for Zen, and 1MB for Skylake-X. Finally, we spill over into L3.
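The mapping from block size to cache level can be sketched as a simple lookup. The per-core capacities are the ones quoted above; the function and dictionary names are our own, and we lump everything past L2 together because L3 sizes vary by chip.

```python
KB = 1024
L1_BYTES = 32 * KB  # L1 data cache size shared by all of the tested CPUs

# Per-core L2 capacities as described in the text.
L2_BYTES = {
    "Haswell/Broadwell/Kaby Lake": 256 * KB,
    "Zen": 512 * KB,
    "Skylake-X": 1024 * KB,
}


def cache_level(block_bytes, l2_bytes, l1_bytes=L1_BYTES):
    """Name the cache level a single-threaded test of this block size is exercising."""
    if block_bytes <= l1_bytes:
        return "L1"
    if block_bytes <= l2_bytes:
        return "L2"
    return "L3 or memory"


for core, l2 in L2_BYTES.items():
    print(core, [cache_level(size * KB, l2) for size in (32, 256, 512, 1024)])
```

Running the loop shows the 512KB and 1MB block sizes landing in L2 only for the cores with larger L2 caches, which is where the single-threaded results should diverge.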
The most interesting result in the chart above unsurprisingly occurs between the 512KB mark and 1MB, where we’re testing most of these chips’ L2 caches. Thanks to their 1MB L2s, each Skylake-X core can sustain much higher bandwidths at the 1MB block size than the competition. Save for the Zen core, which is still in its L2 at 512KB, all of the older Intel chips are already into their L3 caches.
Sandra’s multithreaded cache and memory bandwidth test shows the combined bandwidth from every core on each chip. In turn, we have to account for the combined size of each cache level on each chip when mapping block sizes to the cache that we’re in at a given size. That’s because the same source block is being split among multiple cores. If that sounds complicated, it is, and there are a lot of moving parts to account for here between core counts and clock speeds. Still, this measurement is a good way of showing just how much data these chips can move around in total.
Because Sandra supports AVX-512 instructions, and because Skylake-X doubles the L1 cache load bandwidth available per core, we get much more throughput compared to older Intel CPUs at most block sizes. Since the test is still hitting the i9-7900X’s L1 cache from 64KB out to 256KB, for example, the increased load bandwidth for that cache lets the 7900X turn in incredible throughput increases across the chip. That trend continues as the test moves into the 7900X’s L2 cache, where it can move roughly 50% more data compared to the Core i7-6950X, itself no slouch. All told, the i9-7900X sets a high new bar for cache throughput—not an easy task in this company.
Next, let’s look at some tests of main memory bandwidth from the popular AIDA64 utility.
Interesting. Though the i9-7900X takes a commanding lead over the i7-6950X in AIDA64’s read tests with both DDR4-2666 and DDR4-3200 RAM, the Skylake-X chip falls slightly behind Broadwell-E in write bandwidth, and it needs DDR4-3200 to match the i7-6950X in the copy test. None of these chips are slouches, to be sure, but the results for the i9-7900X aren’t quite as eye-popping as they were in Sandra’s cache tests. Time to talk latency.
Sandra also offers a detailed benchmark for cache and memory latencies. To reduce the effect of prefetching on this result, we use the “in-page random” access pattern. Like the single-threaded cache bandwidth test above, this test isn’t multithreaded, so it’s easy to keep track of what cache is being measured at each block size.
As Intel suggested it would, the L2-L3 rebalancing on Skylake-X offers more bandwidth from its L2 caches at the same latency as before, at the cost of slightly higher latencies in the L3. Seems like a fine tradeoff to us, given the apparent benefits of the larger L2.
Compared to past Intel architectures, Skylake-X lags in main-memory access latencies. That increase may be offset somewhat by the potentially much higher hit rate in the L2 cache, though.
Some quick synthetic math tests
AIDA64 offers a useful set of built-in directed benchmarks for assessing the performance of the various subsystems of a CPU. The PhotoWorxx benchmark uses AVX2 on compatible CPUs, while the FPU Julia and Mandel tests use AVX2 with FMA.
PhotoWorxx doesn’t seem to benefit from the improvements in the Core i9-7900X, but the Skylake-X CPU enjoys a commanding lead in our tests of floating-point prowess. At least in these synthetics, the i9-7900X seems to justify its more-than-twice-the-price tag of the Ryzen 1800X. Let’s see if this strong opening translates into real-world performance now.
Turbo Boost Max 3.0 doesn’t go all the way toward eliminating the single-threaded performance gap we tend to see between mainstream and high-end Intel chips using the same architecture, but the feature does let the i9-7900X get most of the way there—and in Kraken, the Skylake-X chip does catch the i7-7700K. That’s good news for folks who like a responsive machine in both lightly-threaded and all-out workloads.
Compiling code in GCC
Our resident code monkey, Bruno Ferreira, helped us put together this code-compiling test. Qtbench records the time needed to compile the Qt SDK using the GCC compilers. The number of jobs dispatched by the Qtbench script is configurable, and we set the number of threads to match the hardware thread count for each CPU.
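Matching the job count to the hardware thread count is simple to express. The helper below is a hypothetical stand-in and not part of Qtbench; it only assumes a build driven by `make`, whose `-j` flag sets the number of parallel jobs.

```python
import os


def make_command(target="all", jobs=None):
    """Build a make invocation with one job per hardware thread (hypothetical helper)."""
    if jobs is None:
        jobs = os.cpu_count() or 1  # os.cpu_count() can return None on some platforms
    return ["make", f"-j{jobs}", target]


# On a chip with 20 hardware threads, this yields ['make', '-j20', 'all'].
print(make_command(jobs=20))
```

The resulting list could be handed to `subprocess.run` to launch the build; we stop at constructing the command so the sketch has no side effects.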
Bang. The i9-7900X neatly shaves 20% off the compile time of our previous champion, the Core i7-6950X, and it does it for hundreds of dollars less.
7-Zip file compression
Put the i9-7900X to work zipping and unzipping archives, and it delivers a small-to-moderate boost over its Broadwell predecessor. Memory speeds don’t seem to make the least bit of difference to the 7900X’s performance, though.
VeraCrypt disk encryption
VeraCrypt is a continuation of the handy TrueCrypt project. This disk-encryption utility lets us test throughput for both the hardware-accelerated AES algorithm and the implemented-in-software Twofish cipher.
The AES half of our VeraCrypt testing does appear to be memory-bound for once, so DDR4-3200 allows for a whopping 2.4 GB/s higher throughput. Twofish doesn’t enjoy any such speedup from faster memory, although the i9-7900X still handily pulls away from the i7-6950X.
The Cinebench benchmark is powered by Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. The test runs with a single thread and then with as many threads as possible.
In Cinebench’s single-threaded test, the i9-7900X once again proves its Turbo Boost mettle against the Core i7-7700K. Core for core, the Skylake-X chip also hands in a roughly 16% improvement over the Core i7-6950X. Pretty darn good considering the low-single-digit performance improvements we’ve come to expect from Intel’s generation-to-generation advances.
Blender is a widely-used open-source 3D modeling and rendering application. The app can take advantage of AVX2 instructions on compatible CPUs. We chose the “bmw27” test file from Blender’s selection of benchmark scenes to put our CPUs through their paces.
The i9-7900X carves up our test file in fine form with a 12.5% decrease in render time over its predecessor. The i7-6950X is probably looking at its original suggested price tag and sweating a bit by now. Once again, faster memory makes no difference in performance for the i9-7900X.
Handbrake video transcoding
Handbrake is a popular video-transcoding app that recently hit version 1.0. To see how it performs on these chips, we’re switching things up from past reviews. Here, we converted a roughly two-minute 4K source file from an iPhone 6S into a 1920×1080, 30 FPS MKV using the HEVC algorithm implemented in the x265 open-source encoder. We otherwise left the preset at its default settings.
Noticing a pattern yet? The i9-7900X completes this job in 14% less time than the i7-6950X needs, and its performance doesn’t vary with memory speed.
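A reduction in run time isn’t the same number as a speedup, so it can help to convert between the two. The sketch below applies the standard conversion to the time savings we’ve quoted so far; the function name is ours.

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional cut in run time (0.20 for 20% less time) into a throughput ratio."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


# Time savings for the i9-7900X versus the i7-6950X quoted in the text.
for test, reduction in [("Qtbench", 0.20), ("Blender", 0.125), ("Handbrake", 0.14)]:
    print(f"{test}: {speedup_from_time_reduction(reduction):.2f}x")
# Qtbench: 1.25x
# Blender: 1.14x
# Handbrake: 1.16x
```

Put another way, finishing in 20% less time means getting 25% more work done per unit of time.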
It’s been a while since we tested CPUs with picCOLOR, but we now have the latest version of this image-analysis tool in our hands courtesy of Dr. Reinert H.G. Mueller of the FIBUS research institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. In its current form, picCOLOR supports AVX2 instructions, multi-core CPUs, and simultaneous multithreading, so it’s an ideal match for the CPUs on our bench. Check out FIBUS’ page for more information about the institute’s work and picCOLOR.
picCOLOR offers an interesting shake-up in our results for once. Although the i9-7900X still totally outruns every other chip in our stable, this time it’s trailed by the Ryzen 7 1800X instead of the Core i7-6950X. Faster memory still doesn’t do squat for the 7900X, though.
Euler3D tackles the difficult problem of simulating fluid dynamics. It tends to be very memory-bandwidth intensive. You can read more about it right here. We configured Euler3D to use every thread available from each of our CPUs.
Interesting. Euler3D seems to lean on the memory subsystem in ways that are particularly amenable to high performance with the i7-6950X and not so much so for the i9-7900X. Not every workload benefits from Intel’s rebalanced cache hierarchy with Skylake-X, it seems.
Digital audio workstation performance
DAWBench is a popular addition to our CPU test suite, and we’re now working directly with the creator of DAWBench, Vin Curigliano, to refine our testing methods. As part of that collaborative effort, Vin provided us with a beta version of the DAWBench DSP 2017 benchmark. We’re leaving DAWBench’s virtual instrument test on the bench this time around, however, since these high-powered Intel CPUs tend to max out the benchmark and an updated version of the test isn’t quite ready yet.
DAWBench DSP 2017 relies on the freely-available Shattered Glass Audio SGA1566 VST plugin. We used the 64-bit version of this VST in our testing. DAWBench DSP lets us enable instances of this plugin until the session becomes unresponsive. We used Reaper as our host DAW for the test, and we monitored the project using a Focusrite Scarlett 2i2 interface with the company’s latest USB ASIO drivers. We set a 96 kHz sampling rate and used two ASIO buffer depths: a punishing 64 and a slightly-less-punishing 128.
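To see why those buffer depths are punishing, consider how little time the CPU has to fill each one. The sketch below is plain arithmetic on the settings above, assuming the buffer depth is counted in samples; the function name is ours.

```python
def buffer_deadline_ms(buffer_samples, sample_rate_hz):
    """Time in milliseconds that one audio buffer covers, i.e. the CPU's deadline to fill it."""
    return 1000.0 * buffer_samples / sample_rate_hz


for depth in (64, 128):
    print(depth, round(buffer_deadline_ms(depth, 96_000), 3))
# 64 0.667
# 128 1.333
```

At a depth of 64, every plugin instance in the session has to finish its processing in about two-thirds of a millisecond, or the audio drops out.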
As we’ve come to expect, the i9-7900X delivers a modest performance improvement in this test versus the Core i7-6950X. Strangely, the 7900X-and-DDR4-3200 pairing actually performs worse than its DDR4-2666-equipped configuration, though. We’ll have to monitor this test and see whether it’s a behavior that changes as the X299 platform matures.
A quick look at power consumption and energy efficiency
Skylake-X’s performance improvements wouldn’t be worth much if they came with a corresponding decrease in energy efficiency. We can get a rough idea of whether the Core i9-7900X is as efficient as it is fast by monitoring our test system’s power consumption at the wall with our trusty Watts Up power meter and estimating the total amount of energy it needs to complete a task. Our observations have shown us that Blender consumes about the same amount of power at every stage of the bmw27 benchmark we test with, so it’s an ideal guinea pig for this kind of calculation. First, though, let’s check idle and peak load power consumption numbers.
At idle, the X299 platform paired with the Core i9-7900X sips only a bit more power than the Core i7-7700K. Under load, however, the 7900X system draws the most power by a wide margin. Of course, peak power draw only tells part of the efficiency story.
To really get a sense of how efficient the Core i9-7900X is, we need to take the task energy consumed over the course of our Blender benchmark into account. Not only does the Core i9-7900X cut 94 seconds off the Ryzen 7 1800X’s bmw27 render time, it does so while expending only a bit more energy. That’s impressive performance per watt.
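The task-energy estimate is just average power multiplied by run time. In the sketch below, the wattages and render times are made-up placeholders, not our measurements (only the 94-second gap mirrors the text), and the function name is ours.

```python
def task_energy_kj(avg_power_watts, seconds):
    """Estimate task energy in kilojoules from average wall power and run time."""
    return avg_power_watts * seconds / 1000.0


# Placeholder figures: a faster, hungrier system versus a slower, thriftier one.
fast_system = task_energy_kj(260, 200)
slow_system = task_energy_kj(170, 294)

print(fast_system, slow_system)  # 52.0 49.98
```

With these placeholder numbers, the system that draws far more power at the wall ends up using only slightly more energy for the job, because it finishes so much sooner.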
It’s time once more to sum up our results using our famous scatter plots. To spit out this final index, we take the geometric mean of each chip’s results in our real-world productivity tests, then plot that number against retail pricing gleaned from Newegg. Where Newegg prices aren’t available, we use a chip’s original suggested price.
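For the curious, the index calculation can be sketched as follows. Normalizing each result to a baseline chip and treating every score as higher-is-better are our assumptions for this example (time-based results would need to be inverted first), and the sample scores are invented.

```python
from statistics import geometric_mean


def performance_index(scores, baseline):
    """Geometric mean of per-test scores relative to a baseline chip (higher is better)."""
    ratios = [scores[test] / baseline[test] for test in baseline]
    return geometric_mean(ratios)


# Invented scores for two tests; the chip is 2x and 4x the baseline, respectively.
baseline = {"test_a": 100.0, "test_b": 50.0}
chip = {"test_a": 200.0, "test_b": 200.0}

print(round(performance_index(chip, baseline), 3))  # 2.828
```

The geometric mean keeps any single test with an outsized result from dominating the index the way it would with an arithmetic average.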
Our value scatter tells the entire story of the Core i9-7900X: if your workload scales to many threads, this chip is generally the one to run it on. The server version of Skylake delivers an unusually large performance boost for a modern Intel CPU revision in many tasks. Core for core and thread for thread, the already-beastly Core i7-6950X can sometimes lag the 7900X in the range of 10% to 20%. All that oomph comes for a jaw-dropping $724 less than the 6950X’s initial suggested price, too. Competition is a wonderful thing.
In a milestone for Intel’s high-end desktop platform, the Core i9-7900X mostly ends the tradeoff between single-threaded swiftness and multi-threaded grunt typical of some older Intel high-end desktop chips. For lightly-threaded workloads, the i9-7900X’s improved Turbo Boost Max 3.0 behavior lets it trail our single-thread-favorite Core i7-7700K by only a few percentage points at most. In typical desktop use, then, the i9-7900X and its TBM 3.0-enabled brethren should feel about as snappy as their mainstream desktop cousins. I need to get the i9-7900X paired up with a GeForce GTX 1080 Ti soon to see whether that single-threaded performance translates to similar gameplay smoothness.
I’ll also need to explore Skylake-X overclocking in depth soon. Thanks to immature firmware and monitoring utilities, I didn’t feel comfortable pushing my 7900X too hard at this point. That said, I see a lot of promise for overclocking this chip. My test system made it to the Windows desktop without a hiccup at an astounding 4.7 GHz on all cores, and thermal limits seem as though they’ll be the primary obstacle to fully exploiting that speed. I’ll explore Skylake-X’s overclocking potential more once the X299 platform has had a bit more time in the oven (and once I’ve picked up some seriously beefy cooling hardware in the meantime).
We’ve always been loath to recommend the top-end CPU in Intel’s high-end desktop family (and yes, that is this chip for the moment). Despite the Ryzen-inspired price reshuffling that’s coming with Core X, the i9-7900X still isn’t a great value. The star of the Core X lineup may actually be the Core i7-7820X, whose eight cores and 16 threads have clocks similar to those of the i9-7900X. You may lose a couple of cores in the bargain, but even so, the i7-7820X should perform better than a Ryzen 7 1800X for not that much more money. We hope to play with one of these more attainable Skylake-X CPUs soon.
Of course, the performance of the Core i9-7900X is beyond question: it’s the fastest single-socket CPU we’ve ever tested. The X299 platform may need a little polish yet to let Core X chips really shine, but the performance bar the i9-7900X is already setting promises an exciting standoff this summer as AMD prepares its Ryzen Threadripper CPUs for launch. If you need as many cores and threads as possible from your desktop, times have never been more exciting. Stay tuned as we see whether the i9-7900X has got game.