Intel's Core X CPUs are here, and we're kicking off this new era with the highest-end chip in the lineup so far: the Core i9-7900X. As it traditionally does for its high-end desktop platform, the company is repurposing silicon from its upcoming Skylake Xeons to serve as Skylake-X chips. That means some unusually large changes are in store for us enthusiasts as Skylake makes its transition from mainstream desktops to the data center.
The Core X family of CPUs needs a new socket, LGA 2066, and a new platform, called X299. We've already covered the Core X lineup and the X299 platform, as well as the entry-level Kaby Lake-X CPUs for that platform, in a dedicated article. The short take is that X299 is an evolutionary step forward from the X99 platform. It keeps that platform's quad-channel memory support (out of a total of six on at least some Skylake-X dies) and pairs it with a chipset powered by a lot of the same DNA present in the Z270 platform. If you need to brush up on Core X before reading on, feel free.
Now, back to Skylake-X. The fundamental pipeline of this chip isn't much different from the various Skylake and Kaby Lake desktop parts that we've known and loved for almost two years now. We never did a deep dive into the Skylake architecture, but compared to Haswell and Broadwell, the basic Skylake integer pipeline is wider and can have more going on at once. To prevent this wider engine from burning power on execution of the wrong instructions, Skylake also features a better branch predictor than Haswell and Broadwell, according to Intel.
That's a gross oversimplification, of course, but it's generally how Intel has improved its chips over the past few years. Let's have a look at the big differences between mainstream and server Skylake CPUs now.
AVX-512 unhinges its jaw
The first big change in Skylake-X is support for the AVX-512 instruction set. These new instructions add important new capabilities to Intel's SIMD implementation, including scatter-gather support, dedicated state and mask registers, and much, much more. To support this generation of AVX, the chip's vector data registers are now twice as wide, and there are twice as many of them. These wider registers are fed with more load and store bandwidth. Skylake-X can now handle two 64-byte loads and one 64-byte store per cycle, compared to a single 64B load and a single 32B store per cycle in mainstream Skylake.
On top of its wider and more numerous registers, the Skylake-X core also has a dedicated AVX-512 fused multiply-add unit (FMA) on top of the pair of 256-bit-wide AVX FMA units in Skylake-S. This unit can only handle AVX-512 FMAs, and it resides on port five of the Skylake-X unified scheduler. Since the pair of 256-bit FMAs can execute a single AVX-512 FMA in parallel alongside the dedicated AVX-512 FMA unit, throughput for that common instruction is effectively doubled in the best case compared to mainstream Skylake.
While those performance improvements may sound impressive, real-world performance of AVX-512 has caveats. Firing up those monster SIMD units requires large amounts of power (and therefore produces more heat), so Skylake-X CPUs might be forced to clock below Intel's specified Turbo speeds (or not Turbo at all) when executing AVX-512 instructions. That clock-speed tradeoff might result in lower-than-expected performance from AVX-512 code, and Intel says developers will need to be able to amortize the expected performance gain of their AVX-512 applications over time versus the clock-speed drop their code might incur. Mixed workloads with a small proportion of AVX-512 instructions in the overall mix are apparently not an ideal case for speedups from that SIMD hardware, either.
What's more, not every Core X chip in the lineup will enjoy the same boost in SIMD performance from AVX-512. Only the Core i9 series of CPUs will ship with the dedicated AVX-512 FMA. The Core i7-7800X and Core i7-7820X will still have the wider registers for AVX-512, but they'll only execute instructions using the pair of 256-bit AVX units common to all Skylake chips. This exercise in segmentation might surprise people expecting a uniform performance increase from AVX-512 across all the CPUs that support it. (The Kaby Lake-X Core i5-7640X and Core i7-7740X won't support AVX-512 at all.)
Because of those caveats, we may be waiting a while for mainstream desktop applications that can really take advantage of all the extra parallelism on offer from these new instructions. Scientific-computing, deep-learning, and financial-services folks will probably be drooling for AVX-512, but regular Joes and Janes probably won't see any major speedups until companies recompile their software (at the very least). That assumes AVX-512 is coming to mainstream Intel CPUs, as well.