At its Next Horizon event today, AMD gave us our first look at the Zen 2 microarchitecture. As one of AMD's first 7-nm products, Zen 2 will be making its debut on board the company's next-generation Epyc CPUs, code-named Rome.
According to AMD CTO Mark Papermaster, TSMC's 7-nm process offers twice the density of GlobalFoundries' 14-nm FinFET process. It can deliver the same performance as 14-nm FinFET for half the power, or 1.25 times the performance for the same power, all else being equal.
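AMD's two power and performance figures describe different operating points on the same process, not gains that stack. The quick calculation below takes AMD's numbers at face value and converts each into performance per watt relative to 14-nm FinFET. The `perf_per_watt` helper is a hypothetical name used only for illustration.

```python
from fractions import Fraction

def perf_per_watt(rel_perf, rel_power):
    """Performance per watt relative to the 14-nm baseline (1 = no change)."""
    return Fraction(rel_perf) / Fraction(rel_power)

# AMD's two quoted operating points for 7 nm versus 14-nm FinFET.
iso_perf = perf_per_watt(1, Fraction(1, 2))    # same performance, half the power
iso_power = perf_per_watt(Fraction(5, 4), 1)   # 1.25x performance, same power

# 2x density: ideally, the same transistor count fits in half the area.
relative_area = 1 / Fraction(2)

print(f"iso-performance: {float(iso_perf):.2f}x perf/W")
print(f"iso-power:       {float(iso_power):.2f}x perf/W")
print(f"area for the same design: {float(relative_area):.2f}x")
```

The two results are alternatives. A chip designer picks a point somewhere between them, so the 2x efficiency figure and the 1.25x speed figure shouldn't be multiplied together.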
AMD is using those extra transistors to improve the basic Zen blueprint in at least two major ways. First, Zen 2 has an improved front end with a more accurate branch predictor, smarter instruction prefetch, a "re-optimized instruction cache," and a larger op cache than its predecessor.
AMD also addressed a major competitive shortcoming of the Zen architecture for high-performance computing applications. The first Zen cores used 128-bit-wide registers to execute SIMD instructions, so each 256-bit-wide AVX2 instruction had to be split into two 128-bit operations. Compared to Intel's Skylake CPUs (for just one example), which have two 256-bit-wide SIMD execution units capable of independent operation, Ryzen CPUs offered half the throughput for 256-bit floating-point and integer SIMD instructions.
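To make that halving concrete, here is a toy Python model. It does not depict AMD's actual decoder or scheduler. A 256-bit add over eight 32-bit float lanes runs either as one micro-op or, as on first-generation Zen, as two 128-bit micro-ops. The function names and the micro-op counting are illustrative assumptions.

```python
LANES_256 = 8   # a 256-bit register holds eight 32-bit floats
LANES_128 = 4   # a 128-bit register holds four

def add_native_256(a, b):
    """One 256-bit micro-op covers all eight lanes."""
    assert len(a) == len(b) == LANES_256
    return [x + y for x, y in zip(a, b)], 1

def add_cracked_128(a, b):
    """Two 128-bit micro-ops: the low four lanes, then the high four."""
    assert len(a) == len(b) == LANES_256
    result, uops = [], 0
    for begin in (0, LANES_128):
        end = begin + LANES_128
        result += [x + y for x, y in zip(a[begin:end], b[begin:end])]
        uops += 1
    return result, uops

a = [float(i) for i in range(LANES_256)]
b = [10.0] * LANES_256
native, native_uops = add_native_256(a, b)
cracked, cracked_uops = add_cracked_128(a, b)
assert native == cracked  # same architectural result either way
print(f"micro-ops per instruction: native {native_uops}, cracked {cracked_uops}")
```

Both paths produce identical lanes. The cracked path simply costs twice as many micro-ops, so a core that can issue a fixed number of floating-point micro-ops per clock finishes half as many 256-bit instructions.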
Zen 2 addresses this shortcoming by doubling each core's SIMD register width to 256 bits. The floating-point side of the Zen 2 core has two 256-bit floating-point add units and two 256-bit floating-point multiply units, which can presumably be yoked together to perform two fused multiply-add operations simultaneously.
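A fused multiply-add (FMA) computes a * b + c and rounds only once, at the end. The sketch below models one 256-bit FMA as four double-precision lanes and uses Python's exact `Fraction` arithmetic to stand in for that single rounding. The names `fma_256` and `mul_then_add_256` are hypothetical. Because AMD hasn't said exactly how Zen 2's add and multiply pipes cooperate, the sketch covers only what an FMA computes, not how Zen 2 schedules it.

```python
from fractions import Fraction

LANES_F64 = 4  # a 256-bit register holds four 64-bit doubles

def fma_256(a, b, c):
    """Lane-wise fused multiply-add: a*b + c computed exactly, rounded once."""
    assert len(a) == len(b) == len(c) == LANES_F64
    return [float(Fraction(x) * Fraction(y) + Fraction(z))
            for x, y, z in zip(a, b, c)]

def mul_then_add_256(a, b, c):
    """Unfused: each product is rounded to a double before the add."""
    assert len(a) == len(b) == len(c) == LANES_F64
    return [x * y + z for x, y, z in zip(a, b, c)]

eps = 2.0 ** -52
a = [1.0 + eps, 2.0, 3.0, 4.0]
b = [1.0 - eps, 0.5, 2.0, 0.25]
c = [-1.0, 1.0, 1.0, -1.0]

print(fma_256(a, b, c))           # lane 0 keeps the tiny -2**-104 term
print(mul_then_add_256(a, b, c))  # lane 0 rounds the product to 1.0, giving 0.0
```

Only lane 0 differs. There, the exact product is 1 - 2**-104, which the unfused path rounds to 1.0 before subtracting 1.0.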
That capability would bring the Zen 2 core on par with the Skylake microarchitecture for SIMD throughput (albeit not the Skylake Server core, which boasts even wider data paths and 512-bit-wide SIMD units to support AVX-512 instructions). To feed those 256-bit-wide execution engines, AMD also widened the load-store unit, load data path, and floating-point register file to support 256-bit chunks of data.
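A back-of-the-envelope way to compare SIMD throughput is peak FLOPs per core per clock. That figure is the number of FMA-capable pipes, times the lanes per pipe, times two, since an FMA counts as a multiply plus an add. The sketch assumes two FMA-capable pipes in every row. The Zen 2 entry rests on the "presumably" above, and not every Skylake Server SKU has two AVX-512 FMA units.

```python
def peak_flops_per_cycle(fma_pipes, vector_bits, element_bits):
    """Peak FLOPs per core per clock: pipes x lanes x 2 (multiply + add)."""
    lanes = vector_bits // element_bits
    return fma_pipes * lanes * 2

# (name, FMA-capable pipes, pipe width in bits); pipe counts are assumptions
cores = [
    ("Zen (128-bit pipes)", 2, 128),
    ("Zen 2 (assumed 2 x 256-bit FMA)", 2, 256),
    ("Skylake client", 2, 256),
    ("Skylake Server, two AVX-512 FMA units", 2, 512),
]

for name, pipes, width in cores:
    fp32 = peak_flops_per_cycle(pipes, width, 32)
    fp64 = peak_flops_per_cycle(pipes, width, 64)
    print(f"{name:40s} FP32 {fp32:3d}  FP64 {fp64:3d}")
```

Under these assumptions, Zen 2 doubles Zen's per-clock peak and matches client Skylake. A Skylake Server part with two AVX-512 units doubles that figure again.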
At the system level, Zen 2 also represents a major change in the way Epyc CPUs are constructed. Only the CPU core complexes and associated logic will be fabricated on TSMC's 7-nm process. To talk to the outside world, next-generation Epyc packages will feature an I/O die bound to as many as eight Zen 2 "chiplets," for as many as 64 cores and 128 threads per package. This I/O die will contain memory controllers, Infinity Fabric interfaces, and presumably as much other "uncore" as AMD can get onto this cheaper, more mature silicon.
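The package layout can be sketched as a simple hub-and-spoke data model. It assumes what the article's numbers imply: eight cores per chiplet and two threads per core, giving 64 cores and 128 threads across eight chiplets. The class and method names are hypothetical. The memory path is a logical illustration of "uncore" living on the I/O die, not a description of the physical links between dies, which AMD hasn't detailed here.

```python
from dataclasses import dataclass, field

@dataclass
class Chiplet:
    """A 7-nm compute die; core and thread counts are inferred, not official."""
    cores: int = 8             # inferred from 64 cores / 8 chiplets
    threads_per_core: int = 2  # inferred from 128 threads / 64 cores

@dataclass
class IODie:
    """The non-7-nm hub that holds the package's "uncore" logic."""
    blocks: tuple = ("memory controllers", "Infinity Fabric interfaces")

@dataclass
class EpycPackage:
    io_die: IODie
    chiplets: list = field(default_factory=list)

    def core_count(self):
        return sum(c.cores for c in self.chiplets)

    def thread_count(self):
        return sum(c.cores * c.threads_per_core for c in self.chiplets)

    def memory_path(self, chiplet_index):
        """In this model, chiplets have no memory controllers of their own."""
        if not 0 <= chiplet_index < len(self.chiplets):
            raise IndexError("no such chiplet")
        return [f"chiplet {chiplet_index}", "I/O die", "memory controllers"]

rome = EpycPackage(IODie(), [Chiplet() for _ in range(8)])
print(rome.core_count(), rome.thread_count())
print(" -> ".join(rome.memory_path(3)))
```

Keeping the memory controllers off the compute dies means every chiplet reaches DRAM the same way, through the I/O die.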