So there's an increasingly solid set of rumors about the next generation of Skylake Xeons for workstations leaking in various places, including a recent post over in the Anandtech forums.
In brief: there's a "Gold" series that goes up to 22 cores (18 cores with the highest clock speeds) and a "Platinum" series that goes all the way up to 28 cores (although you can buy a cut-down version with a mere 24 cores too). I have no direct confirmation, but I strongly suspect that at least the "Gold" series is designed for the new LGA-2066 socket with quad-channel RAM. That's the "smaller" socket and a direct successor to LGA-2011. As for the Platinum series, I'm on the fence as to whether these chips are also on LGA-2066 or whether they are designed for the big-time LGA-3647 socket that's shared by the very high-end parts like the 32-core Skylake-EP chips.
I would expect the lower core counts to appear on LGA-2066 due to bandwidth concerns with the high core count models. By lower core counts, I mean <20 cores here for LGA 2066. I think all of these Gold/Platinum parts are going to be LGA 3647, with the LGA 2066 parts taking the Silver title. Bronze I presume would be Xeon D or would replace the E3 lineup.
As for what distinguishes Platinum from Gold, it would be the use of memory buffers and 8+ socket support on the Platinums. This mirrors the difference between E5 and E7 today.
A few interesting points:
1. There have been rumors about a radically different cache hierarchy in these chips floating around since at least January. I'm not talking about the assumption that the L2 cache would go to 512KB/core, I'm talking something much more unusual. At this stage these are still rumors but I've seen enough consistent information online to at least play around with the possibility that they are real.
First: Each core now has not 256KB, not 512KB, but 1MB for the L2 cache.
Second: Each slice of the L3 cache is now only 1.375MB compared to 2.5MB in Broadwell Xeon & Broadwell HEDT parts.
Third: Putting those two numbers together on a core-for-core basis actually gives you a total L2 + L3 cache pool that's roughly 14% smaller than what you would get on an existing Broadwell Xeon (or a Broadwell-E HEDT part). To put things into perspective, an 8-core Ryzen part has a total of 20 MB of L2 (4 MB) and L3 (16 MB) cache. The equivalent 8-core Skylake Xeon Gold/Platinum part would only have a total of 19 MB (8 MB of L2 + 1.375 * 8 = 11 MB of L3).
Fourth: I have a few theories, but with this cache hierarchy it looks like Intel has abandoned a fully-inclusive L3 cache that maintains a complete copy of the contents of the L2 caches. Either that, or the L3 on these chips actually stores an extremely small amount of data outside of what's already in the L2 caches...
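The cache arithmetic in the Third point is easy to sanity-check with a few lines of Python, using only the per-core figures quoted in this thread (256 KB + 2.5 MB for Broadwell, 1 MB + 1.375 MB for the rumored Skylake parts):

```python
# Per-core cache sizes in MB, per the figures quoted above.
bdw_per_core = 0.25 + 2.5    # Broadwell Xeon/HEDT: 256 KB L2 + 2.5 MB L3 slice
skl_per_core = 1.0 + 1.375   # rumored Skylake Xeon: 1 MB L2 + 1.375 MB L3 slice

shrink = 1 - skl_per_core / bdw_per_core
print(f"per-core L2+L3 shrink vs Broadwell: {shrink:.1%}")  # ~13.6%

# 8-core totals: Ryzen (8 x 512 KB L2 + 16 MB L3) vs rumored Skylake
ryzen_total = 8 * 0.5 + 16.0   # 20.0 MB
skl_total = 8 * skl_per_core   # 19.0 MB
print(ryzen_total, skl_total)
```

So the per-core pool shrinks by a bit under 14%, and the 8-core totals come out to 20 MB vs 19 MB as stated.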
Going to 1 MB of L2 cache is a very big change. I was expecting 512 KB, since they halved the associativity in the 256 KB consumer Skylake. This would imply a 16-way associative design. I think the more likely scenario for the L3 cache is that Intel simply ran out of die space and scaled it back. More L2 cache per core, AVX-512 per core, higher core counts, and six DDR4 memory channels per die imply a very large die for the 32-core part.
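To show where the 16-way figure comes from: the consumer Skylake L2 is 256 KB and 4-way, and if the set count and 64-byte line size stay fixed, quadrupling capacity to 1 MB means quadrupling the ways. A quick sketch (this is an inference from the cache geometry, not a confirmed design):

```python
LINE = 64  # bytes per cache line

def num_sets(size_bytes, ways, line=LINE):
    # sets = capacity / (ways * line size)
    return size_bytes // (ways * line)

sets_256k = num_sets(256 * 1024, ways=4)       # consumer Skylake L2: 1024 sets
# Hold the set count constant and grow capacity to 1 MB:
ways_1m = (1024 * 1024) // (sets_256k * LINE)  # -> 16 ways
print(sets_256k, ways_1m)
```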
The new cache design could tie into a new on-die fabric. Intel's old ring bus topology is showing its age at these high core counts. A 2D torus (a grid with wraparound links) topology would cut down on latency for on-die cache-to-cache coherency. This is where I see the big performance gains from the new cache topology.
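A toy hop-count model shows why a torus scales better than a ring. This counts shortest-path hops only, ignoring real-world routing, slice placement, and link widths, and the 3x6 layout for 18 cores is just an illustrative assumption:

```python
def ring_dist(a, b, n):
    # shortest distance between two nodes on an n-node ring
    d = abs(a - b)
    return min(d, n - d)

def avg_ring(n):
    # average hop count between distinct nodes on a ring
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sum(ring_dist(i, j, n) for i, j in pairs) / len(pairs)

def avg_torus(nx, ny):
    # average hop count on a 2D torus: per-axis ring distances add up
    nodes = [(x, y) for x in range(nx) for y in range(ny)]
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(ring_dist(a[0], b[0], nx) + ring_dist(a[1], b[1], ny)
               for a, b in pairs) / len(pairs)

# 18 cores: one big ring vs a 3x6 torus
print(f"ring: {avg_ring(18):.2f} hops, torus: {avg_torus(3, 6):.2f} hops")
```

Even in this crude model, the average core-to-core distance roughly halves (about 4.8 hops on the ring vs about 2.3 on the torus).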
The inclusive nature of the caches helped keep coherency sane, as only the L3 cache had to be queried for data. With so little L3 cache, it wouldn't make sense to keep it inclusive, but going exclusive adds considerable complexity to coherency. More so if they go to a new on-die fabric.
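The coherency point can be sketched with a toy model (purely illustrative; the class names and structure are my own, not Intel's actual protocol): with an inclusive L3, one tag lookup answers whether a line exists anywhere on the die, while a non-inclusive L3 miss proves nothing, so without a separate snoop filter every core's L2 has to be probed.

```python
class InclusiveL3:
    def __init__(self):
        self.tags = set()  # invariant: superset of every line cached in any L2

    def snoop(self, addr):
        # One lookup settles it: an L3 miss means the line is nowhere on-die.
        return addr in self.tags

class NonInclusiveL3:
    def __init__(self, l2_tag_sets):
        self.tags = set()
        self.l2s = l2_tag_sets  # per-core L2 tag sets

    def snoop(self, addr):
        # An L3 miss no longer proves absence; without a snoop
        # filter, every L2 must be probed as well.
        return addr in self.tags or any(addr in l2 for l2 in self.l2s)
```

In the toy model, `NonInclusiveL3([{0x100}, set()]).snoop(0x100)` only succeeds by probing the first core's L2, which is exactly the extra traffic a real design would need a directory or snoop filter to avoid.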
Having less cache than Ryzen is odd, but it may not be an issue depending on the topology. Ryzen can have duplicate entries on-die between the two clusters. If Intel has low enough latency to not need duplication, then the capacity difference isn't going to be an issue.
2. Notice how there's an 18-core Xeon Gold 6154 CPU with a 200 watt(!) TDP in there? Now look down the list to the 18-core Xeon Gold 6134 with the higher base clock speed but the rather pedestrian 130 W TDP... At first you might be thinking that can't be right, but there's a reason for it: it appears that some Xeon Gold parts don't have AVX-512 activated while others do. Beyond core counts and clock speeds, AVX-512 is one major factor affecting the maximum theoretical power consumption.
Not sure if AVX-512 is going to be the big power dictator. Granted, AVX does have an impact on clock speeds, but not enough to account for that much power. What doesn't make sense is the Gold 6150 consuming nearly twice as much power at a lower clock speed than the Gold 6136 despite having the same core count.
Intel has been toying with additional on-package logic, which would account for some of the power difference. Skylake-EP is going to have several interesting options like an on-package FPGA or Omni-Path fabric. This could also explain why Intel is going the color route to help distinguish what is what in terms of model naming. Secret decoder ring still required.
3. All of these parts, while clearly capable of being run in servers, at least have the branding of workstation-type CPUs and not traditional Skylake server parts. These parts are clearly not merely Skylake-EP, which incidentally goes up to 32 cores, "only" has a TDP of up to 165 watts or so, and is already publicly available in a limited manner via Google Cloud.
I wouldn't use the Google Cloud chips as a prime example: those are custom SKUs specific to Google. Amazon and Microsoft get similar custom SKUs for their data centers too.
Intel has been consolidating their socket and chipset infrastructure. The EP and EX lineups are to share a common socket at the high end, but the EX line was to continue to utilize the memory buffers. (Note that Intel could do an IBM thing here and include L4 cache with the buffers. There have been some very specific cache changes for the consumer Skylake eDRAM cache on GT3e/GT4e parts that would make more sense here.)
Similarly, Intel is purportedly going to be using the same chipset interface, so the IO hub from the Z270 chipset (or rather its server version) can be used. The only thing missing there for servers is 10 Gbit Ethernet options, which would hang off a CPU socket anyway for both latency and bandwidth reasons.