Qualcomm announced today that it has begun commercial shipments of its Centriq 2400 family of server CPUs. Fabricated on Samsung's 10-nm FinFET process, the Centriq 2400 die crams a whopping 18 billion transistors into an area of 398 mm². Centriq CPUs will have as many as 48 single-threaded, 64-bit-only ARMv8-compliant cores (a custom design that Qualcomm calls Falkor). Those cores will have 2.2 GHz base frequencies and peak frequencies of up to 2.6 GHz on the Centriq 2460 product.
Each Falkor core has a 64 KB L1 instruction cache paired with what Qualcomm calls a "24 KB single-cycle L0 cache" designed for low-power operation, for a total of 88 KB of I-cache per core. Those instruction caches sit alongside a 32 KB L1 data cache with a three-cycle load-use latency. Qualcomm groups these cores in pairs, forming 24 "duplexes" across the chip. Each duplex shares 512 KB of L2 cache, for a total of 12 MB of L2 across the chip.
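For anyone who wants to check the cache math, here's a quick sketch that uses only the figures quoted above. The variable names are my own, and I'm assuming the fully enabled 48-core part.

```python
# Cache totals for a fully enabled 48-core Centriq 2400, using the
# per-core and per-duplex figures quoted in the article.
# Variable names are illustrative, not Qualcomm's.

L1_ICACHE_KB = 64       # L1 instruction cache per core
L0_ICACHE_KB = 24       # "single-cycle L0" instruction cache per core
L2_PER_DUPLEX_KB = 512  # L2 shared by the two cores in a duplex
CORES = 48
CORES_PER_DUPLEX = 2

icache_per_core_kb = L1_ICACHE_KB + L0_ICACHE_KB
duplexes = CORES // CORES_PER_DUPLEX
total_l2_mb = duplexes * L2_PER_DUPLEX_KB / 1024  # 1 MB = 1024 KB here

print(f"I-cache per core: {icache_per_core_kb} KB")
print(f"Duplexes: {duplexes}")
print(f"Total L2: {total_l2_mb:.0f} MB")
```

With every duplex enabled, that works out to 88 KB of instruction cache per core and 12 MB of L2 in total.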
The Centriq 2400 SoC gives those cores up to 60 MB of shared L3 cache on a fully coherent, bidirectional multi-ring interconnect with more than 250 GB/s of aggregate bandwidth. Its memory controller offers as many as six channels of DDR4 memory running at speeds up to 2667 MT/s, and maximum memory per SoC tops out at 768 GB.
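Qualcomm's quoted memory specs also imply a theoretical peak DRAM bandwidth, which the back-of-the-envelope sketch below estimates. It assumes the standard 64-bit (8-byte) data path per DDR4 channel, excluding ECC bits, and that the 768 GB maximum is split evenly across the six channels. The function name is hypothetical, and real-world throughput will land below this ceiling.

```python
# Rough peak-bandwidth estimate for the Centriq 2400 memory subsystem.
# Assumption: each DDR4 channel has a 64-bit (8-byte) data bus, ECC excluded.

def peak_dram_bandwidth_gbs(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Return the theoretical peak bandwidth in GB/s (decimal, 1 GB = 1e9 bytes)."""
    bytes_per_second = channels * mt_per_s * 1e6 * bus_bytes
    return bytes_per_second / 1e9

CHANNELS = 6
SPEED_MT_S = 2667
MAX_MEMORY_GB = 768

peak_gbs = peak_dram_bandwidth_gbs(CHANNELS, SPEED_MT_S)
# Assumption: capacity is spread evenly across all channels.
capacity_per_channel_gb = MAX_MEMORY_GB / CHANNELS

print(f"Peak DRAM bandwidth: {peak_gbs:.1f} GB/s")
print(f"Max capacity per channel: {capacity_per_channel_gb:.0f} GB")
```

Under those assumptions, the six channels top out at roughly 128 GB/s, or about half of the interconnect's quoted aggregate bandwidth.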
Three Centriq 2400 SoCs are launching today: the Centriq 2460, the Centriq 2452, and the Centriq 2434. Clock speeds will remain largely the same across these three chips; Qualcomm is instead relying on core count and L3 cache tweaks to set these products apart from one another. The two highest-core-count Centriq CPUs will carry 120 W TDPs, while the Centriq 2434 and its 40 cores will have a 110 W TDP. Perhaps even more interesting is how Qualcomm is positioning these chips: the Centriq 2460 against the 205 W Xeon Platinum 8180, the Centriq 2452 against the 140 W Xeon Gold 6152, and the Centriq 2434 against the 85 W Xeon Silver 4116.
In its press materials, Qualcomm claims a number of impressive-sounding performance wins for the Centriq 2400 family over those Xeon processors. As usual, I'm skeptical of any estimated or otherwise projected performance results in that context, especially because ARM's own Drew Henry touts the ISA's future in the data center as something other than a mere alternative to x86 CPUs. My sense is that Xeons, with their high-performance computing DNA, are not natural points of comparison for ARMv8 cores.
Instead, my gut tells me that putting a ton of ARMv8-compatible cores on a single SoC will be good for both absolute performance and performance-per-watt in exactly the type of workloads where Qualcomm expects the Centriq 2400 family to excel: highly threaded applications, microservices and containers, and any application that needs a "scale-out" platform, i.e., lots and lots of cores per node and per rack.
Qualcomm is positioning the Centriq 2400 as a key part of the 5G equation, where a titanic number of high-speed wireless devices will need to communicate with large numbers of cloud compute resources positioned near the edge of future networks. Qualcomm claims it already has a wide range of cloud-services providers, software partners, and OEMs under its umbrella as it prepares for that future. I'll be keeping a close eye on the role Centriq plays in data centers as it joins a freshly competitive Intel and AMD in that space.