Intel Xeon E7-8894 v4 crunches 48 threads at up to 3.4 GHz

Take a break from fantasizing about the purported price-performance ratio of Ryzen to gawp with us at Intel's new Xeon. Designated the Xeon E7-8894 v4, the new chip sports 24 hyper-threaded Broadwell cores that start at 2.4 GHz and can boost all the way up to 3.4 GHz. The chips are designed to drop into servers with up to eight sockets and 24 TB of memory per node, and they're specced for a 165W TDP.

Intel says the E7-8894 v4 is its fastest Xeon E7 processor ever. Specifically, the company claims the chip is "3.69x faster than the previous generation," although it curiously uses the Westmere-era Xeon E7-8870 as its frame of reference. That was a ten-core processor running at up to 2.80 GHz, so between the 2.4x core-count increase, the higher peak clocks, and several generations of per-core improvements, a 3.69x figure isn't hard to believe.
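
Here's a quick back-of-envelope check of that figure in Python. The per-core IPC multiplier is our assumption, not an Intel number, and the real 3.69x presumably comes from a specific benchmark rather than this arithmetic.

```python
# Rough sanity check of Intel's 3.69x claim (illustrative only; the IPC
# multiplier below is an assumption, not an Intel figure).
westmere_cores, westmere_turbo_ghz = 10, 2.8    # Xeon E7-8870
broadwell_cores, broadwell_turbo_ghz = 24, 3.4  # Xeon E7-8894 v4

core_scaling = broadwell_cores / westmere_cores            # 2.40x
clock_scaling = broadwell_turbo_ghz / westmere_turbo_ghz   # ~1.21x
assumed_ipc_gain = 1.3  # guess at aggregate per-clock gains, Westmere -> Broadwell

print(f"cores x clocks alone: {core_scaling * clock_scaling:.2f}x")                      # ~2.91x
print(f"with assumed IPC gain: {core_scaling * clock_scaling * assumed_ipc_gain:.2f}x")  # ~3.79x
```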

Further comparisons in the press release are made against the Xeon E7-8890 v3—a Haswell chip with 18 cores and the same 165W TDP as the new processor—which is a much more apt comparison. Against that competition, the new chip's wins are more in the 25-50% range. When we're talking about the kind of massive number-mulching that these chips are capable of, gains like those are still pretty impressive.

Naturally, top-end performance commands top-tier pricing, and Intel's recommended price for the E7-8894 v4 is nearly the same as its model number: $8898. A full complement of these chips for an 8-socket node will start at over $71,000 for CPUs alone, never mind the 24 TB of memory. I would say I'd like to have such a machine, but frankly I have no idea what I'd do with it.

Comments closed
    • Kougar
    • 3 years ago

    For $8898 you guys are going to buy one and de-lid it just so we can measure the die size, right? Right?

    • kuttan
    • 3 years ago

    There is an AMD server chip, codenamed Naples, with 32 cores and a similar TDP.

    • lycium
    • 3 years ago

    I would do some crazy rendering with 8 of these chips in a computer… on the other hand, rendering is so parallel that it becomes way more cost effective to just build a bunch of computers and do it over the network.

    • Tristan
    • 3 years ago

    AMD Zen was created to compete with such Xeons in the first place. Ryzen is a server core, just retrofitted somewhat for use in a client CPU. The proof is that Zen cores are very small, with some 30-40% fewer transistors (excluding cache) than Intel cores. Zen also has only a 128-bit SIMD FPU, while Skylake/Kaby Lake have 256-bit; again, that was picked because servers do not need a strong FPU, and the cores end up smaller. If you want to put 32 cores on a single die, then every square millimeter counts. Naples was the real Zen project, while Summit Ridge is a secondary target.

      • maxxcool
      • 3 years ago

      I am an uncrunched Doritos kind of guy, I like my chips big…

      • kuttan
      • 3 years ago

      Ryzen has 2 x 256-bit FMA units which can work independently or be combined if needed.

      http://www.fudzilla.com/media/k2/items/cache/ab52b7918cc21e5906593780d1cc7065_XL.jpg

        • jts888
        • 3 years ago

        That pic is wrong/out of date on at least two significant points:
        - Zen has 2 128b FMAC units, divided internally into 2 each of mul and add units.
        - The FP unit's register file goes through L1D, not sitting directly on L2 as shown.

        Here's the most recent architectural slide set:
        https://cdn.arstechnica.net/wp-content/uploads/sites/3/2016/08/slide_2-980x551.png
        https://arstechnica.com/gadgets/2016/12/amd-zen-performance-details-release-date/

          • kuttan
          • 3 years ago

          The slide I posted above is an official AMD slide. It could be the Zen architecture meant for the server side, while the slide you linked is the consumer desktop Ryzen. Intel's server chips support 512-bit AVX whereas the desktop versions don't.

            • jts888
            • 3 years ago

            There's only one Zen die this half of the year, codenamed Zeppelin. Server processors will be multi-chip modules using Zeppelin, codenamed Naples. Raven Ridge APUs (4 Zen cores plus a Vega-based GPU) will be coming later.

            Your pic is from 2015, so it may have actually been accurate according to AMD’s plans then, but it doesn’t reflect reality now.

    • Anonymous Coward
    • 3 years ago

    Repeat after me… the cloud. 128GB of RAM per core, that's crazy. Where I am renting my server time, 16GB per core is presented as the high-memory option. Low memory is 4GB per core.

    • Krogoth
    • 3 years ago

    I’m kinda impressed that Intel managed to cram that much core logic onto a Socket 2011 package.

    • I.S.T.
    • 3 years ago

    I have literally no use for that many threads (well, I might re-encode a few videos I've got lying around to H.265… Japanese porn is usually like two gigs I mean the videos I shot of my dog playing in a park), but I still want one of these for the sheer novelty factor.

    • llisandro
    • 3 years ago

    Perfect timing! I’ve been meaning to put together a Minecraft Server!

      • Goty
      • 3 years ago

      I don’t think the Minecraft server is multithreaded, actually (at least not extensively). I don’t know, maybe that adds to the humor?

        • llisandro
        • 3 years ago

        😐

    • chuckula
    • 3 years ago

    When you are talking about the price of 24 TB of RAM being non-trivial, you aren’t joking.

    Here’s the best I could do from Crucial’s online calculator: http://www.crucial.com/usa/en/ct2k64g4lfq424a

    8 of those to the terabyte, multiplied by 24, gives us: $1800 * 8 * 24 = $345,600. So believe it or not, $71,000 in CPUs isn't even the biggest line item.

    Oh wait, strike that, I found a sale: http://www.memory4less.com/samsung-128gb-ddr4-pc19200-m386aak40b40-cuc5?rid=90&origin=pla&gclid=CJr3j5bmg9ICFdgXgQodzOIKxg

    That's only $1652 per 128 GB of RAM, so we have: $1652 * 8 * 24 = $317,184 per server. With that level of savings we can go back to griping at Intel! Good job everybody!
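
    For anyone following along at home, here's the same arithmetic as a minimal Python sketch; it uses only the prices quoted in the comment above, nothing more authoritative.

    ```python
    # chuckula's math, spelled out: 8 CPUs plus 24 TB of RAM at the quoted sale price.
    cpu_price = 8898        # Intel's recommended price per E7-8894 v4
    sockets = 8
    dimm_size_gb = 128
    dimm_price = 1652       # the memory4less sale price quoted above
    total_ram_gb = 24 * 1024

    dimms_needed = total_ram_gb // dimm_size_gb    # 192 DIMMs
    cpu_cost = cpu_price * sockets                 # $71,184
    ram_cost = dimms_needed * dimm_price           # $317,184
    print(dimms_needed, cpu_cost, ram_cost, cpu_cost + ram_cost)
    # 192 71184 317184 388368
    ```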

      • Waco
      • 3 years ago

      Ha, so it’d only be about $450k for the whole server, even loaded up with PCIe cards of various flavors.

      Hmm…

        • chuckula
        • 3 years ago

        Hey there’s a new president…. MWSFGA (make waco’s server farm great again)

          • Waco
          • 3 years ago

          Surprisingly, I have use cases that demand as much RAM as possible on a single node. 24 TB isn’t quite there, but a bit of creativity with swap and it’s getting there…

            • derFunkenstein
            • 3 years ago

            Sounds like the kind of thing 3D XPoint is made for, though not in the capacities/speeds currently advertised: https://www.google.com/amp/s/www.techpowerup.com/226789/intel-8000p-the-first-consumer-grade-3d-xpoint-products%3Famp?client=safari

            • Waco
            • 3 years ago

            The latency is too high, sadly. I’m looking forward to it pushing down flash prices though. 🙂

            • derFunkenstein
            • 3 years ago

            Isn’t it supposed to make its way into DIMMs and replace DRAM for very specific server uses? I had been under the impression it’d drive latency down as density rises.

            • Waco
            • 3 years ago

            It’s still more than an order of magnitude slower in access latency. Even if they halved it from the first samples, it would still be an order of magnitude slower (7 microseconds now, .2 microseconds for RAM).

            • jts888
            • 3 years ago

            DRAM latencies are actually closer to 0.02 μs, counting just the DIMM response times at least.
            E.g., DDR4-2400 CL17 ≈ 14 ns for the first/critical word in a burst.

            I never followed up on what Optane actually shipped with (though I am aware that latency, durability, and density were all far worse than initially promised), but it's more than a one-order-of-magnitude discrepancy AFAIK.
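
            The CL17 figure above works out as follows; this is just the standard CAS-latency conversion, nothing specific to this platform.

            ```python
            # CAS latency in nanoseconds = CL cycles / memory clock (half the MT/s rate).
            transfer_rate_mts = 2400                    # DDR4-2400
            memory_clock_mhz = transfer_rate_mts / 2    # 1200 MHz
            cas_latency_cycles = 17

            cas_ns = cas_latency_cycles / memory_clock_mhz * 1000
            print(f"{cas_ns:.1f} ns")  # ~14.2 ns, i.e. ~0.014 us, vs. the ~7 us quoted for XPoint
            ```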

            • Waco
            • 3 years ago

            I didn’t want to make a statement on the real latency, I just know it’s more than an order of magnitude. It may very well be nearly 2 orders at this point…

      • jts888
      • 3 years ago

      Wow, given the $/GiB markup that 64 GiB DIMMs had for so long, I’m frankly amazed to see 128 GiB modules selling for south of $2k.

        • chuckula
        • 3 years ago

        Yeah, bear in mind that I’m obviously doing this for fun and that you’d probably be stuck paying a bunch of extra markup through your server vendor to spec the RAM out in real life.

        It does put things in perspective as to what a “high end” server can mean (until you graduate to renting out a mainframe that is).

          • Waco
          • 3 years ago

          Just for funzies, I specced out a server with 12 TB of RAM.

          The 128 GB sticks through Dell (at list price, which you generally can’t get much more than 25% off of) come to…

          128GB LRDIMM, 2400MT/s, Quad Rank, x4 Data Width [$7,067.75]
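
          A quick sketch of where that lands for the full 12 TB, using the list price above and the ~25% discount mentioned; the DIMM count is simply 12 TB divided by 128 GB.

          ```python
          # Waco's 12 TB Dell config at the quoted list price, before and after ~25% off.
          dimm_list_price = 7067.75
          dimms = 12 * 1024 // 128     # 96 x 128 GB LRDIMMs

          list_total = dimms * dimm_list_price
          print(f"{dimms} DIMMs: ${list_total:,.0f} list, ~${list_total * 0.75:,.0f} after 25% off")
          # 96 DIMMs: $678,504 list, ~$508,878 after 25% off
          ```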

            • yuhong
            • 3 years ago

            Yea, mostly based on TSV I think.

        • the
        • 3 years ago

        There is a small hike in prices when you go from ECC -> registered ECC and again from registered ECC -> load reduced ECC due to additional components. Now that DDR4 is mainstream, prices have finally equalized. 256 GB is the new high end, and talk of 512 GB DIMMs is on the horizon.

        Then there are NVDIMMs which are promising capacities in the TB range.

        The amusing thing is that Broadwell-E tops out at a 64 TB maximum supported memory capacity due to the number of address lines it supports. Some of the crazy NUMA systems with additional glue logic can hit that today.

      • ImSpartacus
      • 3 years ago

      What impresses me is that these CPUs can deal with 6 DIMMs per channel in a way that’s remotely efficient.

      Like each CPU only has enough controllers to do a quad channel thing. So you’re only rocking 32 channels all around, but you need 192 64GB DIMMs to get to 24TB.

        • jts888
        • 3 years ago

        You’re off by a factor of 2 on either the slot count or DIMM sizes. 24 TiB = 384 * 64 GiB = 192 * 128 GiB.
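
        To see the factor of two, take the quad-channel, six-DIMMs-per-channel layout described above: 8 sockets x 24 slots = 192 slots. A minimal sketch:

        ```python
        # 8 sockets x 4 channels x 6 DIMMs per channel = 192 slots in the box.
        slots = 8 * 4 * 6

        print(slots * 64 / 1024)    # 12.0 TiB with 64 GiB DIMMs -- only halfway there
        print(slots * 128 / 1024)   # 24.0 TiB with 128 GiB DIMMs -- hence 192 x 128 GiB
        ```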

    • Mr Bill
    • 3 years ago

    [cough] Server Chip [/cough]

    • brucethemoose
    • 3 years ago

    Wow, 60MB of L3… That’s enough to hold relatively complicated programs entirely in cache.

      • UberGerbil
      • 3 years ago

      Yeah, the problem is the data. Which is why for many problem sets (aka “Big Data”), throughput becomes a limiting factor.

      • jts888
      • 3 years ago

      It’s not really safe to consider L3 pools uniform in the many-core era of today. While it’s more obvious to expect separate NUMA islands in MCM designs like Naples, even Intel has supported multiple “cluster on die” domains on its larger Xeons for the last few generations.

      Striping data across fewer L3 slices and allowing duplicated cache lines on a package/die trades lower hit rates for better latency, which can be highly advantageous in some workloads. As core counts continue to rise and inter-cache transfer times continue to grow, support for multiple NUMA islands per chip will increase as well.

        • chuckula
        • 3 years ago

        That’s not to mention the fact that caches in modern processors don’t behave anything like a standard contiguous memory buffer the way normal RAM does.

        The reason Intel made a big deal about the Crystalwell cache in Skylake Iris graphics parts having flat address modes is that these types of cache normally don’t act like that.

          • jts888
          • 3 years ago

          What are you trying to say exactly? That Crystalwell eDRAM has flat latency characteristics, or that it maps cleanly onto the processor’s physical address space?

          I know that Broadwell had the eDRAM hanging off the L3 and acting as an L4, whereas Skylake has it sitting between the system agent and the memory controllers as a sort of non-coherent high-bandwidth buffer, but I would interpret that more as a “transparency” benefit than a “flatness” one.

            • chuckula
            • 3 years ago

            For Crystalwell in Skylake it’s more the transparency, plus the bonus feature that a large portion of the on-chip L3 cache no longer has to be used as tags for an L4 victim cache (which is why Broadwell-C parts have smaller L3 caches than you would expect). It gets confusing because the eDRAM is just that: a memory chip that, at least in isolation, can be directly addressed just like any other DRAM device, which is of course a much different structure than internal CPU caches.

            For flat addressing, I got Crystalwell confused with the HMC memory in Knights Landing, which can be dynamically reconfigured to act as a cache or as flat addressable memory.

            • jts888
            • 3 years ago

            I think it would be NUMA snoop filter tables that could be removed in the new design, not tag/index entries, assuming it really still operates as an n-way cache of some sorts. Even though the new design would shunt more traffic through the SA (= more power and fractionally higher latency for local eDRAM access), it has to be a lot less wasteful in terms of NUMA overhead, where cache coherency with PCIe devices is the only possible beneficiary on a single socket platform anyway.

            As for your other point, I do expect emerging high bandwidth/same package memories to increasingly blur the lines between physical address space pools and local caches. I think Vega’s HBCC will almost certainly include a huge tag CAM, and though I’m rather surprised GP100 didn’t already try something similar, I strongly expect Volta will.

            • the
            • 3 years ago

            This is indeed one of the changes from Haswell/Broadwell to Skylake/Kaby Lake. The L4 eDRAM cache in Skylake is controlled by the memory controller directly and can only cache information from that NUMA node. This is very similar in concept to the Centaur L4 cache chip with POWER8.

            The implication here is that Intel may have at one point explored the idea of adding large L4 eDRAM caches to Xeons considering this change in cache topology.

            • Anonymous Coward
            • 3 years ago

            So “may have at one point explored” or perhaps is still working on it? Seems like there would be interested customers.

    • chuckula
    • 3 years ago

    Pfft…. nice try Intel.
    You won’t get a Ryze out of us with that thing.
