Intel’s Xeon D brings Broadwell to cloud, web services

Although Intel’s Xeon processors currently have an overwhelmingly dominant share of the market in servers and data-center applications, they stand to face competition from a host of new players in the coming months and years. Everyone from Applied Micro to Qualcomm has been making noise about providing ARM-based SoCs for servers, and most of them look to win business away from Intel by being more cost-effective and power-efficient, or by tailoring a product for a specific niche.

Intel has become very aware of this threat, and it has countered by releasing products like its Atom-based Avoton processors for low-cost and low-power applications. That’s all well and good, but the truly effective counter-punch hits today in the form of a new Xeon processor lineup dubbed Xeon D.

The Xeon D incorporates the very latest technology from Intel into a single package intended for blade servers and other dense configurations with a single processor per computing node. The product itself isn’t really one chip; it’s two pieces of silicon sharing a common package. Almost everything important is on the main die, with the exception of legacy I/O, and the package itself should contain nearly everything needed for a complete server node in a compact footprint. In keeping with this mission, Xeon D processors will sport TDP ratings from “under 20W” to 45W, well below where the larger Xeon EP chips top out.

The simplified block diagram above shows the Xeon D’s basic setup. The primary chip is fabricated using Intel’s world-beating 14-nm process with tri-gate transistors. Its eight CPU cores are based on the Broadwell microarchitecture, and each core can track and execute two threads thanks to Intel’s SMT implementation, known as Hyper-Threading. Compared to the prior-gen Haswell core, Broadwell generally achieves about 5.5% higher instruction throughput per clock cycle thanks to a combination of microarchitectural tweaks and new instructions.

Each core is associated with 1.5MB of last-level cache. This cache is shared across all cores, so the Xeon D effectively has a 12MB unified L3 cache. That cache sits in front of a dual-channel memory controller capable of supporting either DDR4-2133 or DDR3L-1600 type memories. The memory subsystem supports four DIMMs and up to 128GB of capacity via registered modules (or 64GB via unbuffered DIMMs or SO-DIMMs).
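For a quick sanity check on those figures, here's a back-of-the-envelope sketch in Python. It assumes the standard 64-bit (8-byte) data bus per DDR channel, which isn't stated above, and the function names are made up for illustration. The bandwidth numbers are theoretical peaks, not measurements.

```python
# Back-of-the-envelope math for the Xeon D's cache and memory subsystem.
# Assumption (not from Intel's disclosure): each DDR channel has a
# 64-bit (8-byte) data bus, as is standard for DDR3/DDR4 DIMMs.

CORES = 8
LLC_PER_CORE_MB = 1.5
CHANNELS = 2
BYTES_PER_TRANSFER = 8  # assumed 64-bit channel width


def total_llc_mb(cores=CORES, per_core_mb=LLC_PER_CORE_MB):
    """Total shared last-level cache in MB."""
    return cores * per_core_mb


def peak_bandwidth_gbs(mega_transfers_per_s, channels=CHANNELS):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    print(f"L3 cache: {total_llc_mb():.0f} MB")
    print(f"DDR4-2133: {peak_bandwidth_gbs(2133):.1f} GB/s peak")
    print(f"DDR3L-1600: {peak_bandwidth_gbs(1600):.1f} GB/s peak")
```

Under that assumption, the two channels work out to roughly 34 GB/s with DDR4-2133 and about 26 GB/s with DDR3L-1600.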

The Xeon D has a bunch of high-speed I/O bandwidth on tap, most of it courtesy of 24 lanes of Gen3 PCI Express connectivity. These lanes can be aggregated into two big links, one x16 and one x8, or broken down into various smaller configs, with the finest being six separate PCIe x4 connections. Also integrated on chip are two Intel 10Gbps Ethernet MACs.
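To put rough numbers on that PCIe bandwidth, here's a small Python sketch. It uses the published PCIe 3.0 signaling rate (8 GT/s per lane with 128b/130b encoding) and ignores protocol overhead, so treat the results as upper bounds. The helper name and the `CONFIGS` table are purely illustrative, and only the two lane splits named above are modeled.

```python
# Rough PCIe Gen3 bandwidth math for the Xeon D's 24 lanes.
# Known PCIe 3.0 parameters: 8 GT/s per lane with 128b/130b encoding.
# Protocol overhead (packet headers, flow control) is ignored, so real
# throughput will be lower than these figures.

GEN3_GT_PER_S = 8.0
ENCODING_EFFICIENCY = 128 / 130
TOTAL_LANES = 24


def link_bandwidth_gbs(lanes):
    """Peak one-direction bandwidth of a Gen3 link in GB/s."""
    return lanes * GEN3_GT_PER_S * ENCODING_EFFICIENCY / 8  # bits -> bytes


# The two lane configurations named in the article.
CONFIGS = {
    "x16 + x8": [16, 8],
    "six x4": [4] * 6,
}

if __name__ == "__main__":
    for name, links in CONFIGS.items():
        assert sum(links) == TOTAL_LANES
        per_link = ", ".join(
            f"x{w}: {link_bandwidth_gbs(w):.2f}" for w in links
        )
        print(f"{name} -> {per_link} (GB/s per direction)")
    print(f"All 24 lanes: {link_bandwidth_gbs(TOTAL_LANES):.2f} GB/s per direction")
```

Either way the lanes are split, the aggregate comes to a little under 24 GB/s in each direction.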

The main Broadwell D die shares a package with a separate south bridge chip that provides various forms of legacy I/O, including six SATA3 ports, eight lanes of PCIe Gen2, and USB. These slower external interfaces were likely easier to implement in a larger fabrication process.

In short, this thing is an 8C/16T processor with 12MB of L3 cache, 24 lanes of PCIe Gen3, and dual 10GigE links.

This chip isn’t just a re-purposed version of a mobile Broadwell part. The Xeon D includes a host of server-class features like ECC memory protection, and it incorporates innovations from Haswell-EP like per-core power states that don’t always make sense in client workloads. Since this is new silicon, the TSX erratum has been corrected in its CPU cores, and the product fully supports TSX extensions for production use. The Broadwell architecture has a few other benefits over Haswell in this context. Among them are further reduced latencies for switching between virtual machines.
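If you want to see whether a given box actually advertises TSX, one option on Linux is to look at the CPU feature flags in /proc/cpuinfo, where "rtm" and "hle" correspond to the two TSX interfaces. The sketch below is a minimal, Linux-only example; the helper names are my own, and a kernel or microcode update that disables TSX may hide these flags even on capable silicon.

```python
# Check whether a Linux system advertises TSX support.
# On x86 Linux, /proc/cpuinfo lists CPU feature flags; "rtm" and "hle"
# correspond to the two TSX interfaces. The helper names below are
# illustrative, not part of any library.


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def tsx_support(cpuinfo_text):
    """Map each TSX interface to True/False based on advertised flags."""
    flags = cpu_flags(cpuinfo_text)
    return {"rtm": "rtm" in flags, "hle": "hle" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(tsx_support(f.read()))
    except FileNotFoundError:
        print("No /proc/cpuinfo here; this check is Linux-only.")
```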

On the power management front, the Xeon D carries over big-ticket items from Haswell-EP like integrated voltage regulation and energy-efficient turbo. (In this latter feature, the CPU monitors the effectiveness of increased clock speeds using stall counters in the core. When added frequency doesn’t help reduce stalls, the CPU will reduce core clocks and shift the power budget elsewhere to improve overall performance.)
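To make the idea concrete, here's a toy Python model of that kind of decision. To be clear, this is not Intel's algorithm, which runs in hardware and firmware and hasn't been published: the stall threshold, the 100MHz step, and the function name are all invented for illustration, and the default clock limits are simply borrowed from the eight-core part's 2.0GHz base and 2.6GHz peak.

```python
# Toy model of an "energy-efficient turbo" style decision.
# Everything here is hypothetical: the threshold, the step size, and the
# function name are invented for illustration. Intel's real algorithm
# runs in hardware/firmware and is not public.

STALL_THRESHOLD = 0.5   # assumed: above this, the core is mostly waiting
STEP_MHZ = 100          # assumed frequency step


def next_frequency_mhz(current_mhz, stall_cycles, total_cycles,
                       base_mhz=2000, max_turbo_mhz=2600):
    """Pick the next core clock from the fraction of stalled cycles."""
    if total_cycles <= 0:
        return current_mhz  # no sample yet; leave the clock alone
    stall_fraction = stall_cycles / total_cycles
    if stall_fraction > STALL_THRESHOLD:
        # Extra frequency is mostly wasted on stalls: back off toward base
        # and free up power budget for other parts of the chip.
        return max(base_mhz, current_mhz - STEP_MHZ)
    # The core is making good use of its cycles: allow more turbo.
    return min(max_turbo_mhz, current_mhz + STEP_MHZ)
```

A core that spends 80% of its cycles stalled at 2.6GHz would step down to 2.5GHz in this model, while a core with few stalls would climb back toward its turbo peak.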

This chip adds one more power-related feature to the mix, something blandly named “hardware power management.” This optional feature allows the hardware to make decisions about which P-states and C-states the system should be using rather than taking its cues from the operating system. I don’t have many details about this mechanism yet, but I suspect it borrows from the Power Optimizer work Intel has done for its mobile Core products. If so, the implementation is surely tuned differently for server-class deployments.

I have to say, the Xeon D looks like the future of the Xeon lineup to me, the product poised to ship in the largest numbers overall once the market comes to understand it. Big cores like Broadwell are usually most power-efficient when operating at the lower end of their possible voltage and frequency ranges, and Intel’s latest process tech innovations have offered their biggest benefits at lower voltages. Dual-socket servers can run higher clock speeds, but those speeds come at less efficient operating points. 2P systems also tend to burn lots of power on their socket-to-socket interconnects.

The Xeon D should be excellent for providers of cloud and web services. I’d expect firms like Google and Facebook to snatch them up quickly. Intel also points to applications like web caching, storage, and networking as key for this product. The chipmaker expects the big data, HPC, and enterprise markets to stick with the Xeon EP. I suppose that makes sense, but I expect the Xeon D to be powerfully appealing in any case where a single application doesn’t require more memory or compute power than a Xeon D can supply in a single node. That kind of makes 2P the new 4P, if you follow my meaning.

Of course, Intel wins something else by introducing this product. The market climate just grew quite a bit more hostile for ARM-based server SoCs, which will have to justify themselves against a much more formidable x86-based incumbent.

We don’t yet have full pricing and specs on the various Xeon D models Intel will offer. We do know that the Xeon D-1540 will have eight cores with a base clock of 2.0GHz, an all-core Turbo peak of 2.5GHz, and a single-core Turbo peak of 2.6GHz. Meanwhile, the Xeon D-1520 will feature four cores with a 2.2GHz base frequency, 2.5GHz all-core Turbo, and a 2.6GHz single-core peak. Both chips should be available this month.

Intel has provided us with a few preliminary benchmark results for the Xeon D compared to Avoton. They show the Xeon D to be as much as 3.4x faster with up to 1.7x higher performance per watt. However, those numbers are based on pre-production hardware and look kind of shaky. I suspect we’ll see better numbers published in the coming weeks.
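One thing those two figures imply, if they describe the same workload, is how much more power the Xeon D draws than Avoton. That's an assumption on my part, since "as much as" and "up to" could refer to different tests. The arithmetic is trivial, but here it is as a Python sketch with a made-up function name:

```python
# Implied power ratio from Intel's preliminary Xeon D vs. Avoton claims.
# Assumption: the 3.4x and 1.7x figures come from the same workload, which
# the "as much as" / "up to" wording does not guarantee.


def implied_power_ratio(speedup, perf_per_watt_gain):
    """Power ratio implied by perf = (perf/watt) * watts."""
    return speedup / perf_per_watt_gain


if __name__ == "__main__":
    ratio = implied_power_ratio(3.4, 1.7)
    print(f"Implied power draw vs. Avoton: {ratio:.1f}x")
```

Under that assumption, the Xeon D would be burning about twice the power of the Avoton system while delivering 3.4 times the performance.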

    • NeelyCam
    • 5 years ago

    [quote<]The Xeon D should be excellent for providers of cloud and web services. I'd expect firms like Google and Facebook to snatch them up quickly. [/quote<] [url=http://www.theregister.co.uk/2015/03/10/facebook_open_compute_yosemite/<]Facebook[/url<] already announced that they are using Xeon D on their new Open Compute platform.

    • raddude9
    • 5 years ago

    What about Avoton?

    Is there any official word about what is going to happen to the line of chips formerly known as Avoton, i.e. the Atom C2750 and its kin? They were some interesting chips, but a bit expensive-looking, and they didn’t seem to gain much traction, so I imagine they are destined for end-of-life status in view of these new Xeon SoCs. But anybody got any more info?

      • NeelyCam
      • 5 years ago

      A while ago Intel was talking about Denverton as the follow-up for Avoton. It’s still listed [url=http://www.intel.com/content/dam/www/public/us/en/documents/roadmaps/public-roadmap-article.pdf<]here[/url<] on page 12. Motley Fool's Ashraf Eassa talks about Denverton a bit [url=http://m.fool.com/investing/general/2015/03/10/intel-corporations-xeon-d-gets-official-looks-comp<]here[/url<] and [url=http://m.fool.com/investing/general/2015/03/10/applied-micros-x-gene-is-no-match-for-intel-corpor<]here[/url<]. Although there hasn't been much info from Intel about Denverton lately, he still believes it's coming.

        • WillBach
        • 5 years ago

        Charlie says that Intel confirms that Denverton is the custom Atom server SoC that’s yet to be released semiaccurate.com/2015/03/09/intel-re-enters-1s-server-market-xeon-d-1500-line/

    • terminalrecluse
    • 5 years ago

    There’s so much win here. I’m looking forward to building my home vm setup around this platform.

    • rika13
    • 5 years ago

    I can’t wait to see this monster in 2011 v3 flavor. AMD gonna get rekt.

    • cygnus1
    • 5 years ago

    I think I see these ending up in a lot of higher end NAS units. And I’m definitely looking forward to rebuilding my home servers with these. 10g Ethernet is on the way.

    • the
    • 5 years ago

    The people wondering what Intel could do with die space similar to consumer chips but without integrated graphics now have an answer. These Xeon D chips look very nice on paper. Just need a motherboard maker to come out and produce a version for more mainstream consumption. It’d sit nicely between socket 1150 and socket 2011 boards. Granted these chips are all BGA parts but depending on price, this could be forgivable.

      • esc_in_ks
      • 5 years ago

      I was trying to figure out whether these are socketed. Where did you see that they are BGA only? The ARK page doesn’t have that information and I couldn’t find it in an article. But, being BGA only was what I figured anyway.

      I’ve not been thrilled with the Avoton boards that have come out in terms of the price-to-performance ratio. Here’s hoping these are a bit more effective.

        • chuckula
        • 5 years ago

        They aren’t socketed. These chips are designed for very low profile motherboards that go into blades and other very-high density servers. You don’t swap out a CPU, you swap out a whole module at a time.

        You can see some photos here: [url<]http://anandtech.com/show/9070/intel-xeon-d-launched-14nm-broadwell-soc-for-enterprise[/url<]

    • yuhong
    • 5 years ago

    Can you confirm that Intel has changed their position on 8Gbit DDR3 and 16GB unbuffered DDR3 DIMMs?

    • balanarahul
    • 5 years ago

    Boring. Boring. Boring. Boring.

    [quote<]This feature allows the hardware to make decisions about which P-states and C-states the system should be using rather than taking its cues from the operating system. [/quote<] Now this is interesting. Details please.

    • UberGerbil
    • 5 years ago

    And coincidentally, today [url=http://anandtech.com/show/8357/exploring-the-low-end-and-micro-server-platforms<]AT published an interesting comparison[/url<] of "microservers" including benchmarks of the (64bit ARMv8-based) XGene1. Spoiler alert: the ARM camp has some work to do. I came away impressed by HP's "Moonshot" chassis, however. And HP hasn't done much to impress me in any product category for quite a while.

      • NeelyCam
      • 5 years ago

      And those comparisons were done with older 22nm Avoton and non-Broadwell Xeons. The 14nm ones are even more efficient. XGene1 was 40nm, though, so not really a fair comparison..

        • chuckula
        • 5 years ago

        Bear in mind that X-gene is optimistically promising a 50% power efficiency improvement for their next generation part. Assuming that’s true.. even their own predictions put the next generation part well behind even the Atoms at performance/watt.

        If these Xgene guys can’t even get a 28nm part that’s basically just an oversized cellphone chip to market in 2015, that says more about the rather lackluster state of the so-called ARM server market than it does about Intel “cheating” with a 22nm process that they’ve had in full production since 2012….

          • NeelyCam
          • 5 years ago

          Eventually, ARM-based server parts will be available at 14nm, once TSMC gets the process going (or once Samsung surprises everyone and announces a 14nm microserver chip built on their own process).

          It’s not about Intel “cheating” – they are ahead because of the process advantage, and because they’ve been doing server chips much longer than ARM entrants – e.g., AnandTech’s article points out the sophistication of Intel’s power management schemes as one reason why those chips are more efficient.

          But in 2-3 years, ARM-based server chips will be built no more than one process node behind Intel, and most of the low-hanging power-management fruit will have been picked at that point. Once the ARM chips are close to Intel’s in efficiency, the question becomes that of ecosystem. Intel has a couple of years to strengthen theirs, in order to prevent ARM from getting a foothold in the market.

          Personally, I think [url=http://www.fudzilla.com/news/processors/37184-amd-hopes-15-of-servers-will-be-arm-based<]AMD[/url<] has the best chance to make a competitive ARM server chip. They have server expertise, partnerships and brand recognition in server space, and have been playing in the performance end of the spectrum much longer than any of the other ARM licensees.

        • WillBach
        • 5 years ago

        That, and a lot depends on the changes going from V1 to V2 to V3 for the XGene. V1 wasn’t faster than Cortex-A15, so if they thought they could go with their own architecture, they’re either screwed or not showing their full hand. The unexpected wrinkles for them are Intel arriving more seriously in the micro server space than anticipated with the Broadwell server SoCs, so they can’t only beat Atom, and ARM upping their architecture game, so that their custom cores may not be as competitive as previously projected.

    • UberGerbil
    • 5 years ago

    Oh, hey look, [url=http://ark.intel.com/compare/87039,87038<]specs on ARK[/url<].

      • raddude9
      • 5 years ago

      All of the specs except for the most important: price!
      Anybody got any indications as to how much this is going to cost?

        • yuhong
        • 5 years ago

        And the specs have errors too (no mention of DDR3). Hopefully they will be corrected.

        • chuckula
        • 5 years ago

        I found this chart:
        [url<]http://www.theplatform.net/wp-content/uploads/2015/03/intel-xeon-d-vs-e5-e3-table.jpg[/url<] An 8-core for $581 and a 4-core for $199. By server-world prices those are extremely reasonable and the CPU will only take up a comparatively small fraction of the total system cost*. However, the CPU price by itself doesn't mean much. These are high-density parts that are sold with motherboards (or in complete server packages). *For example... filling those servers with 128GB of DDR3 or (especially) DDR4 could be more than the price of the CPU.

          • blastdoor
          • 5 years ago

          wow, 8 core for $581 really is good!

            • smilingcrow
            • 5 years ago

            Xeon E5-2630 V3 has a tray price of $667 and works in dual socket systems.
            Released Q3/14
            8/16 C/T 2.4 – 3.2GHz 85W

            [url<]http://ark.intel.com/products/83356/Intel-Xeon-Processor-E5-2630-v3-20M-Cache-2_40-GHz[/url<]

            • chuckula
            • 5 years ago

            Note that the Xeon D parts also include the southbridge in that price and the southbridge consumption is accounted for in the 45 watt TDP number. It’s a rather nice southbridge that includes two 10Gbit ethernet PHYs.

            • smilingcrow
            • 5 years ago

            Very different platforms, just pointing out that you can already get 8 cores for a lot less than the Xtreme edition $1k Haswell-E for desktops.

    • NTMBK
    • 5 years ago

    Smart product. Intel cut the legs out from under the ARM microservers before they even made it to market.

      • UberGerbil
      • 5 years ago

      And yet ARM microservers have been under development for over 5 years: Calxeda (then known as Smooth-Stone) uncloaked in 2010 IIRC, and was making big claims for years after that. Obviously they were premature (that was well before ARM had a 64bit ISA, let alone working IP blocks or silicon) but it’s clear this market isn’t as ripe for the picking as all the folks backing ARM seemed to think it was. And now Intel has just put a much deeper moat around it.

        • blastdoor
        • 5 years ago

        I wonder if maybe it’s a bad idea to announce products 5 years before they are ready to ship in a mature state.

          • UberGerbil
          • 5 years ago

          Technically they didn’t announce a [i<]product[/i<], just the category they were going to (try to) compete in. But when you announce you're attacking an already-existing category with a variation of an already-existing architecture, you're right: you're pretty much putting your cards on the table. I'm absolutely certain they thought they would have an actual product in less than five years, just like I'm certain they didn't expect to be "restructuring" the company before they shipped anything.

      • jihadjoe
      • 5 years ago

      But do the people want Intel’s D?

    • terminalrecluse
    • 5 years ago

    The fully working TSX is huge! Atomic access to data in memory without copying or locking will make multithreaded applications a lot better and probably easier to write and manage. Very, very exciting.

      • yuhong
      • 5 years ago

      They actually released the new Broadwell stepping some time ago.

      • WillBach
      • 5 years ago

      Kind of pedantic but borne from a love and excessive (for my current job) knowledge of the workings of HLE/TSX: there are still situations where data will be copied as it would have been traditionally, just without locking. If a line of memory is in Core 0’s L1 or L2 data cache (D$) and Core 1 accesses it with HLE or TSX the data line will be evicted from Core 0’s L1 and L2 D$ (but remain accessible in the shared L3 D$). If Core 0 uses that data again before Core 1 finishes (or even after) it’ll be copied back up to L1 D$. So there is *some* copying, just potentially without the overhead of traditional locks.
      Just to clarify for the sake of others reading this 🙂

        • UberGerbil
        • 5 years ago

        But no copying in main memory, which is the really expensive kind. And no* locking, which can be orders of magnitude more expensive in the worst case. (And while you’re right, it is kind of pedantic nitpicking, the cache behavior of TSX is where all the interesting implementation lives so insights from people actually using it are certainly worth reading)

        * up to the transaction limits of the architecture, which seem pretty low at this point.

    • chuckula
    • 5 years ago

    Bear in mind that the closest competition to these chips from anybody is probably AMD… they are hoping to have 35 watt TDP ARM chips available some time later this year…

    Who wants to place bets on the relative performance and power efficiency between 8 Broadwell cores and 8 [s<]overclocked smartphone cores[/s<] A57 cores?

      • blastdoor
      • 5 years ago

      I think Broadwell would be the safer bet. I’m skeptical that anybody can beat Intel using off-the-shelf generic ARM cores.

      If AMD survives, maybe they can compete with the custom ARM cores they’re working on. I won’t be holding my breath, though.

      • maxxcool
      • 5 years ago

      Power efficiency ? slight edge to Arm. using off the shelf code… huge advantage to Intel.

      • NeelyCam
      • 5 years ago

      Depends on the process.. if AMD A57 cores are on 14nm, it’s close. If not, it’s not.

        • chuckula
        • 5 years ago

        Uh.. the A57 cores at reasonable clock speeds on Intel’s 14nm process* might consume somewhat less power at peak load.

        However, as we’ve clearly seen from the A57 architecture, the performance levels, the performance-per-watt numbers massively favor Broadwell.

        * Yeah, I mean a *real* 14nm process, not a powerpoint marketing slide from Samsung because they finally got finfets working on a limited set of smartphones that cost more than most ultrabooks…

          • NeelyCam
          • 5 years ago

          Define a “real 14nm process”…? Those Samsung finfets will start yielding sometime soon, and in the server market it doesn’t really matter that TSMC/Samsung 14nm area scaling might not exactly match Intel’s.

          [quote<]However, as we've clearly seen from the A57 architecture, the performance levels, the performance-per-watt numbers massively favor Broadwell. [/quote<] Link? Are you referring to Exynos? AnandTech said that Exynos had power management issues. And where have you seen an A57 vs. Broadwell performance-per-watt comparison?
