Intel unveils native eight-core Nehalem-EX

Alas, poor Itanium; it may not be long for this world. At the very least, one would think so after hearing what Intel said today about Nehalem-EX, its upcoming (and fastest yet) multi-socket x86 processor.

Nehalem-EX is based on the same underlying technology as the Core i7 and the Xeon 5500 series, but it’s supercharged in almost every way. Intel has outfitted it with eight cores, 16 threads, 24MB of shared cache, four QuickPath links, four memory channels, and support for up to 16 memory modules per socket (with the help of external Intel Scalable Memory Buffer chips). Most impressively, Intel has crammed all of the above into a single, massive die made up of 2.3 billion transistors. Yikes.

In a four-processor server, the chip’s four QuickPath links allow it to talk directly to each of the other three CPUs. The quadruple QuickPath links also enable eight-way configs with a whopping 128 threads (that’s eight sockets times eight cores with two threads per core, thanks to the magic of Hyper-Threading).
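
For the curious, the thread count falls straight out of those numbers. Here’s a trivial sketch in plain C (nothing vendor-specific, just the socket, core, and thread figures above) that multiplies them out:

    /* Back-of-the-envelope thread math using only the figures above;
     * prints 32, 64, and 128 logical CPUs for 2-, 4-, and 8-socket boxes. */
    #include <stdio.h>

    int main(void)
    {
        const int cores_per_socket = 8;
        const int threads_per_core = 2;   /* Hyper-Threading */

        for (int sockets = 2; sockets <= 8; sockets *= 2)
            printf("%d sockets -> %3d logical CPUs\n",
                   sockets, sockets * cores_per_socket * threads_per_core);
        return 0;
    }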

And that’s not all. Intel says Nehalem-EX has even inherited some key reliability features from its Itanium cousins:

Nehalem-EX will add new reliability, availability and serviceability (RAS) features traditionally found in the company’s Intel® Itanium processor family, such as Machine Check Architecture (MCA) Recovery. Together with new levels of performance, both high-end processors should speed the move away from more expensive, proprietary RISC-processor based systems.

Incidentally, Intel’s presentation included some numbers showing how Nehalem-EX stacks up against the company’s previous multi-socket x86 processors, the six-core Xeon 7400 series (a.k.a. Dunnington). According to the company’s internal tests, Nehalem-EX delivers up to nine times the memory bandwidth, 2.5 times the database performance, 1.7 times the integer throughput, and 2.2 times the floating-point throughput.

So, when can you get some Nehalem-EX systems delivered to your server farm? Intel says the CPU will enter production "in the second half of 2009," with more than 15 eight-socket designs coming from eight different server makers.

Comments closed
    • ish718
    • 10 years ago

    Edit: Reply to #97

      • UberGerbil
      • 10 years ago

      Current widely-used 64bit OSes aren’t even making use of the full 64bit address space (the AMD and Intel programming guides talk about “canonical” addresses for virtual memory, which are 48 bits — though that can be expanded when necessary), nor is the hardware equipped to handle it (current 64bit CPUs are designed for various amounts of physical memory — 36bits, 40bits, 52bits — but none of them go all the way to 64bits). Even big iron — with a few fairly experimental exceptions — makes do with “mere” 64bit OSes (full 64bit virtual and physical address space) without running into painful limits. Back in the PC world, we’ve only recently started bumping into the limits of 32bits, and 64bits isn’t twice as much as 32bits, it’s ~4 billion times as much. Even Microsoft can’t bloat software that much in a decade.
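
      A minimal sketch of that canonical-address rule, assuming the 48-bit implementation width those guides describe (every bit above bit 47 must be a copy of bit 47):

          /* Sketch of the x86-64 "canonical address" check described above,
           * assuming 48 implemented virtual-address bits: bits 63..47, taken
           * together, must be all zeros or all ones. */
          #include <stdbool.h>
          #include <stdint.h>

          static bool is_canonical_48(uint64_t va)
          {
              uint64_t top = va >> 47;              /* bit 47 and everything above it */
              return top == 0 || top == 0x1FFFF;    /* 17 zeros or 17 ones */
          }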

      That’s address space, which is what most reasonable people would mean by the “bittedness” of an OS. There’s also register size. Current CPUs operate on 128bit chunks of data today using the XMM registers with SSE; that will be expanded to 256bits with SSE5. And Larrabee will operate on 512bits at a time. But all of those processors are still 64bit, and don’t require anything more than a 64bit OS (in fact, SSE code today is operating on 128bit chunks in a 32bit OS).
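
      As a small illustration of that last point, SSE code operates on 128-bit chunks regardless of whether the build (or the OS underneath it) is 32-bit or 64-bit; a minimal sketch using the baseline SSE intrinsics:

          /* 128-bit SSE operation; compiles and runs the same in a 32-bit or
           * 64-bit build, since register width and address width are independent. */
          #include <stdio.h>
          #include <xmmintrin.h>

          int main(void)
          {
              __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
              __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
              __m128 sum = _mm_add_ps(a, b);   /* four single-precision adds at once */

              float out[4];
              _mm_storeu_ps(out, sum);
              printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
              return 0;
          }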

      In any case, I don’t see the connection between the increase in core count from 4 to 8 and speculation about future OSes with larger address space.

    • cybot_x1024
    • 10 years ago

    At this rate we may start seeing 128bit operating systems popping up

      • UberGerbil
      • 10 years ago

      Why would you think so?

      • swaaye
      • 10 years ago

      You see 16 exabytes of RAM as a limitation we’re fast approaching? 😉

    • travbrad
    • 10 years ago

    And it still can’t run Crysis 😉

      • DrDillyBar
      • 10 years ago

      Maybe with the DirectX 10 software rasterizer.

        • Meadows
        • 10 years ago

        At 9 fps with medium settings.

        • ish718
        • 10 years ago

        +1
        Meadows beat me to it.

    • maroon1
    • 10 years ago

    24MB L3 cache and four memory channels!!

    It looks to me like this update is much more than adding four cores.

    • Meadows
    • 10 years ago

    The last picture shows why we don’t use glossy screens.

      • MadManOriginal
      • 10 years ago

      It shows why you don’t place an LCD, regardless of screen coating, directly opposite a window. A matte LCD would just show fuzzy washed-out areas instead of washed-out areas that are sharp reflections.

        • Meadows
        • 10 years ago

        Sharp reflections are still worse, as far as your focus is concerned.

          • MadManOriginal
          • 10 years ago

          No, they’re not; in fact, it can be the opposite. Glossy allows your plane of focus to be on the image itself rather than on the reflection or the antiglare coating. It’s not something you can ever tell just by looking at pictures of monitors, though. Maybe some people lack this ability.

          In any case, it just shows a poorly set up LCD, as I first said, not the anti-gloss FUD you like to spread. An antiglare screen in the same picture would just have fuzzy reflections on its antiglare coating rather than sharp ones; there would still be something covering the image, so the screen coating isn’t the issue. It’s putting an LCD, any LCD, directly opposite a window.

          It’s just personal preference; neither one is better in absolute terms.

            • Nomgle
            • 10 years ago

            Are you really trying to say you prefer the bright, sharp, ULTRA-distracting reflections you see on a glossy screen to the dim, unfocused, barely-visible bright patches you see when light shines on a matte screen?

            Seriously?

            Meadows is absolutely right – reflections *[

            • khands
            • 10 years ago

            For me, it depends on how bad it is. I can almost subconsciously filter out the pure reflections of glossy screens unless they’re incredibly glaring (like in the picture), but I can’t do the same with matte; I have to think about it. Maybe it’s just because I’m way too used to glossy.

            • Meadows
            • 10 years ago

            What you say is like pressing your face against a mirror and “filtering out” your face.

            • Mithent
            • 10 years ago

            I do agree with you, actually; I used a glossy monitor for several years and never really noticed reflections (they were there, but it was hardly mirror-like), but when I switched to a matte monitor, sunlight would cast a haze over the whole screen that was quite hard to see through.

        • ImSpartacus
        • 10 years ago

        At least your eye isn’t drawn to that near perfect reflection. Matte FTW.

      • shaq_mobile
      • 10 years ago

      I SPOTTED SOMEONE’S NOSE!!!

    • dneonu
    • 10 years ago

    Nice and awesome CPU by Intel. However, that monster CPU will not be able to fully flex its muscles until fitted with an SSD. HDDs are simply too much of a bottleneck on high-end systems, limiting the other high-end components. Remember, a computer is only as fast as its slowest component, and the slowest RAM, video card, and CPU are all much faster than the fastest HDD.

      • I.S.T.
      • 10 years ago

      You’re overstating your point by a lot… Plus, you forget that many tasks exist that are always hungry for CPU power but don’t need much HD power (so to speak).

      It’s not as black and white as that.

      • tfp
      • 10 years ago

      What you are saying is all app-dependent. Unless there are heavy reads/writes to/from a disk, faster hard drives will not improve system/application performance by as much as some people think. As long as a system has enough RAM, there will not be huge HD usage.

      • wibeasley
      • 10 years ago

      When you flex your muscles, who is your favorite SSD manufacturer?

      • esterhasz
      • 10 years ago

      In many web server workloads, all data (or the data that accounts for nearly all the traffic) is basically cached in system memory…

        • UberGerbil
        • 10 years ago

        Yes, and some computationally-intensive HPC loads have relatively small datasets also. Some are even small enough to fit into the 24MB of on-die cache. And even when the datasets are large, it’s quite possible for most of the work to still be using what is in main memory, so that the overflow of data streaming in and out is well inside the capabilities of the storage subsystem.

        So, no, an all-solid-state storage system isn’t necessary for the CPU to “fully flex” its capabilities under all circumstances.

      • maxxcool
      • 10 years ago

      After I stuff a terabyte of RAM in it, drives won’t matter much…

      • SomeOtherGeek
      • 10 years ago

      Yeah, if the company knows what it is doing, there will be no bottlenecks.

      You remember Jurassic Park, the movie? It was made entirely in CPU and RAM, no HDD, because they knew they couldn’t make smooth animation using an HDD. Go look it up and see how they did it; it is pretty impressive. They basically revolutionized the industry of computer animation.

    • OneArmedScissor
    • 10 years ago

    Poor AMD, waiting on their 6 core Opteron. 🙁

    The tables sure have turned quickly in the server market. Competition is great and all, but this is more like murder.

      • sydbot
      • 10 years ago

      AMD should have their 12-core chips in the same time frame. They, uh, could be competitive. You know, might be. Per dollar.

        • OneArmedScissor
        • 10 years ago

        Might be, if they slash their prices and eke by with even lower profit margins, lol…

        They only have themselves to blame here, though. They really messed up not going with hyper threading. It’s almost all of the difference between the two.

          • UberGerbil
          • 10 years ago

          Nehalem-EX has higher IPC, more memory bandwidth, more interconnect bandwidth, and a larger cache. It’s not just SMT.

            • OneArmedScissor
            • 10 years ago

            I mean that if you disable hyper threading on a Nehalem, at the same price points, they’re pretty even, even if the Nehalems are running lower clock speeds in such a case.

      • mototime
      • 10 years ago

      You will see their six-core Opterons released on June 1. They work quite well.

        • dpaus
        • 10 years ago

        June 1st? Well, that explains the timing of this announcement, then 🙂

      • Anonymous Coward
      • 10 years ago

      AMD has lots of profitable space to play in below Intel’s 8-core wonder. There is no doubt that AMD can sell 6 cores at a nice profit, even if they end up pushing their quad-core Opterons into the low-end bracket.

        • Hattig
        • 10 years ago

        I imagine that AMD will be really pumping out the Istanbuls, indeed they’ve probably been stockpiling them for a while ready for this release. The quad-cores are surely going to get totally replaced fairly rapidly, at least in terms of being manufactured. Also surely it’s only a matter of time until there is a Phenom II X6…

        This octocore (with big lips) CPU is just a beast though. Alas poor Itanium, we never liked you anyway.

          • Anonymous Coward
          • 10 years ago

          I think 6 cores is the most sensible thing AMD has done since K8.

          Dunno about Phenom X6. Quad core is already a bit over the top, I think. If I were running AMD, I think I would have transitioned to native three cores for the high-end desktop and brought down higher core counts from six-core Opteron dies, for people who render, or who are insane.

          65nm would definitely have been triple-core only, if I ran AMD.

            • Hattig
            • 10 years ago

            From the GlobalFoundries article at BSN yesterday I saw that they’re now 100% 45nm in their Dresden fab, so I presume that they’re making (or soon will be making) native 6-cores (Opteron), quad-cores (Phenom II X4 and X3) and dual-cores. The latter could be interesting in terms of overclocking. I think they’re also making the quads and duals in native no-L3 variants (rather than binning the ones with duff L3). There must be enough single-core defects to make the X3 variants from the X4s without needing to make a native tri-core (although I think that would be a viable ongoing replacement for the dual-core product line beyond 45nm).

            • Anonymous Coward
            • 10 years ago

            I wonder how much it saves to make a 45nm X2 vs a 65nm X2. Probably most of the cost is in testing and packaging.

            • OneArmedScissor
            • 10 years ago

            You are probably right, but they end up with something like twice as many of them, so it depends on how you want to look at it. It undoubtedly helps them a lot with their core-disabling business.

            I hope the non-L3 versions are being made natively that way. That could potentially cut close to 1/3 of the price, as the L3 is a huge chunk of the chip.

            I don’t think they’re doing native dual cores, at least, not yet. That’s a bit odd, since I figured the laptop market would justify it, but AMD doesn’t seem particularly interested in going anywhere there. I guess it’s low on the list of priorities, as their chipsets walk all over Intel’s as it is, and in reality, it’s probably not going to save any battery life or money going from a 65nm Athlon 64 X2 to a 45nm Phenom II.

            • sydbot
            • 10 years ago

            What I want to know is if they will make an actual 45nm X2, or just salvage X4 dice.

    • toxent
    • 10 years ago

    Virtualization on a server with these will be ridiculously good.

    • Shinare
    • 10 years ago

    I wonder how well it folds…. Would there be any reason to go this route over some GTX295’s on a single i7 mobo?

    • Ricardo Dawkins
    • 10 years ago

    I didn’t know that was a task manager window. Looked like some type of window in an office.

    • Krogoth
    • 10 years ago

    I’d rather wait for Westmere-generation parts to come.

    • Palek
    • 10 years ago

    I wonder if the default Task Manager shipping with Windows can actually recognize this many cores, or will it freak out and throw some “overwhelmingly powerful computer exception” error?

    Edit: grr, meant to reply to #8.

    • blastdoor
    • 10 years ago

    “proprietary RISC-processor based systems”

    Not a lot of those left these days. I guess they have to be talking about POWER, or do people still buy SPARC systems?

    Anybody know how this compares to POWER?

    • wibeasley
    • 10 years ago

    This question is inspired by UG’s #16 post: Suppose you have a trivially parallel task that is compute-bound. Does anyone know if OMP scales well to 128 threads? Or is it likely you’d have to do something in addition to OMP?

    I assume it still would be more cost-efficient for typical HPC centers to get 4 times as many 2-socket nodes (per one of these 8-socket nodes) and then rely on something like MPI. But I’d love to hear a more educated opinion.
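
    For what it’s worth, the kind of loop being asked about looks roughly like the sketch below. The pragma itself is the easy part; whether it actually scales to 128 threads depends on the runtime, thread pinning, and NUMA placement rather than on OpenMP syntax (the sin() call is just a stand-in workload):

        /* Minimal OpenMP sketch of a trivially parallel, compute-bound loop.
         * Build with something like `gcc -O2 -fopenmp scale.c -lm` and set
         * OMP_NUM_THREADS to taste. */
        #include <math.h>
        #include <omp.h>
        #include <stdio.h>

        int main(void)
        {
            const long n = 100000000L;
            double sum = 0.0;

            #pragma omp parallel for reduction(+:sum) schedule(static)
            for (long i = 0; i < n; i++)
                sum += sin(i * 1e-7);   /* stand-in for real per-element work */

            printf("max threads = %d, sum = %f\n", omp_get_max_threads(), sum);
            return 0;
        }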

    • A_Pickle
    • 10 years ago

    Nom nom nom. I want a desktop variant of this?

      • derFunkenstein
      • 10 years ago

      Not sure what you’d do with it. I guess render boxes or media encoding boxes could benefit, but those are workstations, not desktops.

        • burntham77
        • 10 years ago

        While I love my Phenom x4, and I can make those four cores dance, I have yet to think, “I could use a couple more cores.”

        Not yet.

      • Krogoth
      • 10 years ago

      This chip is meant to be server-only. I suppose you could get it for a desktop (LGA1366 only), but only if you are willing to swallow the “steep” premium that server-grade chips command.

      It would be wiser to wait for the Westmere generation to come out (32nm, six cores + HT, more architecture tweaks, etc.).

        • yuhong
        • 10 years ago

        Though to be honest, building a 16-core system with two of these chips would still be cheaper than building a 32-core system with four of them. But I wonder how much this chip costs compared to two quad-core Nehalem-EP chips.

    • ssidbroadcast
    • 10 years ago

    Question: “Intel Scalable Memory Buffer” chips? Is this going to be another FB-DIMM situation or???

      • UberGerbil
      • 10 years ago

      Yes, it’s buffered memory. That’s pretty standard and expected in this segment when you’re dealing with memory in these quantities.

      Intel’s mistake in the Core 2 era wasn’t buffered memory in itself, it was requiring buffered memory even for their low-end Xeons. That made 2S workstations/servers/blades unnecessarily expensive, hot, and power-hungry. They didn’t make that mistake this time around: the 2S Nehalem Xeons (Gainestown) use regular DDR3.

        • derFunkenstein
        • 10 years ago

        Is it really regular DDR3? I thought Nehalem Xeons had some setting or other requirement in their integrated memory controller that required ECC or registered memory. I can’t find any supporting evidence, though.

          • georgeou
          • 10 years ago

          Just about all servers require ECC at a very minimum. This is true of Intel or AMD.

            • derFunkenstein
            • 10 years ago

            No, I’m pretty sure Opterons don’t REQUIRE ECC, though it’s a good idea in a machine that is supposed to have uptime measured in weeks or months.

            • georgeou
            • 10 years ago

            I’m not sure if dual-socket Opterons can technically operate with plain old desktop memory, but they generally use registered DDR2-667 or DDR2-800 with ECC. However, the Opteron systems are limited to 8 DIMMs per CPU, and if you go past 4 DIMMs per CPU, your memory gets bumped down a notch. So DDR2-800 would operate at DDR2-667, and DDR2-667 would be bumped down to DDR2-533.

            With the special buffer chips that Intel is using (one buffer per four DIMMs), they’ll be able to put 16 DDR3 DIMMs per CPU at full speed. The IBM 4P servers already use this architecture; the power consumption is lower than FB-DIMM, and you don’t need to spend as much money on memory. Furthermore, each Nehalem-EX CPU will operate as fast as or faster than two Shanghai processors.
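
            Taking the figures in this comment at face value, the Opteron-side bump-down amounts to a simple lookup. A sketch of the rule as stated here, not of any particular BIOS’s policy:

                /* Sketch of the DIMM-population rule described above: past 4 DIMMs
                 * per CPU, the rated DDR2 speed drops one notch (800->667, 667->533).
                 * Encodes only the figures given in the comment. */
                static int effective_ddr2_speed(int rated, int dimms_per_cpu)
                {
                    if (dimms_per_cpu <= 4)
                        return rated;
                    if (rated == 800)
                        return 667;
                    if (rated == 667)
                        return 533;
                    return rated;   /* other speeds not covered by the comment */
                }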

          • Krogoth
          • 10 years ago

          I am sure that Nehalem Xeons can *run* with unbuffered, non-ECC memory. They are practically identical to desktop Bloomfield chips (both share the same socket and QPI links).

          I would recommend against doing it if you intend to use the rig as a real-world, production server.

          • UberGerbil
          • 10 years ago

          You know, after I wrote that I thought /[

            • ssidbroadcast
            • 10 years ago

            Aw. This is the internets. We forgive you.

            • DrDillyBar
            • 10 years ago

            That’s why he’s called Uber

        • tfp
        • 10 years ago

        It’s not exactly buffered memory in the way FB-DIMMs were or registered memory is. It’s a buffer on the motherboard between the memory and the CPU. Instead of putting the buffer on each DIMM, it puts one buffer (per channel, if I’m remembering correctly) on the motherboard itself. As I understand it, this allows the RAM behind the buffer to be standard DDR3 (or unbuffered ECC DDR3).

    • UberGerbil
    • 10 years ago

    Of course, Tukwila isn’t the only design looking over its shoulder. AMD is well-positioned with Istanbul vs. Dunnington this year, but they’re going to be at a disadvantage once Beckton is available: lower IPC, fewer memory channels and less bandwidth, less cache, and lower interprocessor bandwidth, too.

    Magny-Cours might be a decent stopgap, at least for loads where more cores matter more than anything else, but AMD needs to take the next architectural step to ultimately keep up at the profitable high end.

    • Zorb
    • 10 years ago

    It’s Intel at it again, and most software will not benefit from 4, let alone 8, cores…. They can’t improve performance, so more will have to do….

      • UberGerbil
      • 10 years ago

      This is a *[

        • Zorb
        • 10 years ago

        the ‘key’ word here is most…..

          • UberGerbil
          • 10 years ago

          And /[

            • Zorb
            • 10 years ago

            Great Answer…. TR has it…

          • Byte Storm
          • 10 years ago

          Since the word ‘most’ is rather synonymous with the word ‘majority’ (i.e., the largest grouping)…

        • wibeasley
        • 10 years ago

        UG, I disagree that he shouldn’t have posted that. If it weren’t for new TR members posting their gaming-centric views, we’d miss out on 8% of your posts.

        • ludi
        • 10 years ago

        Now we know what finally drove Paul DeMone crazy :’o

          • UberGerbil
          • 10 years ago

          Some are born crazy. Others have craziness thrust upon them.

          The internet collects all of both kinds.

            • eofpi
            • 10 years ago

            …and creates most of the second.

        • SomeOtherGeek
        • 10 years ago

        I’ll buy one, run 10 Folding@home clients, and leave the other 6 cores to run my other 6 apps.

    • XaiaX
    • 10 years ago

    That task manager picture is hilarious.

      • ssidbroadcast
      • 10 years ago

      For some reason I thought it was a solar panel at first glance…

        • blastdoor
        • 10 years ago

        The phrase “laugh out loud” gets thrown around quite a bit these days. But you, sir, made me LOL.

    • UberGerbil
    • 10 years ago

    Adding to Itanium’s woes, Tukwila is delayed until Q1 of next year, so Beckton will ship first, too:
    http://news.cnet.com/8301-13556_3-10246293-61.html

    Of course, Tukwila at least gets to share some of the infrastructure with x86 this time around, which may keep the OEMs (HP, at least) on board for a while longer.

    (Only folks familiar with Seattle will appreciate this, but considering this design was first talked about over five years ago, I don’t think anybody would’ve predicted that light rail would reach Tukwila the city before Intel shipped Tukwila the chip.)

    That TaskMan photo is classic.

    • DrDillyBar
    • 10 years ago

    It’s like that scene from Half Baked when the guy gets that thing for the scientist.

      • albundy
      • 10 years ago

      Haha, I love that movie! Jim Breuer rocked back then.

      • VILLAIN_xx
      • 10 years ago

      Lol, it’s like that scene when he’s signing off for that brick at the desk.

    • Trymor
    • 10 years ago

    Boy, one of these paired with today’s virtualization tech…

    IT today sure has a lot more power and usability per square foot available for new “environment designs”. Gotta love it.

    • ClickClick5
    • 10 years ago

    Only starting at $2999 for the intro version and $5999 for the medium. The flagship costs $9999, however.
    Pre-orders start now. 😉

      • moose17145
      • 10 years ago

      Well, considering the target market, that really isn’t a huge surprise.

      • djgandy
      • 10 years ago

      $2999 is nothing when you are spending $1M and the SAS drives are $2k apiece.

      • UberGerbil
      • 10 years ago

      When the bottom of your market is machines like this…
      http://h10010.www1.hp.com/wwpc/us/en/en/WF31a/15351-15351-3328412-241644-3328423-3716072.html

        • ClickClick5
        • 10 years ago

        That server looks like something you could walk into.
        I work with smaller machines than that, sadly. 🙁

        Although, I would really like to read up on someone who will eventually build a gaming rig from Newegg using this proc. Indeed I would.

    • moose17145
    • 10 years ago

    *droooooooooool*

    • Richie_G
    • 10 years ago

    But can it put the kettle on and bring me breakfast in bed?

      • ludi
      • 10 years ago

      No, but if you can afford one of these, you’ve already got the waitstaff to do all that.

      • MadManOriginal
      • 10 years ago

      Wrong question: Can it play…you know what?

        • Kharnellius
        • 10 years ago

        the violin?

          • khands
          • 10 years ago

          on Extra Real

      • Byte Storm
      • 10 years ago

      If you wrote a program for it to manipulate the robots built with it, maybe. Hmm…..

      • Vasilyfav
      • 10 years ago

      But can it run Crysis?

        • Jambe
        • 10 years ago

        Hah. I was waiting for that one.
