Intel’s Nehalem to feature eight cores, 16 threads

Intel Developer Forum — In his opening keynote speech at Fall IDF today, Intel CEO Paul Otellini revealed new details about the firm’s next-generation CPU microarchitecture, code-named Nehalem. The design is now complete, and Otellini claimed it’s on track for delivery in the second half of 2008. In fact, he displayed a wafer of Nehalem chips and reported that each chip will be composed of approximately 731 million transistors.

Chief architect Glenn Hinton made an appearance in order to demo a working Nehalem-based system, just three weeks old, running Windows XP. Otellini also claimed the team managed to get Mac OS X booting just today.

In its “largest configuration,” Nehalem will pack eight CPU cores onto a single die. Each of those cores will present the system with two logical processors and be able to execute two threads via simultaneous multithreading (SMT)—a la HyperThreading. So a single Nehalem chip will be able to execute 16 threads at once.

Hinton said the design team put quite a bit of effort into improving single-threaded performance in Nehalem, and claimed that each feature added to the chip had to meet stringent power-efficiency guidelines, as well. Nehalem will integrate a high-performance memory controller and a new chip-to-chip interconnect known as QuickPath—both provisions similar to AMD’s Opteron processors.

Comments closed
    • Prospero424
    • 12 years ago

    I wonder how many geeks out there are saving that image and just zooming in out of curiosity.

    /guilty

      • evermore
      • 12 years ago

      Why would you bother zooming in on such a low-res image?

    • Krogoth
    • 12 years ago

    Ladies and gentlemen, Nehalem’s existence confirms what Intel engineers have known since Prescott but didn’t want to admit at the time: silicon is running out of steam, and the days of rapid clockspeed progression are over. They are buying some time by throwing cores at the problem. They are going to hit some other interesting limits with silicon as the gains from die shrinking keep diminishing. There are only so many tricks you can do to increase IPC. Parallelism has its own limits.

    I honestly doubt that the mainstream market even needs eight cores or will ever find a use for them. It is difficult enough to get programmers to write programs that are multi-threaded and efficient with two cores.

    There is a lot more to the Nehalem die than meets the eye. I suspect that not all of those eight cores are actually general purpose. However, time and Intel will tell.

    BTW, I think Shintei, Progesterone, Porkster and tombguy had a collective orgasm. 😉

      • Pax-UX
      • 12 years ago

      Haha, yeah… they’re rebranding, calling it Cell

      • btb
      • 12 years ago

      I thought so too, until I read this:

      http://msdn.microsoft.com/msdnmag/issues/07/10/Futures/default.aspx

      With those kinds of new libraries, developing applications that take advantage of multiple cores will be a lot easier. Very impressive to get a 7x speedup on an 8-core machine by changing one line of code.
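
      Roughly this kind of change, sketched here in Python rather than the Parallel Extensions library the article actually covers (the raytracer function and the numbers below are just placeholders of mine):

```python
# Illustration of the "change one line" idea, using Python's standard
# library instead of the Parallel Extensions described in the article.
from concurrent.futures import ProcessPoolExecutor

def render_scanline(y):
    # Stand-in for an expensive, independent per-row raytracing computation.
    return sum((x * y) % 257 for x in range(100_000))

if __name__ == "__main__":
    rows = range(768)

    # Serial version:
    # image = list(map(render_scanline, rows))

    # Parallel version -- the only line that really changes:
    with ProcessPoolExecutor() as pool:
        image = list(pool.map(render_scanline, rows))
```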

        • Krogoth
        • 12 years ago

        Nope, that is marketing PR that nicely sugar-coats the massive challenges that parallelism presents to programmers.

        This is the real reason why parallelism has some inherent limitations, a.k.a. the law of diminishing returns:

        http://en.wikipedia.org/wiki/Amdahl%27s_law
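
        A quick back-of-the-envelope with Amdahl’s formula (the parallel fractions below are made-up numbers, not figures from either article) shows how fast that ceiling drops:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# parallelizable fraction of the work and n is the number of cores.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# A workload that is ~98% parallelizable lands near the 7x-on-8-cores mark;
# drop p to 80% and eight cores barely manage 3.3x.
for p in (0.98, 0.95, 0.80):
    print(f"p={p:.2f}: {amdahl_speedup(p, 8):.1f}x on 8 cores")
```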

          • btb
          • 12 years ago

          Ahh, another ignorant Microsoft hater… Obviously the amount of speedup you can get is dependent on how much of the work can be parallelized. In the case of the raytracer, the answer is pretty much all of it! Hence the near-linear speedup. I guess you didn’t even bother reading the wiki article you linked to, either.

            • Krogoth
            • 12 years ago

            It only applies to certain niches, most of which fall into the realms of scientific computing, render farms and such.

            I didn’t say the road of parallelism was completely useless, but for mainstream computing it offers some trying challenges to future programmers because not all code can be parallelized.

            That news blurb was about refining the tools to make the painful transition easier.

            It is more likely that the future of personal computing will make a visit to the past. Welcome back, terminal computing! You can have a single system with tons of cores that does everything, with several dumb terminals linked to it via WLAN or LAN. The power user, on the other hand, will have the option of running several virtual PCs for security and legacy reasons.

      • snowdog
      • 12 years ago

      Confirmed long ago with dual core from AMD/Intel and stated plans to go quad core and higher.

    • VTOL
    • 12 years ago

    Dumb question: why are those things always cut round instead of square?

      • Mr Bill
      • 12 years ago

      Silicon is grown like a candle suspended and rotating in a molten bath. That gives a long rod with a circular cross section. They slice that rod to make the wafers.

      • IntelMole
      • 12 years ago

      Good question. One of the answers (others have been posted here) is that some of the treatments involve centrifuge processes, i.e., they slice the silicon up real thin, apply a drop of solution, and then spin the thing really fast to get even coverage.

      If the thing was square, those forces would be way unbalanced, and the distribution of those treatments would probably be rubbish.

    • Archer
    • 12 years ago

    The Emperor’s voice just popped into my head…something about a fully functional battle station!

    AMD just wet its pants.

    • flip-mode
    • 12 years ago

    No AMD toast?

    This is a huge announcement. AMD better have an 8 core MCM in the works.

    • ssidbroadcast
    • 12 years ago

    I’m not skeptical about the science, but I’m skeptical about real-world performance. The scientific principles of NetBurst, even in the improved Prescott core, were sound, but in reality the performance just didn’t deliver.

    • snowdog
    • 12 years ago

    This better be a big improvement over Netburst hyperthreading, which was rarely beneficial and often slowed some tasks down.

    At four cores this will require a minimum of five threads to show an improvement, in theory. In practice, with tasks swapping between processors, I can’t help but think this causes more hassle than benefit. Four real cores are likely plenty for a while until software catches up, which I think will take some serious time.

      • UberGerbil
      • 12 years ago

      It’s for servers, where there’s (almost) always another thread. (See Sun’s Niagara). And the Core Microarchitecture is much better-suited to SMT than Netburst ever was. (See my comment #25)

    • ew
    • 12 years ago

    SMT = simultaneous multithreading

      • Damage
      • 12 years ago

      But of course. Sorry, too much travel, too little sleep. Fixed.

    • 1970BossMsutang
    • 12 years ago

    I suppose I should start saving for one!

    • Ricardo Dawkins
    • 12 years ago

    boring…we dont need cores..we need gigahurts !!!

      • Sargent Duck
      • 12 years ago

      Actually, more cores is where it’s at. Look at FPS games: if game devs could offload your character movement to one core, AI to another, physics to another, and sound to another, that’s four cores right there, and I guarantee that if all four cores were utilized to the fullest, it would be *much* faster than a 6 or 7GHz single core.

        • UberGerbil
        • 12 years ago

        Except your threading isn’t going to be structured that way. It’s easy to wave your hands and say “oh, put that task into another thread” but in practice, because of serialization of resources and shared state, it’s rarely that simple.
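
        A toy sketch of the problem (a hypothetical game loop, not anyone’s actual engine): one thread per subsystem, exactly as suggested above, except that they all share one game state and end up queuing on the same lock.

```python
import threading

# Hypothetical per-subsystem threading. The catch: every subsystem reads
# and writes the same game state, so one lock serializes much of the work.
state = {"positions": {}, "ai_targets": {}, "sounds": []}
state_lock = threading.Lock()

def run_subsystem(update):
    for _ in range(10_000):
        with state_lock:            # every subsystem contends for this lock
            update(state)

subsystems = [
    lambda s: s["positions"].update(player=(0, 0)),    # character movement
    lambda s: s["ai_targets"].update(bot1="player"),   # AI
    lambda s: s["positions"].update(bot1=(1, 2)),      # physics
    lambda s: s["sounds"].append("footstep"),          # sound
]
threads = [threading.Thread(target=run_subsystem, args=(fn,)) for fn in subsystems]
for t in threads:
    t.start()
for t in threads:
    t.join()
```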

      • Krogoth
      • 12 years ago

      Physics pwns silicon.

    • Spotpuff
    • 12 years ago

    I thought hyperthreading was bad for performance?

      • evermore
      • 12 years ago

      Only if both processes need the same resources at the same time. In which case, theoretically, it should just mean one of them has to wait until those resources are free, so you basically get single-core performance out of that HT core. They may have improved it considerably with Nehalem so that it gets closer to the ideal. I believe the OS also favors the physical cores, but I’m not sure on that.

      If you wanted every thread to have the best performance possible at all times, you could disable HT. But having it on allows at least part of another thread to be executed in tandem.

      • UberGerbil
      • 12 years ago

      SMT isn’t necessarily bad in general. The “hyperthreading” implementation on the P4 wasn’t a particularly good one for a couple of reasons: the P4 wasn’t an especially wide design (there were only two execution ports, one for integer and one for FP/SSE, vs 3 generalized ones on Core2) and it had a long pipeline that required a “replay” in certain cases (basically flushing the pipe and starting over). As a consequence, hyperthreading only worked well on certain workloads (streaming operations, particularly those where there was no more than one FP-heavy thread and one integer-heavy thread).

      But as a generalized approach, SMT is not necessarily bad at all: on wide designs, it offers a means to employ otherwise-dormant functional units. For this reason SMT is a feature of most higher-end architectures: IBM’s POWER, Intel’s Itanium, and Sun’s Niagara all offer SMT (technically, Itanium’s approach is Coarse MultiThreading, which is better suited to an in-order CPU). Even the Xenon (PowerPC-derived) CPU in the XBox employs SMT.

      When the Core Microarchitecture was introduced, a lot of people looked at its shorter pipe, more-generalized functional units, and wider design and speculated that SMT would make a reappearance somewhere down the road. It’s far better suited to it than Netburst ever was.

      SMT does have its downsides, of course: it effectively halves the cache and can increase the demand for memory bandwidth. It also tends to drive up power consumption. And being able to execute large numbers of simultaneous threads only makes sense on servers anyway (at least for now). So I would expect HT to be a feature of the Xeons only, and not be offered on mobile or desktop chips (except perhaps for special “Extreme” offerings).

    • evermore
    • 12 years ago

    Can XP actually address 8 cores, let alone 16 logical CPUs? I know it can work with two dual-core processors for 4 functioning cores, and I presume 8 cores if you had two quad-core.

    It seems strange that Windows actually counts how many physical processors are in a machine. How does it know the difference between two, four, or eight physical CPUs and one CPU with multiple cores? Particularly with Intel chips, which are on a single shared bus (or two lately). Do multiple cores all share a common address or something that Windows just counts as one?

    Oh I reread. In its “largest” configuration. So they may not have been demoing an octo-core there.

      • UberGerbil
      • 12 years ago

      Yes, XP can address 8 cores — the SP2 kernel is derived from Server 2003, which can go higher than that, and dual quad-core Xeon workstations exist today (e.g., the “OctoMac”) — but whether it does so efficiently is a different question. In practice the (initial) market for this is people running a server OS, particularly when SMT is enabled (which I suspect will be a Xeon-only feature). Server 2K3 R2 DataCenter supports up to 64 processors IIRC; Longhorn Server (Server 2K8 or whatever it ends up being called) has a bunch of kernel improvements specifically addressing machines with large amounts of memory and large numbers of cores. And most of the software that can make actual use of 8 (or 16) cores is server software.

      When you say you think it “strange” that Windows counts sockets and not cores, do you mean that from a technical standpoint or a business one?

      Technically, ACPI defines, and the BIOS (or EFI) provides, a bunch of System Description tables that are inspected as part of the boot process; these allow the OS to determine, among other things, how many processors and cores there are (on Opteron motherboards there is also something called a Static Resource Affinity Table that describes the topology of the processor nodes in a NUMA system, so the OS can make intelligent decisions wrt the memory attached to each CPU). This enables the OS to know exactly how many physical CPUs there are, how many cores each has, and how many “virtual cores” there are when the processor is capable of SMT.

      As a business practice, Microsoft made a decision to count sockets, not cores, for the purpose of software licensing; this was contrary to how software is generally licensed in the “big iron” world (Oracle, for example, doesn’t license that way even on PCs) but was an extremely savvy move in terms of growing marketshare for Windows servers from the bottom up.
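
      For what it’s worth, the socket/core/logical-processor breakdown those tables describe is easy to inspect from user space these days; here’s a quick sketch using psutil (a modern convenience library, and my choice of example, not what XP-era software would have used):

```python
import os
import psutil  # modern convenience library; XP-era code would have gone
               # through GetLogicalProcessorInformation or the ACPI tables

print("logical CPUs  :", psutil.cpu_count(logical=True))   # counts SMT threads
print("physical cores:", psutil.cpu_count(logical=False))  # ignores SMT
print("os.cpu_count():", os.cpu_count())                   # logical count again
```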

        • Flying Fox
        • 12 years ago

        Yup, I seriously don’t want to get into calculating Oracle’s licensing fees for multicore processors. That formula couldn’t be more convoluted, IMO.

          • stdPikachu
          • 12 years ago

          Oracle did actually produce an app designed to analyse your hardware and calculate your licenses for you.

          Unfortunately, the license calculation algorithm required at least eight CPUs to work things out within a quarter-year timeline and was itself subject to per-core licensing, so the product flopped. These days Oracle just tells people to put another 0 on the end of the cheque every year.

        • evermore
        • 12 years ago

        I did mean from a technical standpoint. I knew there was a way for the system to notify the OS how many physical CPUs there were, but multiple cores are pretty recent; did ACPI have to be updated to include an item for how many cores per CPU?

      • snowdog
      • 12 years ago

      XP showing sixteen cores at IDF: http://www.fudzilla.com/index.php?option=com_content&task=view&id=3122&Itemid=51

      Nehalem not working with Vista yet...

    • gratuitous
    • 12 years ago

    That’s a little over 200 cores he’s holding up there. Or, depending on when you price them, $50,000 – $100,000 worth of chips. Would make a very expensive mousepad. 🙂

      • Peffse
      • 12 years ago

      I jokingly thought at first… “I wonder if that’s how big the processor will be”
      Then I thought “nah, it’s not the 80’s anymore”

        • echo_seven
        • 12 years ago

        or… if you could build a socket for “it”, 15-20kW of power draw

        0_o

        • UberGerbil
        • 12 years ago

        If it was an Itanium, yeah…. 😉

      • evermore
      • 12 years ago

      That one wafer may represent more processing power than some supercomputers. I hope it was an already-defective one, not a good one they let him get his fingerprints on. I’d also put the upper bound on the value at $200,000 or more if they’re only dual-core, figuring a price per chip of $1,000. If that’s an 8-core wafer, that might be a million bucks’ worth of silicon and copper.

        • dextrous
        • 12 years ago

        assuming 100% yield?

        • sluggo
        • 12 years ago

        There are no functional dies at the edges of the wafer. Consequences of mapping large rectangles onto a circle.
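
        The usual back-of-the-envelope estimate for how many whole rectangles fit on the circle, with an assumed die size since Intel hasn’t quoted one:

```python
import math

# Classic gross-die approximation: usable wafer area divided by die area,
# minus the partial dies lost around the circular edge.
def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    return (math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

# Assumed numbers, for illustration only.
print(round(gross_dies_per_wafer(300, 250)))   # ~240 candidate dies on a 300 mm wafer
```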

    • Mr Bill
    • 12 years ago

    I wonder if 8 cores is the best interconnect geometry for internal “quick” links.

      • Lord.Blue
      • 12 years ago

      Might actually work better with 9. Depends entirely on how the interconnect works. If you were to arrange the cores like so:
      XXX
      XXX
      XXX
      and have a cross-connect link from each outer core to the center as well as to the two neighboring cores, then you would never have to go through more than one intermediate core to get info from another core. This is one of the reasons why a three-core chip can, at times, outpace a quad-core. It’s also why AMD is releasing a server with a three-socket design as well. You could also do a similar setup with 7 cores, and 5 cores as well. If the 45nm designs from AMD and the 32nm designs from Intel work out as well as they both hope, we will be in for a fun ride over the next 2 years.
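
      A quick breadth-first sanity check of that wiring (eight outer cores in a ring, each with a spoke to the center; the node numbering is just mine):

```python
from collections import deque

# The 3x3 layout described above: eight outer cores in a ring (0-7), each
# also wired straight to the center core (8).
links = [(i, (i + 1) % 8) for i in range(8)] + [(i, 8) for i in range(8)]
adjacency = {n: set() for n in range(9)}
for a, b in links:
    adjacency[a].add(b)
    adjacency[b].add(a)

def hops(src, dst):
    """Shortest path length between two cores, by breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))

print(max(hops(a, b) for a in range(9) for b in range(9)))
# Worst case: 2 links, i.e. at most one intermediate core between any pair.
```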

        • tfp
        • 12 years ago

        They could always put the memory controller in the middle spot.

          • Usacomp2k3
          • 12 years ago

          That’s what I was thinking. Either that or some sort of co-processor that divvies up processes (actually, I guess that’s what the OS does, so never mind).

            • tfp
            • 12 years ago

            Or they could put a switch in the middle.

            However, that or the memory controller would imply that for one core to talk to another it would need to go through a controller, which is not the case for the Core 2 chips now (not counting the quads).

            I would expect the cores will snoop each other’s caches as the means of access; I don’t think they need a link between every pair of cores like is needed between CPU sockets.

