Intel talks six-core processors, future prototypes

In a press conference earlier today, Intel provided a sneak peek at some of the papers it will present next week during the International Solid-State Circuits Conference in San Francisco. The chipmaker revealed a few juicy details about Gulftown, its upcoming six-core, 32-nm processor, as well as some interesting research prototypes kicking around in its labs.

Gulftown, also known as Westmere 6C, is essentially a six-core version of the dual-core Westmere design that recently debuted in Core i3 and Core i5 processors. Intel has fashioned Westmere 6C not out of three dual-core dies, but out of a single piece of silicon featuring six cores, a generous 12MB of L3 cache, a couple of 6.4GT/s QuickPath Interconnect links, and a triple-channel DDR3-1333 memory controller. (Westmere 2C, by contrast, comes with a companion die that houses its memory controller and graphics processor.) Hyper-Threading and Turbo Boost capabilities are part of the formula, too.

All told, Westmere 6C packs 1.17 billion transistors and measures 240 mm². That’s actually smaller physically than the 45-nm Bloomfield die from quad-core Core i7-900 processors, which spreads out 731 million transistors over 263 mm². Intel has no plans for a native quad-core 32-nm chip, as far as we know, but it does intend to release quad-core versions of Westmere 6C. Those products will simply have a couple of cores disabled.
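Those die figures are easy to sanity-check. A quick back-of-the-envelope comparison, using only the transistor counts and areas quoted above, shows the density gain from the 45-nm-to-32-nm shrink:

```python
# Transistor-density comparison using the figures quoted in the article.
westmere_6c = {"transistors": 1.17e9, "area_mm2": 240}
bloomfield = {"transistors": 731e6, "area_mm2": 263}

def density(chip):
    """Transistors per square millimeter."""
    return chip["transistors"] / chip["area_mm2"]

print(f"Westmere 6C: {density(westmere_6c) / 1e6:.1f}M transistors/mm^2")
print(f"Bloomfield:  {density(bloomfield) / 1e6:.1f}M transistors/mm^2")
print(f"Density ratio: {density(westmere_6c) / density(bloomfield):.2f}x")
```

The roughly 1.75x density gain is somewhat less than the ideal ~2x area scaling of a full node shrink, as tends to be the case in practice once I/O and analog blocks are accounted for.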

Left: Westmere 2C. Right: Westmere 6C. Source: Intel.

Intel has also done quite a bit of work to make Westmere, in both its 2C and 6C forms, more power-efficient than previous offerings. For example, Westmere is the first Intel processor design that can do power gating with not just CPU cores, but also the "uncore" elements of the chip. Those uncore parts include the cache, QuickPath links, and memory controller; power gating, meanwhile, cuts off power delivery to certain areas of the chip, which helps improve power efficiency at idle. Intel has gone so far as to make the processor flush the contents of its L3 cache into an SRAM so the cache can power down completely. Also, Westmere 6C supports low-voltage DDR3 memory, which runs at 1.35V instead of the standard 1.5V. Using low-voltage DIMMs purportedly brings a 20% reduction in memory power draw.
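That 20% figure lines up with first-order CMOS physics: dynamic power scales roughly with the square of the supply voltage, so a quick estimate (an approximation that ignores static power and frequency effects) comes out close to Intel's number:

```python
# Dynamic CMOS power scales roughly as P ~ C * V^2 * f, so dropping the
# DIMM supply from the standard 1.5 V to 1.35 V should save about:
v_standard = 1.50
v_low = 1.35
savings = 1 - (v_low / v_standard) ** 2
print(f"Estimated memory power reduction: {savings:.0%}")
```

That works out to about 19%, in the same ballpark as the 20% Intel quotes; any remainder presumably comes from other tweaks to the low-voltage DIMMs.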

Intel also gave us a peek into some of the advanced projects taking place in its research labs. There, the chipmaker is working on package-to-package connectivity, as well as "digital intelligence" to get the most performance out of many-core designs.

On the data connectivity front, Intel has developed a 47-channel interconnect that links chip packages directly and enables 470Gb/s (58.8GB/s) of bandwidth using only 0.7W of power. The interconnect takes the form of a ribbon cable—not unlike flexible multi-GPU bridge connectors—connected directly to each package. Intel says this approach saves considerable amounts of power over a more conventional design.

Intel sees this type of interconnect as especially handy for future, so-called "tera-scale" devices, which might need to move a terabyte per second from one chip to another. A conventional interconnect design might require about 150W of power to maintain that bandwidth, but the ribbon-cable approach could do it with "about 11W."
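Those numbers are internally consistent. Working from the prototype's figures alone (a rough calculation, not Intel's own methodology), the link costs about 1.5 picojoules per bit, and scaling that up to a terabyte per second lands right around Intel's quoted figure:

```python
# Energy per bit of the 47-channel prototype, then scaled to a
# hypothetical 1 TB/s "tera-scale" link.
power_w = 0.7
bandwidth_gbps = 470  # total link bandwidth, in gigabits per second

pj_per_bit = power_w / (bandwidth_gbps * 1e9) * 1e12
print(f"Energy cost: {pj_per_bit:.2f} pJ/bit")

target_gbps = 8 * 1000  # 1 terabyte/s = 8 terabits/s = 8,000 Gb/s
power_at_target_w = pj_per_bit * target_gbps * 1e9 / 1e12
print(f"Power at 1 TB/s: {power_at_target_w:.1f} W")
```

The result, just under 12W, squares with Intel's "about 11W" claim, versus the roughly 150W a conventional interconnect would purportedly need for the same bandwidth.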

The ribbon interconnect in the flesh. Source: Intel.

Speaking of tera-scale processors, Intel brought up the 80-core proof of concept it originally revealed at its fall 2006 developer forum. Such many-core designs would be particularly vulnerable to inconsistencies in clock and voltage scaling between different cores, but Intel has come up with some interesting ways to get around that problem and maximize performance.

These days, both Intel and AMD set the speeds and voltages of their CPUs based on what the slowest core can achieve. However, Intel says it could define those parameters on a per-core basis in many-core designs, so some cores would run at higher clock speeds or with lower voltages than others. Then, in light workloads, the processor could intelligently map threads to its most capable cores. That approach alone could result in a 6-35% energy saving over simply assigning threads to cores on a random basis. Intel thinks it could achieve an additional 20-60% energy savings for "certain tasks" by using a more aggressive scheme called "thread hopping," which would move threads to faster cores as soon as those cores became free.
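As an illustration only (this is our sketch with hypothetical frequency bins, not Intel's scheduler), the "map threads to the most capable cores" idea boils down to a simple greedy assignment:

```python
# Greedy thread placement: put each thread on the fastest core still free.
def map_threads_to_cores(thread_count, core_freqs_mhz):
    """Return the indices of the fastest `thread_count` cores."""
    ranked = sorted(range(len(core_freqs_mhz)),
                    key=lambda i: core_freqs_mhz[i],
                    reverse=True)
    return ranked[:thread_count]

# Hypothetical per-core maximum frequencies for an 8-core part:
freqs_mhz = [3200, 2900, 3400, 3000, 3300, 2800, 3100, 3500]
# With three runnable threads, the 3.5, 3.4, and 3.3 GHz cores get the work:
print(map_threads_to_cores(3, freqs_mhz))
```

Thread hopping would extend this by re-running the assignment whenever a fast core frees up, migrating a waiting thread from a slower core onto the newly idle fast one.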

The firm is also looking into what happens in harsh conditions, when a chip can’t always produce correct results every time. In some of the research scenarios, as we understand it, processor cores could be pushed to run at higher clock speeds or lower voltages than usual. The processors could then be allowed to make errors and correct them either by running instructions again at half the speed or by running the same instructions multiple times. Purportedly, if an error were flagged in such a case, the instructions would only have to be run twice to ensure a correct result. The chipmaker didn’t go into too much detail here, but it claims this error-tolerant approach could improve performance by 40% or power efficiency by 21% compared to running at nominal settings.
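As a toy illustration of the detect-and-re-execute idea (our own sketch with made-up function names, not Intel's mechanism), the key ingredients are a cheap error check that doesn't require recomputing the full answer, plus a fallback re-run at safe settings:

```python
import random

def unreliable_add(a, b, error_rate=0.05):
    """Add two ints, occasionally corrupting the result (a simulated timing error)."""
    result = a + b
    if random.random() < error_rate:
        result ^= 1  # flip the low bit
    return result

def checked_add(a, b):
    """Residue-checked add: detect a fault cheaply, then re-execute if flagged."""
    result = unreliable_add(a, b)
    # Mod-3 residue check: a cheap consistency test that doesn't require
    # recomputing the full-width answer. A low-bit flip changes the value
    # by exactly 1, which always changes its residue mod 3.
    if result % 3 != (a % 3 + b % 3) % 3:
        result = a + b  # re-run at safe settings (here: just compute reliably)
    return result

# Every result comes out correct despite the injected faults:
assert all(checked_add(x, x + 1) == 2 * x + 1 for x in range(10_000))
```

Real hardware checkers use similar residue or parity codes so that detection costs a small fraction of the main datapath, which is what makes "run it again only when flagged" cheaper than full redundancy.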

Intel’s research projects can sometimes produce real innovations that get incorporated into future products, but they’re usually many years from production when they’re first presented in a context like this one. Still, the work itself is often fascinating, and it gives us a sense of what to expect many years down the road.

Comments closed
    • Vaughn
    • 11 years ago

    This is Intel, not AMD. I highly doubt you will be able to unlock those cores. Intel has the performance lead and sees no point in allowing you to do so. On AMD’s side I can see the advantage, because they can sell more to make up for the performance gap a little.

    • Monchkrit
    • 11 years ago

    Can someone please tell me what a GT/s is?

      • sigher
      • 11 years ago

      “In computer technology, transfer and its more common derivatives gigatransfer (GT) and megatransfer (MT) refer to a number of data transfers (or operations). ”

      http://en.wikipedia.org/wiki/Transfer_%28computing%29

      • Meadows
      • 11 years ago

      One billion Tesla per second.

    • johnrreagan
    • 11 years ago

    Early Tandem NonStop systems had specialized CPUs that did self checking and systems that ran in lock-step to validate each others results via a special inter-CPU bus. More recent systems do that via firmware and/or system software if I remember correctly.

      • d0g_p00p
      • 11 years ago

      My old job had a couple of Compaq Himalaya NonStop servers. They were awesome. Everything was redundant, error correcting up the wazoo. I remember doing a CPU and memory swap while the server was live. It felt really strange pulling stuff out of a running machine.

      It would be interesting to see a current version of those machines if they still make them.

    • ritsu
    • 11 years ago

    Long ago, a Japanese research team designed a self-checking version of an 8080. It took 30% more transistors, and of course it would have bigger circuit delays. Doesn’t seem worthwhile until shrinking makes operation really dicey. Even then, why not just back off? Higher design time/cost; there’s a lot to speak against it.

    • Anonymous Coward
    • 11 years ago

    Nice to see Intel try its hand in the “disable CPU cores for fun and profit” game.

      • NeelyCam
      • 11 years ago

      Yeah; unfortunately I doubt Intel will get caught allowing customers to enable the extra cores…

      And I feel bad for those people who try to replace their Lynnfields with these shiny new 32nm quadcores…

        • OneArmedScissor
        • 11 years ago

        These should be socket 1366 with no integrated PCIe bus, though.

        It would still be a pretty lame upgrade for a Core i7 900, but for once, their mess of socket types accomplishes something.

          • NeelyCam
          • 11 years ago

          I was thinking of the scenario where someone thinks he can just switch an i7-860 quad-core with a cool new i7-935 32nm quad-core, only to realize that the new chip he just bought doesn’t fit into the socket.

          But it’s all his fault, though. It is completely stupid to expect a new i7 quad core to fit into a socket for another i7 quad core. Dumbass.

            • UberGerbil
            • 11 years ago

            The Good/Better/Best/Extreme! branding (i3/i5/i7/i9) is for end users who don’t even know CPUs go into sockets, much less ever contemplate opening the machine up to swap them. Folks who are actually pulling off heatsinks and plugging in new CPUs should have no problem with multiple sockets and sorting out what fits and what won’t. If they can’t, well, it’s kind of like a “you must be this tall to be on this ride” sort of threshold competency test.

            • yuhong
            • 11 years ago

            Yeah, especially as Intel still makes it easy to distinguish them by using the first digit of the processor number that comes after the name.

      • ihira
      • 11 years ago

      If these disabled 6cores are much cheaper than natives and can be unlocked via BIOS, then I’m all over it.

        • MadManOriginal
        • 11 years ago

        Much cheaper, probably; unlockable, not a chance. So for all intents and purposes, for 1P (i.e., desktop) use they’ll be the same as 4c/8t i7 CPUs. Maybe they will have more cache left enabled to differentiate them, but the difference that makes is pretty small for desktop use as well.

          • ihira
          • 11 years ago

          I was hoping maybe you can access the 5th/6th core ala Phenom II X2/X3 style.

          I don’t know enough about CPU architecture from an engineer perspective but is there a reason why it can’t be possible on Intel CPUs?

            • MadManOriginal
            • 11 years ago

            It depends on how they’re disabled. They can be ‘soft disabled,’ as in some AMD CPUs, or ‘hard disabled,’ where necessary components are physically disabled. A while back there were various graphics cards that used larger chips with disabled functional units that could be ‘unlocked’; it still happens occasionally (some 9600 GSO cards could be unlocked), but it was more common 5-8 years ago. Modern chips have disabled sections too, with CPUs disabling cache and GPUs disabling functional units, but there hasn’t been a compelling unlock aside from the AMD CPUs for a long time.

            I don’t see Intel making the mistake of allowing these chips to be unlocked into 6-core CPUs, because it has no reason to tacitly allow it like AMD does. It’s not a chip engineering thing; it’s just a matter of how they disable the cores: ‘soft,’ which can be recovered, or ‘hard,’ which can’t be.

            • NeelyCam
            • 11 years ago

            Yes; most likely the extra cores will be “fused out”

            • OneArmedScissor
            • 11 years ago

            Ever seen a Core 2 where you can activate more cache, or a Core i5 Lynnfield where you can activate hyper threading?

            AMD CPUs can unlock cores and cache because it’s pretty much a feature that’s built in, much as they intentionally sell “black edition” CPUs with unlocked multipliers even below the $100 range.

            Intel, however, charges a minimum of $999 for just an unlocked multiplier, and they sure as heck make sure you can’t turn on the disabled parts of their chips.

          • OneArmedScissor
          • 11 years ago

          I really doubt they even leave the extra cache. That would give them the same advantage in game benchmarks as the 6 core versions will appear to have.

          The chips with borked cache have to go somewhere, and I really doubt it’s going to be the trash.

    • NeelyCam
    • 11 years ago

    Is that a second QPI link I see in that Westmere 6C die shot?

      • loophole
      • 11 years ago

      Yep it does appear that this Westmere 6C die shot has two QPI links.

      It’s probably a die shot of the Westmere 6C Xeon part rather than the Gulftown desktop part.

      • FuturePastNow
      • 11 years ago

      It had better have two of them if Intel plans on selling any 2p Xeon versions.

      • UberGerbil
      • 11 years ago

      All the Bloomfields had two QPI links also, but only one of them was active. They were essentially the same silicon as the Xeon-EP (Gainestown), just not tested/qualified for two-socket use — and Intel was able to get “enthusiasts” and workstation customers to pay for the privilege of doing the last round of testing.

      They’re clearly doing the same thing with Gulftown.

        • NeelyCam
        • 11 years ago

        Got it – thanks!

    • IntelMole
    • 11 years ago

    Some thoughts:
    – That ribbon interconnect reminds me of flaky IDE cables. I hope their real-world production version would use some sort of solid board instead.

    – Setting per-core clock speeds is a pretty natural end consequence of modern semiconductor fabrication. I assume they would have some guarantee of average clock speed, or clock speed distributions per model. But if this ever comes out, good luck getting consistent testing results Scott 🙂

    – Error correcting “overclocked” processors sounds perfectly insane.

      • NeelyCam
      • 11 years ago

      The ribbon interconnect is mainly there to get nice, clean data channels; it’s meant to _replace_ the lossy boards with tons of discontinuities. This simplifies the I/O transceiver design.

      If you put back the crappy boards, you lose the power savings.

        • MadManOriginal
        • 11 years ago

        I think he means a solid board like some SLI connectors not part of the motherboard. It really doesn’t matter too much as long as it does what it has to.

          • NeelyCam
          • 11 years ago

          The problem is, that probably wouldn’t do the job. The connectors would cause pretty bad discontinuities in the channel impedance, requiring some sort of equalization circuitry. Power would go up, speed would go down, etc.

            • MadManOriginal
            • 11 years ago

            So you’re saying the ribbon is a direct connection to the CPU package with no connectors? Yeah, that will really work great in the real world. If so I see it as a proof of concept more than anything.

            • djgandy
            • 11 years ago

            This is for people buying servers though. The ribbon looks fine to me providing the board layouts allow it to be installed in the same way it is in the picture.

            • NeelyCam
            • 11 years ago

            I probably said too much.. you better go see the ISSCC presentation next week.

      • Krogoth
      • 11 years ago

      You will not see those “ribbons” in desktops any time soon.

      They are going to first come to small and big iron boxes.

      • bdwilcox
      • 11 years ago

      Yeah, I can just see snagging one of those ribbon cables when installing a DIMM in a tight server chassis. Good times, indeed.

    • Welch
    • 11 years ago

    Each core running independently of the others will definitely make overclocking interesting again :). At the same time, I’m concerned that having so many different voltages and clocks could end very badly if one core attempts to run at the same speed as a higher-clocked one that is able to run stable when it’s not. Interesting stuff, Intel…. interesting stuff.

    • wingless
    • 11 years ago

    Hasn’t AMD been doing all of this with Hypertransport for a while now? It is really good to see Intel catch up on the interconnect front. Their desktop Hex-cores will be awesome for Folding@Home!

      • OneArmedScissor
      • 11 years ago

      Well, it certainly doesn’t use 0.7W. That’s really something.

      Most parts have been improved to use less electricity over time, but as they become more powerful and need to move more information around to make any use of it, that keeps eating more electricity.

      It’s my understanding that the X58 platform can eat so much electricity because of the QPI links. Take that out of the equation, and suddenly, energy hog servers are down to the desktop level.

      • NeelyCam
      • 11 years ago

      Intel caught up with HyperTransport back when QPI came out (a year ago?). This is research, but it would zoom past HyperTransport: each HT 3.1 lane runs at 6.4Gb/s, while in this 47-lane monster, each lane runs at 10Gb/s.

      700mW / 470Gb/s ≈ 1.5mW/Gbps — this is extremely power-efficient.

    • Meadows
    • 11 years ago

    Instruction error correction sounds like a great way to make “unstable” overclocks work just fine, with their occasional issues negated.

    Would be a great step forward.

      • Shining Arcanine
      • 11 years ago

      If they run the same instructions twice, wouldn’t that involve using double the hardware? How much of a performance increase can you get from increasing the clock speed before it can no longer have an acceptable power consumption? I think that the power consumption increases as a square, so I do not think doubling the hardware and then doubling the clock speed to compensate for the loss that doing calculations twice incurs would fit into an acceptable power envelope. It seems to me to be a way to do things 2 times faster for 8 times the power, which does not seem acceptable.

        • MadManOriginal
        • 11 years ago

        I think they mean they would rerun the code, not have twice the hardware. How they would accomplish the task of knowing the answer before having the answer I’m not sure, and if you already have the answer, why calculate it! 😉

        It may be a way to push Turbo Boost (overclocking) even further, as alluded to by Meadows. They’d just have to get the algorithms right to make sure it’s a benefit and not a hindrance.

          • Meadows
          • 11 years ago

          It’s not so much “knowing the answer” as simply detecting invalid or out-of-bounds answers, I guess.

            • Shining Arcanine
            • 11 years ago

            I have heard of probabilistic processors before, but they are designed on the premise that it is okay for the answer to be off within a certain margin of error. I am sure many scientific and business applications would not be very tolerant of that, as scientists want to be able to repeat their calculations to verify that a result was not caused by a hardware error (which becomes impossible when things start becoming random), and businesses want to keep track of every penny that moves in a transaction.

            • Anonymous Coward
            • 11 years ago

            I think that “small” random errors would pretty quickly desync multi-player games too, since AFAIK they pretty much always run models of what the other guy is doing in order to prevent cheating.

            • Meadows
            • 11 years ago

            That’s not likely.

            • ew
            • 11 years ago

            That isn’t how cheating is prevented in client/server multiplayer games. All decisions that affect gameplay come from the server. A client can only cheat if it can convince the server that it is doing something it normally shouldn’t be able to do. So, for example, if you told the server you were moving forward at 5ft/sec but the game’s rules say a player can only move at 3ft/sec, then the server can say “I don’t think so” and tell the other clients that that particular user has just spontaneously combusted instead.

      • sigher
      • 11 years ago

      Proof there is a god and he rides a unicorn: http://somerandomlinktoapciturethatprovesnothing.com/654_321.jpg

        • Convert
        • 11 years ago
          • End User
          • 11 years ago

          Have TR readers suddenly lost interest in overclocking?

          The original link shows the OC potential of the Gulftown. That example is with liquid cooling but my i7-920 D0 runs stable @ 4.2 on air with Cinebench R10 scores of 6050/24400. On air the Gulftown should be able to get a score of roughly 6000/37000. That is cool in my books.

            • Rakhmaninov3
            • 11 years ago

            Seriously! This looks like a badass chip all-around. And if it OCs to 4.8GHz on liquid, imagine how much folding it could do!

            🙂

            It might cost a thousand bux, though. But think of the bragging rights. It’d be like being able to say you have a 12-cylinder race car that’s environmentally friendly.
