Intel talks about its architectural vision for the future

In perhaps its most wide-ranging and technically dazzling display in years, Intel offered us a look into its direction as an integrated device manufacturer, steered in part by the architectural leadership of luminaries Raja Koduri and Jim Keller. The company brought a small group to the former estate of Intel founder Robert Noyce to set the stage for a detailed look at how it plans to yoke its vast range of technology into a coherent whole for its future.

Before we go any further, I have to note that Intel opened a firehose of information for us, and the embargo window for that information was quite abbreviated. Worse, my early flight home coincides with the embargo lift, more or less. I’ll be adding to this article throughout the day as I’m able, but we weren’t able to cover the entirety of what Intel showed in one go. Thanks for your patience as we digest this banquet of information.

Raja Koduri

Although the ostensible agenda for the day spanned both system-level architectures and microarchitectures, the question of process execution came up over and over in the course of our talks. The company was refreshingly frank about the fact that by binding process and architecture together in its cadence of advancement, it had exposed both itself and its customers to risk in the case that its manufacturing group were to stumble in the delivery of that process—and stumble it did, in the case of the 10-nm node. Both Koduri and Keller forcefully stated that the company wouldn’t allow that kind of catastrophic misstep to happen again, as it had harmed not just the company’s own roadmap but also those of the customers that depended on Intel’s reliable delivery of new products to expand the capabilities, performance, and longevity of systems with Intel inside.

Rather than offer up another pithy mnemonic discussing its development cadence, the company acknowledged that future architectures would be developed independent of process and built with the fabrication technique that made the most sense in the timeframe it needed to deliver those products to customers, whether that might be a leading-edge node for density and power reasons or an older node where performance was paramount and power and density were less important.

Jim Keller

Furthermore, Keller noted that in the case that a problem did arise, the company would still be able to deliver a product that fulfilled promised improvements in performance and capability to customers by falling back on alternate manufacturing techniques in its arsenal, a position he described as being inspired by his days at Apple. The iPhone maker is well-known for delivering products on a predictable schedule every year, and Keller said that Apple always had contingency plans so that its latest and greatest stuff wouldn’t be held up by unforeseen roadblocks in production.

That approach was affirmed to me in a conversation with Intel Fellow Ronak Singhal, who heads the Intel Architecture Cores Group at the company. Singhal noted that Intel is now approaching its core processor design more cautiously by logically describing a core earlier in the life cycle of architectural development, with fewer baked-in assumptions about the process it might be built on. The logical description of the chip can later be married to a physical process closer to the time when the company needs to produce it.

While that strategy may seem obvious on its face, my understanding is that past Intel cores were much more closely married to the physical processes that would be used to build them, and there was little room for error in the previously unthinkable event that a core needed to be migrated to a different manufacturing process. With its new development approach, the company is apparently better positioned to produce its newer tech on older nodes if need be so that it can still give customers what they need to build around new processors with new capabilities in a predictable time frame, even in the event that the manufacturing group isn’t ready with the latest and greatest process.

That view is consistent with the industry-wide idea that leading-edge process nodes are now long-term investments and that companies plan to extract value from them for as long as possible by whatever means necessary. While Intel still plans to do the hard work and investment required to develop leading-edge processes, the company ultimately wants its future to be defined more by the products it can deliver rather than process leadership first and product second. The cynic might point out that we’ve already seen the result of this strategy in three years and counting of Skylake-derived CPUs with ever-increasing clock speeds and core counts, but Koduri and Keller both seemed adamant that the long rule of Skylake was an aberration rather than the future of the firm—assuming all goes well from this point forward.

 

Ice Lake freezes over

Although nobody would explicitly say so at the event, Intel has essentially halted any volume-production plans it might have had for the Cannon Lake microarchitecture introduced with the Core i3-8121U. The first next-generation core built on the 10-nm process that Intel is confident enough to talk about in detail is called Sunny Cove, a name that refers only to the CPU core and not the SoCs that the company plans to build around it. That said, and despite some taciturn responses to questioning on this point at the event, I’m confident in saying that the first 10-nm processors that Intel plans to introduce in volume will fly the Ice Lake code name.

Sunny Cove is the first core in the company’s revised 10-nm roadmap, and it’ll presumably begin arriving in client systems in the second half of 2019. The company’s core roadmap also includes Willow Cove, whose highlights may include a cache redesign, a “new transistor optimization,” and enhanced security features. The Golden Cove follow-on in the 2021 time frame returns the focus to single-threaded performance, AI performance, networking and 5G performance, and more security enhancements.

Intel-watchers have long desired better fundamental per-core performance, and Sunny Cove appears positioned to deliver. Intel’s Ronak Singhal noted that the best way to extract general-purpose performance improvements from a CPU is to make it deeper (by finding greater opportunities for parallelism), wider (by making it possible to execute more operations in parallel), and smarter (by introducing newer and better algorithms to reduce latency).

Sunny Cove goes deeper by expanding its caches and record-keeping infrastructure to keep more instructions and data near the core and in flight. This core moves from a 32-KB L1 data cache to a 48-KB allocation. The L2 cache per core will increase, although as we’ve seen in the divergence between Skylake client and server cores, the amount of L2 will differ by product. The micro-op cache also increases in size, and the second-level translation lookaside buffer (TLB) is also more copious than in Skylake.

Sunny Cove is a fundamentally wider core than any Intel design since Sandy Bridge, as well, expanding from four-wide issue to five-wide and increasing the number of execution ports from eight to 10. Each of those execution units, in turn, is more capable than those of Skylake. Intel added a dedicated integer divider on port 1 to reduce latency for those operations.

The core now has two pathways for storing data, and it now has four address-generation units (up from three in Skylake). The vector side of the chip now has two shuffle units (up from one in Skylake), and every one of the four main execution ports can now perform a load effective address (LEA) operation, up from two such units in Skylake. Sunny Cove also implements support for the AVX-512 instruction set extension that was first meant to be introduced to client systems by way of Cannon Lake.
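
For readers who want to see what that instruction set actually buys, here’s a minimal sketch of AVX-512 at work, operating on 16 packed single-precision floats with a single fused multiply-add instruction. This is my own illustration rather than Intel sample code, and it assumes a compiler and CPU with AVX-512F support (built with, say, g++ -O2 -mavx512f).

// Minimal AVX-512 illustration: one fused multiply-add over 16 packed
// floats. Requires a CPU with AVX-512F support to run.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(64) float a[16], b[16], c[16], out[16];
    for (int i = 0; i < 16; ++i) { a[i] = 1.0f * i; b[i] = 2.0f; c[i] = 3.0f; }

    __m512 va = _mm512_load_ps(a);            // 16 floats in one 512-bit register
    __m512 vb = _mm512_load_ps(b);
    __m512 vc = _mm512_load_ps(c);
    __m512 r  = _mm512_fmadd_ps(va, vb, vc);  // r = a*b + c in one instruction

    _mm512_store_ps(out, r);
    printf("out[5] = %.1f\n", out[5]);        // expect 5*2 + 3 = 13.0
    return 0;
}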

An early Ice Lake-SP package.

To bolster the idea that Intel’s 10-nm process is in a healthier place than it has been of late, we saw at least three separate implementations of Sunny Cove cores running: at least one development board using an Ice Lake-U processor, another development board featuring Intel’s Foveros 3D packaging technique (more on that later), and an Ice Lake-SP Xeon demonstrating new extensions to the AVX-512 instruction set. While the company certainly wasn’t ready to talk exact die sizes, it was heartening to see 10-nm silicon ranging from minuscule to massive in operation.

 

Gen11 graphics promise high-end features for baseline gaming

As Sunny Cove will be the next-generation building block of Intel’s general-purpose compute resources, the Gen11 IGP will serve as the next pixel-pushing engine for Ice Lake processors. Intel gave us a high-level look at the GT2 configuration of its Gen11 architecture during its event. For the unfamiliar, GT2 is the middle child of Intel’s integrated graphics processors and sits on the die of many of the company’s mainstream CPUs.

A prettied-up representation of the Gen11 IGP

Most prominently, Intel wants to establish a teraflop of single-precision floating-point throughput as the baseline level of performance users can expect from GT2 configurations of Gen11. Compared to the roughly 440 GFLOPS (and yes, that’s giga with a G) available from the UHD 620 graphics processor in a broad swath of basic systems on the market today, that kind of performance improvement on a platform with as much reach as Intel’s integrated graphics processors could bring enjoyable gameplay to a far broader audience than ever before.  
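
The arithmetic behind those numbers is straightforward to reconstruct if you assume the usual Gen EU layout of two 4-wide SIMD units per EU, with a fused multiply-add counting as two operations; the clock speeds below are my own ballpark figures, not Intel disclosures.

// Back-of-envelope FLOPS math for Intel Gen IGPs (my own estimate).
// Assumption: each EU retires 16 FP32 flops per clock
// (2 SIMD-4 units x 4 lanes x 2 ops for a fused multiply-add).
#include <cstdio>

double igp_gflops(int eus, double ghz) {
    const int flops_per_eu_per_clock = 2 * 4 * 2;
    return eus * flops_per_eu_per_clock * ghz;
}

int main() {
    printf("Gen9 GT2, 24 EUs @ ~1.15 GHz: %4.0f GFLOPS\n", igp_gflops(24, 1.15)); // ~442
    printf("Gen11 GT2, 64 EUs @ ~1.0 GHz: %4.0f GFLOPS\n", igp_gflops(64, 1.0));  // ~1024
    return 0;
}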

To get there, engineer David Blythe says his team set out to cram as much performance as it possibly could into the power envelope available to it. A Gen11 IGP in its GT2 configuration has 64 execution units, up from 24 in Gen9, and squeezing that much shader power into an IGP footprint and maximizing its efficiency was a battle of inches, according to Blythe. The Gen11 team apparently had to go after every small improvement it could in the pursuit of its power, performance and area goals, and that meant touching not just one or two parts of the integrated graphics processor, but every part of it.

The net result of that work was a significant reduction in the area of the basic execution unit. Blythe claimed that implementing a Gen9 EU and a Gen11 EU on the same process would put the Gen11 EU at 75% of the area of its predecessor, partially explaining how Intel was able to pack so many more of those units into the undisclosed area allocated for GT2 configs of Gen11 on Ice Lake.

In pursuit of both power savings and higher performance, Gen11 supports a form of tile-based rendering in addition to its immediate-mode renderer. According to Blythe, certain pixel-limited workloads benefit greatly from the ability to keep their data local to the graphics processor, and by invoking the tile-based renderer, those applications can save 30% of memory bandwidth, and therefore power, in the uncore of the processor. In turn, the Gen11 GPU can take the juice saved that way and turn it into higher frequency on the shader pipeline. The tile-based renderer can be invoked dynamically as needed during the course of shading pixels and left off when it’s not needed.
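
Intel didn’t detail its Gen11 implementation, but the binning step at the heart of any tile-based renderer is easy to sketch: sort triangles into screen-space tiles up front so that each tile can later be shaded entirely out of on-chip memory. A rough, hypothetical illustration of the general technique:

// Conceptual tile-binning sketch (the general technique, not Intel's design):
// each triangle's screen-space bounding box is tested against a grid of
// tiles, and the triangle is recorded in every bin it touches.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Tri { float minx, miny, maxx, maxy; };  // screen-space bounding box

int main() {
    const int W = 1920, H = 1080, TILE = 128;
    const int tx = (W + TILE - 1) / TILE, ty = (H + TILE - 1) / TILE;
    std::vector<std::vector<int>> bins(tx * ty);

    std::vector<Tri> tris = { {10, 10, 300, 200}, {1500, 900, 1900, 1070} };
    for (int i = 0; i < (int)tris.size(); ++i) {
        int x0 = std::max(0, (int)tris[i].minx / TILE);
        int x1 = std::min(tx - 1, (int)tris[i].maxx / TILE);
        int y0 = std::max(0, (int)tris[i].miny / TILE);
        int y1 = std::min(ty - 1, (int)tris[i].maxy / TILE);
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                bins[y * tx + x].push_back(i);  // triangle touches this tile
    }
    printf("grid: %dx%d tiles; tile (0,0) holds %zu triangle(s)\n",
           tx, ty, bins[0].size());
    return 0;
}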

To keep more data closer to those execution units, Gen11 has a much, much larger L3 cache than Gen9. Blythe says that the GT2 configuration of Gen11 has a 3-MB L3 cache, four times the size of the one in the GT2 implementation of Gen9 and even larger in absolute terms than the 2.3-MB L3 in even the highest-performance GT4 implementation of Gen9.

Other improvements in the memory subsystem of the Gen11 IGP include better lossless memory compression, a common focus of improvement for making the most of available memory bandwidth in graphics processors both large and small. Blythe says the Gen11 compression scheme is up to 10% more effective at its best, but real-world performance is more likely to fall around 4% on a geometric-mean measure.
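
For reference, a geometric-mean uplift of the sort Intel quoted is computed by averaging the logarithms of per-workload ratios, which keeps one outlier benchmark from skewing the summary. A quick sketch with made-up ratios (not Intel’s data):

// Geometric mean of per-workload savings ratios.
// The ratios here are invented purely for illustration.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> ratios = { 1.10, 1.02, 1.00, 1.05, 1.03 }; // hypothetical
    double logsum = 0.0;
    for (double r : ratios) logsum += std::log(r);
    printf("geomean uplift: %.3fx\n", std::exp(logsum / ratios.size())); // ~1.04x
    return 0;
}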

The Gen11 team also separated the per-slice shared local memory in Gen11 from the L3 cache. That structure is now its own per-slice private allocation, and each of those blocks of memory has its own data path to allow the IGP to get better parallelism out of L3 cache accesses and inter-IGP memory accesses. Finally, the Graphics Technology Interface (GTI) that joins the integrated graphics processor with the rest of the CPU is now capable of performing reads and writes at 64 bytes per clock.
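
That interface width translates directly into bandwidth as a function of clock speed; Intel didn’t quote GTI clocks, so the figures below are my own assumptions rather than disclosed numbers.

// GTI bandwidth = bytes per clock x clock. Clock speeds are assumptions.
#include <cstdio>

int main() {
    const double bytes_per_clock = 64.0;
    for (double ghz : {0.6, 1.0, 1.15})
        printf("%.2f GHz -> %.1f GB/s in each direction\n",
               ghz, bytes_per_clock * ghz);
    return 0;
}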

Classifying objects in a scene by distance for variable-rate shading

While Nvidia’s Turing architecture might boast the first practical implementation of the ability to vary shading rates in a scene on a fine-grained basis, Intel points out that it invented the idea of what it calls coarse pixel shading. The company claims to have published a paper on the concept as far back as 2014. Now, that technique will be available to programmers on Gen11 graphics processors.

While Intel and Nvidia’s implementations of variable-rate shading likely differ in granularity, the point of the technology remains the same on Gen11 as on Turing: to avoid performing shading work that doesn’t result in appreciable increases in detail for parts of the scene that might not need it. Intel has so far implemented two techniques using CPS: a global coarse-pixel-shading setting and a radial falloff function that resembles foveated rendering. The company notes that the algorithm is available on a draw-call-by-draw-call basis, as well.
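
Intel didn’t publish the CPS API itself at the event, but the radial-falloff idea is simple to model: map each pixel’s distance from a focal point to a progressively coarser shading rate. A hypothetical sketch of that function:

// Hypothetical radial-falloff shading-rate picker (my own illustration,
// not Intel's API): full-rate shading near the focus, one shade per 2x2
// or 4x4 pixel block toward the periphery.
#include <cmath>
#include <cstdio>

enum class CoarseRate { Rate1x1, Rate2x2, Rate4x4 };

CoarseRate radial_falloff_rate(float x, float y, float cx, float cy, float radius) {
    float d = std::hypot(x - cx, y - cy) / radius;  // normalized distance from focus
    if (d < 0.4f) return CoarseRate::Rate1x1;       // full detail in the "fovea"
    if (d < 0.8f) return CoarseRate::Rate2x2;       // one shade per 2x2 pixels
    return CoarseRate::Rate4x4;                     // cheapest at the edges
}

int main() {
    // 1920x1080 frame, focus at the center, falloff radius of ~half the diagonal
    printf("center rate: %d\n", (int)radial_falloff_rate(960, 540, 960, 540, 1100));
    printf("corner rate: %d\n", (int)radial_falloff_rate(0, 0, 960, 540, 1100));
    return 0;
}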

The company’s demos of coarse pixel shading covered two potential ways the tech can be used. One was a synthetic, pixel-bound case where the software chose the shading rate based on distance from the camera and used level-of-detail characterizations on a per-object basis. In this demo, employing coarse pixel shading offered as much as a 2x boost in performance, but the company admitted that this was a best-case scenario.

Intel also showed an Unreal Engine demo with the radial falloff filter it had developed. In that case, the improvement from CPS was closer to 1.3x-1.4x that of the base case without CPS. Like Nvidia, Intel says its coarse pixel shading API is simple and easy to integrate, so we’ll be curious to see how much adoption this technology gets and how developers might choose to use it in the real world.

Intel’s VESA Adaptive-Sync demo system in operation

Gen11 is the first Intel graphics processor with support for the long-promised and long-awaited VESA Adaptive Sync standard. Variable-refresh-rate displays are a mature technology at this point, but it’s still welcome to see relatively modest graphics processors like GT2 driving compatible monitors in a tear-free fashion. Intel also claims that its Adaptive Sync-compatible IGPs will include desirable features like low framerate compensation from the get-go.
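
Low framerate compensation is worth a quick illustration: when the game’s frame time gets longer than the panel’s slowest allowed refresh interval, the source scans each frame out multiple times to stay inside the variable-refresh window. A toy model of that logic (my own sketch, not Intel’s driver code):

// Toy low-framerate-compensation model: pick how many times to repeat a
// frame so the refresh interval stays within the panel's VRR range.
#include <cstdio>

int lfc_repeats(double frame_ms, double panel_min_hz, double panel_max_hz) {
    double max_interval_ms = 1000.0 / panel_min_hz;  // longest legal refresh
    int repeats = 1;
    while (frame_ms / repeats > max_interval_ms) ++repeats;
    if (1000.0 / (frame_ms / repeats) > panel_max_hz) return 1;  // can't fit; bail
    return repeats;
}

int main() {
    // A 48-144 Hz panel with a game running at 25 fps (40 ms per frame):
    // each frame is shown twice, for a 50-Hz refresh inside the VRR window.
    printf("repeats: %d\n", lfc_repeats(40.0, 48.0, 144.0));
    return 0;
}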

Overall, the GT2 implementation of Gen11, with its promise of usable gaming performance, its modern display features, and its likely-to-be-egalitarian positioning, could introduce a broad audience to some features that only high-end graphics cards enjoy today.

Comments closed
    • BorgOvermind
    • 12 months ago

    So…
    In CPUs we get the 3rd promise of a functional 10nm.
    In GPUs we get from 5 generations behind in graphics to 4 generations behind by using cheating techniques that sacrifice quality for performance.

    • DavidC1
    • 12 months ago

    We got 5-10% increase over Haswell with Skylake. Let’s go over Skylake’s changes.

    -Improved decoders, using better fusion
    -Better micro-op cache throughput
    -Larger out of order buffer, scheduler, instruction window, Int/FP register files
    -Improved branch prediction

    What did that give us? 5-10% average over Haswell, according to the Tech Report and Anandtech reviews. If we assume Broadwell got a few % over Haswell, Skylake brought us 0-5% gains.

    https://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed

    You guys want to tell me that Sunny Cove is going to do better than this? It’s 2019. If Zen 3 really performs 10-15% faster per clock compared to Zen 2, I assure you AMD will be in a much better position even against Icelake.

    It’s also shameful how a smartphone chip in Apple phones beats Skylake (and likely Icelake) per clock. Not just a little, but by 30-40%. Intel needs another “Conroe” desperately, because Sunny Cove is coming so late it’s essentially Netburst. Considering how the company seems to be slowly falling apart, I’m not betting on another Conroe.

    At best Sunny Cove is good for another 5-10%. Sandy Bridge is where we got more than 15% gains, but it introduced *new* ideas like physical register files and uop fusion. Unless Sunny Cove has a really good trick up its sleeve, prepare to be disappointed.

      • tipoo
      • 12 months ago

      Where’s the 30-40% from? Vortex is impressive but I’ve seen more like 15% better IPC.

        • DavidC1
        • 12 months ago

        Anandtech’s SPEC tests.

        https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets/4

          • DavidC1
          • 12 months ago

          What Apple is doing is extremely impressive. They make Intel vs AMD comparisons irrelevant.

          Let’s not forget Skylake-in-a-phone would have been impressive (and I’m not talking about Intel’s half-assed attempts). Apple has Skylake + 5 years in a phone.

            • Anonymous Coward
            • 12 months ago

            Too bad most of the readers here have already moved on! Not so many comments.

            Is it fair to compare the IPC of a design optimized for lower clocks vs. one optimized for higher clocks? (I do note that it’s a bit odd to wonder if we are being fair to a *desktop* processor when comparing it to a mobile processor.)

            Also, if Apple is doing so well, that core should really be sold for use in servers. Those chips are typically not especially high clocked. AWS is starting to offer ARM servers. If Apple thinks it’s not kicking enough butt in mobile, then lay siege to the datacenter. And even if they don’t, one has to assume their ARM competitors will get there eventually, harvest that revenue, and use it to finance designs that challenge Apple in mobile. (Unless the ARM licensing model and ecosystem discourage that sort of thing.)

            • tipoo
            • 12 months ago

            That’s why I’m really interested in seeing how they scale to higher-wattage parts with active cooling. There are a few short-pipelined designs that still post non-SIMD IPC competitive with modern high-end parts, but they don’t scale high in clocks. Vortex isn’t so short that that should be an issue, though; in fact, again, it’s fairly similar to Intel.

            What I keep seeing is a uarch that’s way too ambitious for a phone and won’t stay only there forever…

            • Anonymous Coward
            • 12 months ago

            Does seem like a bit of a waste of effort to make an outstanding design and then sit on it. That said, I don’t see why they should make the switch on the desktop; they can’t possibly beat Intel by enough to make it worthwhile. I don’t believe in magic.

            • tipoo
            • 12 months ago

            I don’t believe in magic, but strongly in market forces, and the situation is a bit of a mirror of Qualcomm’s. They and Intel both sell silicon to other vendors; those vendors have cost requirements that are balanced against Qualcomm’s margins and push things to certain chip sizes and thus core complexities. Apple doesn’t sell chips; they sell products wrapped around those chips that sell for higher prices and higher profits per unit, so if it costs them a couple of extra bucks in silicon, they haven’t minded.

            Vortex is already a wider core than the newly widened Sunny Cove…Now if they go and tailor the chip even more for desktops and laptops, feed it more wattage, etc…

            Then even apart from that, there’s decoupling themselves from Intel’s roadmaps. Look at all the crap they took from everyone who doesn’t understand that Intel still does not support LPDDR4 and won’t until Ice Lake, for one example; or being able to choose for themselves how many PCIe lanes they get, for another; or bringing ARM’s variable-length SIMD so they can choose which products get AVX-512-width SIMD rather than living with Intel’s segmentation, etc.

            • Anonymous Coward
            • 12 months ago

            The magical thinking that I am concerned about is expecting anyone to possess engineering or fab prowess that significantly outperforms any competent competitor. Apple might well make a world-class desktop CPU, but so what? Perhaps they can gain an edge through integration of specialized components which suit their needs; they can make their own CPU, GPU, and software stack up to and including their own web browser… it would be amazing if taking on the world that way made sense.

            Also, I’m not impressed by all the excitement about *execution width*.

      • Waco
      • 12 months ago

      Running at 2 GHz versus 4+ GHz is a very big difference in chip and logic design. :shrug:

      IPC comparisons only really work when you’re comparing similarly clocked parts. You can be 30% better at 2 GHz but if your competitor is running 2.5X the clock rate…well…you do the math.

        • tipoo
        • 12 months ago

        Conversely, Intel has the same architecture from those 4+GHz chips running in lower wattage Y series parts too (ok, some of them boost there now, but for seconds, and regularly run closer to 1.6ish), so those don’t get any IPC enhancements that may have been gleaned from a dedicated architecture for those clock targets.

          • Waco
          • 12 months ago

          Exactly! If Intel (or AMD) decided to make a super fat low-clocked chip, it would not be hard to demolish these numbers given their superior expertise.

          There’s just not a whole lot of point in doing so until they’re forced to (for power or other reasons).

            • Anonymous Coward
            • 12 months ago

            I’ve been hoping that AMD doesn’t try to match Intel exactly head-on, and rather aims at slightly lower clock speeds and lower FP throughput, in trade for a healthy gain in efficiency. Seems like a better strategy to me, a lot of the market can go for that.

    • DancinJack
    • 12 months ago

    ugh @ all the “welp they needed AMD guyz to actually build real stuff” – where are all the PA Semi fanbois? And Apple fanbois? Tesla fanbois? no S3 fanbois?

    Just because these two particular people worked for AMD in the past doesn’t mean AMD has some claim to them. Stop it.

    • Sahrin
    • 12 months ago

    It’s telling that when Intel wanted to get serious about architecture, it hired two of the most prominent AMD execs it could find.

    That should tell you everything you need to know about whose products you should buy for the next 3+ years.

      • DancinJack
      • 12 months ago

      Ahhh yes, the good ol’ “AMD guys.” The only ones that know how to build performant CPUs.

      Sorry, Sahrin, but like a few other people that said something similar in these comments, you’re wrong.

      • chuckula
      • 12 months ago

      Bonus points for every post about how it’s great that AMD dumped Raja because Vega sucked so bad (while simultaneously destroying Nvidia) and how Jim Keller has been Stalinistically retconned to have had no involvement with Zen or the original Athlon 64 after he went to Intel.

      • techguy
      • 12 months ago

      lol, yeah, buy Zen 2 with its 13% IPC improvement instead of Sunny Cove with a 25% or more improvement (and higher clocks).

      BUT MOAR COARS! Great for servers, meaningless for laptops and most desktops.

    • blastdoor
    • 12 months ago

    After reading more at Anandtech, I come away thinking that the probability of Apple dumping Intel in Macs just went down a bit.

      • chuckula
      • 12 months ago

      blastdoor is less likely to confirm stuff…. CONFIRMED!!

      • tipoo
      • 12 months ago

      I was thinking about that reading through it too. It certainly narrows the width gap, though Apple still goes a fair bit wider, not sure about instruction reordering and stuff like that.

      I would have been very interested in digging into a desktop-grade, larger ARM core from Apple, but if Intel gets good enough by that point that Apple decides it would not be different enough to warrant the switch, hey, we win either way it goes. But a switch would still have key benefits even if they’re tied for one generation: decoupling themselves from Intel’s roadmap, and losing Intel’s margins (which admittedly Tim will probably not pass on to us).

    • srg86
    • 12 months ago

    Very excited about Sunny Cove. Reading this took me back to the days of the ’90s and ’00s, reading about the next cores and how they would get wider and deeper.

    I notice the comments about it just being an expansion of Sandy Bridge. But Sunny Cove, Sandy Bridge, Conroe, etc. are really just expansions (and reworkings) of the P6 from the Pentium Pro (1995). That does not curtail my excitement to see real architectural enhancements. With this, P6 will have gone from a 3-issue, to a 4-issue, and now to a 5-issue core.

    For me, as a CPU geek, easily the most interesting tech news of 2018.

      • Captain Ned
      • 12 months ago

      Now if one could only drag Hannibal out of retirement.

        • chuckula
        • 12 months ago

        If he’s coming out of retirement then I’m buying stock in Chianti and fava beans!!

        (I’ll be *arsed* if I got that reference)

      • setaG_lliB
      • 12 months ago

      I always thought that Yonah/Core Duo was the last true P6 descendant, as it was basically a dual-core Pentium M, which itself could be considered a Tualatin with a giant cache and quad-pumped bus. Wasn’t Core 2 a completely different beast, with its 64-bit support and much faster SIMD units?

    • Amiga500+
    • 12 months ago

    Sorry lads, our new Core Frozen Stream is delayed as the 50 picometre process required to make it use less than 500W is delayed till approximately 2143.

    Architecture design cannot happen in a bubble.

    An example of an architecture that arrived before the process it needed was 65-nm Barcelona. The 2-MB L3 cache really killed it. Then 45-nm Shanghai was very competitive against Harpertown.

      • jihadjoe
      • 12 months ago

      Sure, but architecture doesn’t need to be as closely married to process as it is with Intel right now.

      David Kanter explains why it’s a good idea to have a little bit of separation between arch and process:

      https://www.youtube.com/watch?v=629r1Ud4Cro

    • DavidC1
    • 12 months ago

    I don’t find Sunny Cove interesting; it’s an expansion that continues what began with Sandy Bridge. So far, no new ideas presented since then. The good bits are Gen 11 and Lakefield with Foveros.

    Gen 11 GT2 has 64 EUs and 8x 3D samplers. That’s essentially Gen 9 GT4, with its 72 EUs and 9x 3D samplers. While bandwidth is lacking on Gen 11 GT2 without the eDRAM, its improved lossless compression and tile-based rendering, along with a generally improved architecture, will significantly reduce the gap. Since they claim near 2x the performance with Gen 11, it’s going to get rather close to how Gen 9 GT4e performs. It’s not so much that Gen 11 is amazing; rather, Gen 9 GT4e was a disappointment.

    Also, not to be confused with PowerVR’s Tile-Based Deferred Rendering (TBDR): Intel/AMD/Nvidia’s tile rendering is tile-based immediate-mode rendering. Intel actually used tile-based immediate-mode rendering back in the pre-GMA X3000 days: Extreme Graphics/Extreme Graphics 2 and GMA 900/950 used this technique. They abandoned it starting with the unified-shader chips, beginning with the X3000.

      • DavidC1
      • 12 months ago

      Some of you may be thirsting for any changes on the Intel side after they were tripping over themselves for 4 years, but I’m not.

      They are merely doing what they should have been doing. We should have got Icelake in 2017. Now we have leaks on Comet Lake-U! If they can’t make -U chips properly this year then it won’t be until next.

      4 to 5 wide issue? More LEA support? Double store units? It’s nice, but that’s what, good for maybe 10% gain max like we had with Skylake? Doesn’t the A12 of this year beat Skylake by 30-40% per clock?

    • NTMBK
    • 12 months ago

    Did anyone see the bit on Anandtech about the Intel big.little chip? (I presume Jeff will add something about it tomorrow.) One big Core (presumably Icelake?) and 4 little cores (presumably Tremont), and a 64EU GPU, implemented with the CPU/GPU die directly stacked on top of the PCH die. Sounds like an interesting little chip!

    • tipoo
    • 12 months ago

    Looks like a lot of outlets are messing up reporting on the GPU claim.

    Iris Plus was already near 1 Tflop; just crossing that would not be impressive. What IS impressive is that they weren’t talking about the successor to that part: they were merely talking about GT2. GT3e will scale beyond that, and if I understand correctly, another unannounced part with another slice beyond that.

    • BigTed
    • 12 months ago

    Adaptive sync on the Gen 11 IGP seems like a pretty big deal. Will be interesting to see how this pans out for Gsync.

      • jensend
      • 12 months ago

      People keep thinking adaptive sync is for the ultra high end, but its biggest benefits are clearly in situations where the GPU can (at least mostly) stay above 30fps but cannot maintain a consistent 60fps. It will be a game changer for Intel IGPs, and I think it’s clear DP adaptive sync will become the norm even on inexpensive monitors.

        • Anonymous Coward
        • 12 months ago

        It does seem like Intel screwed up *significantly* by not getting that feature out sooner. It’s amazing that some form of adaptive refresh rate isn’t already an industry standard on everything.

          • DancinJack
          • 12 months ago

          How did they screw up? What penalty did they pay?

            • Anonymous Coward
            • 12 months ago

            You can of course not point to a specific penalty, nor could we find a specific bonus/penalty for any of the other many small things Intel did to remain competitive. But the success of Intel and the entire PC ecosystem depends on continuous improvement and progress in small areas, of which adaptive refresh must count as a pretty significant one. Smoothness is a big deal, everywhere, in all situations.

          • Zan Lynx
          • 12 months ago

          Adaptive refresh is the standard on most PC sales, which are laptops with Intel iGPUs driving the screen. Adaptive Sync is a slight modification of eDP, with Panel Self Refresh, which Intel nearly *invented*.

          What I think held Intel back is crappy video driver software. Doing a good adaptive sync implementation is a bit tricky, because it doesn’t quite want to display each frame immediately. It wants to spread things out some and make frame pacing a bit more even. That turns out not to be a super easy thing.

            • tipoo
            • 12 months ago

            Hope that means they didn’t bung up the drivers then.

      • tipoo
      • 12 months ago

      I think the IGP part of this is underhyped. I had a soft spot for my Iris Pro 5200, the year it was released it was the little IGP that could, though game requirements quickly overloaded it after.

      But almost every non-dedicated PC publication got the GPU story wrong: the exciting bit isn’t crossing 1 Tflop; the Iris Plus was already sitting just under there. The exciting part is that they were talking about their ho-hum GT2 part crossing 1 Tflop, with GT3 and a new four-module part to exceed that.

    • WhatMeWorry
    • 12 months ago

    Strange photo. Looks like Keller is eulogizing Noyce. Maybe they’re holding a seance with him to get their mojo back.

      • derFunkenstein
      • 12 months ago

      I thought he was modeling his pants, or he’s about to correct something stupid somebody else said. “Well here’s where you’re wrong…” 😆

      • DavidC1
      • 12 months ago

      That could just be the angle Tech Report took. I don’t see that in other sites. And, what a strange comment.

    • drfish
    • 12 months ago

    [proper noun], you [adjective], [code name] is almost here! You should have [verb].

      • dragontamer5788
      • 12 months ago

        “[proper noun], you [adjective], [code name] is almost here! You should have [verb].”

        [Dr. Fish], you [colorful], [Ice Lake] is almost here! You should have [run].

        Am I doing this right?

        • drfish
        • 12 months ago

        I know “idiot” is a noun, but it’s also a description, so I just went with it. Feel free to take whatever liberties with the formula you please.

          • dragontamer5788
          • 12 months ago

          That’s a good point. Lemme try again.

          [Buffalo (https://en.wikipedia.org/wiki/Buffalo,_New_York)], you [buffalo (https://www.merriam-webster.com/dictionary/buffalo)], [buffalo] is almost here! You should have [buffaloed (https://www.macmillandictionary.com/us/dictionary/american/buffalo_2)].

          • Wirko
          • 12 months ago

          Dr. [proper noun], don’t buy things!

      • KeillRandor
      • 12 months ago

      Syntactics is all well and good, but where’s the semantics? 😛

      • jihadjoe
      • 12 months ago

      while ([code name] <> ‘DDR5’) print “Meh”;

      • Krogoth
      • 12 months ago

      Krogoth is impressed by this code

      • Usacomp2k3
      • 12 months ago

      I love me some madlibs. They make for great road trip fun.

    • blastdoor
    • 12 months ago

    Presumably Intel has previously married design to process because there’s an advantage to doing so, not just to increase risk needlessly. By separating design and process, is Intel giving up some of the advantage of being an IDM? Is ceding that advantage the best way to respond to the 10nm fiasco? Could they just be a little bit more cautious in their manufacturing goals, and always have a ‘plan B’ ready in development on a proven process just in case the new process doesn’t unfold as planned?

      • techguy
      • 12 months ago

      It shows flexibility, something that is always difficult for large corporations to achieve. Intel’s own history has a rather notable example of this in Netburst. Yes, it’s likely true that there will be less optimization for clock speed as a result of this move, but it doesn’t have to be a permanent change either.

      • psuedonymous
      • 12 months ago

      Designing for process means you get the first ‘optimisation’ step as your first step, and get it faster. If you design a process-agnostic architecture, then you’re faced with either releasing a product with the bare minimum tweaking for that process, or waiting a good year to optimise for that process.

      In the run-up to 14nm that worked out pretty well, but with 10nm delayed due to process scaling issues, it’s left them backporting bits of their 10nm design to 14nm piecemeal.

      • Anonymous Coward
      • 12 months ago

      Yeah, that was my thought too: they must necessarily give up some optimizations for the flexibility, and that plan does little to answer the nagging question of how long Intel can justify standing alone.

        • blastdoor
        • 12 months ago

        I wonder if Intel would be worth more as two separate companies, design and manufacturing.

      • Klimax
      • 12 months ago

      Not necessarily. I read it as: they will co-design two different options, one dependent on the new process and one for the current process, sharing as much as possible.

      ETA: potions->options

    • DPete27
    • 12 months ago

    Thank you Intel.
    This holiday season it was difficult to hold off on my upgrade from an Ivy Bridge i5. The real kicker was that Z390 boards were/are still unnaturally expensive, since they launched only weeks before Black Friday. Looks like next season will be a good one.

    • rauelius
    • 12 months ago

    Anyone else getting Willamette/Bulldozer vibes?

      • DPete27
      • 12 months ago

      No
      Intel has a much stronger hold on the software side of things.
      Both Intel and AMD learned from Bulldozer.

        • Kretschmer
        • 12 months ago

        I wouldn’t say that Bulldozer was about having a “hold on software.” You have to align CPU design with the problems of the day.

      • derFunkenstein
      • 12 months ago

      No. In those designs “deeper” referred to the length of the “pipeline” that an instruction traveled to get executed, which was a lame way to add MEGA HURTS at the expense of performance. Mis-predicted branches were particularly crippling. Here, “deeper” is about the cache structure, giving a larger (deeper) pool of resources to pull data without going all the way out to system RAM.

    • jarder
    • 12 months ago

    Maybe it’s a sign of the times, but I found the more-than-likely-fake Ryzen 3000 leaks much more interesting than this corporate snoozefest exercise in investor reassurance.

      • techguy
      • 12 months ago

      So massive increases in single-thread performance over what is already the industry’s highest levels isn’t exciting to you?

        • jarder
        • 12 months ago

        define massive

          • techguy
          • 12 months ago

          25%+ with a trivial software re-compile. Last time we saw that was Sandy Bridge. It’s easy to increase performance by 50% over an already crap architecture coughBulldozercough, it’s a whole different matter when your basis for comparison is Skylake.

            • jarder
            • 12 months ago

            So what’s the percentage going to be without a re-compile? i.e. for all of the software that’s already been created.

            • techguy
            • 12 months ago

            Depends on how much instruction-level parallelism is left on the table in existing code. Again, a re-compile is a trivial matter (compared to a full-blown re-write). Literally amounts to checking a box and clicking a button (in an IDE anyway). Ever download an update for a program that includes a new executable? That was re-compiled.

            • jarder
            • 12 months ago

            If a re-compile was so trivial, windows would have a big shiny “recompile” button.

            • techguy
            • 12 months ago

            Equating an Operating System re-compile with an application re-compile is disingenuous, or ignorant.

            And yes, the actual act of re-compiling an application is trivial, as I described. It’s the testing and distribution that take a bit more effort.

            • jarder
            • 12 months ago

            I never mentioned operating system recompilation. I meant that if it were so trivial, then the operating system would be able to recompile applications easily; they can’t.

            And no, the actual act of re-compiling an application is not trivial; what you described is clicking on a checkbox. I find this highly disingenuous and insulting to the efforts of developers. Knowing which compiler options to use these days is a big problem considering the massive number of options available, never mind how different options can interact with one another. And you call that trivial. At least you paid some lip service to the fact that applications require testing; it can take a lot of work to come up with test suites to verify the validity and performance of non-trivial applications.

            • Klimax
            • 12 months ago

            I remember reading about Microsoft’s tool for binary-level optimization. Unfortunately, I haven’t been able to find it again since.

            • Action.de.Parsnip
            • 12 months ago

            Ignorant is thinking there’s +25% performance ‘just like that’. Are you seriously saying that widening the machine by that much will make it the same amount faster? It’s fantasy-land thinking.

            • dragontamer5788
            • 12 months ago

            Note that AMD Zen has a uop issue width of 6, but it is slower than Skylake’s current uop issue width of 4.

            Sunny Cove will *certainly* be faster, but do not expect +25% speeds from the increase from 4-wide to 5-wide dispatch.

            EDIT: Since most code is cache-bound or memory-bound, the biggest advantages come from good uncore designs (i.e., ring-bus latency, RAM latency) rather than core improvements. Faster cores are always welcome, of course, but we’re long past the point where improving the core results in dramatic improvements in practical speed.

            • Antimatter
            • 12 months ago

            Skylake also dispatches up to 6 uops.

            • Antimatter
            • 12 months ago

            Where do you get the 25%+ figure?

            • thx1138r
            • 12 months ago

            It sounds like you must not have developed much software. There is a very large number of things that can go wrong with any “trivial software recompile”.
            For example, the version of one of the very many software libraries you were using has been deprecated for some security problems. Do you A) look into the security problem to see if it is safe to continue using it because you are using part of the library that is unaffected, or B) do you look into adapting your software to use the latest version of said library. Both answers are a potential rabbit warren of knock-on problems.

            • Redocbew
            • 12 months ago

            Aside from maybe the OS itself, I have yet to find a type of application that can scream and die quite like a compiler does.

            • Laykun
            • 12 months ago

            If it were trivial you’d find a lot of commercial products providing updates of “compiled with a new version of the compiler for performance improvements.” You don’t though, because compilers are really picky, complicated pieces of software. If all software were open source you could at least crowdsource fixes for popular programs, but it’s not, and the financial incentive for recompiling to give the user better performance is generally unfavourable for a launched product.

            Also, a recompile will likely not affect most commercial applications that regular people use, because most of them don’t use Intel’s fancy instruction sets; the general-purpose improvements are probably what’s more relevant to most people.

            • psuedonymous
            • 12 months ago

            “If it were trivial you’d find a lot of commercial products providing updates of ‘compiled with a new version of the compiler for performance improvements.’ You don’t though, because compilers are really picky, complicated pieces of software.”

            You don’t, but only because that’s generally a worthless update note and will be rolled in under the catch-all “performance and stability improvements” that footnotes essentially every nonemergency patch note.

            • techguy
            • 12 months ago

            A re-compile IS trivial IN COMPARISON TO A RE-WRITE. Which is the only point I was making in that particular reference.

            Anyway, it’s not THE POINT. The point is that Sunny Cove will bring substantial performance boosts to virtually all software. I only mentioned re-compiling as a CYA. It is NOT necessary for seeing substantial performance gains with the new architecture, but it could allow for even larger gains.

            Straight from the architect himself:
            https://www.youtube.com/watch?v=0s6zMQgkjGs

            • Redocbew
            • 12 months ago

            Regardless of whatever efforts may be involved, I’m not sure it’s ever happened on anything but an isolated basis. The idea that some change in hardware could inspire developers across the world to recompile their software all at once might be a good selling point (if you’re not an engineer, I guess), but it’s not realistic.

            • techguy
            • 12 months ago

            Cool story bro. Completely missed the point. Maybe try reading past the 2nd sentence.

            • jarder
            • 12 months ago

            Watched that, Intel guy didn’t mention a 25% improvement. Any chance you can share where you got this information?

            • techguy
            • 12 months ago

            An understanding of CPU architecture. This is the biggest change since Sandy Bridge. Not hard to put 2 and 2 together.

            • Laykun
            • 12 months ago

            So your source is speculation, great.

            • jarder
            • 12 months ago

            You don’t even have some probably-fake “leaked” benchmarks to back up your assertion!

            And I thought only AMD fanbois engaged in wildly optimistic speculation, times are changing…

            • Laykun
            • 12 months ago

            It’s not trivial vs. say … doing nothing though? That’s what you’re competing with here, not a re-write. Just leaving the code as it is and benefiting from general purpose performance improvements is what the vast majority of the market does for previously mentioned reasons.

            • DavidC1
            • 12 months ago

            Compare architectural changes between Haswell and Skylake. We got 5-10% out of that.

            Now compare Sunny Cove over Skylake. That’s going to be another 5-10% gain, and maybe that’s on a good day.

            Comparing it to Sandy Bridge is laughable. Sunny Cove brings nothing new; it just adds more of what previous architectures already had.

      • chuckula
      • 12 months ago

      You are totally right!

      For the same reason that Lord of the Rings was more exciting than cashing my tax refund check.

        • jarder
        • 12 months ago

        At least somebody understood my post…….

        ….. Although I would hope that your tax refund check has more actual details than this waffle-fest.

      • thx1138r
      • 12 months ago

      I found it interesting that they used the two ex-AMD guys to deliver the message that Intel has learned from its recent mistakes.
      Are they trying to borrow some of AMD’s newly re-found execution competence?

        • jarder
        • 12 months ago

        Nah, it’s more like they’re using the two new guys because everybody else has been tainted by spreading the “10nm is coming soon” lie and wouldn’t be believed.

        • Srsly_Bro
        • 12 months ago

        It really takes AMD to make Intel succeed. AMD made Intel break the law by conspiring with OEMs. AMD made the IMC, x86-64, and proper chiplets (not Kentsfield trash). AMD also reinvigorated Intel. Intel is the fat guy on the couch, and AMD is the life coach.

        I’m sure i missed a few things.

          • chuckula
          • 12 months ago

          AMD forced Intel to adopt AVX-512. They don’t call them *AMD* vector extensions for nothing!

    • Unknown-Error
    • 12 months ago

    Love to see David Kanter dissect this new microarchitecture. Not a Conroe, but this looks like a Sandy Bridge-like jump by Intel.

      • techguy
      • 12 months ago

      You need to get DK on another podcast.

        • chuckula
        • 12 months ago

        DK is on YouTube with GamerNexus discussing this event (he was there).
        It would be great to have him on with TR too.

          • techguy
          • 12 months ago

          lol, was going to say “before Gamers Nexus” in my last comment, too late.

            • jihadjoe
            • 12 months ago

            lol Scott Wasson was on Gamers Nexus discussing frame times!

      • chuckula
      • 12 months ago

      In order for there to be a Conroe-like jump, you need to start from a P4 base… Intel is happy to avoid being in that situation.

      • WhatMeWorry
      • 12 months ago

      I listened to Kanter’s entire talk at https://www.youtube.com/watch?v=629r1Ud4Cro

      He got most excited at the tail end of the discussion by, of all things, Optane. Here’s a rough transcript:

      “Optane DIMMs in pretty large capacities, so that would be super exciting, and I bet they would be monsters to benchmark. This is just something I was thinking about casually, kind of a wild and crazy idea, but I think on my system I’m currently running with something like a 3 or 4 hundred gigabyte SSD for all my applications and OS, and I have a bunch of hard drives for holding media and other things. But you can get a terabyte of Optane DIMMs no problem, so you could actually just have a PC where everything sits in Optane, and if the latency is, call it, 300 nanoseconds for everything, that’s a lot better than the microseconds we’re used to for an SSD. … It’s permanently in memory, right, so how fast is Windows booting then?”

        • hansmuff
        • 12 months ago

        Thanks for the transcript.

        I don’t know, are people really concerned with Windows boot time anymore? Just about any SSD will make Windows 10 boot pretty damn quickly.

          • synthtel2
          • 12 months ago

          RAM training takes longer than the whole rest of my boot process put together.

          • Krogoth
          • 12 months ago

          Optane memory is awesome, but its practical utility is limited to the enterprise/HPC world. There’s no killer mainstream application where an HDD, let alone run-of-the-mill SATA SSD media, is woefully inadequate as far as boot and loading times are concerned.

            • WhatMeWorry
            • 12 months ago

            Maybe I’m overweighting disk I/O, but isn’t there always a lot of disk I/O going on in the background of a running PC even after it boots? So wouldn’t replacing SSD secondary storage with Optane at least make one’s machine incredibly responsive?

            • Krogoth
            • 12 months ago

            Somewhat cheap and plentiful DRAM defeats that purpose for mainstream rigs and workloads.

    • ronch
    • 12 months ago

    The only thing I’m really seeing here is that these are two former AMDers who jumped ship to make a much stronger competitor even stronger.

      • derFunkenstein
      • 12 months ago

      If you can’t afford to keep your talent, you will lose it.

        • Srsly_Bro
        • 12 months ago

        That doesn’t explain why Raja left tho.

          • Goty
          • 12 months ago

          [conjecture]Raja left because he was tasked with the soul-crushing work of keeping an entire division alive for years on bare minimum funding and resources while they pumped money into Zen.[/conjecture]

            • Srsly_Bro
            • 12 months ago

            [speculation]^^

            • Goty
            • 12 months ago

            [synonym]^^

            • tipoo
            • 12 months ago

            Word is that 2/3rds of his Vega team was taken to work on a joint Navi/PS5 project with Sony, leaving him with a skeleton crew for Vega which he took the blame for. If Intel was ready to throw resources at their new GPU division I can see why it would be a tempting switch.

            https://www.forbes.com/sites/jasonevangelho/2018/06/12/sources-amd-created-navi-for-sonys-playstation-5-vega-suffered/#528d16ee24fd

      • enixenigma
      • 12 months ago

      My understanding was that Keller was brought on specifically to design Zen. Once that was done, so was his role at AMD. Keller did a stint at Tesla before joining Intel. His situation is different from Raja’s.

        • tipoo
        • 12 months ago

        It’s pretty much Keller’s classic move to move on once a project is in pretty good shape and almost ready to launch. The man seeks a challenge.

      • DancinJack
      • 12 months ago

      ofc that’s all you took from that.

      • K-L-Waster
      • 12 months ago

      AMD is a company, not a country. And these guys are employees, not sworn defenders for life.

      • jihadjoe
      • 12 months ago

      Keller is way too good to stay at any one place twiddling his thumbs. He was off seeking the next architecture almost as soon as the building blocks for Zen were done.

      • Srsly_Bro
      • 12 months ago

      That’s yet to be seen. Raja made a weak rtg weaker.

      • Amien
      • 12 months ago

      Keller is a superstar, don’t think any company can ‘claim’ him.

      • Vaughn
      • 12 months ago

      The only guy jumping ship is the Radeon guy.

      Jim Keller never stays once the product he was hired to build is completed. It’s what brilliant people do; why stay and have a boss telling you what to do? He calls his own shots.

      A simple look at his wiki shows he has worked for and left AMD numerous times.

      https://en.wikipedia.org/wiki/Jim_Keller_(engineer)

    • tipoo
    • 12 months ago

    “Sunny Cove is a fundamentally wider core than any Intel design since Sandy Bridge, as well, expanding from four-wide issue to five-wide and increasing the number of execution ports from eight to 10. Each of those execution units, in turn, is more capable than those of Skylake. Intel added a dedicated integer divider on port 1 to reduce latency for those operations.”

    Interesting to contrast with this, from AT:

    “Monsoon (A11) was a major microarchitectural update in terms of the mid-core and backend. It’s there that Apple had shifted the microarchitecture in Hurricane (A10) from a 6-wide decode to a 7-wide decode. The most significant change in the backend here was the addition of two integer ALU units, upping them from 4 to 6 units. Monsoon (A11) and Vortex (A12) are extremely wide machines – with 6 integer execution pipelines among which two are complex units, two load/store units, two branch ports, and three FP/vector pipelines this gives an estimated 13 execution ports, far wider than Arm’s upcoming Cortex A76 and also wider than Samsung’s M3. In fact, assuming we’re not looking at an atypical shared port situation, Apple’s microarchitecture seems to far surpass anything else in terms of width, including desktop CPUs.”

      • chuckula
      • 12 months ago

      A standard Skylake desktop part is already 5-wide, although Icelake/Sunny Cove appears to expand that somewhat further.

      If you want a ridiculously wide front end, don’t bother with Apple. Instead look at high end Power chips. They rely on hyperthreading very heavily to keep the front end from mostly sitting idle.

        • dragontamer5788
        • 12 months ago

        No kidding. Power9 SMT8 has 12x dispatch and 8x load/store units per core.

        EDIT: Each Power9 core is designed to run 8-way SMT (equivalent to Hyperthreading), which is why it’s so incredibly fat.

        ———

        But once again: width isn’t the whole picture. AMD Zen has 4x integer pipes + 4x vector pipes + 2x load/store units per core. That is, Zen is already 10 execution units wide, and it isn’t faster than Intel’s 7-wide execution pipe structure.

        There’s more to core-building than just widening execution pipes. Power9’s latency is 2 cycles for simple operations (like ADD or XOR), which grossly drops its practical performance (despite being wide as all heck).

          • ermo
          • 12 months ago

          well, AMD’s uArch *does* seem to scale better on workloads that are amenable to SMT?

        • tipoo
        • 12 months ago

        Oh sure. But I don’t have a Power9 in my pants, it’s just a little wild to think.

      • Anonymous Coward
      • 12 months ago

      Huh, had no idea Apple had gone so wide. Surprised that worked out from an efficiency perspective.

        • tipoo
        • 12 months ago

        They use such impressive width to keep clock speeds tamped down compared to competitors, and still wind up well ahead for it. There’s also the fact that their little cores, hilariously and a little sadly, are already up at flagship Android core speeds from just three years ago (per the same review), so they probably have to wake up the big cores less often for day-to-day things compared to the other minuscule little cores of the industry.

        It’s actually even more impressive than it looks
        [quote<]Overall the new A12 Vortex cores and the architectural improvements on the SoC’s memory subsystem give Apple’s new piece of silicon a much higher performance advantage than Apple’s marketing materials promote. The contrast to the best Android SoCs have to offer is extremely stark – both in terms of performance as well as in power efficiency. Apple’s SoCs have better energy efficiency than all recent Android SoCs while having a nearly 2x performance advantage. I wouldn’t be surprised that if we were to normalise for energy used, Apple would have a 3x performance lead.[/quote<] It sounds like Intel has some good stuff going on here, wider, better GPU, so maybe if Apple does switch to in house chips sometime in or post 2020 it won't be a stomp like it would look like against today's 14nm+++ chips, but I'll still be very interested to see the details there.

    • chuckula
    • 12 months ago

    The “Sunny Cove” architecture is Ice Lake but the most interesting thing is that Intel has worked on decoupling the core architecture from the manufacturing process. This means these CPU cores are going to show up in the data center on 14nm in 2019.

      • Srsly_Bro
      • 12 months ago

      Don’t worry, ‘member 7nm is around the corner?

      • DavidC1
      • 12 months ago

      No, other slides show Cooper Lake having BFloat16, but Icelake using Sunny Cove.

    • tipoo
    • 12 months ago

    https://techreport.com/r.x/2018_12_12_Intel_talks_about_its_architectures_for_the_future/roadmap.png

    New ISA? I’m assuming it wouldn’t be announced so unceremoniously, and this means new instructions glued onto the existing one, eh?

    Edit: Probably the AVX-512 extension, now that I’ve RAFO’ed?

      • chuckula
      • 12 months ago

      Here’s a handy Venn diagram: https://twitter.com/InstLatX64/status/969560033922035713

      It looks scarier than it is: Icelake basically supports all of AVX-512 outside of a few very specialized instructions that were in the Knights HPC products.

        • Concupiscence
        • 12 months ago

        I really wish they’d create an orderly naming scheme for all these variations on AVX-512. Resorting to these rubrics to make sense of supported features is silly – even AVX-512A, B, C, and on down the alphabet would be preferable.

          • derFunkenstein
          • 12 months ago

          It’d be preferable only if each letter was a superset of the features before it (along with the new).

          But that diagram very clearly shows there are two sets of instructions that are at least partially exclusive.

    • Krogoth
    • 12 months ago

    Raja: “iVega will be this large……”

    • chuckula
    • 12 months ago

    Thanks Jeff! You started out with the client CPU that’s of most interest to TR’s readership but there’s more to come.
