AMD teases Zen 2 Epyc CPUs with up to 64 cores and 128 threads

At its Next Horizon event today, AMD gave us our first look at the Zen 2 microarchitecture. As one of AMD's first 7-nm products, Zen 2 will be making its debut on board the company's next-generation Epyc CPUs, code-named Rome.

According to AMD CTO Mark Papermaster, TSMC's 7-nm process offers twice the density of GlobalFoundries' 14-nm FinFET process. It can deliver the same performance as 14-nm FinFET for half the power, or 1.25 times the performance for the same power, all else being equal.
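Those two claims describe different operating points, not a single perf-per-watt number. A quick sketch of the arithmetic (illustrative only; "all else being equal" is doing a lot of work here):

```python
# The two scaling claims above, restated as performance-per-watt gains.
baseline_perf = 1.0
baseline_power = 1.0

# Claim 1: same performance at half the power
iso_perf_gain = baseline_perf / (0.5 * baseline_power)    # 2x perf/W

# Claim 2: 1.25x performance at the same power
iso_power_gain = (1.25 * baseline_perf) / baseline_power  # 1.25x perf/W

print(f"Iso-performance point: {iso_perf_gain:.2f}x perf/W")
print(f"Iso-power point:       {iso_power_gain:.2f}x perf/W")
```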

AMD is using those extra transistors to improve the basic Zen blueprint in at least two major ways. Zen 2 has an improved front-end with a more accurate branch predictor, smarter instruction pre-fetch, a "re-optimized instruction cache," and a larger op cache than its predecessor.

AMD also addressed a major competitive shortcoming of the Zen architecture for high-performance computing applications. The first Zen cores used 128-bit-wide registers to execute SIMD instructions, and in the case of executing 256-bit-wide AVX2 instructions, each Zen floating-point unit had to shoulder half of the workload. Compared to Intel's Skylake CPUs (for just one example), which have two 256-bit-wide SIMD execution units capable of independent operation, Ryzen CPUs offered half the throughput for floating-point and integer SIMD instructions.

Zen 2 addresses this shortcoming by doubling each core's SIMD register width to 256 bits. The floating-point side of the Zen 2 core has two 256-bit floating-point add units and two floating-point multiply units that can presumably be yoked together to perform two fused multiply-add operations simultaneously.

That capability would bring the Zen 2 core on par with the Skylake microarchitecture for SIMD throughput (albeit not the Skylake Server core, which boasts even wider data paths and 512-bit-wide SIMD units to support AVX-512 instructions). To feed those 256-bit-wide execution engines, AMD also widened the load-store unit, load data path, and floating-point register file to support 256-bit chunks of data.
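As a back-of-the-envelope sketch of what that doubling means for peak throughput, assuming two FMA-capable pipes per core as described above (figures are illustrative, not official):

```python
# Peak single-precision FLOPs per cycle implied by the SIMD widths above.
# A fused multiply-add counts as 2 floating-point operations.
def peak_flops_per_cycle(simd_bits, fma_units, element_bits=32):
    lanes = simd_bits // element_bits
    return lanes * fma_units * 2  # x2 for multiply + add

zen1 = peak_flops_per_cycle(128, 2)  # Zen: 128-bit datapaths
zen2 = peak_flops_per_cycle(256, 2)  # Zen 2: 256-bit datapaths
print(f"Zen:   {zen1} FP32 FLOPs/cycle")
print(f"Zen 2: {zen2} FP32 FLOPs/cycle")
```

The output matches the article's "half the throughput" framing: Zen 2's 32 FP32 FLOPs per cycle is exactly double the first-generation figure and matches client Skylake.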

At the system level, Zen 2 also represents a major change in the way Epyc CPUs are constructed. Only the CPU core complexes and associated logic will be fabricated on TSMC's 7-nm process. To talk to the outside world, next-generation Epyc packages will feature an I/O die bound to as many as eight Zen 2 "chiplets," for as many as 64 cores and 128 threads per package. This I/O die will contain memory controllers, Infinity Fabric interfaces, and presumably as much other "uncore" as AMD can get onto this cheaper, more mature silicon.

Developing…

Comments closed
    • HERETIC
    • 11 months ago

    Talking big CPUs... how about a BIG cooler? Found this on OCAU:
    https://www.overclockers.com.au/image.php?pic=images/newspics/9nov18/35.jpg
    Made a few of those, back in my steel-fab days....................................

    • GatoRat
    • 11 months ago

    After seeing the benchmarks for the i7-9700K with no hyperthreading, I started wondering just how many transistors hyperthreading requires. With that in mind, if AMD removed hyperthreading from the 64-core Zen 2, how many more cores could they add?

    • kuttan
    • 11 months ago

    Intel shills are going nuts.

    • kuttan
    • 11 months ago

    AMD also displayed a single Rome processor beating TWO of Intel's flagship 8180 CPUs in a rendering benchmark. No wonder Brian Krzanich was forced out of Intel.

      • Gadoran
      • 11 months ago

      I don't think a single customer will use that CPU for rendering.
      Anyway, Intel will have up to 56 cores per socket with Xeon AP. Your explanation of Brian's departure is broken.

        • kuttan
        • 11 months ago

        Rendering only showed its raw throughput, which reflects its HPC and server workload performance. Intel doesn't have any 56-core CPUs, not even on their roadmap.

        • cygnus1
        • 11 months ago

        That 56-core socket, how many times more expensive do you think it will be compared to AMD's 64-core socket??

          • Spunjji
          • 11 months ago

          As many times more expensive as it takes to make Gadoran and similar folks think they bought the “premium” product. 😀

            • cygnus1
            • 11 months ago

            Lol, good point, gotta get’m in the feels.

        • Spunjji
        • 11 months ago

        Like the proverbial stopped clock, you might be right on the last point. They most likely kicked him out 50/50 due to his superbad management ethics and the awful optics of flogging the maximum allowable quantity of shares before the Spectre / Meltdown news broke.

      • Krogoth
      • 11 months ago

      Brian was forced out because he was the sacrificial lamb for the entire 10nm mess. Intel simply gambled too much on the 10nm process pulling through without too many hiccups.

      They are now in jeopardy of losing their foundry advantage, which is a massive strategic advantage.

      The security problems go far back and are the results of design oversights by engineers/architects (1990s). Hardware-level security wasn’t a massive concern when the underlying designs of these CPUs were being implemented.

        • DeadOfKnight
        • 11 months ago

        Well they’ve got 7nm in the works, unlike other foundries that are giving up the race. They could win back that advantage.

        • freebird
        • 11 months ago

        Krogoth wrote:
        They are now in jeopardy of losing their foundry advantage which is a massive strategic advantage.

        Fixed that for ya:
        They (Intel) are now in jeopardy of losing their foundry advantage which WAS (not is) a massive strategic advantage.

    • gamoniac
    • 11 months ago

    Wait… I saw these ads on this article. Are you sure your marketing agency isn't based in Moscow or St. Petersburg? : )

    http://i67.tinypic.com/1078irs.jpg

    • Gadoran
    • 11 months ago

    This interesting SKU shows clearly how bad TSMC's yields are right now, so bad that AMD was constrained to put nearly all the uncore on 14nm.
    Likely the power consumption of this device will not be a record, and there will be some effect on available clock speeds and latencies.
    Looking at AMD's struggles, Intel does better to stay on an enhanced 14nm, the only way to keep the server world's channel supplied. These crap fine processes are a damnation of God.

    Let's hope things will be better in a year or two. For now, standard Epyc on 14nm will be the bulk of AMD's server production for a long while.

      • chuckula
      • 11 months ago

      Stop shilling!!

      Chiplets are miraculous!

      Integrated memory controllers are primitive Intel lies!

        • Gadoran
        • 11 months ago

        Yea sure! Uncore power consumption is around 50% of the total power draw.
        Don't think AMD is happy with Taiwan's technology… not happy at all.

        • Srsly_Bro
        • 11 months ago

        Is that your alt or relative?

          • chuckula
          • 11 months ago

          All I know is that you’ll never find a single post by an AMD fanboy that was made after November 6 saying that integrated memory controllers are any good!

          THANKS AMD!

            • Gadoran
            • 11 months ago

            Yea, I call Rome an emergency edition, just something to pull out for investors.
            A bad-latency festival for amateurs.

            • cygnus1
            • 11 months ago

            Nah, they’re still good. But philosophical question. If the memory controller is still in the same package/socket as the cores, is it integrated???

            • Srsly_Bro
            • 11 months ago

            Minus 1 because i care!

            • BurntMyBacon
            • 11 months ago

            chuckula wrote:
            All I know is that you'll never find a single post by an AMD fanboy that was made after November 6 saying that integrated memory controllers are any good! THANKS AMD!

            Integrated memory controllers are great! THANKS AMD!

            Note: As I don't consider myself a fanboy in any regard, I suppose this doesn't actually negate his statement.

            On another note, while AMD was the first to bring the IMC to x86, the technology was not new or unknown to Intel. Also, Intel improved northbridge-attached memory controller latency significantly during the Core 2 era. I'd expect an on-package controller could be even lower latency, but I can't see it quite matching their IMC. On top of that, IIRC, Intel's IMC is even lower latency than AMD's. I'm not sure it will make much difference to throughput-heavy datacenter use cases, but I'm curious about its effect on gaming and other latency-oriented tasks.

        • Spunjji
        • 11 months ago

        This is one of those bizarre posts where it looks equally wrong whether or not I assume it’s satire. You’re actually addressing the best candidate yet for a shill on this forum, so uh, good call?!

      • kuttan
      • 11 months ago

      Using smaller dies that can be produced cheaply and in large numbers is better than using a big monolithic die that has poor yields and is expensive to produce. AMD is pretty clever to use the best of what is feasible.

      • cygnus1
      • 11 months ago

      You're not that good at the shill trolling. Watch chuckula and Krogoth for good examples.

      • BorgOvermind
      • 11 months ago

      What about Intel's failure with the 10nm process?
      By next year, the advantages of AMD server CPUs will be uncontested:
      – more cores
      – more memory channels
      – more PCIe lanes
      – costs a fraction of what Intel does
      – takes less energy to run
      – better scalability
      – more features
      – more flexible
      – functional process to build it on
      – process lead

      Intel will either have to lower prices significantly or give up datacenter market share.

        • BurntMyBacon
        • 11 months ago

        Overlooking the validity of some of these points and the lack of needed information to claim some of these points, I note that you did not actually claim “Performance” as an uncontested advantage. Oversight?

    • Anonymous Coward
    • 11 months ago

    Seems like just today AWS picked up Epyc (https://aws.amazon.com/ec2/amd/), which IMO is a big deal for AMD. It might be hard to commit your business to buying X million $ of AMD over Intel, but renting it, now that's another matter. Low risk... try it and see. It's excellent for AMD and its reputation.

    I think I'll go add some AMD hardware to the autoscaling configuration right now.

      • thx1138r
      • 11 months ago

      This is a huge deal for AMD, almost as big as the 7nm announcement itself.

        • Anonymous Coward
        • 11 months ago

        There can hardly be a better way to settle any performance differences between the platforms than make them both rentable, billed by the hour or even fractional second.

        It's not clear if AWS will let the "spot market" price drift much; I see that r5 (Xeon) and r5a (Epyc) are essentially price-matched at the moment. They might set a lower limit to cover their own expenses, and perhaps those are about the same between the platforms. They also don't allow prices to rise freely if demand outstrips supply, which is annoying and opaque, but in theory the spot market could let the platforms really fight it out, work-per-dollar.

        I note AWS chose to keep the CPU and memory counts aligned exactly between Xeon and Epyc; the mount points for disks appear to match up, and features appear to match up at first glance. So they made it super easy to switch or blend platforms in auto-scaling clusters.

        • ptsant
        • 11 months ago

        Bloomberg was talking about Amazon, not about 7nm stuff.

          • Anonymous Coward
          • 11 months ago

          “AMD shares gained as much as 8.8 percent on the announcement.”

    • psuedonymous
    • 11 months ago

    Did a quick comparison of Rome to the Skylake-X XCC die (yes, I remembered to subtract the MCC 'cores'): https://i.imgur.com/2m0LqPW.jpg
    Proportional comparison only (not to scale), but that's a heck of a lot more uncore area, almost double the uncore area per core.

      • Goty
      • 11 months ago

      I’m sure the difference in process tech makes up for some of that, but it kind of makes you wonder what else is lurking in the I/O chiplet. What does the ratio look like for first generation Zen cores?

        • psuedonymous
        • 11 months ago

        AMD claimed the reason for leaving the IO uncore on 14nm was that the PHY interfaces do not scale down well. With so many PCIe links (all the exposed PCIe busses, plus all the Infinity Fabric interconnects) the die is going to be mostly PHY. A move to 7nm may not shrink it much at all.

          • Goty
          • 11 months ago

          It’s obviously not a one-to-one comparison, but you could probably take some of Intel’s optical shrinks and do the same sort of analysis, comparing to stated theoretical density improvements, and get a rough idea of the scaling possible for these parts of the die.

          • tygrus
          • 11 months ago

          I/O that goes off-chip requires larger die area per trace so that wires can be attached and kept separated. It also requires higher voltages and currents, which are much harder to achieve at smaller fabrication nodes. If you're going to lay out 14nm-or-larger structures on a 7nm process to cope with this, then you may as well just use a 14nm fab and save money.

    • ermo
    • 11 months ago

    @Jeff

    Would you care to share your best educated guess on what this reveal means for the consumer version of Zen2?

      • thx1138r
      • 11 months ago

      I’d love to hear some more speculation about this too, this could potentially get very interesting.

      My take is that while it's quite possible that higher-end AM4 chips could use MCM, i.e. an 8-core 7nm chiplet wedded to a small I/O hub, I don't think budget chips will. At the same time, it should be straightforward to have 2 different versions of the I/O hub, one with a GPU and one without. The first question that's going to be asked of an MCM AM4 chip is what the memory latency is like, because games in particular are sensitive to memory latency. If the latency is not up to scratch, the only way to get around it (for gaming) would be to include a big L4 cache on-package a la the venerable 5775C, but that is starting to get complicated…

      If they don’t go down the MCM route then we’re going to have to wait for a new 7nm core, one that includes the I/O functionality on-chip.

    • jarder
    • 11 months ago

    Anand have a die shot:
    https://www.anandtech.com/show/13561/amd-previews-epyc-rome-processor-up-to-64-zen-2-cores

    The I/O die looks bigger than the 8 CPU dies put together. The 8-core chiplets are reckoned to be about 73mm^2 according to some random Twitter user. So with the old 14nm 8-core Ryzen at 213mm^2 (and that chip was approx 60% core, i.e. approx 127mm^2, and 40% uncore), that's some impressive process shrinkage considering all the extra transistors used in the wider AVX units and other improvements.
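A quick back-of-the-envelope check of the numbers in that comment (the 73mm^2 chiplet figure is an unofficial Twitter estimate, so treat the result as rough):

```python
# Rough sanity check of the die-area figures quoted above.
zeppelin_total = 213.0  # mm^2, 14nm 8-core Ryzen ("Zeppelin") die
core_fraction = 0.60    # approx 60% of that die is core area
chiplet = 73.0          # mm^2, estimated 7nm 8-core chiplet

old_core_area = zeppelin_total * core_fraction
shrink = old_core_area / chiplet
print(f"14nm core area: {old_core_area:.1f} mm^2")
print(f"Apparent area shrink: {shrink:.2f}x")
```

An apparent shrink of under 2x, despite a claimed 2x density gain, is consistent with the chiplet also carrying the wider AVX units, L3, and Infinity Fabric links.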

    • Unknown-Error
    • 11 months ago

    Serious question. I am a bit concerned about the separation of the I/O. People, what are the pros and cons of this? This is a major departure.

      • Klimax
      • 11 months ago

      We’re back to FSB-like stuff we got to see around Pentium D/Core 2 timeframe.

      Pros: Likely cheaper
      Cons: NUMA gets harder

      It looks like modern packaging of 80s SMP glue..

        • Amiga500+
        • 11 months ago

        How does NUMA get harder when you are making all processors on that socket share memory locally through the L4 cache?

        You’d think it’d get better as you aren’t having to use the main DRAM controller, but instead the cache snooper to process the request.

        Can you explain where you're coming from?

        • cygnus1
        • 11 months ago

        We’re absolutely not back to “FSB-like stuff”. The only comparison to that era would be the IO module being similar to the north bridges of that era. But there is absolutely no shared bus. They’re still using infinity fabric switched communications. Also, for single socket, this makes NUMA dead simple, as in there will only be 1 NUMA node for the 64 cores, they will all be equal. And for multi-socket this will still make NUMA simple as it will mean 1 NUMA node per socket vs multiple for Epyc/Zen 1.

        Sorry Klimax, you’re pretty much 100% wrong.

          • just brew it!
          • 11 months ago

          There is one way in which it is "FSB-like": it appears to be (at least based on what we know today) non-NUMA, like FSB-based systems. This could result in more predictable performance across diverse workloads.

          Edit: Actually, it has more similarities to the old Athlon MP SMP topology, which wasn't a classical Intel-style FSB architecture either. Each CPU socket had a dedicated link to the northbridge instead of using a shared FSB. With Zen 2 it's just all in one package. What's old is new again… 😉

            • Krogoth
            • 11 months ago

            Not to be overly pedantic, but it is more like DEC's Alpha chips back in the early-to-mid 1990s (AMD took Alpha/DEC IP and engineers in the mid-to-late 1990s). 😉

            • freebird
            • 11 months ago

            Krogoth, are you a Dirk Meyer fanboy??? 😉

            • cygnus1
            • 11 months ago

            This thing is definitely going to be able to operate in dual-socket servers, so I would never call it non-NUMA. And in those servers it's very likely to have 2 NUMA nodes. Hell, we won't know for sure how its NUMA configuration is built until it actually shows up. It's entirely possible it could show up with a NUMA node for each chiplet. It would be super goofy, but it's not impossible.

            But basically I don't think it's fair to call it non-NUMA, because when you do that, you're kind of saying any single-socket computer with only 1 NUMA node is "FSB-like". But that's just 1 similarity. I mean, CPUs that ran on an FSB had branch predictors and other hardware similar to today's CPUs too, but we don't relate that to being "FSB-like". I think the only fair thing to call it would be a standard single-socket memory arrangement vs. what AMD did with Threadripper, which was very odd in the grand scheme of single-socket systems.

      • jts888
      • 11 months ago

      Cons:
      - extra ~2 pJ/bit power for transfers to a core vs. the ideal case of a direct-attached DRAM/PCIe device (no difference vs. a remote on-package die in Epyc 1, though)
      - higher latency of currently unknown degree for DRAM accesses vs. a direct-attached setup (same caveat as above, plus latency of this degree being irrelevant for peripherals)
      - inter-die inter-cache transfers for cores need to make 4 total hops across the substrate compared to the typical 2 needed in a direct-attach topology like Epyc 1, with all the power and latency penalties that entails

      Pros:
      - cheaper to manufacture and (one would hope) for the end consumer
      - memory access can be interleaved over all 8 DRAM channels in a socket for higher burst read speeds and better channel bandwidth load balancing
      - a single probe table per socket can serve all cores, which can massively reduce probe traffic for cache line requests vs. Epyc 1's distributed design
      - it is conceivable to have PCIe controllers operating over x32-wide links for things like 200GbE/400GbE on-board controllers (some Mellanox ConnectX cards already have internal dongles to attach to a 2nd x16 slot)
      - peer-to-peer peripheral traffic does not contend with cores' remote memory/inter-cache bandwidth, and peripheral-to/from-memory traffic does not contend with inter-core cache traffic
      - peripheral control logic can be developed and upgraded independently of core designs, in case a vendor wanted to introduce something like GenZ/CCIX/OpenCAPI coherent PCIe extensions

      There are probably more factors than this of varying significance, but this is all I can think of off the top of my head.
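For scale, here is what that ~2 pJ/bit figure works out to in watts (the 50 GB/s bandwidth is an assumed workload, purely for illustration):

```python
# How much extra power does ~2 pJ/bit of on-package transfer cost?
energy_per_bit = 2e-12   # joules per bit (~2 pJ, the figure quoted above)
bandwidth_bytes = 50e9   # 50 GB/s streaming workload (assumed)

extra_watts = energy_per_bit * bandwidth_bytes * 8  # x8: bytes -> bits
print(f"Extra power at 50 GB/s: {extra_watts:.2f} W")
```

Well under a watt per heavily loaded link, which suggests the latency cost matters more than the power cost for most workloads.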

        • cygnus1
        • 11 months ago

        jts888 wrote:
        inter-die inter-cache transfers for cores need to make 4 total hops across the substrate compared to the typical 2 needed in a direct-attach topology like Epyc 1, with all the power and latency penalties that entails.

        They didn't actually draw connections on any diagrams. I don't think you can rule out the chiplets having direct Infinity Fabric connections to each other, which would eliminate that particular bit of latency you pointed out.

        • Unknown-Error
        • 11 months ago

        😮

        Great post. Thanx a billion!

        • Spunjji
        • 11 months ago

        There’s the possibility of another Pro here, depending on market demand – a design not entirely unlike this could end up in a future console and/or another semi-custom project. AMD would thus only have to rework the IO section to match the reduced requirements, instead of rebuilding their CPUs from scratch and/or using a sub-optimal fully-integrated part for the job.

        • Waco
        • 11 months ago

        Excellent analysis.

        The x16 slot expansions for the early PCIe 3 ConnectX 6 cards are incredibly janky, though. I hope if the ConnectX 7 cards are still PCIe 4 that they spend a little longer designing it.

        I should have some Rome hardware in-hand as soon as they release a few samples. Whenever the NDA expires I’m happy to run random benchmarks and/or post initial impressions.

        • Krogoth
        • 11 months ago

        I suspect the move is also motivated by the mobile market. AMD is planning to release a single Zen 2 die + Navi I/O die for mobile/mainstream desktop SKUs to level the playing field in that space, a.k.a. making iGPUs ubiquitous on those platforms.

          • jts888
          • 11 months ago

          I had not considered that AMD would make integrated graphics standard across the entire line of Zen 2 AM4 parts, but that is a very good point you raise.

          I do hope that there is actually some reasonable differentiation in the IO/graphics coprocessors though, and it would be especially nice if they found a way to get HBM in some parts, although AM4 TDP limits might limit processors needing that much bandwidth, and it might require something semi-exotic like EMIB to accomplish.

      • Krogoth
      • 11 months ago

      Moving the I/O off the CPU die proper was inevitable for the server and HPC markets.

      Intel is going to do the same thing with its own EMIB project, currently under wraps, for its server/HPC-tier SKUs.

    • Unknown-Error
    • 11 months ago

    256-bit AVX2 in an AMD CPU? Finally!! So I guess we will have to wait until Zen 3 for 512-bit.

      • jarder
      • 11 months ago

      Nah, I say hold off on 512-bit until it becomes mainstream; there are precious few applications where it is of significant benefit over 256-bit. Besides, Intel currently has 7 different flavors of 512-bit support (https://twitter.com/InstLatX64/status/918796987352408064) of varying degrees and types, so better to let that settle down before dedicating massive numbers of power-burning transistors to such a marginal technology.

      • ptsant
      • 11 months ago

      If you need 512-bit AVX then you should be looking at GPU compute I think. Is there a lot of consumer software (not academic HPC stuff) that actually uses 512-bit AVX? Games? Media applications? I have no idea.

        • blastdoor
        • 11 months ago

        ptsant wrote:
        If you need 512-bit AVX then you should be looking at GPU compute I think.

        Not necessarily. It depends on how separable the matrix operations are from the rest of the program. In some contexts they are pretty hard to separate. My impression (though I don't know for sure) is that people aren't able to move MCMC algorithms (part of Bayesian analysis) to GPUs because of this separability issue.

    • ronch
    • 11 months ago

    I’m guessing the reason why AMD decided to separate the uncore from the CCXs is because the uncore may be holding clockspeeds back. If you look at the die of current Ryzens you’ll see that there are many weird structures that look like blobs and splotches, which I would assume were designed extensively using high density libraries. In order to make the Zen 2 cores clock better AMD probably redid many of the core’s plumbing by hand.

    I could be wrong of course, but anyone who can explain those weird-looking structures is welcome to do so.

    • DragonDaddyBear
    • 11 months ago

    Wasn't memory latency an issue depending on which die was doing the processing vs. which die the memory was attached to? Would this cause more uniform but higher latency?

      • Waco
      • 11 months ago

      Memory controllers talking to each other are quite different than separate dies talking to a common memory controller…but yes. Slightly higher latency (in the best case) but far more uniform latency.

    • maxxcool
    • 11 months ago

    I like this direction. It decouples the damn CPU 'just enough' from everything else that I expect *SERIOUS* clock speed gains from the cores.

    I also like how they are only using 7nm where it needs to be: the cores.

    External cache/PCIe/controllers of every kind can all sit on SUPER WIDE bit paths and suck up as much low-latency bandwidth as needed, without having to resort to hard-to-manufacture transistors and high clock speeds ruining your thermals.

      • chuckula
      • 11 months ago

      5 Ghz 64 Coar Epyc…. CONFIRMED!

        • maxxcool
        • 11 months ago

        pssshaww… 10ghz is doable according to Intel..

      • maxxcool
      • 11 months ago

      Also .. I love that we're back to external chipsets and memory controllers LOL!

    • Mr Bill
    • 11 months ago

    My unadorned unpunned comment…
    This is fabulous!
    oops, not on purpose!

      • chuckula
      • 11 months ago

      You should glue on some extra compliments there.

        • Mr Bill
        • 11 months ago

        like green-antired! But why not use all eight?

    • blargh4
    • 11 months ago

    That IO die seems huge. Wonder what’s taking up all the area on that thing?

      • Krogoth
      • 11 months ago

      Memory controller + PCIe controller that is driving up to 128 lanes.

        • Mr Bill
        • 11 months ago

        All PCIe lanes lead to Rome…

      • blastdoor
      • 11 months ago

      Maybe HBM?

        • Waco
        • 11 months ago

        Not on a 14nm logic chip – but next to it down the road? Perhaps.

      • dragontamer5788
      • 11 months ago

      1. The 14nm I/O die means it is on an older process.

      2. Maybe a nonblocking Clos network (https://en.wikipedia.org/wiki/Clos_network)? The I/O chip will need to connect the 8 CPU chiplets (on-package), 8 memory controllers, and an unknown number of links to the 8 chiplets + 8 memory controllers on a 2nd socket. I'm guessing a total network of 8 CCXs (local), 8 memory controllers (local), at least 8 remote links (8 remote memory links on a 2nd socket), and like 128 PCIe lanes if they just want to match Epyc 1's design (128 GB/s to I/O).

      3. It wouldn't be out of the question for a chunk of the chip to be an L4 cache.

        • chuckula
        • 11 months ago

        Considering how huge it is, I would expect an L4 cache of some type in there.

        There’s no evidence of HBM though, unless what Lisa Su held up isn’t a full device.

          • Waco
          • 11 months ago

          This. Memory controllers, some kind of caching, lots of PCIe4 lanes (128 is what the socket will allow), etc.

          I remember hearing rumors of built-in 10 Gbps NICs as well, but you’d think they’d harp on that pretty aggressively if so (and it would save precious PCIe lanes).

            • Goty
            • 11 months ago

            I thought the dual 10GbE NICs were a known thing in Zen? As in they already exist in Epyc/Ryzen parts but aren’t enabled.

            • Waco
            • 11 months ago

            I lost track of whether they were officially announced or not. 😛

            • Goty
            • 11 months ago

            Kind of makes you wonder if they actually work.

    • thedosbox
    • 11 months ago

    Does this new multi-die approach mean we can expect a cat *and* a kitten on the Ryzen 3 unboxing video?

    • Zizy
    • 11 months ago

    IO on 7nm likely wouldn’t introduce much worse yields – but why not save some money and maximize 7nm wafer capacity by putting IO on a cheaper process you are obliged to use anyway.

    I didn't believe the rumors about an I/O die till this announcement; I expected a configuration similar to 2x 2990WX, with half the dies lacking memory, or perhaps 4x (4x 4C CCX) instead.
    Lisa is quite brave to make a second big bet even before the first one panned out, but it seems to have worked out. Now I wonder what comes next – another crazy idea? Making that chip BE the substrate for the rest (active interposer)? They are going forward MUCH faster than I expected.

    With 8 chiplets, if AMD kept 32 PCIe lanes per chiplet, this would lead to 256 PCIe 4.0 lanes: quadruple the bandwidth vs. now. Quite insane, and it even opens doors to fully connected 4P and partially connected 8P configurations with very high interconnect bandwidth. That would give Intel a run for its money: 8x 64C is a beast Intel cannot really counter. Even a reduced 16 PCIe lanes per chiplet would enable 2P with double the interconnect bandwidth (needed for 2x the cores), good enough for the meat of the market, though 8x 28C will beat 2x 64C more often than not, at a much higher cost.
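The "quadruple bandwidth" claim in that comment checks out arithmetically, since PCIe 4.0 doubles the 3.0 signaling rate and the lane count would also double (the 256-lane configuration is speculation, as the comment notes):

```python
# Per-direction aggregate PCIe bandwidth: lanes * GT/s * encoding / 8.
# PCIe 3.0 and 4.0 both use 128b/130b encoding.
def pcie_bandwidth_gb_s(lanes, gt_per_s):
    return lanes * gt_per_s * (128 / 130) / 8

current = pcie_bandwidth_gb_s(128, 8)      # Epyc 1: 128 lanes of PCIe 3.0
speculated = pcie_bandwidth_gb_s(256, 16)  # 256 lanes of PCIe 4.0
print(f"128x PCIe 3.0: {current:.0f} GB/s")
print(f"256x PCIe 4.0: {speculated:.0f} GB/s ({speculated / current:.0f}x)")
```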

      • Waco
      • 11 months ago

      PCIe is in the IO die, not the chiplets. Further, since this is socket-compatible, we’re left with 128 PCIe lanes regardless (though at PCIe 4 in new boards).

        • cygnus1
        • 11 months ago

        I hope that's the case. I've seen conflicting analysis so far, though. It won't be great for a lot of use cases if PCIe comes from the chiplets and there's extra latency for PCIe transfers between cards plugged into different roots. If it comes from the I/O chip, and it basically functions like a giant PCIe switch too, that will be great.

          • Waco
          • 11 months ago

          I’m looking forward to it – centralizing the memory and IO channels is a huge win for programming simplicity and efficiency.

          • jihadjoe
          • 11 months ago

          Makes no sense for PCIe to be on the chiplets when the MC is on the IO die.

        • Zizy
        • 11 months ago

        Yes, I know. I just said that even if it were in the chiplets, yields wouldn't really be much worse, but AMD would still get significantly fewer parts due to the increase in chip size. Not good for AMD, which needs to maximize what TSMC can offer. Now, add the WSA and 14nm is the obvious choice, while perhaps Zen 3 could upgrade to 7nm+ cores and 7nm I/O, as the WSA is supposedly being amended/removed.

        As for 128 PCIe, well, it could be an updated socket as well. As in, it would work in the old socket just fine at 128 PCIe 3.0, but the updated socket would instead offer 256 PCIe 4.0 by using some of those reserved pins (if there are enough of them).

      • Goty
      • 11 months ago

      Zizy wrote:
      IO on 7nm likely wouldn't introduce much worse yields - but why not save some money and maximize 7nm wafer capacity by putting IO on a cheaper process you are obliged to use anyway.

      Now now, there's no room for any of that sort of logic here! The only possible explanation is that AMD couldn't get it to work on 7nm!

      • Eversor
      • 11 months ago

      I imagine that is an improvement reserved for mid-generation updates down the line.

      AMD is only now making money, and this was designed when they were still hemorrhaging cash. Development resources were very limited, and the I/O chip is one less expense. I expect them to speed up development from now on.

      On another note, news from around the web mentions that 7nm is quite costly to develop for. Costs seem to have increased 3x for 7nm vs 16nm. So that is a pretty big investment for an I/O chip that will probably be specific to HEDT and server.

      • Chrispy_
      • 11 months ago

      Yeah, I really like where this is going in the server market.

      I’m also disappointed that none of these big steps forward is really relevant to the desktop products I’ll end up spending my own money on (rather than someone else’s), but if rumours of a 13% IPC gain over Zen+ are true, then even a ‘traditional’ Zen 2 desktop chip will remain desirable unless Intel truly pull something radical out of the bag – and I think they’re too big to have not leaked such a beast by now.

      • TheRazorsEdge
      • 11 months ago

      New processes offer better performance, but initially they are more expensive and have higher defect rates. This design lets AMD use cheap 14nm fabs where it won’t really affect performance.

      AMD probably has some remaining obligation to use GlobalFoundries as a result of their spin-off agreement a few years back. GF cancelled development of nodes beyond 14nm, so it is no longer a viable foundry for high-performance CPUs.

        • Gadoran
        • 11 months ago

        No, this design means bad yields – only this.

        The I/O on 14nm will eat a lot of power, erasing nearly half the benefit of the new process. Looks like they realized TSMC’s process is not a big deal after all. Suboptimal at best.

          • Waco
          • 11 months ago

          You really don’t know what you’re talking about. This design was taped out long before any information about 7 nm yields was even in existence.

            • Gadoran
            • 11 months ago

            Are you so naive? Bad yields on 7nm have been a well-known thing since the first test wafers three years ago.
            Anyway, this solution is so bad for power consumption that in practice they are at parity with a comparable SKU on Intel’s 14nm++.

            I can see a 56-core Xeon AP out in the same time frame with comparable thermals.

            Only in 2020 they will finally have a full 7nm+ device, uncore included.

            • Spunjji
            • 11 months ago

            Yields so bad that only Apple would use them. Oh, and HiSilicon. And nVIDIA. And AMD’s graphics division. Bad yields. The worst!

            /derp

            • Waco
            • 11 months ago

            Screams of naivety coming from the one who is clearly clueless. Priceless. 🙂

            Enjoy yourself, I guess.

            • Goty
            • 11 months ago

            [quote<]Bad yields on 7nm is a well known thing since three years ago with the first test wafer. [/quote<] I'm not sure if you know this (on second thought, I'm fairly certain you don't), but that's kind of what happens with process technologies. Yields start out low and improve over time with tweaks to the process.

          • Goty
          • 11 months ago

          As I told chuckula, even if that’s true (and I challenge any of you to provide proof that it is), TSMC is still getting better yields than Intel is at 10nm…

            • Gadoran
            • 11 months ago

            Lol, forget this

            • Spunjji
            • 11 months ago

            Accidentally down-voted you and can’t undo it. Y’know, in case you’re wondering why the world has gone mad and purple is orange.

          • Zizy
          • 11 months ago

          Nah, you are wrong. Even if yields are 100% on the 7nm process, it would STILL make sense to keep IO stuff on the older 14nm process to get more out of the 7nm wafers.

          Assume AMD gets 10k 7nm wafers/month, each can house 400 chiplets or 200 IO dies; and that AMD additionally can get 5k 14nm wafers/month for 150 IO dies on each (other 14nm wafers are used for APUs and stuff).
          (I didn’t bother estimating die sizes and actual number of chips from the wafer, but it isn’t needed for this illustration anyway)

          Having IO on 14nm, you get 8x 500k EPYC chiplets/month and 750k EPYC IO dies, so you can make 500k EPYC chips. If you put IO on 7nm instead, you use four wafers out of every five for chiplets and one for IO, leading to 400k EPYC chips in the end – at the same total cost, since breaking the WSA means AMD would end up paying for those unused 5k wafers anyway.

          The moment AMD decided that chiplets with external IO were the way to go, it was probably also decided that said IO would go on an older process, to maximize use of the fresh 7nm wafers – the best and likely most expensive process available right now. It just doesn’t make sense to put IO on it.
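Zizy's allocation argument above can be restated as a few lines of arithmetic. A minimal sketch – all wafer and die counts are his hypothetical figures from the post, not real capacity data:

```python
# Zizy's hypothetical wafer-allocation illustration (all numbers invented
# for the argument; not real TSMC/GloFo capacity or die-per-wafer data).

WAFERS_7NM = 10_000        # 7nm wafers per month
WAFERS_14NM = 5_000        # 14nm wafers per month (WSA obligation)
CHIPLETS_PER_7NM = 400     # 8-core chiplets per 7nm wafer
IO_PER_7NM = 200           # IO dies per 7nm wafer
IO_PER_14NM = 150          # IO dies per 14nm wafer
CHIPLETS_PER_EPYC = 8      # Rome uses 8 chiplets + 1 IO die

def epyc_output(io_on_14nm: bool) -> int:
    """EPYC packages per month, limited by the scarcer component."""
    if io_on_14nm:
        chiplets = WAFERS_7NM * CHIPLETS_PER_7NM
        io_dies = WAFERS_14NM * IO_PER_14NM
    else:
        # Split 7nm capacity 4:1 between chiplets and IO dies.
        chiplets = (WAFERS_7NM * 4 // 5) * CHIPLETS_PER_7NM
        io_dies = (WAFERS_7NM // 5) * IO_PER_7NM
    return min(chiplets // CHIPLETS_PER_EPYC, io_dies)

print(epyc_output(io_on_14nm=True))   # 500000
print(epyc_output(io_on_14nm=False))  # 400000
```

Even with perfect 7nm yields, spending 7nm wafers on IO dies costs 100k EPYC packages a month in this toy model – which is the whole point of the comment.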

            • Gadoran
            • 11 months ago

            This makes no sense. You forget the huge power-consumption advantage of an I/O module on 7nm. A suicidal move – now Intel’s idea of waiting for better yields makes sense. This SKU is not a great concern for Intel at all.
            They will go to 10nm when AMD finally has a real 7nm server CPU. 2020.

            • kuttan
            • 11 months ago

            Since Intel hasn’t been able to deliver 10nm for ages due to poor yields, AMD at least delivers vastly better-performing CPUs using whatever combination of process nodes is most feasible.

            • Spunjji
            • 11 months ago

            You’ll need to provide some evidence for this “huge” power consumption advantage you’re banging on about. I highly suspect it’s not on the scale you think it is, but you’ll note that I say “highly suspect” because just like you I have no numbers.

            • ptsant
            • 11 months ago

            Also, I don’t know to what extent they still have to buy some GloFo wafers after their revised agreement, in which case it would make sense to use some of that capacity.

    • DancinJack
    • 11 months ago

    Hey guys, go vote!

      • Srsly_Bro
      • 11 months ago

      Already voted for AMD, bro.

        • DancinJack
        • 11 months ago

        that’s what i like to hear, bro!

      • blastdoor
      • 11 months ago

      I live in a state where it doesn’t make a difference, but I voted anyway.

        • DancinJack
        • 11 months ago

        it still matters! good job!

        • willmore
        • 11 months ago

        There’s always local elections that matter. School board, etc.

          • DancinJack
          • 11 months ago

          this gerbil knows what’s up!

      • CScottG
      • 11 months ago

      -been there, done that.

        • DancinJack
        • 11 months ago

        awesome!

      • Mr Bill
      • 11 months ago

      We voted for AMD last night.

      • jihadjoe
      • 11 months ago

      I voted for chuckula!

      • Wirko
      • 11 months ago

      But! I can’t downvote! Not every voting system is as decent as TR’s.

        • Mr Bill
        • 11 months ago

        I do like the Ars Technica system of showing a net score followed by plusses and minuses. I think TR doesn’t do this because it would just encourage more score-keeping.

    • cygnus1
    • 11 months ago

    I wonder if this arrangement will give all the cores on future Threadripper consumer parts equally performing access to main memory. That would be a nice improvement there.

      • cygnus1
      • 11 months ago

      [quote=”From Anandtech”<] Since the memory controller will now be located inside the I/O die, all CPU chiplets will have a more equal memory access latency than today’s CPU modules. Meanwhile, AMD does not list PCIe inside the I/O die, so each CPU chiplet will have its own PCIe lanes. [/quote<] So I guess I was right assuming the same type of IO module makes it into the next-gen TR. The PCIe thing is really interesting, though. It means there will likely be 8 or 16 separate PCIe roots in a fully loaded Epyc 2 server. And for the big monster servers with many GPU-type cards for AI/ML work, card-to-card transfers could be pretty high latency if they have to go through the IO module.

        • Waco
        • 11 months ago

        No. Anandtech is wrong I believe – PCIe is absolutely, 100%, going to be in the IO module. On-chip from chiplet to IO module is so dramatically below the latency threshold that matters for PCIe that it makes no difference at all.

          • cygnus1
          • 11 months ago

          I meant the higher latency would come if making multiple hops from core chiplet, to IO module, to other core chiplet to get to another PCIe device. If it’s all on the IO module, I think it’s going to be pretty awesome. This will honestly wipe out most of the performance issues Zen 1 had.

            • Waco
            • 11 months ago

            Ah, gotcha. Sorry for the misunderstanding! I’m 99.999% sure the IO module holds the PCIe root complex.

    • Kretschmer
    • 11 months ago

    I’m happy for anyone who needs massive thread counts but find these CPUs pretty boring for general use cases and gaming.

      • Jeff Kampman
      • 11 months ago

      *pats his racks of gaming serv-* wait, what?

      • chuckula
      • 11 months ago

      Cinebench is the only game anybody cares about you Intel shill!

        • Mr Bill
        • 11 months ago

        As the renderer for Crysis!

      • ptsant
      • 11 months ago

      The IPC improvements (prefetch, branch prediction, latency, AVX256) should translate quite well to the gaming scenario. AMD doesn’t need to win all single-threaded scenarios, they only have to significantly close the gap with the 8700K and 9x00K.

        • K-L-Waster
        • 11 months ago

        Back of a napkin example: a hypothetical 3700X that narrows the IPC gap between the 2700X and the 9900K by 66% but keeps a similar price point and thermals to the 2700X would be a pretty attractive gamer chip.
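K-L-Waster's napkin can be made explicit. A tiny sketch – the 12% starting IPC gap is an invented placeholder (nobody in the thread has measured it), only the 66% figure is from the comment:

```python
# Hypothetical "3700X" napkin math. The 12% gap is an assumed placeholder,
# not a benchmark result; 66% is the gap-closing figure from the comment.
gap = 0.12                # assumed 9900K IPC lead over the 2700X
closed = 0.66             # fraction of the gap the hypothetical 3700X closes
remaining = gap * (1 - closed)
print(f"remaining IPC deficit: {remaining:.1%}")  # ~4.1%
```

With those assumptions the remaining single-thread deficit lands in the low single digits, which is the "pretty attractive gamer chip" scenario.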

          • Srsly_Bro
          • 11 months ago

          I think your napkin is defective. First, define a baseline. Ryzen 1 is similar to Broadwell. Ryzen 3 is supposed to bring at least a 15% increase in IPC. Calculate BDW to the 9900K and there is your answer. I didn’t use a napkin, just elementary reasoning.

          Where do you get your numbers from?

            • Waco
            • 11 months ago

            [url<]https://cpugrade.com/a/i/articles/cbr15-ipc-comparison-3ghz-scores.png[/url<] [url<]https://digiworthy.com/2018/06/14/amd-ryzen-intel-coffee-lake-ipc/[/url<] Appears to be a little closer than you're implying, at least for quite a few workloads.

            • Srsly_Bro
            • 11 months ago

            Thanks! I’ll read and revise my post.

        • Mr Bill
        • 11 months ago

        (+3) This. Yah.

    • synthtel2
    • 11 months ago

    Aw, I was hoping they wouldn’t double-up on SIMD power. At least it’ll probably do a lot to close the gaming IPC gap with Intel.

    Any word on reduced-rate support for AVX-512 instructions? It probably isn’t as easy to handle as 256-on-128 was, because AVX-512 adds a ton of register space.

    • Shouefref
    • 11 months ago

    What about the timeline?

      • derFunkenstein
      • 11 months ago

      “2019” according to Anandtech’s live blog. [url<]https://www.anandtech.com/show/13547/amd-next-horizon-live-blog-starts-9am-pt-5pm-utc[/url<]

      • NoOne ButMe
      • 11 months ago

      probably 1H2019 for EPYC for the Super7, 2H2019 for general market.

    • chuckula
    • 11 months ago

    Separate chips for IO?

    Thanks Intel [url<]https://techreport.com/news/33292/intel-stratix-10-tx-fpga-hooks-up-to-58g-transceivers-with-emibs[/url<] AMD should look into copying EMIB too.

      • Goty
      • 11 months ago

      I guess Intel couldn’t get those transceivers working on 14nm?

        • chuckula
        • 11 months ago

        Wrong as usual. They are all on 14nm but the I/O transceivers are actually stacked silicon that’s connected to the main FPGA with EMIB. That’s why in Stratix most of the silicon is for the FPGA instead of the IO.

          • Goty
          • 11 months ago

          [quote<]Wrong as usual.[/quote<] Entirely possible, but not according to the report from Anandtech: [url<]https://www.anandtech.com/show/12477/intel-launches-stratix-10-tx-leveraging-emib-with-58g-transceivers-[/url<] [quote<]While the central FPGA is built on Intel’s 14nm process technology, the transceivers will be built upon TSMC’s 16FF process, due to Altera’s history of using TSMC for its analog hardware.[/quote<] Thanks for playing, though. *EDIT* Here's another source, in case one more than you provided isn't enough: [url<]http://www.eejournal.com/article/intel-fpga-hits-its-stride/[/url<] [quote<]Intel says these transceivers, fabricated on TSMC 16nm silicon, are the lowest-power transceivers they’ve produced to date.[/quote<]

            • chuckula
            • 11 months ago

            Lovely. However the central chip in there that most certainly is made on 14nm has 30 Billion transistors to implement that FPGA. To put it in context, that’s 50% more than the largest Volta chips. So go ahead and call Intel a failure because they have the capability to build something just a little bit bigger than a smartphone SoC.

            • Goty
            • 11 months ago

            Hey, I’m just using the logic you applied to AMD fabbing their I/O chiplet on 14nm. [i<]Obviously[/i<] the only reason they'd do that is because they weren't capable of doing it on a leading edge process, right? *EDIT* Also, I like how you changed subjects so quickly after being presented with evidence of the correctness of my claims.

            • Spunjji
            • 11 months ago

            Arguing with chuck is like riding a greased hog; you won’t see much in the way of practical results but it sure is fun in a farmyardy sort of way.

            • Goty
            • 11 months ago

            It’s my favorite sport.

            • freebird
            • 11 months ago

            That’s Chuckula’s MO, or SOP.

            SOP:
            1) Present some outrageous claim.
            2) Get challenged on facts.
            3) Change the subject: usually throws in that if AMD doesn’t have AVX-512 across the product line by 2018 you will buy him whatever he wants.

      • NoOne ButMe
      • 11 months ago

      What advantage does EMIB bring that can be shown?

      Because Intel hasn’t shown one in a shipping product, yet.

      The claim is that it lets you reduce package Z-height (Kaby Lake G)… but Vega Mobile achieves the same 1.7mm Z-height as Kaby Lake G. Without EMIB.

      Maybe in 2019 or 2020 we’ll see an EMIB advantage, but until then, AMD’s packaging technologies seem to be just as good.

        • chuckula
        • 11 months ago

        1. EMIB is massively cheaper and more efficient than a huge silicon interposer that requires TSVs.

        2. AMD managed to get mobile Vega that thin only by hacking the PCB to push the entire huge silicon interposer way down into the plastic. Funny how that type of outright kludge is celebrated around here, but an MCM that only needs two dies to deliver 48 cores and 50% more memory bandwidth than Epyc 2 is demonized pretty viciously for standard bigot upthumbs.

        3. Look at how much power Epyc burns in the interconnect and realize that the magical 7nm miracle silicon from TSMC isn’t fixing that problem overnight.

          • Goty
          • 11 months ago

          [quote<]3. Look at how much power Epyc burns in the interconnect and realize that the magical 7nm miracle silicon from TSMC isn't fixing that problem overnight.[/quote<] Do you mean how AMD's Infinity Fabric consumes roughly as much of the chip's power budget as Intel's fabric interconnect for the same number of cores? (source: [url<]https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review/4[/url<])

            • psuedonymous
            • 11 months ago

            You may want to actually read that link you posted ([url=https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review/4<]non-broken link[/url<]). Scroll all the way to the bottom and check out the power consumption for the Epyc 7601. With 6 through-substrate IF interconnects (vs. the 8 of the new chiplet layout) fully [b<]HALF[/b<] of the package power is spent on those interconnects.

            • Goty
            • 11 months ago

            I did read it, but I think maybe you didn’t read my post very closely. I said [b<]per-core[/b<] AMD was using roughly the same amount of the chip's power budget as Intel's mesh interconnect, meaning I wasn't talking about the 7601 mentioned in the article. If we want to make that comparison, you have to find something more capable than the 7980XE in core count and number of memory channels, meaning the power used by the interconnect would necessarily need to grow just as it does with AMD.

            • psuedonymous
            • 11 months ago

            Check again, and do your maths.

            For 2 cores loaded:
            7980XE: 56.88W core to 13.77W uncore, or 4.13:1 core:uncore power
            7601: 7.92W core to 66.21W uncore, or 0.12:1 core:uncore power

            For 36 cores loaded:
            7980XE: 139.34W core to 38.87W uncore, or 3.58:1 core:uncore power
            7601: 79.95W core to 92.23W uncore, or 0.87:1 core:uncore power

            For the 96 thread ‘best case’ for the Epyc 7601: 89.09W core to 85.73W uncore, or 1.04:1 core:uncore power

            There is no way to slice it that Epyc does not expend an outsized amount of power on through-substrate PHY links. That’s more than can be explained by just 4 extra DIMM channels (Anandtech tested with 4x8GB on Intel and 8x8GB for Epyc).
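The ratios in these tables can be rechecked directly from the quoted wattages. A quick sketch – the watt figures are the ones psuedonymous quotes from Anandtech, the script itself is only illustrative:

```python
# Recompute core:uncore power ratios from the wattages quoted above
# (figures as cited from Anandtech's 7980XE/7601 power measurements).
measurements = {
    ("7980XE", 2):  (56.88, 13.77),
    ("7601",   2):  (7.92,  66.21),
    ("7980XE", 36): (139.34, 38.87),
    ("7601",   36): (79.95,  92.23),
}
for (chip, loaded), (core_w, uncore_w) in measurements.items():
    ratio = core_w / uncore_w
    print(f"{chip} @ {loaded} cores loaded: {ratio:.2f}:1 core:uncore")
```

For what it's worth, the 2-core 7980XE numbers work out to roughly 4.13:1.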

            • Goty
            • 11 months ago

            I should have specified that when I said per core I was also indicating that the 2950X should be used in the comparison. The 2990WX and Epyc 7601 will necessarily use significantly more power for their interconnects because there are significantly more components to hook up to the fabric, and each component does not necessarily enable just one more IF link. Find me a comparable Intel part and measurements to go with it and then we can have the discussion you’re trying to bring up.

            • psuedonymous
            • 11 months ago

            Even if you want to ignore Epyc, then the numbers don’t add up for the 2990WX either:

            For 2 cores loaded:
            7980XE: 56.88W core to 13.77W uncore, or 4.13:1 core:uncore power
            2990WX: 20.63W core to 59.06W uncore, or 0.35:1 core:uncore power

            For 36 cores loaded:
            7980XE: 139.34W core to 38.87W uncore, or 3.58:1 core:uncore power
            2990WX: 115.28W core to 65.37W uncore, or 1.76:1 core:uncore power

            For the 64 thread ‘best case’ for the 2990WX: 109.25W core to 61.32W uncore, or 1.78:1 core:uncore power.

            Even at best, that’s less than half the core:uncore ratio achieved (even ‘per core’ as you say).

            • Goty
            • 11 months ago

            You’re looking at the wrong charts. The numbers you quote for the 2950X are still for the 2990WX. The “uncore” of the 2950X draws around 25% of the total power at 32 threads whereas the “uncore” of the 7980XE draws around 22% of the total power at 36 threads.

            I made a quick and dirty plot of the correct data just so there’s no more confusion: [url<]https://i.imgur.com/k2gpxbI.png[/url<] It's quite clear to see that Intel has a marked advantage at low thread counts due to design differences (AMD has to power up nearly all of their IF links in order to address all of the memory from any particular core, but that issue should go away with Zen 2), but that difference disappears almost entirely by the time you hit just six threads, and nobody is buying these parts to run just two or three cores at full load. Above that six thread number, AMD is more efficient at lower thread counts and Intel takes the lead when things are fully loaded, but the differences are never more than a few percent.

            • psuedonymous
            • 11 months ago

            C&P error corrected. I used the 2990WX because the 2950X is a worthless stand-in for Rome. It has a single through-substrate IF link, whilst Rome has 8, so it will show you a mere 12.5% (at most) of the power consumption of those links.

            • Goty
            • 11 months ago

            :thumbsup: Okey dokey.

          • Waco
          • 11 months ago

          I think if you’re going to start arguing power efficiency, you should take a closer look at how efficient Epyc is compared to many Intel SKUs…

          • NoOne ButMe
          • 11 months ago

          1. No public proof of that, to my knowledge, nor anyone besides Intel claiming it. If you can get another company to say so, or point to a public proof, I will accept it.

          2a. Er, dunno who is demonizing it. At least I’m not.
          2b. This 48C product did not look very compelling before AMD’s demo. It looks less compelling after.
          2c. Specifically for the “big talking point” of deep learning, its NUMA memory pool will make it worse than EPYC2.

          3. The interconnect burning power isn’t directly related to the use of MCM versus EMIB. EMIB isn’t proven in a public product as being able to connect two dies as complex as the ones on EPYC or EPYC2, the way it did with HBM2 and the “Vega” die on Kaby Lake G.

          As I understand it, the power has a lot more to do with IF itself, and it wouldn’t differ greatly on an interposer.

          • NoOne ButMe
          • 11 months ago

          oh, and EPYC2 also supports more memory per socket. 4TB! Probably gonna cost a ton, but even with two die glued together… Intel is still “only” at 3TB/socket.

          • freebird
          • 11 months ago

          So if it is so much better than silicon interposer tech, where are all the products based on EMIB?

          • jts888
          • 11 months ago

          EMIB or interposers are only needed for I/O densities in the realm of thousands of lines per cm, which is what HBM needs but is overkill for CPU architectures. First-gen IF used something like 2 pJ/bit – roughly 1.5 W per die link, or ~10 W per Epyc socket at absolute saturation levels of traffic – which again is not exactly a critical issue.

          Cascade Lake AP is a joke because it is effectively squeezing a quad-socket platform into a dual-socket form factor, including socket pin density, QPI/DDR4 trace density, and thermal density. The exact same 4 pieces of XCC silicon would be far better utilized in a quad-socket setup with 2DPC, for the exact same bandwidth and twice the memory capacity, along with higher thermal headroom/clocking for each chip. Since it is by necessity a completely separate platform from any other Xeon by virtue of its one-off socket/motherboard designs, it will be in its own pricing range, excepting the unlikely scenario that Intel is willing to take a huge cut in margins.

          Cascade Lake AP is going to use the same fat UPI serdes for intra-socket traffic as it does for the other 2(?) inter-socket links, since it is not going to be a specialized die, so EMIB is not even an option there – if you were even holding the belief that it would be a good or necessary feature in this product’s design.
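jts888's per-link figure is easy to sanity-check. A sketch, where the 2 pJ/bit energy and six-link topology come from his post, but the saturated ~85 GB/s bidirectional link rate is my assumption:

```python
# Sanity check of the "~1.5 W per die link / ~10 W per socket" estimate.
# 2 pJ/bit is jts888's number; the link bandwidth below is an assumption.

PJ_PER_BIT = 2e-12       # energy per transferred bit, joules
LINK_GBPS = 85           # assumed saturated bidirectional GB/s per die link
LINKS_PER_SOCKET = 6     # die-to-die links in a fully meshed 4-die package

bits_per_s = LINK_GBPS * 8e9                 # GB/s -> bit/s
watts_per_link = PJ_PER_BIT * bits_per_s
print(f"{watts_per_link:.2f} W per link")                       # ~1.36 W
print(f"{watts_per_link * LINKS_PER_SOCKET:.1f} W per socket")  # ~8.2 W
```

Those come out in the same ballpark as the quoted ~1.5 W and ~10 W, so the figure is at least internally consistent.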

            • tygrus
            • 11 months ago

            “~10 W per Epyc socket” is wrong.
            Up to 92W of IF power for the EPYC 7601, and typically half of the overall power budget.
            As mentioned by Goty (Nov 6, 08:21 PM) above: [url<]https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review/4[/url<] Look towards the bottom of the page and compare it to the Intel 7980XE closer to the top. Even the 32-core TR 2990WX has an IF max power around 67W (typically around 60W). IO fabrics use significant power, which can rise steeply with core/endpoint counts. The actual energy used to add/sub/mul/div is tiny compared to the decoding/control/cache/read/write/IO.

            • jts888
            • 11 months ago

            We are talking only about the difference in the PHYs between EMIB and over-substrate IF, where these numbers are accurate, whereas you are talking about everything outside the cores, including PCIe, DDR, etc.

            Ian’s uncore power analysis was fairly provocative, but he never really went out of his way to demonstrate that the power registers he polled provided a true apples-to-apples comparison.

            • Goty
            • 11 months ago

            Again with people making the 7980XE to 2990WX/7601 comparison. That’s apples to oranges. The 2990WX and Epyc 7601 both have (significantly) more cores, more PCIe lanes, etc. that have to be included in the IF power budget than the 7980XE has components that must be hooked up to its mesh interconnect. If you compare like to like, or as close as you can get with those parts, and compare the 7980XE to the 2950X, we see that the portion of package power devoted to the interconnects is roughly the same between the two parts. The 7980XE comes out ahead in the comparison because it’s got two more cores than the 2950X to hook up and each core talks over the mesh interconnect (as opposed to each CCX in the TR part), but the difference is not exceptional.

          • Spunjji
          • 11 months ago

          1) True! EMIB is great.

          2) …what?!

          3) *removes tape, re-inserts* …what?!

      • Mr Bill
      • 11 months ago

      Are those HBM on those EMIBers?

      • kuttan
      • 11 months ago

      The Intel paid shill doesn’t like what AMD is going to do with Zen 2. Quite natural. :DD

    • ptsant
    • 11 months ago

    So the IO module is like a little motherboard-on-a-chip. Nice.

      • cygnus1
      • 11 months ago

      well, more like they’ve integrated an old school northbridge chip.

    • Srsly_Bro
    • 11 months ago

    02:33PM EST – Cray benchmark: One socket Rome scored 28.1 seconds, Two 8180M 30.2 seconds

    Posted from Anandtech’s live coverage.

    Intel needs that arm miracle chip more than ever.

      • chuckula
      • 11 months ago

      Cinebench or GTFO!

        • Srsly_Bro
        • 11 months ago

        I mean Intel just talked about their 56 core CPU yesterday and today, Dr. Lisa Su is like, “sup?”

          • Eversor
          • 11 months ago

          Yes… they benched against a 2×28-core platform, very similar to what Intel announced, which mostly excludes micro-arch improvements.

          Cascade Lake-AP will also be more expensive to fab, and you will see that cost passed on to the end customer.

      • ptsant
      • 11 months ago

      But the iPad Pro is faster in GeekBench.

        • DancinJack
        • 11 months ago

        End User is now hacking other gerbil accounts. Reported.

    • derFunkenstein
    • 11 months ago

    This is probably the best way to get as much out of TSMC 7-nm as possible. Small die size should mean that you’re getting lots of tiny chips out of a wafer, so defects claim a lower percentage of the chips, should something happen. Some of that savings will get eaten by assembly costs, though.
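derFunkenstein's point about small dies losing less to defects is the classic defect-yield argument. A minimal sketch using the simple Poisson yield model, with an invented defect density (not a TSMC figure) and illustrative die sizes:

```python
# Poisson defect-yield model: Y = exp(-A * D0), the fraction of dies with
# zero defects. D0 and the die areas below are illustrative assumptions.
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies for a given area and defect density."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

D0 = 0.5  # assumed defects per cm^2 on an immature process
for area in (75, 200, 400):  # chiplet-sized vs. monolithic-sized dies
    print(f"{area} mm^2 die: {poisson_yield(area, D0):.0%} yield")
```

Under these assumptions a chiplet-sized die yields roughly twice as well as a 200 mm² die and several times better than a 400 mm² monolith, which is exactly why defects claim a lower percentage of the small chips.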

      • Goty
      • 11 months ago

      I wonder what fraction of the total cost of these comes down to assembly anyhow? Seems like they might have kept it pretty cheap, getting away with the usual substrate rather than something like an interposer.

        • derFunkenstein
        • 11 months ago

        I dunno.

        They’re doing the exact opposite with the MI60 GPU from what they’re doing with Epyc 2: 13.2 billion transistors in 330 square millimeters.

          • NoOne ButMe
          • 11 months ago

          GPUs are much more recoverable.

          And there isn’t a good solution for MCMing GPUs together yet.

          The GPU also seems to have a very relaxed design, only gaining about 1.55x density – low compared to AMD’s own claims of 2x, and far below TSMC’s claims of 3.3x.

          That may help with yield, or maybe it’s purely a play to get higher clocks/low power. (Power, Performance, Area, Cost)

            • derFunkenstein
            • 11 months ago

            All true things, but I’m a little surprised there hasn’t been some sort of development of a front-end/memory controller that talks to multiple GPUs. I mean, I’m probably showing a ton of ignorance here (which wouldn’t be the first time) but I’m sure there’s research being done into the topic.

            • NoOne ButMe
            • 11 months ago

            Nvidia and AMD have published whitepapers, or at least announced some research on it, I believe.

            Although IIRC those were at supercomputer scale.

      • cygnus1
      • 11 months ago

      I think that change will also boost their ability to die-harvest. I think we’ll see more models with only 6 or 7 cores working per “chiplet” in this generation. That right there should help the bottom line a good bit, I would think.

    • chuckula
    • 11 months ago

    Now that AMD has confirmed they are using 8-core modules, I welcome the apologies for the downvotes from when I was the first person to mention that design here.

      • derFunkenstein
      • 11 months ago

      I didn’t downvote you at the time, so I did it now instead. Is that OK?

        • chuckula
        • 11 months ago

        Sure!

        Incidentally, downthumbs from the AMD crowd who attack me for being better informed about what AMD is actually up to than they are double my revenue stream. It’s in my contract.

          • Growler
          • 11 months ago

          Downvotes are red, while upvotes are green. Maybe the AMD fans are choosing to salute you with their favorite color rather than supporting the green team.

          • Srsly_Bro
          • 11 months ago

          What comes after triple-agent? You got everyone fooled here.

          • derFunkenstein
          • 11 months ago

          I only downvoted because it’s fun. Presumably it’s why people downvote me. 😀

        • Mr Bill
        • 11 months ago

        Oh! The [size=20]humanity![/size] EMIB’ery!


        deleted because it was lame, put back by negative demand.
        Gimmie your negative votes!

      • Krogoth
      • 11 months ago

      PREACH THE TRUTH BROTHER!

      REVEAL AMD’S MASSIVE GLUE EATING BINGE!

      • blastdoor
      • 11 months ago

      I’ll give you credit — you were right. I just gave you three upvotes, for what it’s worth!

        • Waco
        • 11 months ago

        His pessimism was lucky, though. Many were hopeful for a 16 core Ryzen-esque die and they went a different route. It still means 16 core Ryzen parts could be a very near thing if they go MCM and maintain economies of scale from having only one 8 core die to fab.

        (side note, I was privy to this last year but it’s been fun stirring the pot)

          • chuckula
          • 11 months ago

          A 16 core die was logical considering all the 7nm hype. If TSMC’s 7nm is really twice as dense as GloFo’s rather anemic “14nm” process used for Zen, then a full 16 core die should only be about 200 mm^2. Not particularly big for a process that’s being touted as years and years ahead of Intel, especially when AMD claims that its GPUs on 7nm are more than 50% larger than that.

          Nice downthumbs, but let’s take this a step further with Intel’s supposedly “obsolete” 14nm process.

          Let’s look at the approximately 175mm^2 i9 9900K with 8 cores. BUT WAIT… that’s got integrated graphics which take up about the same area as 2 cores + L2 & L3 cache. So let’s make it a 10 core, 175 mm^2 part.

          BUT WAIT! We’re only at 175 mm^2 here with 10 cores. Let’s bump that to 16 cores, which is an easy extrapolation of about 80 mm^2 given it takes about 53 mm^2 to go from a quad-core at 122 mm^2 to an octo-core Skylake at 175 mm^2.

          BUT WAIT! We’ve suddenly “discovered” that integrated memory controllers and I/O are EVIL AND OBSOLETE things. So all that PCIe and memory controller circuitry can get thrown out and be replaced with some sort of chiplet bus connect to a Northbridge that’s vastly smaller to implement since it isn’t doing anything complicated. So who knows how much space that saves, let’s assume just 1 core’s worth for fun, so pull out 13.25 mm^2.

          So we have a 16 core “magic” chip on Intel’s “failed obsolete” process that comes in at about 240 mm^2, delivering 16 cores using about 25 mm^2 more than a cut-rate first-gen RyZen chip, all without requiring a single 10nm transistor to come out of Intel’s fabs.

          And TSMC’s miracle 7nm process couldn’t even deliver a vastly smaller version of that when push came to shove... in 2019. Suddenly Cannon Lake in 2018, which didn’t need to go all “advanced” chiplet, doesn’t look so bad.
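          For what it’s worth, the extrapolation above is internally consistent; here is a quick sanity check of the quoted numbers (all inputs are the rough die sizes cited in the comment, not measured values):

```python
# Sanity check of the die-area extrapolation in the comment above.
# All inputs are the rough figures quoted there, not measured die sizes.
quad_core_area = 122.0  # mm^2, quoted quad-core Skylake die
octo_core_area = 175.0  # mm^2, quoted 8-core i9-9900K die

# Area per core implied by the quad -> octo step (4 extra cores):
per_core = (octo_core_area - quad_core_area) / 4  # 13.25 mm^2

# Treating the iGPU as ~2 cores' worth of area makes the 175 mm^2 die
# a notional "10 core" part; extend it to 16 cores:
sixteen_core = octo_core_area + 6 * per_core  # 254.5 mm^2

# Drop ~1 core's worth of area for the removed memory/PCIe controllers:
chiplet_style = sixteen_core - per_core

print(chiplet_style)  # 241.25 -- i.e. the "about 240 mm^2" in the comment
```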

            • blastdoor
            • 11 months ago

            It’s possible that the issue isn’t yields, but rather the fixed cost of taping out different designs. AMD has to pinch every penny.

            • chuckula
            • 11 months ago

            So now they have to tape out designs at multiple foundries and pay extra for integration. Fun.

            Love it or hate it, the original Zen was one chip and that’s it from a silicon perspective if not a packaging perspective. These are more complicated and AMD didn’t jump through all those hoops because 7nm was too good and too cheap.

            • Waco
            • 11 months ago

            I’m sure it’s to give them even more flexibility going forward – standardize on chiplets (of various types) and IO modules across the line.

            • blastdoor
            • 11 months ago

            Then… why did they jump through the hoops?

            • chuckula
            • 11 months ago

            That’s easy: They were in a race to get something that’s related to 7nm out the door as fast as humanly possible and if it meant effectively going back to an in-socket North Bridge to do it they were OK with the idea.

            I personally owned and used “chiplet” systems for years (Apple made billions of dollars selling chiplet systems too, BTW), they’re called Core 2 Quads, and AMD didn’t magically come up with some amazing architecture out of thin air here.

            • blastdoor
            • 11 months ago

            Or maybe it’s because 7nm allows them to double performance, holding power constant? Maybe?

            • chuckula
            • 11 months ago

            Oh yeah, I’m sure every multithreaded benchmark will scale perfectly.

            Given all that corporate-speak, I’m sure there’s no reason to deny that Epyc 2 will destroy a Xeon AP by at least 33% — no wait, this is AMD and it’s 7nm let’s make that 66% — in all benchmarks.

            Especially those pesky “databases” that nobody ever uses in high-end servers. Can’t wait to see that superior 8 channels of RAM destroy a 12 channel Xeon AP that gets to devote plenty of bandwidth to Optane DIMMS while still having more raw bandwidth.

            Not to mention AVX benchmarks now that AMD has given us Haswell-level floating point power.

            • Waco
            • 11 months ago

            I think your wheelhouse is about 50 feet to your left.

            • chuckula
            • 11 months ago

            Tell me one [b<]FUNDAMENTAL[/b<] difference between a Core 2 Quad and these chips. I dare you.

            1. CPU cores & cache on a package? Check.
            2. No memory controller on-board? Check.
            3. No PCIe on-board? Check.
            4. Simplified I/O that only talks to the North Bridge (ooh, I mean the "magical I/O chip that nobody ever thought of in the history of the world before today!")? Check.
            5. Use of plain old copper interconnects in a PCB just like 1980s era motherboards? Check, but I guess AMD gets credit for making shorter traces.

            I don't care that third-party efforts in Moore's law that weren't done by AMD mean that there can now be a larger number of chiplets than in the past and that the north bridge can have more stuff in it. You cannot show me a single [b<]fundamental[/b<] architectural difference here, and it's getting a little annoying that AMD acts like it should win a Nobel Prize for discovering how chips were made in the 90's.

            • Waco
            • 11 months ago

            Nobody disputes the facts (except you, for some reason).

            It’s the spin you’re placing on them that’s getting you called out – you’re clearly wrong on your conclusions.

            This isn’t something crazy new. Nothing in computing really ever is. Execution is what matters.

            • Srsly_Bro
            • 11 months ago

            A former president won a Nobel prize, so despite everything you said, that doesn’t take AMD out of the running.

            • jts888
            • 11 months ago

            How about probe filter driven interconnect fabric for fully scaling die-to-die traffic instead of independent dies contending for/snooping on a shared bus? Or maybe a coherency protocol obviating writebacks of dirty shared lines? How about the inter-socket interconnects sitting on the same die as the PCIe and memory controller, allowing inter-socket access of memory and peripherals to stay out of the way of intra-socket cache transfers?

            Superficial topology similarity is not very meaningful if the protocols over the links and the state held within the nodes are dissimilar.

            • ptsant
            • 11 months ago

            What you don’t realize is that these decisions were made years ago. Nobody knew what the yields would be, how easy (or not) it would be to get to 10 or 7nm, or how the needs of the market would evolve.

            AMD has made bets that did not pay off (HBM for gaming, 8-core in the bulldozer era, when the software wasn’t ready) but in this case they clearly made a winning bet. The chip is easy to produce, cheap and is most likely going to perform very well.

            Intel made a different bet, counting on their huge (at the time) process advantage, and they find themselves in a more difficult situation. Even if the massive 56-core has spectacular perf, it will be insanely expensive to produce and will have great difficulty competing on perf/$ metrics, which dominate the segment.

            Anyway, there is no doubt that Intel could have done something similar. The point is that they didn’t.

            • blastdoor
            • 11 months ago

            Good points, although I suspect people more knowledgeable than I am probably did have a pretty good understanding that the move to 10/7nm would be pretty hard. Intel was probably just over-confident that they could pull it off.

            I wonder if perhaps those decisions were also initially made assuming GloFo would be fabbing these things. If that were the assumption, then a very conservative design with respect to potential yields would have made a great deal of sense.

            • Spunjji
            • 11 months ago

            Agreed. It seems unlikely that anyone thought it’d be easy, but the bods in charge at Intel clearly didn’t budget for it being this tough, either.

            That’s a good point there, too. When this was in the design phase GloFo were still publicly gunning for process leadership.

            • jts888
            • 11 months ago

            AMD has been saying in their published whitepapers for years that chiplet designs would allow them to cherry-pick at finer granularities to make high core-count chips with completely low-leakage sub-components. It’s not just about yields of non-defective parts, it is an attempt to compensate for perceived process deficits around clocking/leakage for mid-to-high core count parts.

            AMD can’t make an 8c 5 GHz 14 nm part to save their lives, but they can trade inter-die I/O power for the ability to make 32c Epycs with better overall int perf/Watt than monolithic Skylake XCCs that barely yield any 28c parts in the first place. Sticking with 8c dies for Epyc 2 is a continuation of this policy IMO and not just reacting to disappointing TSMC 7nm yields, since it is exponentially easier to find and aggregate 8 * 1 cm^2 least-leaky parts than 4 * 2 cm^2 least-leaky ones, even when yields for fully-intact parts are in the 90+% range.
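            The small-die cherry-picking argument above is easy to illustrate with a toy Poisson defect-yield model (the defect density below is an assumed illustrative number, not TSMC data, and real binning also weighs leakage, not just defects):

```python
import math

# Toy Poisson defect-yield model for the small-die argument above.
# The defect density is an assumed illustrative figure, not TSMC data.
DEFECT_DENSITY = 0.2  # defects per cm^2 (assumed)

def zero_defect_yield(area_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-DEFECT_DENSITY * area_cm2)

small = zero_defect_yield(1.0)  # 1 cm^2 chiplet
large = zero_defect_yield(2.0)  # 2 cm^2 die, same silicon as two chiplets

print(f"1 cm^2 die yield: {small:.3f}")  # 0.819
print(f"2 cm^2 die yield: {large:.3f}")  # 0.670
```

            Halving the die area raises zero-defect yield from roughly 67% to roughly 82% at this assumed defect density, and picking the eight least-leaky dies from the larger pool of good 1 cm^2 parts is correspondingly easier than doing the same with 2 cm^2 parts.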

            • Anonymous Coward
            • 11 months ago

            Sounds like a very good plan for servers (which already have somewhat compromised memory latency due to NUMA), but I wonder though how this works smoothly for the market currently served by single-die products. It would be a shame for Zen2 to not scale down as well as Zen1.

            I assume that desktop remains an important part of their plans to get some sales volume on the 8-core chiplets.

            • jts888
            • 11 months ago

            I think it is an open question around how many different dies might contain the new 8c CCX (or 2 * 4c CCXs?). 8c is certainly still overkill for the lowest market segments served by APUs, but maybe 7nm yields really are modest enough that there will be tons of parts with only 2 to 6 viable cores that are still perfectly serviceable in this segment. OTOH, small consumer parts don’t need as much I/O as Epyc, so a separate die with 8c, 2 channels of DDR4 and 32 lanes of PCIe isn’t totally out of question in my mind.

            • synthtel2
            • 11 months ago

            I think the CCX/IF-only die we see here plus an 8C APU die would do a pretty good job of covering everything. The APU with all cores enabled could move plenty of volume in the 2700/X’s segment, the extra cores aren’t using up a huge amount of die area (cost) for lower segments, and if power is the limitation and the workload can manage it more cores at a lower clock will be more efficient than restricting core count. It sounds like Zen+’s power management is capable enough that putting 6-8C in a laptop isn’t crazy.

            That does assume that a 7nm wafer isn’t massively more expensive than a 14nm wafer, which may not be true. The impression I’ve been getting is that 7nm’s extra costs are more about masks and so on (so the game is about getting a full lineup of products from as few die designs as possible), but I wouldn’t know and could be way off base.

            • Anonymous Coward
            • 11 months ago

            I was thinking that they’ve set themselves up well for a pretty mean APU, if that’s their thing. Build the IO chip around a GPU, throw in a little HBM and crank it to 120W while disposing of 2nd-rate server dies. Offer a version without HBM and disabled cores to meet lower price points, maybe throw in a single-die (6C?) option for < 25W. Might be the end of GPU-less Ryzen.

            Will be quite interesting to see that memory latency though.

            • Spunjji
            • 11 months ago

            NOW we’re talking. That could be quite the chip and, presumably, would clear the path for variants with additional on-package memory dies too.

            I know I’m dreaming at this stage but the dream is nice.

            • Anonymous Coward
            • 11 months ago

            It’s the most realistic chance at a “big APU” so far, anyway. I wonder how the multi-die thing changes the economics.

      • freebird
      • 11 months ago

      Coming from the same guy that doubted that AMD would be building 7nm Rome at TSMC back in June…

        • chuckula
        • 11 months ago

        I never said AMD wasn’t building Rome on 7nm, although now that we know the whole picture I should have been more aggressive in pointing out that AMD never claimed that Rome was being fabbed [b<]exclusively[/b<] using 7nm technology, as that massive "14nm" GloFo chip in the middle makes clear.

        I questioned unsupported statements about the choice of vendors... you know, back when AMD was publicly talking about using GloFo's 7nm process in shareholder conference calls, where lying isn't a good idea. And I was right to do so.

          • freebird
          • 11 months ago

          I don’t ever remember AMD committing, in a conference call, to producing a specific 7nm product at a specific foundry (although I think she talked about 7nm Vega being produced at TSMC, but not in a conference call), but please provide me the link so I can read about it. I do remember Lisa Su stating they had flexibility in the Q2 2018 conference call:

          “So at 7-nanometer, we are engaged with both TSMC and GLOBALFOUNDRIES. I would say that we do have, on a product-by-product basis, the choice between the foundries and we make those decisions on a product-by-product basis.”

          which I would HARDLY describe as a LIE.

            • Beahmont
            • 11 months ago

            AMD had, until Global Foundries killed their 7nm and below projects, an exclusive wafer agreement with Global Foundries. AMD would have to pay Global Foundries a substantial amount of money per wafer to make their products at TSMC.

            It would have been unusual and legally require notice to the AMD share holders if AMD went with anyone other than Global Foundries if the wafer supply agreement was still in effect, and it was in effect until August of this year when Global Foundries canceled the deal.

            • freebird
            • 11 months ago

            Did you ever read anything about the WSA and its amendments, or do you just make it up as you go?

            Basically, AMD has the ability to produce anything anywhere now. They are committed to paying a fee for wafers produced at other foundries, but the 6th Amendment also committed both parties to working on 7nm in partnership and good faith.

            So there is NO EXCLUSIVE WAFER agreement. I didn’t see anything REQUIRING AMD to notify shareholders WHERE they were going to produce what product, so that is BS.

            AND GF didn’t cancel the WSA, but it will VERY LIKELY be re-written, with a 7th Amendment already being negotiated.
            [url<]https://wccftech.com/amd-is-negotiating-a-7th-amendment-to-the-wsa-wafer-supply-agreement/[/url<]

            Considering that GF is likely in material breach of section 8 of the WSA 6th Amendment, that should be no surprise to anyone.

            "8. 7NM OPERATIONAL PLAN

            a. The Parties shall work in a spirit of partnership and good faith to focus resources to assist FoundryCo to develop its 7nm process technology in accordance with its time schedule. AMD shall provide such cooperation as reasonably required to enable FoundryCo to manufacture 7nm products for AMD consistent with AMD’s time schedule for 7nm Products. The details of such cooperation will be mutually agreed and set forth in an operational plan, which plan shall be based on the elements further described in Exhibit A (the “7nm Operational Plan”). The Parties acknowledge that certain elements of the 7nm Operational Plan will be updated from time to time per the Parties’ mutual agreement in order to fulfill the objectives set forth in this Section 8 until the 7nm Operational Plan is complete."

            [url<]https://www.sec.gov/Archives/edgar/data/2488/000000248816000263/wsaamendmentno6redacted.htm[/url<]

      • Unknown-Error
      • 11 months ago

      up-voted!

      • ptsant
      • 11 months ago

      Who cares. The point is they have a 64-core CPU and Intel doesn’t. Perf is all that matters.

        • jarder
        • 11 months ago

        I disagree; perf is not the only thing that matters. It’s performance and cost.

          • ptsant
          • 11 months ago

          Of course. I meant with reference to the chip design. But in this case the chip design also means lower [production] cost. If current pricing is any guide, Epyc will be sold at competitive price points.

        • Beahmont
        • 11 months ago

        No they don’t. AMD has an 8 chip MCM. If you think Intel can’t make an MCM to compete you obviously have not been paying attention to the Cascade Lake AP announcement.

          • Spunjji
          • 11 months ago

          He said 64-core CPU. This is indeed a 64-core CPU. Are you denying that, or just nitpicking for the sake of a criticism?

          If you think Intel /can/ make something to compete then you too must have missed the Cascade Lake AP announcement, whereby it 1) very notably doesn’t exist in a form they can take benchmarks from yet, and 2) doesn’t have as many cores or the same RAM capacity as this. That’s before we even get to power consumption.

          Of course, it can and will compete because the server marketplace isn’t just about the CPU. I just think you’re being seriously disingenuous in how you’re arguing your point.

          • chuckula
          • 11 months ago

          You need to get with the pro-AMD koolaid program!

          MCM from AMD? THIS IS THE GREATEST MIRACLE OF ALL TIME! Especially when in 2019, with TSMC’s miracle process, AMD has gone to making 8 core dies without any real integrated components as the miraculous advance from 2018, when they had 8 core dies with integrated components.

          MCM from Intel? THIS SUCKS! THEY JUST PUT MULTIPLE CHIPS TOGETHER TO MAKE MORE CORES!

          Don’t even get me started about how 8 channels of RAM on a 64 core Epyc 2 is a miracle while 6 channels of RAM on last year’s Xeons is stupid and 12 channels of RAM on a 48 core Xeon is literally impossible to manufacture and must be worse than 8 channels on Epyc!

            • cygnus1
            • 11 months ago

            I think you’re having too much fun sir, you need to calm down.

    • ptsant
    • 11 months ago

    So, 3700X will be an 8-core I guess. With luck 4.5GHz and +10% IPC… Would buy.

      • Waco
      • 11 months ago

      Different die, in this case, unless they’re going MCM for desktop as well. I guess there’s no reason not to, but that also means they can do a 16 core part pretty easily.

        • ptsant
        • 11 months ago

        Why do you think it’s going to be a different die? Isn’t it simpler to just stick a single chiplet with some IO glue on a chip? Or is it more expensive? But anyway, it won’t be more than 8-core/2-channel.

          • Waco
          • 11 months ago

          The chiplet does not have a memory controller – so if it has an IO die, they could easily fit more chiplets aside that IO die (which would likely be logically 1/2 or 1/4 of the big one in Epyc)…

          I’m assuming AMD will keep costs at minimum by standardizing the dies on the leading-edge process as much as possible.

    • Krogoth
    • 11 months ago

    AMD IS STUCK ON 14NM FOR I/O, PATHETIC!

    INTEL IS ON A ROLL WITH 10NM AND WILL CRUSH NOT-EPYC 2. AVX512 WILL DOMINATE THE COMPUTING WORLD!

    #PoorNotEpyc2
    #PoorGarbageNavi
    #PoorFurnaceVega20
    #AMDShouldStopEatingGlue

      • Srsly_Bro
      • 11 months ago

      LMAO. I’m starting to become concerned.

        • derFunkenstein
        • 11 months ago

        It reminds me of Microsoft’s Twitter bot “Tay” that started out OK but eventually turned into a Nazi.

          • Ninjitsu
          • 11 months ago

          seems to be a common thing with twitter bots

      • cygnus1
      • 11 months ago

      Downvoted for too much caps lock

        • chuckula
        • 11 months ago

        SSK wept.

          • cygnus1
          • 11 months ago

          upvoted for truth

          bahahahaha

          • derFunkenstein
          • 11 months ago

          SSK uses the shift key like the OG that he is.

          • Mr Bill
          • 11 months ago

          SSK knows the shift key is the SARCASM tag.

      • cygnus1
      • 11 months ago

      Also, get off your high horse. Every major advance in CPUs worth a damn in the last 15 years has been AMD’s. Intel hyperthreading/SMT has been a security disaster, and AVX512 is not dominating much beyond a few specific uses.

      AMD has given us x64, integrated memory controllers, integrated GPUs, and now multi-chip modules for mainstream SKUs. Intel can only lead through manufacturing prowess, and they’re losing that to contract fab companies. Intel is simply not a leader. Face the facts there, Krogoth…

        • Krogoth
        • 11 months ago

        > falls for the obvious troll bait

          • cygnus1
          • 11 months ago

          lol, how dare you, sir!

            • ronch
            • 11 months ago

            T’was quite obvious he’s just trolling.

        • chuckula
        • 11 months ago

        Considering that Sandy Bridge had an integrated GPU before Llano launched and since MCMs are suddenly Holy Amazing products Intel should also get credit for MCM integrated graphics in Bloomfield years before AMD had a product on the market.

        As for hyperthreading sucking so bad.. it sucks so bad that the only worthwhile chips from AMD since the good ol’ days in 2003 all implemented it even as the AMD fanboys insult Intel for a feature they love when AMD does it.

        Literally yesterday there were the usual AMD fanboys out with pitchforks because the published stream triad scores for Epyc were done with the “stupid” hyperthreading on Epyc turned off… until somebody with more than two brain cells pointed out that this is done intentionally by AMD on its own systems for best performance.

        Integrated memory controllers? Weren’t you paying attention today?

          • cygnus1
          • 11 months ago

          I think you’re confused on the Bloomfield GPU… Bloomfield was the first-gen i7s; pretty sure those did NOT have GPUs. The i3s and i5s of that generation had GPUs, I believe, but were not Bloomfield.

          Also, Intel didn’t integrate a good GPU until much later than AMD, and it still isn’t good, just decent… Just sayin 😉

        • jihadjoe
        • 11 months ago

        And most of “AMD’s” advancements came straight from DEC’s rotting corpse lol.

        First x86 IGP (or SoC for that matter) IIRC was Cyrix’s MediaGX. It even had integrated sound!

        MCMs? Uhh, Kentsfield? AMD even made fun of Intel for gluing together two dies and not having a true quad-core.

        AMD64 wasn’t anything revolutionary at all. How hard can extending a bunch of registers be? Intel did the same thing going from 8 to 16 to 32 bits. The fact that they were able to release EM64T so soon after AMD64 is proof they always had the tech, just didn’t want to do it because they were on-board with Itanic.

          • Waco
          • 11 months ago

          Going from 32 bit to 64 bit is just extending a bunch of registers? Glad we got that one sorted out…

            • chuckula
            • 11 months ago

            It really isn’t much more than that, along with logical extensions of basic operations and a few addressing modes. x86 was already used to undergoing extensions from the 16 to 32 bit transition, and there were plenty of older architectures that had done the heavy lifting already.

            Considering 64 bit was implemented on even the Pentium 4 without a major microarchitecture iteration, he’s not wrong.

            • Waco
            • 11 months ago

            Sure, it sounds simple. It’s not, though, to implement well.

            I was just trying to drive the point that simple things rarely are when talking about computing.

            • chuckula
            • 11 months ago

            Designing the first 64 bit CPUs was tough, but DEC already had the talent to do it and AMD just had them apply known working ideas to their chips. Intel & HP built an entirely new 64 bit architecture from scratch, and it took a whole lot longer compared to starting from a working chip and tweaking it.

            • Mr Bill
            • 11 months ago

            Itanium?
            Sure, but just like the market went for the 8088 instead of the 8086 or the 68000, the market chose existing functionality over possibly superior functionality.

            • blastdoor
            • 11 months ago

            I think the thing that is often under-appreciated in these types of discussions is the challenge of assembling “old” ideas/components into a coherent/effective/valuable whole product. Figuring out what to include — and what *not* to include — in a product in order to most efficiently achieve an objective is hard to do. Doing it well is very valuable.

            We could criticize a head coach of a football team that wins the Super Bowl as having done nothing truly “new.” He just assembles a bunch of off-the-shelf components (players and plays) in a way that works especially well in the context he is facing. Big deal! Well actually, yes, it is a big deal.

            Whether you’re designing a play, a strategy for a season, or a product it’s all about the choices you make in terms of what to do, what *not* to do, and then executing. Coming up with a genuinely new component — for example, inventing the memristor or the lithium ion battery — is great, but that’s not the only thing that’s great.

            For AMD, it’s all about play calling. They can’t afford to make the R&D investments to come up with truly new things (or at least not too many truly new things). Lisa Su appears to be calling plays a heck of a lot better than her predecessors. She deserves credit for that.

            • Waco
            • 11 months ago

            Exactly. Especially if you look at the history of computing over the past 50 years or so, there are very few original ideas. Combining existing ideas in an intelligent way and executing them well matter far more than how “original” the components in question are.

            • Spunjji
            • 11 months ago

            Word to all of this. The iPad was the first successful tablet and “all” it did was scale-up a smartphone. Thing is, that worked a ton better than what everybody else had been trying until then.

            I have mad respect for what Lisa Su and AMD are achieving on a comparative shoestring budget. Intel seem to be suffering from too much money and not enough focus.

          • cygnus1
          • 11 months ago

          [quote=”jihadjoe”<] And most of "AMD's" advancements came straight from DEC's rotting corpse lol. [/quote<] Lol, not arguing that, but they didn't come from Intel 😀

            • Mr Bill
            • 11 months ago

            Sometimes the [url=https://www.youtube.com/watch?v=Lt5M8CXLkzQ<]hired gun[/url<] is the best solution. Seems to have worked for Zen too.

            • Spunjji
            • 11 months ago

            Thanks for saying what I was thinking! If DEC were so /obviously/ good then why didn’t Intel buy them… oh yes, they were literally a rotting corpse.

            Sometimes good business sense in the tech world is knowing which bodies to loot (nVIDIA and 3DFx come to mind).

      • kuttan
      • 11 months ago

      Cheap troll. Chuckula trolls better than you 😀

        • Spunjji
        • 11 months ago

        Chuckula is reassuringly expensive

          • cygnus1
          • 11 months ago

          I laughed. +3

            • Spunjji
            • 11 months ago

            Much appreciated, but the downvote demons struck.

            I AM UNDONE!

            • K-L-Waster
            • 11 months ago

            If you’re going to melt, please do so off the carpets — the cleaning bill is hell.

        • Krogoth
        • 11 months ago

        PFFT, MY TROLL-FU IS STRONGER!

      • BorgOvermind
      • 11 months ago

      Intel failed at 10nm. See my post above.

    • blastdoor
    • 11 months ago

    Interesting that the memory controller has been separated from the CPU cores, given that AMD was early to integrate the two in the first place.

    Also interesting that 64 cores are achieved with eight 8-core chiplets rather than four 16-core modules.

      • Jeff Kampman
      • 11 months ago

      This is clearly a yield-maximizing move for 7 nm.

        • DancinJack
        • 11 months ago

        Yup, it immediately made me think about the recent moves Intel made regarding chipsets and possible outsourcing of some stuff to focus on 10nm.

        • blastdoor
        • 11 months ago

        Indeed, and also perhaps all about minimizing the cost of designing for 7nm. Correct me if I’m wrong, folks, but I think this implies that AMD really only has to pay for a single 7nm CPU design and tape out — the 8 core chiplet. That chiplet could presumably be used in almost every product they make. They can just pair the 7nm chiplet with whatever 14nm IO chip is appropriate for a given market. And of course some chiplets might have fewer than 8 active cores.

        Although, if that’s true, what does it mean for the APU? Does the APU remain a 12/14 nm product? Or do we get a 14 nm IO chip paired with a 7nm CPU chiplet and a 7nm GPU chiplet?

          • Goty
          • 11 months ago

            [quote<]Although, if that's true, what does it mean for the APU? Does the APU remain a 12/14 nm product? Or do we get a 14 nm IO chip paired with a 7nm CPU chiplet and a 7nm GPU chiplet?[/quote<]

            I'm guessing you get a separate GPU chiplet connected over IF. They're already doing CPU/GPU communication over IF for Rome/Vega 20 according to the presentation today. Of course, they could just produce another monolithic chip like they did with Raven Ridge. Who knows?

            • UberGerbil
            • 11 months ago

            [quote<]I'm guessing you get a separate GPU chiplet connected over IF. They're already doing CPU/GPU communication over IF for Rome/Vega 20 according to the presentation today.[/quote<]

            Naively (because there's a lot of considerations I can't weigh properly or simply know nothing about) that's how I'd do it. From a technical standpoint it's nice because the GPU becomes just another core behind the memory controller; from a business standpoint it's nice because you can combine various binned CPU and GPU chiplets to give yourself the combinations that most appeal to OEMs and the market at large.

            I wonder if they've given consideration to putting some memory in the IO chip as Last Level Cache (a la Intel's eDRAM) or even slapping some HBM on there. In fact, with the latest HBM stacks they could basically turn the IO chip into the motherboard: no DIMMs required, just hook up the ports and you've got a functional PC. For ODMs trying to do their own "Air" style laptops, that might be a pretty interesting core component.

            • zzing123
            • 11 months ago

            Given that Vega is going 7nm and PCIe 4 too, AMD can just plonk Vega side by side with Zen 2 as a chiplet in its own right. What’s also interesting about separating the uncore into its own chip is that the configuration options are endless:

            – You want an 8P server: make a bigger IO package with even more PCIe 4 lanes for Infinity Fabric.
            – You want an APU: one Zen 2 chiplet, one Vega chiplet and one uncore
            – You want an ML 2P server: 4x Zen 2 chiplets, 4x Vega chiplets, and the same IO package
            – You want a 2P virtualisation server: 8x Zen 2 chiplets, IO package (as shown)

            In the future, I wouldn’t be at all surprised to see PCs go to a transputer-like design: a motherboard with 16x PCIe 4.0 (probably x64, not x16) slots, into which you slot whatever mix of CPU, GPU and service packages (NVMe, storage controller, NIC etc.) you need…

        • psuedonymous
        • 11 months ago

        While only two are shown in the diagram, they do not specify how many chiplets are present per package. That could be anywhere from one CCX per chiplet to one core per chiplet, which could allow for very small dies indeed. Anyone expecting large-die 7nm yields at TSMC (or Samsung) to somehow be better than what Intel is managing with 10nm is working on imagination rather than physics. The scaling barriers care not what company you work for.

        It’s going to be interesting to see how well through-substrate Infinity Fabric can do when every single memory operation needs to dip through it. That hasn’t been a good situation for Threadripper (the ‘island’ dies that only have access to memory via the other two dies) but the star vs. ring topology might help out there.

          • Jeff Kampman
          • 11 months ago

          It’s up to eight per Epyc at the moment.

            • psuedonymous
            • 11 months ago

            So the regular double-CCX complexes. It will be interesting to see if that scales down to the unannounced Zen 2 desktop parts (one 7nm core die plus one 14nm uncore die), or if those will arrive significantly later, once 7nm can handle a larger monolithic die as in current desktop Zen parts.

            • jts888
            • 11 months ago

            It was actually left completely unstated whether the compute chiplets are 4c*2ccx or 8c*1ccx designs. If anything, I suspect the latter, since it would not only avoid a 3-port IF crossbar of minimal utility, but also remove any need whatsoever for home agent/probe filter logic on the compute dies. (When a core needs a line, it is either in the local aggregate L2/L3 or across the single IF link.) An internal CCX/L3 crossbar with 9 ports would clearly be substantially more complex than the 5-port design in Zen 1 CCXs, but that would likely be more than made up for by moving the HA logic to the central 14nm I/O die.
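The crossbar cost jts888 describes can be put in rough numbers. As a sketch, assuming a full N-port crossbar needs on the order of N^2 crosspoints (a simplification; real on-die fabrics are more nuanced):

```python
# Rough crossbar cost model: assume a full N-port crossbar needs ~N^2
# crosspoints. This is an illustrative simplification, not AMD data.

def crosspoints(ports: int) -> int:
    """Approximate crosspoint count for a full N-port crossbar."""
    return ports * ports

zen1_ccx = crosspoints(5)    # 4 cores + 1 fabric port, as in a Zen 1 CCX
eight_core = crosspoints(9)  # hypothetical 8 cores + 1 IF link
print(zen1_ccx, eight_core, round(eight_core / zen1_ccx, 2))  # 25 81 3.24
```

Roughly a 3x jump in crosspoints under that assumption, which is the "substantially more complex" part; the argument is that shedding the home-agent logic to the I/O die pays for it.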

            • Zizy
            • 11 months ago

            Well, it will still have an IF crossbar for external communication and PCIe, so you are moving from 4 ports to 3 (or a complicated internal xbar with 10 ports). I believe having 4 ports, with 2 of them further split into 5 ports internally, is far simpler (2x 4C CCX). Though assuming AMD someday moves to an active interposer, it wouldn’t make much sense to keep the CCX. They might have bitten that bullet already.

          • blastdoor
          • 11 months ago

          The 2990wx also faces the challenge of four memory channels serving 32 cores. So those two ‘island’ dies have more disadvantages than lacking a direct connection to RAM.

          • blastdoor
          • 11 months ago

          One other thought… what if AMD stacked some HBM on top of that IO chip?

          • Waco
          • 11 months ago

          There’s a very big difference between an on-package direct IO route versus hopping through remote memory controllers via a long-haul fabric (like Infinity Fabric in current Epyc/Threadripper). They might call it the same thing with the chiplets but I’d be surprised if they don’t have the bandwidth and latency cranked down tightly.

          • freebird
          • 11 months ago

          7nm TSMC/Samsung can’t be compared (yields or process tech) to Intel’s original 10nm process in many ways.

          Intel WAS planning on using Cobalt and COAG (Contact Over Active Gate), both of which may be dropped from the “shipping” 10nm process Intel finally delivers late in 2019.

          • Spunjji
          • 11 months ago

          It’s best to avoid committing the fallacy you’re accusing others of falling into. TSMC are doing fine making chips larger than Intel managed with their first go at 10nm, so yes, reality suggests that Intel are struggling where others are not.

          Your assertion to the contrary falls into the domain of “imagination” and no amount of vague hand-waving about physics will change that.

        • Wirko
        • 11 months ago

        It will also reduce the power density a little.

      • jihadjoe
      • 11 months ago

      Welcome back, Mr. Northbridge! It’s been a while.

        • blastdoor
        • 11 months ago

        Does this mean the return of Nforce?

          • tay
          • 11 months ago

          Dolby DTS encoding!! Good times…

            • jihadjoe
            • 11 months ago

            I still have a working Athlon 64 + NForce + Radeon 9500 softmod setup.

            After upgrading to Core 2 Quad I just sort of chucked it in the closet and left it there for the last decade. Pulled it out just last week and was pleasantly surprised to see it still works!

      • Chrispy_
      • 11 months ago

      Just imagine that “chiplet” is a CCX and it all makes sense again.

      A Zen CCX has always been four cores clustered around L2 and L3 shared cache, and then the whole CCX is connected via infinity fabric.

        • Mr Bill
        • 11 months ago

        So, they are all having CCX via the infinity fabric?

          • Goty
          • 11 months ago

          I think it’s “on top of,” at least when they’re feeling frisky.

      • freebird
      • 11 months ago

      8 cores x 8 chips makes more sense than 16 cores x 4 chips if you have read this:
      [url<]http://www.eecg.toronto.edu/~enright/Kannan_MICRO48.pdf[/url<]

        • UberGerbil
        • 11 months ago

        “[url=https://i.imgur.com/LAq6VW5.gif<]Misaligned ButterDonut[/url<]"!

          • Redocbew
          • 11 months ago

          Every chip should have a name which could be an auto-generated github repo name.

    • chuckula
    • 11 months ago

    The point is, ladies and gentleman, that glue — for lack of a better word — is good.

    Glue is right.

    Glue works.

    Glue clarifies, binds together, and captures the essence of the low-core count dies that [s<]GloFo[/s<] [u<]TSMC[/u<] churns out. Glue, in all of its forms -- glue for cores, for profit margin, for ad clickbait, Cinebench -- has marked the upward surge of AMDkind. And glue -- you mark my words -- will not only save AMD's server marketshare, but that other malfunctioning corporation called the USA.

      • Krogoth
      • 11 months ago

      AMD SHOULD STOP SNIFFING THE GLUE! THEY ARE LOSING WHAT TINY AMOUNT OF PRECIOUS BRAIN CELLS THEY HAVE LEFT.

      • derFunkenstein
      • 11 months ago

      It’s time to stop crying over spilled glue.

      • Mr Bill
      • 11 months ago

      Gluons forever! The strong force agrees!

        • chuckula
        • 11 months ago

        It’s still sad that nobody here got that reference.

        My vast well of cultural knowledge is wasted.

          • Mr Bill
          • 11 months ago

          Enlighten me Obi Wan. Glue up my cultural void.

          [url=https://www.youtube.com/watch?v=VVxYOQS6ggk<]Greed?[/url<]

          • K-L-Waster
          • 11 months ago

          Yeah yeah, Gordon, go back to shady mergers and acquisitions.

      • Mr Bill
      • 11 months ago

      OK, three quarks for Muster Mark!

    • Goty
    • 11 months ago

    I think one of the most telling things from the presentation is that AMD is quoting 2x performance per socket over Naples. Well, you can double the theoretical performance by simply doubling the core count at the same frequency (which would be impressive), but what does that say about IPC improvements?

      • blastdoor
      • 11 months ago

      Do we know it’s the same frequency? Perhaps IPC up, frequency down….

      They said double the density, power cut in half at same frequency, or 25% more frequency at same power. To keep power the same while doubling cores and adding more FPU resources, perhaps AMD cut frequency and boosted IPC.

        • Goty
        • 11 months ago

        Yeah, but you’d think half power at the same frequency would give you double the cores per CPU at the same frequency and power, right? If there’s an IPC uplift, they could reduce power and keep the same performance/core for the doubling of performance, but I don’t see much point in that.

          • blastdoor
          • 11 months ago

          Yes — half power at same freq would allow you to double the cores and keep total power constant.

          But they have done more than double the cores — they’ve also added significant new FPU resources. So, I’m thinking they have to pay for that somehow. One way to pay would be to cut frequency. To cut frequency a bit without losing performance, increase IPC.

          Increasing IPC means adding more transistors, which then adds back power, but adding transistors scales power linearly while adding Hz is more than linear. That is, reduce frequency by 1%, increase transistors by 1%, power goes down.
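blastdoor's trade-off is easy to sketch with a toy dynamic-power model (my own illustrative assumptions, not anything AMD has stated): P ~ N * f * V^2, with voltage assumed to track frequency near the operating point, so frequency contributes cubically while transistor count contributes only linearly.

```python
# Toy dynamic-power model: P ~ N * f * V^2.
# Assumption (illustrative, not AMD data): voltage V scales roughly
# linearly with frequency f near the operating point, so P ~ N * f^3.

def relative_power(n_scale: float, f_scale: float) -> float:
    """Power relative to baseline when transistor count is scaled by
    n_scale and frequency (hence voltage, per the assumption) by f_scale."""
    v_scale = f_scale
    return n_scale * f_scale * v_scale ** 2

# 1% more transistors (spent on IPC), 1% less frequency:
tweaked = relative_power(1.01, 0.99)
print(round(tweaked, 4))  # 0.98: slightly below baseline power
```

Under those assumptions the 1%-for-1% trade does land just under baseline power, which is the direction of blastdoor's argument.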

    • sweatshopking
    • 11 months ago

    BUT GAMEZ DOTNT CARE ABOUT SO MANY THREAD GUIZE

      • BabelHuber
      • 11 months ago

      Nice 🙂

      Look at the Rome socket in Lisa Su’s hand: [url<]https://images.anandtech.com/doci/13547/1541532227330.JPEG[/url<]

      Seems like 8 dies for the CPUs (8x8) plus the huge IO chip in the middle to me. One such 8-core CPU die plus a (much) smaller IO chip will be the new desktop socket, I guess...

        • derFunkenstein
        • 11 months ago

        Look at the size of Lisa’s package!

          • Growler
          • 11 months ago

          Size matters not. I learned that from an ancient philosopher, so it’s got to be true.

        • Anonymous Coward
        • 11 months ago

        Reminds me of the stuff IBM has made over the years: huge packages covered in dies. For example, POWER5 from 15 years ago: [url<]https://en.wikipedia.org/wiki/POWER5#/media/File:Power5.jpg[/url<]

      • BorgOvermind
      • 11 months ago

      Games are the worst optimized software in existence. Except windows.

    • LocalCitizen
    • 11 months ago

    #PoorClap
