Nvidia’s Pascal to use stacked memory, proprietary NVLink interconnect

GTC — Today during his opening keynote at the Nvidia GPU Technology Conference, CEO Jen-Hsun Huang offered an update to Nvidia’s GPU roadmap. The big reveal was a GPU code-named Pascal, which sits a generation beyond the still-being-introduced Maxwell architecture in the firm’s plans.

Pascal’s primary innovation will be the integration of stacked "3D" memory situated on the same substrate as the GPU, providing substantially higher bandwidth than traditional DRAM chips mounted elsewhere on the graphics card’s circuit board.

If all of this info sounds more than a little familiar, perhaps you’ll recall that Nvidia also announced a future, post-Maxwell GPU at GTC 2013. It was code-named Volta and was also slated to feature stacked memory on package. So what happened?

Turns out Volta remains on the roadmap, but it comes after Pascal and will evidently include more extensive changes to Nvidia’s core GPU architecture.

Nvidia has inserted Pascal into its plans in order to take advantage of stacked memory and other innovations sooner. (I’m not sure we can say that Volta has been delayed, since the firm never pinned down that GPU’s projected release date.) That makes Pascal intriguing even though its SM, or streaming multiprocessor, will be based on a modified version of the one from Maxwell. Memory bandwidth has long been one of the primary constraints on GPU performance, and bringing DRAM onto the same substrate opens up the possibility of substantial performance gains.

The picture above includes a single benchmark result, as projected for Pascal, in the bandwidth-intensive SGEMM matrix multiplication test. As you can see, Pascal nearly triples the performance of today’s Kepler GPUs and nearly doubles the throughput of the upcoming Maxwell chips. This comparison is made at the same power level for each GPU, so Pascal should also represent a nice increase in energy efficiency.
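
To illustrate why that bandwidth matters so much, here is a minimal roofline-style sketch in Python. The peak-compute and bandwidth numbers in it are made-up placeholders, not figures for Kepler, Maxwell, or Pascal; the point is simply that a kernel doing little arithmetic per byte fetched is capped by memory bandwidth rather than by the shader array.

    # Back-of-the-envelope roofline model: attainable throughput is the lower of
    # peak compute and (memory bandwidth x arithmetic intensity). All numbers here
    # are illustrative placeholders, not specs for any real GPU.

    def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
        """Roofline estimate of sustained GFLOPS for a kernel."""
        return min(peak_gflops, bandwidth_gbs * flops_per_byte)

    peak = 4000.0          # hypothetical peak single-precision GFLOPS
    board_dram = 300.0     # hypothetical GB/s from DRAM elsewhere on the board
    stacked_dram = 1000.0  # hypothetical GB/s from on-package stacked DRAM

    for intensity in (1, 4, 16):  # FLOPS performed per byte of memory traffic
        print(intensity,
              attainable_gflops(peak, board_dram, intensity),
              attainable_gflops(peak, stacked_dram, intensity))

Kernels with low arithmetic intensity scale almost linearly with memory bandwidth in this model, which is why moving the DRAM on package can translate so directly into throughput.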

Compared to today’s GPU memory subsystems, Huang claimed Pascal’s 3D memory will offer "many times" the bandwidth, two and a half times the capacity, and four times the energy efficiency. The Pascal chip itself will not participate in the 3D stacking, but it will have DRAM stacks situated around it on the same package. Those DRAM stacks will be of the HBM type being developed at Hynix. You can see the DRAM stacks cuddled up next to the GPU in the picture of the Pascal test module below.

The other item of note in Pascal’s feature set is a new, proprietary chip-to-chip interconnect known as NVLink. This interconnect is a higher-bandwidth alternative to PCI Express 3.0 that Nvidia claims will be substantially more power-efficient. In many ways, NVLink looks very similar to PCI Express. It uses differential signaling with an embedded clock, and it will support the PCI Express programming model, including "DMA+", so driver support should be straightforward. Nvidia expects NVLink to act as a GPU-to-GPU connection and, in some cases, as a GPU-to-CPU link. To that end, the second generation of NVLink will be capable of maintaining cache coherency between multiple chips.

NVLink was created chiefly for use in supercomputing clusters and other enterprise-class deployments where many GPUs may be installed into a single server. Interestingly, as part of today’s announcements, IBM revealed that it will incorporate NVLink into future CPUs. We don’t have any details yet about which CPUs or what proportion of the Power CPU lineup will use NVLink, though.

Huang claimed NVLink will offer five to 12 times the bandwidth of PCIe. That may be a bit of CEO math. The first generation of NVLink will feature eight lanes per block or "brick" of connectivity. Each of those lanes will be capable of transporting 20Gbps of data, so the aggregate bandwidth of a brick should be 20GB/s. By contrast, PCIe 3.0 transfers 8Gbps per lane and 8GB/s across eight lanes, and the still-in-the-works PCIe 4.0 standard is targeting double that rate.
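
For anyone checking the math, here is a quick sketch of those figures in Python. The per-lane rates and lane counts come from the paragraph above; the 128b/130b encoding overhead for PCIe 3.0 is a standard detail of that spec, and delivered throughput on any of these links will be lower than the raw numbers.

    # Aggregate one-direction bandwidth for a link, given lane count and per-lane rate.
    def gbytes_per_sec(lanes, gbits_per_lane, encoding_efficiency=1.0):
        return lanes * gbits_per_lane * encoding_efficiency / 8

    nvlink_brick = gbytes_per_sec(8, 20)             # 8 lanes x 20Gbps = 20GB/s
    pcie3_x8     = gbytes_per_sec(8, 8, 128 / 130)   # PCIe 3.0: 8GT/s, 128b/130b coding
    pcie4_x8     = gbytes_per_sec(8, 16, 128 / 130)  # PCIe 4.0 target: double the rate

    print(nvlink_brick, pcie3_x8, pcie4_x8)          # ~20.0, ~7.9, ~15.8 GB/s

On those numbers, a single brick comes out at roughly 2.5 times a PCIe 3.0 x8 link, so the five-to-12X claim presumably assumes several bricks per GPU and a less generous accounting of PCIe's delivered throughput.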

NVLink apparently gets some of its added bandwidth by imposing stricter limits on trace lengths across the motherboard. The company also says it has made a "fundamental breakthrough" in energy efficiency, the result of Nvidia’s own research, that differentiates NVLink from PCIe. NVLink will not be an open standard, though, so we may not see a public airing of the entire spec.

The module pictured above will be the basic building block of many solutions based on the Pascal GPU. Each module has two "bricks" of NVLink connectivity onboard, and the board will connect to the host system via a mezzanine-style NVLink connector. The combination of connector and NVLink protocol should allow for some nice, dense, and high-integrity server systems built around Nvidia GPUs—and it will also ensure that those systems can only play host to Nvidia silicon. This proprietary hook is surely another motivation for the creation of NVLink, at the end of the day.

Huang said he wants the Pascal module to be the future of not just supercomputers but all sorts of visual computing systems, including gaming PCs. Mezzanine-style modules do have size and signal-integrity advantages over traditional expansion cards with edge-based connectors. Another benefit of the module is power delivery without auxiliary power cables. Nvidia’s current Tesla GPUs draw between 225 and 300W, and the firm apparently expects to supply that power solely through the module’s mezzanine connection. We’ll have to work to tease out exactly what Huang’s statement means for future consumer PCs, but Nvidia admits it doesn’t expect PCIe cards to go away any time soon.

Comments closed
    • hoboGeek
    • 6 years ago

    In that third picture, that pen looks like 20 feet long.
    Who is going to use that huge size pen?
    Whales? Elephants? They can’t even write! Why?
    Duh!! Because they don’t have fingers.

    EDIT:
    Don’t even get me started on that huge GPU chipset

      • yogibbear
      • 6 years ago

      Go back to your school for ants.

      • BIF
      • 6 years ago

      Well, what about that huge zipline keychain the janitor has clipped to his belt? Oh, I’ll bet that’s where they keep that big “Key to the City”.

    • kamikaziechameleon
    • 6 years ago

    Can they fix the SLI issues???

    • USAFTW
    • 6 years ago

    Why is AMD being so secret about their next gen GPUs? Maybe there isn’t one?

      • ronch
      • 6 years ago

      I’ve been thinking about this for a while. GCN has been around for two years, and when they said GCN’s gonna be sticking around for a while longer back in 2013 (when they said they were skipping a refresh or something like that), we were expecting something better than Hawaii. Hawaii’s good, but it’s really not much more than a slight tweak of the existing fundamental GCN architecture. If they had something up their sleeves you can be sure they would’ve announced it by now to steal Maxwell’s thunder. But no.

        • l33t-g4m3r
        • 6 years ago

        From what I’ve heard, AMD will be making improvements to GCN. That said, GCN is a powerful architecture, and its problems aren’t hardware related, but driver related. Case in point: Mantle. If AMD can get their dx12 performance up to Mantle levels, they might have a good case against nvidia. If not, I don’t care what new hardware they bring out, their driver inefficiencies will kill the performance and nvidia will win.

          • USAFTW
          • 6 years ago

          Case in point, historically AMD GPUs weren’t competitive with nVidia’s at lower res, but they would reach and often exceed nVidia performance levels at higher res. Now with techreport’s recent Mantle article and seeing how nVidia runs away from AMD cards at lower detail levels and res, I’m thinking it’s a driver overhead problem. But I’ve also read the argument about architecture differences and how AMD balances driver performance for higher res.

      • chuckula
      • 6 years ago

      There is one, Pirate Islands, which is very appropriate given the Arrrrrrr… uh I mean “R” prefix on their cards.

      However, it’s not anywhere near being ready to launch and it’s unclear how many changes AMD has made to the GCN architecture. Big-Maxwell also isn’t that close to launching although hopefully we’ll see it out this year.

        • ronch
        • 6 years ago

        So is Pirate Islands appropriate for playing… Pirated games? 😀

          • Ninjitsu
          • 6 years ago

          Should sell loads in India and China.

    • jjj
    • 6 years ago

    Volta delayed by what looks like 2 years is a bit alarming. Ofc the extra BW that comes with Pascal should be nice.
    Do wonder if they will use NVLink in their SoCs, not just discrete GPUs, and if that might happen before 2016.
    Too bad we got no news on Denver and Maxwell, but I do hope you guys are getting a K1 dev board for some early K1 numbers.

    • Unknown-Error
    • 6 years ago

    Unlike AMD and its boring predator CEO, this annoying, egoistic yet brilliant, dynamic, charismatic CEO pushes nVIDIA to innovate.

      • NeelyCam
      • 6 years ago

      Predator?

        • chuckula
        • 6 years ago

        If Nvidia’s CEO is a Predator, then it looks like AMD needs a Xenomorph to burst out of Rory’s chest one of these days.

          • NeelyCam
          • 6 years ago

          Funny, but a bit of a reading comp fail?

            • chuckula
            • 6 years ago

            True, but I still want to see Xenomorphs.

            • NeelyCam
            • 6 years ago

            We all do

    • Captain Ned
    • 6 years ago

    Not so sure that naming a card after a dead student programming language (yes, I’m old) was such a good idea.

      • Ninjitsu
      • 6 years ago

      I’m pretty sure they named it after Blaise Pascal…

        • MadManOriginal
        • 6 years ago

        Yes, the GPUs are all named after scientists.

        • hoboGeek
        • 6 years ago

        So was the programming language.
        [url<]http://en.wikipedia.org/wiki/Pascal_%28programming_language%29[/url<]

      • Meadows
      • 6 years ago

      What Ninjitsu said. Ned, you’re not old enough.

      • Wirko
      • 6 years ago

      If Pascal hadn’t died, it wouldn’t have reincarnated as Delphi.

      Granted, it probably met some dead students during the time it spent in its afterlife.

      • ronch
      • 6 years ago

      If (Ned = right) then (upthumb = upthumb +1) else (downthumb = downthumnb + 1);

      Sorry, I’m not even sure my syntax is correct. Haven’t been programming for an age and a half!

        • Ninjitsu
        • 6 years ago

        Correction:
        [code<] if (Ned==right); [/code<] and you should add [code<] thumb_total= upthumb-downthumb; [/code<] at the end. 😛 EDIT: Fixed syntax error at former lines 2 and 4. 😉

          • way2strong
          • 6 years ago

          [code<]if (Ned = right) then upthumb := upthumb + 1 else downthumb := downthumb - 1; thumb_total := upthumb - downthumb;[/code<] Pascal syntax required in pascal thread.

            • Scrotos
            • 6 years ago

            Seeing PASCAL again makes me want to go back to programming.

            Seeing that c snippet makes me want to punch myself in the nuts.

            I always did like PASCAL’s syntax more than the pseudo line noise garbage of c.

            • Wirko
            • 6 years ago

            Me too. Mr. Niklaus Wirth seems to have a rare sense of aesthetics – curly braces are absent, source files end on “end.”. His compiler would punch you in the nuts three times if you put a colon and a P at the end of file.

            • 223 Fan
            • 6 years ago

            And := of Pascal / Modula2 / Ada has always annoyed me. Plus BEGIN / END and other assorted verbosity. Probably because I learned FORTRAN and COBOL on punched cards first and by comparison the sparse syntax of C was a revelation and breath of fresh air. Elegance is a relative term.

            • Ninjitsu
            • 6 years ago

            Ah. I don’t know Pascal. But if you say

            [code<] downthumb := downthumb - 1; [/code<] and then [code<] thumb_total := upthumb - downthumb; [/code<] Then you'll have a logical problem, because if upthumb=4 and downthumb=-4, thumb_total=8 instead of 0. 🙂

            • way2strong
            • 6 years ago

            You make a persuasive argument.

          • Wirko
          • 6 years ago

          Fatal: Syntax error in line 4: 😛

            • Ninjitsu
            • 6 years ago

            Hahahaha!

            But then, there’s an “and” in line 2 as well!

            EDIT: Fixed. 😀

      • smilingcrow
      • 6 years ago

      Pascal is alive but on a lot of meds.
      Delphi was a great home for Pascal but it seems to be in terminal decline.

    • ronch
    • 6 years ago

    NVLink sounds like it’s something Nvidia could use in the future should they decide to enter the ARM server market and perhaps augment their ARM cores with GPU compute cores. They already have proven that they can design a 64-bit ARM core and they have chipset and memory controller experience. Now here’s NVLink. Perhaps this is something akin to Freedom Fabric™ or something they can later evolve into something similar to it? They have all the necessary IP blocks and I can’t imagine that it would be hard for them to bring themselves up to speed on this. Their engineering prowess is one of the most formidable in the world.

    And this is coming from an AMD fan who’s kept away from Nvidia GPUs for ten years.

      • BlondIndian
      • 6 years ago

      Calling yourself an AMD fan to hide your Nvidia fanboyism is a pretty neat trick ! Keep it up.

        • ronch
        • 6 years ago

        Oh how quick and careless of you to make such a statement. I like AMD and use mostly AMD but that doesn’t mean I won’t applaud other companies that deserve it or throw a bomb at AMD if they need to wake up, right?

    • ronch
    • 6 years ago

    When a company makes innovations that aren’t merely ‘going for the low-hanging fruit’, it gets more respect. Case in point, Intel and Nvidia. These two push the boundaries of our imagination, not merely go out and cram more cache or cores on a piece of silicon and push TDPs higher to force higher performance. In this game, finesse counts for a lot, and coming up with smarter, more innovative solutions to solve engineering problems is something not all companies can pull off. Sometimes, you have to break the rules.

      • xeridea
      • 6 years ago

      How about AMD with HUMA, APUs in general, Mantle, tessellation before it was cool, display sync type thing that doesn’t require a $250 chip, and heavily supporting open standards such as OpenCL and DisplayPort?

      This is just a proprietary connector to lock companies in, and stacked memory, which has been an idea floating around the interwebs for years.

      What has Intel done? “3D” transistors? 50% smaller transistor size and 19% less power usage; I don’t think the technology helped much. Others have plans for similar tech, but agree that it makes more sense at smaller geometries.

        • Airmantharp
        • 6 years ago

        If you’re an AMD fan, that’s cool- but note that none of those technologies are really ‘innovative’, and can mostly be summed up as ‘here’s some marketing-speak for stuff that’s the same as what everyone else is also doing’. But specifically:

        -HUMA/APUs aren’t that big of a deal, really. It’s not like Intel/Nvidia/ARM/et al aren’t working on the same thing.

        -Mantle, as DirectX 12 has shown, is mostly stillborn, and gasp! It locks companies in.

        -G-Sync cannot be replicated without a hardware change, and no, AMD’s solution isn’t the same thing. It’s half-assed at best.

        -‘Open standards’? How about OpenGL? Linux? CUDA is open too, unlike Mantle…

        -DisplayPort: you’re reaching.

        Intel ships the most advanced CPUs available to consumers and businesses- and does it regularly. If you’re going to complain about them at all, you have to mention the market as well- Intel is only competing with themselves!

        And if you want to complain about Nvidia, don’t forget to mention stuff like having smooth multi-GPU operation figured out years before AMD was willing to admit that their products had problems, and only after they’d been caught- and then let’s not forget to mention that very few use them for compute cards outside of the cryptocurrency mining scene.

        Is your picture still overly rosy? How’s that AMD stock doing?

          • kukreknecmi
          • 6 years ago

          If you call Cuda open then you’re way out of your league. It’s all proprietary; nvidia may just allow you to backend it, that’s all.

        • ronch
        • 6 years ago

        Whoa dude, don’t be so defensive. I love AMD too. It’s just that I admit many of their technologies aren’t THAT ‘GASP!’ or are cool on paper but aren’t/weren’t implemented as well as they could’ve been, or see the light of day half-baked. I’m just being honest here. I gasp when there’s something to gasp for and meh when I’m being Krogothed.

        APUs – They’ve been harping about how this new paradigm will shift the entire computing landscape for years, man, and yeah, it does look promising, but after so long, there are so few apps that actually use GPGPU compute. Programmers need to specifically code for GPGPU computing, which is a given, and which is partly why adoption is slow, but forcing devs to use a thousand GPU cores isn’t a walk in the park and they’d much rather have a chip that’s fast no matter what. We’re getting there, I guess, but come on, even AMD’s software stack isn’t ready yet. These paradigm shifts take years to be embraced by developers and it doesn’t help if you’re selling APUs but your software infrastructure is incomplete. The fact that AMD is a small company doesn’t help either.

        Bulldozer – The idea of shared resources is actually one of the most interesting concepts in CPU design I’ve EVER seen, but it was just sabotaged by the narrow width of the two discrete integer clusters (or cores, if you will). Also, while the Bulldozer concept is very interesting, it’s not exactly something that I’d say is so revolutionary and so difficult to pull off, certainly not for the likes of Intel or AMD that have extensive CPU design experience (ok, maybe not AMD). A front end feeds two separate integer clusters… so what? You know what’s amazing? Designing a scheduler that can extract so much parallelism from code and feed so many execution units/ports. Now that’s something that’s really hard. Do you know someone who makes those? Going back, you know why those integer clusters are so narrow? They COULD’VE been much wider if AMD took the time and EFFORT to design a better, smarter scheduler. And by the way, throwing more cores onto a die and putting in a ton of (slow) cache instead of developing faster cores is just plain lazy, don’t you think?

        Kaveri – Killed by 28SHP. Price isn’t compelling either given our other options.

        Integrated memory controller – Although it allowed AMD to kick Intel’s crotch back in 2003, in the words of Pat Gelsinger, “You can only integrate the memory controller once. After that, what do you do?” It’s not exactly hard to imagine that the IMC will need to get inside the CPU package to unlock more performance. It’s not amazing, it’s just common sense. It’s the low-hanging fruit. But even so, when Intel did IMC it really kicked AMD’s butt. AMD had all the time in the world (except perhaps money) to perfect their memory controllers, but…..ah… so much for Randy Allen’s bragging about their Direct Connect Architecture. He would’ve been better off being a sleazeball marketing guy.

        Mantle – It’s no secret that AMD’s DX drivers leave something to be desired, so when Nvidia’s DX driver gives a good showing against Mantle, it just doesn’t gain AMD much respect. Besides, a new API for AMD’s GPUs isn’t exactly mind-blowing. If Nvidia made its own API I wouldn’t clap my hands either.

        There are many other things that AMD is doing and which doesn’t bag them a lot of respect, but you do get the idea with these examples. Don’t get me wrong: AMD is a formidable engineering firm, but I think they need to stop being a Jack of all trades and instead focus on what they REALLY have to do to gain respect (and dollars), and do it really, really well.

          • xeridea
          • 6 years ago

          GPGPU being slow to catch on for accelerating programs is to be expected. A similar thing happened when multi-core CPUs came to be. It is becoming more common now, though, in many apps. The big thing will be being able to use HUMA to significantly reduce programming complexity and increase the speed at which GPU-accelerated apps run. It would have been nicer if HUMA came sooner, but these things take time.

          Mantle is a good way of voicing the fact that game performance shouldn’t be left up so much to specific per-game optimizations, the API being so unpredictable and inconsistent because it is too far abstracted. From what I have read, programmers love it because it is far easier to optimize effectively, especially for multi-core rendering.

          Bulldozer was a bit of a letdown, I think it ultimately came down to limited resources to optimize it before release, and various unexpected delays. If it was at about Piledriver status on release, it would have been a lot better reception.

            • smilingcrow
            • 6 years ago

            “Bulldozer was a bit of a letdown”

            You are a master of understatement.

        • Klimax
        • 6 years ago

        Just a note: Tessellation was done in a short timeframe by both companies. N/RT Patches. (Died later.)
        Why are you listing DisplayPort?

        As for Intel, I suggest looking at the architecture of Core, including hard stuff like branch prediction. There is a reason why others have a hard time catching up. (Also, it is said that Intel is in many areas quite ahead of others, including academia.)

        NVidia? First GPGPUs. And then, like ATI/AMD, a slew of technologies for games, but they are rarely individually known. NVidia was also first with HW T&L.

        • Ringofett
        • 6 years ago

        [quote<] 50% smaller transistor size and 19% less power usage, I don't think the technology helped much.[/quote<] Excellent point. Must be why AMD dominates the data center. /s Fanboys should, as a 'best practice', read their posts first to make sure they at least aren't too obvious. But you're right, AMD has innovated gloriously into the bargain bin of the PC industry.

        • NeelyCam
        • 6 years ago

        [quote<] This is just a proprietary connector to lock companies in, and stacked memory, which has been an idea floating around the interwebs for years. [/quote<] That's not giving credit where credit is due. AMD has nothing to compete with this. Why? Maybe because they've been cutting R&D left and right to become profitable in the short term..?

      • BlondIndian
      • 6 years ago

      All three companies have innovated. To say none of them do is ignorant. You can’t compare different technological innovations; that would be apples and carrots and gorillas.

      I could start listing some of the innovations by AMD, but I don’t think that’ll be helpful here.

      Intel and NVidia have huge R&D budgets and many fancy new technologies. That the AMDs and ARMs of this world manage to stay relatively competitive with far fewer resources is itself an achievement. When one of them slags off, the bigger guy gets complacent. Intel has been meh at pushing the boundaries of imagination for the last couple of years as AMD CPUs underperformed.

      HSA seems to be more groundbreaking than TSX or more cache (Iris Pro) or more cores (Titan Z). The implementation and software and CPU cores are still works in progress, though.

        • xeridea
        • 6 years ago

        I know that there is a lot of innovation that goes into updating architectures and various aspects that all 3 companies have done. I was just wondering why AMD was not mentioned, when the thinking-outside-the-box kind of innovation has mainly come from AMD recently. Intel hasn’t had that much incentive recently, and Nvidia seems to just come up with proprietary niche features.

          • Airmantharp
          • 6 years ago

          Intel hasn’t had much incentive recently because AMD hasn’t been innovating, and Nvidia’s proprietary niche features are innovation!

          Fun how we can twist things, isn’t it?

            • ronch
            • 6 years ago

            I’m not saying AMD’s not innovating. All I’m saying is that more often than not, I get more impressed by Intel’s and Nvidia’s innovations than AMD. I’m sticking with AMD though. They do have promise. They just need to focus on a few areas and do them really, really well instead of spreading themselves too thin and bothering with trivial, practically not-so-important matters such as TruAudio (yes I’m sure it’s gonna be nice, but will it make them rich?).

            • Airmantharp
            • 6 years ago

            I agree!

            I’m just responding to xeridea’s AMD-clinginess. I sure hope AMD manages to push some real innovation (i.e. PERFORMANCE) into the CPU and GPGPU markets, as their competitors aren’t going to let up :).

            • ronch
            • 6 years ago

            Any chip firm can design a processor. Problem is, in this game you win if you have three bases covered: performance, energy efficiency, and cost. Bag all three and you bag money. But it’s frickin difficult. It takes skill and innovation, and selling your products with the help of incentives such as freebies and such is no way to make a killing.

      • NeelyCam
      • 6 years ago

      When reading that NVLink paper, I realized that I haven’t seen anything like it from AMD for years. NVidia develops new efficient links all the time, as do Intel, Rambus, and IBM. The last time AMD did that was HyperTransport.

      To me it looks like the high-performance computing war will happen between NVidia and Intel, and AMD has dropped out.

    • UnfriendlyFire
    • 6 years ago

    Nvidia’s new connector reminds me of the AGP.

    There was no AGP 2.0.

      • Pzenarch
      • 6 years ago

      Wasn’t there? I could have sworn they released with AGP 1x, then a while later released 2x at double the bandwidth, following with 4x and 8x continuing the trend. Okay, so the naming wasn’t the same, but it went through seemingly similar iterations.

      A quick Google turns up this seemingly aged and not especially senior resource, but it rings bells.
      [url<]http://www.playtool.com/pages/agpcompat/agp.html[/url<]

        • BlondIndian
        • 6 years ago

        AGP had 1x, 4x, and 8x IIRC

          • Scrotos
          • 6 years ago

          And 2x as the link you replied to shows.

          Good old AGP Pro. Now that wasn’t very widespread.

          Oh, and the link shows the spec went up to 3.0 to address the original poster.

    • NeelyCam
    • 6 years ago

    Volta is delayed because TSMC’s 20nm is delayed.

      • BlondIndian
      • 6 years ago

      So Maxwell is 28nm ?

      Or was Volta renamed to Pascal ?

      (the latter makes more sense to me – Volta was supposed to have stacked DRAM , right ?)

        • Airmantharp
        • 6 years ago

        Maxwell was supposed to be first gen 20nm, and then Volta was second gen, but Maxwell has been moved up to 28nm because TSMC.

      • ronch
      • 6 years ago

      Dunno why you were downthumbed. If anything, it’s TSMC that deserves a downthumb. We should be at 10nm already but no, they keep blowing it. Here Neely, lemme give you a hand (or rather, a thumb).

        • USAFTW
        • 6 years ago

        Maybe because getting to smaller transistor geometries while keeping heat density and power in check is hard, and each wafer has been getting exponentially more expensive since 55 nm?

          • ronch
          • 6 years ago

          I thought TSMC is being run by EXPERTS?!?!

            • nanoflower
            • 6 years ago

            They are experts at what they do. They just aren’t as “experty” as the Intel experts. That or they just don’t have enough money to keep up with Intel.

      • Ninjitsu
      • 6 years ago

      Not sure how much sense that makes; second-gen Maxwell is still supposed to be 20nm (unless TSMC is really late with 20nm, like mid-2015, which would probably explain Intel taking it easy with 14nm), and Maxwell’s unified virtual memory has been pushed out to Pascal too.

      Since Nvidia tends to switch process with each generation (but TSMC was late so that plan for Maxwell wasn’t possible), the 2016 date further adds to the speculation that Pascal will be implemented with TSMC’s FinFets.

      Volta then gets the next smaller node.

      Of course, if Big Maxwell ends up on 28nm too, then I think TSMC isn’t going to hit the market with 20nm before 2016 (implying they ramp up production in mid-2015), since Pascal would be on that, then.

        • jihadjoe
        • 6 years ago

        I don’t think the boffins at Intel are taking it easy at all, it’s just that the process tech is approaching the physical limits of what is possible with silicon semiconductors and things are getting HARD. Like pushing toward C hard.

        A few years back there were all sorts of stories about how Intel isn’t measuring lithography right, and how TSMC/GloFo at the same process node would be heaps smaller, but I think people severely underestimated the difficulty of scaling things down, and Intel definitely deserves more respect just for having got 22nm, and now 14nm, working in the first place.

      • the
      • 6 years ago

      Maxwell was to be a mix of 28 nm and 20 nm. The low end 28 nm parts have already started to arrive. The 20 nm versions are due later this year (though the big Maxwell might be 2015).

      Volta was always going to be 20 nm and released well after TSMC had 20 nm in production volume.

      From all indications that I’ve seen thus far, Pascal is simply a renaming of Volta.

        • nanoflower
        • 6 years ago

        Scott pointed out above that Volta is still coming. It’s just been pushed back a bit with Pascal taking the spot that used to be held by Volta.

          • Ninjitsu
          • 6 years ago

          Not sure that doesn’t count as Volta being renamed to Pascal… I mean, if Nvidia wanted to sell a rock and call it Volta, it pretty well could; it’s just a code name for something (that probably doesn’t exist yet).

    • Bensam123
    • 6 years ago

    Hmmm… wonder what their ‘energy breakthrough sauce’ is… PCIE 4.0 will be catching up to this and it’s adding other features as well. If it’s going to be proprietary, you’d think they’d add more to it. Maybe the energy efficiency will add up to more for giant clusters, but for one computer this isn’t even close to a big deal.

      • NeelyCam
      • 6 years ago

      Might be related to NVidia’s interconnect papers from this year’s and last year’s ISSCC. I’ll check them out later and summarize

        • NeelyCam
        • 6 years ago

        OK, I checked out the NVidia paper from 2014 ISSCC (paper 26.1). Some of the key points:

        – 20Gb/s differential link (6.5pJ/b), channel loss target <20dB at Nyquist (10GHz). 28nm TSMC.
        – The slides imply that this could be used for CPU/GPU communication or GPU/HMC communication
        – The slides also mention a “Brick”, so sounds like this paper indeed discusses the NVLink
        – The physical layer is relatively conventional, but well implemented for low power.
        – Differential voltage swings of individual transmitters can be adjusted based on the loss of the trace (for saving power)
        – Trace length limitations reduce channel loss, enabling simpler and more power efficient equalization techniques

        Overall, it’s a well-designed link. I’m not sure if it qualifies as a “fundamental breakthrough”, though… I didn’t see any huge new innovations there (unlike in the NVidia’s 2013 ISSCC I/O paper, which was some 5x more efficient than this one). Curiously, the next paper (26.2) was from Intel, showing a link that was more power efficient, and operating at 32Gb/s. I wonder if that’s coming to the market in some form.
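
        For scale, that 6.5pJ/b figure converts to link power pretty directly. Here is a rough sketch that assumes the article's eight lanes per brick and ignores everything outside the transceivers themselves:

        # Rough PHY power estimate from the energy-per-bit figure above.
        energy_per_bit = 6.5e-12   # joules per bit (6.5pJ/b)
        lane_rate      = 20e9      # bits per second per lane (20Gb/s)
        lanes          = 8         # lanes per "brick," per the article

        watts_per_lane  = energy_per_bit * lane_rate    # ~0.13W
        watts_per_brick = watts_per_lane * lanes         # ~1.04W
        print(watts_per_lane, watts_per_brick)

        So a fully busy brick would burn on the order of a watt in the link itself, which is small next to a 225-300W GPU but adds up across the many links in a large cluster.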

      • BlondIndian
      • 6 years ago

      Since they’ll be Nvidia specific, they can optimize the link to better suit GPUs. Like Scott mentions, tighter specs, such as short link traces on the motherboard and server-only use, will enable more optimizations.

      PCIe is a jack of all trades and so will not be as good as a targeted proprietary link.

      Nvidia will add some low-latency dynamic down-clocking (since it’s all Nvidia products) to cut down power. Makes sense in servers and HPC.

      No use for the PC space though. PCIe 4.0 will probably have an NVLink 2.0 counterpart…

    • MadManOriginal
    • 6 years ago

    Can you please explain to me what a ‘mezzanine connection’ is?

      • fredsnotdead
      • 6 years ago

      I’d like to know as well. Perhaps the card stacks horizontally on the motherboard, kind of like a daughter board with connectors across the whole bottom of the card?

      • BlondIndian
      • 6 years ago

      Exactly what I was thinking after reading the article.

      • Convert
      • 6 years ago

      [url<]http://www.google.com/search?q=PCI+Mezzanine+Card&num=100&safe=off&nord=1&site=webhp&source=lnms&tbm=isch&sa=X&ei=SGwyU8T1N9beoASIv4KgAg&ved=0CAkQ_AUoAg&biw=1920&bih=985&dpr=1[/url<]

      • Scrotos
      • 6 years ago

      It’s like when you added the Rainbow Runner upgrade to the Matrox Millennium back in the day.

    • sschaem
    • 6 years ago

    I so wish Intel would do this for some version of skylake. (on package memory)

      • brucethemoose
      • 6 years ago

      Some Haswell CPUs already have a bit of on-package memory, don’t they?

        • UnfriendlyFire
        • 6 years ago

        It’s not stacked though. And you can’t boot a computer that doesn’t have RAM plugged in.

      • BlondIndian
      • 6 years ago

      Intel is not a DRAM manufacturer – they haven’t focused on any of the new memory techniques (at least in public).

      I think they’ll wait to see who wins the next-gen memory war and then jump in. Right now no one has production parts of note.
      Nvidia is taking a risk – staying on the cutting edge comes with its share of risks.

        • Ninjitsu
        • 6 years ago

        Random fact: Intel started as a DRAM manufacturer, and made processors to sell more DRAM.

    • tipoo
    • 6 years ago

    Interesting. Working with GPGPU, it’s more about latency now than bandwidth. You lose a lot of the performance gains of GPGPU work by sending things back and forth from one memory pool to the other through PCI-E. If this could remove that bottleneck, GPGPU would be much more viable. This is why AMD is pushing forward with unified APUs and Intel is doing the same; Nvidia does not make x86 CPUs, so they are doing this instead.

      • BlondIndian
      • 6 years ago

      Yes, good point. Although clumping GPGPU together as a single kind of workload is an oversimplification.

      Lower latencies will enable new workloads to be GPU accelerated. Higher bandwidth will enable others. The combination will speed up almost all workloads.
      The idea of putting Denver on the GPU was also along similar lines.

    • chuckula
    • 6 years ago

    Don’t get too hot & bothered about Nvlink: It’s exclusively for use in supercomputing environments and isn’t designed as some sort of replacement for normal PCIe. If it’s done properly it probably makes relatively small transfers of data between nodes more efficient with lower latency since that sort of thing can actually be important in some supercomputing workloads.

    Additionally, as we see time & time again, PCIe in its current form is usually overkill in the bandwidth department for playing video games where most of the communication is bulk transfers of textures/geometry/etc. up to the GPU’s memory.

      • Parallax
      • 6 years ago

      It’s a good thing that PCIE is not currently a bottleneck like other interfaces, and let’s hope it stays that way.

        • Waco
        • 6 years ago

        PCIe is a huge bottleneck in HPC workloads…

          • Parallax
          • 6 years ago

          Is there anything better for HPC? Compared to SATA, DisplayPort, USB, etc… which often limit the speeds of devices they are connected to, I thought PCIE was doing pretty well at least WRT video cards.

            • Deanjo
            • 6 years ago

            [quote<]Is there anything better for HPC?[/quote<] Hypertransport offers greater bandwidth.

        • HisDivineOrder
        • 6 years ago

        Seems like they think the increase in memory bandwidth and lower level access promised by DirectX 12 and/or OpenGL might lead to PCIe becoming a bottleneck.

        I’m not so sure.

    • Deanjo
    • 6 years ago

    Ironically, I’m unaware of any Cuda bindings for Pascal.

      • chuckula
      • 6 years ago

      DELPHI! FTW!!
