Nvidia Tesla V100 throws 21 billion transistors at GPU computing

Machine learning is one of the most demanding applications for GPUs today, and Nvidia has been riding that wave with huge graphics chips dedicated to compute tasks. The Tesla P100 was the crown jewel of the Pascal architecture for general-purpose GPU computing, and today, Nvidia took the wraps off its first Volta GPU to continue that mission. Say hello to the GV100 GPU aboard the Tesla V100 accelerator.

Every spec of the V100 is eye-popping. Nvidia CEO Jensen Huang says the 815mm² chip is made at the reticle limit of TSMC's 12-nm FFN process. Its 21 billion transistors make up 5120 stream processors running at a boost clock of 1455 MHz, good for 7.5 TFLOPS of FP64 math or 15 TFLOPS of FP32. Nvidia also provisioned this chip with 20MB of register files for its streaming multiprocessor (SM) units, 16MB of cache, and 16GB of HBM2 memory that delivers 900 GB/s of theoretical bandwidth. The chip talks to other components in a system over a 300 GB/s second-generation NVLink interconnect.
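
Those throughput figures follow directly from the unit counts and clocks. A quick back-of-the-envelope check (unit counts and clocks from Nvidia's announcement; counting one fused multiply-add as two operations, per the usual convention):

```python
# Peak-throughput sanity check for the Tesla V100.
# Each stream processor retires one fused multiply-add (2 FLOPs) per clock.
fp32_sps = 5120
fp64_sps = fp32_sps // 2            # FP64 units are provisioned at half the FP32 count
boost_clock_hz = 1455e6

fp32_tflops = fp32_sps * 2 * boost_clock_hz / 1e12
fp64_tflops = fp64_sps * 2 * boost_clock_hz / 1e12
print(f"FP32: {fp32_tflops:.1f} TFLOPS, FP64: {fp64_tflops:.1f} TFLOPS")
# FP32: 14.9, FP64: 7.4 -- Nvidia rounds these to 15 and 7.5
```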

The Tesla V100 also includes dedicated hardware for the tensor operations critical to deep-learning tasks. Nvidia claims the chip can process tensor workloads at 120 TFLOPS, and that figure is genuinely floating-point: each tensor core executes a 4×4×4 matrix fused multiply-add on FP16 inputs with FP32 accumulation every clock. For comparison, Google's TPU ASIC can deliver a claimed 92 TOPS on 8-bit integer math.
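
The same arithmetic explains the tensor figure, as the sketch below shows (the per-SM tensor-core count comes from Nvidia's Volta materials):

```python
# Tensor-core throughput: each core does one 4x4x4 matrix fused multiply-add
# per clock, i.e. 64 multiply-adds = 128 floating-point operations.
tensor_cores = 640                      # 8 per SM across the V100's 80 enabled SMs
ops_per_core_per_clock = 4 * 4 * 4 * 2
boost_clock_hz = 1455e6

tensor_tflops = tensor_cores * ops_per_core_per_clock * boost_clock_hz / 1e12
print(f"Tensor: {tensor_tflops:.0f} TFLOPS")   # ~119, which Nvidia rounds to 120
```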

The full Volta GPU comprises 84 SMs, 5376 FP32 SPs, 5376 INT32 SPs, 2688 FP64 SPs, 672 "tensor cores," and 336 texture units. Almost certainly because of the chip's enormous size and the associated yield challenges, the Tesla V100 doesn't have all of these resources enabled.
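
Dividing those full-chip counts by 84 SMs gives the per-SM provisioning, and shows how many SMs the Tesla V100 actually ships with:

```python
full_sms = 84
fp32_per_sm = 5376 // full_sms        # 64 FP32 SPs per SM
tensor_per_sm = 672 // full_sms       # 8 tensor cores per SM
enabled_sms = 5120 // fp32_per_sm     # the V100's 5120 SPs imply 80 enabled SMs
print(fp32_per_sm, tensor_per_sm, enabled_sms)   # 64 8 80 -- four SMs fused off for yield
```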


Nvidia will begin offering Tesla V100 accelerators as part of its DGX-1 compute nodes in the third quarter of this year, and it says it'll begin making cards available to its partners in the fourth quarter.

Comments closed
    • derFunkenstein
    • 3 years ago

    Look at the size of that…

    • AnotherReader
    • 3 years ago

    Very impressive! Is this the largest single-die microprocessor in history? Nvidia’s HPC and AI lead has increased. AMD can breathe a sigh of relief that this is too large to become an 1180 Ti.

    • DeadOfKnight
    • 3 years ago

    If they made an 815mm² fully-enabled ~375W Titan X(v?), rich folks might be willing to fork over $1,500 for one. It would basically be equivalent to two GTX 1080s without the downsides of SLI. Not suggesting it would ever happen, since it’s unlikely that it would be profitable for them to do so. However, it would be the first time the price of a Titan GPU was right in terms of performance per dollar.

    • Kougar
    • 3 years ago

    Interesting… Ryan Smith claims Volta is the biggest change to the underlying architecture since Fermi. I look forward to learning the details. 🙂

    • tipoo
    • 3 years ago

    So this 12nm is the same density as 16nm, right? Which itself wasn’t much different from 20nm, but with FinFETs? Intel is right, we should switch to transistors/mm²; this is a mess.

      • Airmantharp
      • 3 years ago

      It’s probably more dense for Nvidia’s products, given that this process was tuned specifically to their standards, though it’s clear we can’t compare the linear measurement standard between companies without qualification.

      • Rza79
      • 3 years ago

      CLN12FFC
      Power: – 25% or Performance: +10%
      Area Reduction: 20%

        • tipoo
        • 3 years ago

        Hm, interesting. Was just going off the density relative to the Titan X.

    • psuedonymous
    • 3 years ago

    Hang on, who the hell is fabbing the interposer for that thing?! Not only is the die already reticle-limited, the GV100 die alone at 815mm² is only ~200mm² smaller than the entire Fury X assembly (1011mm²). From Techreport’s review (https://techreport.com/review/28499/amd-radeon-fury-x-architecture-revealed/2):

    “The reason why Fiji isn't any larger, he said, is that AMD was up against a size limitation: the interposer that sits beneath the GPU and the DRAM stacks is fabricated just like a chip, and as a result, the interposer can only be as large as the reticle used in the photolithography process. (Larger interposers might be possible with multiple exposures, but they'd likely not be cost-effective.) In an HBM solution, the GPU has to be small enough to allow space on the interposer for the HBM stacks. Koduri explained that Fiji is very close to its maximum possible size, within something like four square millimeters.”

    Even if the rendered die-assembly shot is completely wrong and the HBM modules are not two per side but all four aligned along a single side, that’s not enough room for everything to fit onto the interposer. Either Nvidia have found a fab house with a jumbo reticle, or the GV100 sits on multiple interposers.
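
    A rough area budget backs up the concern. A quick sketch (the HBM2 footprint is an assumption based on published SK Hynix package figures, and the 26 × 33 mm field is just a typical single-exposure maximum):

```python
# Does GV100 plus four HBM2 stacks fit within a single-exposure interposer?
gv100_mm2 = 815
hbm2_stack_mm2 = 7.75 * 11.87      # ~92 mm^2 per stack (assumed package size)
reticle_mm2 = 26 * 33              # ~858 mm^2, a typical maximum lithography field

needed_mm2 = gv100_mm2 + 4 * hbm2_stack_mm2
print(f"{needed_mm2:.0f} mm^2 needed vs. {reticle_mm2} mm^2 per exposure")
# ~1183 mm^2 needed -- no single exposure covers it, hence the answers below
```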

      • K-L-Waster
      • 3 years ago

      My guess is multiple interposers. That would increase costs prohibitively for a gaming card, but given the prices NVidia will be charging for these, it probably won’t be a problem.

      • chuckula
      • 3 years ago

      [Edit: Benetanegia appears to have it. Full-size interposer made with double patterning]

      Actually, you may have answered your own question:

      “Even if the rendered die-assembly shot is completely wrong and the HBM modules are not two per side but all four aligned along a single side, that’s not enough room for everything to fit onto the interposer.”

      The interposer might not be filling all the space underneath the GV100 die. Instead, it just occupies enough space to handle the HBM2 stacks and the portion of the GV100 die that connects to them. Having all of them arranged on one side of the package makes that easier to do.

      • the
      • 3 years ago

      They could be using an organic interposer, which doesn’t have the same area limitations as a silicon-based one.

        • psuedonymous
        • 3 years ago

        HBM relies on a silicon interposer to operate. Using a normal PCB as a connector board requires HMC instead.

          • Benetanegia
          • 3 years ago

          An organic interposer is not a PCB. It is exactly the same as a silicon interposer except the material is organic. That being said, current HBM tech does require a silicon interposer.

      • willmore
      • 3 years ago

      Given the low volume of the parts, have you considered that they might not be producing the interposers with optical lithography?

      • Benetanegia
      • 3 years ago

      They used double exposure on the interposer, according to Ryan Smith from AT.

        • UberGerbil
        • 3 years ago

        Just as the original TR article quoted above said:

        “(Larger interposers might be possible with multiple exposures, but they'd likely not be cost-effective.)”

        “Cost-effective” depends on the price you can demand, and in this case that price clearly is very, very high.

          • Benetanegia
          • 3 years ago

          Clearly. What is it, $17k each or something like that? They could even get one die per wafer on both the chip and the interposer and still remain profitable. But of course yields are going to be substantially better than that, even if they are still atrocious by general standards.

          Breaking even on a $3 billion investment isn’t that easy though, so the high price is somewhat justifiable. High risk, high reward.

    • Tristan
    • 3 years ago

    Why spend $100,000 on the two billion transistors used for these tensor cores? A custom AI chip can do the same and will cost only $100 or less.

      • Airmantharp
      • 3 years ago

      If your focus is AI, you may not, but Tesla GPUs have uses for workloads other than AI.

      • K-L-Waster
      • 3 years ago

      Please show us where you can get this kind of compute power for $100.

        • chuckula
        • 3 years ago

        That’s easy. The back of the truck that’s shipping these cards!

    • kuttan
    • 3 years ago

    With an 815mm² die, one can use the V100 as a mirror as well 😛

      • tipoo
      • 3 years ago

      A $150,000 mirror to show the world… something. I’m not sure what.

    • ptsant
    • 3 years ago

    Volta seems like a great product. Unfortunately, it comes with a bunch of proprietary standards (CUDA must die) that stifle competition. The price is also probably obscene, but that doesn’t matter to Amazon, Google, and the like, who need deep-learning horsepower for their voice-driven AI “assistants” that spy on us 24/7.

    Meanwhile, the incompetents at Radeon Technology Group can’t even produce a competitor to the 1070 that actually brings money (gasp!) to the company. Fire Raja yesterday. Ideally, have him escorted away covered with tar and feathers.

    And I still think the RX 480 is probably the best card to buy on price/perf.

      • Krogoth
      • 3 years ago

      The high-end gaming GPU market doesn’t provide the big cash anymore. Even Nvidia knows this.

      That’s why the big versions of their next-generation architectures have been focused on general compute first and graphics performance second ever since Fermi. The bulk of discrete gaming-GPU revenue actually comes from the $149-$299 market. The high-end gaming market is just halo effect and prestige at this point.

      The gaming versions of Volta are going to be “crippled,” cut-down versions of this behemoth.

      I’d hardly call AMD’s efforts incompetent, either. They lit a nice fire under Nvidia’s backside with the RX 480/RX 470 launch, and Vega’s impending launch is making Nvidia somewhat anxious.

        • Airmantharp
        • 3 years ago

        Why do you believe that there will be no lower-precision GV102 for the next Titan and top-end -Ti parts?

          • Krogoth
          • 3 years ago

          Titans and “Ti” cards are just binned general-compute parts (ones that consume too much power for the Tesla/Quadro brands and/or have defects).

          It’s akin to the X-series Intel CPUs, which are really just “rejected” Xeons.

            • Airmantharp
            • 3 years ago

            You don’t seem to be making a distinction between GV100 and GV102…

            • Krogoth
            • 3 years ago

            GP102 axes some FP64 parts, scraps NVLink, and opts for more affordable GDDR5X. A “GV102” may or may not follow the same path, assuming it’s in the works.

            • psuedonymous
            • 3 years ago

            It’s a physically different die. GP102 is not just a GP100 die with bits fused off or left dark; those units are not actually present in the first place.

            With GV100 being such a LOLHUEG die (with commensurately ultra-low yields), the chances of a “GV102” being a binned-down GV100 rather than a wholly separate die are basically nil.

            • Airmantharp
            • 3 years ago

            +3 for spelling it out; I got tired of the back-and-forth and decided to quit before the real sarcasm came out!

            And that’s basically it. The Gx102 part for Pascal was the same basic size as the Gx100 part, but filled to the brim with only the lower-precision units that are good for gaming, which means it had more gaming-focused compute units than even the Gx100 part.

            So that’s what we expect of a hypothetical GV102: similar die size, but all low-precision compute units, and likely a 384-bit GDDR6 bus instead of pricier HBM, just as the Pascal-generation Titans and the 1080 Ti used GDDR5X.

        • Kretschmer
        • 3 years ago

        Then why does the latest Steam survey show the RX 480 at 1.1% and the 1070 at 3.3%?

          • Krogoth
          • 3 years ago

          The Steam survey has a massive high-end-gamer bias in its sampling, and it mostly covers the North American market. It is no shock that the 970 and 1070 hold comparatively large numbers.

      • Airmantharp
      • 3 years ago

      CUDA tops your list of proprietary standards that must die?

      I won’t claim that CUDA is “great” for the industry as a whole or for competition, but Nvidia does make the most powerful GPUs, for compute and otherwise, and CUDA is the best way to get every ounce of performance out of them.

      Yes, it’s proprietary, but understand that this doesn’t actually matter where these things are going: big iron. If AMD were competitive on the hardware front (and they could be soon), they’d make sure they at least had their software stack figured out, whether that means extensive support for optimizing OpenCL or some other solution.

    • Unknown-Error
    • 3 years ago

    Very impressive indeed. nVidia truly owns the game!

    • beck2448
    • 3 years ago

    Very impressive. Can’t wait to see the whole Volta family.

    • torquer
    • 3 years ago

    But will it run Crysis?

    • Krogoth
    • 3 years ago

    It looks like a GP100 on ’roids, not that that’s a bad thing.

    Just don’t expect massive leaps in GPU performance from the lesser parts this design will eventually distill down into, though.

    The real question is whether Nvidia will stick with GDDR5X or move on to HBM2 for their consumer-tier GPUs.

      • Srsly_Bro
      • 3 years ago

      Are you not impressed with 16 Gbps GDDR6?

      • ImSpartacus
      • 3 years ago

      By GDDR5X, you mean GDDR6, right?

      We already had a memory vendor state that they had a customer with a 384-bit GDDR6 memory subsystem.

      https://techreport.com/news/31790/sk-hynix-fires-up-its-foundries-for-16-gb-s-gddr6

      Who do you know that would do such a thing? AMD? No, it’s GV102.

    • the
    • 3 years ago

    Something seems a bit off in the table. Or at least something has been removed in Volta that was present in Pascal and isn’t documented.

    The number of transistors in the chip increased by roughly 38%, but all the major functional blocks increased in number by 40% to 50%. Granted, not everything on the chip increased (it still only needs a single PCIe controller, the same 4096-bit-wide HBM2 bus, etc.), but things don’t seem to add up right. This also doesn’t account for the extra logic necessary for the tensor units.

    Missing from the chart are FP16 rates, which is something nVidia was promoting for deep-learning work. In Pascal, FP16 workloads were done via their own units, so those could have been removed for Volta. That would require Volta’s FP32 hardware to handle FP16 workloads, though this could keep FP16 rates the same as the FP32 hardware’s. The tensor units accept FP16 inputs, so that could offset the removal of dedicated FP16 units elsewhere for the most common FP16-based applications. (Or the FP16 units in Pascal simply evolved into the tensor unit.)

    I’d say that ROPs could have been removed from the design, but unlike the open question of whether they existed in GP100, we know immediately that they are part of the GV100 design: the DGX Station has cards with DisplayPort out, so there has to be at least some raw pixel-pushing capability.

    Fabrication doesn’t seem to have improved density at all, either. 610 mm² with 15.3 billion transistors is 25.1 million transistors/mm², vs. Volta’s 25.9 million transistors/mm². That’s barely an improvement going from 16-nm FinFET to 12-nm FFN.
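
    The density arithmetic in that last paragraph checks out; a quick verification:

```python
# Transistor density, GP100 vs. GV100.
gp100_density = 15.3e9 / 610 / 1e6   # million transistors per mm^2
gv100_density = 21.1e9 / 815 / 1e6
print(f"GP100: {gp100_density:.1f} Mtr/mm^2, GV100: {gv100_density:.1f} Mtr/mm^2")
# GP100: 25.1, GV100: 25.9 -- roughly a 3% density gain from "12 nm" FFN
```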

      • chuckula
      • 3 years ago

      You are right about the fabrication-density part. The “12nm” node is — to use another marketing technique — effectively a “16nm++” refinement of the 16nm node.

      That’s still pretty much equivalent to the other high-end GPUs we are seeing in the same timeframe, so it’s not a disadvantage, but it’s not a truly new node.

      • tsk
      • 3 years ago

      Wouldn’t surprise me if 12nm FFN isn’t much of a density improvement, it’s TSMC after all.

      • CampinCarl
      • 3 years ago

      One of the key aspects of Volta (as I see it anyway, aside from the 16×16 tensor units) is the increased L1 cache.

      It was completely re-engineered and combined with the shared memory in the SM. So there’s a lot of transistors there, too.

      They made a lot of improvements to the MPS engine as well on the chip, as well as changing the thread schedulers and warp synchronization stuff so they can support what they referred to as ‘starvation-free’ algorithms (basically, stuff that spin-waits in a thread).

      You’ll want to check out the inside-volta link that Jeff posted below, and also go look at the talk on GTC On Demand (when it appears)…I just sat through it, and I came away thinking one thing: The Pascal team must have really been jealous of the Volta team, because Volta strikes me as much more of what Pascal should have been.

      • psuedonymous
      • 3 years ago

      “In Pascal, FP16 workloads were done via their own units”

      In GP100, FP16 was handled by the FP32 units at double rate (i.e., you packed two separate FP16 operations together and executed them in one FP32 operation). This did not filter down to GP102/GP104, though.
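
      The packing in question is literally two 16-bit values sharing one 32-bit register. A minimal host-side illustration with NumPy (this shows only the storage layout, not actual GPU execution):

```python
import numpy as np

# GP100's double-rate FP16 operates on "half2" pairs: two FP16 values
# packed into a single 32-bit word and processed together.
pair = np.array([1.5, -2.25], dtype=np.float16)
packed = pair.view(np.uint32)[0]
print(f"packed half2 word: 0x{packed:08x}")    # 0xc0803e00 on a little-endian host

unpacked = np.array([packed], dtype=np.uint32).view(np.float16)
print(unpacked)                                # [ 1.5  -2.25] -- both values recovered
```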

    • Chrispy_
    • 3 years ago

    I’ve not been paying attention, since I thought Volta was too far away to worry about still.

    What’s the architectural difference between Pascal and Volta?

      • Jeff Kampman
      • 3 years ago

      https://devblogs.nvidia.com/parallelforall/inside-volta/

      I think the big things are independent execution paths for FP32 and INT32 data, extremely fine-grained control of threads in flight, the merging of the shared memory and L1 data cache per SM, and the dedicated tensor hardware.

        • Chrispy_
        • 3 years ago

        Thanks.

        If I’m not mistaken, that looks like relatively minor evolutionary tweaks to realtime graphics processing and a fairly serious upgrade to machine-learning capabilities?

        As nice as it would be to have a 5120-core GPU, that’s not an affordable die size for consumers – I’m guessing these are being sold to Amazon, Microsoft, Tesla, etc.

          • MathMan
          • 3 years ago

          What would it take for something not to be a minor evolutionary tweak?

          I see quite a few significant changes, especially in terms of integer performance and cache/shared-memory handling. Volta may well be the architecture that brings us back to the Fermi days of outstanding compute performance… but without the power consumption.

            • the
            • 3 years ago

            Integer performance appears to be helped by simply spinning off that functionality into its own independent block. Under Pascal, FP32 and INT32 functionality were tied together.

    • USAFTW
    • 3 years ago

    Since GP102 had the exact same FP32 ALU count as GP100 but higher clocks, could we possibly, ever, see a 5376-“core” Titan XV in a year or so?
    Time hasn’t been kind to RTG; if recent leaks of Vega’s performance turn out to be true, they’re in dire trouble in the GPU division.
    All the worse for us: Nvidia left on its own tends to steadily jack up prices.
    It used to be that these two companies would trade positions as the GPU performance crown holder. If my memory serves me right, Nvidia has been top dog ever since Maxwell and GM204.

      • K-L-Waster
      • 3 years ago

      Let’s not get carried away. AMD has been focused more on the affordable mid-range of GPUs since Polaris, which may not generate awe-inspiring headlines but can definitely generate a lot of sales volume if sold correctly.

      Chips like this one and cards like the Titans and the 1080 Ti get a lot of attention for sheer performance, but they don’t sell in super-huge quantities. As long as AMD can release cards that compete with the xx60 and xx70 range from NV, they should be able to turn a profit (which is what they need more than the performance crown).

        • USAFTW
        • 3 years ago

        I agree with you about AMD focusing on high-volume parts, but if Nvidia’s direct competitor (GTX 1060) outsells it 5-to-1, the problem still persists. Beyond that, AMD is effectively locked out of that sweet, sweet, high-margin, still-higher-volume-than-Polaris-10 high-end GPU market.
        Barely turning a profit is not enough these days. Developing efficient GPU architectures and being able to scale them up all the way to the 600mm² range calls for deep pockets, and AMD is at a disadvantage.
        Source (FWIW): http://store.steampowered.com/hwsurvey/videocard/ (GTX 1060 market share: 5.02%; RX 480: 1.1%.)

          • K-L-Waster
          • 3 years ago

          Agreed — they do need to actually sell cards for the business model I described to work.

          • ImSpartacus
          • 3 years ago

          Jeez, that’s worse than I thought.

          I knew AMD had around 20% market share, but that’s more like 16-ish%. And it only gets worse once you include GP104 and GP102 sales.

          AMD still does sorta well on the semi-custom side, but you’re right that it’s not enough.

        • christos_thski
        • 3 years ago

        I agree with your general sentiment. Having said that, AMD has already released an xx60/xx70-competitive card: the R9 Fury. Now they need to get it down to affordability (old R9 Fury stock was actually selling for some great prices recently, but that was probably more of a fluke or stock unloading than a real, profitable price).

    • synthtel2
    • 3 years ago

    What new graphics-oriented tech are we expecting in GV102/4/etc.? I haven’t been paying much attention, but I haven’t seen anything at all.

    The one time in recent memory that Nvidia hasn’t had a good process jump with which to get generational improvements, they added the tiled-rasterization trickery instead (though the better yields from a mature process surely helped too). This half-node jump alone doesn’t seem likely to offer the kind of generational improvements Nvidia usually goes for, and tech improvements like Maxwell’s don’t show up every day. If they can make an 815 mm² chip, yields will probably be great on smaller stuff, so die sizes could move up a notch like with Kepler -> Maxwell, but that still leaves perf/W gains questionable.

    A lot depends on how good 12FFN actually is, and GV104’s launch may be the first time people like us learn about that. Of course I’m hoping for AMD’s sake that Nvidia doesn’t have anything special here and 12FFN is meh, but it would be undeniably cool to see Nvidia pull their typical generational improvement out of thin air.

    • Mr Bill
    • 3 years ago

    Sir, don’t you think maybe we should make a GPU that uses more than one billion transistors? Virtucon alone makes over 9 billion dollars a year (https://www.youtube.com/watch?v=jTmXHvGZiSY).

    • tipoo
    • 3 years ago

    So what’s remotely competitive with this right now, out of curiosity? Going by Nvidia’s website, though it’s likely not completely unbiased, Xeon Phi wasn’t close already, let alone after this release:

    http://images.nvidia.com/content/tesla/justthefacts/fact1-chart.png

    There’s the Google TPU, but that has limited uses.

      • chuckula
      • 3 years ago

      The Xeon Phi is not designed with the same goals as one of these parts. In particular, Xeon Phis are not designed to maximize the peak number crunching power to anywhere near the same degree as a Tesla card, or else Intel would have adopted a completely different design.

        • Beelzebubba9
        • 3 years ago

        What’s the ideal use case for Xeon Phi at this point?

          • chuckula
          • 3 years ago

          Workloads that aren’t just plain parallel number crunching with effectively no data dependencies.
          And they do exist.

          It’s easy for Nvidia marketing to find examples of Tesla being awesome in the right workloads, but Intel’s marketing department can play that game too.

          Oh, and this may shock some people but: Xeon Phis are dirt cheap compared to these bad boys.

            • Redocbew
            • 3 years ago

            Just about anything is probably going to be “dirt cheap” compared to these, but it’s not like they’re targeting this stuff at individuals.

          • UberGerbil
          • 3 years ago

          It’s complicated: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4240026/

          And of course FLOPS/$ matters (to varying degrees, depending on the customer).

        • the
        • 3 years ago

        The Larrabee-derived cores were weak, but the Silvermont-based cores with AVX-512 are able to obtain sustained throughput far closer to their peak values. The problem is that GP100 has such a larger peak that nVidia still comes out on top, even if it cannot sustain as much performance relative to its peak.

        The idea of adopting another architecture is something Intel explored in research (see the Tera-scale 80-core research vehicle), but management has been on an ‘x86 everywhere’ strategy. Knights Landing’s on-die interconnect stems from that project.

          • chuckula
          • 3 years ago

          If Intel wanted to make a giant chip relatively similar in design to Tesla, they would just scale their existing GPU architecture (with some tweaks) up into a massive chip.

          I’m pretty sure a Tesla would win a head-to-head competition, though, since Nvidia has the real expertise in this type of product.

            • tipoo
            • 3 years ago

            Intel’s Gen graphics actually pack a pretty surprising compute punch relative to their graphics performance. It would certainly be interesting to let that loose.

          • Waco
          • 3 years ago

          They’re FAR better at handling branchy code than any GPU, and they’re a lot easier to feed (memory bandwidth / latency). Volta at least has NVLink, but that’s limited to very specific designs.

      • Ninjitsu
      • 3 years ago

      Xeon Phis have the advantage of being able to host systems too, IIRC.

    • USAFTW
    • 3 years ago

    Poor Vega

      • ImSpartacus
      • 3 years ago

      Yeah, Vega would’ve been competitive in 2016. Nvidia just isn’t slowing down.

      I smell some fire sales in 2018. I love a good deal!

        • USAFTW
        • 3 years ago

        I think AMD’s marketing regrets putting “poor Volta” in their promotional video right about now.

          • chuckula
          • 3 years ago

            People wonder why I rag on “poor” AMD (ironic, isn’t it?).

            It’s because their own marketers define themselves not by Making AMD Great Again (TM) but by making d-bag hits on their more successful competitors.

            • USAFTW
            • 3 years ago

            SAD!!

            • bjm
            • 3 years ago

            Ah, so you admit it: even you are so ensnared by AMD’s marketing that you cannot look past it, no better than an irrational AMD fanboy. Thankfully, we have sites like TR that can review the actual products and judge them on the merits of their performance, rather than simply rehashing or ragging on pointless, idiotic marketing.

          • tsk
          • 3 years ago

          Haha I was just thinking the same.

    • chuckula
    • 3 years ago

    “Nvidia CEO Jensen Huang says the 815mm² chip is made at the reticle limit of TSMC's 12-nm FFN process.”

    OK, look, Nvidia: we all knew you were in a complete panic about AMD and only threw this chip together in a desperate bid to distract us from Vega’s launch. But you went full reticle limit. You never go full reticle limit.

      • TwoEars
      • 3 years ago

      you’re so vain

      you probably think this launch is about you

      (I can see this becoming a TR thing now)

        • K-L-Waster
        • 3 years ago

        That isn’t actually a total eclipse of the sun: it’s Jensen Huang holding the chip up….

          • Bobs_Your_Uncle
          • 3 years ago

          But it IS a “Total Eclipse of the Heart” (read: MY HEART)! So yes… this launch IS ALL ABOUT ME!
          https://youtu.be/lcOxhH8N3Bo

          I need you tonight, Volta!

        • Waco
        • 3 years ago

        Love it!

        • Laykun
        • 3 years ago

        don’t you …. don’t you

        GPU FABBERS!

      • ImSpartacus
      • 3 years ago

      Except for GM200, which neutered Fiji…

      • dyrdak
      • 3 years ago

      815mm² – holy smoke. I’d like to see the yields on these. For the price, the task might as well be distributed across commodity hardware (and likely done faster).

      • psuedonymous
      • 3 years ago

      All kidding aside, this doesn’t seem like anything aimed at AMD (who really do not have anything targeting the same segment of ultra-high-end neural network training).
      If anything, it seems aimed directly at Google’s TPU. Packing a massive new chip with a whole bunch of brand new matrix-FMA processing cores is a shot right across Google’s bow in that regard. Nvidia seem to be betting that putting out a chip that [s<]it slices it dices[/s<] is both a GPGPU powerhouse and contains the hardware equivalent of several TPUs will be a better value proposition than putting in banks of lower power more efficient TPUs. If I were looking to scale out my datacentre and were limited by physical density, that could make a lot of sense.

        • frenchy2k1
        • 3 years ago

        Have you considered that they are not “aiming” at anyone in particular, mostly because they are the front-runner, and are just supplying the chips their customers want?
        This is HPC, where some customers can eat all the computing power they can get.
        Moreover, Nvidia is a pioneer in the AI business and far ahead of its competition.

        Google released a tensor processor that works only with their own framework and is designed for inferencing (running the AI). This chip handles both training and inferencing.

        As said, though, this has NOTHING to do with AMD, which is not even present in AI and barely present in HPC. Once initial demand is sated, Nvidia will probably release a pro card with these chips, same as they did with GP100. I would guess it will be the Quadro GV100.

          • K-L-Waster
          • 3 years ago

          What you mean they’re releasing a product solely with the intent of selling it? How weird….

      • DPete27
      • 3 years ago

      Tropic Thunder reference? (https://www.youtube.com/watch?v=oAKG-kbKeIo)

    • CampinCarl
    • 3 years ago

    Hey Jeff, hope you managed to get into the main hall, because the stream over in 220 is straight garbage.

    He did mention that a good analogue for the size of the chip is the face of an Apple Watch.

    I also appreciated that he talked about the obvious difficulties in fabbing it. (Hint: they won’t be cheap… probably ever.)

      • Jeff Kampman
      • 3 years ago

      I’m sadly not on the ground at GTC, but the remote stream wasn’t much better.

    • chuckula
    • 3 years ago

    BTW, here’s TR’s launch story from the P100 launch: https://techreport.com/news/29946/pascal-makes-its-debut-on-nvidia-tesla-p100-hpc-card

    A few stats from that article to compare to today’s announcement:

    “The 610 mm2 GP100 is built on TSMC's 16-nm FinFET process. It uses 15 billion transistors paired with 16GB of HBM2 RAM to deliver 5.3 teraflops of FP64 performance, 10.6 TFLOPS for FP32, and 21.2 TFLOPS for FP16.”

    So an increase of about 6 billion transistors (roughly 40% more) resulted in a die-area increase of 205 mm² (about 34% more area).
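
    Spelling out those ratios:

```python
# GP100 -> GV100 scaling, using the 15.3B/610mm^2 and 21.1B/815mm^2 die figures.
transistor_growth = (21.1e9 - 15.3e9) / 15.3e9
area_growth = (815 - 610) / 610
print(f"+{transistor_growth:.0%} transistors, +{area_growth:.0%} area")
# +38% transistors, +34% area
```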

    • chuckula
    • 3 years ago

    It will be interesting to see how this filters down to a consumer level product about a year from now or so.

    In the meantime, they’ll be selling these bad boys to people with really deep pockets.

    • Magic Hate Ball
    • 3 years ago

    815mm^2?

    Holy cats this is going to be one expensive chip to make.

      • cmrcmk
      • 3 years ago

      We should take bets on whether the MSRP will be more or less than my house’s value.

        • strangerguy
        • 3 years ago

        I think it would be funnier, and more tragic, to take bets on how cut-down GV104 will be while still managing to beat AMD.

        • psuedonymous
        • 3 years ago

        $70k for a workstation with 4x V100 GPUs.

          • CampinCarl
          • 3 years ago

          Also, if you buy a DGX-1 at the conference today ($150k), they’ll upgrade you from P100s to V100s in Q3 when they “release” to themselves. I think they said Q4 availability for OEMs.

      • chuckula
      • 3 years ago

      To put that in perspective, that’s within spitting distance of the total area of the four Zen dies that make up a 32 core Naples server part. Except it’s just one hunk of silicon.

      • TwoEars
      • 3 years ago

      21 billion transistors should make it the biggest chip mankind has ever produced; I don’t even want to think about yield rates and such.

        • Magic Hate Ball
        • 3 years ago

        I ran one of those yield calculators online. With a 12-inch wafer you could theoretically get 60 per wafer (assuming 100% yield, which gets less and less likely as die size increases…). That’s crazy!
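
        Those calculators boil down to a couple of formulas. A minimal sketch (the defect density is purely an illustrative assumption):

```python
import math

def dies_per_wafer(die_mm2, wafer_mm=300):
    """Classic gross-die approximation: wafer area over die area, minus edge loss."""
    radius = wafer_mm / 2
    return int(math.pi * radius**2 / die_mm2
               - math.pi * wafer_mm / math.sqrt(2 * die_mm2))

def poisson_yield(die_mm2, defects_per_cm2=0.1):
    """Poisson defect model: the fraction of dies that come out defect-free."""
    return math.exp(-defects_per_cm2 * die_mm2 / 100)

print(dies_per_wafer(815))          # ~63 gross dies on a 300-mm wafer
print(f"{poisson_yield(815):.0%}")  # ~44% defect-free at 0.1 defects/cm^2
# Fusing off four SMs, as Nvidia does here, lets many defective dies be salvaged.
```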

        • the
        • 3 years ago

        By area it would be the biggest mass-produced chip. The previous record holder was the Tukwila Itanium 2, at a staggering 699 mm².

          • brucethemoose
          • 3 years ago

          The SPARC64-XII is 795mm^2.

        • maxxcool
        • 3 years ago

        😉 Some of the old SPARC and Fujitsu chips were pretty monstrous in their day compared to the wafer size…

      • the
      • 3 years ago

      That is pretty much at the edge of the reticle limit. I’m actually thinking GV100 isn’t a single chip on an interposer but a collection whose area adds up to 815 mm². If not, wow, that thing is big.

        • Topinio
        • 3 years ago

        Seems probable. I’m not aware of anything else near that big, and even 400 mm² is reasonably big. A couple of CPUs have been made in the high 600s, but have any been over 800 mm² before?

          • the
          • 3 years ago

          The previous record holder was an Itanium chip at 699 mm². There are a couple of big-iron chips, like the 12-core POWER8, at ~650 mm².

        • TwoEars
        • 3 years ago

        They might have designed it for 10nm, but then they had to make it work on 12nm instead.

      • Krogoth
      • 3 years ago

      Just like the GP100 before it, it is an ultra-low-volume part tailored to the enterprise market, like the massive Broadwell-EP chips and the upcoming Naples/Skylake-EP chips.

    • psuedonymous
    • 3 years ago

    Like GP100, it will likely never grace the PCB of a consumer PCIe GPU.

    But still, DO WANT.

    ::EDIT:: Nvidia have a page with more details: https://devblogs.nvidia.com/parallelforall/inside-volta/?ncid=so-twi-vt-13918

    Interesting tidbits include:
    - NVLink 2
    - HBM is from Samsung, not SK Hynix
    - Unified memory (almost certainly restricted to NVLink, and therefore POWER systems, though)
    - FP32 and INT32 are now separate cores (they can operate independently and in parallel)

      • jihadjoe
      • 3 years ago

      1) Buy a Quadro GV100.

      2) Use it for games!

        • Krogoth
        • 3 years ago

          Which is pants-on-head stupid. The GV100 board doesn’t even have video outputs on it. It is going to be a “Tesla” GPGPU-only part, like the GP100 before it.

          • DeadOfKnight
          • 3 years ago

          That’s why he said “Quadro”, but I agree. You’d get more bang for your buck buying a Titan.

            • Krogoth
            • 3 years ago

            GV100 isn’t a Quadro part because the silicon doesn’t have any TMDS/RAMDAC hardware in it (making more room for GPGPU stuff). You will have to wait until Nvidia releases a version that does; that will probably end up being a GV102.

            • the
            • 3 years ago

            Apparently it does. The DGX Station system has DP outputs.

            • Krogoth
            • 3 years ago

            It is driven by a separate, dedicated low-end GPU, probably a GP106.

            • MathMan
            • 3 years ago

            See the other TechReport article.

            It simply has four GV100 chips instead of eight.

            GP100 has video outputs. Why would GV100 not have them?

            • psuedonymous
            • 3 years ago

            We all thought the same about GP100 lacking any video-output hardware on-die, but this later turned out not to be the case with the release of the Quadro GP100. It’s possible that Nvidia dumped the remnants of video hardware for an extra fraction of an SM or something, but more likely the portion of die area ‘saved’ would be so tiny that it wouldn’t be worth throwing away the market for direct-video-output cards. They spent a big chunk of the presentation on ultra-high-end simultaneous simulation and visualization for VR CAD, after all.

          • jihadjoe
          • 3 years ago

          “Which is pants-on-head stupid.”

          That’s the whole appeal behind it! Remember those old 3dfx commercials?

          “The GV100 board doesn’t even have video outputs on it. It is going to be a ‘Tesla’ GPGPU-only part, like the GP100 before it.”

          There actually is a Quadro GP100 (http://www.pny.com/nvidia-quadro-gp100) featuring a fully enabled GP100 chip and lots of display connectors. Pretty sure a Quadro GV100 will follow soon after this Tesla launch (just as it did for Pascal).

        • the
        • 3 years ago

        1) Buy DGX station

        2) Play games with four linked GV100 cards.

      • ptsant
      • 3 years ago

      The know-how does trickle down to consumer parts. HBM2 support, especially, will have to be integrated into consumer products at some point, I guess. Either that, or AMD has made a huge strategic mistake, which is also not unlikely.

      • Airmantharp
      • 3 years ago

      You don’t want GV100; you want GV102, which will likely be of similar size but with all of its compute units focused on the lower-precision operations that games stick to.

      Unless of course you can actually use all of that high-precision compute power, well, then get that second mortgage ready for the Quadro release!
