Nvidia upgrades its DGX SaturnV cluster with Tesla V100 chips

Nvidia has a lot to talk about at the 2017 Supercomputing Conference, thanks to the presence of its GPUs inside 34 recently-introduced members of the latest TOP500 supercomputer list. Out of those half-thousand machines, 87 now proudly boast Nvidia hardware inside. Furthermore, the company's wares power 14 of the top 20 most efficient supercomputers on Earth. If all that weren't enough, the company has now announced a Volta upgrade to its own DGX SaturnV cluster.

The specs for a single Tesla V100 chip are pretty impressive: 21 billion transistors running at up to 1455 MHz, 5120 stream processors, and 16 GB of on-package HBM2 memory delivering 900 GB/s of maximum theoretical bandwidth. A DGX-1 node packs eight of these enormous slabs of silicon into one box for an absurd amount of floating-point power. Nvidia has assembled its DGX SaturnV from a whopping 660 DGX-1 nodes. If you took the GPUs out of SaturnV and spaced them at one-foot intervals, the line would stretch a mile before you ran out of chips.

The system's total potential performance figures are absurd: 5280 Tesla V100 GPUs ready to deliver as much as 660 "AI" (FP16) petaFLOPS, 80 petaFLOPS of FP32 capability, and 40 petaFLOPS of FP64 chops. That outsized capacity for half-precision processing comes at least in part from each Volta GPU's 640 tensor cores.
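Those headline numbers hang together with the GPU count above. A quick back-of-the-envelope check (all inputs come from the figures quoted in this article; the per-GPU numbers are derived, not official Nvidia specs):

```python
# Sanity-check SaturnV's headline figures using only numbers quoted above.
nodes = 660
gpus_per_node = 8
gpus = nodes * gpus_per_node           # 5280 Tesla V100s total

fp16_pflops = 660.0                    # "AI" (FP16 tensor) petaFLOPS
fp32_pflops = 80.0
fp64_pflops = 40.0

# Implied per-GPU peak throughput, in teraFLOPS.
per_gpu_fp16 = fp16_pflops * 1000 / gpus   # 125.0 TFLOPS
per_gpu_fp32 = fp32_pflops * 1000 / gpus   # ~15.2 TFLOPS
per_gpu_fp64 = fp64_pflops * 1000 / gpus   # ~7.6 TFLOPS

print(gpus, per_gpu_fp16, round(per_gpu_fp32, 1), round(per_gpu_fp64, 1))
```

The derived 125 TFLOPS of FP16 tensor throughput per GPU lines up with the Tesla V100's advertised tensor-core peak, which suggests the 660-petaFLOPS figure is a straight multiplication of per-chip peaks rather than a measured result.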

Nvidia says it plans to dedicate SaturnV to single mission-critical problems in "hero runs," and also to solve time-sensitive internal research challenges. The company says its GeForce product team will use SaturnV to analyze customer data in order to deliver more optimized gaming experiences and to help in selecting candidates for job openings. SaturnV already had a hand in simulating 300,000 miles of driving for Nvidia's autonomous vehicle program.

Nvidia touts the efficiency of the new cluster, as well. The company says the cluster provides 15 gigaFLOPS of FP64 compute capability for every watt of power consumption. The company goes on to say that its experience in developing the system, including innovations in scheduling and cluster management, will provide benefits to DGX-1 buyers. Should Nvidia's theoretical figures hold, it's likely that SaturnV will end up even higher than its current #36 ranking on the TOP500 list.
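For scale, the quoted efficiency figure pins down a rough power budget for the whole cluster: 40 petaFLOPS of FP64 at 15 gigaFLOPS per watt works out to roughly 2.7 megawatts (a derived estimate from this article's numbers, not a figure Nvidia published):

```python
# Implied power draw from the article's efficiency and FP64 figures.
fp64_flops = 40e15      # 40 petaFLOPS of FP64, per the article
flops_per_watt = 15e9   # 15 gigaFLOPS per watt, per the article

power_watts = fp64_flops / flops_per_watt
print(round(power_watts / 1e6, 2), "MW")   # ~2.67 MW at theoretical peak
```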

Comments closed
    • torquer
    • 2 years ago

    Makes you wonder what the gubmint has that they don’t talk about, like the NSA’s super computers and custom silicon and what their performance levels are like.

    /tinfoil hat

      • Shinare
      • 2 years ago

      I guess you give the “gubmint” more credit than I do.

    • Ushio01
    • 2 years ago

    Damn 131 computers knocked off the Top500 list in just 6 months.

      • chuckula
      • 2 years ago

      That’s actually interesting because even though the very top of the top 500 doesn’t move much, there is a big proliferation of more powerful supercomputers that are being used by a wider range of institutions than in the past.

    • Krogoth
    • 2 years ago

    [url<]https://www.youtube.com/watch?v=1uoVfZpx5dY[/url<]

    • Srsly_Bro
    • 2 years ago

    But can it mine?

      • Wirko
      • 2 years ago

It [i<]will[/i<] mine. Something far more profitable than cryptocurrencies, gold, rhodium, oil, lithium, et cetera.

      • maxxcool
      • 2 years ago

      Mines customer data like a BOSS …

    • jihadjoe
    • 2 years ago

    I wonder how hard those tensor cores are to program for. IIRC when Vega launched there was some talk about using FP16 for gaming because precision wasn’t really that much of a necessity. If that move does happen, then big Volta might end up being a monster gaming chip.

      • NTMBK
      • 2 years ago

      They’re essentially useless for gaming. Take a look at the CUDA API for programming them. The only thing they can do is multiply big FP16 matrices together (and need the entire warp working in parallel, which breaks the normal threading model for graphics shaders). You can’t use them for general purpose FP16 compute.

      EDIT: Of course they should be fantastically efficient for deep learning!

      EDIT 2: Here’s the API for reference: [url<]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma[/url<]

      • tipoo
      • 2 years ago

      A lot of it would be wasted transistors for gaming. Like you said some things can be moved to half precision FP16 because full precision is not needed – so of course very little in a game would benefit from DOUBLE precision.

      So I mean, could you write a game around this, yeah, but it would be a lot of wasted effort and money.

    • Anonymous Coward
    • 2 years ago

    [quote<]to analyze customer data in order to deliver more optimized gaming experiences[/quote<] This is a load of bullcrap.

      • K-L-Waster
      • 2 years ago

      What makes you say that?

        • willmore
        • 2 years ago

        A solid grip on reality?

          • Anonymous Coward
          • 2 years ago

          I admit to being curious about the (presumed) nugget of reality that underpins the vague marketing speak.

            • K-L-Waster
            • 2 years ago

            GeForce Experience has a feature that recommends “optimal” settings for the games on your system. While it’s possible NVidia hires teams of people to test every game and make recommendations, my guess is they actually collect usage data and have one of these SaturnV clusters analyze the data.

    • chuckula
    • 2 years ago

    [quote<]The company says its GeForce product team will use SaturnV to analyze customer data in order to deliver more optimized gaming experiences and help in selecting candidates for job openings. [/quote<] Something about putting both of those things in the same sentence makes me think of the Last Starfighter.

      • CScottG
      • 2 years ago

      ..close:

      [url<]https://www.youtube.com/watch?v=KcCNKQ7gxq4&feature=youtu.be&t=134[/url<]
