Nvidia SaturnV supercomputer takes HPC efficiency to new heights

Fast cars have a fantasy attached to them that fuel-efficient ones do not. The latter, though, are much more practical in just about every case. The same might go for high-performance computing, where speed usually makes the news, but efficiency means much lower operating costs. That was exactly Nvidia's goal with its DGX SaturnV, which the company announced this week as the world's most efficient supercomputer.

Nvidia says the SaturnV is 42% more efficient than last year's most efficient machine. Where last year's winner managed 6.67 GFLOPS/W, the SaturnV pulled 9.46 GFLOPS/W. Nvidia also compares the SaturnV to the current list, matching the Camphor 2 supercomputer in performance while delivering 2.3 times its energy efficiency.
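
Those figures are easy to sanity-check. The short Python sketch below simply redoes the arithmetic from the numbers quoted above; the variable names are illustrative and not taken from any Nvidia material.

    # Sanity check on Nvidia's efficiency claim, using only the figures quoted above.
    last_year_best_gflops_per_watt = 6.67   # last year's most efficient machine
    saturnv_gflops_per_watt = 9.46          # DGX SaturnV

    improvement = saturnv_gflops_per_watt / last_year_best_gflops_per_watt - 1
    print(f"SaturnV is {improvement:.0%} more efficient")   # -> 42%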

The SaturnV is made up of 125 DGX-1 deep learning systems, each of which has eight Tesla P100 cards inside. That's 1,000 cards, each of which can perform FP16 calculations at a peak of 21.2 TFLOPS. For comparison, a GTX 1080 manages only about 138 GFLOPS at FP16. Nvidia is banking heavily on the power of machine learning, which these DGX-1 systems are designed for. Nvidia offers up examples like modeling new combustion engines and fusion reactors as potential uses for the SaturnV. The DGX-1 itself is already in the field, with groups like OpenAI, Stanford, New York University, and BenevolentAI using them for research. Nvidia itself uses the DGX-1 for designing the autonomous driving software included in the Drive PX 2.
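
To put those per-card numbers in perspective, here's a quick back-of-the-envelope tally of the whole system's peak FP16 throughput, again using only the figures in this article. These are peak ratings, not measured performance, and the script is just an illustration.

    # Peak FP16 throughput of SaturnV, from the per-card figures above.
    systems = 125            # DGX-1 boxes in SaturnV
    cards_per_system = 8     # Tesla P100 cards per DGX-1
    fp16_per_card = 21.2     # TFLOPS, peak FP16 per P100
    gtx_1080_fp16 = 0.138    # TFLOPS (138 GFLOPS), peak FP16 on a GTX 1080

    cards = systems * cards_per_system          # 1,000 cards
    total_fp16_tflops = cards * fp16_per_card   # 21,200 TFLOPS
    print(f"{cards} cards, {total_fp16_tflops / 1000:.1f} PFLOPS peak FP16")
    print(f"One P100 is roughly {fp16_per_card / gtx_1080_fp16:.0f}x a GTX 1080 at FP16")

The GTX 1080 figure looks lopsided because consumer Pascal chips run FP16 at only a small fraction of their FP32 rate, while the P100 can run FP16 at twice its FP32 rate.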

The DGX SaturnV might not be the fastest supercomputer this year—it ranks a not-too-shabby 28 on the TOP500 list—but its incredible efficiency is going to make the system far more practical for many of the applications that companies and universities will be using it for in the coming years.

Comments closed
    • CuttinHobo
    • 3 years ago

    Nvidia compared SaturnV to the Camphor 2 supercomputer, but I want to know how it compares to The Great Hyperlobic Omni-Cognate Neutron Wrangler.

      • chuckula
      • 3 years ago

      Obscure HGTTG reference FTW.

      • psuedonymous
      • 3 years ago

      Forget that, it’s not even a patch on a Bambleweeny 57 Sub-Meson Brain!

    • Srsly_Bro
    • 3 years ago

    You can just say New York University. Everyone knows what NYU is.

      • morphine
      • 3 years ago

      Everyone in the US, you mean? 🙂

        • Srsly_Bro
        • 3 years ago

        Isn’t that everyone???

    • bronek
    • 3 years ago

    You got the numbers wrong: [url=http://images.nvidia.com/content/technologies/deep-learning/pdf/Datasheet-DGX1.pdf]170 TFlops is for 1 box (all 8 P100 together, not each one)[/url]. Also, this number is for half-precision floats (FP16), which is still good enough for neural networks, but maybe not for other purposes, like physical models. Since a [url=http://www.nvidia.com/object/tesla-p100.html]single P100 delivers 10 TFlops for FP32[/url], the box still delivers a very respectable 80 TFlops for single precision.

      • chuckula
      • 3 years ago

      In TR’s defense, they said that each [i]card[/i] has 170 TFlops and that each card in turn carries 8 of the P100 modules. These "cards" aren't like a regular consumer GPU; think instead of a very large motherboard "blade" that's populated with the P100 modules. The half-precision smoke & mirrors numbers are a good point, though.

      • morphine
      • 3 years ago

      Yep, that’s an editing mistake. Fixed.

    • Amiga500+
    • 3 years ago

    Yeah, but can it play Doom?

      • VincentHanna
      • 3 years ago

      Not only can it play doom, it can beat doom.

    • AnotherReader
    • 3 years ago

    I thought each P100 had a peak throughput of [url=https://techreport.com/news/29946/pascal-makes-its-debut-on-nvidia-tesla-p100-hpc-card]5.3 DP TFLOPs[/url]. Where is the 170 TFLOPs figure coming from?

      • chuckula
      • 3 years ago

      5.3 * 8 P100 modules on each card * magical fudge factor of about 4 = 170 TFlops!!

      Actually, that 170 Tflop number appears to be 16-bit half-precision, which is the “new” (as in Super Nintendo “new”) hotness for inflating performance numbers since those magical “deep learning” applications consist of updating weight values on neurons in a neural network that don’t require high-precision numeric values.

        • bronek
        • 3 years ago

        Actually it is 21 TFlops for FP16 per card, times 8 cards in a box = 168 TFlops, rounded to 170.

          • chuckula
          • 3 years ago

          Thanks for restating what I said.

        • AnotherReader
        • 3 years ago

          Thanks! I assume that, for the efficiency calculation, they are using the DP TFLOPs figure. They could have gone ahead and used the 8-bit throughput rate, since 8-bit is good enough for neural networks.

          • MathMan
          • 3 years ago

          No, the P100 doesn’t support those 8-bit operations.

    • Voldenuit
    • 3 years ago

    I, for one, welcome our new supercomputer overlords.

    • NTMBK
    • 3 years ago

    Shouldn’t they call it Falcon 9 to get Elon to buy one?

      • morphine
      • 3 years ago

      Nvidia already has “Tesla” cards. I hear Elon’s been buying them like Hot Wheels.
