Nvidia upgrades its DGX SaturnV cluster with Tesla V100 chips


— 3:36 PM on November 14, 2017

Nvidia has a lot to talk about at the 2017 Supercomputing Conference, thanks to the presence of its GPUs inside 34 recently-introduced members of the latest TOP500 supercomputer list. Out of those half-thousand machines, 87 now proudly boast Nvidia hardware inside. Furthermore, the company's wares power 14 of the top 20 most efficient supercomputer clusters on Earth. If all that wasn't enough, the company has now announced a Volta upgrade to its own DGX SaturnV cluster.

The specs for a single Tesla V100 chip are pretty impressive: 21 billion transistors running at up to 1455 MHz, 5120 stream processors, and 16 GB of on-package HBM2 memory delivering 900 GB/s of maximum theoretical bandwidth. A DGX-1 node packs eight of these enormous chips into one box for an absurd amount of floating-point power. Nvidia has assembled its DGX SaturnV from a whopping 660 DGX-1 nodes. If you took the GPUs out of SaturnV and spaced them at one-foot intervals, a mile would pass before you ran out of chips.

The system's total potential performance figures are staggering: 5280 Tesla V100 GPUs ready to deliver as much as 660 "AI" (FP16) petaFLOPS, 80 petaFLOPS of FP32 capability, and 40 petaFLOPS of FP64 chops. That outsized capacity for half-precision processing comes at least in part from each Volta GPU's 640 tensor cores.
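For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the node count, GPUs per node, and cluster-wide petaFLOPS figures quoted above. The function names are made up for illustration, and the per-GPU numbers it prints are simply the cluster totals divided by the GPU count, not official per-chip specifications.

```python
# Back-of-the-envelope check of the SaturnV figures quoted in the article.
# All constants come from the article; the function names are illustrative.

NODES = 660                 # DGX-1 nodes in the cluster
GPUS_PER_NODE = 8           # Tesla V100 chips per DGX-1
FEET_PER_MILE = 5280

# Cluster-wide peak throughput in petaFLOPS, as quoted
CLUSTER_PFLOPS = {"FP16 (tensor)": 660, "FP32": 80, "FP64": 40}


def total_gpus(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Number of GPUs in the whole cluster."""
    return nodes * gpus_per_node


def per_gpu_tflops(cluster_pflops, gpus):
    """Implied per-GPU throughput in teraFLOPS (1 petaFLOPS = 1000 teraFLOPS)."""
    return cluster_pflops * 1000 / gpus


if __name__ == "__main__":
    gpus = total_gpus()
    print(f"GPUs in cluster: {gpus}")
    print(f"Miles of chips at one-foot spacing: {gpus / FEET_PER_MILE}")
    for precision, pflops in CLUSTER_PFLOPS.items():
        implied = per_gpu_tflops(pflops, gpus)
        print(f"{precision}: about {implied:.2f} teraFLOPS per GPU")
```

The GPU count works out to exactly the number of feet in a mile, which is where the one-foot-spacing comparison comes from.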

Nvidia says it plans to use SaturnV for single mission-critical problems in a "hero run" and also to solve time-sensitive internal research challenges. The company says its GeForce product team will use SaturnV to analyze customer data in order to deliver more optimized gaming experiences and help in selecting candidates for job openings. SaturnV already had a hand in simulating 300,000 miles of driving for Nvidia's autonomous vehicle program.

Nvidia touts the efficiency of the new cluster as well. The company says the cluster provides 15 gigaFLOPS of FP64 compute capability for every watt of power consumption. It goes on to say that its experience in developing the system, including innovations in scheduling and cluster management, will provide benefits to DGX-1 node buyers. Should Nvidia's theoretical figures hold, it's likely that SaturnV will end up even higher than its current #36 ranking in the TOP500 list.
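As a rough illustration of what that efficiency claim implies, the sketch below divides the quoted 40 petaFLOPS of peak FP64 throughput by the quoted 15 gigaFLOPS per watt. Pairing those two numbers is an assumption made here, not something Nvidia states: efficiency rankings are normally based on measured benchmark results rather than theoretical peaks, so treat the output as a ballpark only. The function name is hypothetical.

```python
# Ballpark power draw implied by two figures quoted in the article.
# Assumption: the 15 gigaFLOPS/W claim is applied to the 40 petaFLOPS FP64 peak.

def implied_power_megawatts(pflops, gflops_per_watt):
    """Power in megawatts implied by a throughput and an efficiency figure."""
    gflops = pflops * 1_000_000          # 1 petaFLOPS = 1,000,000 gigaFLOPS
    watts = gflops / gflops_per_watt
    return watts / 1_000_000             # watts to megawatts


if __name__ == "__main__":
    mw = implied_power_megawatts(pflops=40, gflops_per_watt=15)
    print(f"Implied power draw: roughly {mw:.2f} MW")
```

Under that assumption, the cluster would draw somewhere in the neighborhood of two and a half to three megawatts at full tilt.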
