At its GPU Technology Conference (GTC) today, Nvidia CEO Jensen Huang made a fair number of announcements. However, none was likely as exciting to the machine-learning crowd as the announcement of the Nvidia DGX-2. As the successor to the DGX-1, the DGX-2 is likewise a pre-built computing cluster that combines a bundle of Nvidia GPUs and supporting hardware into a turn-key HPC system. This year's model doubles the GPU count to 16, and Nvidia says it offers a staggering 2 petaflops of compute throughput.
Naturally, the DGX-2 upgrades its GPUs to Volta-based Tesla V100s, specifically the latest revision (also introduced at GTC) that doubles local memory capacity to 32 GB of HBM2 on each card. As usual for the Tesla V100, the GPUs and their RAM ride on SXM3-form-factor modules that use mezzanine connectors for their NVLink 2 interfaces instead of the usual gold-fingered card-edge connectors.
The 16 GPUs are connected through an interconnect fabric called NVSwitch that allows any of the GPUs to communicate at 300 GB/s with any of their brethren. Nvidia says NVSwitch is custom silicon created specifically to enable the fabric within the DGX-2. That kind of link speed allows software running on the DGX-2 to treat the machine's total 512 GB of HBM2 as a single memory pool, rather than addressing each chip's local memory separately. Nvidia says that thanks to the combination of the new hardware and software optimizations, the DGX-2 is 10 times faster than a DGX-1 system equipped with Tesla V100 cards. Despite the DGX-2's enormous compute prowess, it draws only 10 kW of power.
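Those headline figures check out with some back-of-the-envelope arithmetic. A quick sketch (the per-GPU tensor-core throughput of roughly 125 TFLOPS comes from Nvidia's published Tesla V100 specs, not from this article):

```python
# Back-of-the-envelope check of the DGX-2's headline specs.
# Assumption: each Tesla V100 delivers ~125 TFLOPS of tensor-core
# throughput and carries 32 GB of HBM2 (per Nvidia's V100 spec sheet).
NUM_GPUS = 16
HBM2_PER_GPU_GB = 32
TENSOR_TFLOPS_PER_GPU = 125

total_memory_gb = NUM_GPUS * HBM2_PER_GPU_GB            # pooled HBM2 capacity
total_pflops = NUM_GPUS * TENSOR_TFLOPS_PER_GPU / 1000  # aggregate throughput

print(total_memory_gb)  # 512 GB, matching the pooled-memory figure
print(total_pflops)     # 2.0 PFLOPS, matching Nvidia's claim
```

In other words, the 2-petaflops number is the aggregate tensor-core throughput of 16 V100s, and the 512 GB pool is simply all 16 cards' HBM2 stacks addressed as one.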
While the GPUs are obviously the DGX-2's raison d'être, the rest of the machine is nothing to sneeze at. A pair of Xeon Platinum CPUs is served by 1.5 TB of memory, and a "PCIe switch complex" links those chips to 30 TB of NVMe storage and eight 100-Gigabit InfiniBand cards. The whole package is barely the size of a mini-fridge, and Nvidia says it weighs 350 lbs (159 kg). If you're after what Nvidia calls "the world's largest GPU," you'll apparently be able to get one in Q3 of this year for $400,000.