Oak Ridge National Laboratory, IBM, and Nvidia birth Summit supercomputer


Oak Ridge National Laboratory (ORNL) has a new supercomputer built with bleeding-edge silicon from IBM and Nvidia. The lab and the system's maker claim the machine is the fastest HPC installation in the world. Summit is made of 4,608 compute nodes, each built around two IBM Power9 CPUs and six Nvidia Tesla V100 accelerators. ORNL says this combination of gear can deliver a theoretical 200 petaflops (PF) of double-precision floating-point arithmetic and over 3 exaflops (EF) of mixed-precision work for AI and deep-learning applications. IBM says the Summit gestalt is over five times more powerful than its predecessor, Titan, and more than a million times faster than the average laptop PC.
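
For a rough sense of scale, the headline figures can be divided back down to a single accelerator. The sketch below uses only the node count, GPU count, and peak numbers quoted above. It assumes, purely for simplicity, that all of the quoted throughput comes from the GPUs, so the per-GPU results are upper bounds rather than official specs, and the names are made up for illustration.

```python
# Figures quoted in the article
NODES = 4608
GPUS_PER_NODE = 6
PEAK_FP64_PF = 200    # theoretical double-precision peak, in petaflops
PEAK_MIXED_EF = 3     # "over 3 exaflops" mixed precision, treated as a floor


def per_gpu_teraflops(total_flops: float) -> float:
    """Spread a system-wide FLOPS figure evenly over every GPU.

    Assumption: the CPUs contribute nothing, so this overstates
    the real per-GPU share slightly.
    """
    return total_flops / (NODES * GPUS_PER_NODE) / 1e12


total_gpus = NODES * GPUS_PER_NODE
fp64_per_gpu = per_gpu_teraflops(PEAK_FP64_PF * 1e15)
mixed_per_gpu = per_gpu_teraflops(PEAK_MIXED_EF * 1e18)

print(f"{total_gpus} GPUs in total")
print(f"~{fp64_per_gpu:.1f} TFLOPS FP64 per GPU")
print(f"~{mixed_per_gpu:.1f} TFLOPS mixed precision per GPU")
```

By this back-of-the-envelope math, Summit houses 27,648 accelerators, and the quoted peaks work out to roughly 7.2 TFLOPS of double-precision and at least 108 TFLOPS of mixed-precision throughput apiece.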

Each of Summit's IBM Power System AC922 nodes gets two IBM Power9 processors and six Nvidia Tesla V100 accelerator units. The CPUs play in a pool of 512 GB of DDR4 memory, and each Tesla card has 16 GB of its own on-package HBM2 memory. Each processor is connected by two NVLink bricks, each capable of moving 25 GB/s in both directions. Each of the two processors in a Summit node is connected to three V100 accelerators, also using Nvidia's proprietary interconnect. Each node has its own 1.6 TB of non-volatile memory for use as a burst buffer, probably in the form of an SSD array. Summit as a whole uses an IBM Spectrum Scale filesystem with a maximum storage capacity of 250 PB, while peak write speeds can hit a ludicrous 2.5 TB/s.
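
The per-node figures above add up quickly across the whole machine. Here is a minimal sketch that multiplies them out, assuming decimal units (1 PB = 1,000 TB = 1,000,000 GB) and assuming the filesystem could sustain its peak write speed indefinitely, which a real workload almost certainly would not. The variable names are illustrative only.

```python
# Figures quoted in the article
NODES = 4608
GPUS_PER_NODE = 6
DDR4_GB_PER_NODE = 512
HBM2_GB_PER_GPU = 16
NVM_TB_PER_NODE = 1.6
FS_CAPACITY_PB = 250
FS_PEAK_WRITE_TB_PER_S = 2.5

# System-wide totals, using decimal prefixes (an assumption)
ddr4_total_pb = NODES * DDR4_GB_PER_NODE / 1e6
hbm2_total_tb = NODES * GPUS_PER_NODE * HBM2_GB_PER_GPU / 1e3
nvm_total_pb = NODES * NVM_TB_PER_NODE / 1e3

# Hours needed to fill the filesystem at a sustained peak write rate
fill_seconds = FS_CAPACITY_PB * 1e3 / FS_PEAK_WRITE_TB_PER_S
fill_hours = fill_seconds / 3600

print(f"DDR4 across all nodes:  {ddr4_total_pb:.2f} PB")
print(f"HBM2 across all GPUs:   {hbm2_total_tb:.1f} TB")
print(f"Burst buffer in total:  {nvm_total_pb:.2f} PB")
print(f"Time to fill 250 PB:    {fill_hours:.1f} hours")
```

Under those assumptions, Summit carries about 2.36 PB of DDR4, 442 TB of HBM2, and 7.37 PB of burst buffer, and even at 2.5 TB/s it would take a bit under 28 hours to fill the filesystem.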

Most gerbils probably have at least a passing familiarity with the Tensor-Core-heavy Tesla V100 accelerators, which are related to the chip found in Nvidia's desktop-ish Titan V graphics card. IBM's Power9 CPU architecture is a bit more foreign. Each Power9 chip has 22 cores with support for up to four hardware threads per core, which across both chips adds up to 176 total CPU threads per node. Each core has its own 32-KB data and 32-KB instruction caches. Pairs of cores share a 512-KB L2 cache and a 10-MB L3 cache.
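
The thread count and cache totals follow from simple multiplication. This sketch works them out from the figures above, assuming every one of the 22 cores belongs to exactly one pair (so 11 pairs per chip) and that 1 MB = 1,024 KB. The names are illustrative only.

```python
# Figures quoted in the article
CPUS_PER_NODE = 2
CORES_PER_CPU = 22
THREADS_PER_CORE = 4
L2_KB_PER_PAIR = 512
L3_MB_PER_PAIR = 10

threads_per_node = CPUS_PER_NODE * CORES_PER_CPU * THREADS_PER_CORE

# Assumption: cores are grouped into non-overlapping pairs
pairs_per_cpu = CORES_PER_CPU // 2
l2_mb_per_cpu = pairs_per_cpu * L2_KB_PER_PAIR / 1024
l3_mb_per_cpu = pairs_per_cpu * L3_MB_PER_PAIR

print(f"{threads_per_node} hardware threads per node")
print(f"{l2_mb_per_cpu} MB of L2 and {l3_mb_per_cpu} MB of L3 per chip")
```

That confirms the 176-thread figure and suggests each chip carries 5.5 MB of L2 and 110 MB of L3 in total, if the pairing assumption holds.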

The nodes are connected using a dual-rail EDR InfiniBand network that provides node injection bandwidth of up to 23 GB/s. The network is built in a non-blocking fat-tree topology with three levels, implemented by a switch inside each cabinet of nodes along with director switches connecting the cabinets to one another.
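
The 23 GB/s figure can be sanity-checked against the link speed. The sketch below assumes the nominal EDR InfiniBand data rate of 100 Gb/s per link, a number that comes from the InfiniBand spec rather than from this article, and treats "dual-rail" as two such links per node.

```python
# Assumption (not stated in the article): nominal EDR data rate per link
EDR_GBIT_PER_S = 100
RAILS = 2                      # "dual-rail" taken to mean two links per node
QUOTED_INJECTION_GB_PER_S = 23  # figure quoted in the article

# Convert the combined link rate from gigabits to gigabytes per second
raw_gb_per_s = EDR_GBIT_PER_S * RAILS / 8

# Fraction of the nominal link rate that the quoted figure represents
efficiency = QUOTED_INJECTION_GB_PER_S / raw_gb_per_s

print(f"Nominal dual-rail rate: {raw_gb_per_s:.0f} GB/s")
print(f"Quoted injection rate is {efficiency:.0%} of nominal")
```

Two EDR links come to a nominal 25 GB/s, so the quoted 23 GB/s is about 92% of that. The article does not say what accounts for the difference.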

The Power9 and Tesla V100 silicon is cooled using "cold-plate" technology, which, judging from the picture, means the type of water blocks familiar to builders of open-loop liquid-cooling systems for PCs. The rest of the heat from each rack is removed using a back-of-the-cabinet heat exchanger. The cold plates and the heat exchangers both use medium-temperature water, a setup the Oak Ridge team says is more cost-effective to maintain than traditional cold-water setups.

IBM says Summit will be used in the fight against cancer, in the identification of next-generation materials, and in furthering scientists' understanding of disease. Data scientists will apply machine-learning algorithms to provide medical researchers with a view of the US cancer-suffering population "at a level of detail obtained only by clinical trial patients." On the materials science front, scientists will crunch numbers to try to identify advanced materials for the next generation of batteries, building materials, and semiconductors. Bioinformaticians and computational biologists will apply Summit's AI capabilities to locate patterns in protein-protein and cell-cell interactions to gain a more precise understanding of how human diseases affect health.

Interested gerbils can read more at Oak Ridge National Laboratory's Summit site. Summit runs Red Hat Enterprise Linux 7.4. Lawrence Livermore National Laboratory is currently working on Sierra, Summit's sister supercomputer, based on the same Power9-Volta V100 node architecture. Nuclear scientists will use Sierra to simulate the performance of nuclear weapons systems instead of conducting underground test detonations.
