The Top500 list of the most powerful supercomputers in the world got a new king today. The Sunway TaihuLight, a new installation at the National Supercomputing Center in Wuxi, China, took the top spot with a Linpack score of 93 petaflops. TaihuLight nearly triples the performance of the former number-one system, the Tianhe-2. That's quite impressive by itself, but TaihuLight achieves its performance entirely with Chinese ShenWei CPUs and a custom interconnect. Tianhe-2 relied on two Intel Ivy Bridge CPUs and three Xeon Phi coprocessors in each of its 16,000 nodes to achieve its 33.9-PFLOP Linpack performance figure, according to the Top500 page on that system.
Each of TaihuLight's 40,960 nodes contains a single ShenWei SW26010 CPU, designed by the National Research Center of Parallel Computer Engineering and Technology. According to the Top500 press release, that chip is a 64-bit RISC part with SIMD and out-of-order execution capability. Each of these 260-core chips is capable of three teraflops of compute performance on its own. The Top500 notes that the 93-PFLOP Linpack score of the machine falls short of its 125-PFLOP theoretical maximum, although the organization suggests that's to be expected with the Linpack benchmark.
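For readers who want to check the math, here is a minimal Python sketch of how the figures quoted above relate to one another. The inputs are the numbers reported in this article; the function and constant names are our own, chosen purely for illustration.

```python
# Figures as quoted in the article (Top500 data); names are illustrative only.
NODES = 40_960                  # one SW26010 CPU per node
PEAK_PFLOPS = 125.0             # theoretical maximum for the whole machine
LINPACK_PFLOPS = 93.0           # measured Linpack score
TIANHE2_LINPACK_PFLOPS = 33.9   # former number-one system


def per_chip_teraflops(peak_pflops: float, nodes: int) -> float:
    """Theoretical peak per chip, in teraflops (1 PFLOPS = 1,000 TFLOPS)."""
    return peak_pflops * 1000 / nodes


def linpack_efficiency(linpack_pflops: float, peak_pflops: float) -> float:
    """Fraction of theoretical peak actually achieved on Linpack."""
    return linpack_pflops / peak_pflops


def speedup(new_pflops: float, old_pflops: float) -> float:
    """How many times faster the new system is on the same benchmark."""
    return new_pflops / old_pflops


print(f"Peak per chip:      {per_chip_teraflops(PEAK_PFLOPS, NODES):.2f} TFLOPS")
print(f"Linpack efficiency: {linpack_efficiency(LINPACK_PFLOPS, PEAK_PFLOPS):.1%}")
print(f"Versus Tianhe-2:    {speedup(LINPACK_PFLOPS, TIANHE2_LINPACK_PFLOPS):.2f}x")
```

The per-chip result lands just above three teraflops, the Linpack score works out to roughly three-quarters of the theoretical peak, and the gap to Tianhe-2 is about 2.7x.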
Each node has 32GB of DDR3 memory, for a total of 1.3PB across the entire system. The Top500 notes that TaihuLight has a rather small amount of memory relative to its 10,649,600 cores. Tianhe-2, for example, has a similar amount of memory for its 3.12 million cores. Each CPU core also gets just 12KB of instruction cache and 64KB of "local scratchpad" memory, rather than the L1-L2-L3 cache hierarchy we might be used to seeing these days. While the Top500 says TaihuLight's 15.3MW power draw while running Linpack "will certainly earn it a place in the upper reaches of the Green500 list" for power efficiency, the organization believes the machine's efficiency would suffer if it had a more traditional amount of memory for its size.
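The core count, memory total, and power-efficiency figures can be sanity-checked the same way. The sketch below uses only the numbers quoted in this article. The helper names are hypothetical, and the per-core memory figure assumes 1GB = 1,024MB, which is our assumption rather than something the Top500 states.

```python
# Figures as quoted in the article; helper names are illustrative only.
NODES = 40_960
CORES_PER_CHIP = 260
MEMORY_PER_NODE_GB = 32
LINPACK_PFLOPS = 93.0
POWER_MW = 15.3  # power draw while running Linpack


def total_cores(nodes: int, cores_per_chip: int) -> int:
    """Total core count, with one chip per node."""
    return nodes * cores_per_chip


def total_memory_gb(nodes: int, gb_per_node: int) -> int:
    """Total system memory in gigabytes."""
    return nodes * gb_per_node


def memory_per_core_mb(gb_per_node: int, cores_per_chip: int) -> float:
    """Memory available per core in MB (assumes 1GB = 1,024MB)."""
    return gb_per_node * 1024 / cores_per_chip


def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Power efficiency in GFLOPS per watt."""
    flops = pflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e9


print(f"Total cores:     {total_cores(NODES, CORES_PER_CHIP):,}")
print(f"Total memory:    {total_memory_gb(NODES, MEMORY_PER_NODE_GB):,} GB")
print(f"Memory per core: {memory_per_core_mb(MEMORY_PER_NODE_GB, CORES_PER_CHIP):.0f} MB")
print(f"Efficiency:      {gflops_per_watt(LINPACK_PFLOPS, POWER_MW):.2f} GFLOPS/W")
```

By this arithmetic, each core has only about 126MB of memory to work with, and the machine delivers roughly 6 gigaflops per watt on Linpack.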
According to a paper by Top500 contributor Jack Dongarra on the system, TaihuLight has already run three scientific simulations that have been picked as finalists for the Association for Computing Machinery's Gordon Bell Prize, and two of those programs have achieved 30 to 40 petaflops of sustained performance on the machine. Dongarra says that the fact that TaihuLight is running "Gordon Bell contender applications" means that it's "not just a stunt machine." The Top500 says TaihuLight will be used for research and engineering work in "climate, weather and earth systems modeling, life science research, advanced manufacturing, and data analytics." If you're a researcher in those areas, perhaps it's time to figure out how to get some time on this beast.