Fujitsu joins the deep-learning stampede with specialized silicon

— 1:30 PM on July 19, 2017

Nvidia's revenues, profits, and share price have all benefited from surging demand in both the PC gaming hardware and the graphics compute marketplaces. The company could face growing competition in deep learning from rival AMD's Vega graphics chips, the Radeon Instinct family, and the company's ROCm platform, but the AI jockeying doesn't stop there. Intel is working on its own Lake Crest chips, and Google is developing its Tensor Processing Units.

Fujitsu is now throwing its hat into the deep-learning ring, as well. At the International Supercomputing Conference, the Japanese server and supercomputer manufacturer announced its intention to build a Deep Learning Unit (DLU) AI processor before the end of its fiscal 2018, which runs from April 2018 to March 2019. Fujitsu's Takumi Maruyama claimed the DLU chip will offer a ten-fold improvement in performance-per-watt compared to competitors' silicon, though it's not clear whether a training or inferencing workload was used to make that claim.

The DLU chip will be composed of an array of Deep Learning Processing Units (DPUs) connected using a high-performance fabric. A dedicated master core manages the collection of DPUs and the interaction between DPUs and the on-chip memory controller. The chip has native support for FP16, FP32, INT16, and INT8 datatypes. Fujitsu says the low-precision integer datatypes can be used effectively with some deep-learning applications to reduce power consumption while maintaining acceptable accuracy. The company says the chips will have a simple pipeline to reduce hardware complexity and an on-chip network for DPU-to-DPU communication.
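
To illustrate why low-precision integers can work for some deep-learning workloads, here is a minimal Python sketch of symmetric INT8 quantization. It is a generic textbook scheme, not Fujitsu's method: the function names and the sample weights are made up for illustration, and nothing here reflects how the DLU actually handles INT8 data.

```python
def quantize_int8(values):
    """Map floats onto the signed 8-bit range [-127, 127] with one shared scale.

    Generic symmetric quantization, assumed here for illustration only.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.82, -1.27, 0.05, 0.4, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("INT8 codes:", q)
print("largest round-trip error:", max_error)
```

Each value is stored in one byte instead of four, and the round-trip error stays within half a quantization step. Whether that loss is acceptable depends on the network and the task, which is why Fujitsu limits the claim to "some" applications.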

Fujitsu says the DLU will run using an all-new instruction set architecture (ISA) designed specifically for deep learning. Each DPU has 16 deep-learning processing elements (DPEs), each of which is made up of eight single-instruction, multiple-data (SIMD) execution units and a "very large" register file under full software control. The DLU will also use on-package HBM2 memory. The company promises that the design will be scalable using its proprietary Tofu interconnect technology.
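
The hierarchy described above works out to 128 SIMD execution units per DPU. The short Python sketch below makes the arithmetic explicit. The per-DPU figures come from Fujitsu's description, but the article gives no number of DPUs per chip, so the `num_dpus` argument is purely hypothetical.

```python
DPES_PER_DPU = 16        # stated by Fujitsu
SIMD_UNITS_PER_DPE = 8   # stated by Fujitsu


def simd_units(num_dpus):
    """Total SIMD execution units for a given (hypothetical) DPU count."""
    return num_dpus * DPES_PER_DPU * SIMD_UNITS_PER_DPE


print("SIMD units per DPU:", simd_units(1))
```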

The first-generation DLU silicon will be sold as a coprocessor, similar to the way that Nvidia offers its Tesla GPU compute products. Fujitsu plans to embed the DLU into a CPU starting with the second-generation products. Given the Japanese electronics manufacturer's links to the SPARC architecture, integration of DLUs into future Fujitsu SPARC chips seems most likely. The company didn't provide any estimates of a release date for these integrated chips.
