Whether in self-driving cars or personal assistants, deep learning and artificial intelligence are some of the fastest-growing markets for GPUs right now in the form of compute accelerators. Nvidia's beastliest Pascal chip, GP100, isn't even available on a traditional graphics card yet—its home is on the Tesla P100 accelerator. That decision highlights just how important powerful GPUs are to deep-learning tasks like training and inference right now. With its Pascal Titan X and its family of Tesla accelerators, all running CUDA deep-learning programs supported by its cuDNN machine-learning libraries, Nvidia has established a dominant position in this exploding corner of high-performance computing.
Last week in Sonoma, California, AMD laid out an alternative path for companies and institutions looking to take advantage of the power of machine learning, and today, the company is revealing its hand with a new initiative called Radeon Instinct. Instinct is a hardware and software stack built around the Radeon Open Compute Platform, or ROCm. Formerly known as the Boltzmann Initiative, ROCm provides the foundation for running HPC applications on heterogeneous accelerators powered by AMD graphics cards. With the release of Radeon Instinct, ROCm can accelerate common deep-learning frameworks like Caffe, Torch 7, and TensorFlow on AMD hardware. All that software runs atop a new series of Radeon Instinct compute accelerators that we'll discuss in a moment.
On top of ROCm, deep-learning developers will soon have the opportunity to use a new open-source library of deep-learning functions called MIOpen that AMD intends to release in the first quarter of next year. This library offers a range of functions pre-optimized for execution on Radeon Instinct cards, like convolution, pooling, activation, normalization, and tensor operations. AMD says that convolution operations performed with MIOpen are nearly three times faster than those performed using the widely used "general matrix multiplication" (GEMM) function from the standard Basic Linear Algebra Subprograms specification. That speed-up is important because convolution operations make up the majority of program run time for a convolutional neural network, according to Google TensorFlow team member Pete Warden.
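For the curious, the GEMM approach that MIOpen's optimized kernels aim to beat can be sketched in a few lines. The idea is to unroll every filter-sized patch of the input into the column of a matrix ("im2col"), so the whole convolution collapses into one big matrix multiply that a BLAS library can chew through. The function names and shapes here are illustrative, not AMD's API:

```python
# Sketch of convolution-as-GEMM (the im2col technique). Single-channel,
# "valid" padding, for clarity; real frameworks batch many channels.
import numpy as np

def im2col(x, k):
    """Unroll every k-by-k patch of a 2D image into one matrix column."""
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((k * k, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_gemm(x, kernel):
    """Valid 2D convolution (cross-correlation) via one matrix multiply."""
    k = kernel.shape[0]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    cols = im2col(x, k)          # the memory-hungry unrolling step
    out = kernel.ravel() @ cols  # the single GEMM call a BLAS library runs
    return out.reshape(oh, ow)

# Tiny check against a direct sliding-window implementation.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
kern = rng.standard_normal((3, 3))
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * kern) for j in range(6)]
                   for i in range(6)])
assert np.allclose(conv2d_gemm(x, kern), direct)
```

The im2col step duplicates overlapping patches in memory, which is part of why purpose-built convolution kernels like MIOpen's can beat the generic GEMM route.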
Deep-learning applications built atop ROCm and MIOpen will be accelerated by three new Radeon Instinct cards for use in servers. All three of these accelerators are passively cooled, and each one has a specific niche. The Radeon Instinct MI6 uses a Polaris GPU with 16GB of RAM to deliver up to 5.7 TFLOPS of FP16 or FP32 throughput. It'll also offer up to 224 GB/s of memory bandwidth. All that performance comes in a 150W TDP. This card is meant for inference work. Meanwhile, the Radeon Instinct MI8 employs a Fiji GPU equipped with 4GB of HBM RAM to offer 8.2 TFLOPS of FP16 or FP32 throughput alongside 512 GB/s of memory bandwidth. AMD expects this card to be useful as an inference accelerator or for more general HPC tasks.
The most exciting Radeon Instinct card, the MI25, echoes the debut of Nvidia's GP100 GPU on the Tesla P100 accelerator. This card is the first product with AMD's next-generation Vega GPU on board. Most of the details of Vega and its architecture remain secrets for the moment, but we can say that it will support packed FP16 math. That means it can achieve twice the throughput for FP16 data compared to FP32, an important boost for machine-learning applications that don't need the extra precision. The Polaris and Fiji cards in the Radeon Instinct line support native FP16 data types to make more efficient use of register and memory space, but they can't perform the packed-math magic of Vega to enjoy a similar performance speedup.
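The packed-FP16 idea is easy to demonstrate at the data level: two half-precision values fit in one 32-bit register lane, so a SIMD unit that can issue a packed instruction operates on both per clock. The snippet below illustrates only the data layout and the throughput consequence; Vega's actual instructions aren't public yet:

```python
# Two FP16 values packed into one 32-bit word -- the storage layout
# behind "packed FP16" math. Layout demo only, not AMD's instruction set.
import numpy as np

pair = np.array([1.5, -2.25], dtype=np.float16)  # two FP16 values (4 bytes)
packed = pair.view(np.uint32)[0]                 # one 32-bit word holds both

# Unpack to confirm the two halves round-trip intact.
unpacked = np.array([packed], dtype=np.uint32).view(np.float16)
assert unpacked.tolist() == [1.5, -2.25]

# The throughput consequence: one FP32 op per lane per clock, or two FP16.
fp32_ops_per_lane = 1
fp16_ops_per_lane = 2
assert fp16_ops_per_lane == 2 * fp32_ops_per_lane
```

Polaris and Fiji get the storage half of this picture (native FP16 types save register and memory space) but still execute one operation per lane per clock, which is why they don't see the 2x speedup.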
The MI25 will slot into a TDP under 300W. AMD isn't talking about the amount or type of memory on board this card yet. You'll have to hold tight for Vega's official announcement to learn what the "High Bandwidth Cache and Controller" are, as well. Sorry.
We can't resist using these early Vega specs to make some guesses about whatever fully-enabled consumer Radeon card will come with this graphics chip on board, though. If the MI6 and MI8 take their names from their peak FP16 throughput, we can work back from the name and peak FP16 performance of the MI25 to make some educated guesses. If the Vega GPU on board this card features similar raw specs to AMD's last big chip, the Fiji GPU on the R9 Fury X, a roughly 1500-MHz clock speed and a theoretical 4096-stream-processor shader array would shake out to about 12.5 TFLOPS of FP32 performance. AMD hasn't (and wouldn't) confirm anything about real-world implementations of Vega, though, so take our guesses for what they are.
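The arithmetic behind that guess is simple enough to check. Each stream processor can retire one fused multiply-add (two FLOPs) per clock, so peak throughput is just shaders times two times clock speed. The shader count and clock below are our Fiji-derived assumptions, not confirmed Vega specs:

```python
# Back-of-the-envelope peak-throughput math for the MI25 guess.
# 4096 shaders and ~1.5 GHz are assumptions borrowed from Fiji, not
# confirmed AMD figures for Vega.
def peak_tflops(shaders, clock_ghz, flops_per_clock=2):
    """Theoretical peak: each shader does one FMA (2 FLOPs) per clock."""
    return shaders * flops_per_clock * clock_ghz / 1000.0  # GFLOPS -> TFLOPS

fp32 = peak_tflops(4096, 1.5)  # about 12.3 TFLOPS FP32
fp16 = 2 * fp32                # packed FP16 doubles it, near the "25" in MI25
```

Working the other direction, hitting 12.5 TFLOPS FP32 exactly on a 4096-shader chip would take a clock of roughly 1525 MHz, which is why "about 1500 MHz" falls out of the card's name.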
If Radeon Instinct products have any hope of competing in the marketplace, they need to perform—and it seems as though they will. AMD teased a machine-learning benchmark result for two of its cards using the DeepBench GEMM test. Compared to the Maxwell Titan X as a baseline, the Pascal Titan X and the Radeon Instinct MI8 both offer about a 1.3x speedup for this bench, at least going by AMD's numbers. The MI25 achieves nearly two times the monster Maxwell's performance. According to DeepBench's maintainers, the program uses vendor-supplied libraries where possible, so it should be at least somewhat representative of the performance one could expect to achieve using cuDNN or MIOpen-accelerated code.
Radeon Instinct accelerators will also support hardware virtualization through the company's MxGPU feature. In this case, MxGPU will present multiple virtual Radeon Instinct slices to guest operating systems performing deep-learning tasks. Nvidia touts its Tesla accelerators for virtual desktop infrastructure work, but it doesn't offer similar flexibility for deep-learning acceleration as far as we're aware.
While high-performance computing and machine learning may not be TR's regular cup of tea, our feeling is that AMD's Radeon Instinct salvo could offer some much-needed competition in a space that has been dominated by Nvidia's hardware and CUDA programming tools so far. Whether AMD's trio of accelerators and a more open software stack are enough to tempt companies to make the switch remains to be seen, but if they do, Instinct could help AMD establish an important foothold in the exploding, lucrative HPC market being driven by machine learning and artificial intelligence. We'll find out when Instinct products make their debut in the first half of next year.