Late last year, AMD let us peek behind the curtain at its upcoming Radeon Instinct compute accelerators and Radeon Open Compute platform (ROCm). These accelerators are designed to challenge Nvidia in the fast-growing deep learning and machine intelligence markets. At the time, AMD was a bit coy about the Radeon Instinct MI25, an accelerator based on the highly anticipated Vega architecture. Now that these cards are close to shipping, AMD has opened up about the MI25's tech specs.
Like the MI6 and MI8 compute accelerators we learned about last December, the MI25 is a passively cooled dual-slot card intended for deep learning and HPC applications. It boasts 64 compute units and 4,096 stream processors, and it offers 12.3 TFLOPS of FP32 and 24.6 TFLOPS of FP16 peak performance. AMD didn't officially announce the GPU's clock rates, but using the above numbers, a little napkin math puts the peak clock at about 1500 MHz.
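That napkin math works because each stream processor can retire two FP32 operations per clock via a fused multiply-add. A quick sketch of the back-solve, using only the figures AMD quoted:

```python
# Back-solve the peak clock from AMD's quoted FP32 throughput.
# Peak FLOPS = stream processors x 2 FLOPs per clock (FMA) x clock rate.
STREAM_PROCESSORS = 4096
PEAK_FP32_TFLOPS = 12.3

clock_mhz = PEAK_FP32_TFLOPS * 1e12 / (STREAM_PROCESSORS * 2) / 1e6
print(round(clock_mhz))  # -> 1501, i.e. roughly 1500 MHz
```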
That kind of theoretical performance deserves potent memory, and AMD doesn't seem to be cutting corners in that regard. The MI25 will ship with 16GB of HBM2 ECC memory. With its 2048-bit memory interface, the MI25 should deliver 484 GB/s of memory bandwidth. AMD is once again teasing us regarding the card's High Bandwidth Cache Controller (HBCC), apparently wanting to keep details about it under its hat for just a little longer.
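HBM2 bandwidth is simply interface width times per-pin transfer rate. The quoted 484 GB/s figure lines up with Vega 10's two HBM2 stacks (1024 bits each, 2048 bits total) running at an assumed effective rate of about 1.89 GT/s per pin; the rate here is our inference, not an AMD-published number:

```python
# Sanity-check the quoted bandwidth: bytes/s = (bus bits / 8) * per-pin rate.
BUS_WIDTH_BITS = 2048   # two 1024-bit HBM2 stacks
DATA_RATE_GTPS = 1.89   # assumed effective per-pin transfer rate

bandwidth_gbps = BUS_WIDTH_BITS / 8 * DATA_RATE_GTPS
print(bandwidth_gbps)  # -> 483.84, in line with the quoted 484 GB/s
```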
The company also touts the MI25's power efficiency. The accelerator's 300W TDP might sound hefty to those used to reading power consumption figures for consumer graphics cards, but power-hungry hardware can still be efficient if it can get things done quickly. Judging by the MI25's peak FP16 throughput per watt, AMD points out that the card ought to outstrip Nvidia's competing Tesla products.
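The per-watt comparison AMD is drawing boils down to dividing peak FP16 throughput by board power. From the numbers above:

```python
# Efficiency from AMD's quoted figures: peak FP16 throughput per watt of TDP.
PEAK_FP16_TFLOPS = 24.6
TDP_WATTS = 300

gflops_per_watt = PEAK_FP16_TFLOPS * 1000 / TDP_WATTS
print(gflops_per_watt)  # -> 82.0 GFLOPS of FP16 per watt
```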
AMD plans to release the Radeon Instinct compute accelerators in the third quarter of this year. These units aren't likely to show up on the shelves of your local Micro Center, however. Instead, AMD will ship units to hardware partners including Boxx, Colfax, Exxact Corporation, Gigabyte, Inventec, and Supermicro.
There's also software to go along with the new hardware. AMD's ROCm platform is evolving at a rapid pace, and the company just announced that version 1.6 is ready for a June 29 rollout. The big news for this release is the inclusion of MIOpen 1.0, AMD's GPU-accelerated deep learning library. MIOpen offers support for multiple frameworks including Caffe, Google's TensorFlow, and Torch.