Nvidia powers up deep-learning inference with Tesla P4 and P40

— 10:32 AM on September 13, 2016

At the GPU Technology Conference in Beijing, Nvidia CEO Jen-Hsun Huang announced the next two cards in the Tesla family of high-performance compute accelerators. The Tesla P4 is built on the same GP104 silicon as the GeForce GTX 1080, while the Tesla P40 relies on the GP102 chip found in the new Pascal Titan X.

Nvidia has been throwing its weight behind deep learning for some time now. In fact, the Pascal architecture itself debuted with the Tesla P100 and the DGX-1 deep-learning system. The P100 is well-suited to neural-network training, which Nvidia says benefits greatly from the chip's massive FP16 throughput. By contrast, the new P4 and P40 are aimed at neural-network inferencing, a task that's more sensitive to latency but doesn't demand the same degree of floating-point precision.

As a result, the new cards are designed specifically for high-speed operations on eight-bit integers. The Pascal chips in the new Tesla P4 and P40 can pack four INT8 operations into the slot of a single FP32 fused multiply-add. That capability has huge implications for performance in the neural-network inferencing applications these accelerators are intended for, assuming developers can optimize for the low-precision data type.
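For the curious, here's what that packed INT8 math looks like from CUDA. This is our own minimal sketch, not Nvidia sample code; it leans on the __dp4a intrinsic that CUDA 8 exposes for sm_61-class chips like GP102 and GP104, which multiplies four packed 8-bit values from each operand and adds all four products into a 32-bit accumulator in a single instruction:

    // Minimal sketch of Pascal's packed INT8 path (our illustration).
    // Compile with: nvcc -arch=sm_61 dp4a.cu
    #include <cuda_runtime.h>

    __global__ void int8_dot(const int* a, const int* b, int n, int* out)
    {
        // Each 32-bit int here carries four packed INT8 values, so n
        // counts groups of four elements, not individual bytes.
        int acc = 0;
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            acc = __dp4a(a[i], b[i], acc);  // four INT8 MACs per instruction

        atomicAdd(out, acc);                // reduce partial sums across the block
    }

The appeal is that this work runs on the same cores that would otherwise handle FP32 math, but at four elements per lane, and that density is exactly what the performance figures below rest on.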

With that emphasis on low-precision math, Nvidia is introducing a new measure of performance: "TOPS," or tera-operations per second. It's essentially the integer counterpart to the familiar FLOPS measurement, and Nvidia says the Tesla P40 is capable of 47 TOPS. The company also claims the P40 delivers four times the real-world inferencing performance of the previous-generation Tesla M40.
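That 47-TOPS figure squares with simple peak-rate arithmetic. If we assume the P40 ships with GP102's full complement of 3840 CUDA cores at a boost clock around 1.53 GHz (our back-of-the-envelope figures, not specs Nvidia confirmed on stage), each core retiring one packed four-wide INT8 multiply-accumulate per cycle works out to:

    3840 cores × 1.53 GHz × 8 INT8 ops per core per clock ≈ 47 TOPS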

While the Tesla P40 is a 250W part focused on maximum single-card performance, the Tesla P4 is all about density and power efficiency. It's a single-slot card with a configurable 50W or 75W TDP. To bring the GP104 GPU down to those numbers, the card is limited to a relatively modest 1063-MHz boost clock. Despite that, Nvidia says the card is good for 22 TOPS, nearly half the figure of its more power-hungry sibling. Nvidia also claims the Tesla P4 is forty times more efficient than Xeon-based servers in inferencing applications.
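The same napkin math holds up here, too. Assuming the P4 uses GP104's full 2560 CUDA cores (again, our assumption), 2560 × 1.063 GHz × 8 INT8 ops per core per clock comes to about 21.8 TOPS, which Nvidia rounds up to 22.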

Both of the new Tesla cards are equipped with GDDR5 memory. The Tesla P4 gets 8GB of memory clocked at 6 GT/s, while the much larger Tesla P40 gets 24GB of memory clocked at 7.2 GT/s bolted to the GP102 GPU's 384-bit bus.
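Those transfer rates translate into peak bandwidth in the usual way: rate times bus width. The P40's 7.2 GT/s across 48 bytes of bus works out to about 346 GB/s, while the P4's 6 GT/s across GP104's 256-bit (32-byte) bus comes to 192 GB/s.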

Fancy new hardware is useless without fancy new software, of course, and Nvidia also announced two new software products this morning. TensorRT is a library meant to make it easy for developers to move their existing neural nets onto Pascal's INT8 inferencing path. Once that's done, they can turn to the DeepStream SDK, a video-analysis library that taps the hardware video-decode engines built into the Pascal GPUs on the new Tesla cards. Nvidia says DeepStream is more than 13 times faster than CPU-based analysis.
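Nvidia didn't detail how TensorRT performs that migration, but the standard recipe for running an FP32-trained network at INT8 is linear quantization: pick a scale factor per tensor, map values into the -127-to-127 range, and let integer math do the heavy lifting. Here's a hypothetical sketch of that step, with function names of our own invention rather than TensorRT's actual API:

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical illustration of symmetric INT8 quantization; this is
     * not TensorRT's API. The scale maps a tensor's largest magnitude to
     * the top of the INT8 range, so q = round(x / scale). */
    static float choose_scale(const float* x, int n)
    {
        float max_abs = 0.0f;
        for (int i = 0; i < n; i++)
            if (fabsf(x[i]) > max_abs)
                max_abs = fabsf(x[i]);
        return max_abs / 127.0f;
    }

    static void quantize(const float* x, int8_t* q, int n, float scale)
    {
        for (int i = 0; i < n; i++) {
            float v = roundf(x[i] / scale);
            if (v >  127.0f) v =  127.0f;   /* clamp to the INT8 range */
            if (v < -127.0f) v = -127.0f;
            q[i] = (int8_t)v;
        }
    }

Converting the 32-bit integer results back to floats is just a multiply by the input scales, which is why choosing those scales from representative data matters: a bad scale squanders most of the eight bits.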

Jen-Hsun didn't lay out pricing for the new cards, but we do know the P40 will be available in October, while the P4 will follow in November.
