At his GTC Japan keynote, Nvidia CEO Jensen Huang noted that AI inferencing, or the use of trained neural-network models, is set to become a $20-billion market over the next five years. More and more applications will demand services like natural-language processing, translation, image and video search, and AI-driven recommendations, according to Nvidia. To power that future, the company is bringing the Turing architecture to data centers with the Tesla T4 inferencing card, and it's letting developers deploy models on those cards with the TensorRT Hyperscale Platform.
The Tesla T4 accelerator
The Tesla T4 has 320 Turing tensor cores and 2560 CUDA cores good for 8.1 TFLOPS of single-precision FP32 math, 65 TFLOPS of half-precision FP16, 130 TOPS of INT8, and 260 TOPS of INT4 throughput. By comparison, the older Tesla P40 inference accelerator can't perform accelerated half-precision FP16 operations at all, and it tops out at 47 TOPS for INT8.
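Those peak figures follow a simple pattern on Turing's tensor cores: each halving of precision below FP16 doubles throughput. A quick back-of-the-envelope sketch using only the numbers above (the variable names are ours, not Nvidia's):

```python
# Peak throughput figures quoted by Nvidia (TFLOPS for FP16, TOPS for INT8/INT4).
t4 = {"FP32": 8.1, "FP16": 65.0, "INT8": 130.0, "INT4": 260.0}
p40_int8 = 47.0

# Each step down in precision below FP16 doubles peak throughput.
assert t4["INT8"] == 2 * t4["FP16"]
assert t4["INT4"] == 2 * t4["INT8"]

# Rough generational uplift for INT8 inference versus the Tesla P40.
print(f"T4 vs. P40 INT8 uplift: {t4['INT8'] / p40_int8:.1f}x")
```

By this crude measure, the T4 offers roughly 2.8 times the P40's peak INT8 throughput at less than a third of its board power.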
Nvidia also claims the T4 has "twice the decoding performance" of past products for video streams, and it can handle up to 38 1920×1080 video streams for applications like video search, among many others. The T4 delivers that impressive performance from a passively-cooled card with a mere 75-W power envelope, compared to 250 W for the Tesla P40.
To allow developers to take advantage of servers and data centers stuffed with T4 accelerators, Nvidia also announced the TensorRT 5 inference optimizer and runtime. Nvidia says TensorRT can optimize trained neural networks from practically every major deep-learning framework to run on Tesla accelerators, and that it helps developers tune their models for reduced-precision calculation with INT8 and FP16 data types.
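Reduced-precision inference of this sort generally works by mapping FP32 weights and activations onto a narrow integer range with a per-tensor scale factor. Here's a minimal, illustrative sketch of symmetric INT8 quantization; note that the helper names are our own and that TensorRT derives its scales from calibration data rather than a simple maximum:

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map [-max|x|, +max|x|] onto [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# With symmetric rounding, the reconstruction error is bounded by
# half a quantization step (scale / 2).
print("max abs error:", np.abs(weights - restored).max())
```

The arithmetic then runs on INT8 tensor cores at the much higher TOPS figures above, at the cost of that bounded rounding error, which is why calibration (picking good scales) matters so much in practice.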
Nvidia will also provide a TensorRT inference server application as a containerized microservice to deploy deep-learning models to production systems. The company claims the TensorRT inference server keeps Tesla accelerators fed and ensures peak throughput from Nvidia hardware. The inference server application is ready for use with Kubernetes and Docker, and Nvidia will make it available through its GPU Cloud resource.