So you remember Google's Tensor Processing Unit? If not, all you really need to know is that the chip is a custom ASIC designed by Google to accelerate the inference phase of machine learning tasks. Google initially said that the TPU could improve performance-per-watt in those tasks by a factor of ten compared to traditional CPUs and GPUs. Now, the company has released hard numbers in the form of a study analyzing the TPU's performance since its quiet introduction in 2015.
The short version is that predicting a 10x uplift in performance-per-watt was Google's way of being modest. The actual increase for that metric was between 30 and 80 times that of regular solutions, depending on the scenario. When it comes to raw speed, Google says its TPU is between 15 and 30 times faster than standard hardware. The software that runs on the TPU is based on Google's TensorFlow machine learning framework, and some of these performance gains came from optimizing it. The authors of the study say that there are further optimization gains on tap, too.
Apparently, Google saw the need for a chip like the TPU as far back as six years ago. Google uses machine learning algorithms in many of its products, including Image Search, Photos, Cloud Vision, and Translate. By its nature, machine learning is computationally intensive. By way of example, the Google engineers said that if people used voice search for three minutes a day, running the associated speech recognition tasks without the TPU would have required the company to operate twice as many datacenters.