Google’s Cloud TPU is ready for training and inference

The world of big-iron computing seems to be laser-focused on machine learning these days. Whether it's graphics-chip makers Nvidia and AMD producing silicon tailored to machine learning, or Microsoft's Bing search engine using custom FPGAs to accelerate repetitive mathematical operations, every technology company these days seems to have an AI accelerator strategy.

Google isn't resting on its laurels, either. At its I/O conference today, the company introduced a second-generation version of last year's Tensor Processing Unit, called the Cloud TPU. Google packs four of the chips onto a single Cloud TPU module, and the module pictured below offers up to a claimed 180 TFLOPS of floating-point capability. The search giant didn't say whether that's for FP16 or FP32 math, but given the hardware's focus on machine-learning tasks, that figure surely refers to reduced-precision number crunching. For comparison, Nvidia's just-introduced Tesla V100 accelerator leans on dedicated tensor hardware to provide 120 machine-learning TFLOPS.

Picture courtesy TR's anonymous Google I/O correspondent

The new TPUs can be assembled into what Google calls pods. Each pod contains 64 second-generation TPUs and should be good for an aggregate 11.5 PFLOPS of compute power. The previous TPU was used for inference (execution tasks) only, but Google says the new TPU can be used for both training and inference tasks. As an example, the company says that a training task that required a full day for "32 of the best commercially-available GPUs" can be done in an afternoon on eight of the new TPUs.
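The quoted figures hang together arithmetically. A quick sanity check, assuming the breakdown of four 45-TFLOPS chips per Cloud TPU module and 64 modules per pod:

```python
# Back-of-the-envelope check of the claimed Cloud TPU throughput figures.
# Assumed breakdown: 4 chips per module, 64 modules per pod.
chip_tflops = 45                        # per-chip rate implied by the 180-TFLOPS module
module_tflops = 4 * chip_tflops         # one Cloud TPU module
pod_pflops = 64 * module_tflops / 1000  # one pod, converted to PFLOPS

print(module_tflops)  # 180
print(pod_pflops)     # 11.52
```

That works out to 11.52 PFLOPS per pod, which lines up with Google's quoted 11.5 PFLOPS.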

Google will be selling access to its TPUs through its Google Cloud Compute platform, where virtual machines can group TPUs, Nvidia GPUs, and Intel CPU cores together as required to run the TensorFlow machine-learning framework. Gerbils who have used Shazam to identify a music track recently may have had results provided by an Nvidia GPU that's part of Google's Compute Cloud. While most access to the Cloud Compute platform will be paid, the company will provide access to 1,000 TPUs to machine-learning researchers as part of its TensorFlow Research Cloud.

Comments closed
    • moog
    • 4 years ago

    What’s the perf/W and how does it compare to Volta?

      • frenchy2k1
      • 4 years ago

      Well, Volta boasts 120 TFLOPS of tensor operations at 300 W, and Nvidia sells its DGX-1 server (8x Volta in a 4U, similar to what seems to be here) as 960 TFLOPS.
      So, not sure about perf per W, but in pure perf and perf density, Volta seems to win.

    • Leader952
    • 4 years ago

    The new TPU is only 45 TFLOPS, not 180 TFLOPS as stated in the article.

    Four of these TPUs are bolted in a module that has 180 TFLOPS combined.

    [quote<]Google has developed its second-generation tensor processor—four 45-teraflops chips packed onto a 180 TFLOPS tensor processor unit (TPU) module[/quote<]

    • K-L-Waster
    • 4 years ago

    Now those are some seriously tall tower heat sinks.

      • willmore
      • 4 years ago

      It’s hard to grasp the scale, but that board looks like it’s 6″x12″, so maybe 4″ tall?

      That’s not crazy in the PC world. I’m pretty sure I have old over-the-top coolers in my collection that positively dwarf these. Then again, if I’m off on the scale…

        • UberGerbil
        • 4 years ago

        Yeah, but have you looked in a server rack lately? If the heatsinks are 3.5″ then it might fit in a 2U rack, but I doubt it.

          • willmore
          • 4 years ago

          Sure, but look at the second picture. I assume some of those racks are full of these boards. So it seems Google managed to stuff them into racks somehow, so crisis averted?
