Google boosts machine learning with its Tensor Processing Unit

Google has some new hardware out, and no, it's not a Nexus device. The search giant makes extensive use of machine learning to power services like RankBrain and Street View, and it felt it could give those tasks a little more oomph. Enter the Tensor Processing Unit, or TPU for short.

The TPU is a custom-designed ASIC small enough to fit into a hard drive slot in Google's data center racks. Although the TPU has only just been revealed to the world, Google says it has actually been using the hardware in its data centers for over a year as a "stealthy project." Google's engineers say the TPU offers a 10x performance-per-watt improvement over off-the-shelf solutions for machine learning tasks. Unsurprisingly, the TPU is optimized for the company's open-source TensorFlow machine intelligence library.

Google's TPU does use a dirty trick of sorts: it works with "reduced precision," roughly meaning that the results of an operation are approximations of the "proper" result. Although the notion may sound counterintuitive at first, it's a perfectly acceptable method in some computing tasks. For example, a number of algorithms for calculating a square root work by refining an approximation until its deviation from the actual answer is small enough not to matter. Tailoring the TPU to reduced-precision tasks apparently netted Google big gains in hardware design, letting the company kill off a substantial number of transistors that would otherwise have been necessary for common operations.
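
The square-root example above is the easiest way to see the successive-approximation idea. Below is a minimal sketch in Python using Newton's method; the function name and tolerance value are illustrative choices, not anything Google has described.

    # Refine a guess for sqrt(x) until the deviation from the true
    # answer is small enough to not matter (assumes x >= 0).
    def approx_sqrt(x, tolerance=1e-6):
        guess = x / 2.0 if x > 1 else 1.0
        while abs(guess * guess - x) > tolerance:
            guess = (guess + x / guess) / 2.0  # Newton's update step
        return guess

    print(approx_sqrt(2.0))  # ~1.4142135, good to the chosen tolerance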

Comments closed
    • Voldenuit
    • 4 years ago

    I’m a little TPU, short and stout,
    This is my #handle, this is my sprint-out
    When the code is compiled hear me shout,
    Telnet into me and fprint out.

    • tipoo
    • 4 years ago

    “it works with “reduced precision,” roughly meaning that the results of an operation are approximations of the “proper” result.”

    Does this mean FP16? Nvidia was also billing the (nearly double) half-precision rate for machine learning.

      • NoOne ButMe
      • 4 years ago

      int8
      [url<]http://www.nextplatform.com/2016/05/19/google-takes-unconventional-route-homegrown-machine-learning-chips/[/url<]
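
      For a sense of what 8-bit integer inference means, here is a minimal sketch of symmetric linear quantization with NumPy; the names and the single per-tensor scale are illustrative assumptions, not Google's actual scheme.

          import numpy as np

          # Map float32 weights onto the int8 range with one shared scale.
          weights = np.random.randn(4, 4).astype(np.float32)
          scale = np.abs(weights).max() / 127.0

          q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
          restored = q.astype(np.float32) * scale  # approximate reconstruction

          print("max error:", np.abs(weights - restored).max())  # small, not zero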

    • derFunkenstein
    • 4 years ago

    [quote<]Google's TPU does use a dirty trick of sorts: it works with "reduced precision," roughly meaning that the results of an operation are approximations of the "proper" result.[/quote<] I thought I had heard something like this before, and I found [url=https://techreport.com/news/27978/nvidia-pascal-to-feature-mixed-precision-mode-up-to-32gb-of-ram<]this article[/url<] about Pascal. Reduced/mixed precision to increase throughput is apparently all the rage these days. I'm still looking for the TR article on Android Auto and Tegra, because I'm pretty sure there's more about it there, too.

      • willmore
      • 4 years ago

      Wasn’t FP16 a big thing when HL2 came out with an HDR version? Maybe the cards only used FP16 for the frame buffer and internally did all their math in FP24 or FP32? Seems like a horrible waste.

      • Redocbew
      • 4 years ago

      It’s true for a lot of algorithms outside of graphics or AI, too. Many of the common tests for primality are probabilistic to some degree, meaning there’s a very small chance they could return the “wrong” answer. There are also a number of cases where algorithms that are intractable on large data sets become much easier to work with once some degree of probability is introduced.
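
      As one concrete example of such a probabilistic test, here is a compact Miller-Rabin sketch in Python; each random round can wrongly report "probably prime" with probability at most 1/4, so the overall error shrinks geometrically with the round count (the default of 20 rounds is an arbitrary choice).

          import random

          def is_probably_prime(n, rounds=20):
              # Never wrongly reports "composite"; may wrongly report
              # "probably prime" with probability <= (1/4) ** rounds.
              if n < 2:
                  return False
              for p in (2, 3, 5, 7):
                  if n % p == 0:
                      return n == p
              d, r = n - 1, 0
              while d % 2 == 0:  # write n - 1 as d * 2**r with d odd
                  d //= 2
                  r += 1
              for _ in range(rounds):
                  a = random.randrange(2, n - 1)
                  x = pow(a, d, n)
                  if x in (1, n - 1):
                      continue
                  for _ in range(r - 1):
                      x = pow(x, 2, n)
                      if x == n - 1:
                          break
                  else:
                      return False  # definitely composite
              return True  # probably prime

          print(is_probably_prime(2**61 - 1))  # True: 2**61 - 1 is prime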

    • chuckula
    • 4 years ago

    Pshaw. The original Pentium was doing this in 1993.

    #FDIV4LIFE

      • maxxcool
      • 4 years ago

      I wish I had more pluses to give 🙁

        • BurntMyBacon
        • 4 years ago

        That’s OK. I gave one by proxy. 🙂

      • Deanjo
      • 4 years ago

      That’s what I like about Intel: they’re not just satisfied with the FDIV bug, they make sure to upgrade the bugs with newer releases as well.

      [url<]https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/[/url<]

      • UberGerbil
      • 4 years ago

      I still have an original 90MHz Pentium system with that bug. I think the thing would boot if I plugged it in, but it’s been sitting in my junk closet for almost 20 years.

        • cygnus1
        • 4 years ago

        Ugh, that just made me feel really old…

          • BurntMyBacon
          • 4 years ago

          THAT made you feel old!?! $#1+

            • cygnus1
            • 4 years ago

            Yeah, things I can very clearly remember being in the 2+ decade ago range are still kind of new to me… Mid-30s are rough 😀

            • Voldenuit
            • 4 years ago

            Haha, yeah. My first PC, an Apple II clone, had a 1 MHz MOS 6502 CPU.

      • ronch
      • 4 years ago

        And it wasn’t until the Phenom II that AMD managed to catch up and put out similar functionality.

        • jihadjoe
        • 4 years ago

        Wait, wasn’t it the Phenom I that had problems?
