Google points deep-learning machines at audio effect subtitles

If you've ever looked through the subtitle options on a DVD or Blu-ray disc, you've likely noticed that there's often a set of subtitles for deaf and hard-of-hearing viewers. Much of the sound in the videos we watch isn't human speech, and those subtitles account for that by offering text descriptions of significant audio cues. YouTube has offered automatically generated speech captioning for years. Now, Google is turning its machine-learning powers toward sound effects to bring audio-effect subtitling to its streaming video service.

As with automatic speech captioning, sound-effect subtitling is starting out pretty basic, with [LAUGHTER], [APPLAUSE], and [MUSIC] labels. Google explains in its blog post that while its machine-learning network can recognize many more types of sound, those three require the least contextual information. By contrast, Google engineer Sourish Chaudhuri explained that [RING] could be the ring of a bell, an alarm, or a phone.

One of the main challenges the researchers encountered was having the system make an educated guess when it detected two sound effects simultaneously. To work around that problem, the team added a duration rule: if a sound effect isn't detected continuously for at least a certain period of time, it doesn't get mentioned in the subtitles.
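A duration rule like the one described above can be sketched in a few lines. This is a minimal illustration, not Google's actual implementation: the labels, frame length, and threshold below are all assumed values chosen for the example.

```python
# Illustrative sketch of a caption duration rule: per-frame classifier
# labels are only surfaced as caption events when the same label
# persists for at least MIN_DURATION seconds. All names and numbers
# here (labels, frame length, threshold) are hypothetical.

MIN_DURATION = 0.5  # seconds a sound must persist before it's captioned
FRAME_LEN = 0.1     # assumed classifier frame length, in seconds

def caption_events(frame_labels, min_duration=MIN_DURATION,
                   frame_len=FRAME_LEN):
    """Collapse per-frame labels into (start, end, label) caption
    events, dropping any run shorter than min_duration."""
    events = []
    run_label, run_start, run_len = None, 0.0, 0
    for i, label in enumerate(frame_labels + [None]):  # sentinel flush
        if label == run_label:
            run_len += 1
            continue
        # Close out the previous run if it lasted long enough.
        if run_label is not None and run_len * frame_len >= min_duration:
            events.append((run_start, run_start + run_len * frame_len,
                           run_label))
        run_label, run_start, run_len = label, i * frame_len, 1
    return events

# A brief [APPLAUSE] blip between two laughter runs gets suppressed.
frames = ["LAUGHTER"] * 8 + ["APPLAUSE"] * 2 + ["LAUGHTER"] * 6
for start, end, label in caption_events(frames):
    print(f"[{label}] {start:.1f}s -> {end:.1f}s")
# prints:
# [LAUGHTER] 0.0s -> 0.8s
# [LAUGHTER] 1.0s -> 1.6s
```

The two-frame [APPLAUSE] run lasts only 0.2 s, below the 0.5 s threshold, so it never reaches the subtitle track.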

Google's blog post goes pretty deep into the weeds on the topic. If you're interested in the applications of deep learning, it's worth a look.

Comments closed
    • vikas.sm
    • 3 years ago

    One step closer to skynet!

    • Stochastic
    • 3 years ago

    I find Google’s speech-to-text when using the Google Assistant to be impressively adept, but for whatever reason the automatic subtitles on YouTube are very hit-and-miss.

    • chuckula
    • 3 years ago

    Quasi-related but the alpha 4.0 version of the Tesseract OCR application, which has been taken over by Google, is now using a trained neural network as its detection engine. This is the first iteration of the software that uses neural networks for OCR (yes, believe it or not many OCR engines don’t use neural networks).

    https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

      • not@home
      • 3 years ago

      For a second there I thought you were saying that Google is not only building a Tesseract, but that they were on their fourth version of it. I was thinking “when did Google get so cool?”

        • UberGerbil
        • 3 years ago

        No, it’s Elon Musk who’s building the Tesseract. His tunnels are going to run through it.
