If you've ever looked through the subtitle options on a DVD or Blu-ray disc, you've likely noticed that there's often a set of subtitles for deaf and hard-of-hearing users. Much of the sound in the videos we watch isn't pure human language, and those subtitles account for that, offering text descriptions of significant audio cues. YouTube has offered automatically generated speech captioning for years. Now, Google is turning its machine-learning powers toward sound effects to bring audio-effect subtitling to its streaming video service.
As with automatic speech captioning, sound-effect subtitling is starting out pretty basic, with just three labels: [LAUGHTER], [APPLAUSE], and [MUSIC]. Google explains in the blog post that while its machine-learning network is capable of recognizing many more types of sound, those three require the least contextual information to interpret. For contrast, Google engineer Sourish Chaudhuri explained that [RING] could be the ring of a bell, an alarm, or a phone.
One of the main challenges the researchers encountered was getting the system to make an educated guess when it came across two sound effects at the same time. To work around that problem, the team added a duration rule: if a sound effect isn't detected for at least a certain period of time, it doesn't get mentioned in the subtitles.
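To make the duration rule concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Google's implementation: the function name `caption_events`, the one-label-per-frame input, and the two-second threshold are all assumptions made for this example, since the article doesn't say what cutoff the team actually uses.

```python
from itertools import groupby


def caption_events(frame_labels, frame_seconds=1.0, min_seconds=2.0):
    """Collapse per-frame labels into (label, start, end) events.

    frame_labels: one detected label (or None for "nothing") per frame.
    frame_seconds and min_seconds are hypothetical values for illustration.
    Events shorter than min_seconds are dropped, mimicking the duration rule.
    """
    events = []
    position = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        start = position * frame_seconds
        end = (position + length) * frame_seconds
        position += length
        # Skip silence and any sound that did not last long enough.
        if label is not None and end - start >= min_seconds:
            events.append((label, start, end))
    return events


frames = [None, "MUSIC", "MUSIC", "MUSIC", "LAUGHTER",
          "MUSIC", "MUSIC", None, "APPLAUSE", "APPLAUSE"]
print(caption_events(frames))
# [('MUSIC', 1.0, 4.0), ('MUSIC', 5.0, 7.0), ('APPLAUSE', 8.0, 10.0)]
```

In this toy run, the single frame of laughter that interrupts the music is too short to earn a caption, so only the longer music and applause segments survive.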
Google's blog post goes pretty deep into the weeds on the topic. If you're interested in the applications of deep learning, it's worth a look.