If you've ever looked through the subtitle options on a DVD or Blu-ray disc, you've likely noticed that there's often a set of subtitles for deaf and hard-of-hearing users. Much of the sound in the videos we watch isn't human speech, and those subtitles account for that by offering text descriptions of significant audio cues. YouTube has offered automatically generated speech captioning for years. Now, Google is turning its machine-learning powers toward sound effects to bring audio-effect subtitling to its streaming video service.
As with automatic speech captioning, sound-effect subtitling is starting out pretty basic, with just three labels: [LAUGHTER], [APPLAUSE], and [MUSIC]. Google explains in a blog post that while its machine-learning network can recognize many more types of sound, those three require the least contextual information. By contrast, Google engineer Sourish Chaudhuri explained that [RING] could be the ring of a bell, an alarm, or a phone.
One of the main challenges the researchers encountered was having the system make an educated guess when it came across two sound effects simultaneously. To work around that problem, the team added a duration rule: if a sound effect isn't detected for at least a certain period of time, it doesn't get mentioned in the subtitles.
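For a rough idea of what a duration rule like this looks like in practice, here is a minimal Python sketch. It is not Google's implementation: the event tuples, the `filter_short_events` helper, and the one-second threshold are all hypothetical, since the actual cutoff isn't given here.

```python
from typing import List, Tuple

# Hypothetical event format: (label, start_seconds, end_seconds).
Event = Tuple[str, float, float]


def filter_short_events(events: List[Event], min_duration: float = 1.0) -> List[Event]:
    """Keep only events lasting at least min_duration seconds.

    The 1.0-second default is an assumed value chosen for illustration.
    """
    return [
        (label, start, end)
        for label, start, end in events
        if end - start >= min_duration
    ]


# Made-up detections: a brief blip of laughter, then longer applause and music.
detected = [
    ("LAUGHTER", 3.0, 3.2),
    ("APPLAUSE", 5.0, 8.5),
    ("MUSIC", 10.0, 40.0),
]

captions = filter_short_events(detected, min_duration=1.0)
for label, start, end in captions:
    print(f"[{label}] {start:.1f}s to {end:.1f}s")
```

In this sketch the 0.2-second laughter detection is dropped, while the applause and music are kept as captions.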
Google's blog post goes pretty deep into the weeds on the topic. If you're interested in the applications of deep learning, it's worth a look.