Communications of the ACM

ACM TechNews

Humans, Cover Your Mouths: Lip Reading Bots in the Wild

The new algorithm outperforms professional human lip readers.

New studies show that a machine can understand what you are saying without hearing a sound.

Credit: Joon Son Chung

Researchers at Oxford University in the U.K. and Google have developed an algorithm that has outperformed professional human lip readers, a breakthrough they say could lead to surveillance video systems that can show the content of speech in addition to the actions of an individual.

The researchers developed the algorithm by training a Google DeepMind neural network on thousands of hours of subtitled BBC TV videos, which show a wide range of people speaking in a variety of poses, activities, and lighting conditions.

The neural network, dubbed Watch, Listen, Attend, and Spell (WLAS), learned to transcribe videos of mouth motion to characters, using more than 100,000 sentences from the videos. By translating mouth movements into individual characters, WLAS was able to spell out words.
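
To make the character-by-character idea concrete, here is a minimal sketch in Python. It is not the researchers' code: the four-symbol alphabet, the `greedy_spell` helper, and the score values are all hypothetical, and picking the single best character at each step is a simplification, since the article does not say how WLAS chooses its output.

```python
# Toy illustration only: the alphabet and scores are invented, not WLAS output.
ALPHABET = ["h", "i", " ", "<eos>"]


def greedy_spell(step_scores, alphabet=ALPHABET, eos="<eos>"):
    """Pick the highest-scoring character at each step until end-of-sentence."""
    chars = []
    for scores in step_scores:
        # Index of the largest score in this step's row.
        best = max(range(len(scores)), key=scores.__getitem__)
        symbol = alphabet[best]
        if symbol == eos:
            break
        chars.append(symbol)
    return "".join(chars)


# Hypothetical per-step scores over ALPHABET, one row per output character.
scores = [
    [0.90, 0.05, 0.03, 0.02],  # most likely "h"
    [0.10, 0.80, 0.05, 0.05],  # most likely "i"
    [0.05, 0.05, 0.10, 0.80],  # most likely end of sentence
]

print(greedy_spell(scores))
```

In a real system, each row of scores would come from a network that has watched the video frames; here they are typed in by hand to show only the final spelling step.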

The Oxford researchers found a professional lip reader could correctly decipher less than 25% of the spoken words, while the neural network was able to decipher 50% of the spoken words.

From ZDNet


Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA

