Communications of the ACM

ACM TechNews

Extracting Audio From Visual Information

An algorithm recovered speech from the vibrations of a potato chip bag filmed through soundproof glass.

Credit: Christine Daniloff/MIT

Massachusetts Institute of Technology (MIT) researchers have developed an algorithm that can reconstruct an audio signal by analyzing tiny vibrations of objects depicted in video.

During testing, the researchers recovered intelligible speech from the vibrations of a potato chip bag filmed from 15 feet away through soundproof glass. "The motion of this vibration creates a very subtle visual signal that's usually invisible to the naked eye," says MIT's Abe Davis.

Reconstructing audio from video requires that the video's frame rate, which acts as the sampling frequency, exceed the frequency of the audio signal being recovered. The MIT researchers also measured the mechanical properties of the objects being filmed and found that the vibration displacements were only about a tenth of a micrometer.
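The sampling constraint follows from standard signal-processing theory: a camera capturing F frames per second can, by the Nyquist criterion, faithfully reconstruct only frequencies below F/2. A minimal sketch of that arithmetic (the frame rates below are illustrative, not figures from the article):

```python
def max_recoverable_frequency_hz(frame_rate_hz: float) -> float:
    """Nyquist limit: frequencies at or above half the sampling
    (frame) rate cannot be reconstructed without aliasing."""
    return frame_rate_hz / 2.0

# An ordinary 60 fps camera can only capture content below 30 Hz,
# far too low for speech (roughly 100-4000 Hz), which is why this
# kind of work typically relies on high-speed cameras.
print(max_recoverable_frequency_hz(60))    # 30.0
print(max_recoverable_frequency_hz(8000))  # 4000.0
```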

The researchers borrowed a technique from earlier algorithms that amplifies small variations in video, making visible previously undetectable motions, such as the breathing of an infant in a hospital neonatal ward or the pulse in a subject's wrist. Building on this technique, they created a new program that infers the motion of an object as a whole when it is struck by sound waves.
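The core idea behind that borrowed technique, amplifying small temporal variations, can be illustrated on a single pixel's brightness over time. This is a toy sketch only: the actual motion-magnification algorithms operate on spatially filtered image pyramids with temporal band-pass filtering, not on raw pixel values as here.

```python
import numpy as np

def magnify(signal: np.ndarray, alpha: float) -> np.ndarray:
    """Amplify deviations from the temporal mean by a factor alpha,
    making tiny fluctuations visible."""
    mean = signal.mean()
    return mean + alpha * (signal - mean)

# A pixel whose brightness barely flickers around 100...
pixel = np.array([100.0, 100.02, 99.98, 100.01, 99.99])
# ...shows clearly visible variation after 50x magnification.
magnified = magnify(pixel, alpha=50.0)
```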

The new algorithm aligns all the measurements so they will not cancel each other out, assigning greater weight to measurements made at very distinct edges.
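A minimal sketch of that alignment-and-weighting step, under the assumption (not spelled out in the article) that the algorithm has one motion signal per local image region plus an edge-strength score for each region: signals anti-correlated with a reference are sign-flipped so they add constructively, then averaged with edge-based weights.

```python
import numpy as np

def combine(signals: np.ndarray, edge_strengths: np.ndarray) -> np.ndarray:
    """Combine per-region motion signals into one audio estimate.
    signals: (n_regions, n_frames); edge_strengths: (n_regions,)."""
    # Use the region with the strongest edges as the alignment reference.
    reference = signals[np.argmax(edge_strengths)]
    aligned = []
    for s in signals:
        # Flip any signal anti-correlated with the reference so the
        # measurements do not cancel each other out.
        sign = np.sign(np.dot(s, reference)) or 1.0
        aligned.append(sign * s)
    aligned = np.array(aligned)
    # Weight regions with distinct edges more heavily.
    weights = edge_strengths / edge_strengths.sum()
    return weights @ aligned

# Two regions observing the same vibration with opposite sign:
out = combine(np.array([[1.0, -1.0, 1.0], [-1.0, 1.0, -1.0]]),
              np.array([2.0, 2.0]))
```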

From MIT News


Abstracts Copyright © 2014 Information Inc., Bethesda, Maryland, USA

