Mu'tah University researcher Ahmad Hassanat is developing a computerized system that can analyze the shapes human lips make as they produce different sounds.
These shapes, called visemes, have been difficult to analyze because there are dozens of visemes for the 40 to 50 sounds that make up the English language. Hassanat is developing a system that can detect the visual signature of entire words, using the appearance of the tongue and teeth as well as the lips.
He trained the system by filming 10 women and 16 men of different ethnicities as they read passages of text. The computer first compared the recordings with a text it knew, then tried to guess what the volunteers were saying in a second video. When the system was allowed to use the same person's training speech, it identified about 75 percent of the words spoken. When that person's original training video was excluded from the analysis, however, the program's accuracy fell to 33 percent on average.
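The difference between the two accuracy figures comes down to whether the speaker being tested also appears in the training data. The sketch below illustrates that distinction only; it is not Hassanat's method. The clip records, the `session` field (1 for the training reading, 2 for the second video) and all function names are hypothetical, chosen for illustration.

```python
def speaker_dependent_split(clips):
    """Train on every speaker's first reading, test on every speaker's second."""
    train = [c for c in clips if c["session"] == 1]
    test = [c for c in clips if c["session"] == 2]
    return train, test


def speaker_independent_splits(clips):
    """Hold out one speaker at a time: that speaker's training clips are
    excluded, and only that speaker's second reading is tested."""
    speakers = sorted({c["speaker"] for c in clips})
    for held_out in speakers:
        train = [c for c in clips
                 if c["session"] == 1 and c["speaker"] != held_out]
        test = [c for c in clips
                if c["session"] == 2 and c["speaker"] == held_out]
        yield held_out, train, test


def word_accuracy(predicted, actual):
    """Fraction of words guessed correctly, position by position."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy data: three hypothetical speakers, two readings each.
clips = [{"speaker": s, "session": n} for s in ("A", "B", "C") for n in (1, 2)]
```

In the first split the system has already seen each test speaker's lips, which corresponds to the 75 percent condition described above. In the second it has not, which corresponds to the 33 percent condition.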
Separately, in 2013 Waseda University researcher Yasuhiro Oikawa filmed a speaker's throat with a high-speed camera to measure the tiny vibrations in the skin caused by the act of speaking. Oikawa says the precise frequencies of the vibrations could be used to reconstruct the word being spoken.
From The Economist
Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA