Microsoft has developed and publicly demonstrated software that translates spoken English into spoken Chinese almost instantly while preserving the distinctive cadence of the speaker's voice.
The system illustrates the progress of Microsoft's speech-recognition technology, which relies on learning software modeled on how networks of neurons in the brain operate. The system first recognizes the speaker's words, then rapidly translates the resulting text into properly ordered Chinese sentences, which are passed to speech-synthesis software trained to reproduce the speaker's voice. To perform this synthesis, the software modifies a stock text-to-speech model so that it produces certain sounds the way the speaker does.
"Rather than having one word in four or five incorrect, now the error rate is one word in seven or eight," says Microsoft's Rick Rashid. He concedes that the system is far from infallible, but says it is capable enough to enable communication where none would otherwise be possible. "We don't yet know the limits on accuracy of this technology; it is really too new," Rashid notes. "As we continue to 'train' the system with more data, it appears to do better and better."
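To put Rashid's figures in perspective, the short sketch below converts "one word in N is wrong" into a percentage error rate and computes how large a relative improvement the quoted ranges imply. It assumes that "one word in N" simply means an error rate of 1/N; the article gives no other details about how accuracy was measured, and the function names are illustrative only.

```python
from fractions import Fraction

def error_rate(one_in: int) -> Fraction:
    """Error rate implied by 'one word in N is wrong' (assumed to mean 1/N)."""
    return Fraction(1, one_in)

def relative_reduction(old: Fraction, new: Fraction) -> Fraction:
    """Share of the old errors eliminated by the new system."""
    return (old - new) / old

# Ranges quoted by Rashid: previously 1 in 4 to 5 words wrong, now 1 in 7 to 8.
old_best, old_worst = error_rate(5), error_rate(4)
new_best, new_worst = error_rate(8), error_rate(7)

# Most conservative comparison: best old figure vs. worst new figure.
low = relative_reduction(old_best, new_worst)
# Most generous comparison: worst old figure vs. best new figure.
high = relative_reduction(old_worst, new_best)

if __name__ == "__main__":
    for n in (4, 5, 7, 8):
        print(f"1 in {n}: {float(error_rate(n)):.1%}")
    print(f"relative reduction: {float(low):.1%} to {float(high):.1%}")
```

Under that assumption, the error rate falls from 20 to 25 percent to roughly 12.5 to 14.3 percent, which works out to about 29 to 50 percent fewer wrong words.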
From Technology Review