In a breakthrough demonstration, Microsoft has developed a system that translates a person's speech into another language in real time, with the translated audio played back almost instantly in the speaker's own voice!
This is the fruit of a collaboration between Microsoft Research and the University of Toronto that began two years ago. The new technique, the Deep Neural Network (DNN), is patterned after the behavior of the human brain and is more efficient and less error-prone than the previously used approach, Hidden Markov Modeling, introduced in 1979. Where HMM-based recognizers got roughly one word wrong in every 4-5 words, the DNN approach gets about one wrong in every 7-8, a relative reduction of roughly a third in the word error rate.
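To put those figures in perspective, here is a quick back-of-the-envelope calculation. The "one in 4-5" and "one in 7-8" numbers are the ones quoted above; the midpoints and the arithmetic are mine.

```python
# Rough word-error-rate comparison based on the figures quoted above:
# HMM-based systems: about one misrecognized word in every 4-5 words.
# DNN-based systems: about one misrecognized word in every 7-8 words.

hmm_error_rate = 1 / 4.5   # ~22% of words wrong (midpoint of "1 in 4-5")
dnn_error_rate = 1 / 7.5   # ~13% of words wrong (midpoint of "1 in 7-8")

relative_reduction = (hmm_error_rate - dnn_error_rate) / hmm_error_rate

print(f"HMM word error rate: {hmm_error_rate:.0%}")   # ~22%
print(f"DNN word error rate: {dnn_error_rate:.0%}")   # ~13%
print(f"Relative reduction:  {relative_reduction:.0%}")  # ~40% at the midpoints
```

Depending on which ends of those ranges you compare, the improvement works out to somewhere between roughly 30% and 50%.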
Speaking at Microsoft Research Asia's 21st Century Computing event in Tianjin, China, Microsoft's Chief Research Officer, Rick Rashid, gave a demo showcasing the new technology. Rashid spoke in English, and his speech was translated into Mandarin Chinese, which the audience then heard in his own voice. This required a text-to-speech system that Microsoft researchers built from a few hours of speech by a native Chinese speaker, combined with properties of Rashid's own voice taken from about one hour of pre-recorded English audio.
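For readers curious how the pieces fit together, the demo chains three stages: speech recognition, machine translation, and text-to-speech shaped by the original speaker's voice. The sketch below is a purely illustrative outline of that pipeline, not Microsoft's actual code; every function name and the `SpeakerProfile` type are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SpeakerProfile:
    """Hypothetical container for voice characteristics extracted from
    roughly an hour of a speaker's recorded audio (pitch, timbre, etc.)."""
    name: str
    features: dict


def recognize_speech(audio: bytes) -> str:
    """Stage 1: DNN-based speech recognition (English audio -> English text).
    Placeholder: a real system would run acoustic and language models here."""
    raise NotImplementedError


def translate_text(text: str, target_lang: str = "zh") -> str:
    """Stage 2: machine translation (English text -> Mandarin text)."""
    raise NotImplementedError


def synthesize(text: str, base_voice: str, speaker: SpeakerProfile) -> bytes:
    """Stage 3: text-to-speech built from a few hours of a native Mandarin
    speaker's recordings (base_voice), re-shaped with the original speaker's
    voice properties so the output sounds like them speaking Chinese."""
    raise NotImplementedError


def speech_to_speech(audio: bytes, speaker: SpeakerProfile) -> bytes:
    """End-to-end flow demonstrated on stage: English in, Mandarin out,
    in (approximately) the original speaker's own voice."""
    english_text = recognize_speech(audio)
    mandarin_text = translate_text(english_text, target_lang="zh")
    return synthesize(mandarin_text, base_voice="native_mandarin_tts",
                      speaker=speaker)
```

The interesting engineering is in the third stage: rather than training a full voice for each user, the system starts from a voice built on a native speaker of the target language and adapts it with a comparatively small sample of the user's own speech.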
“We can also hope that as barriers to understanding language are removed, barriers to understanding each other might also be removed. The results are still not perfect, and there is still much work to be done, but the technology is very promising, and we hope that in a few years we will have systems that can completely break down language barriers” – Rick Rashid, Chief Research Officer, Microsoft
Kudos to the good people at Microsoft Research and the University of Toronto for this remarkable breakthrough!