In a major breakthrough for artificial intelligence, Microsoft has developed speech recognition software that can recognize the words in a conversation almost as well as a human can.
In a paper published this week, a team from the Microsoft Artificial Intelligence and Research group reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent the team reported just last month. That makes it the most accurate speech recognition system available today. Notably, the 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation.
The new WER beats the goal the team set last year, which was to build a system that recognizes speech almost as well as a person does.
“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group.
This milestone follows decades of research into speech recognition, dating back to the early 1970s, when DARPA, a US government agency, took an interest in the technology as a way to improve national security. That interest encouraged many of the top tech companies to invest.
The achievement opens the way to better speech recognition in Microsoft’s consumer and enterprise products, such as Skype, Xbox One and Cortana, which rely heavily on automated functions.
An important thing to note, though, is that the research milestone doesn’t mean the computer recognized every word perfectly. Instead, it means that the error rate – or the rate at which the computer misheard a word like “have” for “is” or “a” for “the” – is the same as you’d expect from a person hearing the same conversation.
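For readers who want to see how such an error rate is calculated, here is a minimal sketch of the standard word error rate definition: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The function name and the sample sentences are illustrative assumptions, not taken from Microsoft's paper, and the team's actual scoring setup may differ in details such as text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words."""
    # Assumption: lowercase and split on whitespace; real evaluations
    # usually apply more careful text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Two misheard words ("have" -> "is", "a" -> "the") out of four.
print(word_error_rate("we have a plan", "we is the plan"))  # 0.5
```

By this measure, a 5.9 percent WER corresponds to roughly six word errors for every 100 words in the reference transcript.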