Microsoft’s Voice Recognition Technology Almost as Accurate as Humans

Microsoft reached a new milestone in the development of more accurate speech recognition.

Using a cluster of Tesla M40 GPUs and the cuDNN-accelerated Computational Network Toolkit (CNTK), Microsoft's latest system achieved the lowest word error rate (WER) reported in the industry.

“Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set,” the researchers wrote in their recent paper. “We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3% on the Switchboard test data.”
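Word error rate, the metric behind these numbers, is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below is an illustrative implementation of that standard definition, not Microsoft's evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference and hypothesis, divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words, giving roughly 0.167; a 6.9% WER means about 7 such errors per 100 reference words.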

Figure: Historical progress of speech recognition WER on increasingly difficult tasks. Twenty years ago, the best published research system had a WER greater than 43 percent.

These advances will directly benefit the future of digital assistants like Cortana and Microsoft's real-time Skype Translator service. Microsoft said “the speech research is significant to Microsoft’s overall artificial intelligence strategy of providing systems that can anticipate users’ needs instead of responding to their commands, and to the company’s overall ambitions for providing intelligent systems that can see, hear, speak and even understand, augmenting how humans work today.”


About Brad Nemire

Brad Nemire is on the Developer Marketing team and loves reading about all of the fascinating research being done by developers using NVIDIA GPUs. Reach out to Brad on Twitter @BradNemire and let him know how you’re using GPUs to accelerate your research. Brad graduated from San Diego State University and currently resides in San Jose, CA.
  • Muzufuzo

    Voice recognition has indeed become substantially better in the last five years, but there is still very little real language understanding. We humans understand context and associations; machines can’t do that well yet. “Natural language understanding” is what digital assistants need if the companies building them want them to succeed. When I try using Siri, Cortana or Google Now, they still feel very dumb rather than intelligent. I hope the next five years will bring more real-life practical improvements than the previous five.