Microsoft’s Voice Recognition Technology Almost as Accurate as Humans

Microsoft reached a new milestone in the development of more accurate speech recognition.

Using a cluster of Tesla M40 GPUs and the cuDNN-accelerated Computational Network Toolkit (CNTK), the latest version of Microsoft’s speech recognition system achieved the lowest word error rate (WER) reported in the industry.

“Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set,” said the researchers in their recent research paper. “We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3% on the Switchboard test data.”
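For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not Microsoft's or NIST's scoring pipeline (which also normalizes text before scoring); the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no text
    normalization, unlike an official benchmark scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

In this toy case a single wrong word in a six-word reference gives a WER of one sixth, or roughly 16.7%. By the same definition, the 6.3% figure quoted above corresponds to roughly six errors per hundred reference words.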

Historical progress of speech recognition WER on increasingly difficult tasks. Twenty years ago, the best published research system had a WER greater than 43 percent.

These advances will directly benefit future digital assistants such as Cortana, as well as Microsoft’s real-time Skype Translator service. Microsoft said “the speech research is significant to Microsoft’s overall artificial intelligence strategy of providing systems that can anticipate users’ needs instead of responding to their commands, and to the company’s overall ambitions for providing intelligent systems that can see, hear, speak and even understand, augmenting how humans work today.”
