Microsoft Research says it’s made a major breakthrough in converting human speech to text. Recognising continuous, conversational speech is really hard to do accurately, but Redmond reports it can do it as well as actual people can.
The researchers claim their technology makes slightly fewer mistakes than human transcribers do. That’s a bold assertion, given the 40-odd-year history of speech-recognition research, peppered as it is with false dawns. But who knows: perhaps this time, things will be different.
So, what do you say? In today’s IT Newspro, we listen to reports and opinions, recognizing the very best.
What’s the story? Rik Henderson breaks the news—Cortana will soon be better at understanding English than you:
Microsoft claims to have made a significant breakthrough in speech recognition. … Its latest research…has resulted in technology that is on par with humans in recognising…free-flowing speech.
How is this relevant to Microsoft’s actual business? Dave Gershgorn is obsessed with Machines with Brains:
Transcribing a conversation…is one of those tasks that’s deceptively difficult for machines. … A new paper from Microsoft Research claims to slightly beat human-level transcription…even when the human transcript is double-checked by a second human.
Microsoft’s next challenge is making [it] work in noisier environments. … This is foundational for everything else Microsoft wants to achieve.
Did you hear that? Microsoft’s Allison Linn drums the point home: [You’re fired -Ed.]
The researchers reported a word error rate (WER) of 5.9 percent. [That’s] about equal to [humans] and it’s the lowest ever recorded against the industry standard.
The team has beat a goal they set less than a year ago — and greatly exceeded everyone else’s expectations. … The milestone will have broad implications for consumer and business products.
Geoffrey Zweig, who manages the Speech & Dialog research group…attributed the accomplishment to the systematic use of the latest neural network technology [with] neural language models in which words are represented as continuous vectors in space. … “This lets the models generalize very well from word to word.”
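For readers wondering what that 5.9 percent figure measures: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. Here’s a minimal sketch of that calculation. The function name and the example sentences are made up for illustration, and the plain whitespace tokenisation is an assumption; this is not the scoring tool the researchers used, and real benchmark scoring also normalises the text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()   # assumption: simple whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")
```

On that scale, a WER of 5.9 percent works out to roughly six errors for every 100 words in the reference transcript.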
That’s hot. Yep, even Devin Coldewey thinks so:
It’s a red-letter day at Microsoft Research. [It] is one of those tasks that’s been pursued for decades by pretty much every major tech business and research outfit.
The team used Microsoft’s open-source Computational Network Toolkit. … No word on how soon we can expect this improved speech-to-text to hit Microsoft products.
And what about Skynet? Heed Brian Mastroianni’s calming words:
For those…who worry that this could lead to sentient machines…the research team offered some reassurances. … True comprehension is still a long way off.
Just as IBM’s Watson…and personal smartphone assistants…have been making waves…this technology has the potential to have a significant impact. … Of course, even at the level of “human parity,” the technology isn’t foolproof.
So that’s amazing, then? Trent Burroughs isn’t so sure:
The actual paper has a section on error analysis that is particularly enlightening. … If you consider the task to be speech recognition, rather than transcription…their claim about being better than humans isn’t definitely true. … It’s clearly great work, but reaching human parity is marketing fluff.
Oh really? jpm_sd snarks it up:
I look forward to being able to converse with Microsoft’s research team as easily as I can with humans. … I hope that…journalists can learn to write headlines with similarly low rates of error.
Main image credit: Tony Hisgett (cc:by)