Microsoft claims “Breakthrough” in Speech Recognition — “Human Parity”

Skeptical Snowy Owl is skeptical

Microsoft Research says it’s made a major breakthrough in converting human speech to text. Recognising continuous, conversational speech is really hard to do accurately, but Redmond reports it can do it as well as actual people can.

The researchers claim their technology now makes about as few mistakes as human transcribers do. That’s a bold assertion, given the 40-odd-year history of speech-recognition research, peppered as it is with false dawns. But who knows, perhaps this time things will be different.

So, what do you say? In today’s IT Newspro, we listen to reports and opinions, recognizing the very best.

Your humble newswatcher curated these news nuggets for your entertainment. Not to mention: Taking Bill Gates seriously


What’s the story? Rik Henderson breaks the news—Cortana will soon be better at understanding English than you:

Microsoft claims to have made a significant breakthrough in speech recognition. Its latest research has resulted in technology that is on par with humans in recognising free-flowing speech.


How is this relevant to Microsoft’s actual business? Dave Gershgorn is obsessed about Machines with Brains:

Transcribing a conversation is one of those tasks that’s deceptively difficult for machines. A new paper from Microsoft Research claims to slightly beat human-level transcription even when the human transcript is double-checked by a second human.

Microsoft’s next challenge is making [it] work in noisier environments. This is foundational for everything else Microsoft wants to achieve.


Did you hear that? Microsoft’s Allison Linn drums the point home: [You’re fired -Ed.]

The researchers reported a word error rate (WER) of 5.9 percent. [That’s] about equal to [humans] and it’s the lowest ever recorded against the industry standard.

The team has beat a goal they set less than a year ago, and greatly exceeded everyone else’s expectations. The milestone will have broad implications for consumer and business products.

Geoffrey Zweig, who manages the Speech & Dialog research group, attributed the accomplishment to the systematic use of the latest neural network technology [with] neural language models in which words are represented as continuous vectors in space. “This lets the models generalize very well from word to word.”
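
For anyone unfamiliar with the metric: word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the machine's output, divided by the number of words in the reference. Here's a minimal sketch in Python. It is not Microsoft's scoring tool; the function name word_error_rate and the sample sentences are made up for illustration, and real benchmark scoring usually normalizes the text before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word in a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{wer:.1%}")  # 16.7%
```

By that yardstick, a WER of 5.9 percent works out to roughly six word errors for every 100 words in the reference transcript.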


That’s hot. Yep, even Devin Coldewey thinks so:

It’s a red-letter day at Microsoft Research. [It] is one of those tasks that’s been pursued for decades by pretty much every major tech business and research outfit.

The team used Microsoft’s open-source Computational Network Toolkit. No word on how soon we can expect this improved speech-to-text to hit Microsoft products.


And what about Skynet? Heed Brian Mastroianni’s calming words:

For those who worry that this could lead to sentient machines, the research team offered some reassurances. True comprehension is still a long way off.

Just as IBM’s Watson and personal smartphone assistants have been making waves, this technology has the potential to have a significant impact. Of course, even at the level of “human parity,” the technology isn’t foolproof.


So that’s amazing, then? Trent Burroughs isn’t so sure:

The actual paper has a section on error analysis that is particularly enlightening. If you consider the task to be speech recognition, rather than transcription, their claim about being better than humans isn’t definitely true. It’s clearly great work, but reaching human parity is marketing fluff.


Oh really? jpm_sd snarks it up:

I look forward to being able to converse with Microsoft’s research team as easily as I can with humans. I hope that journalists can learn to write headlines with similarly low rates of error.

And Finally…

At first, people didn’t take Bill Gates seriously

Full interview: The David Rubenstein Show: Peer-to-Peer Conversations

Main image credit: Tony Hisgett (cc:by)