Windows has a feature it doesn’t like to talk about. While the OS lets you scrawl notes with a stylus, log in with your face (or secure the Web) via Windows Hello, and even order Cortana to set a reminder, what it’s not so eager for you to do, apparently, is use its speech recognition engine to issue commands or take voice dictation.
The reason for its silence may go back 10 years, to when Microsoft product manager Shanen Boettcher demonstrated voice dictation inside Windows Vista—and flubbed it. The technology kept a low profile after that, and today, few users know you can dictate a document within Windows.
If there were ever a time for Windows to try again, though, it would seem to be now, when advances in computers and artificial intelligence provide a much better foundation for the technology.
“This is such a great question,” said Harry Shum, the executive vice president overseeing Microsoft’s speech-recognition research, as well as Cortana and Bing, when asked about dictation’s future within Microsoft Office. “There is really no reason why it is not playing a much more prominent role yet.”
We decided to give it another chance: We delved into Windows’ voice dictation features to see how they compared to more recent speech-based technologies.
Ask Word 2016 about dictation, and it’s like the app has never even heard the term. Word displays a similar response for “speech recognition.”
Why speech recognition can’t be too perfect
Some of us still think about voice dictation in the same way Doonesbury lampooned the Apple Newton, turning “I am writing a test sentence” into “Siam fighting atomic sentry.” And you’d be forgiven for thinking so, too: Windows Speech Recognition is powered by the Microsoft Speech Recognizer 8.0, which has remained literally unchanged since Vista. Shum called it a “grandpa” technology.
What has changed, however, is the hardware: Listening for and interpreting speech requires far less processing power than a decade ago. The quality of integrated array mics within PCs like the Surface Book means that dedicated headsets aren’t necessarily required to achieve superior accuracy. Voice dictation for the masses is here, right?
When I tested Windows’ speech capabilities, however, I experienced firsthand the merciless perfection that’s required for the system to be usable. This story has 1,028 words in it, including subheadings. If you used voice dictation software to write it, a 95.0% accuracy rate would mean you’d have to correct more than fifty mistakes. That gets old fast.
In my tests, based on a methodology I developed for another speech recognition product I’m testing, Windows produced an accuracy rate of 93.6%. That’s pretty bad on paper, and somewhat behind the dedicated software I’m trying. Windows also had an odd habit of interjecting the word “comma” when I was dictating the punctuation mark. The speech community seems split on whether relatively minor mistakes like this are significant.
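To see why a few percentage points matter so much, here is a rough sketch of the arithmetic in Python. It assumes accuracy is measured per word and that every misrecognized word needs one correction. The function name `expected_errors` is made up for this illustration, and applying the 93.6% test result to this story's 1,028 words is my own extrapolation, not part of the test itself.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words, assuming per-word accuracy."""
    return word_count * (1.0 - accuracy)


WORDS_IN_STORY = 1028  # word count quoted in the article

# 0.95 is the hypothetical rate discussed above; 0.936 is the measured rate.
for accuracy in (0.95, 0.936):
    errors = expected_errors(WORDS_IN_STORY, accuracy)
    print(f"{accuracy:.1%} accuracy -> about {errors:.0f} corrections")
```

Under those assumptions, a drop of less than a point and a half in accuracy adds roughly 14 more corrections to a story of this length.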