Microsoft beefs up Windows Phone voice recognition technology

Good news for Windows Phone owners in the US: you can now shave 0.5 seconds off the time it takes to find a decent pizza in Seattle by speaking to your smartphone.

Okay, this isn’t earth-shattering news, even for pizzaphiles. But Microsoft is excited about the technology behind this development: improvements to the speed and accuracy of Windows Phone’s voice-to-text and voice search features.

“Now when you compose a text message or search using your voice, Bing will return results twice as fast as before and increase accuracy by 15 percent,” announces Bing’s speech team in a (possibly dictated) blog post.

The team has been working with Microsoft’s research division for a year to improve the technology. Here’s the science bit:

“To achieve the speed and accuracy improvements, we focused on an advanced approach called Deep Neural Networks (DNNs). DNN is a technology that is inspired by the functioning of neurons in the brain. In a similar way, DNN technology can detect patterns akin to the way biological systems recognize patterns.

“By coupling MSR’s major research breakthroughs in the use of DNNs with the large datasets provided by Bing’s massive index, the DNNs were able to learn more quickly and help Bing voice capabilities get noticeably closer to the way humans recognize speech.”

Actually, there’s an even deeper science bit in a separate post on the Inside Microsoft Research blog, where senior researcher Dong Yu contributes this anecdote on a crucial point in the project:

“I first realized the effect of the DNN when we successfully achieved significant error-rate reduction on the voice-search data set after implementing the context-dependent deep-neural-network hidden Markov model. It was an exciting moment. I was so excited that I did not sleep that night.”

Don’t laugh: this is a genuinely charming insight into the work going on behind the scenes of the technologies we increasingly take for granted. Not least because Yu’s sleepless night may contribute to a much wider range of benefits than just slightly quicker ordering of a deep-pan Hawaiian with extra pineapple.

It’s the smartphone battle between Apple, Google, Microsoft, BlackBerry and other platforms that’s pumping investment into speech recognition, voice search and related technologies with wide applications.

Or, as Yu puts it: “I believe this is just the first step in advancing the state of the art. Many difficult problems may be attacked under this framework, which might lead to even greater advances.”

Microsoft’s challenge is to make the fruits of this research a big selling point for Windows Phone as it tries to secure a bigger foothold in the market against iPhone and Android, both of which offer their own prominent voice recognition features.

Many people’s purchase decisions will come down to more basic questions: whether the phone looks nice, how good its camera is and whether their favourite apps are available for it, rather than its speech recognition speed and accuracy.

Nokia is working hard on the design and camera questions, while Microsoft seems well aware of the challenge it faces on the apps side of things. Just this week, Business Insider claimed that Microsoft is paying some developers up to $100k to bring popular apps to its platform.

In some areas, its efforts are paying off: games such as N.O.V.A. 3, Temple Run: Brave, Jetpack Joyride, Rayman Jungle Run and Angry Birds Rio, along with MapQuest, have all arrived in the last month alone. Elsewhere, even longtime holdout Instagram is rumoured to be on its way to Windows Phone, possibly as soon as the end of this month.

Microsoft’s efforts, whether in the research labs with DNN technology or out in developers’ offices with a cheque book, are important.

The fierce rivalry between Apple’s iOS and Google’s Android means neither company can afford to rest on its laurels, but stronger competition from a third player in Microsoft’s Windows Phone (with BlackBerry, Firefox OS and Tizen all hoping for a say as well) is good news for smartphone owners. Whatever their pizza preferences.

View Source