

Microsoft brings video voice recognition to everyone

Mary Branscombe | Sept. 12, 2014
Azure Media Services is something Apple might want to consider for streaming its next keynote, rather than rolling its own system on Amazon Web Services and Akamai. It's what big-name broadcasters used to stream the 2014 Winter Olympics and the 2014 World Cup, it's what powers the Blinkbox streaming video service, and if you watched the Xbox One announcement you've already used it, so it's certainly proved its reliability.

Now it's in public preview, so anyone can use it to stream content, with or without digital rights management (DRM), to just about any device through Flash, Silverlight or HTML5, with support for building your own app for Windows, Windows Phone, iOS, Android and Xbox. If you have company training videos or shareholder meetings you want to share, Azure Media Services gives your business a cloud service to do that.

If you just want somewhere to keep video, services from YouTube to Vimeo let you do that (although with far less control than Azure). But what's really interesting is the Azure Media Indexer service, which has just moved from preview to General Availability. This is a sophisticated voice recognition system for indexing audio and video, so someone can search for keywords, phrases, or clips; generate closed captions automatically; and even get full transcripts from your media.

How to wreck a nice beach

With the new system, when you search for a keyword, you don't just get a video that has the word in its title or in a tag someone added by hand; you can jump straight to the second in the video where someone says the word you're looking for, and see a snippet of the automatic transcript to confirm it's the right match. You can try this out with Microsoft Video Web Search, a demo that lets you search about ten thousand hours of MSNBC video clips.
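
To make "jump to the second where the word is spoken" concrete, here is a minimal sketch of a time-coded transcript index. It is not how MAVIS or the Azure Media Indexer works internally; it simply assumes a speech recognizer has already produced, for each video, a list of words with start times. The `Word` type, the `build_index` and `search` functions and the sample clips are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    text: str     # recognized word (hypothetical recognizer output)
    start: float  # offset into the video, in seconds


def build_index(transcripts: dict[str, list[Word]]) -> dict[str, list[tuple[str, int]]]:
    """Inverted index: lower-cased word -> list of (video_id, word position)."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for video_id, words in transcripts.items():
        for pos, word in enumerate(words):
            index[word.text.lower()].append((video_id, pos))
    return index


def search(index, transcripts, keyword, context=2):
    """Return (video_id, seconds, snippet) for every spoken occurrence of keyword."""
    results = []
    for video_id, pos in index.get(keyword.lower(), []):
        words = transcripts[video_id]
        # Take a few words either side of the hit as the transcript snippet.
        lo, hi = max(0, pos - context), pos + context + 1
        snippet = " ".join(w.text for w in words[lo:hi])
        results.append((video_id, words[pos].start, snippet))
    return results


# Usage with made-up recognizer output for two clips.
transcripts = {
    "clip-1": [Word("the", 0.0), Word("senate", 0.4), Word("passed", 0.9),
               Word("the", 1.3), Word("budget", 1.5), Word("today", 2.1)],
    "clip-2": [Word("budget", 12.0), Word("talks", 12.6), Word("stalled", 13.0)],
}
index = build_index(transcripts)
for video_id, seconds, text in search(index, transcripts, "Budget"):
    print(f"{video_id} @ {seconds:.1f}s: ...{text}...")
```

Storing word positions rather than bare timestamps makes it cheap to pull the surrounding words for the snippet; a production system would also have to cope with recognition errors, multi-word phrases and confidence scores.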

That demo was put together by the Microsoft Research (MSR) team that has spent the last seven years working on MAVIS, the Microsoft Audio Video Indexing Service that powers the Indexer. Unlike Siri or Cortana, which get better as they learn your voice, MAVIS doesn't have to learn about each person speaking, and it can handle multiple speakers in the same conversation, even if they have different accents. And unlike specialist voice recognition systems for doctors and lawyers, which are extremely accurate as long as the conversation stays on those particular topics, MAVIS can handle almost any conversation.

If you use OneNote, you've had audio search since the 2007 Windows version (also built by the team behind MAVIS), but it only looks for phonemes (the sounds that make up individual words) in the recording. Search for "how to recognize speech" and you could easily get a match for "how to wreck a nice beach," because the two phrases sound so similar.
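
The "wreck a nice beach" problem is easy to reproduce. The sketch below is a toy illustration, not OneNote's actual algorithm: it uses a tiny hand-written pronunciation dictionary (ARPAbet symbols, as used in the CMU Pronouncing Dictionary, with stress marks dropped), flattens each phrase into a single phoneme sequence, and counts a phrase as a match when its Levenshtein distance from the query is at most a hypothetical 25% of the query's length. The dictionary, `phonetic_match` and the threshold are all assumptions made for this example.

```python
# Toy pronunciation dictionary (ARPAbet, stress marks dropped).
# Hand-written for this example only; a real system would use a full lexicon.
PRONUNCIATIONS = {
    "how": ["HH", "AW"],
    "to": ["T", "UW"],
    "recognize": ["R", "EH", "K", "AH", "G", "N", "AY", "Z"],
    "speech": ["S", "P", "IY", "CH"],
    "wreck": ["R", "EH", "K"],
    "a": ["AH"],
    "nice": ["N", "AY", "S"],
    "beach": ["B", "IY", "CH"],
    "bake": ["B", "EY", "K"],
    "bread": ["B", "R", "EH", "D"],
}


def to_phonemes(phrase):
    """Flatten a phrase into one phoneme sequence; word boundaries are lost."""
    return [p for word in phrase.lower().split() for p in PRONUNCIATIONS[word]]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def phonetic_match(query, phrase, threshold=0.25):
    """True if phrase sounds close to query (hypothetical relative threshold)."""
    q = to_phonemes(query)
    return edit_distance(q, to_phonemes(phrase)) / len(q) <= threshold


query = "how to recognize speech"
print(edit_distance(to_phonemes(query), to_phonemes("how to wreck a nice beach")))  # 3
print(phonetic_match(query, "how to wreck a nice beach"))  # True
print(phonetic_match(query, "how to bake bread"))          # False
```

Once the audio is reduced to sounds, the two phrases are only three phonemes apart out of sixteen, while a genuinely different phrase such as "how to bake bread" falls well outside the threshold. A word-level transcript sidesteps this particular confusion, because "recognize" and "wreck a nice" are different words even though they sound alike.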

Sign up for CIO Asia eNewsletters.