Hello Barbie enables children to chat with an artificial intelligence program in a remote data center.
It's easy to dismiss virtual assistants as parlor tricks, irrelevant gimmicks or even fatally flawed.
Microsoft CEO Satya Nadella didn't help matters last week when a demo of his company's Cortana virtual assistant failed spectacularly. At Salesforce.com's Dreamforce event in San Francisco, Nadella tried to show off Cortana as a business tool by asking it: "Show me my most at-risk opportunities." Cortana understood the request as: "Show me to buy milk at this opportunity."
The demo failed not because Cortana is fatally flawed and not because virtual assistants are bad interfaces. The demo failed because it was a demo. (Pro tip: Never demonstrate any voice system while using a microphone on stage in a crowded hall where everyone in the audience is using wireless.)
In fact, Nadella knows something that the public does not: The technology behind virtual assistants like Cortana is about to transform our lives and change the world as we know it. This change will be simultaneously wonderful and horrible. But mostly wonderful.
Using a good virtual assistant, in the best of cases, feels like talking to a person. It seems like a single technological experience. In fact, it involves a long list of distinct technologies, including:
- Speech recognition (the ability to recognize speech, colloquialisms and accents while ignoring background speech and non-speech sounds, all in real time while the user is still talking).
- File compression and transfer (the speed with which the voice file can be packed up and shipped off to the data center for processing).
- Artificial intelligence (the ability of the servers and software to "understand" the user input and decide what information to offer as the response).
- Data sources (access to knowledge bases, computational engines and other data to inform the response).
- User context (information extracted from email, calendars, contacts, location, history and whatever's on-screen at the moment).
- Conversation engine (the ability to phrase the response with variety, colloquial speech, humor and context).
- Agency (the ability to do things on behalf of the user, such as make reservations, reach out to contacts, buy things, launch apps and execute commands in those apps).
- Proactivity (the ability to choose what to do and when without being prompted by the user).
Each of these elements is separate from the others and is backed by its own methods and technologies. Most importantly, each is evolving rapidly and developing its own marketplace, which gives a choice of suppliers to any company that wants to deploy these elements as part of an interface to whatever product it happens to offer.
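For readers who think in code, the idea that each element is a separate, swappable part can be sketched roughly as follows. This is a toy illustration, not how Cortana or any real assistant is built: every name here (`Turn`, `run_pipeline`, the stage functions, the hard-coded location) is hypothetical, and each stage is a trivial stand-in for what would be a full product in practice. Only four of the elements listed above are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """State handed from stage to stage for one user request (hypothetical)."""
    audio: bytes
    text: str = ""
    context: Dict[str, str] = field(default_factory=dict)
    intent: str = ""
    reply: str = ""
    trace: List[str] = field(default_factory=list)


# Every stage has the same shape, so any one can be swapped for another vendor's.
Stage = Callable[[Turn], Turn]


def speech_recognition(turn: Turn) -> Turn:
    # Stand-in: pretend the "audio" bytes are already UTF-8 text.
    turn.text = turn.audio.decode("utf-8").strip().lower()
    turn.trace.append("speech_recognition")
    return turn


def user_context(turn: Turn) -> Turn:
    # Stand-in: a real system would read calendars, contacts, location, etc.
    turn.context["location"] = "San Francisco"  # assumed value for the sketch
    turn.trace.append("user_context")
    return turn


def artificial_intelligence(turn: Turn) -> Turn:
    # Stand-in: keyword matching in place of real language understanding.
    turn.intent = "weather" if "weather" in turn.text else "unknown"
    turn.trace.append("artificial_intelligence")
    return turn


def conversation_engine(turn: Turn) -> Turn:
    # Stand-in: phrase a reply from the intent and the user's context.
    if turn.intent == "weather":
        turn.reply = f"Here is the weather for {turn.context['location']}."
    else:
        turn.reply = "Sorry, I did not catch that."
    turn.trace.append("conversation_engine")
    return turn


def run_pipeline(stages: List[Stage], turn: Turn) -> Turn:
    """Pass the request through each stage in order."""
    for stage in stages:
        turn = stage(turn)
    return turn


DEFAULT_STAGES: List[Stage] = [
    speech_recognition,
    user_context,
    artificial_intelligence,
    conversation_engine,
]

if __name__ == "__main__":
    result = run_pipeline(DEFAULT_STAGES, Turn(audio=b"What is the weather today?"))
    print(result.reply)
```

Because every stage takes a `Turn` and returns a `Turn`, replacing one entry in `DEFAULT_STAGES` with a different supplier's implementation leaves the rest untouched, which is the point the paragraph above makes about each element having its own marketplace.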