
Inside Siri's brain: The challenges of extending Apple's virtual assistant

Marco Tabini | April 9, 2013
Siri is one of the biggest features to hit iOS in recent years, and yet it remains severely limited in its capabilities. Alas, Apple--and third-party developers--must overcome many obstacles before voice interaction becomes a pervasive part of the mobile experience.

Assuming that the user has a passable command of their chosen language, transcribing speech into words is usually a relatively easy problem to handle. The hard part comes when all those words have to be turned into some sort of actionable content that an app can process; to do this well, the system must have what is called domain knowledge--in other words, it must know the subject area you're talking about.
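A toy sketch can make the problem concrete. Everything in the snippet below--the Intent type, the vocabulary table, the action names--is invented for illustration; it only shows, in the crudest possible form, why mapping words onto actions requires knowledge of the domain.

```swift
import Foundation

// A minimal, hypothetical sketch of turning recognized words into
// actionable content. Domain knowledge is reduced to a table mapping
// the phrases users say to the actions a music app understands.
struct Intent {
    let action: String               // something an app can execute
    let parameters: [String: String]
}

let musicVocabulary: [String: String] = [
    "play": "playTrack",
    "put on": "playTrack",
    "skip": "nextTrack",
]

func parse(_ utterance: String) -> Intent? {
    let lowered = utterance.lowercased()
    for (phrase, action) in musicVocabulary {
        if let match = lowered.range(of: phrase) {
            // Treat whatever follows the phrase as the object of the
            // command; a real analyzer would do far more than this.
            let target = lowered[match.upperBound...]
                .trimmingCharacters(in: .whitespaces)
            return Intent(action: action, parameters: ["target": target])
        }
    }
    return nil // Without the table, the words are just words.
}

// parse("Put on some Miles Davis")
// -> Intent(action: "playTrack", parameters: ["target": "some miles davis"])
```

Without that table--without domain knowledge--the system has no way to know that "put on" and "play" mean the same thing to a music app.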

You've likely encountered a similar problem when asked to deal with a body of knowledge you're unfamiliar with: Your doctor, for example, may tell you that you need to be treated for dyspepsia, but unless you are a medical professional, you probably won't know that you just have indigestion and need an antacid or two. Apple would have to come up with a way for developers to explain to Siri what their apps can do, and provide all the appropriate terminology for those actions.
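What might such an arrangement look like? Nothing like the following API existed at the time; it is purely a hypothetical sketch of how an app could describe its capabilities, and the terminology for them, to Siri.

```swift
import Foundation

// Hypothetical declaration of what an app can do, so Siri could route
// matching requests to it. All names here are invented.
struct SiriCapability {
    let action: String           // identifier the app will handle
    let phrases: [String]        // ways users refer to the action
    let parameterHints: [String] // slots Siri should try to fill in
}

let restaurantAppCapabilities = [
    SiriCapability(action: "bookTable",
                   phrases: ["book a table", "make a reservation"],
                   parameterHints: ["restaurant", "partySize", "time"]),
    SiriCapability(action: "cancelReservation",
                   phrases: ["cancel my reservation", "cancel my booking"],
                   parameterHints: ["reservationId"]),
]

// The app would hand this manifest to the system once, at launch or
// install time:
// SiriRegistry.register(restaurantAppCapabilities) // hypothetical call
```

The hard part, of course, is everything the sketch glosses over: real user phrasing is far messier than a short list of canned expressions.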

Of all the parts that make up Siri, this natural language analysis is probably the toughest for developers to tackle, because apps differ greatly, and it's hard to come up with a magic solution that can easily be applied to every possible situation. To make things worse, natural language analysis is not a familiar field for most programmers--who, until now, have mainly been concerned with point-and-click (or point-and-tap) interfaces.

Putting results into words

Once a request has been processed, Siri must convert the result back into text that can be spoken to the user. While not as hard as processing a user's commands, this task, known as natural language generation, still presents some challenges.

It's relatively easy to write software that uses data to cobble together syntactically correct sentences, but, without some hard work, the result is likely to sound artificial and unexciting. When you ask Siri about the weather, for example, instead of just rattling off a list of statistics on temperature, pressure, and cloud cover, the service will give you a conversational comment, such as "It's sunny" or "It looks like rain."
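A toy example shows the difference. Everything below--the Weather type, the thresholds, the canned phrases--is invented for illustration; the point is that the generator chooses a human-sounding summary from the data instead of reciting it.

```swift
// Turn raw weather data into a short, natural-sounding comment rather
// than a recitation of statistics. Types and thresholds are invented.
struct Weather {
    let temperatureC: Double
    let cloudCover: Double   // 0.0 (clear) through 1.0 (overcast)
    let chanceOfRain: Double // 0.0 through 1.0
}

func summary(for weather: Weather) -> String {
    if weather.chanceOfRain > 0.6 {
        // Varying the phrasing is part of what makes it feel natural.
        return ["It looks like rain.", "Better take an umbrella."]
            .randomElement()!
    }
    if weather.cloudCover < 0.3 {
        return ["It's sunny.", "Clear skies out there."].randomElement()!
    }
    return "It's a bit cloudy."
}

// summary(for: Weather(temperatureC: 24, cloudCover: 0.1, chanceOfRain: 0.05))
// -> "It's sunny." or "Clear skies out there."
```

Note that nothing in the sketch needs a server: given a standardized way to express such templates, the response could be produced entirely on the device.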

This touch of personality may seem unimportant, but it makes a big difference to a user, particularly during verbal communication. Luckily, there is a well-defined body of work that puts this capability well within the reach of most app developers. Even better, there is no need for this final portion of the Siri experience to take place on the server side; instead, Apple could conceivably come up with a technology that standardizes the creation of complex text, and then leave it to the apps to produce a response directly on each device without unduly taxing resources.

Siri for everyone

Allowing third-party apps to integrate with Siri would be a boon for both developers and users, but it's going to require a lot of effort from everyone involved, in large part because it would represent a significant departure from the way we are used to designing and interacting with our software.

 
