Target's voice-recognition effort: How natural does natural language have to be?

Evan Schuman | Oct. 10, 2016
Target's voice-recognition trial misunderstands the allure of Amazon's Alexa and Apple's Siri, and also how shoppers think and communicate.

Target has started toying with a voice-recognition device positioned to compete against Amazon's Alexa, according to the Chicago Tribune. The point of the trial is to see how far Target can push natural-language comprehension. That focus, however, misunderstands the allure of Alexa and Apple's Siri, as well as how shoppers think and communicate.

The consumer attraction to Alexa and Siri involves proximity. In other words, in the normal lives of these consumers, these always-listening devices are right next to them. Indeed, Siri will always be as close as a user's phone, which is going to be really close an awful lot of the time. By the way, the ability of these devices to snap to life when you say their magic phrase means that they are constantly listening to you. No, that's not creepy at all. Can't think of any potential for massive privacy invasion there.

Let's bring this back to how consumers speak. For decades, the overwhelming challenge for voice recognition was the first step: getting the software to figure out what words the shopper was saying. Today, most of these devices have done quite well in mastering that skill, and some are even starting to differentiate one person's voice from another. It's far from foolproof, but it's a nice touch.

Now, however, comes the much harder part, which is understanding intent and meaning. Think that's easy? Try some Google searches and see how well it deals with complicated questions.

Siri - and Alexa is at roughly the same level - can't even master some obvious logical connections. For example, I just told Siri that I want to buy some apples. In context, most people would interpret that to be the fruit. Not Siri. It referred me to the Apple Store. I then clarified and said "I want to buy apple the fruit." I swear it then recommended the Apple Watch. It wasn't until I said "I want to buy some apple that is the fruit," that it gave me the info I sought.

Next up: clothing. I told Siri - and it correctly understood as it typed out my question - simply that "I want to buy a tie." It showed locations of Bow Tie Cinemas. Trying to be helpful, I said, "I want to buy a tie to wear." Again, it showed me movie listings. Four attempts later, it finally showed me ties when I said "Clothing tie," and even that phrase took three tries. The first two times, it "helpfully" corrected "clothing tie" to "closing time" and asked me for the name of the business.

You get the idea. And those are relatively easy requests. If Target wants to get into retail voice recognition, it will need to deal with sentences such as "Limiting yourself to stores within 30 minutes of me, find me a dress in XX size and XX color and XX style." When a shopper can ask Siri that kind of question while driving home from work, we will be getting somewhere.

