Real or virtual? The two faces of machine learning

Galen Gruman | June 7, 2016
The combination of big data, predictive analytics, AI, machine learning, and the Internet of things powers two very different technology paths

The same notions and advances anchor the self-driving car efforts, which have deep roots in robotics and AI work at Carnegie Mellon University, MIT, IBM Research, and other organizations. (I was editing papers on these topics 30 years ago at IEEE!) But they have become feasible only recently, thanks to those advances in computing, networking, big data analytics, and sensors.

All of these industrial Internet and robotics notions rely on highly accurate models and measurements: the more perfect, the better. That's engineering in a nutshell.

The probabilistic path

Then there's the other approach to virtual assistants, bots, and recommendation engines. This is where much of Silicon Valley has been focused, mainly for marketing activities: Amazon product recommendations, Google search results, Facebook recommendations, "intelligent" marketing and ad targeting, and virtual assistants like Google Now, Siri, and Cortana.

Those aren't at all like physical objects. In fact, they differ in key ways that mean what you're computing, analyzing, and ultimately doing shouldn't -- and can't -- be about perfection.

Think about search results: There are no perfect results. Even if there were, my perfect would not be your perfect. It's all situational, contextual, and transitory. Google is doing a "good enough" match between your search terms and the knowledge it has cataloged on the Internet. It adjusts results based on the information Google has gathered about you, as well as on what most people tend to click -- a rough guide to the good-enough results.

That's a probabilistic system. It applies as much to marketing and advertising (Silicon Valley's big AI and big data focus for the last decade) as it does to search, recommendations, and all the other stuff we read about. Much of machine learning research is about optimizing these kinds of systems through feedback loops.
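To make that concrete, here's a minimal sketch in Python of such a feedback loop. The documents, scores, and blend weights are all invented for illustration -- this is not Google's actual algorithm. It ranks results by mixing a text-relevance score with a smoothed estimate of what people actually click, and every click nudges future rankings:

```python
from collections import defaultdict

# Hypothetical data: per-document click and impression counts.
clicks = defaultdict(int)
impressions = defaultdict(int)

def ctr(doc_id):
    # Smoothed click-through rate -- the "what most people tend to
    # click" signal, with a weak prior so unseen docs aren't zeroed out.
    return (clicks[doc_id] + 1) / (impressions[doc_id] + 2)

def rank(relevance):
    # Blend text relevance with crowd behavior. The 70/30 weights are
    # an arbitrary "good enough" compromise, not ground truth.
    return sorted(relevance,
                  key=lambda d: 0.7 * relevance[d] + 0.3 * ctr(d),
                  reverse=True)

def record(shown, clicked):
    # The feedback loop: each impression and click changes future rankings.
    for doc in shown:
        impressions[doc] += 1
    clicks[clicked] += 1

# Toy relevance scores from some text matcher (invented numbers).
scores = {"doc_a": 0.9, "doc_b": 0.8, "doc_c": 0.5}
for _ in range(100):          # users keep preferring doc_b...
    record(rank(scores), "doc_b")
print(rank(scores))           # ...so doc_b eventually outranks doc_a
```

The point of the sketch is the absence of ground truth: the "right" order is simply whatever the blend of relevance and crowd behavior currently says it is.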

"Probabilistic" does not mean "inaccurate is OK," of course. But it does mean "accurate" is in the eye of the beholder, so there's both more freedom to be good enough and significantly more effort needed to understand all the legitimate options. A simulacrum of an engine needs to be an exact match of that engine, but for probabilistic analysis it needs to accept for a sometimes broad variety of possible realities and do the best it can under the circumstances.

If you think about how autocorrect and speech-to-text technologies work, you know what I mean. Language is not math: in grammar, terminology, definitions, and syntax, there are many legitimate variations -- and many illegitimate ones. Worse, many of those illegitimate variations are in wide use by people who don't know better, so the algorithms contend with bad information that users insist is correct. And language evolves, at different rates among different populations.
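As a concrete illustration, here's a toy autocorrect in the spirit of Peter Norvig's well-known spelling corrector. The mini-corpus and its counts are made up for this sketch; the corrector proposes the most frequent known word within one edit of the input, which means a misspelling in wide enough use can beat the dictionary form:

```python
from collections import Counter

# Invented mini-corpus: word -> observed frequency. Note the popular
# misspelling "definately" sitting alongside the legitimate forms.
CORPUS = Counter({
    "their": 900, "there": 800, "they're": 200,
    "definitely": 300, "definately": 40,
})
LETTERS = "abcdefghijklmnopqrstuvwxyz'"

def edits1(word):
    # Every string one edit away: deletions, transpositions,
    # replacements, and insertions.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    # Pick the most frequent known word within one edit: the most
    # probable intent given the corpus, not a guaranteed truth.
    candidates = [w for w in edits1(word) | {word} if w in CORPUS]
    return max(candidates, key=CORPUS.get) if candidates else word

print(correct("thier"))      # -> 'their' (the transposition fix)
print(correct("definatly"))  # -> 'definately': the widespread
                             #    misspelling is the only one-edit match
```

Retrain that corpus on real usage and "definately" only gets stronger -- exactly the bad-information problem these algorithms have to live with.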
