In a demonstration, a smartphone running Zensors was placed face-up on a table. A question was keyed in: "Is there a hand?" When a hand was held over the phone's camera, the app's graph changed, showing that Mechanical Turk workers had answered remotely. The answer took about 30 seconds, a delay the researchers blamed on network latency.
With better responsiveness, Zensors could be used in a variety of business and home applications. A restaurant manager could use it to learn when customers' glasses need to be refilled, and security companies could use it for automatic monitoring.
"We are the first ones, as far as I know, to fuse the crowd with machine learning training and actually doing it," said Gierad Laput, a PhD student at Carnegie Mellon's Human-Computer Interaction Institute, who also showed off new smartphone interfaces at CHI.
Human monitoring costs 2 cents per image, according to the researchers. Training the algorithms so they can take over requires about US$15 worth of human-vetted data.
By contrast, having a programmer write computer-vision software for a sensor that answers a basic yes-or-no question could take more than a month and cost thousands of dollars.
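Taken at face value, the two crowd-labeling figures quoted above imply a rough size for the training set. The short Python sketch below works through that arithmetic. It assumes the whole US$15 is spent at the 2-cents-per-image rate, which the researchers do not state explicitly, and the function and constant names are illustrative only.

```python
# Figures quoted by the researchers, held in US cents to avoid float rounding.
COST_PER_IMAGE_CENTS = 2        # crowd answer for one image
TRAINING_BUDGET_CENTS = 1500    # about US$15 of human-vetted data


def labeled_images(budget_cents: int, cost_per_image_cents: int) -> int:
    """Return how many whole crowd-labeled images a budget would cover.

    Assumption: every cent of the budget goes to per-image labeling.
    """
    if cost_per_image_cents <= 0:
        raise ValueError("cost per image must be positive")
    return budget_cents // cost_per_image_cents


images = labeled_images(TRAINING_BUDGET_CENTS, COST_PER_IMAGE_CENTS)
print(f"Roughly {images} labeled images for US$15")
```

Under that assumption, the budget covers on the order of 750 human-answered images per question before the algorithms take over.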
"Natural-language processing, machine learning and computer vision are three of the hardest problems in computer science," said Chris Harrison, an assistant professor of human-computer interaction at CMU. "The crowd lets us basically bypass a lot of that. But we just let the crowd do the bootstrapping work and we still get the benefits of machine learning."
The researchers plan to keep improving the Zensors app, now in beta, and then release it to the public.