A new algorithm uses cutting-edge techniques to help computers identify human activity from video input far more quickly and efficiently than previous systems.
Its inventors, MIT post-doc Hamed Pirsiavash and University of California at Irvine professor Deva Ramanan, will present the algorithm at the Conference on Computer Vision and Pattern Recognition in Columbus, Ohio next month, according to a statement from MIT.
The researchers drew on natural language processing techniques similar to those used in IBM's Watson and other emergent machine learning projects to create a "grammar" for each action they wanted the system to recognize.
Pirsiavash and Ramanan's creation scales search times linearly, meaning that a video 10 times the length of another will take 10 times as long to search; some previous techniques would have taken 1,000 times as long. Additionally, the new algorithm can handle streaming video, because it can guess fairly accurately at the results of partial actions before they are completed.
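As a rough illustration only (not the researchers' actual parser, and the sub-action names are invented), the streaming idea can be sketched as a matcher that touches each frame's label exactly once, so its cost grows linearly with video length, and that can report progress on a partially completed action at any point:

```python
# Hypothetical sketch: a one-pass matcher for an action modeled as an
# ordered sequence of sub-actions. Each frame label is examined once,
# so runtime is linear in the number of frames, and the partial match
# count is available at any moment during streaming.

def stream_match(frames, action_steps):
    """Return how many sub-actions of `action_steps` have been
    matched so far, updated in a single pass over the frame labels."""
    progress = 0
    for label in frames:
        if progress < len(action_steps) and label == action_steps[progress]:
            progress += 1
    return progress

drink_action = ["reach", "grasp", "lift", "drink"]

# Mid-stream, the matcher can already report a partial result:
partial = stream_match(["reach", "grasp", "idle"], drink_action)      # 2 of 4
complete = stream_match(["reach", "grasp", "lift", "drink"], drink_action)
```

Because the matcher keeps only a single progress counter per action, doubling the video length simply doubles the work, in contrast to methods that re-examine earlier frames.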
Pirsiavash said in the statement that the process is much like the one a system such as Watson would use to diagram a sentence: complicated actions are broken down into their component parts, and the algorithm simply looks for a pattern that fits the grammar. "When you make tea, for instance, it doesn't matter whether you first put the teabag in the cup or put the kettle on the stove. But it's essential that you put the kettle on the stove before pouring the water into the cup," he said.
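Pirsiavash's tea example can be captured, very loosely, as a grammar of required sub-actions plus ordering constraints; the check below is an invented sketch, not the published method, with made-up action names:

```python
# Hypothetical sketch of the tea "grammar": some sub-actions may occur
# in either order, but ordering constraints (kettle on the stove before
# pouring water) must hold. `constraints` is a list of (earlier, later)
# pairs of sub-action names.

def satisfies_grammar(sequence, required, constraints):
    # Every required sub-action must appear in the observed sequence...
    if set(required) - set(sequence):
        return False
    pos = {action: i for i, action in enumerate(sequence)}
    # ...and each ordering constraint must be respected.
    return all(pos[a] < pos[b] for a, b in constraints)

required = ["teabag_in_cup", "kettle_on_stove", "pour_water"]
constraints = [("kettle_on_stove", "pour_water")]

# Either order of the first two steps fits the grammar:
satisfies_grammar(["teabag_in_cup", "kettle_on_stove", "pour_water"],
                  required, constraints)   # True
satisfies_grammar(["kettle_on_stove", "teabag_in_cup", "pour_water"],
                  required, constraints)   # True
# But pouring water before the kettle is on the stove does not:
satisfies_grammar(["pour_water", "kettle_on_stove", "teabag_in_cup"],
                  required, constraints)   # False
```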
Pirsiavash told Network World that he doesn't know when his algorithm might show up in real-world applications, but said it's definitely going to do so at some point.
"There are many companies working on commercializing computer vision systems," he said. "I am sure automatic action recognition will also be used in real products soon."