If you don’t know all the labels, you can’t use them for training; instead, you score the results and leave your system to devise rules that make sense of the answers it gets right or wrong, in what’s known as unsupervised learning. The most common unsupervised learning algorithm is clustering, which derives the structure of your data by looking at relationships between variables in the data. Amazon’s product recommendation system that tells you what people who bought an item also bought uses unsupervised learning.
With reinforcement learning, the system learns as it goes by seeing what happens. You set up a clear set of rewards so the system can judge how successful its actions are. Reinforcement learning is well suited to game play because there are obvious rewards. Google’s DeepMind AlphaGo used reinforcement learning to learn Go, Microsoft’ Project Malmo system allows researchers to use Minecraft as a reinforcement learning environment, and a bot built with OpenAI’s reinforcement learning algorithm recently beat several top-ranked players at Valve’s Dota 2 game.
The complexity of creating accurate, useful rewards has limited the use of reinforcement learning, but Microsoft has been using a specific form of reinforcement learning called contextual bandits (based on the concept of a multi-armed slot machine) to significantly improve click-through rates on MSN. That system is now available as the Microsoft Custom Decision Service API. Microsoft is also using a reinforcement learning system in a pilot where customer service chatbots monitor how useful their automated responses are and offer to hand you off to a real person if the information isn’t what you need; the human agent also scores the bot to help it improve.
Combining machine learning algorithms for best results
Often, it takes more than one machine learning method to get the best result; ensemble learning systems use multiple machine learning techniques in combination. For example, the DeepMind system that beat expert human players at Go uses not only reinforcement learning but also supervised deep learning to learn from thousands of recorded Go matches between human players. That combination is sometimes known as semi-supervised learning.
Similarly, the machine learning system that Microsoft Kinect uses to recognize human movements was built with a combination of a discriminative system — to build that Microsoft rented a Hollywood motion-capture suite, extracted the position of the skeleton and labelled the individual body parts to classify which of the various known postures it was in — and a generative system, which used a model of the characteristics of each posture to synthesize thousands more images to give the system a large enough data set to learn from.
Sign up for CIO Asia eNewsletters.