
What deep learning really means

Martin Heller | Feb. 7, 2017
GPUs in the cloud put the predictive power of deep neural networks within reach of every developer

BOOL Swing_ax_at_dragon()
{
    /* The outcome varies only because we explicitly draw a random number. */
    BOOL retval = rand() > SOME_THRESHOLD;
    if (retval)
        printf("Your ax hits. Dragon dies.");
    else
        printf("Your ax misses. Dragon spits flames.");
    return retval;
}

In other words, if we want a conventional program to vary statistically instead of behaving consistently, we have to program the variation. Machine learning turns that idea on its head.

Machine learning

In machine learning (ML), the essential task is to create a predictor of future outputs from some set of inputs. This is accomplished by training the predictor statistically from historical data.

If the value predicted is a real number, then you are solving a regression problem, such as "What will the price of MSFT stock be on Tuesday at noon?" The complete history of MSFT stock transactions is available for training, as are all the related stocks, news, and economic data that might correlate with the stock price.

If you are predicting a yes or no response, then you are solving a binary or two-class classification problem, such as "Will the price of MSFT stock go up between now and Tuesday at noon?" The corpus of data is the same as for the regression problem, but the algorithms for optimizing the predictor will be different.

If you are predicting more than two classes, then you are solving a multiclass classification problem, such as "What's the best action for MSFT stock? Buy, sell, or hold?" Again, the corpus of data is the same, but the algorithms might be different.
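To make the distinction concrete, here is a minimal sketch of how the same price history could yield three different kinds of target value. The function names, the Action enum, and the band parameter (a relative price change below which we call it a "hold") are assumptions made for illustration, not part of any particular library or trading rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the multiclass case. */
typedef enum { ACTION_SELL = 0, ACTION_HOLD = 1, ACTION_BUY = 2 } Action;

/* Regression target: the future price itself, a real number. */
double regression_target(const double *prices, size_t n, size_t now, size_t horizon)
{
    assert(now + horizon < n);
    return prices[now + horizon];
}

/* Two-class target: 1 if the price is higher after the horizon, else 0. */
int binary_target(const double *prices, size_t n, size_t now, size_t horizon)
{
    assert(now + horizon < n);
    return prices[now + horizon] > prices[now];
}

/* Multiclass target: buy, sell, or hold. The band is an assumed threshold
   on relative change; moves smaller than the band count as "hold". */
Action multiclass_target(const double *prices, size_t n, size_t now,
                         size_t horizon, double band)
{
    assert(now + horizon < n && prices[now] > 0.0 && band >= 0.0);
    double change = (prices[now + horizon] - prices[now]) / prices[now];
    if (change > band)
        return ACTION_BUY;
    if (change < -band)
        return ACTION_SELL;
    return ACTION_HOLD;
}
```

All three functions read the same prices array; only the type of the value they return differs, which is what determines whether you are facing a regression, two-class, or multiclass problem.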

In general, when you do ML you first prepare the historical data (see my tutorial on Azure ML for an example), then split it randomly into two groups: one for training and one for testing. When you process the data for training, you use the known target value; when you process the data for testing, you predict the target value from the other data (no peeking!) and compute the error rates by comparing the prediction to the known target value.
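The split-train-test routine described above can be sketched in a few functions. This is an illustration under simplifying assumptions: a single input feature, a straight-line least-squares fit standing in for whatever algorithm you would really use, and mean absolute error as the error measure. The names shuffle_indices, fit_line, and mean_abs_error are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill idx with 0..n-1 and shuffle it (Fisher-Yates) so that the split into
   training and test rows is random. rand() % i has a slight modulo bias,
   which is acceptable for a sketch. */
void shuffle_indices(size_t *idx, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Training: fit y = slope * x + intercept by least squares, using the known
   target values of the training rows only. */
void fit_line(const double *x, const double *y, const size_t *train,
              size_t n_train, double *slope, double *intercept)
{
    assert(n_train >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n_train; i++) {
        mx += x[train[i]];
        my += y[train[i]];
    }
    mx /= (double)n_train;
    my /= (double)n_train;

    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n_train; i++) {
        double dx = x[train[i]] - mx;
        sxx += dx * dx;
        sxy += dx * (y[train[i]] - my);
    }
    assert(sxx > 0.0); /* the training inputs must not all be identical */
    *slope = sxy / sxx;
    *intercept = my - *slope * mx;
}

/* Testing: predict each held-out target from x alone (no peeking), then
   compare the prediction with the known target value. */
double mean_abs_error(const double *x, const double *y, const size_t *test,
                      size_t n_test, double slope, double intercept)
{
    assert(n_test >= 1);
    double total = 0.0;
    for (size_t i = 0; i < n_test; i++) {
        double err = (slope * x[test[i]] + intercept) - y[test[i]];
        total += err < 0.0 ? -err : err;
    }
    return total / (double)n_test;
}
```

To use it, call shuffle_indices once, treat the first part of idx (say 70 percent, an arbitrary choice) as the training rows and the remainder as the test rows, fit on the former, and report the error on the latter.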

Microsoft's Machine Learning Algorithm Cheat Sheet shown above is a good resource for picking algorithms, especially if you're using Azure ML or another general-purpose ML library or service. For the case of stock market data, Decision Forest (known for accuracy and fast training) might be a good first algorithm for regression, Logistic Regression (fast training, linear model) might be a good first algorithm for two-class classification, and Decision Jungle (accuracy, small memory footprint) might be a good first algorithm for multiclass classification.
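As a small illustration, the starting points suggested above can be written as a lookup from problem type to a first algorithm to try. The ProblemType enum and first_algorithm function are hypothetical names; the returned strings are simply the algorithm names mentioned in the paragraph, not calls into Azure ML.

```c
#include <assert.h>

/* The three problem types discussed in this section. */
typedef enum {
    PROBLEM_REGRESSION,
    PROBLEM_TWO_CLASS,
    PROBLEM_MULTICLASS
} ProblemType;

/* Return the name of a reasonable first algorithm for the given problem. */
const char *first_algorithm(ProblemType type)
{
    switch (type) {
    case PROBLEM_REGRESSION:
        return "Decision Forest";      /* accuracy, fast training */
    case PROBLEM_TWO_CLASS:
        return "Logistic Regression";  /* fast training, linear model */
    case PROBLEM_MULTICLASS:
        return "Decision Jungle";      /* accuracy, small memory footprint */
    }
    assert(!"unknown problem type");
    return "";
}
```

These are only first guesses; the error rates measured on the test data decide whether a different algorithm is worth trying.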

