One of the big sins in data analysis and problem solving is jumping to cause. Before you can figure out what happened and why, you need to step back and look at all the variables and their correlations.
Exploratory data analysis can quickly show you the ranges and distributions of all the variables, whether pairs of variables tend to be dependent or independent, where the clusters lie, and where there may be outliers. When you have highly correlated variables, it’s often useful to drop one or the other from the analysis, or to perform something akin to stepwise multiple linear regression to identify the best selection of variables. I don’t mean to imply that the final model will be linear, but it’s always useful to try simple linear models before introducing complications; if you have too many terms in your model relative to the amount of data, you can wind up with an underdetermined system or a model that overfits, describing the noise in the data rather than the underlying relationship.
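To make the point about highly correlated variables concrete, here is a minimal sketch that computes pairwise Pearson correlations and flags pairs that may be redundant. The column names, the sample values, and the 0.9 threshold are all assumptions made up for illustration; in practice you would pick a threshold that suits your data and would probably use a statistics library rather than hand-rolled functions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Made-up illustrative data: height in cm and in inches are nearly redundant.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age_years": [34.0, 21.0, 45.0, 29.0, 38.0],
}

flagged = highly_correlated_pairs(data)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r:.3f}); consider dropping one")
```

A flagged pair is only a candidate for removal; which variable to drop, if either, is still a judgment call based on what you know about the data.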
You test many approaches to find the best models
There’s only one way to find the best model for a given data set: try all of them. If your objective is in a well-explored but challenging domain such as photographic feature identification and language recognition, you may be tempted to try only the “best” models from contests, but unfortunately those are often the most compute-intensive deep-learning models, with convolutional layers in the case of image recognition and long short-term memory (LSTM) layers for speech recognition. If you need to train those deep neural networks, you may need more computing power than you have in your office.
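As a small-scale illustration of trying several models and keeping the best, the sketch below fits three very simple candidates to the same training data and compares them on held-out points. Everything here is an assumption for the sake of the example: the data is made up, the candidates (a mean baseline, a one-feature linear fit, and a nearest-neighbor lookup) stand in for whatever model families you would really test, and mean squared error on a fixed holdout is just one reasonable way to score them.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor: predict the target of the closest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Made-up data: roughly y = 2x + 1 with small fixed perturbations.
xs = [float(i) for i in range(12)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, 0.2, -0.2]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

# Hold out every third point for validation; train on the rest.
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 != 2]
valid = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 == 2]
train_x, train_y = zip(*train)
valid_x, valid_y = zip(*valid)

candidates = {
    "mean": fit_mean,
    "linear": fit_linear,
    "nearest_neighbor": fit_nearest_neighbor,
}
scores = {
    name: mse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: validation MSE = {score:.3f}")
print(f"best candidate: {best}")
```

The same loop structure carries over to heavier models; only the fitting functions and the cost of each run change.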
You have the computing capacity to train deep learning models
The bigger your dataset, and the more layers in your deep learning model, the more time it takes to train the neural network. Having lots of data helps you to train a better model, but hurts you because of the increase in training time. Having lots of layers helps you identify more features, but also hurts you because of the increase in training time. You probably can’t afford to wait a year to train each model; a week is more reasonable, especially since you will most likely need to tune your models tens of times.
One way around the training time issue is to use general-purpose graphics processing units (GPGPUs), such as those made by Nvidia, to perform the vector and matrix computations (that is, the linear algebra) underlying neural network layers. One K80 GPU and one CPU together often give you 5 to 10 times the training speed of just the CPU if you can get the whole “kernel” of the network into the local memory of the GPU, and with a P100 GPU you can get up to 100 times the training speed of just the CPU.
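A back-of-the-envelope calculation shows why those speedups matter once tuning is taken into account. The sketch below assumes, purely for illustration, that one training run takes seven days on a CPU alone, that you need 20 tuning runs, and that the runs happen one after another; the speedup factors are the rough figures quoted above, not measurements.

```python
def total_training_days(days_per_run_cpu, runs, speedup):
    """Wall-clock days for sequential runs, given a speedup over CPU-only."""
    return days_per_run_cpu * runs / speedup


# Hypothetical inputs: 7 days per CPU-only run, 20 tuning runs.
DAYS_PER_RUN_CPU = 7
TUNING_RUNS = 20

# Speedup factors taken from the rough ranges quoted in the text.
scenarios = {
    "CPU only": 1,
    "GPU, 5x": 5,
    "GPU, 10x": 10,
    "GPU, 100x": 100,
}

estimates = {
    label: total_training_days(DAYS_PER_RUN_CPU, TUNING_RUNS, speedup)
    for label, speedup in scenarios.items()
}

for label, days in estimates.items():
    print(f"{label}: about {days:.1f} days for {TUNING_RUNS} runs")
```

Under these assumptions the tuning campaign shrinks from 140 days on the CPU alone to 14 days at a 10x speedup, which is the difference between an impractical project and a feasible one.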