
What deep learning really means

Martin Heller | Feb. 7, 2017
GPUs in the cloud put the predictive power of deep neural networks within reach of every developer

There are many ways to approach deep learning, but none are perfect, at least not yet. There are only better and worse strategies for each application.

Deep learning strategies, tactics, and applications

For an example of an application of deep learning, let's take image recognition. Since living organisms process images with their visual cortex, many researchers have taken the architecture of the visual cortex as a model for neural networks designed to perform image recognition. The biological research goes back to the 1950s.

The breakthrough in the neural network field for vision was Yann LeCun's 1998 LeNet-5, a seven-level convolutional neural network (CNN) for recognition of handwritten digits digitized in 32-by-32-pixel images. To analyze higher-resolution images, the network would need more neurons and more layers.
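To see why resolution drives network size, it helps to trace how spatial dimensions shrink through LeNet-5-style stages. A minimal sketch in plain Python (the helper names are illustrative, not from any framework): a 5-by-5 filter with no padding turns an N-by-N input into (N-4)-by-(N-4), and 2-by-2 subsampling halves each dimension.

```python
def conv_out(n, k):
    """Output width of a valid (no padding, stride 1) convolution
    with a k-by-k filter on an n-by-n input."""
    return n - k + 1

def pool_out(n, p):
    """Output width of non-overlapping p-by-p subsampling."""
    return n // p

# LeNet-5-style progression from a 32x32 input:
n = 32
n = conv_out(n, 5)   # first convolution  -> 28x28 feature maps
n = pool_out(n, 2)   # subsampling        -> 14x14
n = conv_out(n, 5)   # second convolution -> 10x10
n = pool_out(n, 2)   # subsampling        -> 5x5
# The resulting 5x5 feature maps feed the fully connected layers.
```

Start with a larger image and each feature map grows quadratically, which is why higher-resolution inputs demand more neurons and layers.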

Since then, packages for creating CNNs and other deep neural networks have proliferated. These include Caffe, Microsoft Cognitive Toolkit, MXNet, Neon, TensorFlow, Theano, and Torch.

Convolutional neural networks typically use convolutional, pooling, ReLU, fully connected, and loss layers to simulate a visual cortex. The convolutional layer basically computes weighted sums over many small overlapping regions of its input. The pooling layer performs a form of nonlinear down-sampling. ReLU layers, which we mentioned earlier, apply the nonsaturating activation function f(x) = max(0,x). In a fully connected layer, the neurons have full connections to all activations in the previous layer. A loss layer computes how the network training penalizes the deviation between the predicted and true labels, using a softmax or cross-entropy loss for classification or a Euclidean loss for regression.
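As a rough illustration of the first three layer types, here is a pure-Python sketch (not tied to any of the frameworks above, and the tiny hand-picked filter is just for demonstration) of convolution as a weighted sum over overlapping regions, ReLU activation, and max pooling:

```python
def relu(x):
    # Nonsaturating activation: f(x) = max(0, x)
    return max(0.0, x)

def conv2d(image, kernel):
    """Valid 2-D convolution: a weighted sum over each small
    overlapping region of the image (stride 1, no padding)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

def max_pool(fmap, p=2):
    """Nonlinear down-sampling: keep the maximum of each p-by-p block."""
    return [[max(fmap[i + a][j + b] for a in range(p) for b in range(p))
             for j in range(0, len(fmap[0]) - p + 1, p)]
            for i in range(0, len(fmap) - p + 1, p)]

image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 2],
         [1, 0, 1, 3]]
edge = [[1, -1], [-1, 1]]            # a tiny hand-picked 2x2 filter
fmap = conv2d(image, edge)           # 3x3 feature map
activated = [[relu(v) for v in row] for row in fmap]
pooled = max_pool(activated)         # down-sampled to 1x1
```

In a real CNN the filter weights are learned during training rather than hand-picked, and hundreds of filters run in parallel to produce many feature maps per layer.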

Besides image recognition, CNNs have been applied to natural language processing, drug discovery, and playing Go.

Natural language processing (NLP) is another major application area for deep learning. In addition to the machine translation problem addressed by Google Translate, major NLP tasks include automatic summarization, co-reference resolution, discourse analysis, morphological segmentation, named entity recognition, natural language generation, natural language understanding, part-of-speech tagging, sentiment analysis, and speech recognition.

In addition to CNNs, NLP tasks are often addressed with recurrent neural networks (RNNs), which include the Long Short-Term Memory (LSTM) model. As I mentioned earlier, in recurrent neural networks, neurons can influence themselves, either directly or indirectly through the next layer. In other words, RNNs can have loops, which gives them the ability to persist some information history when processing sequences -- and language is nothing without sequences. LSTMs are a particularly attractive form of RNN that have a more powerful update equation and a more complicated repeating module structure.
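The looping behavior described above can be sketched with a toy single-unit recurrent cell in plain Python (the weights are arbitrary illustrative constants, and this omits the LSTM's gating machinery): the new hidden state depends on both the current input and the previous hidden state, so information persists across the sequence.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a toy single-unit recurrent cell: the next hidden
    state mixes the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h + b)

def run_sequence(xs):
    h = 0.0                  # initial hidden state
    for x in xs:             # the recurrent loop, unrolled over time
        h = rnn_step(x, h)   # h carries history forward
    return h

# The same elements in a different order yield a different final state:
# an RNN is sensitive to sequence, not just content.
a = run_sequence([1.0, 0.0, 0.0])
b = run_sequence([0.0, 0.0, 1.0])
```

An LSTM replaces the single tanh update with input, forget, and output gates that learn what to keep and what to discard, which is what makes it better at remembering over long sequences.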

Running deep learning

Needless to say, deep CNNs and LSTMs often require serious computing power for training. Remember how the Google Brain team needed a couple thousand GPUs to train the new A.I. version of Google Translate? That's no joke. A training session that takes three hours on one GPU is likely to take 30 hours on a CPU. Also, the kind of GPU matters: For most deep learning packages, you need one or more CUDA-compatible Nvidia GPUs with enough internal memory to run your models.

