This article aims to provide a mathematics- and jargon-free introduction to Deep Learning.
What is Deep Learning?
Deep Learning is a type of Machine Learning built upon multi-layered Artificial Neural Networks:
- Machine Learning allows computers to learn, make predictions and describe data without being explicitly programmed.
- Artificial Neural Networks (ANNs) are inspired by the neurons of the human brain. ANNs consist of a set of interconnected nodes that work together to perform complex statistical operations on incoming data.
Before we delve into Deep Learning we must first understand Artificial Neural Networks.
The Anatomy of an Artificial Neural Network
The most commonly used architecture for ANNs is the Feed Forward architecture. In this section we'll look at a typical Feed Forward ANN and explain the process of training it.
Layers in an Artificial Neural Network
Feed Forward ANNs organise their nodes into three distinct layers, with the output from each layer being fed into the next.
- Input Layer: The Input Layer provides an entry point for incoming data. As such it needs to match the format or "shape" of the expected input. For example, an RGB image 28 pixels high and 28 pixels wide might require an Input Layer of 2352 nodes organised into a 3D structure (28 x 28 x 3). In such a structure each node would represent the Red, Green or Blue value for a given pixel.
- Hidden Layer: Hidden layers are so called because they sit between the Input and Output Layers and have no contact with the "outside world". Their role is to identify features from the input data and use these to correlate between a given input and the correct output. An ANN can have multiple Hidden Layers.
- Output Layer: The Output Layer delivers the end result from the ANN and is structured according to the use case you are working on. For example, if you wanted an ANN to recognise 10 different objects in images you might want 10 output nodes, each representing one of the objects you are trying to find. The final score from each output node would then indicate whether or not the associated object had been found by the ANN.
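As a rough illustration of the three layers described above, here is a minimal sketch in Python with NumPy. It passes a flattened 28 x 28 x 3 image through one Hidden Layer of 64 nodes (a size chosen purely for illustration) to an Output Layer of 10 nodes. The weights are random placeholders; training would tune them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input Layer: a flattened 28 x 28 x 3 RGB image (2352 values).
image = rng.random(28 * 28 * 3)

# Hidden Layer weights: 64 nodes, each connected to every input node.
# Output Layer weights: 10 nodes, one per object we want to recognise.
# (The weights are random here; training would tune them.)
w_hidden = rng.standard_normal((28 * 28 * 3, 64))
w_output = rng.standard_normal((64, 10))

hidden = np.maximum(0, image @ w_hidden)  # each hidden node: weighted sum + activation
scores = hidden @ w_output                # one final score per object
```

With trained (rather than random) weights, the 10 values in `scores` would indicate which of the 10 objects had been found.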
Nodes in an Artificial Neural Network
As already mentioned, nodes in the Input Layer exist merely as an entry point for incoming data and perform no additional processing.
Nodes in the Hidden and Output Layers are more complex, and function as follows:
- Inputs: In the diagram above our node is receiving two separate inputs (i0 and i1) from upstream nodes in the network. In reality a node will likely have many more than two parallel inputs.
- Weights: Each incoming connection has a weight (w0 and w1) applied to its incoming value. Weights can be positive or negative, either amplifying or dampening the effect of the incoming input. Tuning these weights is an important part of the training process (which is discussed in the next section).
- Sum of weighted inputs: All weighted inputs to our node are added up into a single value.
- Activation Function: The main goal of the activation function is to convert the sum of weighted inputs into an output signal that is then passed further into the network. There are many different types of activation function, each with its own benefits and drawbacks.
- Output: The output (o0) is the result of the activation function. A non-zero output generally means that the node has been activated or "triggered".
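To make the steps above concrete, here is a small sketch of a single node with two inputs. The input and weight values are made up, and a sigmoid is used as one example of the many possible activation functions:

```python
import math

def node(i0, i1, w0, w1):
    # Step 1 and 2: apply a weight to each incoming value.
    # Step 3: sum the weighted inputs into a single value.
    total = i0 * w0 + i1 * w1
    # Step 4: the activation function (here a sigmoid, which
    # squashes any sum into an output signal between 0 and 1).
    return 1 / (1 + math.exp(-total))

# Step 5: the output, using example inputs and weights.
o0 = node(i0=0.5, i1=0.8, w0=1.2, w1=-0.4)
```

Note how the negative weight w1 dampens the second input's contribution, while the positive weight w0 amplifies the first.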
The pattern of activated nodes in an ANN is essentially the ANN's internal mapping from input to output.
To see some examples of the internal activation of an ANN you can check out Adam Harley's visualisation of an ANN trained on the MNIST database of handwritten digits. This is a four-layer ANN with the input layer at the bottom, two hidden layers above it and finally an output layer at the top. The visualisation is interactive so feel free to click on the nodes and play around with it.
The next question we need to answer is - how does the ANN learn these internal representations?
Training an Artificial Neural Network with Supervised Learning
Supervised Learning is a Machine Learning paradigm where we "help" our ANN to find the correlation between an input and an output by training it with example input:output pairs.
This is pretty much the same as how we teach children the alphabet. We point at a letter and at the same time say what it is. Over time the children learn to associate the visual input with the desired response.
The process of training an ANN can be described as follows.
For each input:output pair in the training set:
- Generate Prediction: The input data is fed forwards through the ANN, resulting in an output value or prediction. This prediction is influenced by the current state of the weights in the ANN.
- Calculate Error Margin: The network uses a Loss Function to compare the ANN's prediction against the expected output for the given input. Note that there are many different types of Loss Function, each with its own benefits and drawbacks.
- Update Weights: The result of the Loss Function is used to update the ANN's weights in a process known as backpropagation. This essentially runs backwards through the ANN, updating the weights in an attempt to reduce the error margin.
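The three steps above can be sketched in code. This is a deliberately tiny example with a single weight, an invented training set (learning the rule output = 2 x input) and a simple error-times-input update rule standing in for full backpropagation:

```python
# A toy "network" with one weight, learning the rule: output = 2 * input.
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # input:output pairs
weight = 0.0
learning_rate = 0.05

for epoch in range(100):                      # many training rounds
    for x, expected in training_set:
        prediction = weight * x               # 1. Generate Prediction
        error = prediction - expected         # 2. Calculate Error Margin
        weight -= learning_rate * error * x   # 3. Update Weight to reduce the error
```

After the training rounds, `weight` has converged very close to 2.0: the toy network has learnt the rule from the examples alone.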
Through many training rounds the ANN builds up an internal representation of the connection between given input:output pairs.
But the end goal is not merely memorisation. What we really want is a generalised model that manages to give the correct result for inputs that it has never encountered before.
Deep Learning in Artificial Neural Networks
A common definition of Deep Learning is "ANNs with more than one Hidden Layer". In other words, the more Hidden Layers an ANN has, the "deeper" it can be said to be.
The role of the Hidden Layers is to identify features from the input data and use these to correlate between a given input and the correct output.
Deep Learning allows ANNs to handle complex feature hierarchies by supporting a step by step process of pattern recognition, where each Hidden Layer builds upon the "knowledge" from previous ones.
In the above facial recognition example we have three Hidden Layers. Each of these aggregates and recombines features from the previous layer.
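This layer-by-layer recombination can be sketched as a chain of transformations, where each Hidden Layer works on the output of the previous one. All sizes here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Input size followed by three Hidden Layer sizes (all invented).
layer_sizes = [100, 64, 32, 16]

# One weight matrix connecting each layer to the next.
weights = [rng.standard_normal((a, b))
           for a, b in zip(layer_sizes, layer_sizes[1:])]

signal = rng.random(100)                  # some input data
for w in weights:
    signal = np.maximum(0, signal @ w)    # each layer builds on the last
```

After the loop, `signal` holds the 16 features produced by the deepest Hidden Layer, each one an aggregation of features from the layers before it.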
A drawback of Deep Learning is that the feature representations in Hidden Layers are not always human readable like the above example. This means that it can be extremely difficult to get insight into why a Deep Learning ANN delivers a specific result, especially if the ANN is working in more than 3 dimensions.
Types of Deep Learning Architectures
Many Deep Learning Architectures exist. These are all based on ANNs, but have specific optimisations that make them well suited to certain use cases. Some examples:
- Multi Layer Perceptrons (MLPs): These are great for classification and regression, but don't always perform as well as the more specialised ANNs below.
- Convolutional Neural Networks (CNNs): The most popular solution for Image Processing due to their ability to handle spatial data.
- Recurrent Neural Networks (RNNs): Built for sequential data and therefore popular for Natural Language Processing.
- Hybrid Models: Combining different types of ANN can unlock complex use cases, such as combining a CNN and an RNN to process a sequence of images or frames in a video stream.
One of the cool aspects of Deep Learning is the possibility of Transfer Learning. Transfer Learning is where you copy a previously trained Deep Learning model and retrain it for another task. This can be a shortcut to getting a new model up and running quickly, or a way of dealing with use cases where training data is scarce.
For Transfer Learning to succeed the new use case needs to share learned features with the original one. You won't be able to retrain a facial recognition model to perform text classification, for example, but you might be able to use it for other Image Processing tasks.
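At its core, Transfer Learning amounts to keeping the learned Hidden Layer weights and replacing only the task-specific Output Layer. Here is a rough sketch of that idea, using invented NumPy weight matrices to stand in for a real pretrained model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend these Hidden Layer weights come from a previously trained model.
pretrained_hidden = rng.standard_normal((2352, 64))

# Keep ("freeze") the pretrained feature extractor as-is...
frozen_hidden = pretrained_hidden

# ...and attach a fresh Output Layer for the new task (say, 5 new classes).
# Only these new weights would be updated during retraining.
new_output = rng.standard_normal((64, 5))

features = np.maximum(0, rng.random(2352) @ frozen_hidden)  # reused features
scores = features @ new_output                              # new task's output
```

Because the frozen Hidden Layer already knows how to extract useful features, only the small new Output Layer needs training data for the new task.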
Questions to ask before using Deep Learning
Deep Learning has many potential applications, including Computer Vision, Natural Language Processing, Pattern Recognition, Process Optimisation, Feature Extraction and much more.
Here are some questions that provide a quick sanity check before choosing to apply Deep Learning to your next project.
Q: Do you have good knowledge of Mathematics and Applied Statistics?
This knowledge will be necessary to help you configure your Deep Learning ANN, prepare your training data and interpret the output from your ANN, as well as helping you deal with unexpected results.
Q: Have you ruled out simpler Machine Learning methods?
Deep Learning is merely one tool in the Machine Learning toolbox. It is also one of the most complex and time-consuming. Instead of going straight for Deep Learning, take some time to look into simpler alternatives.
Q: Do you have plenty of varied training data available?
Deep Learning projects generally perform better when they have access to large amounts of varied data. This gives the Hidden Layers enough information to perform automatic feature extraction. Lack of variation in your training data can give rise to biased models.
Q: Do you need insight into why your Deep Learning ANN made a specific decision?
One of the weaknesses of Deep Learning is the lack of Interpretability. Deep Learning ANNs build their own feature hierarchies from the training data, based upon mathematical operations. These feature hierarchies may not be easily interpretable by humans. This can be a problem if you have to explain to a customer why your ANN rejected their application for a credit extension.
Q: Do you have the necessary processing power and data storage available?
Deep Learning can be resource intensive. Not only do you need processing power for training the model, but you also need data storage for all your training data. In the era of cloud computing these problems are easily solvable, but solving them can lead to significant costs.
Q: Is there a lack of domain understanding for feature extraction?
If the answer is "yes" then Deep Learning might be able to help you with this! Deep Learning ANNs perform automatic feature extraction without human intervention. This makes them very suitable for cases where manual feature identification is not possible.
Play with an Artificial Neural Network in your Browser
A Neural Network Playground is a browser based application that allows you to configure and run an Artificial Neural Network (with 0 to 6 Hidden Layers) to solve simple classification problems.
Thanks for reading!!
Mark West leads the Data Science team at Bouvet Oslo.