Introduction to Recurrent Neural Networks
What are Recurrent Neural Networks?
We already discussed Convolutional Neural Networks in previous chapters. They were mainly used for computer vision applications. But what happens if the input data is a sequence of text or numbers? To deal with these types of data (sequence-based data), we have a special type of neural network: a Recurrent Neural Network.
Let’s look at an example that explains why a feed-forward network such as a Multi-layer Perceptron is not suitable for sequential data.
We have a Multi-layer Perceptron model which tries to give a rating from 1 to 5 for a particular movie review. For the review, “This is a great movie”, our MLP model predicts a rating of 4. On the other hand, for the review “This is not a good movie”, the MLP model again predicts a rating of 4 (which is wrong). This happens because a Multi-layer Perceptron is not capable of remembering past information and, thus, treats each word as an independent entity.
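One way to see this limitation: if each word is treated as an independent feature (a bag of words, which is effectively what a plain MLP over word counts sees), word order is discarded, so the model cannot tell where a negation like “not” falls. The reviews below are hypothetical examples chosen for illustration:

```python
from collections import Counter

# Two word sequences with very different meanings: in the first, "not"
# negates "good"; in the second it dangles at the end.
review_a = "not a good movie".split()
review_b = "a good movie not".split()

# A bag-of-words representation counts words and ignores their order,
# so both reviews map to the exact same feature vector.
features_a = Counter(review_a)
features_b = Counter(review_b)

print(features_a == features_b)  # True — the model cannot distinguish them
```

Any model that consumes only these order-free counts is forced to give both reviews the same rating, which is exactly the failure mode described above.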
Therefore, we need sequence models (RNNs) that can process information from left to right (much as humans read) and maintain a memory of what has been seen so far. RNNs are designed to recognize the sequential structure of data and use those patterns to predict the next probable element. During training, RNNs learn in much the same way as feed-forward networks, but when generating outputs they also draw on what was learned from prior inputs; feed-forward networks (ANNs) and CNNs, by contrast, are stateless.
Architecture of an RNN
RNNs can take one or more input vectors and produce one or more output vectors. The outputs are determined not only by weights applied to the inputs, as in a standard neural network, but also by a “hidden” state vector that summarizes the inputs and outputs seen so far. As a result, the same input can produce different outputs depending on the earlier inputs in the sequence.
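The hidden-state recurrence can be sketched in a few lines of NumPy. The update rule used here, h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b), is the standard vanilla-RNN cell; the dimensions and random weights are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions for the sketch)
input_size, hidden_size = 3, 4

# One weight matrix applied to the current input, one to the previous
# hidden state, plus a bias — this pair is what makes the network "recurrent".
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrence step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Feed the *same* input twice, starting from different hidden states:
x = np.ones(input_size)
h0 = np.zeros(hidden_size)
h1 = rnn_step(x, h0)   # hidden state after seeing x once
h2 = rnn_step(x, h1)   # same input again, but a different prior state

print(np.allclose(h1, h2))  # False — identical inputs, different outputs
```

Because `h1` and `h2` differ even though the input `x` is identical, this demonstrates the claim above: the output depends on the sequence history carried in the hidden state, not on the current input alone.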
A simple Recurrent Neural Network is depicted below: