Understanding Long Short-Term Memory Networks

In this lesson, we'll first explain how an LSTM cell operates. In addition to the hidden state, we'll see that a gating mechanism is in place to control the flow of information inside the cell. Then, we'll work through a detailed example and see how the gates and states cooperate at various stages to produce the desired output. Finally, we'll compare an LSTM against a standard RNN to see how the two differ.

What is an LSTM?

LSTMs can be seen as a more complex and capable family of RNNs. Though an LSTM is a complicated beast, its underlying principles are the same as those of RNNs: it processes a sequence of items one input at a time, in sequential order. An LSTM is mainly composed of five components (sketched in code after this list):

  • Cell state: This is the internal cell state (that is, memory) of an LSTM cell.

  • Hidden state: This is the external hidden state exposed to other layers and used to calculate predictions.

  • Input gate: This determines how much of the current input is read into the cell state.

  • Forget gate: This determines how much of the previous cell state is sent into the current cell state.

  • Output gate: This determines how much of the cell state is output into the hidden state.
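
To make these five components concrete, below is a minimal NumPy sketch of a single LSTM cell step. The weight names (`W_i`, `b_i`, and so on), the dimensions, and the layout of `params` are illustrative assumptions, not the API of any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state and cell state."""
    # The gates read the previous hidden state together with the current input.
    z = np.concatenate([h_prev, x_t])

    i = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate values

    # Cell state: forget part of the old state, write part of the new input.
    c_t = f * c_prev + i * c_tilde
    # Hidden state: expose a gated view of the cell state to other layers.
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Illustrative dimensions: a 4-unit input and a 3-unit hidden/cell state.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {w: 0.1 * rng.normal(size=(n_hid, n_hid + n_in))
          for w in ("W_i", "W_f", "W_o", "W_c")}
params.update({b: np.zeros(n_hid) for b in ("b_i", "b_f", "b_o", "b_c")})

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, params)
```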

We can describe an RNN in terms of a cell architecture as follows: the cell outputs some state (through a nonlinear activation function) that depends on the previous cell state and the current input. However, in an RNN, this state is overwritten with every incoming input. This behavior is quite undesirable for storing long-term dependencies.
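
For contrast, the simple RNN update just described can be sketched in a single line; note how the entire state vector is rewritten at every step (again, the names here are illustrative):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_h, W_x, b):
    # The whole state is recomputed from scratch at each step, which
    # makes it hard to preserve information over long sequences.
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)
```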

LSTMs can decide when to add, update, or forget information stored in each neuron in the cell state. In other words, LSTMs are equipped with a mechanism to keep the cell state unchanged (if warranted for better performance), giving them the ability to store long-term dependencies.

This is achieved by introducing a gating mechanism. LSTMs possess gates for each operation the cell needs to perform. Each gate outputs a continuous value (typically produced by a sigmoid function) between 0 and 1, where 0 means no information flows through the gate and 1 means all of it does. An LSTM uses one such gate value for each neuron in the cell. As explained above, these gates control the following:

  • How much of the current input is written to the cell state (input gate)

  • How much information is forgotten from the previous cell state (forget gate)

  • How much information is output into the final hidden state from the cell state (output gate)
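
A small hand-picked numeric example shows how these gate values behave at their extremes, including the state-preserving behavior mentioned above (the vectors here are made up for illustration):

```python
import numpy as np

c_prev = np.array([0.9, -0.4, 0.2])  # previous cell state
c_tilde = np.array([0.5, 0.5, 0.5])  # candidate values from the current input

# Forget gate at 1, input gate at 0: the cell state passes through untouched,
# which is how an LSTM carries long-term information forward.
f, i = np.ones(3), np.zeros(3)
print(f * c_prev + i * c_tilde)      # [ 0.9 -0.4  0.2]

# Forget gate at 0, input gate at 1: the old state is fully replaced.
f, i = np.zeros(3), np.ones(3)
print(f * c_prev + i * c_tilde)      # [0.5 0.5 0.5]
```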

Data functionality in LSTM models

The figure below illustrates this functionality for a hypothetical scenario. Each gate decides how much of various data (for example, the current input, the previous hidden state, or the previous cell state) flows into the states (that is, the final hidden state or the cell state). The thickness of each line represents how much information flows to or from that gate in this hypothetical scenario. For example, the input gate allows more from the current input than from the previous final hidden state, whereas the forget gate allows more from the previous final hidden state than from the current input:
