To address the short-term memory problem of plain RNNs, LSTMs provide mechanisms to store the desired information. An LSTM keeps the overall RNN architecture and differs only in the activation blocks. By storing information in so-called cells, which are regulated by gates, LSTMs can retain relevant information over longer sequences.
- Capability of storing long-term information
- Quite flexible in design
- More general and powerful than GRUs
- The sigmoid in the gates is useful against vanishing gradients: a gate value close to 1 passes the memory cell through almost unchanged
The LSTM consists of three gates and one memory cell:
c: memory cell
a: activation (also written as hidden state h)
Γ_u: update gate. Decides when to update (most of the time the value will be close to 0 or 1); also written u or i
Γ_f: forget gate. Decides when to forget a value
Γ_o: output gate. Decides how much of the memory cell is exposed as the activation
- The memory cell can keep (part of) the old value and add the new candidate, as shown in the equations below
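A minimal sketch of the standard update equations in this notation (the weights $W_*$ and biases $b_*$ are the learned parameters and $\odot$ denotes the element-wise product; these symbols are filled in from the usual formulation, not from the notes above):

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh\!\big(W_c\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\big) \\
\Gamma_u &= \sigma\!\big(W_u\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\big) \\
\Gamma_f &= \sigma\!\big(W_f\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\big) \\
\Gamma_o &= \sigma\!\big(W_o\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\big) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + \Gamma_f \odot c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= \Gamma_o \odot \tanh\!\big(c^{\langle t \rangle}\big)
\end{aligned}
$$

The cell update is the key line: the forget gate scales the old value $c^{\langle t-1 \rangle}$ while the update gate mixes in the new candidate $\tilde{c}^{\langle t \rangle}$.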
Peephole connection: feeding information from the preceding memory cell into the gates (using c^⟨t-1⟩ as an additional input to the gate computations)
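A minimal NumPy sketch of one LSTM step implementing the equations above; the names `lstm_step` and the `params` dictionary are illustrative assumptions, not from the notes. The peephole variant simply appends c_prev to the gate inputs (the gate weight matrices must then be sized accordingly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params, peephole=False):
    """One LSTM time step. With peephole=True, the gates also read
    the previous memory cell c_prev (hypothetical parameter names)."""
    Wc, bc = params["Wc"], params["bc"]
    Wu, bu = params["Wu"], params["bu"]
    Wf, bf = params["Wf"], params["bf"]
    Wo, bo = params["Wo"], params["bo"]

    # Stack previous activation and current input: [a^<t-1>, x^<t>]
    concat = np.concatenate([a_prev, x_t])
    # Peephole connection: gates additionally see the old memory cell
    gate_in = np.concatenate([concat, c_prev]) if peephole else concat

    c_tilde = np.tanh(Wc @ concat + bc)    # candidate value
    gamma_u = sigmoid(Wu @ gate_in + bu)   # update gate
    gamma_f = sigmoid(Wf @ gate_in + bf)   # forget gate
    gamma_o = sigmoid(Wo @ gate_in + bo)   # output gate

    # Keep (part of) the old value and add the new candidate
    c_t = gamma_f * c_prev + gamma_u * c_tilde
    a_t = gamma_o * np.tanh(c_t)           # activation / hidden state
    return a_t, c_t

# Example usage (without peephole): gate inputs have size n_a + n_x
rng = np.random.default_rng(0)
n_x, n_a = 3, 4
params = {k: rng.standard_normal((n_a, n_a + n_x)) * 0.1
          for k in ("Wc", "Wu", "Wf", "Wo")}
params.update({b: np.zeros(n_a) for b in ("bc", "bu", "bf", "bo")})
a, c = lstm_step(rng.standard_normal(n_x), np.zeros(n_a), np.zeros(n_a), params)
```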