To address the short-term memory problem of standard RNNs, GRUs (Gated Recurrent Units) provide mechanisms for storing the desired information. A GRU keeps the usual RNN architecture and differs only in the activation block: information is stored in so-called memory cells, regulated by gates, which lets the network carry relevant information over longer sequences.
- Capable of storing long-term information
- Faster to compute than LSTMs (fewer gates, hence fewer parameters)
- Relatively easy to implement
- The sigmoid in the gates helps against vanishing gradients: a gate saturated near 0 carries the memory cell forward unchanged (see the sketch after this list)
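A toy numerical sketch (made-up values, not a trained network) of the long-term-storage and vanishing-gradient points from the list above: an update gate near 0 copies the memory cell forward almost unchanged, while a plain tanh RNN squashes its state at every step.

```python
import numpy as np

# Gated update: c_t = u * candidate + (1 - u) * c_prev.
# With the update gate u stuck near 0 ("keep the memory"),
# the cell value survives 100 steps almost unchanged.
c, u = 1.0, 0.001
for _ in range(100):
    candidate = np.tanh(np.random.randn())  # some new candidate each step
    c = u * candidate + (1 - u) * c
print(c)  # roughly 0.9: the stored value is largely preserved

# A plain tanh RNN squashes its state at every step instead:
h = 1.0
for _ in range(100):
    h = np.tanh(0.5 * h)
print(h)  # essentially 0: the information has vanished
```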
The GRU consists of two gates and one memory cell:
$a$: activation
$c$: memory cell (in the GRU identical to the hidden state $h$)
$\tilde{c}$: candidate for replacing $c$
$\Gamma_u$: update gate; decides when to update $c$ (most of the time its value is close to 0 or 1). Also written $u$ or $z$ in the literature.
$\Gamma_r$: relevance (reset) gate; decides how relevant $c^{\langle t-1 \rangle}$ is for computing the candidate $\tilde{c}$
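In this notation, the standard GRU equations are as follows ($*$ denotes element-wise multiplication, $\sigma$ the sigmoid):

$$
\begin{aligned}
\Gamma_u &= \sigma\left(W_u\left[c^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_u\right)\\
\Gamma_r &= \sigma\left(W_r\left[c^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_r\right)\\
\tilde{c}^{\langle t\rangle} &= \tanh\left(W_c\left[\Gamma_r * c^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_c\right)\\
c^{\langle t\rangle} &= \Gamma_u * \tilde{c}^{\langle t\rangle} + \left(1-\Gamma_u\right) * c^{\langle t-1\rangle}\\
a^{\langle t\rangle} &= c^{\langle t\rangle}
\end{aligned}
$$

A minimal NumPy sketch of one forward step following these equations (the parameter names `Wu`, `Wr`, `Wc`, `bu`, `br`, `bc` and the dictionary layout are this sketch's own conventions, not a library API):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(c_prev, x_t, params):
    """One GRU time step: c_prev is the memory cell (= hidden state a),
    x_t the current input vector."""
    concat = np.concatenate([c_prev, x_t])
    u = sigmoid(params["Wu"] @ concat + params["bu"])   # update gate
    r = sigmoid(params["Wr"] @ concat + params["br"])   # relevance gate
    # Candidate is computed from the relevance-gated previous cell.
    c_tilde = np.tanh(params["Wc"] @ np.concatenate([r * c_prev, x_t])
                      + params["bc"])
    # Blend old memory and new candidate via the update gate.
    return u * c_tilde + (1 - u) * c_prev               # = a<t>

# Tiny usage example with random parameters (cell size 4, input size 3):
n_c, n_x = 4, 3
rng = np.random.default_rng(0)
params = {k: 0.1 * rng.standard_normal((n_c, n_c + n_x))
          for k in ("Wu", "Wr", "Wc")}
params.update({k: np.zeros(n_c) for k in ("bu", "br", "bc")})
c = np.zeros(n_c)
for x_t in rng.standard_normal((5, n_x)):               # length-5 sequence
    c = gru_step(c, x_t, params)
```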