To address the short-term memory problem of plain RNNs, LSTMs provide mechanisms to store the desired information. An LSTM keeps the overall RNN architecture and differs only in the activation blocks. By storing information in so-called cells, which are regulated by gates, LSTMs can retain relevant information over longer sequences.
- Capability of storing long-term information
- Quite flexible in design
- More general and powerful than GRUs
- The sigmoid in the gates is useful against vanishing gradients: a gate value close to 1 passes the memory cell through almost unchanged
The LSTM consists of three gates and one memory cell:
c: memory cell
a: activation (also written as hidden state h)
Γ_u: update gate. Decides when to update (most of the time the value will be close to 0 or 1); also written u or i
Γ_f: forget gate. Decides when to forget a value
Γ_o: output gate. Decides how much of the memory cell is exposed as the activation
- The memory cell can keep (part of) the old value and add the new candidate, as shown in the equations below
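A minimal sketch of the standard update equations in this notation (the weights $W_*$ and biases $b_*$ are the learned parameters and $\odot$ denotes the element-wise product; these symbols are filled in from the usual formulation, not from the notes above):

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh\!\big(W_c\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\big) \\
\Gamma_u &= \sigma\!\big(W_u\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\big) \\
\Gamma_f &= \sigma\!\big(W_f\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\big) \\
\Gamma_o &= \sigma\!\big(W_o\, [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\big) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + \Gamma_f \odot c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= \Gamma_o \odot \tanh\!\big(c^{\langle t \rangle}\big)
\end{aligned}
$$

The cell update is the key line: the forget gate scales the old value $c^{\langle t-1 \rangle}$ while the update gate mixes in the new candidate $\tilde{c}^{\langle t \rangle}$.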
Peephole connection: feeding information from the preceding memory cell into the gates (using c^⟨t-1⟩ as an additional input to the gate computations)
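A minimal NumPy sketch of one LSTM step implementing the equations above; the names `lstm_step` and the `params` dictionary are illustrative assumptions, not from the notes. The peephole variant simply appends c_prev to the gate inputs (the gate weight matrices must then be sized accordingly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params, peephole=False):
    """One LSTM time step. With peephole=True, the gates also read
    the previous memory cell c_prev (hypothetical parameter names)."""
    Wc, bc = params["Wc"], params["bc"]
    Wu, bu = params["Wu"], params["bu"]
    Wf, bf = params["Wf"], params["bf"]
    Wo, bo = params["Wo"], params["bo"]

    # Stack previous activation and current input: [a^<t-1>, x^<t>]
    concat = np.concatenate([a_prev, x_t])
    # Peephole connection: gates additionally see the old memory cell
    gate_in = np.concatenate([concat, c_prev]) if peephole else concat

    c_tilde = np.tanh(Wc @ concat + bc)    # candidate value
    gamma_u = sigmoid(Wu @ gate_in + bu)   # update gate
    gamma_f = sigmoid(Wf @ gate_in + bf)   # forget gate
    gamma_o = sigmoid(Wo @ gate_in + bo)   # output gate

    # Keep (part of) the old value and add the new candidate
    c_t = gamma_f * c_prev + gamma_u * c_tilde
    a_t = gamma_o * np.tanh(c_t)           # activation / hidden state
    return a_t, c_t

# Example usage (without peephole): gate inputs have size n_a + n_x
rng = np.random.default_rng(0)
n_x, n_a = 3, 4
params = {k: rng.standard_normal((n_a, n_a + n_x)) * 0.1
          for k in ("Wc", "Wu", "Wf", "Wo")}
params.update({b: np.zeros(n_a) for b in ("bc", "bu", "bf", "bo")})
a, c = lstm_step(rng.standard_normal(n_x), np.zeros(n_a), np.zeros(n_a), params)
```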