LSTM - Long Short-Term Memory

Idea

To address the short-term memory problem of plain RNNs, LSTMs provide mechanisms to store selected information. LSTMs keep the recurrent structure of an RNN and differ in the activation block. By storing information in so-called cells, which are regulated by gates, LSTMs can retain relevant information over longer sequences.

Improvement

  • Capability of storing long-term information
  • Quite flexible in design
  • More general and powerful than GRUs
  • The sigmoid in the gates helps against vanishing gradients
    • close to 0 ->

Concept

The LSTM consists of three gates and one memory cell.

$c$: memory cell

$a$: activation || hidden state $h$

$\tilde{c}$: candidate for replacing $c$

$\Gamma_u$: update gate. Decides when to update (most of the time the value will be close to 0 or 1) || $u$ || $i$

$\Gamma_f$: forget gate. Decides when to forget a value || $f$

$\Gamma_o$: output gate. || $o$

Calculus

  • the memory cell has the option of keeping the old value and adding the new one
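
The option of keeping the old value and adding the new one can be written out as follows. This is the commonly used formulation, given as a sketch: $\Gamma_u$, $\Gamma_f$, $\Gamma_o$ stand for the update, forget, and output gates and $\tilde{c}$ for the candidate, while the input $x$, the weights $W$, the biases $b$, and the time index $\langle t \rangle$ are notation assumed here for illustration.

$$
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh\left(W_c\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right) \\
\Gamma_u &= \sigma\left(W_u\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right) \\
\Gamma_f &= \sigma\left(W_f\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right) \\
\Gamma_o &= \sigma\left(W_o\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + \Gamma_f \odot c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= \Gamma_o \odot \tanh\left(c^{\langle t \rangle}\right)
\end{aligned}
$$

With $\Gamma_f \approx 1$ and $\Gamma_u \approx 0$ the cell simply carries $c^{\langle t-1 \rangle}$ forward, which is how a value can survive over many time steps.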

Architecture

LSTM - reference
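
A single forward step of such a cell can be sketched in plain Python, without any array library. This is a minimal illustration, not a reference implementation: the parameter names (`W_c`, `b_c`, ...), the input `x_t`, and the helper `init_params` are assumptions introduced here, and the gate names follow the update, forget, and output gates from the Concept section.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def lstm_step(x_t, a_prev, c_prev, p):
    """One LSTM time step; returns the new activation and memory cell."""
    v = a_prev + x_t  # list concatenation: [a_prev, x_t]
    c_tilde = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], v)]  # candidate
    gamma_u = [sigmoid(z) for z in affine(p["W_u"], p["b_u"], v)]    # update gate
    gamma_f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]    # forget gate
    gamma_o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]    # output gate
    # Keep part of the old cell value and add part of the candidate.
    c_t = [u * ct + f * cp
           for u, ct, f, cp in zip(gamma_u, c_tilde, gamma_f, c_prev)]
    a_t = [o * math.tanh(c) for o, c in zip(gamma_o, c_t)]
    return a_t, c_t


def init_params(n_x, n_a, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = random.Random(seed)
    p = {}
    for g in "cufo":
        p["W_" + g] = [[rng.gauss(0.0, 0.1) for _ in range(n_a + n_x)]
                       for _ in range(n_a)]
        p["b_" + g] = [0.0] * n_a
    return p


# Usage: run two inputs of size 3 through a cell with 2 hidden units.
params = init_params(n_x=3, n_a=2)
a, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    a, c = lstm_step(x, a, c, params)
```

In practice the same step would be vectorised with an array library; lists are used here only to keep the sketch dependency-free.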

Variation

Peephole connection: gives the gates information from the preceding memory cell (using $c^{\langle t-1 \rangle}$ instead of $a^{\langle t-1 \rangle}$)
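
As a rough sketch of the peephole idea, a gate can be computed from the previous memory cell and the current input rather than from the previous activation. The function name `peephole_gate`, the weight layout, and the input `x_t` are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def peephole_gate(W, b, c_prev, x_t):
    """Gate that 'peeks' at the previous memory cell c_prev.

    W is a list of rows over the concatenation [c_prev, x_t];
    a standard gate would use [a_prev, x_t] here instead.
    """
    v = c_prev + x_t  # list concatenation: [c_prev, x_t]
    return [sigmoid(sum(w * x for w, x in zip(row, v)) + b_i)
            for row, b_i in zip(W, b)]


# Usage: one gate unit whose weight only looks at the memory cell.
gate = peephole_gate(W=[[1.0, 0.0]], b=[0.0], c_prev=[2.0], x_t=[5.0])
```

Because the weight on `x_t` is zero in this usage example, the gate value depends only on the stored cell value.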

Evaluation

Production

References

  1. LSTM - Wikipedia
  2. LSTM Paper
  3. LSTM Forget Gate Paper
  4. Illustrated LSTM