Reproducing the associative retrieval experiment from the paper
Using Fast Weights to Attend to the Recent Past by Jimmy Ba et al. (Incomplete)
Requirements: TensorFlow (version >= 0.8)
Generate a dataset
$ python generator.py
This script generates a file called associative-retrieval.pkl, which can be used for training.
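For reference, the associative retrieval task from the paper can be sketched roughly as follows. This is a hypothetical illustration, not the actual contents of generator.py: the function name, the `num_pairs` parameter, and the `'?'` query markers are assumptions. Each sequence consists of letter-digit key-value pairs followed by a query key, and the target is the digit paired with that key.

```python
import random
import string

def make_example(num_pairs=4):
    # Sample distinct letter keys and (possibly repeating) digit values.
    keys = random.sample(string.ascii_lowercase, num_pairs)
    values = [random.choice(string.digits) for _ in range(num_pairs)]
    # Interleave keys and values: k1 v1 k2 v2 ...
    pairs = [c for kv in zip(keys, values) for c in kv]
    # Append a query marker and one of the keys; the target is its value.
    query = random.choice(keys)
    target = values[keys.index(query)]
    return pairs + ['?', '?', query], target
```

The model must hold all recent key-value bindings in memory until the query arrives, which is exactly the kind of short-term storage the fast weight matrix provides.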
Run the model
$ python fw.py
Currently, the accuracy easily exceeds 0.9 for R=20 and 0.97 for R=50, which demonstrates the effectiveness of the model. The experiments are barely tuned.
Layer Normalization is crucial for the success of training:
- Without it, training does not converge when the number of inner steps is larger than 1.
- Even with a single inner step, performance without Layer Normalization is much worse: for R=20, only 0.4 accuracy is reached (the same level as the other models).
- Even with Layer Normalization, using only slow weights (i.e., a vanilla RNN) performs much worse than using fast weights.
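The update being discussed can be sketched in NumPy as follows. This is a minimal illustration of one fast-weights step with layer normalization, following the recurrence in Ba et al.; the function names, the hyperparameter values, and the shapes of `W` and `C` are illustrative assumptions, not the repo's actual code.

```python
import numpy as np

def layer_norm(z, eps=1e-5):
    # Normalize to zero mean and unit variance across the hidden dimension.
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def fast_weights_step(h, x, A, W, C, lam=0.95, eta=0.5, inner_steps=3):
    # Fast weight matrix: decay the old memory and store the outer
    # product of the current hidden state (Hebbian-style update).
    A = lam * A + eta * np.outer(h, h)
    # "Sustained" input from the slow weights, held fixed during the
    # inner loop.
    boundary = W @ h + C @ x
    hs = np.tanh(boundary)
    for _ in range(inner_steps):
        # Inner loop: the fast weights repeatedly attend to recently
        # stored states; layer norm keeps the iterates well-scaled.
        hs = np.tanh(layer_norm(boundary + A @ hs))
    return hs, A
```

Without the `layer_norm` call the inner iterates can drift in scale as `A @ hs` is applied repeatedly, which matches the observation above that training diverges for more than one inner step.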
Further improvements:
- Complete fine-tuning
- Use the accelerated computation of A described in the paper
- Add visualization
References:
- Using Fast Weights to Attend to the Recent Past. Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu.
- Layer Normalization. Jimmy Ba, Ryan Kiros, Geoffrey Hinton.