Self-attentive BiLSTM based on Coskun et al. (2018). Classification and analysis of human motion data.
Self-Attention LSTM for Human Motion Analysis

This is a TensorFlow implementation of a bidirectional, self-attentive LSTM as proposed by Coskun et al. (2018) in their paper Human Motion Analysis with Deep Metric Learning. The model differs from the one proposed in the paper in the following ways:

  • No layer normalization on the LSTM
  • Trained with a classification objective instead of the triplet loss
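The "self-attentive" part refers to the structured self-attention of Lin et al. (2017), which is applied over the BiLSTM's hidden states. A minimal numpy sketch of that mechanism (dimension names follow the paper; the actual model uses TensorFlow layers, so this is illustrative only):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """Structured self-attention (Lin et al., 2017).

    H:    (n, 2u) BiLSTM hidden states for one sequence of n steps.
    W_s1: (d_a, 2u) first projection.
    W_s2: (r, d_a) second projection, giving r attention hops.
    """
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=1)  # (r, n) attention weights
    M = A @ H                                        # (r, 2u) sequence embedding
    return M, A

# Illustrative dimensions: 10 time steps, 2u = 8, d_a = 6, r = 4 hops.
rng = np.random.default_rng(0)
H = rng.normal(size=(10, 8))
M, A = structured_self_attention(
    H, rng.normal(size=(6, 8)), rng.normal(size=(4, 6))
)
# Each of the r rows of A is a distribution over the n time steps.
```

Each attention hop produces one weighted summary of the hidden states, so the embedding M captures r different aspects of the motion sequence.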

TODO

  • TQDM progress bar is not working; investigate and fix

Use

The model is ready to use for classification tasks on the Human3.6M dataset and works with both the 2D and 3D joint position data. Please visit http://vision.imar.ro/human3.6m/ to contact the maintainers of the dataset and request access. I do not own the dataset and do not have permission to redistribute it.

```shell
python main.py --path PATH_TO_DATA
```

Requirements

environment.yml contains the environment I used to develop and run the model. NB: the environment was created for an Apple Silicon M1 chip; not all packages may be compatible across platforms.

The following are required:

  • Python 3
  • tensorflow
  • tensorflow-addons
  • cdflib, to read the .cdf files that contain the joint position data
  • numpy
  • scikit-learn (imported as sklearn)

Arguments

The following arguments can be supplied when running the script:

  • --path: path to the data; both the 2D and 3D joint position data of the Human3.6M dataset are supported
  • --seq_len: maximum sequence length; longer sequences in the dataset are cut down to this length. Should be >= the shortest sequence length
  • --downsample_rate: rate by which to downsample existing sequences (e.g., 5 keeps only every 5th frame)
  • --normalize: whether to normalize the joint position data
  • --onehot: whether to one-hot encode the class labels
  • --add_noise: whether to add random noise to the inputs (data augmentation)
  • --noise_factor: scale of the added noise
  • --shuffle_size: buffer size used when shuffling the dataset
  • --train_test_split: fraction of the data used for the train/test split
  • --lstm_size: number of hidden units in the LSTM layers
  • --dropout_rate: dropout rate
  • --R: number of attention hops r of the self-attention mechanism (cf. Lin et al., 2017)
  • --D: dimension d_a of the attention mechanism (cf. Lin et al., 2017)
  • --embedding_size: size of the output embedding
  • --classification: whether to train the model for classification
  • --batch_size: batch size
  • --epochs: number of training epochs
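As a rough sketch, a subset of the flags above could be wired together with argparse as follows (types and defaults here are assumptions for illustration; the actual main.py may differ):

```python
import argparse

def build_parser():
    # Hypothetical parser covering a few of the documented flags;
    # defaults are illustrative, not the repository's actual values.
    p = argparse.ArgumentParser(
        description="Self-attentive BiLSTM for Human3.6M classification"
    )
    p.add_argument("--path", required=True, help="path to the joint position data")
    p.add_argument("--seq_len", type=int, default=100, help="maximum sequence length")
    p.add_argument("--downsample_rate", type=int, default=5, help="keep every k-th frame")
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--epochs", type=int, default=10)
    return p

# Example invocation parsed from an argument list instead of sys.argv:
args = build_parser().parse_args(["--path", "./data", "--seq_len", "80"])
```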

Network architecture

Figure: A-LSTM network architecture (figure taken from Coskun et al., 2018)

The network largely follows the architecture of Coskun et al. (2018). Some minor tweaks have been made:

  • Leaky ReLU activation on the fully connected layers (instead of standard ReLU)
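For reference, Leaky ReLU differs from plain ReLU only in scaling negative inputs by a small slope instead of zeroing them out; the slope value below is illustrative and may differ from the one used in the repository:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Positive inputs pass through unchanged; negative inputs are
    # scaled by alpha instead of being clipped to zero as in ReLU,
    # which avoids the zero-gradient region for negative activations.
    return np.where(x > 0, x, alpha * x)

y = leaky_relu(np.array([-1.0, 0.0, 2.0]))
# y == [-0.2, 0.0, 2.0]
```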

Data

The model uses joint position data from the Human3.6M dataset, a large motion-capture dataset first introduced in Human3.6m: Large scale datasets and predictive methods for 3D human sensing in natural environments by Ionescu et al. It works with both the 2D and 3D joint position data. I neither own the dataset nor have permission to redistribute it, so please visit http://vision.imar.ro/human3.6m/ and follow the instructions there to get access.

Technically, the model can work with any kind of joint position/joint angle data; it only has to be brought into the input shape the network expects before being fed in.
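As an illustration of the kind of preprocessing involved, a raw sequence of joint positions can be downsampled, truncated to the maximum length, and flattened per frame (joint counts and shapes here are illustrative, not the repository's exact pipeline):

```python
import numpy as np

def prepare_sequence(joints, seq_len, downsample_rate):
    """Illustrative preprocessing for one motion sequence.

    joints: (n_frames, n_joints, dims) array of 2D or 3D positions.
    Returns an array of shape (<= seq_len, n_joints * dims).
    """
    seq = joints[::downsample_rate]       # keep only every k-th frame
    seq = seq[:seq_len]                   # cut down to the maximum length
    return seq.reshape(seq.shape[0], -1)  # flatten joints into one vector per frame

# Hypothetical example: 500 frames of 17 joints in 3D.
frames = np.zeros((500, 17, 3))
x = prepare_sequence(frames, seq_len=100, downsample_rate=5)
# x.shape == (100, 51)
```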

Resources

  • https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e
  • https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
  • https://omoindrot.github.io/triplet-loss
  • https://aiden.nibali.org/blog/2016-09-06-neural-network-implementation-tricks/

References

Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. https://doi.org/10.48550/arxiv.1607.06450

Coskun, H., Tan, D. J., Conjeti, S., Navab, N., & Tombari, F. (2018). Human Motion Analysis with Deep Metric Learning. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11218 LNCS, 693–710. https://doi.org/10.48550/arxiv.1807.11176

Lin, Z., Feng, M., dos Santos, C. N., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A Structured Self-attentive Sentence Embedding. 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings. https://doi.org/10.48550/arxiv.1703.03130
