Source code repository for my Master Thesis at the University of Zagreb, Faculty of Electrical Engineering and Computing.
Bayesian deep learning merges Bayesian probability theory with deep learning, allowing principled uncertainty estimates from deep learning architectures. For an excellent quick introduction, check out Demystifying Bayesian Deep Learning. One of the most elegant practical approaches is the Bayes-by-Backprop algorithm, first introduced in the paper Weight Uncertainty in Neural Networks. The main idea is to replace point-estimate weights with weight distributions and learn the distribution parameters instead of the network weights directly. The approach was later extended from fully connected networks to both RNNs and CNNs; a minimal layer sketch follows the paper list below.
Relevant papers:
- Weight Uncertainty in Neural Networks
- Bayesian Recurrent Neural Networks
- Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference
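To make the idea concrete, here is a minimal sketch of a Bayes-by-Backprop linear layer in PyTorch. This is not the thesis code: the class name, initialization values, and the fixed unit-Normal prior are illustrative assumptions. Each weight gets a Normal posterior with a learnable mean and scale, a weight sample is drawn with the reparameterization trick, and the KL term is estimated from that sample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer whose weights are sampled from a learned Normal posterior."""

    def __init__(self, in_features, out_features, prior_sigma=1.0):
        super().__init__()
        # Variational posterior parameters: mean and rho, with sigma = softplus(rho).
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -5.0))
        self.prior = torch.distributions.Normal(0.0, prior_sigma)  # assumed fixed prior
        self.kl = 0.0  # KL estimate from the most recent forward pass

    def forward(self, x):
        weight_sigma = F.softplus(self.weight_rho)
        bias_sigma = F.softplus(self.bias_rho)
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I).
        weight = self.weight_mu + weight_sigma * torch.randn_like(weight_sigma)
        bias = self.bias_mu + bias_sigma * torch.randn_like(bias_sigma)
        # Monte Carlo estimate of KL[q(w) || p(w)] = E_q[log q(w) - log p(w)].
        posterior_w = torch.distributions.Normal(self.weight_mu, weight_sigma)
        posterior_b = torch.distributions.Normal(self.bias_mu, bias_sigma)
        self.kl = (posterior_w.log_prob(weight) - self.prior.log_prob(weight)).sum() \
            + (posterior_b.log_prob(bias) - self.prior.log_prob(bias)).sum()
        return F.linear(x, weight, bias)
```

During training, the classification loss is combined with the summed KL of all Bayesian layers, typically scaled down by the number of minibatches as in the Weight Uncertainty in Neural Networks paper.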
Text classification architectures can be expressed as a simple four-step procedure: embed, encode, attend, predict. The classifiers implemented in this repository omit the attend step, use GloVe embeddings, a softmax layer for predictions, and either an LSTM (long short-term memory) or a TCN (temporal convolutional network) encoder. A sketch of this pipeline follows the paper list below.
Relevant papers:
- GloVe: Global Vectors for Word Representation
- Long Short-Term Memory
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
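For reference, a minimal sketch of the embed-encode-predict pipeline with an LSTM encoder, again in PyTorch. The class name, the frozen GloVe embeddings loaded via from_pretrained, and the use of the final hidden state are illustrative choices rather than the exact thesis architecture; a TCN encoder would slot into the same encode step.

```python
import torch.nn as nn


class LSTMClassifier(nn.Module):
    def __init__(self, glove_weights, hidden_size, num_classes):
        super().__init__()
        # Embed: pretrained GloVe vectors, kept frozen.
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        # Encode: a single-layer LSTM over the token sequence.
        self.encoder = nn.LSTM(glove_weights.size(1), hidden_size, batch_first=True)
        # Predict: class logits; softmax is applied inside the cross-entropy loss.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, emb_dim)
        _, (hidden, _) = self.encoder(embedded)   # hidden: (1, batch, hidden_size)
        return self.classifier(hidden[-1])        # (batch, num_classes)
```

In the Bayesian variants, the encoder and output layers are replaced by Bayes-by-Backprop layers like the one sketched above.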
Install the conda environment from environment.yml.
Run train.py for training and test.py for testing, with the appropriate command-line arguments (see the corresponding .py files for details).
The goal was to compare Bayes-by-Backprop text classifiers using either a Normal or a Laplace weight prior against plain deep learning text classifiers with or without Dropout. The tables below contain the test set accuracies on the binary version of the Stanford Sentiment Treebank dataset (SST-2), the IMDb dataset, and the fine-grained version of the Yelp 2015 dataset (Yelp-f); a sketch of how the weight prior enters the objective follows the tables. Bayes-by-Backprop text classifiers achieve accuracy comparable to the non-Bayesian Dropout variants, while requiring twice as many parameters (a mean and a scale per weight).
TCN classifier accuracies:
Dataset | Plain | Dropout | BBB+Normal | BBB+Laplace |
---|---|---|---|---|
SST-2 | .83 | .82 | .81 | .81 |
IMDb | .89 | .89 | .88 | .88 |
Yelp-f | .62 | .62 | .62 | .62 |
LSTM classifier accuracies:
Dataset | Plain | Dropout | BBB+Normal | BBB+Laplace |
---|---|---|---|---|
SST-2 | .81 | .81 | .82 | .82 |
IMDb | .83 | .83 | .81 | .81 |
Yelp-f | .63 | .62 | .63 | .63 |
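Regarding the BBB+Normal and BBB+Laplace columns: the prior choice only changes the log-prior term inside the KL estimate. A minimal sketch, with placeholder scale values rather than the thesis settings:

```python
import torch


def log_prior(weights, prior="normal", scale=1.0):
    # Log-density of sampled weights under a zero-centred prior; swapping
    # Normal for Laplace is the only difference between the two BBB variants.
    if prior == "normal":
        dist = torch.distributions.Normal(0.0, scale)
    elif prior == "laplace":
        dist = torch.distributions.Laplace(0.0, scale)
    else:
        raise ValueError(f"unknown prior: {prior}")
    return dist.log_prob(weights).sum()
```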
If you are interested in state-of-the-art performance on the used datasets, check out NLP-Progress.
Another set of experiments, involving selective classification (classification with a reject option), yielded the same outcome as the experiments in Selective Classification for Deep Neural Networks: the baseline softmax activation value vastly outperforms Bayesian deep model uncertainty as a proxy for neural network prediction confidence.
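For illustration, a minimal selective-classification sketch using the softmax baseline, i.e. the maximum softmax activation as the confidence score. The threshold value is arbitrary; for a Bayesian model the score would instead come from a predictive-uncertainty estimate averaged over several weight samples.

```python
import torch.nn.functional as F


def selective_predict(logits, threshold=0.9):
    """Predict, but abstain on examples whose confidence falls below the threshold."""
    probs = F.softmax(logits, dim=-1)
    confidence, predictions = probs.max(dim=-1)
    accepted = confidence >= threshold  # boolean mask of non-rejected examples
    return predictions, accepted
```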
- MXNet Bayes-by-Backprop tutorial link
- source code for Bayesian Recurrent Neural Networks link
- source code for Weight Uncertainty in Neural Networks link
- source code for PyTorch layers link
- source code for selective deep learning link
No longer actively developed!
- Note: My goal was to code the Bayesian layers from scratch. For up-to-date Bayesian deep learning layer implementations, check out the awesome TensorFlow Probability.
MIT © Antonio Šajatović