
Reweighted wake sleep deep generative model example #3

Open · wants to merge 6 commits into main

Conversation

@marcoct (Contributor) commented on May 19, 2021

Pending merge of probcomp/Gen.jl#417 into Gen.jl

(The rws_mnist/ project currently depends on the branch for probcomp/Gen.jl#417, which adds support for multi-threaded gradient estimation and removes some unnecessary parameter allocations. Before this PR is merged, the Gen branch used in rws_mnist/ should be changed back to master.)

Some conclusions:

  • Gen can be used to successfully train the 10-200-200 generative model (with stochastic hidden layers) and the associated inference network from https://arxiv.org/abs/1406.2751 on binarized MNIST, in roughly a day or two, without a GPU and without vectorizing the model, using multi-threaded gradient estimation (a minimal model sketch appears after this list). The Gen implementation is considerably higher-level and easier to follow than this implementation in Theano.

  • In preliminary experiments, multi-threaded gradient estimation gives a significant speedup (>4x) for minibatches of size 16-32 on a c4.8xlarge EC2 instance (see the threading sketch after this list). More thorough benchmarking, including on bare-metal instances, would be helpful.

  • Profiling this benchmark and optimizing Gen for it on large multi-core cloud instances would be helpful, since the performance gains are likely to carry over to other relevant use cases, such as learning generative models and inference networks (perhaps with comparable or somewhat smaller neural networks) where the trace has stochastic structure. (The case for non-vectorized CPU-based gradient estimation is most compelling when the structure is highly stochastic -- e.g. in Bayesian program synthesis -- where vectorization is more difficult and the throughput advantage of a GPU is reduced.)
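
For readers unfamiliar with how such a model looks in Gen, here is a minimal sketch of a sigmoid-belief-network-style generative model with stochastic binary hidden layers of sizes 10, 200, and 200 over 784 binarized pixels, in the spirit of the model from the paper. The layer structure, parameter names, and addresses are illustrative assumptions, not the code in rws_mnist/:

```julia
using Gen

sigmoid(x) = 1.0 / (1.0 + exp(-x))

# Top-down sigmoid belief network: 10 -> 200 -> 200 stochastic binary
# hidden layers generating 784 binarized pixels. Parameters must be
# initialized with init_param! before the first gradient step.
@gen function model()
    @param W1::Matrix{Float64}  # 200 x 10
    @param b1::Vector{Float64}
    @param W2::Matrix{Float64}  # 200 x 200
    @param b2::Vector{Float64}
    @param W3::Matrix{Float64}  # 784 x 200
    @param b3::Vector{Float64}

    # deepest stochastic layer: 10 binary units with a fixed prior
    h1 = Vector{Bool}(undef, 10)
    for i in 1:10
        h1[i] = {(:h1, i)} ~ bernoulli(0.5)
    end

    # middle stochastic layer: 200 binary units
    p1 = sigmoid.(W1 * h1 .+ b1)
    h2 = Vector{Bool}(undef, 200)
    for i in 1:200
        h2[i] = {(:h2, i)} ~ bernoulli(p1[i])
    end

    # bottom stochastic layer: 200 binary units
    p2 = sigmoid.(W2 * h2 .+ b2)
    h3 = Vector{Bool}(undef, 200)
    for i in 1:200
        h3[i] = {(:h3, i)} ~ bernoulli(p2[i])
    end

    # observed binarized pixels
    p3 = sigmoid.(W3 * h3 .+ b3)
    for i in 1:784
        {(:x, i)} ~ bernoulli(p3[i])
    end
end
```

Because every hidden unit is its own random choice, the trace has the per-choice structure that the non-vectorized, CPU-based gradient estimation discussed above operates on.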
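And here is one way the wake-phase gradient estimate for the model parameters could be parallelized across a minibatch with Julia's `Threads.@threads`. This is a hedged sketch, not the API added by probcomp/Gen.jl#417: the helper `make_observations` is hypothetical, the inference network `q` is assumed to sample the model's hidden addresses, and the gradient accumulation is done serially after the parallel phase, since calling `accumulate_param_gradients!` concurrently requires the thread-safe accumulation that PR introduces:

```julia
using Gen

function wake_phase_gradients!(model, q, minibatch; num_particles::Int=10)
    results = Vector{Vector{Tuple{Any,Float64}}}(undef, length(minibatch))
    Threads.@threads for j in 1:length(minibatch)
        x = minibatch[j]
        # choicemap constraining the model's pixel addresses to the datum
        # (hypothetical helper)
        observations = make_observations(x)
        traces = Vector{Any}(undef, num_particles)
        log_weights = Vector{Float64}(undef, num_particles)
        for k in 1:num_particles
            # propose the hidden layers from the inference network q(h | x)
            q_trace = simulate(q, (x,))
            constraints = merge(get_choices(q_trace), observations)
            traces[k], _ = generate(model, (), constraints)
            # importance weight: log p(x, h) - log q(h | x)
            log_weights[k] = get_score(traces[k]) - get_score(q_trace)
        end
        weights = exp.(log_weights .- logsumexp(log_weights))
        results[j] = [(traces[k], weights[k]) for k in 1:num_particles]
    end
    # serial phase: accumulate_param_gradients! mutates shared gradient
    # accumulators, so it is kept outside the threaded loop here
    for pairs in results, (trace, w) in pairs
        accumulate_param_gradients!(trace, nothing, w / length(minibatch))
    end
end
```

A `ParamUpdate(FixedStepGradientDescent(step_size), model)` can then consume the accumulated gradients via `apply!`.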
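The inference network itself can be trained with a classic sleep-phase update: dream a full trace from the model, then fit `q` by maximum likelihood to recover the dreamed hidden units from the dreamed image. Again a sketch under stated assumptions; `extract_image` and `extract_hiddens` are hypothetical helpers that split a model trace's choices into q's input and q's target choices:

```julia
# One sleep-phase training loop for the inference network q.
q_update = ParamUpdate(FixedStepGradientDescent(1e-3), q)
for iter in 1:num_sleep_steps
    model_trace = simulate(model, ())
    x = extract_image(get_choices(model_trace))
    h = extract_hiddens(get_choices(model_trace))  # choicemap over q's addresses
    q_trace, _ = generate(q, (x,), h)  # q constrained to the dreamed latents
    accumulate_param_gradients!(q_trace, nothing)  # gradient of log q(h | x)
    apply!(q_update)
end
```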

@marcoct changed the title from "Add reweighted wake sleep deep generative model example" to "Reweighted wake sleep deep generative model example" on May 19, 2021