This GIF shows our green MCTS agent trying to catch the yellow goal while avoiding the cyan/blue obstacles. We learn a dynamics model of the agent-independent environment and use its imagined futures to select actions at every state.
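For intuition, here is a minimal sketch of action selection with a learned model. The `model.step(state, action)` interface and `reward_fn` are hypothetical placeholders rather than this repo's API, and the real agent uses MCTS; this simplified version scores each candidate first action by flat Monte Carlo rollouts through the model.

```python
import random

def plan_action(state, model, actions, reward_fn, n_rollouts=50, horizon=10):
    """Pick the first action whose simulated futures score best.

    `model.step(state, action)` and `reward_fn(state)` are illustrative
    stand-ins for the learned dynamics model and task reward; the paper's
    agent uses MCTS rather than this flat Monte Carlo search.
    """
    best_action, best_value = None, float("-inf")
    for first_action in actions:
        total = 0.0
        for _ in range(n_rollouts):
            # roll the learned model forward from the imagined next state
            s = model.step(state, first_action)
            ret = reward_fn(s)
            for _ in range(horizon - 1):
                s = model.step(s, random.choice(actions))
                ret += reward_fn(s)
            total += ret
        value = total / n_rollouts
        if value > best_value:
            best_value, best_action = value, first_action
    return best_action
```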
The image below shows the same episode as above with the imagined-future model depicted. The first column is the observed state, the second column is an oracle rollout (for human reference only), and the third column is the model rollout the agent used for planning. The fourth column shows model error: red pixels are false negatives (predicted free space where there is an obstacle) and blue pixels are false positives (predicted obstacle where there is free space). In the error plot, the predicted goal is drawn in orange over the true yellow goal.
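The error map is simple to compute from boolean obstacle grids. The sketch below (plain NumPy, not the repo's plotting code, with names chosen for illustration) shows one way to build it.

```python
import numpy as np

def error_map(true_obstacles, pred_obstacles):
    """Color-code model error as in the fourth column above.

    Inputs are boolean (H, W) arrays marking obstacle cells; output is an
    (H, W, 3) float RGB image.
    Red  = false negative: predicted free space over a true obstacle.
    Blue = false positive: predicted obstacle over true free space.
    """
    img = np.ones(true_obstacles.shape + (3,), dtype=np.float32)  # white background
    false_neg = true_obstacles & ~pred_obstacles
    false_pos = ~true_obstacles & pred_obstacles
    img[false_neg] = [1.0, 0.0, 0.0]  # red
    img[false_pos] = [0.0, 0.0, 1.0]  # blue
    return img
```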
More agent examples can be found at https://imgur.com/a/6DJbrB1
Please refer to our paper presented at the PGMRL Workshop at ICML 2018 for implementation details.
To run our example:

- Create a directory named `./../models/` and download our trained models into it: pretrained-models
- Run the agent with desired arguments: `roadway_model.py`
To train the environment model, complete the following steps:

- Generate a training set in pixel-space of the environment: `road.py`
- Train a VQ-VAE to learn a discrete latent representation of individual frames: `train_vqvae.py` (see the quantization sketch after this list)
- Train a PixelCNN on the latent space: `train_pixel_cnn.py` (see the masked-convolution sketch after this list)
- Update the file paths for the agent and run the agent with desired arguments: `roadway_model.py`
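For reference, here is a minimal sketch of the vector-quantization bottleneck at the heart of the VQ-VAE. The `quantize` function, its tensor shapes, and the PyTorch framing are illustrative assumptions rather than the actual module in `train_vqvae.py`, and the codebook/commitment losses are omitted.

```python
import torch

def quantize(z_e, codebook):
    """Nearest-neighbor lookup that discretizes the encoder output.

    z_e:      (B, C, H, W) continuous encoder output.
    codebook: (K, C) learned embedding vectors.
    Returns the quantized tensor plus the (B, H, W) index map that the
    PixelCNN prior is later trained on.  Sketch only; codebook and
    commitment losses are not shown.
    """
    B, C, H, W = z_e.shape
    flat = z_e.permute(0, 2, 3, 1).reshape(-1, C)   # (B*H*W, C)
    dists = torch.cdist(flat, codebook)             # (B*H*W, K) pairwise distances
    idx = dists.argmin(dim=1)                       # nearest codebook entry
    z_q = codebook[idx].reshape(B, H, W, C).permute(0, 3, 1, 2)
    # straight-through estimator: pass decoder gradients back to the encoder
    z_q = z_e + (z_q - z_e).detach()
    return z_q, idx.reshape(B, H, W)
```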
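The PixelCNN prior is autoregressive over that latent index map, and masked convolutions are what enforce the causal ordering. The sketch below is the standard mask construction from the PixelCNN paper, again as an illustration rather than the code used in `train_pixel_cnn.py`: one type-'A' layer (which hides the center pixel) followed by type-'B' layers gives each output position a receptive field that excludes its own input.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Causal convolution for a PixelCNN: each position may only condition
    on positions above it and to its left, so the prior can be sampled
    one latent index at a time.  Mask type 'A' also hides the center
    position (first layer); type 'B' keeps it (subsequent layers)."""

    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        mask = torch.ones_like(self.weight)
        _, _, kh, kw = self.weight.shape
        # zero out the center pixel (type 'A' only) and everything to its right
        mask[:, :, kh // 2, kw // 2 + (mask_type == "B"):] = 0
        # zero out all rows below the center
        mask[:, :, kh // 2 + 1:] = 0
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply mask in case of weight updates
        return super().forward(x)
```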
Below is a demonstration of the reconstruction from our VQ-VAE model:
Below is a demonstration of the reconstruction from our VAE model:
We based our VQ-VAE implementation on the excellent code from @Rithesh Kumar. The implementation of discretized logistic mixture loss we use is from @Lucas Caccia.
Thanks to @kastnerkyle for discussions and advice on all things.