- PPO
- DDPG
- TD3
- SAC
- Dreamer
- BC
- GAIL
- `rolf/`
  - `main.py`: sets up the experiment and runs training using `trainer.py`
  - `trainer.py`: contains training and evaluation code
  - `algorithms/`: implementation of all RL and IL algorithms
  - `config/`: hyperparameters in yaml (using hydra)
    - `algo/`: hyperparameters for algorithms
    - `env/`: hyperparameters for environments
  - `networks/`: implementation of networks, such as policy and value functions
  - `utils/`: contains helper functions
- Ubuntu 18.04 or above
- Python 3.9
- MuJoCo 2.1.0 and MuJoCo 2.1.1
- Install MuJoCo 2.1.0 and MuJoCo 2.1.1, and add the following environment variables to `~/.bashrc` or `~/.zshrc`:
# download MuJoCo 2.1.0 for mujoco-py
$ mkdir ~/.mujoco
$ wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz -O mujoco210_linux.tar.gz
$ tar -xvzf mujoco210_linux.tar.gz -C ~/.mujoco/
$ rm mujoco210_linux.tar.gz
# download MuJoCo 2.1.1 for dm_control
$ wget https://github.com/deepmind/mujoco/releases/download/2.1.1/mujoco-2.1.1-linux-x86_64.tar.gz -O mujoco211_linux.tar.gz
$ tar -xvzf mujoco211_linux.tar.gz -C ~/.mujoco/
$ rm mujoco211_linux.tar.gz
# add MuJoCo 2.1.0 to LD_LIBRARY_PATH
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
# for GPU rendering
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
- Install required dependencies
$ sudo apt-get install cmake libopenmpi-dev libgl1-mesa-dev libgl1-mesa-glx libosmesa6-dev patchelf libglew-dev
# software rendering
$ sudo apt-get install libgl1-mesa-glx libosmesa6 patchelf
# window rendering
$ sudo apt-get install libglfw3 libglew-dev
- Install the appropriate version of PyTorch for your CUDA setup
# PyTorch 1.10.2, Linux, CUDA 11.3
$ pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
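To confirm that the install picked up CUDA correctly, a quick optional check from a Python shell:

```python
# optional sanity check for the PyTorch install
import torch

print(torch.__version__)          # expected: 1.10.2+cu113
print(torch.cuda.is_available())  # True if the CUDA 11.3 runtime is visible
```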
- Finally, install the `robot_learning` (`rolf`) package
# at the root directory (`robot_learning/`)
$ pip install -e .
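To verify the installation, importing the key packages is a quick smoke test (note: the first `mujoco_py` import triggers a one-time Cython build against MuJoCo 2.1.0, which can take a few minutes):

```python
# smoke test: both imports should succeed after `pip install -e .`
import mujoco_py  # compiles its MuJoCo bindings on first import
import rolf
```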
Use the following commands to run RL/IL algorithms. Each experiment is named `[ENV].[ALGORITHM].[RUN_PREFIX].[SEED]`, and its checkpoints and videos are stored in `log/[ENV].[ALGORITHM].[RUN_PREFIX].[SEED]`. `run_prefix` can be used to differentiate runs with different hyperparameters. See `rolf/config/default_config.yaml` for the default hyperparameters; since configuration is handled by hydra, any of them can be overridden on the command line as `key=value`.
$ python -m rolf.main run_prefix=test algo@rolf=ppo env.id=Hopper-v2
$ python -m rolf.main run_prefix=test algo@rolf=ddpg env.id=Hopper-v2
$ python -m rolf.main run_prefix=test algo@rolf=td3 env.id=Hopper-v2
$ python -m rolf.main run_prefix=test algo@rolf=sac env.id=Hopper-v2
- Generate demos using PPO
# train ppo expert agent
$ python -m rolf.main run_prefix=test algo@rolf=ppo env.id=Hopper-v2
# collect expert trajectories using ppo expert policy
$ python -m rolf.main run_prefix=test algo@rolf=ppo env.id=Hopper-v2 is_train=False record_video=False record_demo=True num_eval=100
# 100 trajectories are stored in log/Hopper-v2.ppo.test.123/demo/Hopper-v2.ppo.test.123_step_00001000000_100.pkl
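The demo file is a regular pickle, so its contents can be inspected directly (a minimal sketch; the exact trajectory layout is an assumption here, so print the top-level structure before relying on specific keys):

```python
# peek at the collected demonstrations (layout assumed; verify by printing)
import pickle

path = "log/Hopper-v2.ppo.test.123/demo/Hopper-v2.ppo.test.123_step_00001000000_100.pkl"
with open(path, "rb") as f:
    demos = pickle.load(f)

print(type(demos))
if isinstance(demos, (list, tuple)):
    print(len(demos), type(demos[0]))  # file name suggests 100 trajectories
elif isinstance(demos, dict):
    print(list(demos.keys()))
```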
- Run BC
$ python -m rolf.main run_prefix=test algo@rolf=bc env.id=Hopper-v2 demo_path=log/Hopper-v2.ppo.test.123/demo/Hopper-v2.ppo.test.123_step_00001000000_100.pkl
- Run GAIL
$ python -m rolf.main run_prefix=test algo@rolf=gail env.id=Hopper-v2 demo_path=log/Hopper-v2.ppo.test.123/demo/Hopper-v2.ppo.test.123_step_00001000000_100.pkl
# GAIL with BC initialization
$ python -m rolf.main run_prefix=test algo@rolf=gail env.id=Hopper-v2 demo_path=log/Hopper-v2.ppo.test.123/demo/Hopper-v2.ppo.test.123_step_00001000000_100.pkl init_ckpt_path=log/Hopper-v2.bc.test.123/ckpt_00000020.pt init_ckpt_pretrained=True
Implement your own `run.py` for experiment setup, `your_config.yaml` for configuration, `your_trainer.py` for the training/evaluation loop, `your_agent.py` for the algorithm, `your_rollout.py` for rollout collection, and `your_network.py` for models.
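As a rough starting point, the pieces fit together as sketched below. This is a hypothetical skeleton only: the class names, base classes, and method signatures are illustrative assumptions, not the actual `rolf` interfaces; mirror an existing algorithm in `rolf/algorithms/` for the real ones.

```python
# your_agent.py -- hypothetical skeleton (names and signatures are assumptions)
class YourAgent:
    """Owns the networks and implements action selection and parameter updates."""

    def __init__(self, cfg, ob_space, ac_space):
        self._cfg = cfg  # hyperparameters loaded from your_config.yaml via hydra

    def act(self, ob, is_train=True):
        """Return an action for the current observation."""
        raise NotImplementedError

    def update(self, batch):
        """Run one gradient step and return a dict of training metrics."""
        raise NotImplementedError


# your_trainer.py -- hypothetical skeleton of the training/evaluation loop
class YourTrainer:
    def __init__(self, cfg, agent, rollout):
        self._cfg, self._agent, self._rollout = cfg, agent, rollout

    def train(self):
        """Alternate between collecting rollouts and calling agent.update()."""
        raise NotImplementedError

    def evaluate(self):
        """Roll out the current policy and record videos and returns."""
        raise NotImplementedError
```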
Please refer to the skill-chaining repository for an example. It implements `run.py` for experiment setup, `policy_sequencing_config.yaml` for configuration, `policy_sequencing_trainer.py` for the training/evaluation loop, `policy_sequencing_agent.py` for the algorithm, and `policy_sequencing_rollout.py` for rollout collection.
- Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization (CoRL 2021)
- Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding (RSS 2021)
- IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks (ICRA 2021)
- Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)
- Learning to Coordinate Manipulation Skills via Skill Behavior Diversification (ICLR 2020)