diff --git a/.gitignore b/.gitignore index 03304668..65249c39 100644 --- a/.gitignore +++ b/.gitignore @@ -1,8 +1,8 @@ # pip distribution folder dist/ -# datasets folder -datasets/ +# datasets folder at top-level (leading slash) +/datasets/ # local test dataset that is lazily downloaded by example scripts tests/assets/test.hdf5 diff --git a/README.md b/README.md index 1593cbe4..28a5dd27 100644 --- a/README.md +++ b/README.md @@ -11,45 +11,62 @@

-[**[Homepage]**](https://arise-initiative.github.io/robomimic-web/)   [**[Documentation]**](https://arise-initiative.github.io/robomimic-web/docs/introduction/overview.html)   [**[Study Paper]**](https://arxiv.org/abs/2108.03298)   [**[Study Website]**](https://arise-initiative.github.io/robomimic-web/study/)   [**[ARISE Initiative]**](https://github.com/ARISE-Initiative) +[**[Homepage]**](https://robomimic.github.io/)   [**[Documentation]**](https://robomimic.github.io/docs/introduction/overview.html)   [**[Study Paper]**](https://arxiv.org/abs/2108.03298)   [**[Study Website]**](https://robomimic.github.io/study/)   [**[ARISE Initiative]**](https://github.com/ARISE-Initiative) ------- ## Latest Updates +- [05/23/2022] **v0.2.1**: Updated website and documentation to feature more tutorials :notebook_with_decorative_cover: - [12/16/2021] **v0.2.0**: Modular observation modalities and encoders :wrench:, support for [MOMART](https://sites.google.com/view/il-for-mm/home) datasets :open_file_folder: - [08/09/2021] **v0.1.0**: Initial code and paper release ------- -**robomimic** is a framework for robot learning from demonstration. It offers a broad set of demonstration datasets collected on robot manipulation domains, and learning algorithms to learn from these datasets. This project is part of the broader [Advancing Robot Intelligence through Simulated Environments (ARISE) Initiative](https://github.com/ARISE-Initiative), with the aim of lowering the barriers of entry for cutting-edge research at the intersection of AI and Robotics. +**robomimic** is a framework for robot learning from demonstration. +It offers a broad set of demonstration datasets collected on robot manipulation domains and offline learning algorithms to learn from these datasets. +**robomimic** aims to make robot learning broadly *accessible* and *reproducible*, allowing researchers and practitioners to benchmark tasks and algorithms fairly and to develop the next generation of robot learning algorithms. -Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods make assessing the state of the field difficult. The overarching goal of **robomimic** is to provide researchers and practitioners with: +## Core Features -- a **standardized set of large demonstration datasets** across several benchmarking tasks to facilitate fair comparisons, with a focus on learning from human-provided demonstrations -- a **standardized set of large demonstration datasets** across several benchmarking tasks to facilitate fair comparisons, with a focus on learning from human-provided demonstrations (see [this link](https://arise-initiative.github.io/robomimic-web/docs/introduction/quickstart.html#supported-datasets) for a list of supported datasets) -- **high-quality implementations of several learning algorithms** for training closed-loop policies from offline datasets to make reproducing results easy and lower the barrier to entry -- a **modular design** that offers great flexibility in extending algorithms and designing new algorithms +

+ +

-This release of **robomimic** contains seven offline learning [algorithms](https://arise-initiative.github.io/robomimic-web/docs/modules/algorithms.html) and standardized [datasets](https://arise-initiative.github.io/robomimic-web/docs/introduction/results.html) collected across five simulated and three real-world multi-stage manipulation tasks of varying complexity. We highlight some features below (for a more thorough list of features, see [this link](https://arise-initiative.github.io/robomimic-web/docs/introduction/quickstart.html#features-overview)): + -## Troubleshooting -Please see the [troubleshooting](https://arise-initiative.github.io/robomimic-web/docs/miscellaneous/troubleshooting.html) section for common fixes, or [submit an issue](https://github.com/ARISE-Initiative/robomimic/issues) on our github page. +## Reproducing benchmarks -## Reproducing study results +The robomimic framework also makes reproducing the results from different benchmarks and datasets easy. See the [datasets page](https://robomimic.github.io/docs/datasets/overview.html) for more information on downloading datasets and reproducing experiments. -The **robomimic** framework also makes reproducing the results from this [study](https://arise-initiative.github.io/robomimic-web/study) easy. See the [results documentation](https://arise-initiative.github.io/robomimic-web/docs/introduction/results.html) for more information. +## Troubleshooting + +Please see the [troubleshooting](https://robomimic.github.io/docs/miscellaneous/troubleshooting.html) section for common fixes, or [submit an issue](https://github.com/ARISE-Initiative/robomimic/issues) on our github page. + +## Contributing to robomimic +This project is part of the broader [Advancing Robot Intelligence through Simulated Environments (ARISE) Initiative](https://github.com/ARISE-Initiative), with the aim of lowering the barriers of entry for cutting-edge research at the intersection of AI and Robotics. +The project originally began development in late 2018 by researchers in the [Stanford Vision and Learning Lab](http://svl.stanford.edu/) (SVL). +Now it is actively maintained and used for robotics research projects across multiple labs. +We welcome community contributions to this project. +For details please check our [contributing guidelines](https://robomimic.github.io/docs/miscellaneous/contributing.html). -## Citations +## Citation Please cite [this paper](https://arxiv.org/abs/2108.03298) if you use this framework in your work: @@ -57,7 +74,7 @@ Please cite [this paper](https://arxiv.org/abs/2108.03298) if you use this frame @inproceedings{robomimic2021, title={What Matters in Learning from Offline Human Demonstrations for Robot Manipulation}, author={Ajay Mandlekar and Danfei Xu and Josiah Wong and Soroush Nasiriany and Chen Wang and Rohun Kulkarni and Li Fei-Fei and Silvio Savarese and Yuke Zhu and Roberto Mart\'{i}n-Mart\'{i}n}, - booktitle={arXiv preprint arXiv:2108.03298}, + booktitle={Conference on Robot Learning (CoRL)}, year={2021} } ``` diff --git a/docs/conf.py b/docs/conf.py index e5ffa057..7402a994 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -14,7 +14,7 @@ import sys sys.path.insert(0, os.path.abspath('.')) -import sphinx_rtd_theme +import sphinx_book_theme import robomimic @@ -29,7 +29,6 @@ # ones. extensions = [ 'sphinx.ext.napoleon', - 'sphinx_rtd_theme', 'sphinx_markdown_tables', 'sphinx.ext.mathjax', 'sphinx.ext.githubpages', @@ -60,7 +59,7 @@ # General information about the project. 
project = 'robomimic' -copyright = '2021, Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang' +copyright = '2022, Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang' author = 'Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang' # The version info for the project you're documenting, acts as replacement for @@ -98,7 +97,7 @@ # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # -html_theme = 'sphinx_rtd_theme' +html_theme = 'sphinx_book_theme' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the @@ -111,11 +110,11 @@ # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] -html_context = { - 'css_files': [ - '_static/theme_overrides.css', # override wide tables in RTD theme - ], -} +# html_context = { +# 'css_files': [ +# '_static/theme_overrides.css', # override wide tables in RTD theme +# ], +# } # -- Options for HTMLHelp output ------------------------------------------ diff --git a/docs/datasets/d4rl.md b/docs/datasets/d4rl.md new file mode 100644 index 00000000..ba19e82a --- /dev/null +++ b/docs/datasets/d4rl.md @@ -0,0 +1,55 @@ +# D4RL + +## Overview +The [D4RL](https://arxiv.org/abs/2004.07219) benchmark provides a set of locomotion tasks and demonstration datasets. + +## Downloading + +Use `convert_d4rl.py` in the `scripts/conversion` folder to automatically download and postprocess the D4RL dataset in a single step. For example: + +```sh +# by default, download to robomimic/datasets +$ python convert_d4rl.py --env walker2d-medium-expert-v0 +# download to specific folder +$ python convert_d4rl.py --env walker2d-medium-expert-v0 --folder /path/to/output/folder/ +``` + +- `--env` specifies the dataset to download +- `--folder` specifies where you want to download the dataset. If no folder is provided, the `datasets` folder at the top-level of the repository will be used. + +The script will download the raw hdf5 dataset to `--folder`, and the converted one that is compatible with this repository into the `converted` subfolder. + +## Postprocessing + +No postprocessing is required, assuming the above script is run! + +## D4RL Results + +Below, we provide a table of results on common D4RL datasets using the algorithms included in the released codebase. We follow the convention in the TD3-BC paper, where we average results over the final 10 rollout evaluations, but we use 50 rollouts instead of 10 for each evaluation. Apart from a small handful of the halfcheetah results, the results align with those presented in the [TD3_BC paper](https://arxiv.org/abs/2106.06860). We suspect the halfcheetah results are different because we used `mujoco-py` version `2.0.2.13` in our evaluations, as opposed to `1.5` in order to be consistent with the version we were using for robosuite datasets. The results below were generated with `gym` version `0.17.3` and this `d4rl` [commit](https://github.com/rail-berkeley/d4rl/tree/9b68f31bab6a8546edfb28ff0bd9d5916c62fd1f). 
+ +| | **BCQ** | **CQL** | **TD3-BC** | +| ----------------------------- | ------------- | ------------- | ------------- | +| **HalfCheetah-Medium** | 40.8% (4791) | 38.5% (4497) | 41.7% (4902) | +| **Hopper-Medium** | 36.9% (1181) | 30.7% (980) | 97.9% (3167) | +| **Walker2d-Medium** | 66.4% (3050) | 65.2% (2996) | 77.0% (3537) | +| **HalfCheetah-Medium-Expert** | 74.9% (9016) | 21.5% (2389) | 79.4% (9578) | +| **Hopper-Medium-Expert** | 83.8% (2708) | 111.7% (3614) | 112.2% (3631) | +| **Walker2d-Medium-Expert** | 70.2% (3224) | 77.4% (3554) | 102.0% (4683) | +| **HalfCheetah-Expert** | 94.3% (11427) | 29.2% (3342) | 95.4% (11569) | +| **Hopper-Expert** | 104.7% (3389) | 111.8% (3619) | 112.2% (3633) | +| **Walker2d-Expert** | 80.5% (3699) | 108.0% (4958) | 105.3% (4837) | + + +### Reproducing D4RL Results + +In order to reproduce the results above, first make sure that the `generate_paper_configs.py` script has been run, where the `--dataset_dir` argument is consistent with the folder where the D4RL datasets were downloaded using the `convert_d4rl.py` script. This is also the first step for reproducing results on the released robot manipulation datasets. The `--config_dir` directory used in the script (`robomimic/exps/paper` by default) will contain a `d4rl.sh` script, and a `d4rl` subdirectory that contains all the json configs. The table results above can be generated simply by running the training commands in the shell script. + +## Citation +```sh +@article{fu2020d4rl, + title={D4rl: Datasets for deep data-driven reinforcement learning}, + author={Fu, Justin and Kumar, Aviral and Nachum, Ofir and Tucker, George and Levine, Sergey}, + journal={arXiv preprint arXiv:2004.07219}, + year={2020} +} +``` \ No newline at end of file diff --git a/docs/datasets/momart.md b/docs/datasets/momart.md new file mode 100644 index 00000000..b76195a1 --- /dev/null +++ b/docs/datasets/momart.md @@ -0,0 +1,57 @@ +# MOMART Datasets and Experiments + +## Overview +[Mobile Manipulation RoboTurk (MoMaRT)](https://sites.google.com/view/il-for-mm/home) datasets are a collection of demonstrations collected on 5 long-horizon robot mobile manipulation tasks in a realistic simulated kitchen. + +

+ + + + + + + + + + +

+ +## Downloading + + +
+

Warning!

+ +When working with these datasets, please make sure that you have installed [iGibson](http://svl.stanford.edu/igibson/) from source and are on the `momart` branch. Exact steps for installing can be found [HERE](https://sites.google.com/view/il-for-mm/datasets#h.qw0vufk0hknk). + +
+ +We provide two ways for downloading MOMART datasets: + +### Method 1: Using `download_momart_datasets.py` (Recommended) +`download_momart_datasets.py` is a python script that provides a programmatic way of installing all datasets. This is the preferred method, because this script also sets up a directory structure for the datasets that works out of the box with examples for reproducing [MOMART paper's](https://arxiv.org/abs/2112.05251) results. + +```sh +# Use --help flag to view download and options +python /robomimic/scripts/download_momart_datasets.py +``` + +### Method 2: Using Direct Download Links + +For each type of dataset, we also provide a direct download links that will download the raw HDF5 file [HERE](https://sites.google.com/view/il-for-mm/datasets#h.ko0ilbky4y5u). + +## Postprocessing + +No postprocessing is needed for these datasets! + +## Citation +```sh +@inproceedings{wong2022error, + title={Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation}, + author={Wong, Josiah and Tung, Albert and Kurenkov, Andrey and Mandlekar, Ajay and Fei-Fei, Li and Savarese, Silvio and Mart{\'\i}n-Mart{\'\i}n, Roberto}, + booktitle={Conference on Robot Learning}, + pages={1367--1378}, + year={2022}, + organization={PMLR} +} +``` \ No newline at end of file diff --git a/docs/datasets/overview.md b/docs/datasets/overview.md new file mode 100644 index 00000000..ac9731e7 --- /dev/null +++ b/docs/datasets/overview.md @@ -0,0 +1,166 @@ +# Overview + +## Dataset Pipeline + + + +Datasets capture recorded environment data and are used as inputs to a given offline RL or IL algorithm in **robomimic**. In general, you can use datasets with **robomimic** by: + +1. **Downloading** the desired dataset +2. **Postprocessing** the dataset to guarantee compatibility with robomimic +3. **Training** agent(s) in robomimic with dataset + +**robomimic** currently supports the following datasets out of the box. Click on the corresponding **(1) Downloading** link to download the dataset and the corresponding **(2) Postprocessing** link for postprocessing that dataset. + + +| **Dataset** |
**Task Types**
| **Downloading** | **Postprocessing** | +| ----------------------------- | :-------------: | :-------------: | :-------------: | +| [**robomimic v0.1**](robomimic_v0.1.html)| Sim + Real Robot Manipulation | [Link](robomimic_v0.1.html#downloading) | [Link](robomimic_v0.1.html#postprocessing) | +| [**D4RL**](d4rl.html) | Sim Locomotion | [Link](d4rl.html#downloading) | [Link](d4rl.html#postprocessing) | +| [**MOMART**](momart.html) | Sim Mobile Manipulation | [Link](momart.html#downloading) | [Link](momart.html#postprocessing) | +| [**RoboTurk Pilot**](roboturk_pilot.html) | Sim Robot Manipulation | [Link](roboturk_pilot.html#downloading) | [Link](roboturk_pilot.html#postprocessing) | + + +After downloading and postprocessing, **(3) Training** with the dataset is straightforward and unified across all datasets: + +```sh +python train.py --dataset --config +``` + +## Generating Your Own Dataset + +**robomimic** provides tutorials for collecting custom datasets for specific environment platforms. Click on any of the links below for more information for the specific environment setup: + +| **Environment Platform** | **Task Types** | +| ----------------------------- | :---------------------: | +| [**robosuite**](robosuite.html)| Robot Manipulation | + +
+

Create Your Own Environment Wrapper!

+ +If you want to generate your own dataset in a custom environment platform that is not listed above, please see [THIS PAGE](../modules/environments.md#implement-an-environment-wrapper). + +
+ + +## Dataset Structure + +All postprocessed **robomimic** compatible datasets share the same data structure. A single dataset is a single HDF5 file with the following structure: + +
+ HDF5 Structure (click to expand) +

+ +- **`data`** (group) + + - **`total`** (attribute) - number of state-action samples in the dataset + + - **`env_args`** (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data. Three keys: `env_name`, the name of the environment or task to create, `env_type`, one of robomimic's supported [environment types](https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/envs/env_base.py#L9), and `env_kwargs`, a dictionary of keyword-arguments to be passed into the environment of type `env_name`. + + - **`demo_0`** (group) - group for the first trajectory (every trajectory has a group) + + - **`num_samples`** (attribute) - the number of state-action samples in this trajectory + + - **`model_file`** (attribute) - the xml string corresponding to the MJCF MuJoCo model. Only present for robosuite datasets. + + - **`states`** (dataset) - flattened raw MuJoCo states, ordered by time. Shape (N, D) where N is the length of the trajectory, and D is the dimension of the state vector. Should be empty or have dummy values for non-robosuite datasets. + + - **`actions`** (dataset) - environment actions, ordered by time. Shape (N, A) where N is the length of the trajectory, and A is the action space dimension + + - **`rewards`** (dataset) - environment rewards, ordered by time. Shape (N,) where N is the length of the trajectory. + + - **`dones`** (dataset) - done signal, equal to 1 if playing the corresponding action in the state should terminate the episode. Shape (N,) where N is the length of the trajectory. + + - **`obs`** (group) - group for the observation keys. Each key is stored as a dataset. + + - **``** (dataset) - the first observation key. Note that the name of this dataset and shape will vary. As an example, the name could be "agentview_image", and the shape could be (N, 84, 84, 3). + + ... + + - **`next_obs`** (group) - group for the next observations. + + - **``** (dataset) - the first observation key. + + ... + + - **`demo_1`** (group) - group for the second trajectory + + ... + +- **`mask`** (group) - this group will exist in hdf5 datasets that contain filter keys + + - **``** (dataset) - the first filter key. Note that the name of this dataset and length will vary. As an example, this could be the "valid" filter key, and contain the list ["demo_0", "demo_19", "demo_35"], corresponding to 3 validation trajectories. + + ... + +
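To make the layout above concrete, here is a minimal sketch (not part of the repository's scripts) that walks a postprocessed hdf5 with `h5py`. The dataset path is a placeholder, and the demo and observation key names will vary by dataset.

```python
import json

import h5py

dataset_path = "/path/to/dataset.hdf5"  # placeholder - point this at any postprocessed hdf5

with h5py.File(dataset_path, "r") as f:
    data = f["data"]
    print("total state-action samples:", data.attrs["total"])

    # env_args is a json string containing env_name, env_type, and env_kwargs
    env_args = json.loads(data.attrs["env_args"])
    print("environment:", env_args["env_name"])

    # each trajectory lives in its own demo_N group
    for demo_name in list(data.keys())[:3]:
        demo = data[demo_name]
        print(demo_name, "num_samples =", demo.attrs["num_samples"])
        print("  actions:", demo["actions"].shape)
        for obs_key, obs_dataset in demo["obs"].items():
            print("  obs/{}: shape={}, dtype={}".format(obs_key, obs_dataset.shape, obs_dataset.dtype))

    # filter keys (if any) are stored under the top-level "mask" group
    if "mask" in f:
        print("filter keys:", list(f["mask"].keys()))
```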

+
+ +### Data Conventions + +**robomimic**-compatible datasets expect certain values (such as images and actions) to be formatted a specific way. See the below sections for further details: + +
+ Storing images +

+

+

Warning!

+ +Dataset images should be of type `np.uint8` and be stored in channel-last `(H, W, C)` format. This is because: + +- **(1)** this is a common format that many `gym` environments and all `robosuite` environments return image observations in +- **(2)** using `np.uint8` (vs floats) saves space in dataset storage + +Note that the robosuite observation extraction script (`dataset_states_to_obs.py`) already stores images in the correct format. + +
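For reference, if your own pipeline produces float, channel-first images (as some `gym` wrappers do), a conversion along the lines of this sketch keeps the stored data consistent with the convention above; the `img` array is a stand-in for your data.

```python
import numpy as np

# stand-in for a float image in channel-first (C, H, W) format with values in [0, 1]
img = np.random.rand(3, 84, 84).astype(np.float32)

# move channels last: (C, H, W) -> (H, W, C)
img = np.transpose(img, (1, 2, 0))

# rescale to [0, 255] and cast to uint8 before writing to the hdf5
img = (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)

assert img.dtype == np.uint8 and img.shape == (84, 84, 3)
```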
+ +

+
+ + +
+ Storing actions +

+

+

Warning!

+ +Actions should be **normalized between -1 and 1**. This is because this range enables easier policy learning via the use of `tanh` layers. + +The `get_dataset_info.py` script can be used to sanity check stored actions, and will throw an `Exception` if there is a violation. + +</div>
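In addition to `get_dataset_info.py`, a quick manual check along these lines can confirm the convention on your own file (a sketch only; the dataset path is a placeholder):

```python
import h5py
import numpy as np

dataset_path = "/path/to/dataset.hdf5"  # placeholder

with h5py.File(dataset_path, "r") as f:
    for demo_name in f["data"]:
        actions = f["data"][demo_name]["actions"][()]
        assert np.all(np.abs(actions) <= 1.0), "actions out of [-1, 1] in {}".format(demo_name)

print("all actions are within [-1, 1]")
```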
+ +

+
+ +### Filter Keys + +Filter keys enable arbitrary splitting of a dataset into sub-groups, and allow training on a specific subset of the data. + +A common use-case is to split data into train-validation splits. We provide a convenience script for doing this in the `robomimic/scripts` directory: + +```sh +$ python split_train_val.py --dataset /path/to/dataset.hdf5 --ratio 0.1 --filter_key +``` + +- `--dataset` specifies the path to the hdf5 dataset +- `--ratio` specifies the amount of validation data you would like to create. In the example above, 10% of the demonstrations will be put into the validation group. +- `--filter_key` (optional) By default, this script splits all demonstration keys in the hdf5 into 2 new hdf5 groups - one under `mask/train`, and one under `mask/valid`. If this argument is provided, the demonstration keys corresponding to this filter key (under `mask/`) will be split into 2 groups - `mask/_train` and `mask/_valid`. + +
+

Note!

+ +You can easily list the filter keys present in a dataset with the `get_dataset_info.py` script (see [this link](../tutorials/dataset_contents.html#view-dataset-structure-and-videos)), and you can even pass a `--verbose` flag to list the exact demonstrations that each filter key corresponds to. + +
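If you would rather inspect the resulting groups directly, a short `h5py` snippet like the sketch below (the dataset path is a placeholder) lists each filter key and the demos it refers to:

```python
import h5py

dataset_path = "/path/to/dataset.hdf5"  # placeholder

with h5py.File(dataset_path, "r") as f:
    if "mask" not in f:
        print("no filter keys in this dataset")
    else:
        # each filter key is a dataset of demo names under the top-level "mask" group
        for filter_key in f["mask"]:
            demo_names = [name.decode("utf-8") for name in f["mask"][filter_key][()]]
            print("{}: {} demos, e.g. {}".format(filter_key, len(demo_names), demo_names[:3]))
```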
+ +Using filter keys during training is easy. To use the generated train-valid split, you can set `config.experiment.validate=True` so that the demos under `mask/train` are used for training, and the demos under `mask/valid` are used for validation. + +You can also use a custom filter key for training by setting `config.train.hdf5_filter_key=`. This ensures that only the demos under `mask/` are used during training. If you also set `config.experiment.validate=True`, this filter key's train-valid split will be used. + + diff --git a/docs/introduction/results.md b/docs/datasets/robomimic_v0.1.md similarity index 68% rename from docs/introduction/results.md rename to docs/datasets/robomimic_v0.1.md index 8e172965..d8134ec9 100644 --- a/docs/introduction/results.md +++ b/docs/datasets/robomimic_v0.1.md @@ -1,30 +1,28 @@ -# Reproducing Study Results +# robomimic v0.1 (CoRL 2021) -This section provides a guide on how to reproduce different experiment results from the study. Please see the [paper](https://arxiv.org/abs/2108.03298) and the [study website](https://arise-initiative.github.io/robomimic-web/study/) for more information. +## Overview +robomimic v0.1 datasets is a large-scale, diverse collection of task demonstrations spanning: -**Warning:** When working with the robosuite datasets, please make sure that you have installed [robosuite](https://robosuite.ai/), and that you are on the `offline_study` branch of robosuite. +- multiple human demonstrators of varying quality +- multiple robot manipulation tasks of varying difficulty +- both simulated and real data -## Quick Example +## Downloading -In this section, we show a simple example of how to reproduce one of the results from the study - the BC-RNN result on the Lift (Proficient-Human) low-dim dataset. -```sh -# default behavior for download script - just download lift proficient-human low-dim dataset to robomimic/../datasets -$ python download_datasets.py +
+

Warning!

-# generate json configs for running all experiments at robomimic/exps/paper -$ python generate_paper_configs.py --output_dir /tmp/experiment_results +When working with these datasets, please make sure that you have installed [robosuite](https://robosuite.ai/) from source and are on the `offline_study` branch. -# the training command can be found in robomimic/exps/paper/core.sh -# Training results can be viewed at /tmp/experiment_results (--output_dir when generating paper configs). -$ python train.py --config ../exps/paper/core/lift/ph/low_dim/bc.json -``` +
-See the [downloading released datasets](./results.html#downloading-released-datasets) section below for more information on downloading different datasets, and the [results on released datasets](./results.html#results-on-released-datasets) section below for more detailed information on reproducing different results from the study. +We provide two ways for downloading robomimic v0.1 datasets: -## Downloading Released Datasets +### Method 1: Using `download_datasets.py` (Recommended) +`download_datasets.py` is a python script that provides a programmatic way of installing all datasets. This is the preferred method, because this script also sets up a directory structure for the datasets that works out of the box with examples for reproducing benchmark results. -Released datasets can be downloaded easily by using the `download_datasets.py` script. **This is the preferred method for downloading the datasets**, because the script will also set up a directory structure for the datasets that works out of the box with examples for reproducing some benchmark results with the repository. A few examples of using this script are provided below. +A few examples of using this script are provided below: ```sh # default behavior - just download lift proficient-human low-dim dataset @@ -48,11 +46,15 @@ $ python download_datasets.py --tasks real $ python download_datasets.py --download_dir /tmp/datasets ``` -For convenience, we also provide links to each dataset below - to make it easy to manually download any dataset of interest. See the [study](https://arise-initiative.github.io/robomimic-web/study/) for more information on the datasets. +### Method 2: Using Direct Download Links -### Proficient-Human (PH) +We also provide direct download links for each hdf5 dataset (the download links for the raw datasets are also included – they allow flexibility in extracting different kinds of observations and rewards during postprocessing): -These datasets were collected by 1 operator using the [RoboTurk](https://roboturk.stanford.edu/) platform. Each dataset consists of 200 successful trajectories. +**Proficient-Human (PH)** [**info**](robomimic_v0.1.html#proficient-human-ph) + +
+ Download Links +

proficient_human @@ -68,12 +70,17 @@ These datasets were collected by 1 operator using the [RoboTurk](https://robotur | ![lift_real](../images/lift_real.jpg) | ![can_real](../images/can_real.jpg) | ![tool_hang_real](../images/tool_hang_real.jpg) | | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/lift_real/ph/demo.hdf5) (1.9 GB) | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can_real/ph/demo.hdf5) (5.3 GB) | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/tool_hang_real/ph/demo.hdf5) (58 GB) | +

+
+
+
-### Multi-Human (MH) - -These datasets were collected by 6 operators using the [RoboTurk](https://roboturk.stanford.edu/) platform. Each dataset consists of 50 trajectories provided by each operator, for a total of 300 successful trajectories. The operators were varied in proficiency -- there were 2 "worse" operators, 2 "okay" operators, and 2 "better" operators, resulting in diverse, mixed quality datasets. +**Multi-Human (MH)** [**info**](robomimic_v0.1.html#multi-human-mh) +
+ Download Links +

multi_human | **Lift
(MH)** | **Can
(MH)** | **Square
(MH)** | **Transport
(MH)** | @@ -83,11 +90,17 @@ These datasets were collected by 6 operators using the [RoboTurk](https://robotu | [low_dim](http://downloads.cs.stanford.edu/downloads/rt_benchmark/lift/mh/low_dim.hdf5)
(46 MB) | [low_dim](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can/mh/low_dim.hdf5)
(108 MB) | [low_dim](http://downloads.cs.stanford.edu/downloads/rt_benchmark/square/mh/low_dim.hdf5)
(119 MB) | [low_dim](http://downloads.cs.stanford.edu/downloads/rt_benchmark/transport/mh/low_dim.hdf5)
(609 MB) | | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/lift/mh/image.hdf5)
(2.6 GB) | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can/mh/image.hdf5)
(5.1 GB) | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/square/mh/image.hdf5)
(6.5 GB) | [image](http://downloads.cs.stanford.edu/downloads/rt_benchmark/transport/mh/image.hdf5)
(32 GB) | +

+
+
+
-### Machine-Generated (MG) +**Machine-Generated (MG)** [**info**](robomimic_v0.1.html#machine-generated-mg) -These datasets were generated by [training](https://github.com/ARISE-Initiative/robosuite-benchmark) an [SAC](https://arxiv.org/abs/1801.01290) agent for each task, and then using each policy checkpoint saved during training to generate a mixed quality dataset. 300 rollouts were collected for each checkpoint, with 5 checkpoints for the Lift dataset (total of 1500 trajectories), and 13 checkpoints for the Can dataset (total of 3900 trajectories). +
+ Download Links +

machine_generated @@ -100,11 +113,17 @@ These datasets were generated by [training](https://github.com/ARISE-Initiative/ | [image (sparse)](http://downloads.cs.stanford.edu/downloads/rt_benchmark/lift/mg/image_sparse.hdf5)
(19 GB) | [image (sparse)](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can/mg/image_sparse.hdf5)
(48 GB) | | [image (dense)](http://downloads.cs.stanford.edu/downloads/rt_benchmark/lift/mg/image_dense.hdf5)
(19 GB) | [image (dense)](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can/mg/image_dense.hdf5)
(48 GB) | +

+
+
+
-### Paired +**Paired** [**info**](robomimic_v0.1.html#paired) -This is a diagnostic dataset to test the ability of algorithms to learn from mixed quality human data. A single experienced operator collected 2 demonstrations for each of 100 task initializations on the Can task, resulting in 200 total demonstrations. Each pair of demonstrations consists of a "good" trajectory, where the can is picked up and placed in the correct bin, and a "bad" trajectory, where the can is picked up, and tossed outside of the robot workspace. Since the task initializations are identical, and the first part of each trajectory leading up to the can grasp is similar, there is a strong expectation for algorithms that deal with suboptimal data, to be able to filter the good trajectories from the bad ones, and achieve near-perfect performance. +
+ Download Links +

| **Can Paired** | | :----------------------------------------------------------: | @@ -113,15 +132,39 @@ This is a diagnostic dataset to test the ability of algorithms to learn from mix | [low_dim (sparse)](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can/paired/low_dim.hdf5)
(39 MB) | | [image (sparse)](http://downloads.cs.stanford.edu/downloads/rt_benchmark/can/paired/image.hdf5)
(1.7 GB) | +

+
+ +## Postprocessing +If a **low_dim** or **image** dataset was downloaded, the dataset works out of the box! No postprocessing is needed. + +If a **raw** dataset was downloaded, the dataset must be postprocessed since there are no observations stored. You must run `dataset_states_to_obs.py`. For more information, see [this page](robosuite.html#extracting-observations-from-mujoco-states). + +## Info +Below, we provide information on each dataset provided: -## Results on Released Datasets +### Proficient-Human (PH) + +These datasets were collected by 1 operator using the [RoboTurk](https://roboturk.stanford.edu/) platform. Each dataset consists of 200 successful trajectories. -This section discusses how to reproduce the results from the [study](https://arise-initiative.github.io/robomimic-web/study/). +### Multi-Human (MH) -### Reproducing Experiments +These datasets were collected by 6 operators using the [RoboTurk](https://roboturk.stanford.edu/) platform. Each dataset consists of 50 trajectories provided by each operator, for a total of 300 successful trajectories. The operators were varied in proficiency -- there were 2 "worse" operators, 2 "okay" operators, and 2 "better" operators, resulting in diverse, mixed quality datasets. + +### Machine-Generated (MG) -After downloading the appropriate datasets you're interested in using by running the `download_datasets.py` script, the `generate_paper_configs.py` script can be used to generate all training config json files necessary to reproduce the experiments in the [study](https://arise-initiative.github.io/robomimic-web/study/). The script takes 3 important arguments -- `--config_dir` can be used to specify where the config json files will be generated (defaults to `robomimic/exps/paper`). The `--dataset_dir` specifies where the released datasets can be found, and should be consistent with the `--download_dir` argument supplied to `download_datasets.py` earlier (if omitted, both scripts default to `robomimic/../datasets`). The `--output_dir` argument specifies where training results will be written (including model checkpoints, logs, and rollout videos). A few examples are below. +These datasets were generated by [training](https://github.com/ARISE-Initiative/robosuite-benchmark) an [SAC](https://arxiv.org/abs/1801.01290) agent for each task, and then using each policy checkpoint saved during training to generate a mixed quality dataset. 300 rollouts were collected for each checkpoint, with 5 checkpoints for the Lift dataset (total of 1500 trajectories), and 13 checkpoints for the Can dataset (total of 3900 trajectories). + +### Paired + +This is a diagnostic dataset to test the ability of algorithms to learn from mixed quality human data. A single experienced operator collected 2 demonstrations for each of 100 task initializations on the Can task, resulting in 200 total demonstrations. Each pair of demonstrations consists of a "good" trajectory, where the can is picked up and placed in the correct bin, and a "bad" trajectory, where the can is picked up, and tossed outside of the robot workspace. Since the task initializations are identical, and the first part of each trajectory leading up to the can grasp is similar, there is a strong expectation for algorithms that deal with suboptimal data, to be able to filter the good trajectories from the bad ones, and achieve near-perfect performance. 
+ +## Reproduce [Study](https://arise-initiative.github.io/robomimic-web/study/) Results + +### Running Experiments + +After downloading the appropriate datasets you're interested in using by running the `download_datasets.py` script, the `generate_paper_configs.py` script can be used to generate all training config json files necessary to reproduce the experiments in the [study](https://arise-initiative.github.io/robomimic-web/study/). A few examples are below. ```sh # Assume datasets already exist in robomimic/../datasets folder. Configs will be generated under robomimic/exps/paper, and training results will be at /tmp/experiment_results when launching training runs. @@ -131,7 +174,11 @@ $ python generate_paper_configs.py --output_dir /tmp/experiment_results $ python generate_paper_configs.py --config_dir /tmp/configs --dataset_dir /tmp/datasets --output_dir /tmp/experiment_results ``` -Then, to reproduce a specific set of training runs for different experiment groups (see below), we can simply navigate to the generated config directory, and copy training commands from the generated shell script there. As an example, we can reproduce the low-dim BC and BC-RNN training results on the Lift PH dataset, by looking for the correct set of commands in `robomimic/exps/paper/core.sh` and running them. The relevant section of the shell script is reproduced below. +- `--config_dir` specifies where the config json files will be generated (defaults to `robomimic/exps/paper`) +- `--dataset_dir` specifies where the released datasets can be found, and should be consistent with the `--download_dir` argument supplied to `download_datasets.py` earlier. (if omitted, both scripts default to `robomimic/../datasets`) +- `--output_dir` specifies where training results will be written (including model checkpoints, logs, and rollout videos) + +Then, to reproduce a specific set of training runs for different experiment groups, we can simply navigate to the generated config directory, and copy training commands from the generated shell script there. As an example, we can reproduce the low-dim BC and BC-RNN training results on the Lift PH dataset, by looking for the correct set of commands in `robomimic/exps/paper/core.sh` and running them. The relevant section of the shell script is reproduced below. ```bash # task: lift @@ -141,7 +188,14 @@ python /path/to/robomimic/scripts/train.py --config /path/to/robomimic/exps/pape python /path/to/robomimic/scripts/train.py --config /path/to/robomimic/exps/paper/core/lift/ph/low_dim/bc_rnn.json ``` +
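If you prefer launching a batch of these runs programmatically rather than copying commands out of `core.sh`, a small wrapper along these lines is one option. This is only a sketch; the config directory mirrors the default location mentioned above and should be adjusted to your setup.

```python
import glob
import os
import subprocess

# assumes configs were generated at the default location (robomimic/exps/paper)
config_dir = "/path/to/robomimic/exps/paper/core/lift/ph/low_dim"

for config_path in sorted(glob.glob(os.path.join(config_dir, "*.json"))):
    print("launching training run for", config_path)
    subprocess.run(
        ["python", "/path/to/robomimic/scripts/train.py", "--config", config_path],
        check=True,
    )
```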
+

Want to Run Experiments on Custom Observations?

+ +We provide the raw (observation-free) `demo.hdf5` datasets so that you can generate your own custom set of observations, such as additional camera viewpoints. For more information, see [Extracting Observations from Datasets](robosuite.md#extracting-observations-from-mujoco-states). + +**NOTE**: To see exactly how our paper's released datasets were generated (and to compare against that process), please see the `extract_obs_from_raw_datasets.sh` script. + +</div>
### Overview of Included Experiments @@ -159,42 +213,30 @@ Each group of experiments below has a shell script (for example `core.sh`) and a - **d4rl:** results on D4RL datasets (see section below) - - -## D4RL - -Below, we provide a table of results on common D4RL datasets using the algorithms included in the released codebase. We follow the convention in the TD3-BC paper, where we average results over the final 10 rollout evaluations, but we use 50 rollouts instead of 10 for each evaluation. Apart from a small handful of the halfcheetah results, the results align with those presented in the [TD3_BC paper](https://arxiv.org/abs/2106.06860). We suspect the halfcheetah results are different because we used `mujoco-py` version `2.0.2.13` in our evaluations, as opposed to `1.5` in order to be consistent with the version we were using for robosuite datasets. The results below were generated with `gym` version `0.17.3` and this `d4rl` [commit](https://github.com/rail-berkeley/d4rl/tree/9b68f31bab6a8546edfb28ff0bd9d5916c62fd1f). - -| | **BCQ** | **CQL** | **TD3-BC** | -| ----------------------------- | ------------- | ------------- | ------------- | -| **HalfCheetah-Medium** | 40.8% (4791) | 38.5% (4497) | 41.7% (4902) | -| **Hopper-Medium** | 36.9% (1181) | 30.7% (980) | 97.9% (3167) | -| **Walker2d-Medium** | 66.4% (3050) | 65.2% (2996) | 77.0% (3537) | -| **HalfCheetah-Medium-Expert** | 74.9% (9016) | 21.5% (2389) | 79.4% (9578) | -| **Hopper-Medium-Expert** | 83.8% (2708) | 111.7% (3614) | 112.2% (3631) | -| **Walker2d-Medium-Expert** | 70.2% (3224) | 77.4% (3554) | 102.0% (4683) | -| **HalfCheetah-Expert** | 94.3% (11427) | 29.2% (3342) | 95.4% (11569) | -| **Hopper-Expert** | 104.7% (3389) | 111.8% (3619) | 112.2% (3633) | -| **Walker2d-Expert** | 80.5% (3699) | 108.0% (4958) | 105.3% (4837) | - +### Quick Example -### Downloading D4RL Datasets - -To download and convert D4RL datasets to be compatible with this repository, use the following conversion script in the `scripts/conversion` folder, and specify the `--env` that the dataset corresponds to, and optionally the `--folder` where you want to download the dataset. If no folder is provided, the `datasets` folder at the top-level of the repository will be used. The example below downloads and converts the `walker2d-medium-expert-v0` dataset. This should be done for all D4RL datasets of interest. +Below, we show a simple example of how to reproduce one of the results from the study - the BC-RNN result on the Lift (Proficient-Human) low-dim dataset: ```sh -# by default, download to robomimic/datasets -$ python convert_d4rl.py --env walker2d-medium-expert-v0 -# download to specific folder -$ python convert_d4rl.py --env walker2d-medium-expert-v0 --folder /path/to/output/folder/ -``` - -The script will download the raw hdf5 dataset to the folder, and the converted one that is compatible with this repository into the `converted` subfolder. - +# default behavior for download script - just download lift proficient-human low-dim dataset to robomimic/../datasets +$ python download_datasets.py +# generate json configs for running all experiments at robomimic/exps/paper +$ python generate_paper_configs.py --output_dir /tmp/experiment_results -### Reproduce D4RL Results +# the training command can be found in robomimic/exps/paper/core.sh +# Training results can be viewed at /tmp/experiment_results (--output_dir when generating paper configs). 
+$ python train.py --config ../exps/paper/core/lift/ph/low_dim/bc.json +``` -In order to reproduce the results above, first make sure that the `generate_paper_configs.py` script has been run, where the `--dataset_dir` argument is consistent with the folder where the D4RL datasets were downloaded using the `convert_d4rl.py` script. This is also the first step for reproducing results on the released robot manipulation datasets. The `--config_dir` directory used in the script (`robomimic/exps/paper` by default) will contain a `d4rl.sh` script, and a `d4rl` subdirectory that contains all the json configs. The table results above can be generated simply by running the training commands in the shell script. +## Citation +```sh +@inproceedings{mandlekar2021matters, + title={What Matters in Learning from Offline Human Demonstrations for Robot Manipulation}, + author={Mandlekar, Ajay and Xu, Danfei and Wong, Josiah and Nasiriany, Soroush and Wang, Chen and Kulkarni, Rohun and Fei-Fei, Li and Savarese, Silvio and Zhu, Yuke and Mart{\'\i}n-Mart{\'\i}n, Roberto}, + booktitle={5th Annual Conference on Robot Learning}, + year={2021} +} +``` \ No newline at end of file diff --git a/docs/datasets/robosuite.md b/docs/datasets/robosuite.md new file mode 100644 index 00000000..91a66f4a --- /dev/null +++ b/docs/datasets/robosuite.md @@ -0,0 +1,85 @@ +# robosuite Datasets + +The repository is fully compatible with datasets collected using [robosuite](https://robosuite.ai/). See [this link](https://robosuite.ai/docs/algorithms/demonstrations.html) for more information on collecting your own human demonstrations using robosuite. + +## Converting robosuite hdf5 datasets + +The raw `demo.hdf5` file generated by the `collect_human_demonstrations.py` robosuite script can easily be modified in-place to be compatible with **robomimic**: + +```sh +$ python conversion/convert_robosuite.py --dataset /path/to/demo.hdf5 +``` + +
+

Post-Processed Dataset Structure

+ +This post-processed `demo.hdf5` file in its current state is _missing_ observations (e.g.: proprioception, images, ...), rewards, and dones, which are necessary for training policies. + +However, keeping these observation-free datasets is useful because it **allows flexibility in [extracting](robosuite.md#extracting-observations-from-mujoco-states) different kinds of observations and rewards**. + +
+ Dataset Structure (click to expand) +

+ +- `data` (group) + + - `total` (attribute) - number of state-action samples in the dataset + + - `env_args` (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data + + - `demo_0` (group) - group for the first demonstration (every demonstration has a group) + + - `num_samples` (attribute) - the number of state-action samples in this trajectory + - `model_file` (attribute) - the xml string corresponding to the MJCF MuJoCo model + - `states` (dataset) - flattened raw MuJoCo states, ordered by time + - `actions` (dataset) - environment actions, ordered by time + + - `demo_1` (group) - group for the second demonstration + + ... +

+
+ +
+ + +Next, we will extract observations from this raw dataset. + + +## Extracting Observations from MuJoCo states + +
+

Warning! Train-Validation Data Splits

+ +For robosuite datasets, if using your own [train-val splits](overview.md#filter-keys), generate these splits _before_ extracting observations. This ensures that all postprocessed hdf5s generated from the `demo.hdf5` inherit the same filter keys. + +</div>
+ +Generating observations from a dataset is straightforward and can be done with a single command from `robomimic/scripts`: + +```sh +# For low dimensional observations only, with done on task success +$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name low_dim.hdf5 --done_mode 2 + +# For including image observations +$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image.hdf5 --done_mode 2 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 + +# Using dense rewards +$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image_dense.hdf5 --done_mode 2 --dense --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 + +# Only writing done at the end of the trajectory +$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image_done_1.hdf5 --done_mode 1 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 + +# For seeing descriptions of all the command-line args available +$ python dataset_states_to_obs.py --help +``` + +## Citation +```sh +@article{zhu2020robosuite, + title={robosuite: A modular simulation framework and benchmark for robot learning}, + author={Zhu, Yuke and Wong, Josiah and Mandlekar, Ajay and Mart{\'\i}n-Mart{\'\i}n, Roberto}, + journal={arXiv preprint arXiv:2009.12293}, + year={2020} +} +``` \ No newline at end of file diff --git a/docs/datasets/roboturk_pilot.md b/docs/datasets/roboturk_pilot.md new file mode 100644 index 00000000..78e4b5d8 --- /dev/null +++ b/docs/datasets/roboturk_pilot.md @@ -0,0 +1,42 @@ +# RoboTurk Pilot + +## Overview + +The first [RoboTurk paper](https://arxiv.org/abs/1811.02790) released [large-scale pilot datasets](https://roboturk.stanford.edu/dataset_sim.html) collected with robosuite `v0.3`. These datasets consist of over 1000 task demonstrations each on several Sawyer `PickPlace` and `NutAssembly` task variants, collected by several human operators. This repository is fully compatible with these datasets. + +

+ +

+ +## Downloading + +
+

Warning!

+ +When working with these datasets, please make sure that you have installed [robosuite](https://robosuite.ai/) from source and are on the `roboturk_v1` branch. + +
+ +Download the dataset [here](http://cvgl.stanford.edu/projects/roboturk/RoboTurkPilot.zip) (~9 GB download), and unzip the file, resulting in a `RoboTurkPilot` folder. + +## Postprocessing + +First, the dataset must be updated to a format compatible with **robomimic**. Run the following script (these arguments are provided as an example): +```sh +# convert the Can demonstrations, and also create a "fastest_225" filter_key (prior work such as IRIS has trained on this subset) +$ python conversion/convert_roboturk_pilot.py --folder /path/to/RoboTurkPilot/bins-Can --n 225 +``` +Then, the dataset must be postprocessed since there are no observations stored. You must run `dataset_states_to_obs.py`. For more information, see [this page](robosuite.html#extracting-observations-from-mujoco-states). + +## Citation + +```sh +@inproceedings{mandlekar2018roboturk, + title={Roboturk: A crowdsourcing platform for robotic skill learning through imitation}, + author={Mandlekar, Ajay and Zhu, Yuke and Garg, Animesh and Booher, Jonathan and Spero, Max and Tung, Albert and Gao, Julian and Emmons, John and Gupta, Anchit and Orbay, Emre and others}, + booktitle={Conference on Robot Learning}, + pages={879--893}, + year={2018}, + organization={PMLR} +} +``` \ No newline at end of file diff --git a/docs/images/core_features.png b/docs/images/core_features.png new file mode 100644 index 00000000..3251fcf8 Binary files /dev/null and b/docs/images/core_features.png differ diff --git a/docs/images/tensorboard.png b/docs/images/tensorboard.png new file mode 100644 index 00000000..3d421867 Binary files /dev/null and b/docs/images/tensorboard.png differ diff --git a/docs/index.rst b/docs/index.rst index 5c51c748..e8c60ab2 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -12,13 +12,39 @@ Welcome to robomimic's documentation! introduction/overview introduction/installation - introduction/quickstart - introduction/features - introduction/advanced - introduction/examples - introduction/datasets - introduction/model_zoo - introduction/results + introduction/implemented_algorithms + introduction/getting_started + +.. toctree:: + :maxdepth: 1 + :caption: Datasets + + datasets/overview + datasets/robomimic_v0.1 + datasets/robosuite + datasets/d4rl + datasets/momart + datasets/roboturk_pilot + +.. toctree:: + :maxdepth: 1 + :caption: Pretrained Models + + model_zoo/robomimic_v0.1 + +.. toctree:: + :maxdepth: 1 + :caption: Tutorials + + tutorials/configs + tutorials/viewing_results + tutorials/hyperparam_scan + tutorials/reproducing_experiments + tutorials/dataset_contents + tutorials/using_pretrained_models + tutorials/observations + tutorials/custom_algorithms + tutorials/tensor_collections .. toctree:: :maxdepth: 1 @@ -26,7 +52,6 @@ Welcome to robomimic's documentation! modules/overview modules/dataset - modules/observations modules/algorithms modules/models modules/configs diff --git a/docs/introduction/datasets.md b/docs/introduction/datasets.md deleted file mode 100644 index 565e75b6..00000000 --- a/docs/introduction/datasets.md +++ /dev/null @@ -1,210 +0,0 @@ -# Using Demonstration Datasets - -This section contains information on the hdf5 dataset structure used by **robomimic**, and additional dataset types that we offer conversion scripts for. - -## Dataset Structure - -The repository expects hdf5 datasets with a certain structure. The structure is shown below. 
- -data (group) - -- `total` (attribute) - number of state-action samples in the dataset - -- `env_args` (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data - -- `mask` (group) - this group will exist in hdf5 datasets that contain filter keys - - - `` (dataset) - the first filter key. Note that the name of this dataset and length will vary. As an example, this could be the "valid" filter key, and contain the list ["demo_0", "demo_19", "demo_35"], corresponding to 3 validation trajectories. - -- `demo_0` (group) - group for the first trajectory (every trajectory has a group) - - - `num_samples` (attribute) - the number of state-action samples in this trajectory - - - `model_file` (attribute) - the xml string corresponding to the MJCF MuJoCo model. Only present for robosuite datasets. - - - `states` (dataset) - flattened raw MuJoCo states, ordered by time. Shape (N, D) where N is the length of the trajectory, and D is the dimension of the state vector. Should be empty or have dummy values for non-robosuite datasets. - - - `actions` (dataset) - environment actions, ordered by time. Shape (N, A) where N is the length of the trajectory, and A is the action space dimension - - - `rewards` (dataset) - environment rewards, ordered by time. Shape (N,) where N is the length of the trajectory. - - - `dones` (dataset) - done signal, equal to 1 if playing the corresponding action in the state should terminate the episode. Shape (N,) where N is the length of the trajectory. - - - `obs` (group) - group for the observation keys. Each key is stored as a dataset. - - - `` (dataset) - the first observation key. Note that the name of this dataset and shape will vary. As an example, the name could be "agentview_image", and the shape could be (N, 84, 84, 3). - - ... - - - `next_obs` (group) - group for the next observations. - - - `` (dataset) - the first observation key. - - ... - -- `demo_1` (group) - group for the second trajectory - - ... - -### Filter Keys and Train-Valid Splits - -Each filter key is a dataset in the "mask" group of the dataset hdf5, which contains a list of the demo group keys - these correspond to subsets of trajectories in the dataset. Filter keys make it easy to train on a subset of the data present in an hdf5. A common use is to split a dataset into training and validation datasets using the `split_train_val.py` script. - -```sh -$ python split_train_val.py --dataset /path/to/dataset.hdf5 --ratio 0.1 -``` - -The example above creates a `train` filter key and a `valid` filter key under `mask/train` and `mask/valid`, where the former contains a list of demo groups corresponding to a 90% subset of the dataset trajectories, and the latter contains a list of demo groups correspond to a 10% subset of the dataset trajectories. These filter keys are used by the data loader during training if `config.experiment.validate` is set to True in the training config. - -Many of the released datasets contain other filter keys besides the train-val splits. Some contain `20_percent` and `50_percent` filter keys corresponding to data subsets, and the Multi-Human datasets contain filter keys that correspond to each operator's data (e.g. `better_operator_1`, `better_operator_2`), and ones that correspond to different combinations of operator data (e.g. `better`, `worse_better`). - -Using these filter keys during training is simple. 
For example, to use the `20_percent` subset during training, you can simply set `config.train.hdf5_filter_key = "20_percent"` in the training config. If using validation, then the `20_percent_train` and `20_percent_valid` filter keys will also be used -- these were generated using the `split_train_val.py` script by passing `--filter_key 20_percent`. - -For robosuite datasets, if attempting to create your own train-val splits, we recommend running the `split_train_val.py` script on the `demo.hdf5` file before extracting observations, since filter keys are copied from the source hdf5 during observation extraction (see more details below on robosuite hdf5s). This will ensure that all postprocessed hdf5s generated from the `demo.hdf5` inherits the same filter keys. - -### Storing image observations - -The repository expects image observations stored in the hdf5 to be of type `np.uint8` and be stored in channel-last `(H, W, C)` format. This is for two reasons - (1) this is a common format that many `gym` environments and all `robosuite` environments return image observations in, and (2) using `np.uint8` saves space in dataset storage, as opposed to using floats. Note that the robosuite observation extraction script (`dataset_states_to_obs.py`) already stores images in the correct format. - -### Storing actions - -The repository **expects all actions to be normalized** between -1 and 1 (this makes for easier policy learning and allows the use of `tanh` layers). The `get_dataset_info.py` script can be used to sanity check the actions in a dataset, as it will throw an `Exception` if there is a violation. - -### View Dataset Structure and Videos - -**Note:** The examples in this section use the small hdf5 dataset packaged with the repository in `tests/assets/test.hdf5`, but you can run these examples with any dataset hdf5. - -**Warning:** If you are using the default dataset, please make sure that robosuite is on the `offline_study` branch of robosuite -- this is necessary for the playback scripts to function properly. - -The repository offers a simple utility script (`get_dataset_info.py`) to view the hdf5 dataset structure and some statistics of hdf5 datasets. The script will print out some statistics about the trajectories, the filter keys present in the dataset, the environment metadata in the dataset, and the dataset structure for the first demonstration. Pass the `--verbose` argument to print the list of demonstration keys under each filter key, and the dataset structure for all demonstrations. - -```sh -$ python get_dataset_info.py --dataset ../../tests/assets/test.hdf5 -``` - -The repository also offers a utility script (`playback_dataset.py`) that allows you to easily view dataset trajectories, and verify that the recorded dataset actions are reasonable. The example below loads the saved MuJoCo simulator states one by one in a simulation environment, and renders frames from some simulation cameras to generate a video, for the first 5 trajectories. This is an easy way to view trajectories from the dataset. After this script runs, you can view the video at `/tmp/playback_dataset.mp4`. - -```sh -$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --render_image_names agentview robot0_eye_in_hand --video_path /tmp/playback_dataset.mp4 --n 5 -``` - -An alternative way to view the demonstrations is to directly visualize the image observations in the dataset. This is especially useful for real robot datasets, where there is no simulator to use for rendering. 
- -```sh -$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --use-obs --render_image_names agentview_image --video_path /tmp/obs_trajectory.mp4 -``` - -It's also easy to use the script to verify that the dataset actions are reasonable, by playing the actions back one by one in the environment. - -```sh -$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --use-actions --render_image_names agentview --video_path /tmp/playback_dataset_with_actions.mp4 -``` - -Finally, the script can be used to visualize the initial states in the demonstration data. - -```sh -$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --first --render_image_names agentview --video_path /tmp/dataset_task_inits.mp4 -``` - - - -## Robosuite HDF5 Datasets - -The repository is fully compatible with datasets collected using [robosuite](https://robosuite.ai/). See [this link](https://robosuite.ai/docs/algorithms/demonstrations.html) for more information on collecting your own human demonstrations using robosuite. - -### Converting robosuite hdf5 datasets - -The raw `demo.hdf5` file generated by the `collect_human_demonstrations.py` robosuite script can easily be modified in-place to be compatible with this repository, by using the following conversion script in the scripts folder. - -```sh -$ python conversion/convert_robosuite.py --dataset /path/to/demo.hdf5 -``` - -Afterwards, observations should be extracted from the `demo.hdf5` dataset (see below). - -### Structure of raw collected demonstrations - -The structure of these converted raw `demo.hdf5` files is very similar to the normal hdf5 dataset structure, and is compatible with scripts such as `get_dataset_info.py` and `playback_dataset.py`, but it is missing observations (such as proprioception, object poses, and images),, rewards, and dones, which are necessary for training policies. Keeping these raw `demo.hdf5` datasets around is a good idea -- it **allows flexibility in extracting different kinds of observations and rewards** (see below section on extracting observations). The structure of these raw datasets is shown below. - -- `data` (group) - - - `total` (attribute) - number of state-action samples in the dataset - - - `env_args` (attribute) - a json string that contains metadata on the environment and relevant arguments used for collecting data - - - `demo_0` (group) - group for the first demonstration (every demonstration has a group) - - - `num_samples` (attribute) - the number of state-action samples in this trajectory - - `model_file` (attribute) - the xml string corresponding to the MJCF MuJoCo model - - `states` (dataset) - flattened raw MuJoCo states, ordered by time - - `actions` (dataset) - environment actions, ordered by time - - - `demo_1` (group) - group for the second demonstration - - ... - -### Extracting Observations from MuJoCo states - -As mentioned above, the `demo.hdf5` file produced by robosuite only contains low-level mujoco states - it does not contain observations (such as proprioception, object poses, and images), rewards, or dones - all of which may be needed for learning. In this section, we show how to postprocess these hdf5 files to produce ones compatible with the training pipeline. We provide two examples below - one of which produces an hdf5 with a low-dim observation space, and one which produces an hdf5 with an image observation space. These commands are similar to the ones we used to produce `low_dim.hdf5` and `image.hdf5` files in our released datasets. 
- -```sh -$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name low_dim.hdf5 --done_mode 2 -$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image.hdf5 --done_mode 2 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 -``` - -Note that we released the `demo.hdf5` files for our collected demonstration data as well - this makes it easy to extract observations directly from these files instead of using the pre-defined observation spaces provided in the `low_dim.hdf5` and `image.hdf5` dataset files. For example, our image observation spaces consisted of the `agentview` and `robot0_eye_in_hand` cameras, with 84x84 images, but if you'd also like the option to train on the `birdview` camera images, and you'd like to increase image resolution to 120x120, you can do that easily using the script. - -```sh -$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name custom_image.hdf5 --done_mode 2 --camera_names agentview robot0_eye_in_hand birdview --camera_height 120 --camera_width 120 -``` - -The script can also be used to change the rewards and dones in the dataset. We used sparse rewards and dones on task success and at the end of each trajectory (this corresponds to done mode 2 in the script). However, the script can be used to write dense rewards, or change the done annotation to be 1 only at the end of each trajectory (this corresponds to done mode 1 in the script). - -```sh -$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image_dense.hdf5 --done_mode 2 --dense --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 -$ python dataset_states_to_obs.py --dataset /path/to/demo.hdf5 --output_name image_done_1.hdf5 --done_mode 1 --camera_names agentview robot0_eye_in_hand --camera_height 84 --camera_width 84 -``` - -For more details on how the released `demo.hdf5` dataset files were used to generate the `low_dim.hdf5` and `image.hdf5` files, please see the `extract_obs_from_raw_datasets.sh` script, which contains the commands that were used for our released datasets. - - - -## MOMART Datasets - -


- -This repository is fully compatible with [MOMART](https://sites.google.com/view/il-for-mm/home) datasets, a large collection of long-horizon, multi-stage simulated kitchen tasks executed by a mobile manipulator robot. See [this link](https://sites.google.com/view/il-for-mm/datasets) for a breakdown of the MOMART dataset structure, guide on downloading MOMART datasets, and running experiments using the datasets. - - - -## D4RL Datasets - -This repository is fully compatible with most [D4RL](https://github.com/rail-berkeley/d4rl) datasets. See [this link](./results.html#d4rl) for a guide on downloading D4RL datasets and running D4RL experiments. - - - -## RoboTurk Pilot Datasets - -The first [RoboTurk paper](https://arxiv.org/abs/1811.02790) released [large-scale pilot datasets](https://roboturk.stanford.edu/dataset_sim.html) collected with robosuite `v0.3`. These datasets consist of over 1000 task demonstrations each on several Sawyer `PickPlace` and `NutAssembly` task variants, collected by several human operators. This repository is fully compatible with these datasets. - -![roboturk_pilot](../images/roboturk_pilot.png) - -To get started, first download the dataset [here](http://cvgl.stanford.edu/projects/roboturk/RoboTurkPilot.zip) (~9 GB download), and unzip the file, resulting in a `RoboTurkPilot` folder. This folder has subdirectories corresponding to each task, each with a raw hdf5 file. You can convert the demonstrations using a command like the one below. - -```sh -# convert the Can demonstrations, and also create a "fastest_225" filter_key (prior work such as IRIS has trained on this subset) -$ python conversion/convert_roboturk_pilot.py --folder /path/to/RoboTurkPilot/bins-Can --n 225 -``` - -Next, make sure that you're on the [roboturk_v1](https://github.com/ARISE-Initiative/robosuite/tree/roboturk_v1) branch of robosuite, which is a modified version of v0.3. **You should always be on the roboturk_v1 branch when using these datasets.** Finally, follow the instructions in the above "Extracting Observations from MuJoCo states" section to extract observations from the raw converted `demo.hdf5` file, in order to produce an hdf5 ready for training. \ No newline at end of file diff --git a/docs/introduction/examples.md b/docs/introduction/examples.md deleted file mode 100644 index 779c62d0..00000000 --- a/docs/introduction/examples.md +++ /dev/null @@ -1,162 +0,0 @@ -# Working with robomimic Modules - -This section discusses some simple examples packaged with the repository (in the top-level `examples` folder) that provide a more thorough understanding of components used in the repository. These examples are meant to assist users who may want to build on these components, or use these components in other applications, in contrast to the [Getting Started](./quickstart.html) section, which provides examples relevant to using the repository as-is. - -## Train Loop Example - -We include a simple example script in `examples/simple_train_loop.py` to show how easy it is to use our `SequenceDataset` class and standardized hdf5 datasets in a general torch training loop. Run the example using the command below. - -```sh -$ python examples/simple_train_loop.py -``` - -Modifying this example for use in other code repositories is simple. First, create the dataset loader as in the script. - -```python -from robomimic.utils.dataset import SequenceDataset - -def get_data_loader(dataset_path): - """ - Get a data loader to sample batches of data. 
- """ - dataset = SequenceDataset( - hdf5_path=dataset_path, - obs_keys=( # observations we want to appear in batches - "robot0_eef_pos", - "robot0_eef_quat", - "robot0_gripper_qpos", - "object", - ), - dataset_keys=( # can optionally specify more keys here if they should appear in batches - "actions", - "rewards", - "dones", - ), - load_next_obs=True, - frame_stack=1, - seq_length=10, # length-10 temporal sequences - pad_frame_stack=True, - pad_seq_length=True, # pad last obs per trajectory to ensure all sequences are sampled - get_pad_mask=False, - goal_mode=None, - hdf5_cache_mode="all", # cache dataset in memory to avoid repeated file i/o - hdf5_use_swmr=True, - hdf5_normalize_obs=False, - filter_by_attribute=None, # can optionally provide a filter key here - ) - print("\n============= Created Dataset =============") - print(dataset) - print("") - - data_loader = DataLoader( - dataset=dataset, - sampler=None, # no custom sampling logic (uniform sampling) - batch_size=100, # batches of size 100 - shuffle=True, - num_workers=0, - drop_last=True # don't provide last batch in dataset pass if it's less than 100 in size - ) - return data_loader - -data_loader = get_data_loader(dataset_path="/path/to/your/dataset.hdf5") -``` - -Then, construct your model, and use the same pattern as in the `run_train_loop` function in the script, to iterate over batches to train the model. - -```python -for epoch in range(1, num_epochs + 1): - - # iterator for data_loader - it yields batches - data_loader_iter = iter(data_loader) - - for train_step in range(gradient_steps_per_epoch): - # load next batch from data loader - try: - batch = next(data_loader_iter) - except StopIteration: - # data loader ran out of batches - reset and yield first batch - data_loader_iter = iter(data_loader) - batch = next(data_loader_iter) - - # @batch is a dictionary with keys loaded from the dataset. - # Train your model on the batch below. - -``` - - - -## Config Example - -The simple config example script at `examples/simple_config.py` shows how the `Config` object can easily be instantiated and modified safely with different levels of locking. We reproduce certain portions of the script. First, we can create a `Config` object and call `lock` when we think we won't need to change it anymore. - -```python -from robomimic.config.base_config import Config - -# create config -config = Config() - -# add nested attributes to the config -config.train.batch_size = 100 -config.train.learning_rate = 1e-3 -config.algo.actor_network_size = [1000, 1000] -config.lock() # prevent accidental changes -``` - -Now, when we try to add a new key (or modify the value of an existing key), the config will throw an error. - -```python -# the config is locked --- cannot add new keys or modify existing keys -try: - config.train.optimizer = "Adam" -except RuntimeError as e: - print(e) -``` - -However, the config can be safely modified using appropriate contexts. - -```python -# values_unlocked scope allows modifying values of existing keys, but not adding keys -with config.values_unlocked(): - config.train.batch_size = 200 -print("batch_size={}".format(config.train.batch_size)) - -# unlock config within the scope, allowing new keys to be inserted -with config.unlocked(): - config.test.num_eval = 10 - -# verify that the config remains locked outside of the scope -assert config.is_locked -assert config.test.is_locked -``` - -Finally, the config can also be updated by using external dictionaries - this is helpful for loading config jsons. 
- -```python -# update this config with external config from a dict -ext_config = { - "train": { - "learning_rate": 1e-3 - }, - "algo": { - "actor_network_size": [1000, 1000] - } -} -with config.values_unlocked(): - config.update(ext_config) - -print(config) -``` - -Please see the [Config documentation](../modules/configs.html) for more information on Config objects. - - - -## Observation Networks Example - -The example script in `examples/simple_obs_net.py` discusses how to construct networks for taking observation dictionaries as input, and that produce dictionaries as outputs. See [this section](../modules/models.html#observation-encoder-and-decoder) in the documentation for more details. - - - -## Custom Observation Modalities Example - -The example script in `examples/add_new_modality.py` discusses how to (a) modify pre-existing observation modalities, and (b) add your own custom observation modalities with custom encoding. See [this section](../modules/models.html#observation-encoder-and-decoder) in the documentation for more details about the encoding and decoding process. \ No newline at end of file diff --git a/docs/introduction/features.md b/docs/introduction/features.md deleted file mode 100644 index 3aa8248f..00000000 --- a/docs/introduction/features.md +++ /dev/null @@ -1,41 +0,0 @@ -# Features Overview - -## Summary - -In this section, we briefly summarize some key features and where you should look to learn more about them. - -1. **Datasets supported by robomimic** - - See a list of supported datasets [here](./features.html#supported-datasets).

-2. **Visualizing datasets** - - Learn how to visualize dataset trajectories [here](./datasets.html#view-dataset-structure-and-videos).

-3. **Reproducing paper experiments** - - Easily reproduce experiments from the following papers - - robomimic: [here](./results.html) - - MOMART: [here](https://sites.google.com/view/il-for-mm/datasets)

-4. **Making your own dataset** - - Learn how to make your own collected dataset compatible with this repository [here](./datasets.html#dataset-structure). - - Note that **all datasets collected through robosuite are also readily compatible** (see [here](./datasets.html#converting-robosuite-hdf5-datasets)).

-5. **Using filter keys to easily train on subsets of a dataset** - - See [this section](./datasets.html#filter-keys-and-train-valid-splits) on how to use filter keys.

-6. **Running hyperparameter scans easily** - - See [this guide](./advanced.html#using-the-hyperparameter-helper-to-launch-runs) on running hyperparameter scans.

-7. **Using pretrained models in the model zoo** - - See [this link](./model_zoo.html) to download and use pretrained models.

-8. **Getting familiar with configs** - - Learn about how configs work [here](../modules/configs.html).

-9. **Getting familiar with operations over tensor collections** - - Learn about using useful tensor utilities [here](../modules/utils.html#tensorutils).

-10. **Creating your own observation modalities** - - Learn how to make your own observation modalities and process them with custom network architectures [here](../modules/observations.html).

-11. **Creating your own algorithm** - - Learn how to implement your own learning algorithm [here](../modules/algorithms.html#building-your-own-algorithm).

- -## Supported Datasets - -This is a list of datasets that we currently support, along with links on how to work with them. This list will be expanded as more datasets are made compatible with robomimic. - -- [robomimic](./results.html#downloading-released-datasets) -- [robosuite](./datasets.html#converting-robosuite-hdf5-datasets) -- [MOMART](./datasets.html#momart-datasets) -- [D4RL](./results.html#d4rl) -- [RoboTurk Pilot](./datasets.html#roboturk-pilot-datasets) \ No newline at end of file diff --git a/docs/introduction/getting_started.md b/docs/introduction/getting_started.md new file mode 100644 index 00000000..0279cc6b --- /dev/null +++ b/docs/introduction/getting_started.md @@ -0,0 +1,93 @@ +# Getting Started + +## Running experiments +We begin with a quick tutorial on downloading datasets and running experiments. + +Before beginning, make sure you are at the base repo path: +```sh +$ cd {/path/to/robomimic} +``` + +### Step 1: Download dataset + +Download the robosuite **Lift (PH)** dataset (see [this link](../datasets/robomimic_v0.1.html#proficient-human-ph) for more information on this dataset): +```sh +$ python robomimic/scripts/download_datasets.py --tasks lift --dataset_types ph +``` + +The dataset can be found at `datasets/lift/ph/low_dim.hdf5` + +### Step 2: Launch experiment + +Now, we will run an experiment using `train.py`. In this case we would like to run behavior cloning (BC) for the lift dataset we just downloaded. + +```sh +$ python robomimic/scripts/train.py --config robomimic/exps/templates/bc.json --dataset datasets/lift/ph/low_dim.hdf5 --debug +``` + +
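The same run can also be launched from Python instead of the command line -- a minimal sketch, mirroring the pattern used by `examples/train_bc_rnn.py` (the specific config attributes overridden here are just illustrative choices):

```python
import robomimic.utils.torch_utils as TorchUtils
from robomimic.config import config_factory
from robomimic.scripts.train import train

# make a default BC config and point it at the downloaded dataset
config = config_factory(algo_name="bc")
config.train.data = "datasets/lift/ph/low_dim.hdf5"
config.train.output_dir = "bc_trained_models"
config.experiment.name = "bc_lift_ph_example"

# launch training
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
train(config, device=device)
```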
+

Running quick sanity check experiments

+ +Make sure to add the `--debug` flag to your experiments as a sanity check that your implementation works. + +
+ +
+

Warning!

+ +This example [requires robosuite](./installation.html#robosuite) to be installed (under the `offline_study` branch), but it can be run without robosuite by disabling rollouts in `robomimic/exps/templates/bc.json`: simply change the `experiment.rollout.enabled` flag to `false`. + +
+ +### Step 3: View experiment results + +After the script finishes, we can check the training outputs in the directory `bc_trained_models/test`. +Experiment outputs comprise the following: +``` +config.json # config used for this experiment +logs/ # experiment log files + log.txt # terminal output + tb/ # tensorboard logs +videos/ # videos of robot rollouts during training +models/ # saved model checkpoints +``` + +The experiment results can be viewed using tensorboard: +```sh +$ tensorboard --logdir bc_trained_models/test --bind_all +``` + +## Next steps + +Please refer to the remaining documentation sections. Some helpful suggestions on pages to view next: +- [Configuring and Launching Training Runs](../tutorials/configs.html) +- [Logging and Viewing Training Results](../tutorials/viewing_results.html) +- [Running Hyperparameter Scans](../tutorials/hyperparam_scan.html) +- [Overview of Datasets](../datasets/overview.html) +- [Dataset Contents and Visualization](../tutorials/dataset_contents.html) +- [Overview of Modules](../modules/overview.html) \ No newline at end of file diff --git a/docs/introduction/implemented_algorithms.md b/docs/introduction/implemented_algorithms.md new file mode 100644 index 00000000..60cd04d5 --- /dev/null +++ b/docs/introduction/implemented_algorithms.md @@ -0,0 +1,34 @@ +# Implemented Algorithms + +**robomimic** includes several high-quality implementations of offline learning algorithms, and offers tools to easily build [your own learning algorithms](../tutorials/custom_algorithms.html). +## Imitation Learning + +### BC + +- Vanilla Behavioral Cloning (see [this paper](https://papers.nips.cc/paper/1988/file/812b4ba287f5ee0bc9d43bbf5bbe87fb-Paper.pdf)), consisting of simple supervised regression from observations to actions. Implemented in the `BC` class in `algo/bc.py`, along with some variants such as `BC_GMM` (stochastic GMM policy) and `BC_VAE` (stochastic VAE policy) + +### BC-RNN + +- Behavioral Cloning with an RNN network. Implemented in the `BC_RNN` and `BC_RNN_GMM` (recurrent GMM policy) classes in `algo/bc.py`. + +### HBC + +- Hierarchical Behavioral Cloning - the implementation is largely based off of [this paper](https://arxiv.org/abs/2003.06085). Implemented in the `HBC` class in `algo/hbc.py`. + +## Offline Reinforcement Learning + +### IRIS + +- A recent batch offline RL algorithm from [this paper](https://arxiv.org/abs/1911.05321). Implemented in the `IRIS` class in `algo/iris.py`. + +### BCQ + +- A recent batch offline RL algorithm from [this paper](https://arxiv.org/abs/1812.02900). Implemented in the `BCQ` class in `algo/bcq.py`. + +### CQL + +- A recent batch offline RL algorithm from [this paper](https://arxiv.org/abs/2006.04779). Implemented in the `CQL` class in `algo/cql.py`. + +### TD3-BC + +- A recent algorithm from [this paper](https://arxiv.org/abs/2106.06860). We implemented it as an example (see section below on building your own algorithm). Implemented in the `TD3_BC` class in `algo/td3_bc.py`. diff --git a/docs/introduction/installation.md b/docs/introduction/installation.md index 5805f31b..3480a900 100644 --- a/docs/introduction/installation.md +++ b/docs/introduction/installation.md @@ -1,100 +1,179 @@ # Installation -**robomimic** officially supports Mac OS X and Linux on Python 3. We strongly recommend using a virtual environment with [conda](https://www.anaconda.com/products/individual) ([virtualenv](https://virtualenv.pypa.io/en/latest/) is also an acceptable alternative). 
To get started, create a virtual env (we use conda in our examples below). +## Requirements + +- Mac OS X or Linux machine +- Python >= 3.6 (recommended 3.7.9) +- [conda](https://www.anaconda.com/products/individual) + - [virtualenv](https://virtualenv.pypa.io/en/latest/) is also an acceptable alternative, but we assume you have conda installed in our examples below + +## Install robomimic + +
+

1. Create and activate conda environment

```sh -# create a python 3.7 virtual environment $ conda create -n robomimic_venv python=3.7.9 -# activate virtual env $ conda activate robomimic_venv ``` -Next, install [PyTorch](https://pytorch.org/) (in our example below, we chose to use version `1.6.0` with CUDA `10.2`). You can omit the `cudatoolkit=10.2` if you're on a machine without a CUDA-capable GPU (such as a Macbook). +
+ +
+

2. Install PyTorch

+ +[PyTorch](https://pytorch.org/) reference + +
+ Option 1: Mac +

+ +```sh +# Can change pytorch, torchvision versions +# We don't install cudatoolkit since Mac does not have NVIDIA GPU +$ conda install pytorch==1.6.0 torchvision==0.7.0 -c pytorch +``` + +

+
+ +
+ Option 2: Linux +

```sh -# install pytorch with specific version of cuda +# Can change pytorch, torchvision versions $ conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch ``` -Next, we'll install the repository and its requirements. We provide two options - installing from source, and installing from pip. **We strongly recommend installing from source**, as it allows greater flexibility and easier access to scripts and examples. +
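After installing PyTorch (via either option), it may be worth quickly confirming the install from the activated environment -- a minimal check:

```python
import torch

print(torch.__version__)          # e.g. 1.6.0
print(torch.cuda.is_available())  # expect True on a CUDA-capable Linux machine, False on Mac
```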

+
+ +
+ -## Install from source (preferred) +
+

3. Install robomimic

-First, clone the repository from github. +
+ Option 1: Install from source (recommended) +

```sh -# clone the repository +$ cd $ git clone https://github.com/ARISE-Initiative/robomimic.git $ cd robomimic +$ pip install -e . ``` -Next, install the repository in editable mode with pip. +

+
+ +
+ Option 2: Install via pip +

```sh -# install such that changes to source code will be reflected directly in the installation -$ pip install -e . +$ pip install robomimic ``` -To run a quick test, without any dependence on simulators, run the following example +

+
-```sh -$ python examples/simple_train_loop.py -``` +
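Either way, a quick smoke test is to import the freshly installed package from the environment (printing `__version__` assumes the attribute is defined in `robomimic/__init__.py`):

```python
import robomimic

# confirm the package is importable and check which version was installed
print(robomimic.__version__)
```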
+ +
+

Warning! Additional dependencies might be required

-For maximum functionality though, we also recommend installing [robosuite](https://robosuite.ai/) -- see the section on simulators below. +This is all you need for using the suite of algorithms and utilities packaged with robomimic. However, to use our demonstration datasets, you may need additional dependencies. Please see the [datasets page](../datasets/overview.html) for more information on downloading datasets and reproducing experiments, and see [the simulators section below](installation.html#install-simulators). +
-## Install from pip -While not preferred, the repository can also be installed directly via pip. +# Optional Installations + +## Downloading datasets and reproducing experiments + +See the [datasets page](../datasets/overview.html) for more information on downloading datasets and reproducing experiments. + +## Install simulators + +If you would like to run robomimic examples and work with released datasets, please install the following simulators: + +
+ robosuite +

+ Required for running most robomimic examples and released datasets. Compatible with robosuite v1.2+. Install via: ```sh -$ pip install robomimic +# From source (recommended) +$ cd +$ git clone https://github.com/ARISE-Initiative/robosuite.git +$ cd robosuite +$ pip install -r requirements.txt +OR +# Via pip +$ pip install robosuite ``` -## Install simulators +**(Optional)** to use our released datasets and reproduce our experiments, switch to our `offline_study` branch (requires installing robosuite from source): + +```sh +git checkout offline_study +``` -While the **robomimic** repository does not depend on particular simulators, installing the following simulators is strongly encouraged, in order to run the examples provided with the repository and work with released datasets. +

+

mujoco-py dependency!

-### Robosuite +Robosuite requires [mujoco-py](https://github.com/openai/mujoco-py). If you are on an Ubuntu machine with a GPU, you should make sure that the `GPU` version of `mujoco-py` gets built, so that image rendering is fast (crucial for working with image datasets!). -Most of our examples and released datasets use [robosuite](https://robosuite.ai/), so we strongly recommend installing it. Install it using [the instructions here](https://robosuite.ai/docs/installation.html), and once again, we recommend installing from source. While the repository is compatible with robosuite `v1.2+`, switch to the `offline_study` branch (by running `git checkout offline_study` in the `robosuite` root directory) in order to easily work with our released datasets and reproduce our experiments. +An easy way to ensure this is to clone the repository, change [this line](https://github.com/openai/mujoco-py/blob/4830435a169c1f3e3b5f9b58a7c3d9c39bdf4acb/mujoco_py/builder.py#L74) to `Builder = LinuxGPUExtensionBuilder`, and install from source by running `pip install -e .` in the `mujoco-py` root directory. -**Note:** robosuite has a dependency on [mujoco-py](https://github.com/openai/mujoco-py). If you are on an Ubuntu machine with a GPU, you should make sure that the `GPU` version of `mujoco-py` gets built, so that image rendering is fast (this is extremely important for working with image datasets). An easy way to ensure this is to clone the repository, change [this line](https://github.com/openai/mujoco-py/blob/4830435a169c1f3e3b5f9b58a7c3d9c39bdf4acb/mujoco_py/builder.py#L74) to `Builder = LinuxGPUExtensionBuilder`, and install from source by running `pip install -e .` in the `mujoco-py` root directory. +
-### D4RL +

+
-We also have examples to run some of our algorithms on the [D4RL](https://arxiv.org/abs/2004.07219) datasets. Follow the instructions [here](https://github.com/rail-berkeley/d4rl) to install them, in order to reproduce our results or run further evaluations on these datasets. -## Test your installation +
+ D4RL +

+ +Useful for running some of our algorithms on the [D4RL](https://arxiv.org/abs/2004.07219) datasets. + +Install via the instructions [here](https://github.com/rail-berkeley/d4rl). + +

+
-To run a quick test, run the following script (see the [Getting Started](./quickstart.html#run-a-quick-example) section for more information). +## Test your installation +This assumes you have installed robomimic from source. + +Run a quick debugging (dummy) training loop to make sure robomimic is installed correctly: ```sh +$ cd $ python examples/train_bc_rnn.py --debug ``` -To run a much more thorough test of several algorithms and scripts, navigate to the `tests` directory and run the following command. **Warning: this script may take several minutes to finish.** - +Run a much more thorough test of several algorithms and scripts (**Warning: this script may take several minutes to finish!**): ```sh +$ cd /tests $ bash test.sh ``` -## Installing released datasets +To run some easy examples, see the [Getting Started](./getting_started.html) section. -To download and get started with the suite of released datasets, please see [this section](./results.html#downloading-released-datasets). +## Install documentation dependencies -## Installation for generating docs - -If you plan to contribute to the repository and add new features, you may want to install additional requirements required to build the documentation locally (in case the docs need to be updated). +If you plan to contribute to the repository and add new features, you must install the additional requirements required to build the documentation locally: ```sh $ pip install -r requirements-docs.txt ``` -Then, you can test generating the documentation and viewing it locally in a web browser. Run the following commands to generate documentation. - +You can test generating the documentation and viewing it locally in a web browser: ```sh -$ cd docs/ +$ cd /docs $ make clean $ make apidoc $ make html diff --git a/docs/introduction/overview.md b/docs/introduction/overview.md index cb257b83..cd14129b 100644 --- a/docs/introduction/overview.md +++ b/docs/introduction/overview.md @@ -11,43 +11,141 @@

-**robomimic** is a framework for robot learning from demonstration. It offers a broad set of demonstration datasets collected on robot manipulation domains, and learning algorithms to learn from these datasets. This project is part of the broader [Advancing Robot Intelligence through Simulated Environments (ARISE) Initiative](https://github.com/ARISE-Initiative), with the aim of lowering the barriers of entry for cutting-edge research at the intersection of AI and Robotics. +**robomimic** is a framework for robot learning from demonstration. +It offers a broad set of demonstration datasets collected on robot manipulation domains and offline learning algorithms to learn from these datasets. +**robomimic** aims to make robot learning broadly *accessible* and *reproducible*, allowing researchers and practitioners to benchmark tasks and algorithms fairly and to develop the next generation of robot learning algorithms. -Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods make assessing the state of the field difficult. The overarching goal of **robomimic** is to provide researchers and practitioners with: +## Core Features -- a **standardized set of large demonstration datasets** across several benchmarking tasks to facilitate fair comparisons, with a focus on learning from human-provided demonstrations (see [this link](./features.html#supported-datasets) for a list of supported datasets) -- **high-quality implementations of several learning algorithms** for training closed-loop policies from offline datasets to make reproducing results easy and lower the barrier to entry -- a **modular design** that offers great flexibility in extending algorithms and designing new algorithms -This release of **robomimic** contains seven offline learning [algorithms](../modules/algorithms.html) and standardized [datasets](./results.html) collected across five simulated and three real-world multi-stage manipulation tasks of varying complexity. We highlight some features below (for a more thorough list of features, see [this link](./features.html#features-overview)): + -- **standardized datasets:** a set of datasets collected from different sources (single proficient human, multiple humans, and machine-generated) across several simulated and real-world tasks, along with a plug-and-play [Dataset class](../modules/datasets.html) to easily use the datasets outside of this project -- **algorithm implementations:** several high-quality implementations of offline learning algorithms, including BC, BC-RNN, HBC, IRIS, BCQ, CQL, and TD3-BC -- **multiple observation spaces:** support for learning both low-dimensional and visuomotor policies, with support for observation tensor dictionaries throughout the codebase, making it easy to specify different subsets of observations to train a policy. This includes a set of useful tensor utilities to work with nested dictionaries of torch Tensors and numpy arrays. -- **visualization utilities:** utilities for visualizing demonstration data, playing back actions, visualizing trained policies, and collecting new datasets using trained policies -- **train launching utilities:** utilities for easily running hyperparameter sweeps, enabled by a flexible [Config](../modules/configs.html) management system +

+ +

-## Contributing to robomimic + + + + +## Reproducing benchmarks + +The robomimic framework also makes reproducing the results from different benchmarks and datasets easy. See the [datasets page](../datasets/overview.html) for more information on downloading datasets and reproducing experiments. ## Troubleshooting Please see the [troubleshooting](../miscellaneous/troubleshooting.html) section for common fixes, or [submit an issue](https://github.com/ARISE-Initiative/robomimic/issues) on our github page. -## Reproducing study results - -The **robomimic** framework also makes reproducing the results from this [study](https://arise-initiative.github.io/robomimic-web/study.) easy. See the [reproducing results documentation](./results.html) for more information. +## Contributing to robomimic +This project is part of the broader [Advancing Robot Intelligence through Simulated Environments (ARISE) Initiative](https://github.com/ARISE-Initiative), with the aim of lowering the barriers of entry for cutting-edge research at the intersection of AI and Robotics. +The project originally began development in late 2018 by researchers in the [Stanford Vision and Learning Lab](http://svl.stanford.edu/) (SVL). +Now it is actively maintained and used for robotics research projects across multiple labs. +We welcome community contributions to this project. +For details please check our [contributing guidelines](../miscellaneous/contributing.html). -## Citations +## Citation Please cite [this paper](https://arxiv.org/abs/2108.03298) if you use this framework in your work: -``` +```bibtex @inproceedings{robomimic2021, title={What Matters in Learning from Offline Human Demonstrations for Robot Manipulation}, author={Ajay Mandlekar and Danfei Xu and Josiah Wong and Soroush Nasiriany and Chen Wang and Rohun Kulkarni and Li Fei-Fei and Silvio Savarese and Yuke Zhu and Roberto Mart\'{i}n-Mart\'{i}n}, - booktitle={arXiv preprint arXiv:2108.03298}, + booktitle={Conference on Robot Learning (CoRL)}, year={2021} } ``` \ No newline at end of file diff --git a/docs/introduction/quickstart.md b/docs/introduction/quickstart.md deleted file mode 100644 index 3ac9efd3..00000000 --- a/docs/introduction/quickstart.md +++ /dev/null @@ -1,110 +0,0 @@ -# Getting Started - -This section discusses how to get started with the robomimic repository, by providing examples of how to train and evaluate models. - -## Training Models - -This section discusses how models can be trained. - -**Note:** These examples [require robosuite](./installation.html#robosuite) to be installed, but they can run without robosuite by disabling rollouts in `robomimic/configs/base_config.py`, `robomimic/exps/templates/bc.json`, and `examples/train_bc_rnn.py`. - -### Run a quick example - -To see a quick example of a training run, along with the outputs, run the `train_bc_rnn.py` script in the `examples` folder (the `--debug` flag is used to ensure the training run only takes a few seconds). - -```sh -$ python train_bc_rnn.py --debug -``` - -The default dataset used is the one in `tests/assets/test.hdf5` and the default directory where results are saved for the example training run is in `tests/tmp_model_dir`. Both can be overridden by passing arguments to the above script. - -**Warning:** If you are using the default dataset (and rollouts are enabled), please make sure that robosuite is on the `offline_study` branch of robosuite. 
- -After the script finishes, you can check the training outputs in the output directory (`tests/tmp_model_dir/bc_rnn_example` by default). See the "Viewing Training Results" section below for more information on interpreting the output. - -### Ways to launch training runs - -In this section, we describe the different ways to launch training runs. - -#### Using a config json (preferred) - -One way is to use the `train.py` script, and pass a config json via the `--config` argument. The dataset can be specified by setting the `data` attribute of the `train` section of the config json, or specified via the `--dataset` argument. The example below runs a default template json for the BC algorithm. **This is the preferred way to launch training runs.** - -```sh -$ python train.py --config ../exps/templates/bc.json --dataset ../../tests/assets/test.hdf5 -``` - -Please see the [hyperparameter helper docs](./advanced.html#using-the-hyperparameter-helper-to-launch-runs) to see how to easily generate json configs for launching training runs. - -#### Constructing a config object in code - -Another way to launch a training run is to make a default config (with a line like `config = config_factory(algo_name="bc")`), modify the config in python code, and then call the train function, like in the `examples/train_bc_rnn.py` script. - -```python -import robomimic -import robomimic.utils.torch_utils as TorchUtils -from robomimic.config import config_factory -from robomimic.scripts.train import train - -# make default BC config -config = config_factory(algo_name="bc") - -# set config attributes here that you would like to update -config.experiment.name = "bc_rnn_example" -config.train.data = "/path/to/dataset.hdf5" -config.train.output_dir = "/path/to/desired/output_dir" -config.train.batch_size = 256 -config.train.num_epochs = 500 -config.algo.gmm.enabled = False - -# get torch device -device = TorchUtils.get_torch_device(try_to_use_cuda=True) - -# launch training run -train(config, device=device) -``` - -#### Directly modifying the config class source code (avoid this) - -Technically, a third way to launch a training run is to directly modify the relevant `Config` classes (such as `config/bc_config.py` and `config/base_config.py`) and then run `train.py` but **this is not recommended**, especially if using the codebase with version control (e.g. git). Modifying these files modifies the default settings, and it's easy to forget that these changes were made, or unintentionally commit these changes so that they become the new defaults. For this reason, **we recommend never modifying the config classes directly, unless you are modifying an algorithm and adding new config keys**. - -To learn more about the `Config` class, read the [Configs documentation](../modules/configs.html), or look at the source code. - - -## Viewing Training Results - -This section discusses how to view and interpret the results of training runs. - -### Logs, Models, and Rollout Videos - -Training runs will output results to the directory specified by `config.train.output_dir`, under a folder with the experiment name (specified by `config.experiment.name`). This folder contains a directory named by a timestamp (e.g. `20210708174935`) for every training run with this same name, and within that directory, there should be three folders - `logs`, `models`, and `videos`. 
- -The `logs` directory will contain everything printed to stdout in `log.txt` (only if `config.experiment.logging.terminal_output_to_txt` is set to `True`), and a `tb` folder containing tensorboard logs (only if `config.experiment.logging.log_tb` is set to `True`). You can visualize the tensorboard results by using a command like the one below, and then opening the link printed on the terminal in a web browser. The tensorboard logs have convenient sections for rollout evaluations, quantities logged during training, quantities logged during validation, and timing statistics for different parts of the training process (in minutes). - -```sh -$ tensorboard --logdir /path/to/output/dir --bind_all -``` - -The `models` directory contains saved model checkpoints. These can be used by the `run_trained_agent.py` script (more on this below). The `config.experiment.save` portion of the config controls if and when models are saved during training. - -The `videos` directory contains evaluation rollout videos collected during training, when evaluating trained models in the environment (only if `config.experiment.render_video` is set to `True`). The `config.experiment.rollout` portion of the config controls how often rollouts happen, and how many happen. - -### Evaluating Trained Policies - -Saved policy checkpoints in the `models` directory can be evaluated using the `run_trained_agent.py` script. The example below can be used to evaluate a policy with 50 rollouts of maximum horizon 400 and save the rollouts to a video. The agentview and wrist camera images are used to render video frames. - -```sh -$ python run_trained_agent.py --agent /path/to/model.pth --n_rollouts 50 --horizon 400 --seed 0 --video_path /path/to/output.mp4 --camera_names agentview robot0_eye_in_hand -``` - -The 50 agent rollouts can also be written to a new dataset hdf5. - -```sh -python run_trained_agent.py --agent /path/to/model.pth --n_rollouts 50 --horizon 400 --seed 0 --dataset_path /path/to/output.hdf5 --dataset_obs -``` - -The observations, which can consist of high-dimensional images, can be excluded by omitting the `--dataset_obs` flag; they can later be extracted from the saved simulator states using the `dataset_states_to_obs.py` script (see the Datasets documentation for more information on this). - -```sh -python run_trained_agent.py --agent /path/to/model.pth --n_rollouts 50 --horizon 400 --seed 0 --dataset_path /path/to/output.hdf5 -``` diff --git a/docs/miscellaneous/contributing.md b/docs/miscellaneous/contributing.md index aef40460..b412b3b2 100644 --- a/docs/miscellaneous/contributing.md +++ b/docs/miscellaneous/contributing.md @@ -1,7 +1,5 @@ # Contributing Guidelines -We are so happy to see you reading this page! - Our team wholeheartedly welcomes the community to contribute to robomimic. Contributions from members of the community will help ensure the long-term success of this project. Before you plan to make contributions, here are important resources to get started with: - Read the robomimic [documentation](https://arise-initiative.github.io/robomimic-web/docs/overview.html) and [paper](https://arxiv.org/abs/2108.03298) @@ -41,7 +39,7 @@ We value readability and adhere to the following coding conventions: We also list additional suggested contributing guidelines that we adhered to during development. - -- When creating new networks (e.g. 
subclasses of `Module` in `models/base_nets.py`), always sub-modules into a property called `self.nets`, and if there is more than one sub-module, make it a module collection (such as a `torch.nn.ModuleDict`). This is to ensure that the pattern `model.to(device)` works as expected with multiple levels of nested torch modules. As an example of nesting, see the `_create_networks` function in the `VAE` class (`models/vae_nets.py`) and the `MIMO_MLP` class (`models/obs_nets.py`). +- When creating new networks (e.g. subclasses of `Module` in `models/base_nets.py`), always put sub-modules into a property called `self.nets`, and if there is more than one sub-module, make it a module collection (such as a `torch.nn.ModuleDict`). This is to ensure that the pattern `model.to(device)` works as expected with multiple levels of nested torch modules. As an example of nesting, see the `_create_networks` function in the `VAE` class (`models/vae_nets.py`) and the `MIMO_MLP` class (`models/obs_nets.py`). - Do not use default mutable arguments -- they can lead to terrible bugs and unexpected behavior (see [this link](https://florimond.dev/blog/articles/2018/08/python-mutable-defaults-are-the-source-of-all-evil/) for more information). For this reason, in functions that expect optional dictionaries and lists (for example, the `core_kwargs` argument in the `obs_encoder_factory` function, or the `layer_dims` argument in the `MLP` class constructor), we use a default argument of `core_kwargs=None` or an empty tuple (since tuples are immutable) `layer_dims=()`. diff --git a/docs/miscellaneous/references.md b/docs/miscellaneous/references.md index 325c1da9..4b654153 100644 --- a/docs/miscellaneous/references.md +++ b/docs/miscellaneous/references.md @@ -2,15 +2,20 @@ A list of projects and papers that use **robomimic**. If you would like to add your work to this list, please send the paper or project information to Ajay Mandlekar (). 
-## Reinforcement Learning +## 2022 -- [Deep Affordance Foresight: Planning Through What Can Be Done in the Future](https://arxiv.org/abs/2011.08424) Danfei Xu, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Silvio Savarese, Li Fei-Fei +- [Imitation Learning by Estimating Expertise of Demonstrators](https://arxiv.org/abs/2202.01288) Mark Beliaev, Andy Shih, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani -## Imitation Learning and Batch (Offline) Reinforcement Learning +## 2021 +- [RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning](https://arxiv.org/abs/2111.02767) Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev - [Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation](https://arxiv.org/abs/2112.05251) Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín - [Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control](https://arxiv.org/abs/2103.00375) Chen Wang, Rui Wang, Danfei Xu, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese -- [Human-in-the-Loop Imitation Learning using Remote Teleoperation](https://arxiv.org/abs/2012.06733) Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese +- [Deep Affordance Foresight: Planning Through What Can Be Done in the Future](https://arxiv.org/abs/2011.08424) Danfei Xu, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Silvio Savarese, Li Fei-Fei - [Learning Multi-Arm Manipulation Through Collaborative Teleoperation](https://arxiv.org/abs/2012.06738) Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese - [Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations](https://arxiv.org/abs/2003.06085) Ajay Mandlekar\*, Danfei Xu\*, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei -- [IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data](https://arxiv.org/abs/1911.05321). Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox \ No newline at end of file + +## 2020 + +- [Human-in-the-Loop Imitation Learning using Remote Teleoperation](https://arxiv.org/abs/2012.06733) Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese +- [IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data](https://arxiv.org/abs/1911.05321). Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox diff --git a/docs/introduction/model_zoo.md b/docs/model_zoo/robomimic_v0.1.md similarity index 77% rename from docs/introduction/model_zoo.md rename to docs/model_zoo/robomimic_v0.1.md index 39ce9533..8021a094 100644 --- a/docs/introduction/model_zoo.md +++ b/docs/model_zoo/robomimic_v0.1.md @@ -1,8 +1,20 @@ -# Using the Model Zoo +# robomimic-v0.1 -This section provides several proficient trained policy models that can be downloaded and used as-is. See the ["Evaluating Trained Policies"](./quickstart.html#evaluating-trained-policies) section for instructions on loading these agents. The model zoo will be updated over time to include more tasks and policies. All success rates listed below are approximate - they may vary. 
+We provide links below to several pretrained models that were trained with robomimic-v0.1 for our CoRL 2021 study. All success rates listed below are approximate - they may vary. -**Warning:** When using these trained models, please make sure that robosuite is on the `offline_study` branch of robosuite. +
+

Note: see tutorial on using these models

+ +See the ["Using Pretrained Models"](../tutorials/using_pretrained_models.html) tutorial for instructions on using these models. + +
+ +
+

Warning: use correct robosuite branch!

+ +When using these trained models, please make sure that robosuite is on the `offline_study` branch of robosuite. + +
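As a rough sketch of what loading one of these checkpoints looks like in code (this assumes the `policy_from_checkpoint` utility in `robomimic.utils.file_utils`; see the tutorial linked above for the complete workflow, including rollouts):

```python
import robomimic.utils.file_utils as FileUtils
import robomimic.utils.torch_utils as TorchUtils

device = TorchUtils.get_torch_device(try_to_use_cuda=True)

# path to a downloaded checkpoint from the tables below (hypothetical path)
ckpt_path = "/path/to/model.pth"
policy, ckpt_dict = FileUtils.policy_from_checkpoint(ckpt_path=ckpt_path, device=device, verbose=True)

# at rollout time the policy maps an observation dict to an action, e.g.:
# policy.start_episode()
# action = policy(ob=obs_dict)
```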
## Proficient-Human (PH) diff --git a/docs/modules/algorithms.md b/docs/modules/algorithms.md index 8fe954ac..dd0bfd4d 100644 --- a/docs/modules/algorithms.md +++ b/docs/modules/algorithms.md @@ -181,396 +181,10 @@ for step_i in range(horizon): ## Implemented Algorithms -### BC - -- Vanilla Behavioral Cloning (see [this paper](https://papers.nips.cc/paper/1988/file/812b4ba287f5ee0bc9d43bbf5bbe87fb-Paper.pdf)), consisting of simple supervised regression from observations to actions. Implemented in the `BC` class in `algo/bc.py`, along with some variants such as `BC_GMM` (stochastic GMM policy) and `BC_VAE` (stochastic VAE policy) - -### BC-RNN - -- Behavioral Cloning with an RNN network. Implemented in the `BC_RNN` and `BC_RNN_GMM` (recurrent GMM policy) classes in `algo/bc.py`. - -### HBC - -- Hierarchical Behavioral Cloning - the implementation is largely based off of [this paper](https://arxiv.org/abs/2003.06085). Implemented in the `HBC` class in `algo/hbc.py`. - -### IRIS - -- A recent batch offline RL algorithm from [this paper](https://arxiv.org/abs/1911.05321). Implemented in the `IRIS` class in `algo/iris.py`. - -### BCQ - -- A recent batch offline RL algorithm from [this paper](https://arxiv.org/abs/1812.02900). Implemented in the `BCQ` class in `algo/bcq.py`. - -### CQL - -- A recent batch offline RL algorithm from [this paper](https://arxiv.org/abs/2006.04779). Implemented in the `CQL` class in `algo/cql.py`. - -### TD3-BC - -- A recent algorithm from [this paper](https://arxiv.org/abs/2106.06860). We implemented it as an example (see section below on building your own algorithm). Implemented in the `TD3_BC` class in `algo/td3_bc.py`. +Refer [here](../introduction/implemented_algorithms.html) for the list of algorithms currently implemented in robomimic ## Building your own Algorithm -In this section, we walk through an example of implementing a custom algorithm, to show how easy it is to extend the functionality in the repository. We choose to implement the recently proposed [TD3-BC](https://arxiv.org/abs/2106.06860) algorithm. - -This requires implementing two new files - `algo/td3_bc.py` (which contains the `Algo` subclass implementation) and `config/td3_bc_config.py` (which contains the `Config` subclass implementation). We also make sure to add the line `from robomimic.algo.td3_bc import TD3_BC` to `algo/__init__.py` and `from robomimicL.config.td3_bc_config import TD3_BCConfig` to `config/__init__.py` to `config/__init__.py` to make sure the `Algo` and `Config` subclasses can be found. - -We first describe the config implementation - we implement a `TD3_BCConfig` config class that subclasses from `BaseConfig`. Importantly, we set the class variable `ALGO_NAME = "td3_bc"` to register this config under that algo name. We implement the `algo_config` function to populate `config.algo` with the keys needed for the algorithm - it is extremely similar to the `BCQConfig` implementation. Portions of the code are reproduced below. 
- -```python -class TD3_BCConfig(BaseConfig): - ALGO_NAME = "td3_bc" - - def algo_config(self): - # optimization parameters - self.algo.optim_params.critic.learning_rate.initial = 3e-4 # critic learning rate - self.algo.optim_params.critic.learning_rate.decay_factor = 0.1 # factor to decay LR by (if epoch schedule non-empty) - self.algo.optim_params.critic.learning_rate.epoch_schedule = [] # epochs where LR decay occurs - self.algo.optim_params.critic.regularization.L2 = 0.00 # L2 regularization strength - self.algo.optim_params.critic.start_epoch = -1 # number of epochs before starting critic training (-1 means start right away) - self.algo.optim_params.critic.end_epoch = -1 # number of epochs before ending critic training (-1 means start right away) - - self.algo.optim_params.actor.learning_rate.initial = 3e-4 # actor learning rate - self.algo.optim_params.actor.learning_rate.decay_factor = 0.1 # factor to decay LR by (if epoch schedule non-empty) - self.algo.optim_params.actor.learning_rate.epoch_schedule = [] # epochs where LR decay occurs - self.algo.optim_params.actor.regularization.L2 = 0.00 # L2 regularization strength - self.algo.optim_params.actor.start_epoch = -1 # number of epochs before starting actor training (-1 means start right away) - self.algo.optim_params.actor.end_epoch = -1 # number of epochs before ending actor training (-1 means start right away) - - # alpha value - for weighting critic loss vs. BC loss - self.algo.alpha = 2.5 - - # target network related parameters - self.algo.discount = 0.99 # discount factor to use - self.algo.n_step = 1 # for using n-step returns in TD-updates - self.algo.target_tau = 0.005 # update rate for target networks - self.algo.infinite_horizon = False # if True, scale terminal rewards by 1 / (1 - discount) to treat as infinite horizon - - # ================== Critic Network Config =================== - self.algo.critic.use_huber = False # Huber Loss instead of L2 for critic - self.algo.critic.max_gradient_norm = None # L2 gradient clipping for critic (None to use no clipping) - self.algo.critic.value_bounds = None # optional 2-tuple to ensure lower and upper bound on value estimates - - # critic ensemble parameters (TD3 trick) - self.algo.critic.ensemble.n = 2 # number of Q networks in the ensemble - self.algo.critic.ensemble.weight = 1.0 # weighting for mixing min and max for target Q value - - self.algo.critic.layer_dims = (256, 256, 256) # size of critic MLP - - # ================== Actor Network Config =================== - - # update actor and target networks every n gradients steps for each critic gradient step - self.algo.actor.update_freq = 2 - - # exploration noise used to form target action for Q-update - clipped Gaussian noise - self.algo.actor.noise_std = 0.2 # zero-mean gaussian noise with this std is applied to actions - self.algo.actor.noise_clip = 0.5 # noise is clipped in each dimension to (-noise_clip, noise_clip) - - self.algo.actor.layer_dims = (256, 256, 256) # size of actor MLP -``` - -Usually, we only need to implement the `algo_config` function to populate `config.algo` with the keys needed for the algorithm, but we also update the `experiment_config` function and `observation_config` function to make it easier to reproduce experiments on `gym` environments from the paper. See the source file for more details. - -Now we discuss the algorithm implementation. 
As described in the "Initialization" section above, we first need to implement the `algo_config_to_class` method - this is straightforward since we don't have multiple variants of this algorithm. We take special care to make sure we register this function with the same algo name that we used for defining the config (`"td3_bc"`). - -```python -@register_algo_factory_func("td3_bc") -def algo_config_to_class(algo_config): - """ - Maps algo config to the TD3_BC algo class to instantiate, along with additional algo kwargs. - - Args: - algo_config (Config instance): algo config - - Returns: - algo_class: subclass of Algo - algo_kwargs (dict): dictionary of additional kwargs to pass to algorithm - """ - # only one variant of TD3_BC for now - return TD3_BC, {} -``` - -Next, we'll describe how we implement the methods outlined in the "Important Methods" section above. We omit several of the methods, since their implementation is extremely similar to the `BCQ` implementation. We start by defining the class and implementing `_create_networks`. The code uses helper functions `_create_critics` and `_create_actor` to create the critic and actor networks, as in the `BCQ` implementation. - -```python -class TD3_BC(PolicyAlgo, ValueAlgo): - def _create_networks(self): - """ - Creates networks and places them into @self.nets. - """ - self.nets = nn.ModuleDict() - - self._create_critics() - self._create_actor() - - # sync target networks at beginning of training - with torch.no_grad(): - for critic_ind in range(len(self.nets["critic"])): - TorchUtils.hard_update( - source=self.nets["critic"][critic_ind], - target=self.nets["critic_target"][critic_ind], - ) - - TorchUtils.hard_update( - source=self.nets["actor"], - target=self.nets["actor_target"], - ) - - self.nets = self.nets.float().to(self.device) - - def _create_critics(self): - critic_class = ValueNets.ActionValueNetwork - critic_args = dict( - obs_shapes=self.obs_shapes, - ac_dim=self.ac_dim, - mlp_layer_dims=self.algo_config.critic.layer_dims, - value_bounds=self.algo_config.critic.value_bounds, - goal_shapes=self.goal_shapes, - **ObsNets.obs_encoder_args_from_config(self.obs_config.encoder), - ) - - # Q network ensemble and target ensemble - self.nets["critic"] = nn.ModuleList() - self.nets["critic_target"] = nn.ModuleList() - for _ in range(self.algo_config.critic.ensemble.n): - critic = critic_class(**critic_args) - self.nets["critic"].append(critic) - - critic_target = critic_class(**critic_args) - self.nets["critic_target"].append(critic_target) - - def _create_actor(self): - actor_class = PolicyNets.ActorNetwork - actor_args = dict( - obs_shapes=self.obs_shapes, - goal_shapes=self.goal_shapes, - ac_dim=self.ac_dim, - mlp_layer_dims=self.algo_config.actor.layer_dims, - **ObsNets.obs_encoder_args_from_config(self.obs_config.encoder), - ) - - self.nets["actor"] = actor_class(**actor_args) - self.nets["actor_target"] = actor_class(**actor_args) -``` - -Next we describe the `train_on_batch` function, which implements the main training logic. The function trains the critic using the `_train_critic_on_batch` helper function, and then actor using the `_train_actor_on_batch` helper function (the actor is trained at a slower rate according to the `config.algo.actor.update_freq` config variable, as in the original author's implementation). Finally, the target network parameters are moved a little closer to the current network parameters, using `TorchUtils.soft_update`. 
- -```python - def train_on_batch(self, batch, epoch, validate=False): - """ - Training on a single batch of data. - - Args: - batch (dict): dictionary with torch.Tensors sampled - from a data loader and filtered by @process_batch_for_training - - epoch (int): epoch number - required by some Algos that need - to perform staged training and early stopping - - validate (bool): if True, don't perform any learning updates. - - Returns: - info (dict): dictionary of relevant inputs, outputs, and losses - that might be relevant for logging - """ - with TorchUtils.maybe_no_grad(no_grad=validate): - info = PolicyAlgo.train_on_batch(self, batch, epoch, validate=validate) - - # Critic training - no_critic_backprop = validate or (not self._check_epoch(net_name="critic", epoch=epoch)) - with TorchUtils.maybe_no_grad(no_grad=no_critic_backprop): - critic_info = self._train_critic_on_batch( - batch=batch, - epoch=epoch, - no_backprop=no_critic_backprop, - ) - info.update(critic_info) - - # update actor and target networks at lower frequency - if not no_critic_backprop: - # update counter only on critic training gradient steps - self.actor_update_counter += 1 - do_actor_update = (self.actor_update_counter % self.algo_config.actor.update_freq == 0) - - # Actor training - no_actor_backprop = validate or (not self._check_epoch(net_name="actor", epoch=epoch)) - no_actor_backprop = no_actor_backprop or (not do_actor_update) - with TorchUtils.maybe_no_grad(no_grad=no_actor_backprop): - actor_info = self._train_actor_on_batch( - batch=batch, - epoch=epoch, - no_backprop=no_actor_backprop, - ) - info.update(actor_info) - - if not no_actor_backprop: - # to match original implementation, only update target networks on - # actor gradient steps - with torch.no_grad(): - # update the target critic networks - for critic_ind in range(len(self.nets["critic"])): - TorchUtils.soft_update( - source=self.nets["critic"][critic_ind], - target=self.nets["critic_target"][critic_ind], - tau=self.algo_config.target_tau, - ) - - # update target actor network - TorchUtils.soft_update( - source=self.nets["actor"], - target=self.nets["actor_target"], - tau=self.algo_config.target_tau, - ) - - return info -``` - -Below, we show the helper functions for training the critics, to be explicit in how the Bellman backup is used to construct the TD loss. The target Q values for the TD loss are obtained in the same way as [TD3](https://arxiv.org/abs/1802.09477). - -```python - def _train_critic_on_batch(self, batch, epoch, no_backprop=False): - info = OrderedDict() - - # batch variables - s_batch = batch["obs"] - a_batch = batch["actions"] - r_batch = batch["rewards"] - ns_batch = batch["next_obs"] - goal_s_batch = batch["goal_obs"] - - # 1 if not done, 0 otherwise - done_mask_batch = 1. 
- batch["dones"] - info["done_masks"] = done_mask_batch - - # Bellman backup for Q-targets - q_targets = self._get_target_values( - next_states=ns_batch, - goal_states=goal_s_batch, - rewards=r_batch, - dones=done_mask_batch, - ) - info["critic/q_targets"] = q_targets - - # Train all critics using this set of targets for regression - for critic_ind, critic in enumerate(self.nets["critic"]): - critic_loss = self._compute_critic_loss( - critic=critic, - states=s_batch, - actions=a_batch, - goal_states=goal_s_batch, - q_targets=q_targets, - ) - info["critic/critic{}_loss".format(critic_ind + 1)] = critic_loss - - if not no_backprop: - critic_grad_norms = TorchUtils.backprop_for_loss( - net=self.nets["critic"][critic_ind], - optim=self.optimizers["critic"][critic_ind], - loss=critic_loss, - max_grad_norm=self.algo_config.critic.max_gradient_norm, - ) - info["critic/critic{}_grad_norms".format(critic_ind + 1)] = critic_grad_norms - - return info - - def _get_target_values(self, next_states, goal_states, rewards, dones): - """ - Helper function to get target values for training Q-function with TD-loss. - """ - - with torch.no_grad(): - # get next actions via target actor and noise - next_target_actions = self.nets["actor_target"](next_states, goal_states) - noise = ( - torch.randn_like(next_target_actions) * self.algo_config.actor.noise_std - ).clamp(-self.algo_config.actor.noise_clip, self.algo_config.actor.noise_clip) - next_actions = (next_target_actions + noise).clamp(-1.0, 1.0) - - # TD3 trick to combine max and min over all Q-ensemble estimates into single target estimates - all_value_targets = self.nets["critic_target"][0](next_states, next_actions, goal_states).reshape(-1, 1) - max_value_targets = all_value_targets - min_value_targets = all_value_targets - for critic_target in self.nets["critic_target"][1:]: - all_value_targets = critic_target(next_states, next_actions, goal_states).reshape(-1, 1) - max_value_targets = torch.max(max_value_targets, all_value_targets) - min_value_targets = torch.min(min_value_targets, all_value_targets) - value_targets = self.algo_config.critic.ensemble.weight * min_value_targets + \ - (1. - self.algo_config.critic.ensemble.weight) * max_value_targets - q_targets = rewards + dones * self.discount * value_targets - - return q_targets - - def _compute_critic_loss(self, critic, states, actions, goal_states, q_targets): - """ - Helper function to compute loss between estimated Q-values and target Q-values. - """ - q_estimated = critic(states, actions, goal_states) - if self.algo_config.critic.use_huber: - critic_loss = nn.SmoothL1Loss()(q_estimated, q_targets) - else: - critic_loss = nn.MSELoss()(q_estimated, q_targets) - return critic_loss -``` - -Next we show the helper function for training the actor, which is trained through a weighted combination of the TD3 (DDPG) and BC loss. 
- -```python - def _train_actor_on_batch(self, batch, epoch, no_backprop=False): - info = OrderedDict() - - # Actor loss (update with mixture of DDPG loss and BC loss) - s_batch = batch["obs"] - a_batch = batch["actions"] - goal_s_batch = batch["goal_obs"] - - # lambda mixture weight is combination of hyperparameter (alpha) and Q-value normalization - actor_actions = self.nets["actor"](s_batch, goal_s_batch) - Q_values = self.nets["critic"][0](s_batch, actor_actions, goal_s_batch) - lam = self.algo_config.alpha / Q_values.abs().mean().detach() - actor_loss = -lam * Q_values.mean() + nn.MSELoss()(actor_actions, a_batch) - info["actor/loss"] = actor_loss - - if not no_backprop: - actor_grad_norms = TorchUtils.backprop_for_loss( - net=self.nets["actor"], - optim=self.optimizers["actor"], - loss=actor_loss, - ) - info["actor/grad_norms"] = actor_grad_norms - - return info -``` - -Finally, we describe the `get_action` implementation - which is used at test-time during rollouts. The implementation is extremely simple - just query the actor for an action. - -```python - def get_action(self, obs_dict, goal_dict=None): - """ - Get policy action outputs. - - Args: - obs_dict (dict): current observation - goal_dict (dict): (optional) goal - - Returns: - action (torch.Tensor): action tensor - """ - assert not self.nets.training - - return self.nets["actor"](obs_dict=obs_dict, goal_dict=goal_dict) -``` - -That's it! See `algo/td3_bc.py` for the complete implementation, and compare it to `algo/bcq.py` to see the similarity between the two implementations. - -We can now run the `generate_config_templates.py` script to generate the json template for our new algorithm, and then run it on our desired dataset. - -```sh -# generate ../exps/templates/td3_bc.json -$ python generate_config_templates.py - -# run training -$ python train.py --config ../exps/templates/td3_bc.json --dataset /path/to/walker2d_medium_expert.hdf5 -``` - +Learn how to implement your own learning algorithm [here](../tutorials/custom_algorithms.html) diff --git a/docs/modules/configs.md b/docs/modules/configs.md index fa7e691b..b5043795 100644 --- a/docs/modules/configs.md +++ b/docs/modules/configs.md @@ -20,7 +20,7 @@ c.experiment.save.enabled = True print("save enabled: {}".format(c.experiment.save.enabled)) ``` -It's easy to go back and forth between`Config` objects and jsons as well, which is convenient when saving config objects to disk (this happens when generating new config jsons for training, and when saving the config in a model checkpoint), and loading configs from jsons. +It's easy to go back and forth between `Config` objects and jsons as well, which is convenient when saving config objects to disk (this happens when generating new config jsons for training, and when saving the config in a model checkpoint), and loading configs from jsons. ```python # dump config as a json string @@ -110,4 +110,4 @@ All `Config` objects that inherit the constructor of `BaseConfig` are _key-locke ## Minimum Example -Please see the [examples section](../introduction/examples.html#config-example) for more information on how to use the Config object, and examples on how the locking mechanism works. +Please see the [config tutorial](../tutorials/configs.html) for more information on how to use the Config object, and examples on how the locking mechanism works. 
diff --git a/docs/modules/environments.md b/docs/modules/environments.md index bfcc5637..a87fd17a 100644 --- a/docs/modules/environments.md +++ b/docs/modules/environments.md @@ -72,7 +72,7 @@ env = EnvUtils.create_env_from_metadata( ) ``` -The repo offers simple utility tool `robomimic/scripts/get_dataset_info.py` to view the environment metadata included in a dataset. For example: +The repo offers a simple utility tool `robomimic/scripts/get_dataset_info.py` to view the environment metadata included in a dataset. For example: ```bash $ python robomimic/scripts/get_dataset_info.py --dataset path/to/the/dataset.hdf5 diff --git a/docs/modules/models.md b/docs/modules/models.md index dc13afac..1afd01f8 100644 --- a/docs/modules/models.md +++ b/docs/modules/models.md @@ -1,6 +1,8 @@ # Models -![overview](../images/modules.png) +

+<img src="../images/modules.png" alt="overview">
+
**robomimic** implements a suite of reusable network modules at different abstraction levels that make creating new policy models easy. diff --git a/docs/modules/overview.md b/docs/modules/overview.md index d49da6aa..2342feff 100644 --- a/docs/modules/overview.md +++ b/docs/modules/overview.md @@ -1,19 +1,24 @@ # Overview -![overview](../images/module_overview.png) +

+<img src="../images/module_overview.png" alt="overview">
+
-The **robomimic** framework consists of several modular pieces that interact to train and evaluate a policy. A [Config](./configs.html) object is used to define all settings for a particular training run, including the hdf5 dataset that will be used to train the agent, and algorithm hyperparameters. The demonstrations in the hdf5 dataset are loaded into a [SequenceDataset](./dataset.html) object, which is used to provide minibatches for the train loop. Training consists of an [Algorithm](./algorithms.html) object that trains a set of [Models](./models.html) (including the Policy) by repeatedly sampling minibatches from the fixed, offline dataset. Every so often, the policy is evaluated in the [Environment](./environments.html) by conducting a set of rollouts. Statistics and other important information during the training process are logged to disk (e.g. tensorboard outputs, model checkpoints, and evaluation rollout videos). We also provide additional utilities in [TensorUtils](./tensor_utils.html) to work with complex observations in the form of nested tensor dictionaries. +The **robomimic** framework consists of several modular components that interact to train and evaluate a policy: +- **Experiment config**: a config object defines all settings for a training run +- **Data**: an hdf5 dataset is loaded into a dataloader, which provides minibatches to the algorithm +- **Training**: an algorithm object trains a set of models (including the policy) +- **Evaluation**: the policy is evaluated in the environment by conducting a set of rollouts +- **Logging**: experiment statistics, model checkpoints, and videos are saved to disk -The directory structure of the repository is as follows. +These modules are encapsulated by the robomimic directory structure: -- `robomimic/algo`: policy learning algorithm implementations (see [Algorithm documentation](./algorithms.html) for more information) -- `robomimic/config`: config classes (see [Config documentation](./configs.html) for more information) -- `robomimic/envs`: wrappers for environments, used during evaluation rollouts (see [Environment documentation](./environments.html) for more information) -- `robomimic/exps/templates`: config templates for each policy learning algorithm (these are auto-generated with the `robomimic/scripts/generate_config_templates.py` script) -- `robomimic/models`: network implementations (see [Models documentation](./models.html) for more information) +- `examples`: examples to better understand modular components in the codebase +- [`robomimic/algo`](./algorithms.html): policy learning algorithm implementations +- [`robomimic/config`](./configs.html): default algorithm configs +- [`robomimic/envs`](./environments.html): wrappers for environments, used during evaluation rollouts +- `robomimic/exps/templates`: config templates for experiments +- [`robomimic/models`](./models.html): network implementations - `robomimic/scripts`: main repository scripts -- `robomimic/utils`: a collection of utilities, including the [SequenceDataset](./dataset.html) class to load hdf5 datasets into a torch training pipeline, and [TensorUtils](./tensor_utils.html) to work with nested tensor dictionaries -- `tests`: test scripts for validating repository functionality -- `examples`: some simple examples to better understand modular components in the codebase (see the [Examples documentation](../introduction/examples.html) for more information) -- `docs`: files to generate sphinx documentation +- `robomimic/utils`: a collection of utilities, 
including the [SequenceDataset](./dataset.html) class to load datasets, and [TensorUtils](../tutorials/tensor_collections.html#tensorutils) to work with nested tensor dictionaries
diff --git a/docs/tutorials/configs.md b/docs/tutorials/configs.md
new file mode 100644
index 00000000..fa67a28a
--- /dev/null
+++ b/docs/tutorials/configs.md
@@ -0,0 +1,53 @@
+# Configuring and Launching Training Runs
+
+Robomimic uses a centralized [configuration system](../modules/configs.html) to specify (hyper)parameters at all levels. Below we walk through two ways to configure and launch training runs.
+
+
+#### Best practices
+
+**Warning! Do not modify default configs!**
+ +Do not directly modify the default configs such as `config/bc_config.py`, especially if using the codebase with version control (e.g. git). Modifying these files modifies the default settings, and it’s easy to forget that these changes were made, or unintentionally commit these changes so that they become the new defaults. + +
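If you want different defaults for your own runs, a lower-risk workflow (sketched below; the destination path is purely illustrative) is to copy one of the auto-generated template jsons and edit the copy instead:

```sh
# copy the generated template and edit the copy, leaving the packaged defaults untouched
$ cp robomimic/exps/templates/bc.json /path/to/my_experiments/bc_modified.json
```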
+ + +Please see the [Config documentation](../modules/configs.html) for more information on Config objects, and the [hyperparameter scan tutorial](../tutorials/hyperparam_scan.html) for configuring hyperparameter sweeps. + +#### 1. Using a config json (preferred) + +The preferred way to specify training parameters is to pass a config json to the main training script `train.py` via the `--config` argument. The dataset can be specified by setting the `data` attribute of the `train` section of the config json, or specified via the `--dataset` argument. The example below runs a default template json for the BC algorithm. **This is the preferred way to launch training runs.** + +```sh +$ python train.py --config ../exps/templates/bc.json --dataset ../../tests/assets/test.hdf5 +``` + +Please see the [hyperparameter helper docs](./advanced.html#using-the-hyperparameter-helper-to-launch-runs) to see how to easily generate json configs for launching training runs. + +#### 2. Constructing a config object in code + +Another way to launch a training run is to make a default config (with a line like `config = config_factory(algo_name="bc")`), modify the config in python code, and then call the train function, like in the `examples/train_bc_rnn.py` script. + +```python +import robomimic +import robomimic.utils.torch_utils as TorchUtils +from robomimic.config import config_factory +from robomimic.scripts.train import train + +# make default BC config +config = config_factory(algo_name="bc") + +# set config attributes here that you would like to update +config.experiment.name = "bc_rnn_example" +config.train.data = "/path/to/dataset.hdf5" +config.train.output_dir = "/path/to/desired/output_dir" +config.train.batch_size = 256 +config.train.num_epochs = 500 +config.algo.gmm.enabled = False + +# get torch device +device = TorchUtils.get_torch_device(try_to_use_cuda=True) + +# launch training run +train(config, device=device) +``` diff --git a/docs/tutorials/custom_algorithms.md b/docs/tutorials/custom_algorithms.md new file mode 100644 index 00000000..57e64fee --- /dev/null +++ b/docs/tutorials/custom_algorithms.md @@ -0,0 +1,374 @@ +# Implementing Custom Algorithms + +This tutorial provides an example of implementing a custom algorithm in robomimic. We choose to implement the recently proposed [TD3-BC](https://arxiv.org/abs/2106.06860) algorithm. + +This consists of the following steps: +1. Implement a custom `Config` class for TD3-BC. +2. Implement a custom `Algo` class for TD3-BC. + +## Implementing the Config class + +We will implement the config class in `config/td3_bc_config.py`. We implement a `TD3_BCConfig` config class that subclasses from `BaseConfig`. Importantly, we set the class variable `ALGO_NAME = "td3_bc"` to register this config under that algo name. We implement the `algo_config` function to populate `config.algo` with the keys needed for the algorithm - it is extremely similar to the `BCQConfig` implementation. Portions of the code are reproduced below. 
+ +```python +class TD3_BCConfig(BaseConfig): + ALGO_NAME = "td3_bc" + + def algo_config(self): + # optimization parameters + self.algo.optim_params.critic.learning_rate.initial = 3e-4 # critic learning rate + self.algo.optim_params.critic.learning_rate.decay_factor = 0.1 # factor to decay LR by (if epoch schedule non-empty) + self.algo.optim_params.critic.learning_rate.epoch_schedule = [] # epochs where LR decay occurs + self.algo.optim_params.critic.regularization.L2 = 0.00 # L2 regularization strength + self.algo.optim_params.critic.start_epoch = -1 # number of epochs before starting critic training (-1 means start right away) + self.algo.optim_params.critic.end_epoch = -1 # number of epochs before ending critic training (-1 means start right away) + + self.algo.optim_params.actor.learning_rate.initial = 3e-4 # actor learning rate + self.algo.optim_params.actor.learning_rate.decay_factor = 0.1 # factor to decay LR by (if epoch schedule non-empty) + self.algo.optim_params.actor.learning_rate.epoch_schedule = [] # epochs where LR decay occurs + self.algo.optim_params.actor.regularization.L2 = 0.00 # L2 regularization strength + self.algo.optim_params.actor.start_epoch = -1 # number of epochs before starting actor training (-1 means start right away) + self.algo.optim_params.actor.end_epoch = -1 # number of epochs before ending actor training (-1 means start right away) + + # alpha value - for weighting critic loss vs. BC loss + self.algo.alpha = 2.5 + + # target network related parameters + self.algo.discount = 0.99 # discount factor to use + self.algo.n_step = 1 # for using n-step returns in TD-updates + self.algo.target_tau = 0.005 # update rate for target networks + self.algo.infinite_horizon = False # if True, scale terminal rewards by 1 / (1 - discount) to treat as infinite horizon + + # ================== Critic Network Config =================== + self.algo.critic.use_huber = False # Huber Loss instead of L2 for critic + self.algo.critic.max_gradient_norm = None # L2 gradient clipping for critic (None to use no clipping) + self.algo.critic.value_bounds = None # optional 2-tuple to ensure lower and upper bound on value estimates + + # critic ensemble parameters (TD3 trick) + self.algo.critic.ensemble.n = 2 # number of Q networks in the ensemble + self.algo.critic.ensemble.weight = 1.0 # weighting for mixing min and max for target Q value + + self.algo.critic.layer_dims = (256, 256, 256) # size of critic MLP + + # ================== Actor Network Config =================== + + # update actor and target networks every n gradients steps for each critic gradient step + self.algo.actor.update_freq = 2 + + # exploration noise used to form target action for Q-update - clipped Gaussian noise + self.algo.actor.noise_std = 0.2 # zero-mean gaussian noise with this std is applied to actions + self.algo.actor.noise_clip = 0.5 # noise is clipped in each dimension to (-noise_clip, noise_clip) + + self.algo.actor.layer_dims = (256, 256, 256) # size of actor MLP +``` + +Usually, we only need to implement the `algo_config` function to populate `config.algo` with the keys needed for the algorithm, but we also update the `experiment_config` function and `observation_config` function to make it easier to reproduce experiments on `gym` environments from the paper. See the source file for more details. + +Finally, we add the line `from robomimic.config.td3_bc_config import TD3_BCConfig` to `config/__init__.py` to make sure this `Config` subclass is registered by `robomimic`. 
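For reference, the registration hook is just a plain import added to `config/__init__.py` (a minimal sketch; the other imports already present in that file are omitted):

```python
# config/__init__.py (excerpt)
# importing the subclass is enough for robomimic to register it under its ALGO_NAME ("td3_bc")
from robomimic.config.td3_bc_config import TD3_BCConfig
```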
+ + +## Implementing the Algo class + +We will implement the algo class in `algo/td3_bc.py`. As described in the [Algorithm documentation](../modules/algorithms.html#initialization), we first need to implement the `algo_config_to_class` method - this is straightforward since we don't have multiple variants of this algorithm. We take special care to make sure we register this function with the same algo name that we used for defining the config (`"td3_bc"`). + +```python +@register_algo_factory_func("td3_bc") +def algo_config_to_class(algo_config): + """ + Maps algo config to the TD3_BC algo class to instantiate, along with additional algo kwargs. + + Args: + algo_config (Config instance): algo config + + Returns: + algo_class: subclass of Algo + algo_kwargs (dict): dictionary of additional kwargs to pass to algorithm + """ + # only one variant of TD3_BC for now + return TD3_BC, {} +``` + +Next, we'll describe how we implement the methods outlined in the [Algorithm documentation](../modules/algorithms.html#important-class-methods). We omit several of the methods, since their implementation is extremely similar to the `BCQ` implementation. We start by defining the class and implementing `_create_networks`. The code uses helper functions `_create_critics` and `_create_actor` to create the critic and actor networks, as in the `BCQ` implementation. + +```python +class TD3_BC(PolicyAlgo, ValueAlgo): + def _create_networks(self): + """ + Creates networks and places them into @self.nets. + """ + self.nets = nn.ModuleDict() + + self._create_critics() + self._create_actor() + + # sync target networks at beginning of training + with torch.no_grad(): + for critic_ind in range(len(self.nets["critic"])): + TorchUtils.hard_update( + source=self.nets["critic"][critic_ind], + target=self.nets["critic_target"][critic_ind], + ) + + TorchUtils.hard_update( + source=self.nets["actor"], + target=self.nets["actor_target"], + ) + + self.nets = self.nets.float().to(self.device) + + def _create_critics(self): + critic_class = ValueNets.ActionValueNetwork + critic_args = dict( + obs_shapes=self.obs_shapes, + ac_dim=self.ac_dim, + mlp_layer_dims=self.algo_config.critic.layer_dims, + value_bounds=self.algo_config.critic.value_bounds, + goal_shapes=self.goal_shapes, + **ObsNets.obs_encoder_args_from_config(self.obs_config.encoder), + ) + + # Q network ensemble and target ensemble + self.nets["critic"] = nn.ModuleList() + self.nets["critic_target"] = nn.ModuleList() + for _ in range(self.algo_config.critic.ensemble.n): + critic = critic_class(**critic_args) + self.nets["critic"].append(critic) + + critic_target = critic_class(**critic_args) + self.nets["critic_target"].append(critic_target) + + def _create_actor(self): + actor_class = PolicyNets.ActorNetwork + actor_args = dict( + obs_shapes=self.obs_shapes, + goal_shapes=self.goal_shapes, + ac_dim=self.ac_dim, + mlp_layer_dims=self.algo_config.actor.layer_dims, + **ObsNets.obs_encoder_args_from_config(self.obs_config.encoder), + ) + + self.nets["actor"] = actor_class(**actor_args) + self.nets["actor_target"] = actor_class(**actor_args) +``` + +Next we describe the `train_on_batch` function, which implements the main training logic. The function trains the critic using the `_train_critic_on_batch` helper function, and then actor using the `_train_actor_on_batch` helper function (the actor is trained at a slower rate according to the `config.algo.actor.update_freq` config variable, as in the original author's implementation). 
Finally, the target network parameters are moved a little closer to the current network parameters, using `TorchUtils.soft_update`. + +```python + def train_on_batch(self, batch, epoch, validate=False): + """ + Training on a single batch of data. + + Args: + batch (dict): dictionary with torch.Tensors sampled + from a data loader and filtered by @process_batch_for_training + + epoch (int): epoch number - required by some Algos that need + to perform staged training and early stopping + + validate (bool): if True, don't perform any learning updates. + + Returns: + info (dict): dictionary of relevant inputs, outputs, and losses + that might be relevant for logging + """ + with TorchUtils.maybe_no_grad(no_grad=validate): + info = PolicyAlgo.train_on_batch(self, batch, epoch, validate=validate) + + # Critic training + no_critic_backprop = validate or (not self._check_epoch(net_name="critic", epoch=epoch)) + with TorchUtils.maybe_no_grad(no_grad=no_critic_backprop): + critic_info = self._train_critic_on_batch( + batch=batch, + epoch=epoch, + no_backprop=no_critic_backprop, + ) + info.update(critic_info) + + # update actor and target networks at lower frequency + if not no_critic_backprop: + # update counter only on critic training gradient steps + self.actor_update_counter += 1 + do_actor_update = (self.actor_update_counter % self.algo_config.actor.update_freq == 0) + + # Actor training + no_actor_backprop = validate or (not self._check_epoch(net_name="actor", epoch=epoch)) + no_actor_backprop = no_actor_backprop or (not do_actor_update) + with TorchUtils.maybe_no_grad(no_grad=no_actor_backprop): + actor_info = self._train_actor_on_batch( + batch=batch, + epoch=epoch, + no_backprop=no_actor_backprop, + ) + info.update(actor_info) + + if not no_actor_backprop: + # to match original implementation, only update target networks on + # actor gradient steps + with torch.no_grad(): + # update the target critic networks + for critic_ind in range(len(self.nets["critic"])): + TorchUtils.soft_update( + source=self.nets["critic"][critic_ind], + target=self.nets["critic_target"][critic_ind], + tau=self.algo_config.target_tau, + ) + + # update target actor network + TorchUtils.soft_update( + source=self.nets["actor"], + target=self.nets["actor_target"], + tau=self.algo_config.target_tau, + ) + + return info +``` + +Below, we show the helper functions for training the critics, to be explicit in how the Bellman backup is used to construct the TD loss. The target Q values for the TD loss are obtained in the same way as [TD3](https://arxiv.org/abs/1802.09477). + +```python + def _train_critic_on_batch(self, batch, epoch, no_backprop=False): + info = OrderedDict() + + # batch variables + s_batch = batch["obs"] + a_batch = batch["actions"] + r_batch = batch["rewards"] + ns_batch = batch["next_obs"] + goal_s_batch = batch["goal_obs"] + + # 1 if not done, 0 otherwise + done_mask_batch = 1. 
- batch["dones"] + info["done_masks"] = done_mask_batch + + # Bellman backup for Q-targets + q_targets = self._get_target_values( + next_states=ns_batch, + goal_states=goal_s_batch, + rewards=r_batch, + dones=done_mask_batch, + ) + info["critic/q_targets"] = q_targets + + # Train all critics using this set of targets for regression + for critic_ind, critic in enumerate(self.nets["critic"]): + critic_loss = self._compute_critic_loss( + critic=critic, + states=s_batch, + actions=a_batch, + goal_states=goal_s_batch, + q_targets=q_targets, + ) + info["critic/critic{}_loss".format(critic_ind + 1)] = critic_loss + + if not no_backprop: + critic_grad_norms = TorchUtils.backprop_for_loss( + net=self.nets["critic"][critic_ind], + optim=self.optimizers["critic"][critic_ind], + loss=critic_loss, + max_grad_norm=self.algo_config.critic.max_gradient_norm, + ) + info["critic/critic{}_grad_norms".format(critic_ind + 1)] = critic_grad_norms + + return info + + def _get_target_values(self, next_states, goal_states, rewards, dones): + """ + Helper function to get target values for training Q-function with TD-loss. + """ + + with torch.no_grad(): + # get next actions via target actor and noise + next_target_actions = self.nets["actor_target"](next_states, goal_states) + noise = ( + torch.randn_like(next_target_actions) * self.algo_config.actor.noise_std + ).clamp(-self.algo_config.actor.noise_clip, self.algo_config.actor.noise_clip) + next_actions = (next_target_actions + noise).clamp(-1.0, 1.0) + + # TD3 trick to combine max and min over all Q-ensemble estimates into single target estimates + all_value_targets = self.nets["critic_target"][0](next_states, next_actions, goal_states).reshape(-1, 1) + max_value_targets = all_value_targets + min_value_targets = all_value_targets + for critic_target in self.nets["critic_target"][1:]: + all_value_targets = critic_target(next_states, next_actions, goal_states).reshape(-1, 1) + max_value_targets = torch.max(max_value_targets, all_value_targets) + min_value_targets = torch.min(min_value_targets, all_value_targets) + value_targets = self.algo_config.critic.ensemble.weight * min_value_targets + \ + (1. - self.algo_config.critic.ensemble.weight) * max_value_targets + q_targets = rewards + dones * self.discount * value_targets + + return q_targets + + def _compute_critic_loss(self, critic, states, actions, goal_states, q_targets): + """ + Helper function to compute loss between estimated Q-values and target Q-values. + """ + q_estimated = critic(states, actions, goal_states) + if self.algo_config.critic.use_huber: + critic_loss = nn.SmoothL1Loss()(q_estimated, q_targets) + else: + critic_loss = nn.MSELoss()(q_estimated, q_targets) + return critic_loss +``` + +Next we show the helper function for training the actor, which is trained through a weighted combination of the TD3 (DDPG) and BC loss. 
+ +```python + def _train_actor_on_batch(self, batch, epoch, no_backprop=False): + info = OrderedDict() + + # Actor loss (update with mixture of DDPG loss and BC loss) + s_batch = batch["obs"] + a_batch = batch["actions"] + goal_s_batch = batch["goal_obs"] + + # lambda mixture weight is combination of hyperparameter (alpha) and Q-value normalization + actor_actions = self.nets["actor"](s_batch, goal_s_batch) + Q_values = self.nets["critic"][0](s_batch, actor_actions, goal_s_batch) + lam = self.algo_config.alpha / Q_values.abs().mean().detach() + actor_loss = -lam * Q_values.mean() + nn.MSELoss()(actor_actions, a_batch) + info["actor/loss"] = actor_loss + + if not no_backprop: + actor_grad_norms = TorchUtils.backprop_for_loss( + net=self.nets["actor"], + optim=self.optimizers["actor"], + loss=actor_loss, + ) + info["actor/grad_norms"] = actor_grad_norms + + return info +``` + +Finally, we describe the `get_action` implementation - which is used at test-time during rollouts. The implementation is extremely simple - just query the actor for an action. + +```python + def get_action(self, obs_dict, goal_dict=None): + """ + Get policy action outputs. + + Args: + obs_dict (dict): current observation + goal_dict (dict): (optional) goal + + Returns: + action (torch.Tensor): action tensor + """ + assert not self.nets.training + + return self.nets["actor"](obs_dict=obs_dict, goal_dict=goal_dict) +``` + +Finally, we add the line `from robomimic.algo.td3_bc import TD3_BC` to `algo/__init__.py` to make sure this `Algo` subclass is registered by `robomimic`. + +That's it! See `algo/td3_bc.py` for the complete implementation, and compare it to `algo/bcq.py` to see the similarity between the two implementations. + +We can now run the `generate_config_templates.py` script to generate the json template for our new algorithm, and then run it on our desired dataset. + +```sh +# generate ../exps/templates/td3_bc.json +$ python generate_config_templates.py + +# run training +$ python train.py --config ../exps/templates/td3_bc.json --dataset /path/to/walker2d_medium_expert.hdf5 +``` + diff --git a/docs/tutorials/dataset_contents.md b/docs/tutorials/dataset_contents.md new file mode 100644 index 00000000..1519af4c --- /dev/null +++ b/docs/tutorials/dataset_contents.md @@ -0,0 +1,57 @@ +# Dataset Contents and Visualization + +This tutorial shows how to view contents of robomimic hdf5 datasets. + +## Viewing HDF5 Dataset Structure + +
+**Note: HDF5 Dataset Structure.**
+ +[This link](../datasets/overview.html#dataset-structure) shows the expected structure of each hdf5 dataset. + +
+ +The repository offers a simple utility script (`get_dataset_info.py`) to view the hdf5 dataset structure and some statistics of hdf5 datasets. The script displays the following information: + +- statistics about the trajectories (number, average length, etc.) +- the [filter keys](../datasets/overview.html#filter-keys) in the dataset +- the [environment metadata](../modules/environments.html#initialize-an-environment-from-a-dataset) in the dataset, which is used to construct the same simulator environment that the data was collected on +- the dataset structure for the first demonstration + +Pass the `--verbose` argument to print the list of demonstration keys under each filter key, and the dataset structure for all demonstrations. An example, using the small hdf5 dataset packaged with the repository in `tests/assets/test.hdf5` is shown below. + +```sh +$ python get_dataset_info.py --dataset ../../tests/assets/test.hdf5 +``` + +
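If you prefer to poke around programmatically rather than through the script, a minimal `h5py` sketch (assuming the standard robomimic hdf5 layout, with demonstrations under `data/` and filter keys under `mask/`) looks like this:

```python
import json
import h5py

dataset_path = "../../tests/assets/test.hdf5"  # same test dataset used above

with h5py.File(dataset_path, "r") as f:
    demos = list(f["data"].keys())                        # e.g. ["demo_0", "demo_1", ...]
    print("number of demos:", len(demos))
    if "mask" in f:
        print("filter keys:", list(f["mask"].keys()))     # filter keys, if any
    env_meta = json.loads(f["data"].attrs["env_args"])    # metadata for the env the data was collected on
    print("env name:", env_meta["env_name"])
    print("first demo contents:", list(f["data/demo_0"].keys()))
```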
+**Jupyter Notebook: A Deep Dive into Dataset Structure**
+ +Any user wishing to write custom code that works with robomimic datasets should also look at the [jupyter notebook](https://github.com/ARISE-Initiative/robomimic/blob/master/examples/notebooks/datasets.ipynb) at `examples/notebooks/datasets.ipynb`, which showcases several useful python code snippets for working with robomimic hdf5 datasets. + +
+ +## Visualize Dataset Trajectories + +
+**Note: These examples are compatible with any robomimic dataset.**
+
+The examples in this section use the small hdf5 dataset packaged with the repository in `tests/assets/test.hdf5`, but you can run these examples with any robomimic hdf5 dataset. If you are using the default dataset, please make sure that robosuite is on the `offline_study` branch -- this is necessary for the playback scripts to function properly.
+
+
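If your robosuite checkout is not already on that branch, switching is just a git operation (a sketch, assuming robosuite was installed from a source clone; the path is a placeholder):

```sh
$ cd /path/to/robosuite   # wherever robosuite was cloned
$ git checkout offline_study
```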
+ +Use the `playback_dataset.py` script to easily view dataset trajectories. + +```sh +# For the first 5 trajectories, load environment simulator states one-by-one, and render "agentview" and "robot0_eye_in_hand" cameras to video at /tmp/playback_dataset.mp4 +$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --render_image_names agentview robot0_eye_in_hand --video_path /tmp/playback_dataset.mp4 --n 5 + +# Directly visualize the image observations in the dataset. This is especially useful for real robot datasets where there is no simulator to use for rendering. +$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --use-obs --render_image_names agentview_image --video_path /tmp/obs_trajectory.mp4 + +# Play the dataset actions in the environment to verify that the recorded actions are reasonable. +$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --use-actions --render_image_names agentview --video_path /tmp/playback_dataset_with_actions.mp4 + +# Visualize only the initial demonstration frames. +$ python playback_dataset.py --dataset ../../tests/assets/test.hdf5 --first --render_image_names agentview --video_path /tmp/dataset_task_inits.mp4 +``` diff --git a/docs/introduction/advanced.md b/docs/tutorials/hyperparam_scan.md similarity index 56% rename from docs/introduction/advanced.md rename to docs/tutorials/hyperparam_scan.md index c4ceaa4a..43e7cb4e 100644 --- a/docs/introduction/advanced.md +++ b/docs/tutorials/hyperparam_scan.md @@ -1,23 +1,37 @@ -# Advanced Features +# Running Hyperparameter Scans -This section discusses some advanced features of **robomimic**. +We provide the `ConfigGenerator` class under `utils/hyperparam_utils.py` to easily set and sweep over hyperparameters. +**This is the preferred way to launch multiple training runs using the repository.** +Follow the steps below for running your own hyperparameter scan: +1. [Create Base Config json](#step-1-create-base-config-json) +2. [Create Config Generator](#step-2-create-config-generator) +3. [Set Hyperparameter Values](#step-3-set-hyperparameter-values) +4. [Run Hyperparameter Helper Script](#step-4-run-hyperparameter-helper-script) -## Using the Hyperparameter Helper to launch runs - -While copying an algorithm's template json from `exps/templates` and modifying it manually is a perfectly valid way to run experiments, we also provide the `hyperparam_helper.py` script to easily generate config jsons to use with the `train.py` script. **This is the preferred way to launch training runs using the repository.** It also makes hyperparameter scans a breeze. We'll walk through an example below, by reproducing sections of the `hyperparam_helper.py` script. - -The first step is to start with a base config json. A common choice is to copy one of the templates in `exps/templates` (such as `exps/templates/bc.json`) into a new folder (where additional config jsons will be generated). +## Step 1: Create Base Config json +The first step is to start with a base config json. A common choice is to copy one of the templates in `exps/templates` (such as `exps/templates/bc.json`) into a new folder (where additional config jsons will be generated). ```sh $ cp ../exps/templates/bc.json /tmp/gen_configs/base.json ``` -Sections of the config that are not involved in the scan and that do not differ from the default values in the template can also be omitted, if desired. 
For instance, in the example below, we don't need the `config.algo.gaussian`, `config.algo.vae`, and `config.observation` portions (since we don't sweep over them, and we didn't want to set them to anything other than the default values), so we deleted them. We also added a base experiment name (`"bc_rnn_hyper"`) and specified the dataset path (`"/tmp/test.hdf5"`). +
+**Relevant settings in base json file**
+ +Sections of the config that are not involved in the scan and that do not differ from the default values in the template can also be omitted, if desired. + +
+
+
+We modify `/tmp/gen_configs/base.json`, adding a base experiment name (`"bc_rnn_hyper"`) and specifying the dataset path (`"/tmp/test.hdf5"`).

```sh
$ cat /tmp/gen_configs/base.json
```

+
+ ```json { "algo_name": "bc", @@ -94,7 +108,16 @@ $ cat /tmp/gen_configs/base.json } ``` -The next step is to define a function that returns a `ConfigGenerator`. In our example, we would like to run the BC-RNN algorithm with an RNN horizon of 10. This requires setting `config.train.seq_length = 10` and `config.algo.rnn.enabled = True` -- we could have modified our base json file directly (as mentioned above) but we opted to set it in the generator function below. The first three calls to `add_param` do exactly this. Leaving `name=""` ensures that the experiment name is not determined by these parameter values. +

+
+ +## Step 2: Create Config Generator + +The next step is create a `ConfigGenerator` object which procedurally generates new configs (one config per unique hyperparameter combination). +We provide an example in `scripts/hyperparam_helper.py` and for the remainder of this tutorial we will follow this script step-by-step. + +First, we define a function `make_generator` that creates a `ConfigGenerator` object. +After this, our next step will be to set hyperparameter values. ```python import robomimic @@ -108,7 +131,44 @@ def make_generator(config_file, script_file): generator = HyperparamUtils.ConfigGenerator( base_config_file=config_file, script_file=script_file ) + + # next: set and sweep over hyperparameters + generator.add_param(...) # set / sweep hp1 + generator.add_param(...) # set / sweep hp2 + generator.add_param(...) # set / sweep hp3 + ... + + return generator + +def main(args): + + # make config generator + generator = make_generator( + config_file=args.config, # base config file from step 1 + script_file=args.script # explained later in step 4 + ) + + # generate jsons and script + generator.generate() +... +``` + +## Step 3: Set Hyperparameter Values + +Next, we use the `generator.add_param` function to set hyperparameter values, which takes the following arguments: +- `key`: (string) full name of config key to sweep +- `name`: (string) shorthand name for this key +- `values`: (list) values to sweep for this key +- `value_names` (list) (optional) shorthand names associated for each value in `values` +- `group`: (integer) hp group identifier. hps with same group are swept together. hps with different groups are swept as a cartesian product + +### Set fixed values +Going back to our example, we first set hyperparameters that are fixed single values. +We could have modified our base json file directly but we opted to set it in the generator function instead. +In this case, we would like to run the BC-RNN algorithm with an RNN horizon of 10. This requires setting `config.train.seq_length = 10` and `config.algo.rnn.enabled = True`. + +```python # use RNN with horizon 10 generator.add_param( key="algo.rnn.enabled", @@ -130,9 +190,28 @@ def make_generator(config_file, script_file): ) ``` -Now we define our scan - we could like to sweep the policy learning rate in [1e-3, 1e-4], whether to use a GMM policy or not, and whether to use an RNN dimension of 400 with an MLP of size (1024, 1024) or an RNN dimension of 1000 with an empty MLP. Notice that the learning rate goes in `group` 1, the GMM enabled parameter goes in `group` 2, and the RNN dimension and MLP layer dims both go in `group` 3. +
+**Empty hyperparameter names**
+ +Leaving `name=""` ensures that the experiment name is not determined by these parameter values. +Only do this if you are sweeping over a single value! -The `group` argument specifies which arguments should be modified together. The hyperparameter script will generate a training run for each hyperparameter setting in the cartesian product between all groups. Thus, putting the RNN dimension and MLP layer dims in the same group ensures that the parameters change together (RNN dimension 400 always occurs with MLP layer dims (1024, 1024), and RNN dimension 1000 always occurs with an empty MLP). Finally, notice the use of the `value_names` argument -- by default, the generated config will have an experiment name consisting of the base name under `config.experiment.name` already present in the base json, and then the `name` specified for each parameter, along with the string representation of the selected value in `values`, but `value_names` allows you to override this with a custom string for the corresponding value. +
+
+### Define hyperparameter scan values
+Now we define our scan - we would like to sweep the following:
+- policy learning rate in [1e-3, 1e-4]
+- whether to use a GMM policy or not
+- whether to use an RNN dimension of 400 with an MLP of size (1024, 1024) or an RNN dimension of 1000 with an empty MLP
+
+Notice that the learning rate goes in `group` 1, the GMM enabled parameter goes in `group` 2, and the RNN dimension and MLP layer dims both go in `group` 3.
+
+
+**Sweeping hyperparameters together**
+ +We set the RNN dimension and MLP layer dims in the same group to ensure that the parameters change together (RNN dimension 400 always occurs with MLP layer dims (1024, 1024), and RNN dimension 1000 always occurs with an empty MLP). + +
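For reference, the scan below sweeps 2 learning rates (group 1), 2 GMM settings (group 2), and 2 RNN/MLP settings (group 3), so the cartesian product across groups yields 2 x 2 x 2 = 8 generated configs, one per training run.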
```python # LR - 1e-3, 1e-4 @@ -172,10 +251,9 @@ The `group` argument specifies which arguments should be modified together. The ], value_names=["1024", "0"], ) - - return generator ``` +## Step 4: Run Hyperparameter Helper Script Finally, we run the hyperparameter helper script (which contains the function we defined above). ```sh diff --git a/docs/modules/observations.md b/docs/tutorials/observations.md similarity index 78% rename from docs/modules/observations.md rename to docs/tutorials/observations.md index 51dd61d0..efebd04e 100644 --- a/docs/modules/observations.md +++ b/docs/tutorials/observations.md @@ -1,4 +1,4 @@ -# Observations +# Multimodal Observations **robomimic** natively supports multiple different observation modalities, and provides integrated support for modifying observations and adding your own custom ones. @@ -15,6 +15,12 @@ Observations are handled in the following way: ## Modifying and Adding Your Own Observation Modalities -**robomimic** natively supports low dimensional (`low_dim`), RGB images (`rgb`), depth images (`depth`), and scan arrays (`scan`). The way each of these modalities are processed and encoded can be easily specified by modifying their respective `encoder` parameters in your `Config` class. +**robomimic** natively supports the following modalities: +- `low_dim`: low-dimensional states +- `rgb`: RGB images +- `depth`: depth images +- `scan`: scan arrays -You may want to specify your own custom modalities that get processed and encoded in a certain way (e.g.: semantic segmentation, optical flow, etc...). This can also easily be done, and we refer you to our [example script](../introduction/examples.html#custom-observation-modalities-example) which walks through the process. \ No newline at end of file +The way each of these modalities are processed and encoded can be easily specified by modifying their respective `encoder` parameters in your `Config` class. + +You may want to specify your own custom modalities that get processed and encoded in a certain way (e.g.: semantic segmentation, optical flow, etc...). This can also easily be done, and we refer you to our [example script](https://github.com/ARISE-Initiative/robomimic/blob/master/examples/simple_obs_nets.py) which walks through the process. diff --git a/docs/tutorials/reproducing_experiments.md b/docs/tutorials/reproducing_experiments.md new file mode 100644 index 00000000..647c9142 --- /dev/null +++ b/docs/tutorials/reproducing_experiments.md @@ -0,0 +1,17 @@ +# Reproducing Published Experiments and Results + +This is a guide on how to reproduce published experiments and results for various datasets. + +
+**Note: Understand how to launch training runs and view results first!**
+ +Before trying to reproduce published results, it might be useful to read the following tutorials: +- [how to launch training runs](./configs.html) +- [how to view training results](./viewing_results.html) +- [how to launch multiple training runs efficiently](./hyperparam_scan.html) + +
+ +1. Follow the steps in the [Dataset Pipeline](../datasets/overview.html#dataset-pipeline) in order to download and postprocess your dataset(s) of interest. +2. Some of the datasets provide explicit guidelines on reproducing experiments that should be followed (for example, the [CoRL 2021 robomimic datasets](../datasets/robomimic_v0.1.html#reproduce-study-results)). +3. Otherwise, you can just follow the normal steps for [launching training runs](./configs.html) and [viewing training results](./viewing_results.html). diff --git a/docs/modules/utils.md b/docs/tutorials/tensor_collections.md similarity index 95% rename from docs/modules/utils.md rename to docs/tutorials/tensor_collections.md index c0420cd3..de9df643 100644 --- a/docs/modules/utils.md +++ b/docs/tutorials/tensor_collections.md @@ -1,6 +1,7 @@ -# Utils +# Operations over Tensor Collections -This section highlights some important utility functions / classes used in the codebase. +This section highlights some important utility functions and classes used in the codebase for working with +collections of tensors. ## TensorUtils diff --git a/docs/tutorials/using_pretrained_models.md b/docs/tutorials/using_pretrained_models.md new file mode 100644 index 00000000..9f1114a8 --- /dev/null +++ b/docs/tutorials/using_pretrained_models.md @@ -0,0 +1,26 @@ +# Using Pretrained Models + +This tutorial shows how to use pretrained model checkpoints. + +
+**Jupyter Notebook: Working with Pretrained Policies**
+ +The rest of this tutorial shows how to use utility scripts to load and rollout a trained policy. If you wish to do so via an interactive notebook, please refer to the [jupyter notebook](https://github.com/ARISE-Initiative/robomimic/blob/master/examples/notebooks/run_policy.ipynb) at `examples/notebooks/run_policy.ipynb`. The notebook tutorial shows how to download a checkpoint from the model zoo, load the checkpoint in pytorch, and rollout the policy. + +
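As a rough sketch of what the notebook (and the script below) do under the hood, a checkpoint can be restored and queried as follows; the checkpoint path and the observation dictionary are placeholders:

```python
import robomimic.utils.file_utils as FileUtils
import robomimic.utils.torch_utils as TorchUtils

device = TorchUtils.get_torch_device(try_to_use_cuda=True)

# restore a RolloutPolicy wrapper (and the raw checkpoint dict) from disk
policy, ckpt_dict = FileUtils.policy_from_checkpoint(
    ckpt_path="/path/to/model.pth",  # placeholder path
    device=device,
    verbose=True,
)

# reset any internal policy state (e.g. RNN hidden state) before a new rollout
policy.start_episode()

# obs_dict should map observation names used during training to numpy arrays
# action = policy(ob=obs_dict)
```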
+ +## Evaluating Trained Policies + +Saved policy checkpoints in the `models` directory can be evaluated using the `run_trained_agent.py` script: +```sh +# 50 rollouts with max horizon 400 and render agentview and wrist camera images to video +$ python run_trained_agent.py --agent /path/to/model.pth --n_rollouts 50 --horizon 400 --seed 0 --video_path /path/to/output.mp4 --camera_names agentview robot0_eye_in_hand + +# Write rollouts to a new dataset hdf5 +python run_trained_agent.py --agent /path/to/model.pth --n_rollouts 50 --horizon 400 --seed 0 --dataset_path /path/to/output.hdf5 --dataset_obs + +# Write rollouts without explicit observations to hdf5 +python run_trained_agent.py --agent /path/to/model.pth --n_rollouts 50 --horizon 400 --seed 0 --dataset_path /path/to/output.hdf5 +``` + +In the last case, the observations can be (later) extracted later using the `dataset_states_to_obs.py` script (see [here](../datasets/robosuite.html#extracting-observations-from-mujoco-states)). diff --git a/docs/tutorials/viewing_results.md b/docs/tutorials/viewing_results.md new file mode 100644 index 00000000..36d972ee --- /dev/null +++ b/docs/tutorials/viewing_results.md @@ -0,0 +1,94 @@ +# Logging and Viewing Training Results + +In this section, we describe how to configure the logging and evaluations that occur during your training run, and how to view the results of a training run. + +## Configuring Logging + +### Saving Experiment Logs +Configured under `experiment.logging`: +``` +"logging": { + # save terminal outputs under `logs/log.txt` in experiment folder + "terminal_output_to_txt": true, + + # save tensorboard logs under `logs/tb` in experiment folder + "log_tb": true +}, +``` + +### Saving Model Checkpoints +Configured under `experiment.save`: +``` +"save": { + # enable saving model checkpoints + "enabled": true, + + # controlling frequency of checkpoints + "every_n_seconds": null, + "every_n_epochs": 50, + "epochs": [], + + # saving the best checkpoints + "on_best_validation": false, + "on_best_rollout_return": false, + "on_best_rollout_success_rate": true +}, +``` + +### Evaluating Rollouts and Saving Videos +#### Evaluating Rollouts +Configured under `experiment.rollout`: +``` +"rollout": { + "enabled": true, # enable evaluation rollouts + "n": 50, # number of rollouts per evaluation + "horizon": 400, # number of timesteps per rollout + "rate": 50, # frequency of evaluation (in epochs) + "terminate_on_success": true # terminating rollouts upon task success +} +``` + +#### Saving Videos +To save videos of the rollouts, set `experiment.render_video` to `true`. + +## Viewing Training Results + +### Contents of Training Outputs +After the script finishes, you can check the training outputs in the `//` experiment directory: +``` +config.json # config used for this experiment +logs/ # experiment log files + log.txt # terminal output + tb/ # tensorboard logs +videos/ # videos of robot rollouts during training +models/ # saved model checkpoints +``` + +
+**Loading Trained Checkpoints**
+ +Please see the [Using Pretrained Models](./using_pretrained_models.html) tutorial to see how to load the trained model checkpoints in the `models` directory. + +
+
+### Viewing Tensorboard Results
+The experiment results can be viewed using tensorboard:
+```sh
+$ tensorboard --logdir <path-to-experiment-folder> --bind_all
+```
+Below is a snapshot of the tensorboard dashboard:
+
+

+<!-- image: snapshot of the tensorboard dashboard -->
+ +Experiment results (y-axis) are logged across epochs (x-axis). +You may find the following logging metrics useful: +- `Rollout/`: evaluation rollout metrics, eg. success rate, rewards, etc. + - `Rollout/Success_Rate/{envname}-max`: maximum success rate over time (this is the metric the [study paper](https://arxiv.org/abs/2108.03298) uses to evaluate baselines) +- `Timing_Stats/`: time spent by the algorithm loading data, training, performing rollouts, etc. +- `Timing_Stats/`: time spent by the algorithm loading data, training, performing rollouts, etc. +- `Train/`: training stats +- `Validation/`: validation stats +- `System/RAM Usage (MB)`: system RAM used by algorithm \ No newline at end of file diff --git a/examples/notebooks/datasets.ipynb b/examples/notebooks/datasets.ipynb new file mode 100644 index 00000000..f930e99d --- /dev/null +++ b/examples/notebooks/datasets.ipynb @@ -0,0 +1,1419 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d7e6ab09", + "metadata": {}, + "source": [ + "# A deep dive into robomimic datasets\n", + "\n", + "This notebook will provide examples on how to work with robomimic datasets through various python code examples. This notebook assumes that you have installed `robomimic` and `robosuite` (which should be on the `offline_study` branch)." + ] + }, + { + "cell_type": "markdown", + "id": "2a05e543", + "metadata": {}, + "source": [ + "## Download dataset\n", + "\n", + "First, let's try downloading a simple dataset - we'll use the Lift (PH) dataset. Note that there are utility scripts such as `scripts/download_datasets.py` to do this for us, but for the purposes of this example, we'll use the python API." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e2b90e6", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "low_dim.hdf5: 18.6MB [00:00, 27.5MB/s] \n" + ] + } + ], + "source": [ + "import os\n", + "import json\n", + "import h5py\n", + "import numpy as np\n", + "\n", + "import robomimic\n", + "import robomimic.utils.file_utils as FileUtils\n", + "\n", + "# the dataset registry can be found at robomimic/__init__.py\n", + "from robomimic import DATASET_REGISTRY\n", + "\n", + "# set download folder and make it\n", + "download_folder = \"/tmp/robomimic_ds_example\"\n", + "os.makedirs(download_folder, exist_ok=True)\n", + "\n", + "# download the dataset\n", + "task = \"lift\"\n", + "dataset_type = \"ph\"\n", + "hdf5_type = \"low_dim\"\n", + "FileUtils.download_url(\n", + " url=DATASET_REGISTRY[task][dataset_type][hdf5_type][\"url\"], \n", + " download_dir=download_folder,\n", + ")\n", + "\n", + "# enforce that the dataset exists\n", + "dataset_path = os.path.join(download_folder, \"low_dim.hdf5\")\n", + "assert os.path.exists(dataset_path)" + ] + }, + { + "cell_type": "markdown", + "id": "54bdec82", + "metadata": {}, + "source": [ + "## Read quantities from dataset\n", + "\n", + "Next, let's demonstrate how to read different quantities from the dataset. There are scripts such as `scripts/get_dataset_info.py` that can help you easily understand the contents of a dataset, but in this example, we'll break down how to do this directly.\n", + "\n", + "First, let's take a look at the number of demonstrations in the file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a35cd8e9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "hdf5 file /tmp/robomimic_ds_example/low_dim.hdf5 has 200 demonstrations\n" + ] + } + ], + "source": [ + "# open file\n", + "f = h5py.File(dataset_path, \"r\")\n", + "\n", + "# each demonstration is a group under \"data\"\n", + "demos = list(f[\"data\"].keys())\n", + "num_demos = len(demos)\n", + "\n", + "print(\"hdf5 file {} has {} demonstrations\".format(dataset_path, num_demos))" + ] + }, + { + "cell_type": "markdown", + "id": "bdb073a0", + "metadata": {}, + "source": [ + "Next, let's list all of the demonstrations, along with the number of state-action pairs in each demonstration." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9bda3e70", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "demo_0 has 59 samples\n", + "demo_1 has 58 samples\n", + "demo_2 has 57 samples\n", + "demo_3 has 55 samples\n", + "demo_4 has 51 samples\n", + "demo_5 has 58 samples\n", + "demo_6 has 49 samples\n", + "demo_7 has 49 samples\n", + "demo_8 has 44 samples\n", + "demo_9 has 51 samples\n", + "demo_10 has 54 samples\n", + "demo_11 has 49 samples\n", + "demo_12 has 53 samples\n", + "demo_13 has 59 samples\n", + "demo_14 has 51 samples\n", + "demo_15 has 50 samples\n", + "demo_16 has 50 samples\n", + "demo_17 has 45 samples\n", + "demo_18 has 49 samples\n", + "demo_19 has 51 samples\n", + "demo_20 has 48 samples\n", + "demo_21 has 57 samples\n", + "demo_22 has 47 samples\n", + "demo_23 has 47 samples\n", + "demo_24 has 52 samples\n", + "demo_25 has 48 samples\n", + "demo_26 has 43 samples\n", + "demo_27 has 46 samples\n", + "demo_28 has 49 samples\n", + "demo_29 has 41 samples\n", + "demo_30 has 46 samples\n", + "demo_31 has 42 samples\n", + "demo_32 has 59 samples\n", + "demo_33 has 54 samples\n", + "demo_34 has 48 samples\n", + "demo_35 has 58 samples\n", + "demo_36 has 45 samples\n", + "demo_37 has 50 samples\n", + "demo_38 has 49 samples\n", + "demo_39 has 41 samples\n", + "demo_40 has 40 samples\n", + "demo_41 has 49 samples\n", + "demo_42 has 53 samples\n", + "demo_43 has 39 samples\n", + "demo_44 has 46 samples\n", + "demo_45 has 49 samples\n", + "demo_46 has 47 samples\n", + "demo_47 has 40 samples\n", + "demo_48 has 53 samples\n", + "demo_49 has 48 samples\n", + "demo_50 has 45 samples\n", + "demo_51 has 47 samples\n", + "demo_52 has 46 samples\n", + "demo_53 has 55 samples\n", + "demo_54 has 43 samples\n", + "demo_55 has 56 samples\n", + "demo_56 has 40 samples\n", + "demo_57 has 38 samples\n", + "demo_58 has 38 samples\n", + "demo_59 has 44 samples\n", + "demo_60 has 42 samples\n", + "demo_61 has 54 samples\n", + "demo_62 has 41 samples\n", + "demo_63 has 42 samples\n", + "demo_64 has 53 samples\n", + "demo_65 has 38 samples\n", + "demo_66 has 41 samples\n", + "demo_67 has 42 samples\n", + "demo_68 has 39 samples\n", + "demo_69 has 42 samples\n", + "demo_70 has 48 samples\n", + "demo_71 has 45 samples\n", + "demo_72 has 38 samples\n", + "demo_73 has 36 samples\n", + "demo_74 has 48 samples\n", + "demo_75 has 36 samples\n", + "demo_76 has 48 samples\n", + "demo_77 has 39 samples\n", + "demo_78 has 44 samples\n", + "demo_79 has 44 samples\n", + "demo_80 has 40 samples\n", + "demo_81 has 38 samples\n", + "demo_82 has 47 samples\n", + "demo_83 has 52 samples\n", + "demo_84 has 53 samples\n", + "demo_85 has 46 samples\n", + "demo_86 has 38 samples\n", + 
"demo_87 has 39 samples\n", + "demo_88 has 39 samples\n", + "demo_89 has 41 samples\n", + "demo_90 has 42 samples\n", + "demo_91 has 37 samples\n", + "demo_92 has 51 samples\n", + "demo_93 has 50 samples\n", + "demo_94 has 51 samples\n", + "demo_95 has 46 samples\n", + "demo_96 has 56 samples\n", + "demo_97 has 53 samples\n", + "demo_98 has 46 samples\n", + "demo_99 has 46 samples\n", + "demo_100 has 47 samples\n", + "demo_101 has 43 samples\n", + "demo_102 has 58 samples\n", + "demo_103 has 52 samples\n", + "demo_104 has 48 samples\n", + "demo_105 has 55 samples\n", + "demo_106 has 49 samples\n", + "demo_107 has 62 samples\n", + "demo_108 has 43 samples\n", + "demo_109 has 50 samples\n", + "demo_110 has 45 samples\n", + "demo_111 has 46 samples\n", + "demo_112 has 44 samples\n", + "demo_113 has 43 samples\n", + "demo_114 has 47 samples\n", + "demo_115 has 49 samples\n", + "demo_116 has 59 samples\n", + "demo_117 has 52 samples\n", + "demo_118 has 54 samples\n", + "demo_119 has 53 samples\n", + "demo_120 has 63 samples\n", + "demo_121 has 53 samples\n", + "demo_122 has 60 samples\n", + "demo_123 has 51 samples\n", + "demo_124 has 47 samples\n", + "demo_125 has 55 samples\n", + "demo_126 has 56 samples\n", + "demo_127 has 58 samples\n", + "demo_128 has 55 samples\n", + "demo_129 has 53 samples\n", + "demo_130 has 50 samples\n", + "demo_131 has 47 samples\n", + "demo_132 has 46 samples\n", + "demo_133 has 43 samples\n", + "demo_134 has 45 samples\n", + "demo_135 has 54 samples\n", + "demo_136 has 53 samples\n", + "demo_137 has 57 samples\n", + "demo_138 has 50 samples\n", + "demo_139 has 48 samples\n", + "demo_140 has 49 samples\n", + "demo_141 has 54 samples\n", + "demo_142 has 55 samples\n", + "demo_143 has 49 samples\n", + "demo_144 has 51 samples\n", + "demo_145 has 45 samples\n", + "demo_146 has 50 samples\n", + "demo_147 has 51 samples\n", + "demo_148 has 50 samples\n", + "demo_149 has 58 samples\n", + "demo_150 has 46 samples\n", + "demo_151 has 46 samples\n", + "demo_152 has 45 samples\n", + "demo_153 has 42 samples\n", + "demo_154 has 49 samples\n", + "demo_155 has 45 samples\n", + "demo_156 has 63 samples\n", + "demo_157 has 41 samples\n", + "demo_158 has 42 samples\n", + "demo_159 has 45 samples\n", + "demo_160 has 43 samples\n", + "demo_161 has 46 samples\n", + "demo_162 has 52 samples\n", + "demo_163 has 55 samples\n", + "demo_164 has 44 samples\n", + "demo_165 has 42 samples\n", + "demo_166 has 51 samples\n", + "demo_167 has 64 samples\n", + "demo_168 has 57 samples\n", + "demo_169 has 52 samples\n", + "demo_170 has 48 samples\n", + "demo_171 has 45 samples\n", + "demo_172 has 53 samples\n", + "demo_173 has 39 samples\n", + "demo_174 has 47 samples\n", + "demo_175 has 52 samples\n", + "demo_176 has 63 samples\n", + "demo_177 has 50 samples\n", + "demo_178 has 47 samples\n", + "demo_179 has 48 samples\n", + "demo_180 has 55 samples\n", + "demo_181 has 52 samples\n", + "demo_182 has 55 samples\n", + "demo_183 has 53 samples\n", + "demo_184 has 44 samples\n", + "demo_185 has 59 samples\n", + "demo_186 has 45 samples\n", + "demo_187 has 43 samples\n", + "demo_188 has 44 samples\n", + "demo_189 has 52 samples\n", + "demo_190 has 51 samples\n", + "demo_191 has 40 samples\n", + "demo_192 has 49 samples\n", + "demo_193 has 42 samples\n", + "demo_194 has 36 samples\n", + "demo_195 has 54 samples\n", + "demo_196 has 42 samples\n", + "demo_197 has 40 samples\n", + "demo_198 has 45 samples\n", + "demo_199 has 49 samples\n" + ] + } + ], + "source": [ + "# each demonstration is named 
\"demo_#\" where # is a number.\n", + "# Let's put the demonstration list in increasing episode order\n", + "inds = np.argsort([int(elem[5:]) for elem in demos])\n", + "demos = [demos[i] for i in inds]\n", + "\n", + "for ep in demos:\n", + " num_actions = f[\"data/{}/actions\".format(ep)].shape[0]\n", + " print(\"{} has {} samples\".format(ep, num_actions))" + ] + }, + { + "cell_type": "markdown", + "id": "ff998d62", + "metadata": {}, + "source": [ + "Now, let's dig into a single trajectory to take a look at some of the quantities in each demonstration." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2f7b497a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "timestep 0\n", + "obs\n", + "{\n", + " \"object\": [\n", + " 0.026449414293141932,\n", + " 0.026981257415918496,\n", + " 0.8314240728038541,\n", + " 0.0,\n", + " 0.0,\n", + " 0.9691094123222487,\n", + " 0.24663119621902302,\n", + " -0.11694359335499373,\n", + " -0.042210722419907185,\n", + " 0.1804238946722606\n", + " ],\n", + " \"robot0_eef_pos\": [\n", + " -0.0904941790618518,\n", + " -0.015229465003988687,\n", + " 1.0118479674761147\n", + " ],\n", + " \"robot0_eef_quat\": [\n", + " 0.9972275858407224,\n", + " -0.007232815062599316,\n", + " 0.07403510413011814,\n", + " 0.0019057232216049126\n", + " ],\n", + " \"robot0_eef_vel_ang\": [\n", + " 0.0,\n", + " 0.0,\n", + " 0.0\n", + " ],\n", + " \"robot0_eef_vel_lin\": [\n", + " 0.0,\n", + " 0.0,\n", + " 0.0\n", + " ],\n", + " \"robot0_gripper_qpos\": [\n", + " 0.020833,\n", + " -0.020833\n", + " ],\n", + " \"robot0_gripper_qvel\": [\n", + " 0.0,\n", + " 0.0\n", + " ],\n", + " \"robot0_joint_pos\": [\n", + " -0.041410389327799474,\n", + " 0.21736868939218443,\n", + " 0.007539738887367773,\n", + " -2.589845402931484,\n", + " -0.007843816214400163,\n", + " 2.9554575771696747,\n", + " 0.7738283119303786\n", + " ],\n", + " \"robot0_joint_pos_cos\": [\n", + " 0.9991427123462239,\n", + " 0.9764683001348743,\n", + " 0.9999715763034073,\n", + " -0.851609955590076,\n", + " 0.9999692374313213,\n", + " -0.9827268240955787,\n", + " 0.7152403924484989\n", + " ],\n", + " \"robot0_joint_pos_sin\": [\n", + " -0.04139855511284203,\n", + " 0.21566098124535443,\n", + " 0.0075396674514822334,\n", + " -0.5241760043533744,\n", + " -0.007843735782256875,\n", + " 0.1850621225508276,\n", + " 0.6988785166322665\n", + " ],\n", + " \"robot0_joint_vel\": [\n", + " 0.0,\n", + " 0.0,\n", + " 0.0,\n", + " 0.0,\n", + " 0.0,\n", + " 0.0,\n", + " 0.0\n", + " ]\n", + "}\n", + "action\n", + "[-0. 0. 0. 0.00381497 0.14820713 0.01447902\n", + " -1. 
]\n", + "timestep 1\n", + "obs\n", + "{\n", + " \"object\": [\n", + " 0.02644963304851104,\n", + " 0.026981489259044183,\n", + " 0.8193230127298963,\n", + " 3.687451776864639e-06,\n", + " 6.636591241497241e-06,\n", + " 0.9691094122922576,\n", + " 0.24663119622001103,\n", + " -0.12042898519933254,\n", + " -0.04002245385574614,\n", + " 0.19099218419447617\n", + " ],\n", + " \"robot0_eef_pos\": [\n", + " -0.0939793521508215,\n", + " -0.013040964596701957,\n", + " 1.0103151969243724\n", + " ],\n", + " \"robot0_eef_quat\": [\n", + " 0.9976541041157041,\n", + " -0.005165799685637573,\n", + " 0.06825034642115067,\n", + " 0.0012219934912254607\n", + " ],\n", + " \"robot0_eef_vel_ang\": [\n", + " 0.030008726302490445,\n", + " 0.32730904658446547,\n", + " 0.12017070883228292\n", + " ],\n", + " \"robot0_eef_vel_lin\": [\n", + " -0.06337027577857833,\n", + " 0.0584591961202545,\n", + " -0.03119876681899503\n", + " ],\n", + " \"robot0_gripper_qpos\": [\n", + " 0.021144677345077283,\n", + " -0.021098803032220514\n", + " ],\n", + " \"robot0_gripper_qvel\": [\n", + " 0.020536516913238687,\n", + " -0.01967148001615872\n", + " ],\n", + " \"robot0_joint_pos\": [\n", + " -0.040347907449666154,\n", + " 0.21541772265344838,\n", + " 0.010760092121867423,\n", + " -2.5941357356309553,\n", + " -0.006995190747474993,\n", + " 2.9461625155338433,\n", + " 0.7730220470953911\n", + " ],\n", + " \"robot0_joint_pos_cos\": [\n", + " 0.9991861336026011,\n", + " 0.9768871889182116,\n", + " 0.9999421107673003,\n", + " -0.8538510003817338,\n", + " 0.9999755337529701,\n", + " -0.9809642324353866,\n", + " 0.7158036410836986\n", + " ],\n", + " \"robot0_joint_pos_sin\": [\n", + " -0.04033696092028956,\n", + " 0.21375551484692587,\n", + " 0.0107598844899072,\n", + " -0.520517501288009,\n", + " -0.006995133698693658,\n", + " 0.19418850296156312,\n", + " 0.6983016163602369\n", + " ],\n", + " \"robot0_joint_vel\": [\n", + " 0.03543580106414391,\n", + " -0.07043374630232116,\n", + " 0.08873795004885236,\n", + " -0.11157492930937571,\n", + " 0.01835795870713538,\n", + " -0.2870936164472272,\n", + " -0.01528323082917978\n", + " ]\n", + "}\n", + "action\n", + "[ 0.204 0.087 -0.072 0.0040731 0.13793246 0.00472844\n", + " -1. 
]\n", + "timestep 2\n", + "obs\n", + "{\n", + " \"object\": [\n", + " 0.02644979048142018,\n", + " 0.026981656585798056,\n", + " 0.8188363799238965,\n", + " 6.124550057574148e-06,\n", + " 1.1039634957752413e-05,\n", + " 0.9691094122391417,\n", + " 0.24663119622246055,\n", + " -0.12215724667074111,\n", + " -0.03828681206822328,\n", + " 0.1896874619017075\n", + " ],\n", + " \"robot0_eef_pos\": [\n", + " -0.09570745618932093,\n", + " -0.011305155482425224,\n", + " 1.008523841825604\n", + " ],\n", + " \"robot0_eef_quat\": [\n", + " 0.9980669555971526,\n", + " -0.0037167922644707474,\n", + " 0.06203175911084908,\n", + " 0.0007736031979145991\n", + " ],\n", + " \"robot0_eef_vel_ang\": [\n", + " 0.008174675780542328,\n", + " 0.2117180550504403,\n", + " 0.025337152199680114\n", + " ],\n", + " \"robot0_eef_vel_lin\": [\n", + " 0.021946528579046963,\n", + " 0.019244360508797305,\n", + " -0.03325545509870823\n", + " ],\n", + " \"robot0_gripper_qpos\": [\n", + " 0.023065368793203082,\n", + " -0.023089813685773657\n", + " ],\n", + " \"robot0_gripper_qvel\": [\n", + " 0.05386538871513393,\n", + " -0.05464789210707713\n", + " ],\n", + " \"robot0_joint_pos\": [\n", + " -0.03862546371287872,\n", + " 0.2176935517862639,\n", + " 0.012540741592900084,\n", + " -2.59203892182414,\n", + " -0.006915289916081674,\n", + " 2.933857605408448,\n", + " 0.7734679128154149\n", + " ],\n", + " \"robot0_joint_pos_cos\": [\n", + " 0.9992541295153923,\n", + " 0.9763981884673584,\n", + " 0.9999213659307244,\n", + " -0.852757695866131,\n", + " 0.9999760894779743,\n", + " -0.978500557298211,\n", + " 0.7154922211915071\n", + " ],\n", + " \"robot0_joint_pos_sin\": [\n", + " -0.03861586003749854,\n", + " 0.21597818768954657,\n", + " 0.012540412881329139,\n", + " -0.5223067222821157,\n", + " -0.0069152347999298655,\n", + " 0.20624417414096938,\n", + " 0.6986206992456232\n", + " ],\n", + " \"robot0_joint_vel\": [\n", + " 0.03587066701618021,\n", + " 0.11427360447919478,\n", + " 0.007098630969981366,\n", + " 0.13903479858802026,\n", + " -0.0045408292982859885,\n", + " -0.2366374332873465,\n", + " 0.021655490569626238\n", + " ]\n", + "}\n", + "action\n", + "[ 0.323 0.131 -0.073 0.00670891 0.12851983 -0.00825769\n", + " -1. 
]\n", + "timestep 3\n", + "obs\n", + "{\n", + " \"object\": [\n", + " 0.026449610437205617,\n", + " 0.026981465550533244,\n", + " 0.8200930180165275,\n", + " 2.989477547714015e-06,\n", + " 5.389691506845857e-06,\n", + " 0.9691094122977165,\n", + " 0.24663119623840957,\n", + " -0.12084811993760973,\n", + " -0.03702038558498077,\n", + " 0.1864341070243961\n", + " ],\n", + " \"robot0_eef_pos\": [\n", + " -0.09439850950040411,\n", + " -0.010038920034447524,\n", + " 1.0065271250409236\n", + " ],\n", + " \"robot0_eef_quat\": [\n", + " 0.9983755425976737,\n", + " -0.003414240303874383,\n", + " 0.056872138948892356,\n", + " 0.00042274972047360443\n", + " ],\n", + " \"robot0_eef_vel_ang\": [\n", + " 0.015588272761144207,\n", + " 0.20338910134567467,\n", + " 0.001415554639112783\n", + " ],\n", + " \"robot0_eef_vel_lin\": [\n", + " 0.06160381269345277,\n", + " 0.025010135696567033,\n", + " -0.03656737866839496\n", + " ],\n", + " \"robot0_gripper_qpos\": [\n", + " 0.026358274928870957,\n", + " -0.02636141449910154\n", + " ],\n", + " \"robot0_gripper_qvel\": [\n", + " 0.0745201310630871,\n", + " -0.07411367614184626\n", + " ],\n", + " \"robot0_joint_pos\": [\n", + " -0.03650447151890695,\n", + " 0.22634892583028604,\n", + " 0.013167323320805235,\n", + " -2.581147494051686,\n", + " -0.006772139059673779,\n", + " 2.9212721348692297,\n", + " 0.7754134156517873\n", + " ],\n", + " \"robot0_joint_pos_cos\": [\n", + " 0.999333785766275,\n", + " 0.9744922663574513,\n", + " 0.9999133120507784,\n", + " -0.847018564469704,\n", + " 0.999977069153916,\n", + " -0.9758274525241541,\n", + " 0.7141316994350739\n", + " ],\n", + " \"robot0_joint_pos_sin\": [\n", + " -0.03649636455929419,\n", + " 0.22442108370097089,\n", + " 0.0131669428358545,\n", + " -0.5315633089705137,\n", + " -0.006772087295968501,\n", + " 0.21854240526776433,\n", + " 0.7000113683805237\n", + " ],\n", + " \"robot0_joint_vel\": [\n", + " 0.04691353138789363,\n", + " 0.1994635772259614,\n", + " 0.01143329494353153,\n", + " 0.2674340987745522,\n", + " 0.004438708485487719,\n", + " -0.27163705349779815,\n", + " 0.052193205025039595\n", + " ]\n", + "}\n", + "action\n", + "[ 0.491 0.21 -0.19 0.00824866 0.12259889 -0.02527941\n", + " -1. 
]\n", + "timestep 4\n", + "obs\n", + "{\n", + " \"object\": [\n", + " 0.026449513581120028,\n", + " 0.026981362757104405,\n", + " 0.8206866047770742,\n", + " 1.3457488015772851e-06,\n", + " 2.4265647886561854e-06,\n", + " 0.969109412312331,\n", + " 0.24663119624238392,\n", + " -0.1173250875824681,\n", + " -0.03522245982820696,\n", + " 0.1830334739182844\n", + " ],\n", + " \"robot0_eef_pos\": [\n", + " -0.09087557400134808,\n", + " -0.008241097071102555,\n", + " 1.0037200786953586\n", + " ],\n", + " \"robot0_eef_quat\": [\n", + " 0.9986502568703245,\n", + " -0.0038131638489504183,\n", + " 0.05179871492175989,\n", + " -0.00013178296654296415\n", + " ],\n", + " \"robot0_eef_vel_ang\": [\n", + " 0.02512282243933508,\n", + " 0.20309719193885506,\n", + " -0.02906478800666458\n", + " ],\n", + " \"robot0_eef_vel_lin\": [\n", + " 0.10622683467941454,\n", + " 0.03716610735305083,\n", + " -0.06009354586057167\n", + " ],\n", + " \"robot0_gripper_qpos\": [\n", + " 0.030413930567981536,\n", + " -0.030375387077326593\n", + " ],\n", + " \"robot0_gripper_qvel\": [\n", + " 0.08575894495332877,\n", + " -0.0853433936999638\n", + " ],\n", + " \"robot0_joint_pos\": [\n", + " -0.033742898584622906,\n", + " 0.24158202550110155,\n", + " 0.014307614740759025,\n", + " -2.562765138816014,\n", + " -0.006403913367086805,\n", + " 2.907945086256439,\n", + " 0.7796524071820293\n", + " ],\n", + " \"robot0_joint_pos_cos\": [\n", + " 0.999430762410992,\n", + " 0.9709607078578428,\n", + " 0.9998976478262572,\n", + " -0.8371046247858968,\n", + " 0.9999794950168696,\n", + " -0.9728283562961821,\n", + " 0.7111579499359364\n", + " ],\n", + " \"robot0_joint_pos_sin\": [\n", + " -0.03373649576620538,\n", + " 0.23923900977097504,\n", + " 0.014307126598938211,\n", + " -0.5470428202271399,\n", + " -0.006403869596315117,\n", + " 0.2315275128058616,\n", + " 0.703032268279996\n", + " ],\n", + " \"robot0_joint_vel\": [\n", + " 0.06079543928279841,\n", + " 0.3522398354114109,\n", + " 0.02364828609743366,\n", + " 0.4241619255231188,\n", + " 0.0024945495675376796,\n", + " -0.2753739143424271,\n", + " 0.10996504673995737\n", + " ]\n", + "}\n", + "action\n", + "[ 0.465 0.309 -0.304 0.00568831 0.10961337 -0.04577875\n", + " -1. ]\n" + ] + } + ], + "source": [ + "# look at first demonstration\n", + "demo_key = demos[0]\n", + "demo_grp = f[\"data/{}\".format(demo_key)]\n", + "\n", + "# Each observation is a dictionary that maps modalities to numpy arrays, and\n", + "# each action is a numpy array. 
Let's print the observations and actions for the \n", + "# first 5 timesteps of this trajectory.\n", + "for t in range(5):\n", + " print(\"timestep {}\".format(t))\n", + " obs_t = dict()\n", + " # each observation modality is stored as a subgroup\n", + " for k in demo_grp[\"obs\"]:\n", + " obs_t[k] = demo_grp[\"obs/{}\".format(k)][t] # numpy array\n", + " act_t = demo_grp[\"actions\"][t]\n", + " \n", + " # pretty-print observation and action using json\n", + " obs_t_pp = { k : obs_t[k].tolist() for k in obs_t }\n", + " print(\"obs\")\n", + " print(json.dumps(obs_t_pp, indent=4))\n", + " print(\"action\")\n", + " print(act_t)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "552be387", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "shape of first ten actions (10, 7)\n", + "shape of all actions (59, 7)\n" + ] + } + ], + "source": [ + "# we can also grab multiple timesteps at once directly, or even the full trajectory at once\n", + "first_ten_actions = demo_grp[\"actions\"][:10]\n", + "print(\"shape of first ten actions {}\".format(first_ten_actions.shape))\n", + "all_actions = demo_grp[\"actions\"][:]\n", + "print(\"shape of all actions {}\".format(all_actions.shape))" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "57976238", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "success\n" + ] + } + ], + "source": [ + "# the trajectory also contains the next observations under \"next_obs\", \n", + "# for convenient use in a batch (offline) RL pipeline. Let's verify\n", + "# that \"next_obs\" and \"obs\" are offset by 1.\n", + "for k in demo_grp[\"obs\"]:\n", + " # obs_{t+1} == next_obs_{t}\n", + " assert(np.allclose(demo_grp[\"obs\"][k][1:], demo_grp[\"next_obs\"][k][:-1]))\n", + "print(\"success\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "51ab4a38", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dones\n", + "[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", + " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1]\n", + "\n", + "rewards\n", + "[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", + " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", + " 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1.]\n" + ] + } + ], + "source": [ + "# we also have \"done\" and \"reward\" information stored in each trajectory.\n", + "# In this case, we have sparse rewards that indicate task completion at\n", + "# that timestep.\n", + "dones = demo_grp[\"dones\"][:]\n", + "rewards = demo_grp[\"rewards\"][:]\n", + "print(\"dones\")\n", + "print(dones)\n", + "print(\"\")\n", + "print(\"rewards\")\n", + "print(rewards)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "360df27c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " \n", + " \n", + "\n" + ] + } + ], + "source": [ + "# each demonstration also contains metadata\n", + "num_samples = demo_grp.attrs[\"num_samples\"] # number of samples in this trajectory\n", + "mujoco_xml_file = demo_grp.attrs[\"model_file\"] # mujoco XML file for this demonstration\n", + "print(mujoco_xml_file)" + ] + }, + { + "cell_type": "markdown", + "id": "5f10f98f", + "metadata": {}, + "source": [ + "Finally, let's take a look at some global metadata present in the file. 
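Because rewards in this dataset are sparse completion signals, two quantities worth deriving are the episode return and the timestep at which success is first flagged. A short sketch building on the `rewards` and `dones` arrays loaded in the cell above (the variable names below are illustrative):

```python
# sketch: episode return and first-success index from the sparse reward signal
ep_return = float(rewards.sum())
first_success = int(np.argmax(rewards > 0)) if rewards.max() > 0 else -1
print("return: {}, first success at timestep: {}".format(ep_return, first_success))

# for this demo, the done flag also switches to 1 at the same timestep (see the printout above)
print("done flag at that timestep: {}".format(dones[first_success]))
```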
The hdf5 file stores environment metadata which is a convenient way to understand which simulation environment (task) the dataset was collected on. " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3b579caf", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==== Env Meta ====\n", + "{\n", + " \"env_name\": \"Lift\",\n", + " \"type\": 1,\n", + " \"env_kwargs\": {\n", + " \"has_renderer\": false,\n", + " \"has_offscreen_renderer\": false,\n", + " \"ignore_done\": true,\n", + " \"use_object_obs\": true,\n", + " \"use_camera_obs\": false,\n", + " \"control_freq\": 20,\n", + " \"controller_configs\": {\n", + " \"type\": \"OSC_POSE\",\n", + " \"input_max\": 1,\n", + " \"input_min\": -1,\n", + " \"output_max\": [\n", + " 0.05,\n", + " 0.05,\n", + " 0.05,\n", + " 0.5,\n", + " 0.5,\n", + " 0.5\n", + " ],\n", + " \"output_min\": [\n", + " -0.05,\n", + " -0.05,\n", + " -0.05,\n", + " -0.5,\n", + " -0.5,\n", + " -0.5\n", + " ],\n", + " \"kp\": 150,\n", + " \"damping\": 1,\n", + " \"impedance_mode\": \"fixed\",\n", + " \"kp_limits\": [\n", + " 0,\n", + " 300\n", + " ],\n", + " \"damping_limits\": [\n", + " 0,\n", + " 10\n", + " ],\n", + " \"position_limits\": null,\n", + " \"orientation_limits\": null,\n", + " \"uncouple_pos_ori\": true,\n", + " \"control_delta\": true,\n", + " \"interpolation\": null,\n", + " \"ramp_ratio\": 0.2\n", + " },\n", + " \"robots\": [\n", + " \"Panda\"\n", + " ],\n", + " \"camera_depths\": false,\n", + " \"camera_heights\": 84,\n", + " \"camera_widths\": 84,\n", + " \"reward_shaping\": false\n", + " }\n", + "}\n", + "\n" + ] + } + ], + "source": [ + "env_meta = json.loads(f[\"data\"].attrs[\"env_args\"])\n", + "# note: we could also have used the following function:\n", + "# env_meta = FileUtils.get_env_metadata_from_dataset(dataset_path=dataset_path)\n", + "print(\"==== Env Meta ====\")\n", + "print(json.dumps(env_meta, indent=4))\n", + "print(\"\")" + ] + }, + { + "cell_type": "markdown", + "id": "b395453a", + "metadata": {}, + "source": [ + "## Visualizing demonstration trajectories\n", + "\n", + "Finally, let's play some of these demonstrations back in the simulation environment to easily visualize the data that was collected." + ] + }, + { + "cell_type": "markdown", + "id": "d613ab93", + "metadata": {}, + "source": [ + "It turns out that the environment metadata stored in the hdf5 allows us to easily create a simulation environment that is consistent with the way the dataset was collected!" 
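As the commented-out line in the cell above points out, the same metadata can be retrieved without touching the hdf5 file directly. A sketch of the equivalent convenience call (assuming `dataset_path` from the earlier cells):

```python
# equivalent to json.loads(f["data"].attrs["env_args"]) above
import robomimic.utils.file_utils as FileUtils

env_meta = FileUtils.get_env_metadata_from_dataset(dataset_path=dataset_path)
print(env_meta["env_name"])  # "Lift" for this dataset
```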
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "9c98068e", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO: Probing, EGL cannot run on this device\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Creating offscreen glfw\n", + "Created environment with name Lift\n", + "Action size is 7\n" + ] + } + ], + "source": [ + "import robomimic.utils.env_utils as EnvUtils\n", + "\n", + "# create simulation environment from environment metedata\n", + "env = EnvUtils.create_env_from_metadata(\n", + " env_meta=env_meta, \n", + " render=False, # no on-screen rendering\n", + " render_offscreen=True, # off-screen rendering to support rendering video frames\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "595a47d7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "============= Initialized Observation Utils with Obs Spec =============\n", + "\n", + "using obs modality: low_dim with keys: ['robot0_eef_pos']\n", + "using obs modality: rgb with keys: []\n" + ] + } + ], + "source": [ + "import robomimic.utils.obs_utils as ObsUtils\n", + "\n", + "# We normally need to make sure robomimic knows which observations are images (for the\n", + "# data processing pipeline). This is usually inferred from your training config, but\n", + "# since we are just playing back demonstrations, we just need to initialize robomimic\n", + "# with a dummy spec.\n", + "dummy_spec = dict(\n", + " obs=dict(\n", + " low_dim=[\"robot0_eef_pos\"],\n", + " rgb=[],\n", + " ),\n", + ")\n", + "ObsUtils.initialize_obs_utils_with_obs_specs(obs_modality_specs=dummy_spec)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d997cf39", + "metadata": {}, + "outputs": [], + "source": [ + "import imageio\n", + "\n", + "# prepare to write playback trajectories to video\n", + "video_path = os.path.join(download_folder, \"playback.mp4\")\n", + "video_writer = imageio.get_writer(video_path, fps=20)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "dfae1aaa", + "metadata": {}, + "outputs": [], + "source": [ + "def playback_trajectory(demo_key):\n", + " \"\"\"\n", + " Simple helper function to playback the trajectory stored under the hdf5 group @demo_key and\n", + " write frames rendered from the simulation to the active @video_writer.\n", + " \"\"\"\n", + " \n", + " # robosuite datasets store the ground-truth simulator states under the \"states\" key.\n", + " # We will use the first one, alone with the model xml, to reset the environment to\n", + " # the initial configuration before playing back actions.\n", + " init_state = f[\"data/{}/states\".format(demo_key)][0]\n", + " model_xml = f[\"data/{}\".format(demo_key)].attrs[\"model_file\"]\n", + " initial_state_dict = dict(states=init_state, model=model_xml)\n", + " \n", + " # reset to initial state\n", + " env.reset_to(initial_state_dict)\n", + " \n", + " # playback actions one by one, and render frames\n", + " actions = f[\"data/{}/actions\".format(demo_key)][:]\n", + " for t in range(actions.shape[0]):\n", + " env.step(actions[t])\n", + " video_img = env.render(mode=\"rgb_array\", height=512, width=512, camera_name=\"agentview\")\n", + " video_writer.append_data(video_img)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "926d1811", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Playing back demo key: 
demo_0\n", + "Creating offscreen glfw\n", + "Creating offscreen glfw\n", + "Playing back demo key: demo_1\n", + "Creating offscreen glfw\n", + "Creating offscreen glfw\n", + "Playing back demo key: demo_2\n", + "Creating offscreen glfw\n", + "Creating offscreen glfw\n", + "Playing back demo key: demo_3\n", + "Creating offscreen glfw\n", + "Creating offscreen glfw\n", + "Playing back demo key: demo_4\n", + "Creating offscreen glfw\n", + "Creating offscreen glfw\n" + ] + } + ], + "source": [ + "# playback the first 5 demos\n", + "for ep in demos[:5]:\n", + " print(\"Playing back demo key: {}\".format(ep))\n", + " playback_trajectory(ep)\n", + "\n", + "# done writing video\n", + "video_writer.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "bc89c8d8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# view the trajectories!\n", + "from IPython.display import Video\n", + "Video(video_path, embed=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eea881c7", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/notebooks/run_policy.ipynb b/examples/notebooks/run_policy.ipynb new file mode 100644 index 00000000..0e9f1ca2 --- /dev/null +++ b/examples/notebooks/run_policy.ipynb @@ -0,0 +1,287 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b2b15f2e", + "metadata": {}, + "source": [ + "# Run a trained policy\n", + "\n", + "This notebook will provide examples on how to run a trained policy and visualize the rollout." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "000a4ff3", + "metadata": {}, + "outputs": [], + "source": [ + "import argparse\n", + "import json\n", + "import h5py\n", + "import imageio\n", + "import numpy as np\n", + "import os\n", + "from copy import deepcopy\n", + "\n", + "import torch\n", + "\n", + "import robomimic\n", + "import robomimic.utils.file_utils as FileUtils\n", + "import robomimic.utils.torch_utils as TorchUtils\n", + "import robomimic.utils.tensor_utils as TensorUtils\n", + "import robomimic.utils.obs_utils as ObsUtils\n", + "from robomimic.envs.env_base import EnvBase\n", + "from robomimic.algo import RolloutPolicy\n", + "\n", + "import urllib.request\n" + ] + }, + { + "cell_type": "markdown", + "id": "47427159", + "metadata": {}, + "source": [ + "### Download policy checkpoint\n", + "First, let's try downloading a pretrained model from our model zoo." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dfdfe5e", + "metadata": {}, + "outputs": [], + "source": [ + "# Get pretrained checkpooint from the model zoo\n", + "\n", + "ckpt_path = \"lift_ph_low_dim_epoch_1000_succ_100.pth\"\n", + "# Lift (Proficient Human)\n", + "urllib.request.urlretrieve(\n", + " \"http://downloads.cs.stanford.edu/downloads/rt_benchmark/model_zoo/lift/bc_rnn/lift_ph_low_dim_epoch_1000_succ_100.pth\",\n", + " filename=ckpt_path\n", + ")\n", + "\n", + "assert os.path.exists(ckpt_path)" + ] + }, + { + "cell_type": "markdown", + "id": "2c2c25c6", + "metadata": {}, + "source": [ + "### Loading trained policy\n", + "We have a convenient function called `policy_from_checkpoint` that takes care of building the correct model from the checkpoint and load the trained weights. Of course you could also load the checkpoint manually." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf84aed6", + "metadata": {}, + "outputs": [], + "source": [ + "device = TorchUtils.get_torch_device(try_to_use_cuda=True)\n", + "\n", + "# restore policy\n", + "policy, ckpt_dict = FileUtils.policy_from_checkpoint(ckpt_path=ckpt_path, device=device, verbose=True)" + ] + }, + { + "cell_type": "markdown", + "id": "2872a3f0", + "metadata": {}, + "source": [ + "### Creating rollout envionment\n", + "The policy checkpoint also contains sufficient information to recreate the environment that it's trained with. Again, you may manually create the environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12d00c2a", + "metadata": {}, + "outputs": [], + "source": [ + "# create environment from saved checkpoint\n", + "env, _ = FileUtils.env_from_checkpoint(\n", + " ckpt_dict=ckpt_dict, \n", + " render=False, # we won't do on-screen rendering in the notebook\n", + " render_offscreen=True, # render to RGB images for video\n", + " verbose=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ac0e9f", + "metadata": {}, + "source": [ + "### Define the rollout loop\n", + "Now let's define the main rollout loop. The loop runs the policy to a target `horizon` and optionally writes the rollout to a video." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3dd1375e", + "metadata": {}, + "outputs": [], + "source": [ + "def rollout(policy, env, horizon, render=False, video_writer=None, video_skip=5, camera_names=None):\n", + " \"\"\"\n", + " Helper function to carry out rollouts. Supports on-screen rendering, off-screen rendering to a video, \n", + " and returns the rollout trajectory.\n", + " Args:\n", + " policy (instance of RolloutPolicy): policy loaded from a checkpoint\n", + " env (instance of EnvBase): env loaded from a checkpoint or demonstration metadata\n", + " horizon (int): maximum horizon for the rollout\n", + " render (bool): whether to render rollout on-screen\n", + " video_writer (imageio writer): if provided, use to write rollout to video\n", + " video_skip (int): how often to write video frames\n", + " camera_names (list): determines which camera(s) are used for rendering. 
Pass more than\n", + " one to output a video with multiple camera views concatenated horizontally.\n", + " Returns:\n", + " stats (dict): some statistics for the rollout - such as return, horizon, and task success\n", + " \"\"\"\n", + " assert isinstance(env, EnvBase)\n", + " assert isinstance(policy, RolloutPolicy)\n", + " assert not (render and (video_writer is not None))\n", + "\n", + " policy.start_episode()\n", + " obs = env.reset()\n", + " state_dict = env.get_state()\n", + "\n", + " # hack that is necessary for robosuite tasks for deterministic action playback\n", + " obs = env.reset_to(state_dict)\n", + "\n", + " results = {}\n", + " video_count = 0 # video frame counter\n", + " total_reward = 0.\n", + " try:\n", + " for step_i in range(horizon):\n", + "\n", + " # get action from policy\n", + " act = policy(ob=obs)\n", + "\n", + " # play action\n", + " next_obs, r, done, _ = env.step(act)\n", + "\n", + " # compute reward\n", + " total_reward += r\n", + " success = env.is_success()[\"task\"]\n", + "\n", + " # visualization\n", + " if render:\n", + " env.render(mode=\"human\", camera_name=camera_names[0])\n", + " if video_writer is not None:\n", + " if video_count % video_skip == 0:\n", + " video_img = []\n", + " for cam_name in camera_names:\n", + " video_img.append(env.render(mode=\"rgb_array\", height=512, width=512, camera_name=cam_name))\n", + " video_img = np.concatenate(video_img, axis=1) # concatenate horizontally\n", + " video_writer.append_data(video_img)\n", + " video_count += 1\n", + "\n", + " # break if done or if success\n", + " if done or success:\n", + " break\n", + "\n", + " # update for next iter\n", + " obs = deepcopy(next_obs)\n", + " state_dict = env.get_state()\n", + "\n", + " except env.rollout_exceptions as e:\n", + " print(\"WARNING: got rollout exception {}\".format(e))\n", + "\n", + " stats = dict(Return=total_reward, Horizon=(step_i + 1), Success_Rate=float(success))\n", + "\n", + " return stats\n" + ] + }, + { + "cell_type": "markdown", + "id": "0b43d371", + "metadata": {}, + "source": [ + "### Run the policy\n", + "Now let's rollout the policy!" 
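A single rollout gives a fairly noisy estimate of policy performance, so the helper above is also convenient to loop over several episodes and average the success rate. A sketch (assumes the `rollout` function, `policy`, and `env` from the cells above; no video is written here):

```python
# sketch: aggregate success rate over a handful of rollouts (no rendering or video)
num_episodes = 5
all_stats = []
for i in range(num_episodes):
    stats = rollout(
        policy=policy,
        env=env,
        horizon=400,
        render=False,
        video_writer=None,
        camera_names=["agentview"],
    )
    all_stats.append(stats)
    print("episode {}: {}".format(i, stats))

success_rate = np.mean([s["Success_Rate"] for s in all_stats])
print("success rate over {} episodes: {}".format(num_episodes, success_rate))
```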
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be6e1878", + "metadata": {}, + "outputs": [], + "source": [ + "rollout_horizon = 400\n", + "np.random.seed(0)\n", + "torch.manual_seed(0)\n", + "video_path = \"rollout.mp4\"\n", + "video_writer = imageio.get_writer(video_path, fps=20)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fa67efe", + "metadata": {}, + "outputs": [], + "source": [ + "stats = rollout(\n", + " policy=policy, \n", + " env=env, \n", + " horizon=rollout_horizon, \n", + " render=False, \n", + " video_writer=video_writer, \n", + " video_skip=5, \n", + " camera_names=[\"agentview\"]\n", + ")\n", + "print(stats)\n", + "video_writer.close()" + ] + }, + { + "cell_type": "markdown", + "id": "fe79bc19", + "metadata": {}, + "source": [ + "### Visualize the rollout" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "97472b37", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import Video\n", + "Video(video_path)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/requirements-docs.txt b/requirements-docs.txt index 92719acb..4b0538e2 100644 --- a/requirements-docs.txt +++ b/requirements-docs.txt @@ -3,5 +3,6 @@ pygments==2.4.1 sphinx sphinx_rtd_theme sphinx_markdown_tables +sphinx_book_theme recommonmark nbsphinx diff --git a/robomimic/__init__.py b/robomimic/__init__.py index f5d4c357..5e305929 100644 --- a/robomimic/__init__.py +++ b/robomimic/__init__.py @@ -1,4 +1,4 @@ -__version__ = "0.2.0" +__version__ = "0.2.1" # stores released dataset links and rollout horizons in global dictionary. diff --git a/robomimic/utils/file_utils.py b/robomimic/utils/file_utils.py index 78cc5076..1312bf24 100644 --- a/robomimic/utils/file_utils.py +++ b/robomimic/utils/file_utils.py @@ -199,6 +199,79 @@ def algo_name_from_checkpoint(ckpt_path=None, ckpt_dict=None): return algo_name, ckpt_dict +def update_config(cfg): + """ + Updates the config for backwards-compatibility if it uses outdated configurations. + + See https://github.com/ARISE-Initiative/robomimic/releases/tag/v0.2.0 for more info. 
+ + Args: + cfg (dict): Raw dictionary of config values + """ + # Check if image modality is defined -- this means we're using an outdated config + modalities = cfg["observation"]["modalities"] + + found_img = False + for modality_group in ("obs", "subgoal", "goal"): + if modality_group in modalities: + img_modality = modalities[modality_group].pop("image", None) + if img_modality is not None: + found_img = True + cfg["observation"]["modalities"][modality_group]["rgb"] = img_modality + + if found_img: + # Also need to map encoder kwargs correctly + old_encoder_cfg = cfg["observation"].pop("encoder") + + # Create new encoder entry for RGB + rgb_encoder_cfg = { + "core_class": "VisualCore", + "core_kwargs": { + "backbone_kwargs": dict(), + "pool_kwargs": dict(), + }, + "obs_randomizer_class": None, + "obs_randomizer_kwargs": dict(), + } + + if "visual_feature_dimension" in old_encoder_cfg: + rgb_encoder_cfg["core_kwargs"]["feature_dimension"] = old_encoder_cfg["visual_feature_dimension"] + + if "visual_core" in old_encoder_cfg: + rgb_encoder_cfg["core_kwargs"]["backbone_class"] = old_encoder_cfg["visual_core"] + + for kwarg in ("pretrained", "input_coord_conv"): + if "visual_core_kwargs" in old_encoder_cfg and kwarg in old_encoder_cfg["visual_core_kwargs"]: + rgb_encoder_cfg["core_kwargs"]["backbone_kwargs"][kwarg] = old_encoder_cfg["visual_core_kwargs"][kwarg] + + # Optionally add pooling info too + if old_encoder_cfg.get("use_spatial_softmax", True): + rgb_encoder_cfg["core_kwargs"]["pool_class"] = "SpatialSoftmax" + + for kwarg in ("num_kp", "learnable_temperature", "temperature", "noise_std"): + if "spatial_softmax_kwargs" in old_encoder_cfg and kwarg in old_encoder_cfg["spatial_softmax_kwargs"]: + rgb_encoder_cfg["core_kwargs"]["pool_kwargs"][kwarg] = old_encoder_cfg["spatial_softmax_kwargs"][kwarg] + + # Update obs randomizer as well + for kwarg in ("obs_randomizer_class", "obs_randomizer_kwargs"): + if kwarg in old_encoder_cfg: + rgb_encoder_cfg[kwarg] = old_encoder_cfg[kwarg] + + # Store rgb config + cfg["observation"]["encoder"] = {"rgb": rgb_encoder_cfg} + + # Also add defaults for low dim + cfg["observation"]["encoder"]["low_dim"] = { + "core_class": None, + "core_kwargs": { + "backbone_kwargs": dict(), + "pool_kwargs": dict(), + }, + "obs_randomizer_class": None, + "obs_randomizer_kwargs": dict(), + } + + def config_from_checkpoint(algo_name=None, ckpt_path=None, ckpt_dict=None, verbose=False): """ Helper function to restore config from a checkpoint file or loaded model dictionary. @@ -222,13 +295,15 @@ def config_from_checkpoint(algo_name=None, ckpt_path=None, ckpt_dict=None, verbo if algo_name is None: algo_name, _ = algo_name_from_checkpoint(ckpt_dict=ckpt_dict) + # restore config from loaded model dictionary + config_dict = json.loads(ckpt_dict['config']) + update_config(cfg=config_dict) + if verbose: print("============= Loaded Config =============") - print(ckpt_dict['config']) + print(json.dumps(config_dict, indent=4)) - # restore config from loaded model dictionary - config_json = ckpt_dict['config'] - config = config_factory(algo_name, dic=json.loads(config_json)) + config = config_factory(algo_name, dic=config_dict) # lock config to prevent further modifications and ensure missing keys raise errors config.lock() @@ -267,8 +342,7 @@ def policy_from_checkpoint(device=None, ckpt_path=None, ckpt_dict=None, verbose= # read config to set up metadata for observation modalities (e.g. 
detecting rgb observations) ObsUtils.initialize_obs_utils_with_config(config) - # env meta from model dict to get info needed to create model - env_meta = ckpt_dict["env_metadata"] + # shape meta from model dict to get info needed to create model shape_meta = ckpt_dict["shape_metadata"] # maybe restore observation normalization stats
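To make the backwards-compatibility path in the new `update_config` helper concrete, the sketch below feeds it a hypothetical pre-v0.2.0 style config fragment that still uses the old flat `image` modality and encoder keys (all values are made up for illustration):

```python
from robomimic.utils.file_utils import update_config

# hypothetical old-style (pre-v0.2.0) config fragment
old_cfg = {
    "observation": {
        "modalities": {
            "obs": {
                "low_dim": ["robot0_eef_pos", "robot0_eef_quat"],
                "image": ["agentview_image"],
            },
        },
        "encoder": {
            "visual_feature_dimension": 64,
            "visual_core": "ResNet18Conv",
            "visual_core_kwargs": {"pretrained": False, "input_coord_conv": False},
            "use_spatial_softmax": True,
            "spatial_softmax_kwargs": {"num_kp": 32},
        },
    },
}

update_config(cfg=old_cfg)  # modifies the dict in place

# the "image" modality list has been re-keyed to "rgb" ...
assert old_cfg["observation"]["modalities"]["obs"]["rgb"] == ["agentview_image"]
# ... and the flat encoder options now live under a per-modality "rgb" encoder entry
assert old_cfg["observation"]["encoder"]["rgb"]["core_kwargs"]["backbone_class"] == "ResNet18Conv"
```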