This repository is based on the CORL library.
Set up and install the d4rl environments by following the instructions provided in the d4rl documentation until you can successfully run import d4rl
in your Python environment.
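As a quick sanity check, here is a minimal sketch of verifying the installation from Python; the environment name halfcheetah-medium-v2 is only an example, not a dataset required by this repository:
import gym
import d4rl  # registers the D4RL offline environments with gym
# Any D4RL task name works here; halfcheetah-medium-v2 is used as an example.
env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()  # dict with observations, actions, rewards, terminals
print(dataset["observations"].shape)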
Clone the GitHub repository and install required packages:
git clone https://github.com/uiuc-focal-lab/ORL.git && cd ORL
pip install -r requirements/requirements_dev.txt
Initialize Wandb by running the following command inside the repository folder:
wandb init
Follow the prompts to create a new project or connect to an existing one. Make sure your API key and project settings are configured, and update the project argument accordingly.
For more information on how to use Wandb, refer to the Wandb documentation.
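If you prefer to set up logging programmatically, here is a minimal sketch using the standard Wandb Python API; the project and entity names are placeholders, not values used by this repository:
import wandb
# Replace the placeholders with your own Wandb project and entity.
wandb.init(project="my-orl-project", entity="my-team")
wandb.log({"step": 0, "example_metric": 0.0})  # logging works once init succeeds
wandb.finish()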
Run the following shell scripts to generate the datasets; the outputs will be written into the saved folder (a quick check of the output is sketched after the commands):
. generate_pbrl_datasets.sh
. generate_pbrl_datasets_no_overlap.sh
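To confirm that the scripts produced output, a minimal check in Python (assuming the output directory is saved at the repository root):
import os
# List whatever the generation scripts wrote into the saved folder.
for name in sorted(os.listdir("saved")):
    print(name)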
Run the example script, which contains a sample Python command. Make sure the necessary dependencies are installed and the Python environment is properly configured.
. example.sh
To run the full experiment and ablation study, use the following scripts:
main.sh: Contains commands for the full experiment.
abl.sh: Contains commands for the ablation study.
Execute these scripts in your terminal:
. main.sh
. abl.sh
Training logs of learning with different methods on different datasets: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward
Training logs of learning with a single method on datasets of different sizes
Comparison of the learning efficiency of ORL combined with different standard offline RL algorithms
Comparison between the cases where a single preference label or multiple preference labels are given to each pair of trajectories
Comparison between datasets with different settings of structured overlapping trajectories