# TD3BC++ Implementation

Welcome to the implementation of the TD3BC++ algorithm, as proposed in the paper "Robust Offline Reinforcement Learning from Contaminated Demonstrations", available on arXiv.

## Details

- **Observation and motivation.** We observed that many state-of-the-art offline RL algorithms degrade on heterogeneous datasets, which contain both expert and non-expert behaviors. This observation motivated our work.
- **Our method.** We identified two key issues in policy-constraint offline RL: (1) risky policy improvement on non-expert states, which exploits unstable Q-gradients, and (2) a harmful policy constraint towards non-expert dataset actions.
- **Implementation.** We propose a solution for each issue: (1) conservative policy improvement, which reduces unstable Q-function gradients with respect to actions, and (2) closeness-constraint relaxation, which loosens the constraint on non-expert actions. Both solutions are simple but effective; see the paper for the full results. (A hypothetical sketch of how these ideas could fit into a TD3+BC-style actor loss follows this list.)
- **Rerun.** To reproduce the results reported in the paper, run the bash script `run.sh` (see the command below).
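
The sketch below is a minimal, hypothetical illustration of where the two ideas could enter a TD3+BC-style actor loss. It is **not** the repository's implementation: the gradient penalty and the per-sample `relax_weight` weighting are illustrative assumptions standing in for (1) and (2); see the paper and the source code for the actual formulations.

```python
# Hypothetical sketch only -- not the code in this repository. It shows where the two
# ideas could plug into a TD3+BC-style actor loss. The gradient penalty used for (1)
# and the per-sample `relax_weight` used for (2) are illustrative assumptions.
import torch
import torch.nn.functional as F


def actor_loss(actor, critic, state, action, alpha=2.5, relax_weight=None, penalty_coef=0.0):
    """TD3+BC-style actor loss with two illustrative modifications.

    relax_weight: optional per-sample weights in [0, 1]; smaller weights loosen the
        behavior-cloning (closeness) term on actions judged non-expert. None = plain TD3+BC.
    penalty_coef: strength of an illustrative penalty on the Q-gradient w.r.t. actions,
        standing in for "conservative policy improvement". 0 disables it.
    """
    pi = actor(state)                      # pi(s): actions proposed by the policy
    q = critic(state, pi)                  # Q(s, pi(s))

    # Standard TD3+BC normalization of the policy-improvement term.
    lam = alpha / q.abs().mean().detach()

    # (1) Conservative policy improvement (illustrative): discourage the actor from
    # exploiting regions where Q changes sharply with the action.
    penalty = 0.0
    if penalty_coef > 0:
        grad_q = torch.autograd.grad(q.sum(), pi, create_graph=True)[0]
        penalty = penalty_coef * grad_q.pow(2).sum(dim=-1).mean()

    # (2) Closeness-constraint relaxation (illustrative): down-weight the BC term on
    # non-expert dataset actions instead of constraining the policy to them equally.
    bc = F.mse_loss(pi, action, reduction="none").sum(dim=-1)
    if relax_weight is not None:
        bc = relax_weight * bc

    return -lam * q.mean() + bc.mean() + penalty
```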
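
To reproduce the results from the paper:

```bash
bash run.sh
```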