This repository contains a simple, easy-to-use PyTorch implementation of Constrained Policy Optimization (CPO). A dummy constraint function is included and can be adapted to your needs.
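As a rough illustration of what an adapted constraint function might look like, the sketch below defines a per-step cost for CartPole and the discounted cost return that CPO keeps below a limit. The names `position_cost` and `discounted_cost`, the `limit` threshold, and the choice of cart position as the constrained quantity are all hypothetical and are not taken from this repository's code.

```python
from typing import Sequence


def position_cost(state: Sequence[float], limit: float = 1.0) -> float:
    """Hypothetical per-step cost: 1.0 when the cart is too far from the center.

    Assumes state[0] is the cart position, as in Gym's CartPole observation.
    """
    return 1.0 if abs(state[0]) > limit else 0.0


def discounted_cost(costs: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted sum of per-step costs over one trajectory."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = c_t + gamma * G_{t+1}
    for c in reversed(costs):
        total = c + gamma * total
    return total


if __name__ == "__main__":
    # Three made-up CartPole observations (position, velocity, angle, angular velocity)
    trajectory = [(0.2, 0.0, 0.0, 0.0), (1.3, 0.1, 0.0, 0.0), (-1.5, 0.0, 0.0, 0.0)]
    costs = [position_cost(s) for s in trajectory]
    print(costs, discounted_cost(costs))
```

A binary cost like this is only one option; a continuous cost (for example, the distance past the threshold) could be swapped in without changing `discounted_cost`.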
- PyTorch (tested with PyTorch 1.2.0)
- OpenAI Gym.
- MuJoCo (mujoco-py)
- If working with a GPU, set OMP_NUM_THREADS to 1 using:
export OMP_NUM_THREADS=1
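If exporting the variable in the shell is inconvenient (for example, when launching from a notebook or a wrapper script), the same setting can be applied from Python. This is a minimal sketch, not part of this repository; the helper name `limit_omp_threads` is hypothetical. The variable generally has to be set before libraries that use OpenMP are imported.

```python
import os


def limit_omp_threads(n: int = 1) -> None:
    """Set OMP_NUM_THREADS for the current process (hypothetical helper).

    Call this before importing torch or numpy so the OpenMP runtime
    sees the value when it initializes.
    """
    os.environ["OMP_NUM_THREADS"] = str(n)


limit_omp_threads(1)
# Imports of torch / numpy would go below this line.
```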
- Tensorboard integration to track learning.
- The best model is tracked and saved based on the value and standard deviation of the average reward.
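One plausible way to combine the average reward and its standard deviation into a "best model" criterion is sketched below. The exact rule used by this repository is not stated here, so the scoring rule (mean minus one standard deviation) and the class name `BestModelTracker` are assumptions made purely for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


class BestModelTracker:
    """Hypothetical tracker: prefers high average reward with low spread."""

    def __init__(self) -> None:
        self.best_score = float("-inf")

    def update(self, episode_rewards: Sequence[float]) -> bool:
        """Return True when the current batch beats the best score so far.

        Assumed score: mean reward minus one (population) standard deviation.
        """
        score = mean(episode_rewards) - pstdev(episode_rewards)
        if score > self.best_score:
            self.best_score = score
            return True  # caller would save a checkpoint here
        return False


if __name__ == "__main__":
    tracker = BestModelTracker()
    for rewards in ([10, 10, 10], [20, 0], [12, 12]):
        print(rewards, tracker.update(rewards))
```

In a training loop, a `True` return value would be the point at which to write the checkpoint, for example with `torch.save`.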
- python algos/main.py --env-name CartPole-v1 --algo-name=CPO --exp-num=1 --exp-name=CPO/CartPole --save-intermediate-model=10 --gpu-index=0 --max-iter=500