add tianshou-like JAX+PPO+Mujoco #355

quangr · 2023-01-31T11:35:10Z

Description

Add tianshou-like JAX+PPO+Mujoco code, which is tested in Hopper-v3 and HalfCheetah-v3.

11 seed test

Hopper-v3 (Tianshou 1M:2609.3+-700.8 ; 3M:3127.7+-413.0)
my result:

HalfCheetah-v3 (Tianshou 1M:5783.9+-1244.0 ; 3M:7337.4+-1508.2)
my result:

This implementation uses a customized EnvWrapper class to wrap environment. Different from traditional Gym-type wrap which has step and reset method. EnvWrapper requires three methods recv,send and reset, these methods need to be pure functions in order to be transformed in jax. The recv method will modify what env received after an action step, and the send method will modify the action send to env.

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

[x ] I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

vercel · 2023-01-31T11:35:16Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add your feedback	Feb 5, 2023 at 5:55AM (UTC)

vwxyzjn

Hi @quangr, thanks for this awesome contribution! Being able to use JAX+PPO+MuJoCo+EnvPool will be a game-changer for a lot of folks! This PR will also make #217 not necessary.

Some comments and thoughts:

Would you mind sharing your wandb username so I can add you to the openrlbenchmark entity? It would be great if you could contribute tracked experiments there, and we can use our CLI utility (https://github.com/openrlbenchmark/openrlbenchmark) to plot charts.

~~Could you share your huggingface username and help add saved models?~~

On a second thought there might not be a way to render mujoco images with Envpool. Don't worry about this yet.

We recently added the huggingface integration as follows

cleanrl/cleanrl/ppo_atari_envpool_xla_jax_scan.py

Lines 477 to 511 in d0d6bae

if args.save_model:

model_path = f"runs/{run_name}/{args.exp_name}.cleanrl_model"

with open(model_path, "wb") as f:

f.write(

flax.serialization.to_bytes(

[

vars(args),

[

agent_state.params.network_params,

agent_state.params.actor_params,

agent_state.params.critic_params,

],

]

)

)

print(f"model saved to {model_path}")

from cleanrl_utils.evals.ppo_envpool_jax_eval import evaluate

episodic_returns = evaluate(

model_path,

make_env,

args.env_id,

eval_episodes=10,

run_name=f"{run_name}-eval",

Model=(Network, Actor, Critic),

)

for idx, episodic_return in enumerate(episodic_returns):

writer.add_scalar("eval/episodic_return", episodic_return, idx)

if args.upload_model:

from cleanrl_utils.huggingface import push_to_hub

repo_name = f"{args.env_id}-{args.exp_name}-seed{args.seed}"

repo_id = f"{args.hf_entity}/{repo_name}" if args.hf_entity else repo_name

push_to_hub(args, episodic_returns, repo_id, "PPO", f"runs/{run_name}", f"videos/{run_name}-eval")

~~You can load the trained model by running python -m cleanrl_utils.enjoy --exp-name ppo_atari_envpool_xla_jax_scan --env-id Breakout-v5~~