
[Non-performance-impacting update] Use PyTorch DDP in ppo_atari_multigpu #495

Open · wants to merge 3 commits into master

Conversation

@realAsma commented Jan 14, 2025

[Non-performance-impacting update] Use PyTorch DDP in ppo_atari_multigpu

Description

cleanrl/ppo_atari_multigpu.py does not use any data parallel wrapper and performs gradient synchronization explicitly.
Since PyTorch's distributed data parallel wrappers have become ubiquitous, we could use them instead and improve code readability.
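For context, here is a minimal, self-contained sketch of the difference (not the PR's actual diff; the linear model, the loss, and the single-process "gloo" group are illustrative stand-ins for the real Agent and multi-process launch):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

agent = nn.Linear(4, 2)  # stand-in for the PPO Agent
optimizer = torch.optim.Adam(agent.parameters(), lr=2.5e-4)
x, y = torch.randn(8, 4), torch.randn(8, 2)

# Current ppo_atari_multigpu.py style: backprop locally, then all-reduce and
# average every gradient tensor by hand before the optimizer step.
loss = nn.functional.mse_loss(agent(x), y)
optimizer.zero_grad()
loss.backward()
for param in agent.parameters():
    dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
    param.grad /= dist.get_world_size()
optimizer.step()

# Proposed style: wrap the model once; DDP all-reduces and averages the
# gradients automatically during backward(), so the manual loop disappears.
ddp_agent = DDP(agent)
loss = nn.functional.mse_loss(ddp_agent(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

dist.destroy_process_group()
```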

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

[Minor update] Use Pytorch DDP in ppo_atari_multigpu

@vwxyzjn (Owner) commented Jan 22, 2025

Hi @realAsma, thanks for the PR. I am a bit more inclined to keep things as is, because it showcases the lower-level implementation of DDP and so may be more informative to certain users. That said, I like your implementation as well. Maybe you could add a link to this PR in the docs, noting that multi-GPU could have been implemented this way?

@realAsma (Author) commented

@vwxyzjn I made a minor update to this PR to update the docs. I still prefer the implementation in this PR. In the prior implementation, identical weight initialization across processes is handled by setting the torch seed to the same value on every process before model initialization; after model initialization, each process's seed is set back to its unique value. Additionally, there is boilerplate code for gradient synchronization.
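A minimal, self-contained sketch of the two approaches just described (the linear layer, seed values, and single-process "gloo" group are illustrative stand-ins for the real Agent and launch setup):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)
local_rank, base_seed = dist.get_rank(), 1

# Prior approach: every rank seeds identically before building the model so
# that the initial weights match, then switches back to a rank-unique seed
# for everything else (action sampling, minibatch shuffling, ...).
torch.manual_seed(base_seed)
agent = nn.Linear(4, 2)                    # stand-in for the PPO Agent
torch.manual_seed(base_seed + local_rank)  # per-rank seed from here on

# DDP approach: no seed juggling is needed for the weights; wrapping the
# module broadcasts rank 0's parameters and buffers to every other rank.
ddp_agent = DDP(nn.Linear(4, 2))

dist.destroy_process_group()
```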

The prior implementation is more educational overall; however, do we intend to teach distributed training concepts as well?
My thinking is that the DDP example showcases the multi-GPU/multi-process PPO workflow better by being more succinct.

I support whichever works for you :). I will create a different PR that adds a link to this one in the docs if you wish. Thanks a lot for taking the time to review this PR. Let me know what you think.

@vwxyzjn (Owner) commented Jan 28, 2025

I still like the existing implementation for being more educational, but I agree the DDP implementation is more standard. If you are motivated, we can also just add another file for it like ppo_atari_ddp.py or ppo_atari_multigpu_ddp.py, and you can run the benchmark experiments to compare with the existing implementation.

Alternatively, feel free to create a different PR adding the link to this one in the docs. Either way is fine with me and up to you.
