
Writing tutorial for deep reinforcement learning implementation with pytorch from scratch #1104

Closed
wants to merge 43 commits

Conversation

hahas94

@hahas94 hahas94 commented Jul 1, 2024

Description

This PR is for a tutorial I have been working on, namely a deep reinforcement learning implementation with PyTorch from scratch. There are no changes to other code/files; this PR only adds a new file, along with images and gifs.

Fixes # (issue)

Type of change

Screenshots

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

hahas94 and others added 30 commits December 10, 2023 11:13
Merging latest commits with my added code.
Merging latest commits from main with my own commits.
Integrating latest commits with my own code.
Merging my changes with the latest development.
Merging my development with the latest commits from main branch.
Merging my changes with the latest developments from main branch.
Merging with the latest commits from upstream/main.
… simplifying the agent class where necessary.
fetching latest changes from upstream.
…or evaluating agent and visualizing it, and some more cleaning up and change of documentation.
Merging latest developments from main with my changes.
…how results are visualized. Changed evaluation method to conform to the standard proposed by Machado et al. 2018. Added code to create gifs of evaluation checkpoints. Multiple changes in documentation and variable naming. Added timing functionality so that runtime is tracked and printed. Added results for multiple envs.
Merging my changes with the latest changes from main.
…aulting. Replaced replay memory with a list-based one instead of pre-initialized numpy arrays. Started to use FrameStack for every env. Also, in_features of linear layer is calculated automatically now.
Merging with the latest changes from main.
@RedTachyon
Member

Was this tutorial discussed with anyone, on Discord or otherwise?

@hahas94
Author

hahas94 commented Jul 2, 2024

Yes, see Issues Tutorials #28, link.

@pseudo-rnd-thoughts
Member

Thanks for the tutorial @hahas94, is this a new tutorial or has this been published before?

Having had a quick look, it is very impressive, though I haven't checked the implementation.

My initial reaction is that a couple of the more advanced points shouldn't be included, i.e., the parallel_training.
A quick formatting point: for code you need to use two backticks, not one (otherwise it's italicized).

I'll try to make a more complete review this week

@hahas94
Author

hahas94 commented Jul 2, 2024

Thank you!

No it has not been published before, so it is new.

Yes please have a thorough look and let me know what changes to make or how to restructure it if necessary.

nn.Linear(in_features=n_hidden_units, out_features=out_features),
)

def _output_size(self) -> int:
Collaborator

Why does this need to be computed? Can't env.action_space.shape or env.observation_space.shape be used instead?

Author

I am not sure how it could be computed from only env.action_space.shape or env.observation_space.shape without going through the steps made by the Conv2d layers, that is, kernel_size, stride, padding, etc.

Collaborator

Doesn't output_size == action_space.shape[0]?

Author

If we are talking about the output of the network, then in C51 it is action_space.shape[0] * n_atoms.

However, note the two lines

in_features = self._output_size()
out_features = self._params.n_actions * self._params.n_atoms

where the method _output_size is used to compute the output size of the flatten layer. Maybe the choice of method name is not good?
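
(A minimal sketch of the C51 head dimensions being discussed, for readers following along; this is not the tutorial's exact code. The relationship out_features = n_actions * n_atoms is taken from the comment above, while the concrete values n_actions = 4, n_atoms = 51 and in_features = 3136 are placeholder assumptions.)

import torch
import torch.nn as nn

n_actions, n_atoms, in_features = 4, 51, 3136  # placeholder values

# The final linear layer emits one logit per (action, atom) pair.
head = nn.Linear(in_features, n_actions * n_atoms)

x = torch.randn(32, in_features)               # batch of flattened features
logits = head(x).view(-1, n_actions, n_atoms)  # (batch, n_actions, n_atoms)
probs = logits.softmax(dim=-1)                 # one distribution per action
print(probs.shape)                             # torch.Size([32, 4, 51])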

Collaborator

"in_features = self._output_size()" I very confused by what I am reading here, maybe you need to rename the function _output_size to something else, or add some more comments

Author

I suspected that. I have changed the method name to:

def _fc_input_size(self) -> int:
        """Compute size of input to first linear layer."""

Collaborator

Isn't it the case that in_features = env.observation_space.shape[0]?

Author

Partially true.

If the vision part of the network is not utilized, then indeed in_features = env.observation_space.shape[0], and this is exactly what _fc_input_size returns, since the convolutional nn.Sequential is empty and the observation passes through unchanged.

But if the vision part is active, then it depends on the observation shape and convolutional operations.
Assuming that env.observation_space.shape == (4, 84, 84) and the current convolutional structure, then the size is computed manually as follows:
the output shape of the first conv2d layer is (32, 20, 20),
the output shape of the second conv2d layer is (64, 9, 9) and
the output shape of the third conv2d layer is (64, 7, 7)
which will produce 64 * 7 * 7 = 3136 features at the flatten layer.

But we can't always assume that the env.observation_space.shape == (4, 84, 84) or that the network has this structure in general. So I prefer computing it using the current method.
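
(A sketch of one common way to compute this automatically, under the assumptions stated in the comment above; the convolutional stack here is the standard (4, 84, 84) Atari structure and not necessarily the tutorial's exact code. The idea is to pass a dummy observation through the convolutional part once and read off how many features it produces; with an empty convolutional Sequential this reduces to the flattened observation size.)

import torch
import torch.nn as nn

obs_shape = (4, 84, 84)  # e.g. four stacked 84x84 frames

conv = nn.Sequential(
    nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
)

with torch.no_grad():
    # One dummy forward pass; no hand-derived shape arithmetic needed.
    fc_input_size = conv(torch.zeros(1, *obs_shape)).shape[1]

print(fc_input_size)  # 3136 for (4, 84, 84) and this conv structure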

HardyHasan94 added 2 commits July 9, 2024 16:35
# --- env related ---
env_name: str = ""
obs_shape: tuple = (1,) # shape of observation, ex. (1, 4), (84, 84)
n_actions: int = 0
Collaborator

Why are obs_shape and n_actions hyperparameters? They are constants of the environment.

Author

Not only these two; several other parameters in this class are not hyperparameters either. However, this way it is easier to pass these parameters to any function/class that needs them.
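
(To illustrate the trade-off being described, a sketch of the pattern, assuming the field names from the snippet above plus hypothetical ones such as learning_rate, gamma and make_params: one dataclass carries both real hyperparameters and environment constants, and the constants are filled in from the environment rather than tuned by hand.)

from dataclasses import dataclass

import gymnasium as gym


@dataclass
class Params:
    # --- hyperparameters ---
    learning_rate: float = 2.5e-4  # hypothetical field, for illustration
    gamma: float = 0.99            # hypothetical field, for illustration
    # --- env related (constants, derived from the env) ---
    env_name: str = ""
    obs_shape: tuple = (1,)  # shape of observation, ex. (1, 4), (84, 84)
    n_actions: int = 0


def make_params(env_name: str) -> Params:
    """Hypothetical helper: fill the env-related fields from the env itself."""
    env = gym.make(env_name)
    params = Params(
        env_name=env_name,
        obs_shape=env.observation_space.shape,
        n_actions=env.action_space.n,  # assumes a Discrete action space
    )
    env.close()
    return params


params = make_params("CartPole-v1")
print(params.obs_shape, params.n_actions)  # (4,) 2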


@hahas94
Author

hahas94 commented Jul 25, 2024

@pseudo-rnd-thoughts, @Kallinteris-Andreas

Just checking in on this PR. Is there anything more I need to address, or any additional feedback you can provide?

@hahas94 hahas94 closed this Aug 3, 2024
@JDRanpariya
Contributor

JDRanpariya commented Oct 4, 2024

Is there a reason this PR was closed rather than merged? I would love to help out with this; it seems like most of the work is done.
