Writing tutorial for deep reinforcement learning implementation with pytorch from scratch
#1104
Conversation
Merging latest commits with my added code.
Merging latest commits from main with my own commits.
Integrating latest commits with my own code.
Merging my changes with the latest development.
Merging my development with the latest commits from main branch.
…on an environment successfully.
Merging my changes with the latest developments from main branch.
Merging with the latest commits from upstream/main.
Merging latest updates.
Merging with the latest updates.
Merging with the latest updates
… simplifying the agent class where necessary.
fetching latest changes from upstream.
…or evaluating agent and visualizing it, and some more cleaning up and change of documentation.
Merging latest developments from main with my changes.
…how results are visualized. changed evaluation method to conform to proposed standard by Machado et al 2018. added code to create gifs of evaluation checkpoints. multiple changes in documentation and variable naming. added timing functionality so that runtime is tracked and printed. added results for multiple envs.
Merging my changes with the latest changes from main.
…aulting. Replaced replay memory with a list-based one instead of pre-initialized numpy arrays. Started to use FrameStack for every env. Also, in_features of linear layer is calculated automatically now.
Merging with the latest changes from main.
…rk, and modifying the agent class accordingly.
… and create_env() function, as well as restructuring the code.
…roblems and spellings, etc.
Fetching the latest commits from main.
Was this tutorial discussed with anyone, on Discord or otherwise?
Yes, see Issues.
Thanks for the tutorial @hahas94, is this a new tutorial or has this been published before? Having a quick look it is very impressive; I haven't checked the implementation. My initial reaction is that a couple of the more advanced points shouldn't be included, i.e., the … I'll try to make a more complete review this week.
Thank you! No, it has not been published before, so it is new. Yes, please have a thorough look and let me know what changes to make or how to restructure it if necessary.
```python
        nn.Linear(in_features=n_hidden_units, out_features=out_features),
    )

def _output_size(self) -> int:
```
Why does this need to be computed? Can't `env.action_space.shape` or `env.observation_space.shape` be used instead?
I am not sure how it could be computed from only `env.action_space.shape` or `env.observation_space.shape` without going through the steps made by the `Conv2d` layers, that is `kernel_size`, `stride`, `padding`, etc.
Doesn't `output_size == action_space.shape[0]`?
If we are talking about the output of the network, then in C51 it is `action_space.shape[0] * n_atoms`. However, note the two lines

```python
in_features = self._output_size()
out_features = self._params.n_actions * self._params.n_atoms
```

where the method `_output_size` is used to compute the output size of the flatten layer. Maybe the choice of method name is not good?
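As a concrete illustration of that output sizing, here is a minimal, hypothetical C51-style head; the sizes (6 actions, 51 atoms, 512 hidden units) are assumptions for the sketch, not the tutorial's actual values:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 6 actions, 51 atoms, 512 hidden units.
n_actions, n_atoms, n_hidden_units = 6, 51, 512
head = nn.Linear(in_features=n_hidden_units, out_features=n_actions * n_atoms)

x = torch.randn(32, n_hidden_units)            # a batch of 32 feature vectors
logits = head(x).view(-1, n_actions, n_atoms)  # reshape to (batch, actions, atoms)
probs = logits.softmax(dim=-1)                 # per-action distribution over atoms
```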
"in_features = self._output_size()
" I very confused by what I am reading here, maybe you need to rename the function _output_size
to something else, or add some more comments
Suspected that. Changed the method name to:

```python
def _fc_input_size(self) -> int:
    """Compute size of input to first linear layer."""
```
Isn't it the case that `in_features = env.observation_space.shape[0]`?
Partially true.
If the vision part of the network is not utilized, then indeed `in_features = env.observation_space.shape[0]`, and this is exactly what is returned by `_fc_input_size`: since the convolutional sequential is empty, the observation is returned as is.
But if the vision part is active, then it depends on the observation shape and the convolutional operations. Assuming that `env.observation_space.shape == (4, 84, 84)` and the current convolutional structure, the size can be computed manually (per spatial dimension, `out = floor((in - kernel_size + 2 * padding) / stride) + 1`): the output shape of the first conv2d layer is `(32, 20, 20)`, the output shape of the second is `(64, 9, 9)`, and the output shape of the third is `(64, 7, 7)`, which yields `64 * 7 * 7 = 3136` features at the flatten layer.
But we can't always assume that `env.observation_space.shape == (4, 84, 84)` or that the network has this structure in general, so I prefer computing it with the current method.
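For readers following along, the automatic version of this computation is commonly done with a dummy forward pass through the convolutional stack. The sketch below assumes the classic DQN conv architecture and a helper named `fc_input_size`; the tutorial's actual layers and names may differ:

```python
import torch
import torch.nn as nn

# Classic DQN conv stack (an assumption for this sketch).
conv = nn.Sequential(
    nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Flatten(),
)

def fc_input_size(conv_stack: nn.Sequential, obs_shape: tuple) -> int:
    """Return the flattened feature count by pushing a dummy observation through."""
    with torch.no_grad():
        dummy = torch.zeros(1, *obs_shape)  # batch of one zero-filled observation
        return conv_stack(dummy).shape[1]

print(fc_input_size(conv, (4, 84, 84)))  # -> 3136 (= 64 * 7 * 7)
```

This reproduces the 3136 from the manual calculation above and adapts automatically if the observation shape or the conv layers change.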
Fetching and merging latest commits from main.
```python
# --- env related ---
env_name: str = ""
obs_shape: tuple = (1,)  # shape of observation, ex. (1, 4), (84, 84)
n_actions: int = 0
```
Why are `obs_shape` and `n_actions` hyperparameters? They are constants of the environments.
Not only these two; several other parameters in this class are not hyperparameters either. However, this way it is easier to pass these parameters to any function/class that needs them.
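As a sketch of the pattern being described (one dataclass holding both environment constants and hyperparameters, filled in once and passed everywhere), consider the following; the class name and the `learning_rate` field are hypothetical:

```python
from dataclasses import dataclass

import gymnasium as gym

@dataclass
class Parameters:
    # --- env related: constants of the environment, stored here purely
    # so they can be passed around together with the hyperparameters ---
    env_name: str = ""
    obs_shape: tuple = (1,)  # shape of observation, ex. (1, 4), (84, 84)
    n_actions: int = 0
    # --- actual hyperparameters would follow, e.g. ---
    learning_rate: float = 1e-4

env = gym.make("CartPole-v1")
params = Parameters(
    env_name=env.spec.id,
    obs_shape=env.observation_space.shape,
    n_actions=int(env.action_space.n),
)
```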
Merge commits from main.
@pseudo-rnd-thoughts, @Kallinteris-Andreas just checking in on this PR. Is there anything more I need to address, or any additional feedback you can provide?
Is there a reason this PR is closed rather than merged? I would love to help out with this; it seems like most of the work is done.
Description
This PR is for a tutorial I have been working on, namely deep reinforcement learning implementation with pytorch from scratch. There are no changes in other code/files; this PR only adds a new file, along with images and gifs.
Fixes # (issue)
Type of change
Screenshots
Checklist:
- I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` for instructions to set it up)