
Pytorch RL learns bad policy with default parameters #215

Open
thomas-w-nl opened this issue Jun 7, 2020 · 19 comments

@thomas-w-nl commented Jun 7, 2020

Has anyone successfully trained a good policy with the current default parameters using the RL template? After training multiple times for over 1 million timesteps (60,000 episodes), the only policy it ever learns is to turn in a circle.
I tried using the SteeringToWheelVelWrapper to learn only the heading with a fixed velocity of 0.5 (rough sketch below), but this did not fix the issue.
I also tried limiting the number of rotations allowed, resetting the gym after more than 4 × 360 degrees of accumulated angle difference. None of these approaches work.
Should I train much longer, or is something broken?
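
For concreteness, the fixed-velocity setup was roughly along these lines (my own sketch, not code from the repo; it assumes the environment is already wrapped so that actions are (velocity, steering) pairs, as with SteeringToWheelVelWrapper):

```python
import gym
import numpy as np

class FixedVelocityWrapper(gym.ActionWrapper):
    """Sketch: the agent only outputs steering; forward velocity is pinned to 0.5.

    Assumes the wrapped env takes (velocity, steering) actions, e.g. an env
    already wrapped with SteeringToWheelVelWrapper.
    """

    def __init__(self, env, velocity=0.5):
        super().__init__(env)
        self.velocity = velocity
        # The agent now only controls a single steering value in [-1, 1].
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def action(self, action):
        steering = float(np.clip(action[0], -1.0, 1.0))
        # Forward the fixed velocity plus the learned steering to the inner env.
        return np.array([self.velocity, steering], dtype=np.float32)
```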

Looking at the reward over time for a run of more than 4,000 episodes, training longer does not appear to lead to anything useful.
[image: training and validation reward over ~4,000 episodes; validation reward in blue]

@thomas-w-nl (Author)

After training for 30,000 episodes on "straight_road", with the robot always starting in the ideal position, it still does not learn to drive forward and always turns straight off the map. Is something broken?

@liampaull (Member)

I have also mostly seen this behavior. @bhairavmehta95 or @Velythyl might have more insight.

@Velythyl (Collaborator)

I had the same behaviour when I started working on that repo.

I noticed a few bugs in the code and have fixed them. I was going to open a PR, but I have turned my attention to imitation learning lately, so I didn't finish it.

I'll get on it tomorrow. I still have to clean up my code and commits, but I should be able to open the PR either tomorrow or the day after (I'll have to train it to make sure everything works, and that takes a lot of time).

With the fixes, the car converges to either turning in a circle or going straight (it has a really hard time with curves). I didn't test it with a really long training time because of hardware constraints, though, so that might be it.

@Velythyl (Collaborator) commented Jun 15, 2020

It's still training, but just to be sure I've fixed the faulty behaviour: does this seem better to you? Right now it's only at 60k timesteps, so I'm sure it could become better with more computing power. It's not just turning in circles anymore, though it still likes turning more than going straight (but again, I think that quirk might disappear with more training).

It feels consistent with a "very early RL training that's still exploring the action space" model.

[gif: agent behaviour after the fixes, at roughly 60k timesteps]

If this seems good I will open the PR.

@Velythyl (Collaborator)

Okay, I just saw it take two turns in a row at 90k timesteps, so I'll consider this fixed. I'll open the PR.

@thomas-w-nl (Author) commented Jun 15, 2020

That looks a lot more promising! In my experience the algorithm very quickly learns a maximum turning angle and really does not want to change it. I'm very interested in the fixes; thanks a lot for your help. I'll start training overnight; it should do at least 1M timesteps by tomorrow.

After experimenting a little further: this code seems to deviate from the original DDPG by taking as many gradient steps as there were timesteps in the episode. That does seem a little excessive, and it performed better after I reduced the number of gradient steps per episode (see the sketch below).
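
To illustrate what I mean by reducing the gradient steps (a sketch only; the names policy, replay_buffer, and the train(...) signature follow the common DDPG template and may not match this repo exactly):

```python
def train_after_episode(policy, replay_buffer, episode_timesteps,
                        max_grad_steps=64, batch_size=32):
    """Take at most `max_grad_steps` gradient updates after an episode.

    The template effectively does
        policy.train(replay_buffer, iterations=episode_timesteps, ...)
    i.e. one gradient update per environment step taken during the episode.
    Capping the number of updates per episode was enough to improve
    behaviour in my runs; 64 is a tuning choice, not a value from the repo.
    """
    iterations = min(episode_timesteps, max_grad_steps)
    policy.train(replay_buffer, iterations=iterations, batch_size=batch_size)
```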

@Velythyl (Collaborator)

I don't have the rights to link issues or assign reviewers, but here is the PR: duckietown/challenge-aido_LF-baseline-RL-sim-pytorch#33

@Max-Fu commented Nov 19, 2020

Hi! I am trying to train the DDPG model (with CNN) in the 'Duckietown-loop_pedestrians-v0' environment; however, it took roughly a day to get to 1,434 steps, which is far from the 90k steps mentioned above. Is that normal? Currently it is running on a 1080 Ti.

@Velythyl (Collaborator)

No, that is not normal. Are you sure that it's using your 1080 Ti and not your CPU? On a 2080 Ti it gets to 1,500 extremely quickly; less than an hour, IIRC.

@Max-Fu commented Nov 19, 2020

Using nvidia-smi, I see that the program uses 1696 MiB with batch size 32. That does seem on the small side to me.

@Velythyl (Collaborator) commented Nov 19, 2020

I was about to launch a run; I can ping you with the time it took me to reach 1,500 training steps for you to use as a reference.

Additionally, it could be a good idea to do something like print("cuda" if torch.cuda.is_available() else "cpu") to check which device is actually being used (see the sketch below).
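
Something like this quick check can confirm the network is actually on the GPU (just a sketch; policy is a placeholder for whatever the training script calls the model):

```python
import torch

# Report the device PyTorch will pick.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Optionally, verify the network's parameters really live on the GPU.
# `policy` is a placeholder name; use whatever the training script defines.
# print(next(policy.parameters()).device)  # expect "cuda:0"
```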

@Max-Fu commented Nov 19, 2020

Thanks! I tested that line and I got cuda. Just to make sure: by "step" I am referring to "total_timesteps".

@Velythyl (Collaborator)

Okay, so I got to 1,500 total timesteps in about 3 minutes, just so you know what to aim for.

Can you send me your duckietown gym version and the version of this repo (from git branch)? Both should be daffy.

@Max-Fu commented Nov 19, 2020

The output of git branch is:
* daffy

Can you explain how to get the duckietown gym version? I am somewhat new to gym.

@Velythyl (Collaborator)

Ah, wait, I just realized we're in the gym repo and not the RL repo. Try following this instead, and let me know if you run into any issues: https://docs.duckietown.org/daffy/AIDO/out/embodied_rl.html

This will walk you through installing the gym, and it uses an updated RL training script compared to what's in the gym-duckietown repo.

@Max-Fu commented Nov 19, 2020

Ah, that makes sense. I will update ASAP. Thanks!

@Max-Fu commented Nov 19, 2020

Problem fixed. Thanks @Velythyl

@Max-Fu commented Nov 21, 2020

@Velythyl I posted a new issue (here). I think the key reason this happens is that we might have a wrong reward function. I have restarted training with a temporary fix, and will let you know if that resolves the problem.

@SebaVGit commented Jan 4, 2021

Hi @Velythyl, I have a question about the RL repo. In this code there is this line (114): if action[0] < 0.001: #Penalise slow actions: helps the bot to figure out that going straight > turning in circles reward = 0 (laid out below for readability).
I cannot understand what this line is doing. Also, when I try to run this RL for about 500,000 steps, it just spins around and never goes straight. What could be happening?
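
For readability, here is how I read the quoted line (a sketch based only on the snippet above; the surrounding code in the repo may differ):

```python
# action[0] is presumably the commanded forward velocity. When it is close to
# zero, the step's reward is zeroed out, so spinning in place without moving
# forward stops being rewarded.
if action[0] < 0.001:  # Penalise slow actions: helps the bot figure out that going straight > turning in circles
    reward = 0
```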
