
Pytorch RL learns bad policy with default parameters #215

Open
thomas-w-nl opened this issue Jun 7, 2020 · 19 comments

@thomas-w-nl commented Jun 7, 2020

Has anyone successfully trained a good policy with the current default parameters using the RL template? After training multiple times for over 1 million timesteps (60,000 episodes), the only policy it ever learns is to turn in a circle.
I tried using the SteeringToWheelVelWrapper to learn only the heading with a fixed velocity of 0.5 (rough sketch below), but this did not fix the issue.
I also tried limiting the number of rotations allowed, resetting the gym after more than 4 × 360 degrees of accumulated angle difference. None of these approaches work.
Should I train much longer, or is something broken?
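
For concreteness, the fixed-velocity setup was roughly along these lines (my own sketch, not code from the repo; it assumes the environment is already wrapped so that actions are (velocity, steering) pairs, as with SteeringToWheelVelWrapper):

```python
import gym
import numpy as np

class FixedVelocityWrapper(gym.ActionWrapper):
    """Sketch: the agent only outputs steering; forward velocity is pinned to 0.5.

    Assumes the wrapped env takes (velocity, steering) actions, e.g. an env
    already wrapped with SteeringToWheelVelWrapper.
    """

    def __init__(self, env, velocity=0.5):
        super().__init__(env)
        self.velocity = velocity
        # The agent now only controls a single steering value in [-1, 1].
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def action(self, action):
        steering = float(np.clip(action[0], -1.0, 1.0))
        # Forward the fixed velocity plus the learned steering to the inner env.
        return np.array([self.velocity, steering], dtype=np.float32)
```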

Looking at the reward over time for a run of more than 4,000 episodes, training longer does not appear to lead to anything useful.
[image: training and validation reward over ~4,000 episodes; validation reward in blue]

@thomas-w-nl (Author)

After training for 30,000 episodes on "straight_road", with the robot always starting in the ideal position, it still does not learn to drive forward and always turns straight off the map. Is something broken?

@liampaull (Member)

I have also mostly seen this behavior. @bhairavmehta95 or @Velythyl might have more insight.

@Velythyl (Collaborator)

I had the same behaviour when I started working on that repo.

I noticed a few bugs in the code and have fixed them. I was going to open a PR, but I have turned my attention to imitation learning lately, so I didn't finish it.

I'll get on it tomorrow. I still have to clean up my code and commits, but I should be able to open the PR either tomorrow or the day after (I'll have to train it to make sure everything works, and that takes a lot of time).

With the fixes, the car converges to either turning in a circle or going straight (it has a really hard time with curves). I didn't test it with a really long training time because of hardware constraints, though, so that might be it.

@Velythyl (Collaborator) commented Jun 15, 2020

It's still training, but just to be sure I've fixed the faulty behaviour: does this seem better to you? Right now it's only at 60k timesteps, so I'm sure it could become better with more computing power. It's not just turning in circles anymore, though it still likes turning more than going straight (but again, I think that quirk might disappear with more training).

It feels consistent with a "very early RL training that's still exploring the action space" model.

[gif: agent behaviour after the fixes, at roughly 60k timesteps]

If this seems good I will open the PR.

@Velythyl (Collaborator)

Okay, I just saw it take two turns in a row at 90k timesteps, so I'll consider this fixed. I'll open the PR.

@thomas-w-nl (Author) commented Jun 15, 2020

That looks a lot more promising! In my experience the algorithm very quickly learns a maximum turning angle and really does not want to change it. I'm very interested in the fixes; thanks a lot for your help. I'll start training overnight; it should do at least 1M timesteps by tomorrow.

After experimenting a little further: this code seems to deviate from the original DDPG by taking as many gradient steps as there were timesteps in the episode. That does seem a little excessive, and it performed better after I reduced the number of gradient steps per episode (see the sketch below).
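
To illustrate what I mean by reducing the gradient steps (a sketch only; the names policy, replay_buffer, and the train(...) signature follow the common DDPG template and may not match this repo exactly):

```python
def train_after_episode(policy, replay_buffer, episode_timesteps,
                        max_grad_steps=64, batch_size=32):
    """Take at most `max_grad_steps` gradient updates after an episode.

    The template effectively does
        policy.train(replay_buffer, iterations=episode_timesteps, ...)
    i.e. one gradient update per environment step taken during the episode.
    Capping the number of updates per episode was enough to improve
    behaviour in my runs; 64 is a tuning choice, not a value from the repo.
    """
    iterations = min(episode_timesteps, max_grad_steps)
    policy.train(replay_buffer, iterations=iterations, batch_size=batch_size)
```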

@Velythyl (Collaborator)

I don't have the rights to link issues or assign reviewers, but here is the PR: duckietown/challenge-aido_LF-baseline-RL-sim-pytorch#33

@Max-Fu commented Nov 19, 2020

Hi! I am trying to train the DDPG model (with CNN) in the 'Duckietown-loop_pedestrians-v0' environment; however, it took roughly a day to get to 1,434 steps, which is far from the 90k steps mentioned above. Is that normal? Currently it is running on a 1080 Ti.

@Velythyl (Collaborator)

No, that is not normal. Are you sure that it's using your 1080 Ti and not your CPU? On a 2080 Ti it gets to 1,500 extremely quickly; less than an hour, IIRC.

@Max-Fu commented Nov 19, 2020

Using nvidia-smi, I see that the program uses 1696 MiB with batch size 32. That does seem on the small side to me.

@Velythyl (Collaborator) commented Nov 19, 2020

I was about to launch a run; I can ping you with the time it took me to reach 1,500 training steps for you to use as a reference.

Additionally, it could be a good idea to do something like print("cuda" if torch.cuda.is_available() else "cpu") to check which device is actually being used (see the sketch below).
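
Something like this quick check can confirm the network is actually on the GPU (just a sketch; policy is a placeholder for whatever the training script calls the model):

```python
import torch

# Report the device PyTorch will pick.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Optionally, verify the network's parameters really live on the GPU.
# `policy` is a placeholder name; use whatever the training script defines.
# print(next(policy.parameters()).device)  # expect "cuda:0"
```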

@Max-Fu commented Nov 19, 2020

Thanks! I tested that line and I got cuda. Just to make sure: by "step" I am referring to "total_timesteps".

@Velythyl (Collaborator)

Okay, so I got to 1,500 total timesteps in about 3 minutes, just so you know what to aim for.

Can you send me your duckietown gym version and the version of this repo (from git branch)? Both should be daffy.

@Max-Fu commented Nov 19, 2020

The output of git branch is:
* daffy

Can you explain how to get the duckietown gym version? I am somewhat new to gym.

@Velythyl (Collaborator)

Ah, wait, I just realized we're in the gym repo and not the RL repo. Try following this instead, and let me know if you run into any issues: https://docs.duckietown.org/daffy/AIDO/out/embodied_rl.html

This will walk you through installing the gym, and it uses an updated RL training script compared to what's in the gym-duckietown repo.

@Max-Fu commented Nov 19, 2020

Ah, that makes sense. I will update ASAP. Thanks!

@Max-Fu commented Nov 19, 2020

Problem fixed. Thanks @Velythyl

@Max-Fu commented Nov 21, 2020

@Velythyl I posted a new issue (here). I think the key reason this happens is that we might have a wrong reward function. I have restarted training with a temporary fix, and will let you know if that resolves the problem.

@SebaVGit commented Jan 4, 2021

Hi @Velythyl, I have a question about the RL repo. In this code there is this line (114): if action[0] < 0.001: #Penalise slow actions: helps the bot to figure out that going straight > turning in circles reward = 0 (laid out below for readability).
I cannot understand what this line is doing. Also, when I try to run this RL for about 500,000 steps, it just spins around and never goes straight. What could be happening?
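
For readability, here is how I read the quoted line (a sketch based only on the snippet above; the surrounding code in the repo may differ):

```python
# action[0] is presumably the commanded forward velocity. When it is close to
# zero, the step's reward is zeroed out, so spinning in place without moving
# forward stops being rewarded.
if action[0] < 0.001:  # Penalise slow actions: helps the bot figure out that going straight > turning in circles
    reward = 0
```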
