PyTorch RL learns bad policy with default parameters #215
Comments
After training for 30 000 episodes on "straight_road", with the robot always starting in the ideal position, it still does not learn to drive forwards and always turns straight off the map. Is something broken?
I have also mostly seen this behavior. @bhairavmehta95 or @Velythyl might have more insight.
I saw the same behaviour when I started working on that repo. I noticed a few bugs in the code and have fixed them. I was going to open a PR, but I turned my attention to imitation learning lately, so I didn't finish it. I'll get on it tomorrow. I still have to clean up my code and commits, but I should be able to open the PR either tomorrow or the day after (I'll have to train it to make sure everything works, and that takes a lot of time). With the fixes, the car converges to either turning in a circle or going straight (it has a really hard time with curves). I didn't test it with a really long training time because of hardware constraints, though, so that might be the reason.
It's still training, but just to be sure I've fixed the faulty behaviour: does this seem better to you? Right now it's only at 60k timesteps, so I'm sure it could improve with more computing power. It's not just turning in circles anymore, though it still likes turning more than going straight (but again, I think that quirk might disappear with more training). It feels consistent with a "very early RL training that's still exploring the action space" model. If this seems good, I will open the PR.
Okay, I just saw it take two turns in a row at 90k timesteps, so I'll consider this fixed. I'll open the PR.
That looks a lot more promising! In my experience the algorithm very quickly learns a max turning angle and really does not want to change. I'm very interested in the fixes, thanks a lot for your help. I'll start training for the night; it should do at least 1M timesteps by tomorrow. After experimenting a little further, this code seems to deviate from the original DDPG by taking as many gradient steps as there were timesteps in the episode. That seems a little excessive, and it did perform better after I reduced the number of gradient steps per episode.
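The gradient-step adjustment described in the comment above could be sketched roughly as follows. This is only an illustration, not the repo's code: the function name and the `updates_per_step` and `max_updates` parameters are hypothetical, and the comment does not say which reduction was actually used.

```python
def gradient_steps_for_episode(episode_timesteps, updates_per_step=1.0, max_updates=None):
    """Return how many gradient updates to run once an episode has ended.

    With updates_per_step=1.0 and max_updates=None this matches the behaviour
    described in the thread: one update per timestep of the finished episode.
    Both knobs are hypothetical ways of reducing that number.
    """
    if episode_timesteps < 0:
        raise ValueError("episode_timesteps must be non-negative")
    # Scale the update count by a ratio, rounding down.
    steps = int(episode_timesteps * updates_per_step)
    # Optionally cap the number of updates regardless of episode length.
    if max_updates is not None:
        steps = min(steps, max_updates)
    return steps


# Example: a 500-step episode, capped at 64 updates.
for _ in range(gradient_steps_for_episode(500, max_updates=64)):
    pass  # one replay-buffer minibatch update would go here
```

A ratio keeps the update count proportional to the data collected, while a cap bounds the time spent training after very long episodes. Which of the two (if either) works better here would need to be checked experimentally.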
I don't have the rights to link issues or assign reviewers, but here is the PR: duckietown/challenge-aido_LF-baseline-RL-sim-pytorch#33
Hi! I am trying to train the DDPG model (with CNN) in the 'Duckietown-loop_pedestrians-v0' environment; however, it took roughly a day to get to 1434 steps, which is far from the 90k steps mentioned above. Is that normal? It is currently running on a 1080 Ti.
No, that is not normal. Are you sure that it's using your 1080 Ti and not your CPU? On a 2080 Ti, it gets to 1500 extremely quickly, in less than an hour iirc.
Using |
I was about to launch a run; I can ping you with the time it took me to reach 1500 training steps for you to use as a reference. Additionally, it could be a good idea to do something like |
Thanks! I was testing that line and I got cuda. Just to make sure: "step" here refers to "total_timesteps".
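For readers who want to run a similar check, a minimal sketch is given here. It is not necessarily the line that was elided from the comments above; it only uses `torch.cuda.is_available()`, and the helper name `pick_device` is made up for illustration.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu".

    Falls back to "cpu" when PyTorch is not installed at all, so the
    check itself never crashes.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Printing this once at the start of training shows where the model will run.
print(pick_device())
```

If this prints `cpu` on a machine with a GPU, the usual suspects are a CPU-only PyTorch build or a driver mismatch, which would be consistent with the very slow training reported above.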
Okay, so I got to 1500 total timesteps in about 3 minutes, just so you know what to aim for. Can you print me your duckietown gym version and the version of this repo using |
Can you explain how to get the duckietown gym version? I am somewhat new to gym.
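One generic way to look up the version of an installed Python package is the standard-library `importlib.metadata` module (Python 3.8+). The sketch below is an assumption about what would be useful here, not the command elided from the earlier comment, and the distribution name `gym-duckietown` is a placeholder that may differ from the name the package is actually installed under.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "gym-duckietown" is a placeholder; use the name shown by `pip list`.
print(installed_version("gym-duckietown"))
```

For a repo that was cloned rather than installed, the commit hash from `git rev-parse HEAD` is usually the more useful identifier.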
Ah wait, I just realized we're in the Gym repo and not the RL repo. Try following this instead, and let me know if you run into any issues: https://docs.duckietown.org/daffy/AIDO/out/embodied_rl.html This will walk you through installing the gym and uses an updated RL training script compared to what's in the gym-duckietown repo.
Ah, that makes sense. I will update asap. Thanks!
Problem fixed. Thanks @Velythyl
Hi @Velythyl I have a question about the RL repo. In this code there is this line (114) |
Has anyone successfully trained a good policy with the current default parameters using the RL template? After training multiple times for over 1 million timesteps (60 000 episodes) the only policy that has been learned is to turn in a circle.
I tried using the SteeringToWheelVelWrapper to learn only the heading with a fixed velocity of 0.5, but this did not fix the issue.
I also tried to limit the number of rotations allowed, resetting the gym after more than 4 * 360 deg of angle difference. However, neither of these approaches worked.
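The rotation limit described above might be implemented along the following lines. This is a sketch under assumptions, since the original code is not shown: the class name `RotationLimiter` is hypothetical, headings are taken to be in radians, and "angle difference" is interpreted as the net signed rotation since the last reset.

```python
import math


class RotationLimiter:
    """Track net rotation and flag when it exceeds a number of full turns."""

    def __init__(self, max_turns=4.0):
        self.max_angle = max_turns * 2.0 * math.pi  # 4 * 360 deg in radians
        self.prev_heading = None
        self.total = 0.0

    def reset(self, heading):
        """Call at the start of every episode with the initial heading."""
        self.prev_heading = heading
        self.total = 0.0

    def update(self, heading):
        """Feed the current heading; returns True once the limit is exceeded."""
        if self.prev_heading is None:
            self.reset(heading)
            return False
        # Wrap the per-step change into [-pi, pi) so crossing 0/2*pi is not a jump.
        delta = (heading - self.prev_heading + math.pi) % (2.0 * math.pi) - math.pi
        self.total += delta
        self.prev_heading = heading
        return abs(self.total) > self.max_angle
```

In a training loop, `update` would be called after each environment step with the robot's current heading, and the episode would be ended when it returns True. Because the rotation is signed, wiggling left and right does not count toward the limit; only sustained spinning in one direction does.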
Should I train much longer or is something broken?
Looking at the reward over time for a run of over 4000 episodes, further training does not appear likely to produce anything useful.
(Validation reward is in blue)