Allow running of unmodified envs with original `done` signals #27

garymcintire · 2017-08-04T20:22:14Z

I ran the following command and watched the movies:

python -u rl_teacher/teach.py -p rl -e Humanoid-v1 -n base-rl -w 12

Every episode runs the full 1000 steps. A print statement added to rollouts.py shows that env.step never returns a true 'done'.

Is it supposed to be like this? If so, why?
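
A check like the one described above could be sketched as follows. This is only an illustration, not the code in rollouts.py: `StubEnv`, `rollout_done_steps`, and the `done_at` parameter are hypothetical names, and the stub assumes the classic Gym convention where `step` returns `(obs, reward, done, info)`.

```python
class StubEnv:
    """Hypothetical stand-in for a Gym-style env (not the real Humanoid-v1)."""

    def __init__(self, done_at=None):
        # done_at=None means the env never signals done.
        self.done_at = done_at
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.done_at is not None and self.t >= self.done_at
        return float(self.t), 0.0, done, {}


def rollout_done_steps(env, policy, max_steps=1000):
    """Run one episode; return (steps_taken, whether done was ever True)."""
    obs = env.reset()
    for t in range(1, max_steps + 1):
        obs, reward, done, info = env.step(policy(obs))
        if done:
            return t, True
    return max_steps, False


# An env that never reports done runs until the step limit.
print(rollout_done_steps(StubEnv(), lambda obs: 0))
# An env that reports done on step 5 stops early.
print(rollout_done_steps(StubEnv(done_at=5), lambda obs: 0))
```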

nottombrown · 2017-08-05T04:43:32Z

Hey Gary, this is intentional: as in Deep RL from Human Preferences, we remove the done signals.

You can see the envs.py file for details.

I'd be interested in accepting PRs that make it easy to run the unmodified environments as well as the modified ones.

See the following issue:
#5
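
For illustration, a minimal sketch of one way to make the done signal optional. It is not the implementation in envs.py: `DoneMaskingEnv`, the `mask_done` flag, the `original_done` info key, and `FallingEnv` are all hypothetical names, and the sketch assumes the classic Gym convention where `step` returns `(obs, reward, done, info)`.

```python
class FallingEnv:
    """Hypothetical toy env that reports done on its third step."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), 0.0, self.t >= 3, {}


class DoneMaskingEnv:
    """Wrap an env and optionally hide its done signal from the agent."""

    def __init__(self, env, mask_done=True):
        self.env = env
        self.mask_done = mask_done

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.mask_done:
            # Keep the real value in info so it can still be inspected.
            info = dict(info, original_done=done)
            done = False
        return obs, reward, done, info


def run_steps(env, n):
    """Take n steps with a dummy action and collect the done flags."""
    env.reset()
    return [env.step(0)[2] for _ in range(n)]


print(run_steps(DoneMaskingEnv(FallingEnv(), mask_done=True), 3))
print(run_steps(DoneMaskingEnv(FallingEnv(), mask_done=False), 3))
```

With `mask_done=False` the wrapper passes the environment's own done signal through unchanged, which is the "unmodified" behaviour requested here.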

garymcintire · 2017-08-05T19:22:15Z

Thanks for clarifying

nottombrown · 2017-08-07T22:22:13Z

I'm leaving this open because it's a separate issue from #5

nottombrown · 2017-08-07T22:23:41Z

Ah, actually this is already an open issue. Closing in favor of #12

nottombrown changed the title ~~Humanoid does not seem to see done~~ Allow running of unmodified Mujoco environments with original done signals Aug 5, 2017

nottombrown changed the title ~~Allow running of unmodified Mujoco environments with original done signals~~ Allow running of unmodified envs with original done signals Aug 5, 2017

nottombrown closed this as completed Aug 7, 2017
