
Run one Sonic episode and return behavior characterization and return #23

Closed
schrum2 opened this issue Jun 17, 2019 · 7 comments

schrum2 (Collaborator) commented Jun 17, 2019

To help us complete #16, you need to complete this sub-issue. Rather than change the existing PPO code, it might make sense to copy it and modify the copy instead.

We need code that learns with PPO until Sonic dies OR some time limit is reached (we may add other restrictions as well). At that stopping point, the method should return two things:

  1. The overall return (sum of rewards) or basically some form of objective fitness score, AND
  2. The behavior characterization for Novelty Search (this is the more important one). This should be an ordered list of all the (x,y) coordinates Sonic occupied over the course of the evaluation.

Important thing to check:

  • If the time limit is set and Sonic lives for the whole evaluation, is the length of the behavior characterization always the same? (See the sketch below.)
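
Here is a minimal sketch of what that evaluate contract could look like, assuming a Gym-style Sonic environment whose info dict reports Sonic's position (the "x"/"y" keys, the policy callable, and the time_limit default are illustrative assumptions, not names from our code):

    def evaluate(env, policy, time_limit=1000):
        """Run one episode; return (episode_return, behavior_characterization)."""
        obs = env.reset()                  # reset exactly once, at the start
        episode_return = 0.0
        behavior = []                      # ordered list of (x, y) positions
        for _ in range(time_limit):
            obs, reward, done, info = env.step(policy(obs))
            episode_return += reward       # accumulate the objective fitness
            behavior.append((info["x"], info["y"]))  # assumed position keys
            if done:                       # Sonic died or the level ended
                break
        return episode_return, behavior

Note that with a sketch like this, len(behavior) equals time_limit only when Sonic survives the entire evaluation; an early death yields a shorter list, which is exactly why the length question above matters for Novelty Search distance calculations.
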
schrum2 added a commit that referenced this issue Jun 20, 2019
#24 #23 @nazaruka I committed this to my own new branch dev_schrum, but you can feel free to check out that branch and edit things, or merge this into your master branch. I didn't commit this to master since I thought there might be uncommitted changes on your machine (don't do that in the future).

In any case, running this version of NSGAII will now show Sonic being controlled by a NEAT network. Evolution definitely doesn't work yet, though. Basically, what this does is create a NEAT population, but the network you see evaluated is just the same one over and over rather than distinct members of the population. Also, after all of the individuals are evaluated, the code crashes because the fitness objectives are still the ones that originally came with the NSGA-II code. Here is what you should do next if you think working with this would be useful (but if messing with your own code instead seems more fruitful, just do that in your own branch):

1. Add the behavior characterization tracking to this.
2. Make the evaluate method return the behavior characterization along with the fitness.

If you get that far, then we can move on to more interesting stuff.
nazaruka (Owner) commented:

We now have a proper PyTorch implementation running alongside NSGA-II, but the values and some other characteristics of its execution are still a bit wonky.

  • For starters, there's a strange requirement in storage.py that the number of mini-batches must be less than or equal to the number of processes. This means we can only use one mini-batch if we intend to use just one process (see the sketch after this list).
  • Additionally, we are still using code that may not be relevant to ours, such as iterating through "genomes" of a NEAT "population." This alone warrants a large-scale inspection and cleanup, for we are not sure exactly how much we need to get rid of.
  • Perhaps the most pressing issue is that Sonic does run and render (woo!), but there is a considerable pause every eight seconds.
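
As a sketch of why that mini-batch constraint exists (the function and message here are illustrative, not the actual storage.py code): a recurrent mini-batch generator splits the rollout by process, so it cannot produce more mini-batches than there are processes:

    def envs_per_mini_batch(num_processes, num_mini_batch):
        # Splitting whole per-process trajectories means every mini-batch
        # needs at least one process's worth of data.
        assert num_processes >= num_mini_batch, (
            "cannot split {} process(es) into {} mini-batches".format(
                num_processes, num_mini_batch))
        return num_processes // num_mini_batch

    envs_per_mini_batch(1, 1)  # with one process, only one mini-batch works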

schrum2 (Collaborator, Author) commented Jun 26, 2019

Prioritize finding the cause of the pausing, but also start to gradually remove any unnecessary code. In particular, any leftover code associated with the NEAT networks and population should be gradually removed. Do this one small step at a time, with frequent commits along the way.

nazaruka (Owner) commented:

Found the culprit, at line 225:

    rollouts.compute_returns(next_value, use_gae=False, gamma=0.99, gae_lambda=None, use_proper_time_limits=True)

The code pauses about every eight seconds because that is how long it takes to complete 128 steps, which is the value we set num_steps to in the second loop. I set these values practically for the hell of it; I'm not yet confident about what would be optimal for PPO. compute_returns is a method in storage.py that essentially chooses among loops in the following manner:

    if use_proper_time_limits:
        if use_gae: loop 1 (`gae` corrected via `bad_masks`)
        else:       loop 2 (returns corrected via `bad_masks`)
    else:
        if use_gae: loop 1
        else:       loop 2

Running and rendering the original code also has it pause every 128 steps, which makes me inclined to think that it's not meant for rendering. Still, how can we optimize this?
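
For what it's worth, the pause pattern matches the usual PPO rollout/update cycle: the environment steps (and renders) in real time for num_steps steps, then everything freezes while compute_returns and the PPO epochs run. A self-contained toy sketch of that timing (the sleep durations are made-up stand-ins, not measurements):

    import time

    NUM_STEPS = 128                 # rollout length, as in our second loop

    def fake_env_step():
        time.sleep(1 / 16)          # ~16 steps/sec, so 128 steps take ~8 s

    def fake_update():
        time.sleep(1.0)             # stand-in for compute_returns + PPO epochs

    for update in range(3):
        for step in range(NUM_STEPS):
            fake_env_step()         # rendering happens here, in real time
        fake_update()               # the visible pause, once per 128 steps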

nazaruka (Owner) commented:

So I've cleaned up a lot of the NEAT-related code but still kept the loop intact - I will address that soon after lunch. Running without rendering, I get this error:

Traceback (most recent call last):
  File "NSGAII.py", line 298, in <module>
    fitness, behavior_char = evaluate(envs,net,actor_critic)
  File "NSGAII.py", line 145, in evaluate
    ob = envs.reset()
  File "C:\Users\Admin\Desktop\Southwestern\SCOPE\Files\Repo\gym-http-api\NSGA2\helpers\envs.py", line 257, in reset
    obs = self.venv.reset()
  File "C:\Users\Admin\Desktop\Southwestern\SCOPE\Files\Repo\gym-http-api\NSGA2\helpers\envs.py", line 183, in reset
    obs = self.venv.reset()
  File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\common\vec_env\dummy_vec_env.py", line 60, in reset
    obs = self.envs[e].reset()
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\gym\core.py", line 308, in reset
    observation = self.env.reset(**kwargs)
  File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\bench\monitor.py", line 36, in reset
    self.reset_state()
  File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\bench\monitor.py", line 46, in reset_state
    raise RuntimeError("Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)")
RuntimeError: Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)

I'm going to print within the main loop to see if we're calling another evaluate by any chance; if not, I'll probably have to wrap the environment.
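
For reference, the change the error message suggests would look like this, using the standard baselines bench.Monitor signature (the log path here is a placeholder):

    from baselines import bench

    # Allow reset() to be called before the episode is done:
    env = bench.Monitor(env, "logs/monitor", allow_early_resets=True)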

schrum2 (Collaborator, Author) commented Jun 26, 2019

Wrapping the environment seems wrong. I think that envs.reset() should only be called at the very start of evaluate ... anywhere else would be wrong, and this seemed to be working before.

nazaruka (Owner) commented:

Got behavior characterization and cumulative reward to work, but the episode still seems to crash after that point. The next step is to ensure that the code can run several episodes in succession.
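
A rough sketch of what running episodes in succession should look like, with exactly one reset per episode inside the NSGA-II generation loop (build_policy and archive are hypothetical names; evaluate is the contract sketched earlier in this thread):

    # Each individual gets its own full episode; evaluate() resets the
    # environment exactly once, at the top, avoiding the early-reset error.
    for genome in population:
        policy = build_policy(genome)        # hypothetical helper
        fitness, behavior = evaluate(envs, policy)
        genome.fitness = fitness
        archive.append(behavior)             # stored for Novelty Search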

schrum2 (Collaborator, Author) commented Jun 27, 2019

I'm going to declare this issue solved, but shift some of the unresolved issues to #24.

schrum2 closed this as completed Jun 27, 2019