Run one Sonic episode and return behavior characterization and return #23
Comments
#24 #23 @nazaruka I committed this to my own new branch dev_schrum, but feel free to check out that branch and edit things, or merge it into your master branch. I didn't commit this to master since I thought there might be uncommitted changes on your machine (don't do that in the future).
In any case, running this version of NSGAII will now show Sonic being controlled by a NEAT network. Evolution definitely doesn't work yet though. Basically, what this does is create a NEAT population, but the network you see evaluated is just the same one over and over rather than distinct members of the population. Also, after all of the individuals are evaluated, the code crashes because the fitness objectives are still the ones that originally came with the NSGA-II code.
Here is what you should do next if you think working with this would be useful (but you can mess with your own code instead if that seems more fruitful ... just do that in your own branch):
1. Add the behavior characterization tracking to this.
2. Make the evaluate method return the behavior characterization along with the fitness.
If you get that far, then we can move on to more interesting stuff.
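As a rough illustration of steps 1 and 2, here is a minimal sketch of an evaluate function that tracks a behavior characterization while it accumulates reward. Everything in it is an assumption made for illustration: `ToyEnv` is a hypothetical stand-in for the Sonic environment, the `info["x"]`/`info["y"]` keys are made up, and sampling the position every `bc_interval` steps is just one plausible way to define the characterization, not necessarily the one used in this repository.

```python
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for the Sonic environment (not the real API)."""

    def __init__(self, death_step: int = 50):
        self.death_step = death_step  # step at which the episode ends
        self.t = 0
        self.x = 0

    def reset(self) -> int:
        self.t = 0
        self.x = 0
        return self.x

    def step(self, action: int):
        self.t += 1
        self.x += action              # action moves the agent along x
        reward = float(action)        # reward equals progress this step
        done = self.t >= self.death_step
        info = {"x": self.x, "y": 0}  # assumed position keys
        return self.x, reward, done, info


def evaluate(env, policy: Callable[[int], int],
             max_steps: int = 1000,
             bc_interval: int = 10) -> Tuple[List[int], float]:
    """Run one episode; return (behavior characterization, cumulative reward)."""
    obs = env.reset()                 # reset once, at the start of the episode
    behavior: List[int] = []
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        if step % bc_interval == 0:   # sample the position periodically
            behavior.extend([info["x"], info["y"]])
        if done:                      # agent died or the level ended
            break
    return behavior, total_reward
```

Calling `evaluate(ToyEnv(death_step=25), lambda obs: 1)` runs until the toy episode ends at step 25 and returns both values as a tuple, so the caller can hand the reward to the fitness objectives and keep the characterization separately.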
We now have a proper PyTorch implementation running alongside NSGA-II, but the values and some other characteristics of its execution are still a bit wonky.
Prioritize finding the cause of the pausing, but also start removing any unnecessary code. In particular, any leftover code associated with the NEAT networks and population should be gradually removed. Do this one small step at a time, with frequent commits along the way.
Found the culprit. Line 225: The code pauses about every eight seconds because that is the amount of time it takes to complete 128 steps, which we set
Running and rendering the original code also has it pause every 128 steps, which makes me inclined to think that it's not meant for rendering. Still, how can we optimize this?
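One possible reading of the pause, assuming the PPO code follows the common collect-then-update pattern: the agent steps the environment for a fixed number of steps, then stops stepping while the policy is updated on that batch. The sketch below only logs the order of events; `run_rollouts` and its parameters are hypothetical names, and no real environment or network is involved.

```python
from typing import List


def run_rollouts(num_updates: int, num_steps: int = 128) -> List[str]:
    """Log the order of events in a collect-then-update training loop."""
    events: List[str] = []
    for _ in range(num_updates):
        for _ in range(num_steps):
            events.append("step")    # environment advances (and may render)
        events.append("update")      # no stepping here, so rendering stalls
    return events
```

If that reading is right, the pause length depends on the update cost rather than on rendering, so rendering less often would not remove it.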
So I've cleaned up a lot of code that has to do with NEAT but still kept the loop intact - will be addressing that soon after lunch. Running without rendering, I get this error:
I'm going to print within the
Wrapping the environment seems wrong. I think that envs.reset() should only be called at the very start of evaluate ... anywhere else would be wrong, and this seemed to be working before.
Got behavior characterization and cumulative reward to work, but the episode still seems to crash. Next step is to ensure that the code will run several episodes in succession.
I'm going to declare this issue solved, but shift some of the unresolved issues to #24
To help us complete #16 you need to complete this sub-issue. Rather than change the existing PPO code, it might make sense to copy it and modify the copy instead.
We need code that will use PPO and learn with it up until Sonic dies OR some time limit is reached (we may add other restrictions as well), but at that stopping point the method will return two things:
Important thing to check: