[Feature Request] Add Maze-like env #2
Comments
Hi @carlosluis! This is actually a very important suggestion, and we plan to add procedural generation in some form sooner or later anyway. However, in our experience (and this is one of the reasons it is still not there), procedural map generation, such as recursive maze generation algos, is quite difficult to express in an efficient and jit-compatible way. There are some successful examples though, for example in Jumanji or in minimax. Unfortunately, we're unlikely to do this anytime soon (it's planned for post v1.0, ~2-3 months), as we're currently busy getting XLand-MiniGrid to a full paper and are focused on the meta-RL part (benchmarks). We welcome any contributions, though: grid randomization will definitely add new challenges for meta-learning, and it would also allow porting the procedural multi-room envs from the original MiniGrid. So it's a highly valuable addition. P.S. I think maze exploration alone is not a meta-RL problem, since a new maze can be solved zero-shot without the need for adaptation, only generalization (like ProcGen).
Maybe it is worth trying to add a simple procedural generation algorithm to test the concept; it might not be that hard. JAX can be paired with recursive algorithms (for tree search, for example), and a simple example could be a way to start. Sounds promising 🤗
There's another problem at the moment: the agent can see through walls 🥲! Unfortunately, a naive port of the FOV algorithm from MiniGrid slows things down too much (it is available in the current version, but disabled). We haven't come up with a replacement yet, although we've tried different things (like simple ray casting). Without it, I think mazes will be easy enough to solve. We are open to any suggestions/help on this! For now we just reduce the FOV size in most cases to make it a bit harder.
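To make the "simple ray casting" idea concrete, here is a minimal pure-Python sketch of occlusion by walls. It is not the MiniGrid FOV algorithm and not the maintainers' attempt, and it is not jit-compatible as written; the names `line_cells` and `visible_mask` are hypothetical. A cell counts as visible if no wall lies strictly between the agent and that cell along a Bresenham line.

```python
def line_cells(r0, c0, r1, c1):
    """Grid cells on the Bresenham line from (r0, c0) to (r1, c1), inclusive."""
    cells = []
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r1 >= r0 else -1
    step_c = 1 if c1 >= c0 else -1
    err = dc - dr
    r, c = r0, c0
    while True:
        cells.append((r, c))
        if (r, c) == (r1, c1):
            return cells
        e2 = 2 * err
        if e2 > -dr:  # step along the column axis
            err -= dr
            c += step_c
        if e2 < dc:   # step along the row axis
            err += dc
            r += step_r


def visible_mask(walls, agent, radius):
    """Boolean mask of cells the agent can see within a square window.

    `walls` is a 2D list of 0/1 (1 = wall). A wall cell itself is visible
    as long as nothing blocks the ray before reaching it.
    """
    rows, cols = len(walls), len(walls[0])
    ar, ac = agent
    mask = [[False] * cols for _ in range(rows)]
    for r in range(max(0, ar - radius), min(rows, ar + radius + 1)):
        for c in range(max(0, ac - radius), min(cols, ac + radius + 1)):
            path = line_cells(ar, ac, r, c)
            # Only the cells strictly between agent and target can occlude.
            mask[r][c] = not any(walls[pr][pc] for pr, pc in path[1:-1])
    return mask


# Hypothetical 5x5 example: a single wall two cells to the right of the agent.
walls = [[0] * 5 for _ in range(5)]
walls[2][2] = 1
mask = visible_mask(walls, agent=(2, 0), radius=4)
```

Each target cell needs its own ray, so the cost grows with the square of the view size times the ray length. A jit-friendly version would need fixed-length rays and masking instead of the data-dependent `while` loop used here.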
Thank you all for having a look at this so quickly! I understand the challenges of jitting procedural generation algos, but why not start simple and take maze generation outside of the jitting? Basically, pre-generate a bunch of mazes on initialization and then sample from this list whenever the meta-RL algorithm asks for a new task. Maybe I'm being naive here and missing a key detail of why this wouldn't work. @Howuhh re: why maze exploration may or may not be a good benchmark for meta-RL: happy to hear your thoughts and arguments here though, I think it's an interesting discussion without a clear right/wrong answer.
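The pre-generation idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the generator is an iterative depth-first backtracker (a swapped-in, commonly used maze algorithm, not the code from the linked Minigrid PR), and `generate_maze`, `build_pool`, and `sample_task` are hypothetical names.

```python
import random

WALL, FREE = 1, 0


def generate_maze(cells_h, cells_w, rng):
    """Carve a perfect maze with an iterative depth-first backtracker.

    Returns a (2*cells_h + 1) x (2*cells_w + 1) grid of WALL/FREE ints.
    """
    height, width = 2 * cells_h + 1, 2 * cells_w + 1
    grid = [[WALL] * width for _ in range(height)]
    grid[1][1] = FREE
    stack = [(1, 1)]  # explicit stack instead of recursion
    while stack:
        r, c = stack[-1]
        # Unvisited cells two steps away (cells sit on odd coordinates).
        neighbours = [
            (r + dr, c + dc)
            for dr, dc in ((-2, 0), (2, 0), (0, -2), (0, 2))
            if 0 < r + dr < height and 0 < c + dc < width
            and grid[r + dr][c + dc] == WALL
        ]
        if not neighbours:
            stack.pop()  # dead end: backtrack
            continue
        nr, nc = rng.choice(neighbours)
        grid[(r + nr) // 2][(c + nc) // 2] = FREE  # remove the wall in between
        grid[nr][nc] = FREE
        stack.append((nr, nc))
    return grid


def build_pool(num_mazes, cells_h, cells_w, seed=0):
    """Generate all mazes once, at initialization, outside any jitted code."""
    rng = random.Random(seed)
    return [generate_maze(cells_h, cells_w, rng) for _ in range(num_mazes)]


def sample_task(pool, rng):
    """Pick one pre-generated maze when a new task is requested."""
    return pool[rng.randrange(len(pool))]


pool = build_pool(num_mazes=8, cells_h=4, cells_w=4, seed=0)
task_rng = random.Random(123)
maze = sample_task(pool, task_rng)
```

In a JAX setup the pool would presumably be stacked into a single array so that the per-task step reduces to indexing with a random integer, which is cheap under jit. The cost moves to storing the pool.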
There are actually two reasons why I haven't done this already. The first is the inconvenience of having to store and download the maps separately in addition to the benchmarks. The second is that a million maps in uint8 can start to take up a lot of GPU memory (height x width x 2 x 8 bits x 1M ~ at least 0.5 GB). That is quite a lot, as GPU memory is highly valuable. We could store them on the CPU instead, but additional FPS benchmarks are needed for that case; maybe the overhead is low. It's probably the only way, though. I'll see if I can get it done in time alongside the main roadmap.
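The memory estimate can be checked with a few lines of arithmetic. The grid sizes below are assumptions for illustration (the comment does not state height and width); the factor of 2 and the one-byte element size come from the formula in the comment, and `pool_size_bytes` is a hypothetical helper.

```python
def pool_size_bytes(height, width, num_maps, channels=2, bytes_per_value=1):
    """Bytes needed to hold `num_maps` grids of shape height x width x channels.

    bytes_per_value=1 corresponds to uint8 (8 bits per value).
    """
    return height * width * channels * bytes_per_value * num_maps


# Assumed grid sizes, each with one million maps.
for side in (10, 16, 100):
    size = pool_size_bytes(side, side, 1_000_000)
    print(f"{side}x{side}: {size / 1e9:.2f} GB")
```

Under these assumptions, a 16x16 grid gives 512,000,000 bytes, which matches the "at least 0.5 GB" figure, and the total grows with the square of the side length.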
I see, that makes it inconvenient, I agree! Also, an appropriate sample size would depend on the size of the maze. Maybe 1M maps is overkill for 10x10 mazes but insufficient for 100x100 mazes. It's hard to tell a priori what a good value would be. That said, I believe you can get a lot of signal regarding the effectiveness of meta-RL exploration with relatively small mazes.
Hi!
Awesome job on the repo!
Feel free to ignore this request if it's not part of your roadmap. It's more of a suggestion to have other types of exploration tasks.
There's partial code on Farama-Foundation/Minigrid#317 to generate feasible mazes (with a unique direct path to the goal, I believe) based off mini-grid envs. Taking that code I was able to generate envs such as these:
I thought it might make for an interesting meta-RL exploration benchmark, i.e., can your algorithm learn to exhaustively explore the maze until it finds the goal? In principle it might not be that different from exploring in an open-space grid, but who knows! Maybe the more constrained state space might even accelerate (or slow down) training progress.
Cheers!