-
-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a OneOf space for exclusive unions #812
Conversation
Spending some time thinking about this new space, I'm not sure about adding it. Therefore, I would currently be in favour of closing this PR and not merging but if I have missed something, please say |
I don't think that's true? By design, it includes the index of the space. So in a space
There are two things here. Is it currently rarely in use? Yes, because it doesn't exist. People will naturally gravitate to things that exist and are supported. If we add it, will it still be rarely in use? Probably to some extent, same as the other "weird" spaces like graph or text. But I think it's likely that it will see some interesting use - certain video games (stacraft), desktop environments, maybe even some industrial simulations. It's basically the generic version of "You can do A or B", with A or B having some internal structure. Finally, it really makes sense in terms of theory. Ask any programming language theory nerd, they'll tell you that algebraic datatypes are the best thing ever. Every time you write You could get more creative and use it for observations - resizeable observations, custom in-environment error handling, active perception (do you want to read some in-game text, or look around?) My point with research tooling is always that it should be as flexible as reasonably possible. I don't think adding this space has significant downsides or that much maintenance overhead, but it adds a ton of flexibility, and allows expressing things that are very natural to think about. A reasonable contrib-y middle ground might be a new |
Hello, I am a member of The University of Georgia's THINC Lab, which focuses on RL, IRL, and Robotics. (see article, AI Mag Article ) discusses Open Task environments. These are environments where groups of states (and associated actions) change dynamically in the environment. An example we reference is a RideShare domain Where each passenger constitutes a task with actions associated to each. A use case for the OneOf space could be to handle making non-task-specific actions when handling the dynamic nature of an open-task environment. A good example of this type of global action is NOOP.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through the code, I understand much more what is happening.
With a couple of changes, I think we can get this merged quickly.
The primary change I made was to flatten
such that the fill value is the flat_sample
's first value removing the need for nan
support. As a result, nullible
can be removed from Box
.
I added comments to the rest of the PR of proposed changes (along with removing nullible
from Box
)
Finally, you need to add the following to gymnasium/wrappers/utils
@create_zero_array.register(OneOf)
def _create_one_of_zero_array(space: OneOf):
return 0, create_zero_array(space.spaces[0])
This method draws independent samples from the subspaces. | ||
|
||
Args: | ||
mask: An optional tuple of optional masks for each of the subspace's samples, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we support masking of the subspace index? This adds more complexity that might not be needed
@@ -271,7 +271,7 @@ def _flatten_oneof(space: OneOf, x: tuple[int, Any]) -> NDArray[Any]: | |||
max_flatdim = flatdim(space) - 1 # Don't include the index | |||
if flat_sample.size < max_flatdim: | |||
padding = np.full( | |||
max_flatdim - flat_sample.size, np.nan, dtype=flat_sample.dtype | |||
max_flatdim - flat_sample.size, flat_sample[0], dtype=flat_sample.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now wondering if it might be better to use np.empty
instead if we're not using a dedicated placeholder value.
Either way, the array will be filled with some throwaway data, but with empty we're not pretending that this data has any actual meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a quick check and this fails test_flat_space_contains_flat_points
as the flattened version might not be contained within the flattened space.
But this is just a weird artifact of flatten
rather than anything actually incorrect with the approach
See #594
This is more of an exclusive tuple, but effectively the same thing.
I opted for the name OneOf as opposed to Union, sacrificing some math-notation-niceness, but making it more immediately obvious what it is. Plus I think GRPC uses
oneof
similarly.Some design choices might be controversial:
OneOf(Discrete(5), Discrete(5))
OneOf(Discrete(n), Discrete(m))
need special treatment whenn==m
, and nobody likes thatmax_size + 1
, wheremax_size
is the maximum size of any subspace. The 1 is the index, and it comes at the beginning of the representationnan
s if they don't use up all the spaceBox
can now be nullable, which is used when checking belonging to a spaceOverall I think this is the best combination of questionable choices, but I'm not super attached to them if someone has better options.
I also think this is useful to actually include for two reasons: