
Add support for different vector autoreset modes #1227

Merged

Conversation

pseudo-rnd-thoughts
Member

Description

With the change in Gymnasium v1.0, some users have requested support for the other vector autoreset APIs / modes:

  1. Next step - This is the default API. When a sub-environment terminates or truncates, its reset function is called in the next step.
  2. Same step - This is the pre-v1.0 API: when a sub-environment terminates or truncates, it is reset within the same step. The terminating / truncating observation is stored in info["final_obs"], and the reset observation is returned as the step's obs.
  3. Partial reset / disabled autoreset - Some users wish to disable autoreset and partially reset the environment themselves. Though inadvisable, partial resets can also be used with the prior two APIs. This mode does not autoreset on a termination or truncation signal; instead, it raises an error if step is called on a sub-environment that has terminated or truncated without being reset. (A usage sketch of all three modes follows this list.)
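A minimal sketch, based on the description above, of how the three modes might be selected and how their step results differ. The AutoresetMode values, the autoreset_mode argument, and info["final_obs"] are taken from this PR; the exact import paths and the options={"reset_mask": ...} call used for the partial reset are assumptions rather than confirmed API.

```python
import numpy as np
import gymnasium as gym
from gymnasium.vector import SyncVectorEnv, AutoresetMode

env_fns = [lambda: gym.make("CartPole-v1") for _ in range(4)]

# Next-step autoreset (the default): a terminated/truncated sub-environment is
# reset on the *following* call to step, so the observation returned alongside
# the termination signal is the genuine final observation.
envs = SyncVectorEnv(env_fns, autoreset_mode=AutoresetMode.NEXT_STEP)
obs, info = envs.reset(seed=0)
obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())

# Same-step autoreset (the pre-v1.0 behaviour): the sub-environment resets within
# the same step, the reset observation is returned as obs, and the terminating /
# truncating observation is stored in info["final_obs"].
envs = SyncVectorEnv(env_fns, autoreset_mode=AutoresetMode.SAME_STEP)
obs, info = envs.reset(seed=0)
obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())
final_obs = infos.get("final_obs")  # only present on steps where an episode ended

# Disabled autoreset: finished sub-environments must be reset by the user; the
# options={"reset_mask": ...} call below is an assumption about the partial-reset
# mechanism, not something stated in this description.
envs = SyncVectorEnv(env_fns, autoreset_mode=AutoresetMode.DISABLED)
obs, info = envs.reset(seed=0)
obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())
needs_reset = np.logical_or(terminations, truncations)
if needs_reset.any():
    envs.reset(options={"reset_mask": needs_reset})
```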

We have added support to the built-in SyncVectorEnv and AsyncVectorEnv through the autoreset_mode argument, which takes a str or an AutoresetMode enum, with metadata["autoreset_mode"] specifying the implemented API.

For custom vector environments, we highly recommend adding this metadata tag to help users and wrappers know the implemented API, as these environments can have any of the autoreset modes implemented.
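A short sketch of how training code might branch on this metadata tag; it assumes the built-in vector environments store the AutoresetMode value under metadata["autoreset_mode"], as described above, and that AutoresetMode is importable from gymnasium.vector.

```python
import gymnasium as gym
from gymnasium.vector import SyncVectorEnv, AutoresetMode

envs = SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(2)],
    autoreset_mode=AutoresetMode.SAME_STEP,
)

# Branch on the implemented autoreset API instead of guessing it.
mode = envs.metadata.get("autoreset_mode")
if mode == AutoresetMode.SAME_STEP:
    pass  # final observations arrive in info["final_obs"]
elif mode == AutoresetMode.NEXT_STEP:
    pass  # the final observation is the one returned with the termination signal
else:
    pass  # AutoresetMode.DISABLED, or a custom vector env that omitted the tag
```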

Importantly, different built-in wrappers have different levels of compatibility; see the table below.

| Wrapper name | Next step autoreset | Same step autoreset | Partial reset |
|---|---|---|---|
| VectorObservationWrapper | Yes | No | Yes |
| TransformObservation | Yes | No | Yes |
| NormalizeObservation | Yes | No | No |
| VectorizeTransformObservation* | Yes | Yes | Yes |
| RecordEpisodeStatistics | Yes | Yes | Yes |

* All wrappers that inherit from VectorizeTransformObservation are compatible (FilterObservation, FlattenObservation, GrayscaleObservation, ResizeObservation, ReshapeObservation, DtypeObservation).

All other reward and action wrappers should be fully compatible.

Why are some wrappers limited?

  • For same-step autoreset, the final observation would also need to be transformed, which is a problem for stateful wrappers like NormalizeObservation and for wrappers that apply a batch-based transform such as TransformObservation. This cannot be implemented efficiently; a future PR could investigate adding it.
  • For partial resets (i.e., autoreset disabled), as with same-step autoreset, stateful wrappers like NormalizeObservation should not update the normalizer again for the non-final states. For simple Box observation spaces it would be possible to add compatibility by filtering the observations, but for more complex spaces, such as Dict, this is not efficiently possible. A sketch of how a custom wrapper can guard against unsupported modes follows this list.
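Because custom vector environments and wrappers can implement any subset of these modes, a custom wrapper can refuse the modes it does not support, much like the built-in wrappers above. The RunningObsMean wrapper below is a hypothetical illustration, not library code; it assumes the wrapped environment sets metadata["autoreset_mode"] and has a Box observation space.

```python
import numpy as np
from gymnasium.vector import VectorWrapper, AutoresetMode


class RunningObsMean(VectorWrapper):
    """Tracks a running mean of observations; only valid with next-step autoreset."""

    def __init__(self, env):
        super().__init__(env)
        mode = env.metadata.get("autoreset_mode")
        if mode != AutoresetMode.NEXT_STEP:
            raise ValueError(
                f"RunningObsMean only supports next-step autoreset, got {mode!r}"
            )
        self._count = 0
        # Assumes a Box observation space; Dict/Tuple spaces would need more work.
        self._mean = np.zeros(self.single_observation_space.shape, dtype=np.float64)

    def step(self, actions):
        obs, rewards, terminations, truncations, infos = self.env.step(actions)
        # With next-step autoreset, every returned observation is a real environment
        # observation (reset observations included), so updating statistics is safe.
        self._count += obs.shape[0]
        self._mean += (obs.sum(axis=0) - obs.shape[0] * self._mean) / self._count
        return obs, rewards, terminations, truncations, infos
```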

@pseudo-rnd-thoughts
Member Author

@vmoens Have you had a chance to look at this and see if this is compatible with TorchRL?

@vmoens
Contributor

vmoens commented Nov 7, 2024

That looks good! So checking which behaviour is in place would require checking an auto_reset argument, right?

To be precise, we don't think that auto-reset is a bad idea, but rather that auto-reset within step isn't optimal: one should have one method for step, one for reset, and another for step-and-maybe-reset, possibly with a different signature that returns additional info such as the reset observation if needed.

@pseudo-rnd-thoughts
Member Author

pseudo-rnd-thoughts commented Nov 7, 2024

@vmoens We can't add whole new function definitions for step, reset, etc. for the different autoreset APIs, but I have adapted the API to enable all three modes described above.
Additionally, to help identify which autoreset API a vector environment uses, it can set metadata["autoreset_mode"] so that users and training libraries know which mode is implemented.

@pseudo-rnd-thoughts
Member Author

@vmoens Is this compatible with TorchRL? I'm planning on finishing up this PR soon so we can cut a release for you to use.

Contributor

@vmoens vmoens left a comment


That is really cool thanks!
Exactly what we need!

Cc @matteobettini

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 8a46c3a into Farama-Foundation:main Nov 28, 2024
13 checks passed
@EladSharony

Awesome! We just need the folks at IsaacLab to align with this as well.

Successfully merging this pull request may close these issues: [Bug Report] Final observation in step