Add support for different vector autoreset modes #1227

pseudo-rnd-thoughts · 2024-10-22T12:49:24Z

Description

With the change in Gymnasium v1.0, some users have requested support for the other vector autoreset APIs / modes:

Next step - This is the default API. When a sub-environment terminates or truncates, its reset function is called on the next step.
Same step - This is the pre-1.0 API. When a sub-environment terminates or truncates, it is reset within the same step. The terminating / truncating observation is stored in info["final_obs"], and the reset observation is returned as the step's obs.
Partial reset / disabled autoreset - Some users wish to disable autoreset and partially reset the environment themselves. Though unadvised, partial resets can also be used with the prior two APIs. This API does not autoreset on a termination or truncation signal. Instead, it raises an error if step is called on a sub-environment that has terminated or truncated without being partially reset.
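To make the three behaviours concrete, here is a minimal pure-Python sketch. It is a toy model, not the Gymnasium implementation: `Mode`, `CounterEnv`, `ToyVectorEnv`, and the `mask` argument to `reset` are all hypothetical names invented for this illustration, and the observation is just a step counter.

```python
from enum import Enum


class Mode(Enum):  # hypothetical stand-in for the autoreset modes
    NEXT_STEP = "NextStep"
    SAME_STEP = "SameStep"
    DISABLED = "Disabled"


class CounterEnv:
    """Toy sub-environment: the observation is a step counter."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self):
        self.t += 1
        return self.t, self.t >= self.horizon  # (obs, terminated)


class ToyVectorEnv:
    def __init__(self, horizons, mode):
        self.envs = [CounterEnv(h) for h in horizons]
        self.mode = mode
        self.done = [False] * len(self.envs)  # ended and not yet reset

    def reset(self, mask=None):
        """Reset all sub-envs, or only those selected by `mask` (partial reset)."""
        obs = []
        for i, env in enumerate(self.envs):
            if mask is None or mask[i]:
                obs.append(env.reset())
                self.done[i] = False
            else:
                obs.append(env.t)  # untouched sub-env repeats its last obs
        return obs

    def step(self):
        n = len(self.envs)
        obs, terminated, info = [], [], {"final_obs": [None] * n}
        for i, env in enumerate(self.envs):
            if self.done[i]:
                if self.mode is Mode.DISABLED:
                    raise RuntimeError(f"sub-env {i} must be reset before stepping")
                # NEXT_STEP: this whole step is spent on the reset
                o, term = env.reset(), False
            else:
                o, term = env.step()
                if term and self.mode is Mode.SAME_STEP:
                    info["final_obs"][i] = o  # keep the terminating obs
                    o = env.reset()           # return the reset obs instead
            obs.append(o)
            terminated.append(term)
            # SAME_STEP has already reset, so nothing is left pending
            self.done[i] = term and self.mode is not Mode.SAME_STEP
        return obs, terminated, info
```

With horizons `[2, 3]`, the second step terminates the first sub-environment. In this toy, next-step mode returns its final observation then and the reset observation one step later, same-step mode returns the reset observation immediately with the final one under `info["final_obs"]`, and disabled mode raises on the following step until `reset(mask=[True, False])` is called.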

We have added support to the built-in SyncVectorEnv and AsyncVectorEnv through the autoreset_mode argument, which takes a str or an AutoresetMode Enum. The implemented API is specified in metadata["autoreset_mode"].
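As a rough illustration of accepting either a str or an Enum, a constructor could normalise the argument as below. This is a sketch only: `SketchVectorEnv` is a hypothetical class, the enum here is a local stand-in, and its string values are assumptions made for the example rather than values confirmed in this PR.

```python
from enum import Enum


class AutoresetMode(Enum):
    """Local stand-in enum; the string values are assumed for this sketch."""

    NEXT_STEP = "NextStep"
    SAME_STEP = "SameStep"
    DISABLED = "Disabled"


class SketchVectorEnv:
    """Hypothetical vector env showing only the argument handling."""

    def __init__(self, autoreset_mode=AutoresetMode.NEXT_STEP):
        # Accept an enum member as-is; convert a str through the enum,
        # which raises ValueError for an unknown mode name.
        if isinstance(autoreset_mode, AutoresetMode):
            self.autoreset_mode = autoreset_mode
        else:
            self.autoreset_mode = AutoresetMode(autoreset_mode)
        # Publish the mode so users and wrappers can look it up.
        self.metadata = {"autoreset_mode": self.autoreset_mode}
```

Converting through the enum means a misspelt mode string fails at construction time instead of silently falling back to the default.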

For custom vector environments, we highly recommend adding this metadata tag to help users and wrappers know the implemented API, as these environments can have any of the autoreset modes implemented.
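A sketch of how a training library might consume this tag, warning when a custom environment has not declared it. Everything here is illustrative: `get_autoreset_mode`, `TaggedVectorEnv`, and `UntaggedVectorEnv` are hypothetical names, and the enum is a local stand-in.

```python
import warnings
from enum import Enum


class AutoresetMode(Enum):  # local stand-in for this sketch
    NEXT_STEP = "NextStep"
    SAME_STEP = "SameStep"
    DISABLED = "Disabled"


class TaggedVectorEnv:
    """Hypothetical custom vector env that declares its autoreset API."""

    metadata = {"autoreset_mode": AutoresetMode.SAME_STEP}


class UntaggedVectorEnv:
    """Hypothetical custom vector env that omits the tag."""

    metadata = {}


def get_autoreset_mode(env):
    """Return the declared mode, or None (with a warning) if it is missing."""
    mode = getattr(env, "metadata", {}).get("autoreset_mode")
    if mode is None:
        warnings.warn(
            f"{type(env).__name__} does not declare metadata['autoreset_mode']; "
            "its autoreset behaviour is unknown."
        )
    return mode
```

Returning None rather than guessing a default leaves the decision to the caller, which matters because a custom environment can implement any of the three modes.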

Importantly, different built-in wrappers have different levels of compatibility; see the table below.

Wrapper name	Next step autoreset	Same step autoreset	Partial reset
VectorObservationWrapper	Yes	No	Yes
TransformObservation	Yes	No	Yes
NormalizeObservation	Yes	No	No
VectorizeTransformObservation*	Yes	Yes	Yes
RecordEpisodeStatistics	Yes	Yes	Yes

* All wrappers that inherit from VectorizeTransformObservation are compatible (FilterObservation, FlattenObservation, GrayscaleObservation, ResizeObservation, ReshapeObservation, DtypeObservation).

All other reward and action wrappers should be fully compatible.
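For reference, the table above can be encoded as a small lookup so that a training script can fail early on an unsupported combination. The values are copied from the table; the mode labels and the `is_compatible` helper are hypothetical names for this sketch, and wrappers not listed in the table are deliberately left out.

```python
# Rows copied from the table: (next step, same step, partial reset)
COMPATIBILITY = {
    "VectorObservationWrapper": (True, False, True),
    "TransformObservation": (True, False, True),
    "NormalizeObservation": (True, False, False),
    "VectorizeTransformObservation": (True, True, True),
    "RecordEpisodeStatistics": (True, True, True),
}

# Wrappers listed in the footnote as inheriting from VectorizeTransformObservation
INHERITS_VECTORIZE_TRANSFORM = {
    "FilterObservation",
    "FlattenObservation",
    "GrayscaleObservation",
    "ResizeObservation",
    "ReshapeObservation",
    "DtypeObservation",
}

MODES = ("next_step", "same_step", "partial_reset")  # labels for this sketch


def is_compatible(wrapper_name, mode):
    """Look up a wrapper/mode pair; raises KeyError for unlisted wrappers."""
    if wrapper_name in INHERITS_VECTORIZE_TRANSFORM:
        wrapper_name = "VectorizeTransformObservation"
    return COMPATIBILITY[wrapper_name][MODES.index(mode)]
```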

Why are some wrappers limited?

For same-step autoreset, the final observation must also be transformed. This is not possible to implement efficiently for stateful wrappers like NormalizeObservation or for wrappers that apply a batch-based transform, such as TransformObservation. A future PR could investigate adding this.
For partial resets (i.e., autoreset disabled), the problem is similar to same-step autoreset: a stateful wrapper like NormalizeObservation should not update its normalizer again for the non-final states. For simple Box space environments, it would be possible to add compatibility by filtering the observations, but for more complex spaces, like Dict, this cannot be done efficiently.
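One way to read the partial-reset point is sketched below in plain Python. It assumes that a partial reset returns a full batch in which sub-environments that were not reset simply repeat their previous observation; `RunningMean` and `update_after_partial_reset` are hypothetical names, and the running mean is a much simplified stand-in for a real normalizer.

```python
class RunningMean:
    """Simplified stateful statistic, standing in for a normalizer."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, batch):
        for x in batch:
            self.count += 1
            self.mean += (x - self.mean) / self.count


def update_after_partial_reset(stats, obs, reset_mask, filter_stale):
    """Update `stats` with the batch returned by a partial reset.

    With filter_stale=True, only observations of sub-envs that were
    actually reset are counted, which is easy for flat observations
    like these floats but harder for nested (Dict-like) observations.
    """
    if filter_stale:
        obs = [o for o, was_reset in zip(obs, reset_mask) if was_reset]
    stats.update(obs)


step_obs = [4.0, 1.0]       # sub-env 0 terminated on this step
reset_obs = [0.0, 1.0]      # sub-env 1 was not reset, so 1.0 is repeated
reset_mask = [True, False]

naive, filtered = RunningMean(), RunningMean()
for stats, filter_stale in ((naive, False), (filtered, True)):
    stats.update(step_obs)
    update_after_partial_reset(stats, reset_obs, reset_mask, filter_stale)
```

In this toy, `naive` ends up having counted the observation of the untouched sub-environment twice, while `filtered` counts each observation once.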

pseudo-rnd-thoughts · 2024-11-06T10:22:06Z

@vmoens Have you had a chance to look at this and see if this is compatible with TorchRL?

vmoens · 2024-11-07T14:44:45Z

That looks good! So checking which behaviour is in place would require checking an auto_reset argument, right?

To be precise, we don't think that auto reset is a bad idea, only that auto reset within step isn't optimal: one should have one method for step, one for reset, and another for step and maybe reset, possibly with a different signature that returns additional info such as the reset observation if needed.

pseudo-rnd-thoughts · 2024-11-07T15:22:47Z

@vmoens We can't add whole new function definitions for step, reset, etc. for each autoreset API, but I have adapted the API to enable all three modes described above.
Additionally, to show which autoreset API a vector environment uses, it can set metadata["autoreset_mode"] so that users and training libraries know this.

pseudo-rnd-thoughts · 2024-11-13T15:47:19Z

@vmoens Is this compatible with TorchRL? I'm planning on finishing up this PR soon so we can cut a release for you to use.

vmoens

That is really cool thanks!
Exactly what we need!

Cc @matteobettini

EladSharony · 2024-11-28T16:51:25Z

Awesome! We just need the folks at IsaacLab to align with this as well.

[Bug Report] Final observation in step

pseudo-rnd-thoughts added 6 commits October 19, 2024 14:55

Add Autoreset Mode support to Vector environments

b4cc8ef

Add support for wrappers and fix tests

560099a

Update and fix some tests

0bb14ed

Add new parameter to pytest function

613a329

Fix tests

ae3da91

Fix partial reset failures

efb23ba

pseudo-rnd-thoughts mentioned this pull request Oct 23, 2024

[Feature] Gymnasium 1.0 compatibility pytorch/rl#2473

Open

vmoens approved these changes Nov 13, 2024


pseudo-rnd-thoughts added 3 commits November 27, 2024 20:33

Merge branch 'refs/heads/main' into autoreset-mode

606bfaf

Add AutoresetMode to metadata and warning about it if missing

781e6f5

Update metadata to autoreset_mode and add tests on it

d031c42

pseudo-rnd-thoughts merged commit 8a46c3a into Farama-Foundation:main Nov 28, 2024
13 checks passed

pseudo-rnd-thoughts mentioned this pull request Dec 2, 2024

[Question] make_vec() prevent reset behaviour from happening when enviroment returnes done = True #1265

Closed
