Change autoreset order such that reset will happen on the next step #785
Description
Vector environments have two choices for implementing autoreset functionality:

1. Reset the sub-environment within the same step in which it terminates / truncates, returning the first observation of the new episode and storing the final observation and info in the step's info under `"final_observation"` and `"final_info"`.
2. Reset the sub-environment on the step after it returns `terminated=True` (or `truncated=True`), meaning there is technically a dead action that does nothing.

An important note is that it is not possible to convert between reset orderings, i.e., if you have a vector environment using implementation 1, you can't run some code that makes it look like implementation 2, and similarly for 2 you can't convert to 1 (to my knowledge). A sketch of the two orderings is below.
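To make the difference concrete, here is a small sketch of how a rollout loop sees a terminating sub-environment under each ordering. It is purely illustrative; the comments describe the behaviour rather than the output of any particular Gymnasium version.

```python
import gymnasium as gym

envs = gym.make_vec("CartPole-v1", num_envs=2, vectorization_mode="sync")
obs, info = envs.reset(seed=0)

obs, rewards, terminations, truncations, info = envs.step(envs.action_space.sample())

# Suppose sub-environment 0 terminated on this step (terminations[0] is True).
#
# Implementation 1 (same-step reset, the pre-v1.0 behaviour):
#   obs[0] is already the first observation of the NEW episode; the last
#   observation of the finished episode is only available via
#   info["final_observation"][0] (and its info via info["final_info"][0]).
#
# Implementation 2 (next-step reset, this PR):
#   obs[0] is still the final observation of the finished episode.
#   On the NEXT envs.step(...) call, the action for sub-environment 0 is
#   ignored (the "dead" action), the sub-environment resets, and the
#   returned obs[0] is the first observation of the new episode.
```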
Previously, gym and gymnasium have used the first option; however, after a long time thinking about this, I believe that v1.0.0 is the best time for us to change to type 2.
Why?
I have three main reasons:

Vector-only projects

My primary motivating factor is thinking about pure vector-only projects like EnvPool and SampleFactory. For optimisation reasons, it is (highly) inefficient to store data in a dictionary compared to a NumPy array; for this reason, to my knowledge, all vector-only projects use functionality 2 to autoreset environments.
Why does this matter for Gymnasium? In v1.0.0, we have separated `Env` from `VectorEnv` such that neither inherits from the other. As a result, vector-only wrappers have been created to be used with vector environments (normal wrappers can be used inside the async and sync vector environments, however this is not possible for EnvPool and SampleFactory). Therefore, given the note above about interoperability, if we continue with functionality 1, our vector wrappers cannot be used with these important vector-only projects.

More elegant training code
As originally noted in #32 (comment), functionality 1 requires relatively ugly training code compared to functionality 2. This is particularly true for vector environments; see https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py#L200 and the sketch below.
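A rough sketch of the pattern that functionality 1 forces onto training code (variable names are placeholders; the loop mirrors the kind of code in the CleanRL script linked above rather than quoting it):

```python
import gymnasium as gym
import numpy as np

envs = gym.make_vec("CartPole-v1", num_envs=4, vectorization_mode="sync")
obs, info = envs.reset(seed=0)
transitions = []  # stand-in for a replay / rollout buffer

for _ in range(100):
    actions = envs.action_space.sample()
    next_obs, rewards, terminations, truncations, info = envs.step(actions)

    # Functionality 1: next_obs already contains the new episodes' first
    # observations for any finished sub-environment, so the true final
    # observations must be recovered from the info dict before the
    # transition is stored (otherwise bootstrapping uses the wrong state).
    real_next_obs = next_obs.copy()
    for idx, done in enumerate(np.logical_or(terminations, truncations)):
        if done:
            real_next_obs[idx] = info["final_observation"][idx]

    transitions.append((obs, actions, rewards, real_next_obs, terminations))
    obs = next_obs

envs.close()
```

Under functionality 2, the final observation arrives through `next_obs` itself, so the inner recovery loop disappears entirely.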
Simplifies `VectorObservationWrapper`

`VectorObservationWrapper` currently requires two observation functions in order to transform observations: one for the batched vector observation and one for a single observation, as the final observation must be transformed on its own. With functionality 2, there is no separate final observation that must be transformed, so only the vectorised transform is needed (see the sketch below).
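As a minimal illustration, the two functions below are hypothetical stand-ins for the wrapper's transform hooks, not the actual `VectorObservationWrapper` API:

```python
import numpy as np

def vector_transform(batch: np.ndarray) -> np.ndarray:
    # Hypothetical transform applied to the whole batched observation.
    return batch.astype(np.float32) / 255.0

def single_transform(observation: np.ndarray) -> np.ndarray:
    # Hypothetical per-observation transform. Under functionality 1 this
    # second hook exists only so that info["final_observation"][i] can be
    # transformed outside the batch; under functionality 2 the final
    # observation comes back through the normal batch, so this hook (and
    # the wrapper's second observation function) is no longer needed.
    return observation.astype(np.float32) / 255.0

batch = np.zeros((4, 84, 84), dtype=np.uint8)  # pretend vector observation
print(vector_transform(batch).shape)  # the only transform needed under functionality 2
```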
Why not?
Unlike the step API change in v0.25, there is no obvious way of telling whether a vector environment has implemented functionality 1 or 2, or whether the training code is written for 1 or 2. This issue cannot be avoided, which is why the change can only be made now, with v1.0.0.
Second, users who use Gymnasium's async or sync vector env will have to update their usage. However, this will be obvious, as `"final_observation"` and `"final_info"` will no longer exist (a sketch of an updated rollout loop is below).
Updating info concatenation and decomposition
By removing `"final_observation"` and `"final_info"` from the info, we can implement recursive vector info support, in particular for vectorising `RecordEpisodeStatistics`. Using `envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync", wrappers=(gym.wrappers.RecordEpisodeStatistics,))`, when a sub-environment terminates / truncates, the info is `{"episode": np.array([{"r": 1, "t": 2, "l": 3}, None, None], dtype=object)}`. However, this means that to access the episode information, you need to do `info["final_info"][i]["episode"]["r"]` for each of the sub-environments. With functionality 2, you can instead do `info["episode"]["r"]` for all of the sub-environments' cumulative rewards (if they have terminated / truncated); a sketch of this access pattern is below.
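A minimal sketch of what the proposed access pattern could allow, assuming the recursive info support described above is in place and that the vectorised info exposes a `"_episode"` boolean mask marking which sub-environments just finished (the mask key is an assumption here):

```python
import gymnasium as gym

envs = gym.make_vec(
    "CartPole-v1",
    num_envs=3,
    vectorization_mode="sync",
    wrappers=(gym.wrappers.RecordEpisodeStatistics,),
)
obs, info = envs.reset(seed=0)

for _ in range(500):
    obs, rewards, terminations, truncations, info = envs.step(envs.action_space.sample())
    if "episode" in info:
        # Batched episode statistics for every sub-environment that just
        # terminated / truncated, accessed directly without "final_info".
        finished = info["_episode"]  # assumed boolean mask, shape (num_envs,)
        print("episodic returns:", info["episode"]["r"][finished])

envs.close()
```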
Furthermore, it is necessary to update `DictToListInfo` to support the reverse operation of this.

To-do
Completion of #694