Change autoreset order for wrapper and vector environments #694

@pseudo-rnd-thoughts pseudo-rnd-thoughts commented Aug 28, 2023

Description

Vector environments have two choices for implementing autoreset functionality:

  1. During the same step in which a sub-environment terminates / truncates, we can reset the sub-environment; however, this forces the terminated / truncated observation and info (referred to as the final observation and info) to be stored in the info dict as "final_observation" and "final_info".
  2. During the next step after a sub-environment terminates / truncates, we can reset the sub-environment; however, in cases where terminated=True there is technically a dead action that does nothing.
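The two orderings can be contrasted with a toy single-environment sketch (this is illustrative code, not Gymnasium's implementation; `ToyEnv` and the `inner_step`/`inner_reset` names are made up for the example):

```python
import numpy as np

# Toy illustration of the two autoreset orderings.
# ToyEnv terminates after its second step.
class ToyEnv:
    def __init__(self):
        self.t = 0
        self.needs_reset = False

    def inner_reset(self):
        self.t = 0
        return np.array([0.0]), {}

    def inner_step(self):
        self.t += 1
        obs = np.array([float(self.t)])
        terminated = self.t >= 2
        return obs, 1.0, terminated, False, {}

# Option 1 (same-step reset): on termination, step returns the *reset*
# observation and hides the true final observation inside `info`.
def step_option_1(env, action):
    obs, reward, terminated, truncated, info = env.inner_step()
    if terminated or truncated:
        info["final_observation"] = obs
        obs, _ = env.inner_reset()
    return obs, reward, terminated, truncated, info

# Option 2 (next-step reset): step returns the final observation
# unchanged; the reset happens on the *next* call, whose action is a
# "dead" action that is ignored.
def step_option_2(env, action):
    if env.needs_reset:
        obs, info = env.inner_reset()  # `action` is ignored here
        env.needs_reset = False
        return obs, 0.0, False, False, info
    obs, reward, terminated, truncated, info = env.inner_step()
    env.needs_reset = terminated or truncated
    return obs, reward, terminated, truncated, info
```

Note how, under option 1, the terminal observation only survives inside `info`, while under option 2 it is returned normally and one step is "lost" to the reset.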

An important note is that it is not possible to convert between reset orderings: if a vector environment uses implementation 1, you can't wrap it in code that makes it look like implementation 2, and likewise you can't convert 2 into 1 (to my knowledge).

Previously, Gym and Gymnasium have used the first option; however, after a long time thinking about this, I believe that v1.0.0 is the best time for us to change to option 2.

Why?

I have three main reasons

vector-only projects

My primary motivating factor is pure vector-only projects like EnvPool and SampleFactory. For optimisation reasons, it is (highly) inefficient to store data in a dictionary compared to a NumPy array, and for this reason, to my knowledge, all vector-only projects use option 2 to autoreset environments.

Why does this matter for Gymnasium? In v1.0.0, we have separated Env from VectorEnv such that neither inherits from the other. As a result, vector-only wrappers have been created to be used with vector environments (normal wrappers can be used inside Gymnasium's async and sync vector environments, but this is not possible for EnvPool and SampleFactory). Therefore, given the note above about interoperability, if we continue with option 1, our vector wrappers cannot be used with these important vector-only projects.

more elegant training code

As originally noted in #32 (comment), option 1 requires relatively ugly training code compared to option 2.
This is particularly true for vector environments; see https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py#L200
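The awkward pattern under option 1 is sketched below with made-up data (three sub-environments, one terminated): before storing a transition, the training loop must patch the true final observation back in from `info["final_observation"]`, because `step` already returned the post-reset observation.

```python
import numpy as np

# Hypothetical single vector step under option 1 (same-step autoreset):
# sub-env 1 terminated, so obs[1] is already the *reset* observation and
# the true final observation is hidden in info["final_observation"].
obs = np.array([[0.1], [0.0], [0.3]])  # obs[1] is post-reset
dones = np.array([False, True, False])
info = {
    "final_observation": np.array([None, np.array([0.9]), None], dtype=object)
}

# The replay buffer must receive the *final* observation, not the reset
# one, so we patch it back in before storing the transition.
real_next_obs = obs.copy()
for idx, done in enumerate(dones):
    if done:
        real_next_obs[idx] = info["final_observation"][idx]
```

Under option 2 this loop disappears entirely: the observation returned by `step` is always the true next observation, and the reset happens on the following step.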

Simplifies VectorObservationWrapper

VectorObservationWrapper currently requires two observation functions to transform an observation: one for the vectorised observation and one for a single observation, since the final observation must be transformed on its own.
With option 2, there is no separate final observation to transform, so a single vectorised function suffices.
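A minimal sketch of what the simplified wrapper could look like under option 2 (class and method names are illustrative, not Gymnasium's actual v1.0 API; `ToyVectorEnv` is a stand-in for a real vector environment):

```python
import numpy as np

class ToyVectorEnv:
    """Minimal stand-in for a vector env with 2 sub-environments."""
    def reset(self, **kwargs):
        return np.zeros((2, 1)), {}

    def step(self, actions):
        obs = np.full((2, 1), 255.0)
        return obs, np.zeros(2), np.array([False, True]), np.array([False, False]), {}

class VectorObservationWrapper:
    """With no "final_observation" in info, one vectorised transform
    covers everything reset/step return."""
    def __init__(self, env):
        self.env = env

    def observation(self, obs):
        raise NotImplementedError

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self.observation(obs), info

    def step(self, actions):
        obs, rew, term, trunc, info = self.env.step(actions)
        # No separate final observation to transform.
        return self.observation(obs), rew, term, trunc, info

class NormalizeObs(VectorObservationWrapper):
    def observation(self, obs):
        return obs / 255.0

env = NormalizeObs(ToyVectorEnv())
obs, *_ = env.step(np.zeros(2))
```

The key point is that `observation` only ever needs to handle the batched shape, since terminal observations come through `step` like any other.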

Why not?

Unlike the step API change in v0.25, there is no obvious way of telling whether a vector environment implements option 1 or 2, or whether training code is written for 1 or 2. This issue cannot be avoided, which is why a breaking release like v1.0.0 is the only time we can make this change.

Second, users of Gymnasium's async or sync vector envs will have to update their usage. However, the change will be obvious, as "final_observation" and "final_info" will no longer exist.

Updating info concatenation and decomposition

By removing "final_observation" and "final_info" from info, we can implement recursive vector info support, in particular for vectorising RecordEpisodeStatistics.

Using envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync", wrappers=(gym.wrappers.RecordEpisodeStatistics,)), when a sub-environment terminates / truncates, the info is {"episode": np.array([{"r": 1, "t": 2, "l": 3}, None, None], dtype=object)}. However, this means that to access the episode information, you need to do info["final_info"][i]["episode"]["r"] for each sub-environment.

Rather, with option 2, you can do info["episode"]["r"] to get the cumulative rewards of all sub-environments (that have terminated / truncated).
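The difference in access patterns can be shown with made-up info dicts (the values and the "_episode" mask key are illustrative, mirroring Gymnasium's "_key" masking convention, not output copied from the library):

```python
import numpy as np

# Option 1: episode stats buried per-index under "final_info".
info_v1 = {"final_info": np.array(
    [{"episode": {"r": 10.0, "l": 20}}, None, None], dtype=object)}
rewards_v1 = [fi["episode"]["r"]
              for fi in info_v1["final_info"] if fi is not None]

# Option 2: episode stats stored as arrays directly in info, with a
# boolean mask saying which sub-envs actually finished this step.
info_v2 = {
    "episode": {"r": np.array([10.0, 0.0, 0.0]),
                "l": np.array([20, 0, 0])},
    "_episode": np.array([True, False, False]),
}
rewards_v2 = info_v2["episode"]["r"][info_v2["_episode"]]
```

The option 2 layout keeps everything in NumPy arrays, so logging the finished episodes is a single masked indexing operation instead of a Python loop over object arrays.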

Furthermore, it is necessary to update DictToListInfo to support the reverse of this operation.
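For concreteness, here is a sketch of the two directions of the info conversion: dict-of-arrays to per-sub-env list of dicts (what DictToListInfo does) and the reverse (what would need to be added). This is a simplified flat-key sketch, not Gymnasium's implementation; it only assumes the "_key" boolean-mask convention described above.

```python
import numpy as np

def dict_to_list(info, num_envs):
    """Vector info dict -> one dict per sub-environment."""
    out = [{} for _ in range(num_envs)]
    for key, values in info.items():
        if key.startswith("_"):  # skip mask entries
            continue
        mask = info.get("_" + key, np.ones(num_envs, dtype=bool))
        for i in range(num_envs):
            if mask[i]:
                out[i][key] = values[i]
    return out

def list_to_dict(infos):
    """Reverse operation: list of per-sub-env dicts -> vector info dict."""
    num_envs = len(infos)
    out = {}
    keys = {k for info in infos for k in info}
    for key in keys:
        out[key] = np.array([info.get(key) for info in infos], dtype=object)
        out["_" + key] = np.array([key in info for info in infos])
    return out
```

Each direction preserves which sub-environments actually produced a value via the "_key" mask, so a round trip loses no information.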
