Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

amacati · 2024-02-10T10:05:59Z

Description

The gymnasium API allows users to seed the environment on each reset to yield reproducible results. Running the environment with the same seed should always give the exact same results. While the documentation recommends that users should seed reset only once, it does not forbid seeding multiple times.

FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start deviating at the first environment step using identical actions.

The inconsistencies arise because the internal Mujoco state is not restored properly when _reset_sim() is called. Specifically, the position and quaternions of the mocap bodies are currently not being reset. Furthermore, the Mujoco integrator uses warmstarts and caches the last controls in mjData. In the current implementation, these are also not reset. Only if these four mjData fields are properly restored to their initial states, env.reset(seed=seed) yields reproducible results.

Fixes #207

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

gymnasium_robotics/envs/fetch/fetch_env.py

Kallinteris-Andreas · 2024-02-11T04:55:35Z

Can you try replacing the entire body of def _reset_sim(self): with a single mj_resetData

        mujoco.mj_resetData(self.model, self.data)

- Fix remaining mujoco envs - Fix mujoco_py envs - Simplify reset

amacati · 2024-02-11T09:52:37Z

Yes, that's a more principled solution. I had to add it to the Fetch environments separately because they redefine RobotEnv._reset_sim(). Also, I added the mujoco_py equivalent to the deprecated environments. This passes all tests for me now, both for mujoco environments and legacy mujoco_py envs.

gymnasium_robotics/envs/fetch/fetch_env.py

…vs from tests

amacati · 2024-02-12T20:28:17Z

I have resolved your comments and removed the modifications to the mujoco-py environments. Since they are expected to fail the new tests I have limited those to the mujoco environments and documented the reason in the test description.

Kallinteris-Andreas

This change will require an environment version bump since

i will work on the gymnasium end to improve the check_env (i will be a bit busy with the 1.0 release)

Kallinteris-Andreas · 2024-02-13T08:16:53Z

gymnasium_robotics/envs/robot_env.py

-            self.data.act[:] = None
-
+        # Reset buffers for joint states, warm-start, control buffers etc.
+        mujoco.mj_resetData(self.model, self.data)
        mujoco.mj_forward(self.model, self.data)


mj_forward should also be removed (but kept for fetch_env because it changes the qpos)

The tests still succeed even if it is removed for fetch_env, although Mujoco should not reflect the changes in qpos in the positions of the links. Do you think it's worth adding tests that catch this or is that overkill?

I am not 100% sure that removing mj_forward after moving the position of an object (qpos), will not result in bugs, so it is better to just keep it

I left it in. Let me know if that addresses all your remarks for the PR

Kallinteris-Andreas · 2024-02-19T11:59:34Z

your test_same_env_determinism_rollout is effectively a reimplementation of gymnasium.utils.env_match.check_environments_match, I do not believe it has any additional utility, if you agree remove it.

You can use this test instead, to validate the change, which is better because it actually asserts that reset properly resets the environment state (note: check_mujoco_reset_state will be importable with from gymnasium.envs.mujoco.utils import check_mujoco_reset_state)

import mujoco
import numpy as np

import gymnasium

def get_state(
    env: gymnasium.envs.mujoco.MujocoEnv,
    state_type: mujoco.mjtState = mujoco.mjtState.mjSTATE_PHYSICS,
):
    """Gets the state of `env`.
    Arguments:
        env: Environment whose state to copy, `env.model` & `env.data` must be accessible.
        state_type: see the [documentation of mjtState](https://mujoco.readthedocs.io/en/stable/APIreference/APItypes.html#mjtstate) most users can use the default for training purposes or `mujoco.mjtState.mjSTATE_INTEGRATION` for validation purposes.
    """
    assert mujoco.__version__ >= "2.3.6", "Feature requires `mujuco>=2.3.6`"

    state = np.empty(mujoco.mj_stateSize(env.unwrapped.model, state_type))
    mujoco.mj_getState(env.unwrapped.model, env.unwrapped.data, state, state_type)
    return state


def check_mujoco_reset_state(env: gymnasium.envs.mujoco.MujocoEnv, seed=1234):
    """Asserts that `env.reset` properly resets the state (not affected by previous steps), assuming `check_reset_seed` has passed."""
    env.action_space.seed(seed)
    action = env.action_space.sample()

    env.reset(seed=seed)
    first_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)
    env.step(action)

    env.reset(seed=seed)
    second_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)

    assert np.all(first_reset_state == second_reset_state), "reset is not deterministic"

This will require version bumping for all the affected environments (all based on robot_env), this take 2 steps
2.1 update the changelog of each of those environments
2.2 update the version number in https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/main/gymnasium_robotics/__init__.py
I will likely not be merging this before gymnasium==1.0.0a2

amacati · 2024-02-21T17:55:35Z

I don't think that's correct. The test you mentioned steps through two environments and checks if the information from two distinct environments match at each step. This cannot detect the failure case of this PR since the deviation only occurs after an environment has been reset and stepped through for several times. If we wanted to catch the failure case with the same function, we'd have to run a rollout of one of the two environments before starting the check_environments_match test.

The current test runs a rollout, resets the same env object with the same seed, runs a second rollout, and then tests if the outputs of both rollouts match each other. The key difference is that it uses a single environment that steps multiple times before being reset and thus shows if effects like warming up etc are correctly accounted for.

I do think however that the combination of your check_mujoco_reset_state() test function and the gymnasium check_environments_match test cover it as well.

amacati · 2024-02-26T14:02:23Z

When bumping the environment version number, do we replace all the v2s, should we issue a deprecation error or is that done automatically? And do the required changes to the documentation also go into this PR?

Kallinteris-Andreas · 2024-02-26T17:54:43Z

yes bump all changed environments
deprecation warning are handled by gymnasium.make
yes, documentation should be changed in this PR

amacati · 2024-02-27T09:59:37Z

Hopefully the last question: Some envs are inheriting from MujocoRobotEnv, but the behavior seems to be unchanged. E.g. HandManipulateBlockFull-v1 passes the new tests also without any fixes. The environments with changed behavior are: FetchSlide-v2, FetchPickAndPlace-v2, FetchReach-v2, FetchPush-v2, FetchSlideDense-v2, FetchPickAndPlaceDense-v2, FetchReachDense-v2, FetchPushDense-v2, HandReach-v1, HandReachDense-v1. Therefore, all other variants of the hand tasks should be unchanged even with the new fixes, and should not be bumped. Is that okay with you?

Kallinteris-Andreas · 2024-02-27T11:54:41Z

Since shadow envs are not be affected by the change we do not have to worry about them

we should verify that they remain unchanged though

…ct new version number. Simplify reset test.

gymnasium_robotics/envs/fetch/pick_and_place.py

gymnasium_robotics/envs/fetch/push.py

docs/envs/shadow_dexterous_hand/index.md

Kallinteris-Andreas · 2024-03-12T09:09:48Z

@rodrigodelazcano
What is the policy for keeping old verisons of environments?
In this case the change is practically small. And should not cause problems with comparing training performance, reusing policies and offline reinforcement learning with the data set generated from a previous environment version.

Update: I just checked. There are no datasets Minari for these environments).

amacati · 2024-05-05T20:32:25Z

@Kallinteris-Andreas Any news on when this will be merged?

Kallinteris-Andreas · 2024-05-07T06:32:24Z

@amacati waiting for gymnasium==1.0.0a2 which should be out soon(ish)

Kallinteris-Andreas · 2024-05-29T18:49:28Z

Note: I changed the changelog in the documentation to be shorter

amacati added 2 commits February 10, 2024 10:17

Fix reproducibility of FetchPickAndPlace-v2

54043fc

Add tests for same environment rollout determinism

e89b53e

amacati mentioned this pull request Feb 10, 2024

[Bug Report] FetchPickAndPlace-v2 does not yield reproducible results #207

Closed

1 task

Kallinteris-Andreas reviewed Feb 10, 2024

View reviewed changes

gymnasium_robotics/envs/fetch/fetch_env.py Outdated Show resolved Hide resolved

Fix reproducibility for all robotics environments

5e14ea1

- Fix remaining mujoco envs - Fix mujoco_py envs - Simplify reset

amacati requested a review from Kallinteris-Andreas February 12, 2024 15:58

Kallinteris-Andreas requested changes Feb 12, 2024

View reviewed changes

gymnasium_robotics/envs/fetch/fetch_env.py Outdated Show resolved Hide resolved

gymnasium_robotics/envs/fetch/fetch_env.py Show resolved Hide resolved

Revert changes to mujoco-py. Simplify _reset_sim. Remove mujoco-py en…

02ffe14

…vs from tests

amacati requested a review from Kallinteris-Andreas February 12, 2024 20:28

Kallinteris-Andreas reviewed Feb 14, 2024

View reviewed changes

Remove unnecessary forward call

11db01a

Kallinteris-Andreas mentioned this pull request Feb 16, 2024

Add check_mujoco_reset_state Farama-Foundation/Gymnasium#928

Merged

5 tasks

Merge branch 'Farama-Foundation:main' into main

007afa8

Bump versions of affected environments. Change documentation to refle…

9dd88f7

…ct new version number. Simplify reset test.

amacati requested a review from Kallinteris-Andreas February 29, 2024 10:14

Kallinteris-Andreas requested changes Feb 29, 2024

View reviewed changes

gymnasium_robotics/envs/fetch/pick_and_place.py Show resolved Hide resolved

gymnasium_robotics/envs/fetch/push.py Outdated Show resolved Hide resolved

docs/envs/shadow_dexterous_hand/index.md Show resolved Hide resolved

Fix typo, clarify version change with issue

c6b54d6

amacati requested a review from Kallinteris-Andreas March 4, 2024 09:11

update to gymnasium==1.0a2

b451ba7

Kallinteris-Andreas changed the title ~~Fix reproducibility issues in Fetch environments~~ Fetch-v3 environments (Fix reproducibility issues) May 29, 2024

Kallinteris-Andreas changed the title ~~Fetch-v3 environments (Fix reproducibility issues)~~ Fetch environments new version (Fix reproducibility issues) May 29, 2024

Kallinteris-Andreas added 5 commits May 29, 2024 17:31

update changelog

dace06b

update changelog

f0c1a32

Update reach.py

5f6c5e2

Update slide.py

7ab8f63

Update reach.py

5da9ab6

Kallinteris-Andreas approved these changes May 29, 2024

View reviewed changes

Kallinteris-Andreas changed the title ~~Fetch environments new version (Fix reproducibility issues)~~ add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) May 29, 2024

Kallinteris-Andreas changed the title ~~add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues)~~ Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) May 29, 2024

Kallinteris-Andreas merged commit e525caa into Farama-Foundation:main May 29, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

amacati commented Feb 10, 2024 •

edited

Loading

Kallinteris-Andreas commented Feb 11, 2024

amacati commented Feb 11, 2024 •

edited

Loading

amacati commented Feb 12, 2024

Kallinteris-Andreas left a comment

Kallinteris-Andreas Feb 13, 2024

amacati Feb 15, 2024

Kallinteris-Andreas Feb 15, 2024

amacati Feb 15, 2024

Kallinteris-Andreas commented Feb 19, 2024

amacati commented Feb 21, 2024 •

edited

Loading

amacati commented Feb 26, 2024 •

edited

Loading

Kallinteris-Andreas commented Feb 26, 2024

amacati commented Feb 27, 2024

Kallinteris-Andreas commented Feb 27, 2024 •

edited

Loading

Kallinteris-Andreas commented Mar 12, 2024 •

edited

Loading

amacati commented May 5, 2024

Kallinteris-Andreas commented May 7, 2024

Kallinteris-Andreas commented May 29, 2024

Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

Conversation

amacati commented Feb 10, 2024 • edited Loading

Description

Type of change

Checklist:

Kallinteris-Andreas commented Feb 11, 2024

amacati commented Feb 11, 2024 • edited Loading

amacati commented Feb 12, 2024

Kallinteris-Andreas left a comment

Choose a reason for hiding this comment

Kallinteris-Andreas Feb 13, 2024

Choose a reason for hiding this comment

amacati Feb 15, 2024

Choose a reason for hiding this comment

Kallinteris-Andreas Feb 15, 2024

Choose a reason for hiding this comment

amacati Feb 15, 2024

Choose a reason for hiding this comment

Kallinteris-Andreas commented Feb 19, 2024

amacati commented Feb 21, 2024 • edited Loading

amacati commented Feb 26, 2024 • edited Loading

Kallinteris-Andreas commented Feb 26, 2024

amacati commented Feb 27, 2024

Kallinteris-Andreas commented Feb 27, 2024 • edited Loading

Kallinteris-Andreas commented Mar 12, 2024 • edited Loading

amacati commented May 5, 2024

Kallinteris-Andreas commented May 7, 2024

Kallinteris-Andreas commented May 29, 2024

amacati commented Feb 10, 2024 •

edited

Loading

amacati commented Feb 11, 2024 •

edited

Loading

amacati commented Feb 21, 2024 •

edited

Loading

amacati commented Feb 26, 2024 •

edited

Loading

Kallinteris-Andreas commented Feb 27, 2024 •

edited

Loading

Kallinteris-Andreas commented Mar 12, 2024 •

edited

Loading