Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208
Can you try replacing the entire body of mujoco.mj_resetData(self.model, self.data)
- Fix remaining mujoco envs - Fix mujoco_py envs - Simplify reset
Yes, that's a more principled solution. I had to add it to the Fetch environments separately because they redefine
I have resolved your comments and removed the modifications to the
This change will require an environment version bump since
I will work on the Gymnasium end to improve check_env (I will be a bit busy with the 1.0 release).
gymnasium_robotics/envs/robot_env.py
self.data.act[:] = None

# Reset buffers for joint states, warm-start, control buffers etc.
mujoco.mj_resetData(self.model, self.data)
mujoco.mj_forward(self.model, self.data)
mj_forward should also be removed (but kept for fetch_env because it changes the qpos)
The tests still succeed even if it is removed for fetch_env, although without it Mujoco should not reflect the changes in qpos in the positions of the links. Do you think it's worth adding tests that catch this, or is that overkill?
I am not 100% sure that removing mj_forward after moving the position of an object (qpos) will not result in bugs, so it is better to just keep it.
I left it in. Let me know if that addresses all your remarks for the PR
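The concern in this thread, that derived quantities go stale when qpos is written without a subsequent forward pass, can be illustrated with a toy model. This is only a sketch: `ToyArm` and its `forward` method are hypothetical stand-ins and not the MuJoCo API; `forward` merely plays the role that mj_forward plays for link positions.

```python
import math


class ToyArm:
    """One-joint planar arm; a hypothetical illustration, NOT the MuJoCo API."""

    def __init__(self, length=1.0):
        self.length = length
        self.qpos = 0.0                # generalized coordinate (joint angle)
        self.tip_xy = (length, 0.0)    # derived quantity, cached until forward()

    def forward(self):
        """Recompute the cached tip position from qpos (role of mj_forward)."""
        self.tip_xy = (
            self.length * math.cos(self.qpos),
            self.length * math.sin(self.qpos),
        )


arm = ToyArm()
arm.qpos = math.pi / 2      # move the joint directly, as a reset might do
stale_tip = arm.tip_xy      # still the old tip position
arm.forward()               # refresh derived quantities
fresh_tip = arm.tip_xy      # now consistent with the new qpos
```

A test that only compares observations built from qpos would not notice the stale value, which is one reason keeping the forward call after changing qpos is the conservative choice.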
You can use this test instead to validate the change, which is better because it actually asserts that reset properly resets the environment state (note:
import mujoco
import numpy as np
import gymnasium


def get_state(
    env: gymnasium.envs.mujoco.MujocoEnv,
    state_type: mujoco.mjtState = mujoco.mjtState.mjSTATE_PHYSICS,
):
    """Gets the state of `env`.

    Arguments:
        env: Environment whose state to copy, `env.model` & `env.data` must be accessible.
        state_type: see the [documentation of mjtState](https://mujoco.readthedocs.io/en/stable/APIreference/APItypes.html#mjtstate) most users can use the default for training purposes or `mujoco.mjtState.mjSTATE_INTEGRATION` for validation purposes.
    """
    # Compare versions numerically; plain string comparison misorders e.g. "2.3.10" and "2.3.6".
    version = tuple(int(part) for part in mujoco.__version__.split(".")[:3])
    assert version >= (2, 3, 6), "Feature requires `mujoco>=2.3.6`"

    # Allocate a buffer of the right size and copy the requested state components into it.
    state = np.empty(mujoco.mj_stateSize(env.unwrapped.model, state_type))
    mujoco.mj_getState(env.unwrapped.model, env.unwrapped.data, state, state_type)
    return state


def check_mujoco_reset_state(env: gymnasium.envs.mujoco.MujocoEnv, seed=1234):
    """Asserts that `env.reset` properly resets the state (not affected by previous steps), assuming `check_reset_seed` has passed."""
    env.action_space.seed(seed)
    action = env.action_space.sample()

    # State right after a seeded reset of a fresh environment.
    env.reset(seed=seed)
    first_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)

    # Dirty the simulation with one step, then reset again with the same seed.
    env.step(action)
    env.reset(seed=seed)
    second_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)

    assert np.all(first_reset_state == second_reset_state), "reset is not deterministic"
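A version guard such as the one in get_state is easiest to get right when versions are compared as integer tuples, because plain string comparison orders "2.3.10" before "2.3.6". A minimal sketch follows; `version_tuple` is a hypothetical helper and not part of mujoco or gymnasium.

```python
from itertools import takewhile


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric parts of 'X.Y.Z' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits so suffixes such as 'rc1' are ignored.
        leading = "".join(takewhile(str.isdigit, piece))
        parts.append(int(leading) if leading else 0)
    return tuple(parts)


# Lexicographic comparison looks at characters: '1' < '6', so this is False.
string_check = "2.3.10" >= "2.3.6"

# Numeric comparison looks at integers: 10 >= 6, so this is True.
tuple_check = version_tuple("2.3.10") >= version_tuple("2.3.6")
```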
I don't think that's correct. The test you mentioned steps through two environments and checks whether the information from the two distinct environments matches at each step. This cannot detect the failure case of this PR, since the deviation only occurs after an environment has been reset and stepped through several times. If we wanted to catch the failure case with the same function, we'd have to run a rollout of one of the two environments before starting the The current test runs a rollout, resets the same I do think however that the combination of your
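The test strategy argued for here (roll out, reset the same environment with the same seed, roll out again, compare) can be sketched without MuJoCo. Everything below is an assumption made for illustration: `LeakyEnv` is a hypothetical toy that only mimics the Gymnasium reset/step signatures, and its `cache` attribute stands in for hidden simulator state that survives a reset.

```python
import random


class LeakyEnv:
    """Hypothetical toy env; fix=False leaves a hidden cache untouched on reset."""

    def __init__(self, fix):
        self.fix = fix
        self.rng = random.Random()
        self.pos = 0.0
        self.cache = 0.0  # hidden state, e.g. a cached last control

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        self.pos = self.rng.uniform(-1.0, 1.0)
        if self.fix:
            self.cache = 0.0  # the fix: also restore the hidden state
        return self.pos, {}

    def step(self, action):
        self.pos += action + 0.1 * self.cache  # hidden state influences dynamics
        self.cache = action
        return self.pos, 0.0, False, False, {}


def rollout(env, actions, seed):
    """Reset with `seed`, apply `actions`, and return all observations."""
    obs, _ = env.reset(seed=seed)
    trajectory = [obs]
    for action in actions:
        obs, *_ = env.step(action)
        trajectory.append(obs)
    return trajectory


def reset_is_reproducible(env, actions, seed=1234):
    """Run two seeded rollouts on the SAME env instance and compare them."""
    return rollout(env, actions, seed) == rollout(env, actions, seed)


actions = [0.5, -0.25]
buggy_ok = reset_is_reproducible(LeakyEnv(fix=False), actions)
fixed_ok = reset_is_reproducible(LeakyEnv(fix=True), actions)
```

In the buggy variant the reset observation is identical in both rollouts and the trajectories only diverge at the first step, which is why comparing two fresh environments step by step would not catch it.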
When bumping the environment version number, do we replace all the v2s, and should we issue a deprecation error, or is that done automatically? And do the required changes to the documentation also go into this PR?
Hopefully the last question: Some envs are inheriting from MujocoRobotEnv, but the behavior seems to be unchanged. E.g.
Since the Shadow envs are not affected by the change, we do not have to worry about them. We should verify that they remain unchanged, though.
…ct new version number. Simplify reset test.
@rodrigodelazcano Update: I just checked. There are no Minari datasets for these environments.
@Kallinteris-Andreas Any news on when this will be merged?
@amacati waiting for
Note: I changed the changelog in the documentation to be shorter
Description
The Gymnasium API allows users to seed the environment on each reset to yield reproducible results. Running the environment with the same seed should always give exactly the same results. While the documentation recommends seeding reset only once, it does not forbid seeding multiple times.
FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start deviating at the first environment step using identical actions.
The inconsistencies arise because the internal Mujoco state is not restored properly when _reset_sim() is called. Specifically, the positions and quaternions of the mocap bodies are currently not being reset. Furthermore, Mujoco warm-starts its solver and caches the last controls in mjData. In the current implementation, these are also not reset. env.reset(seed=seed) yields reproducible results only if these four mjData fields are properly restored to their initial states.
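The snapshot-and-restore idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: `ToyData` is a hypothetical stand-in for mjData, and mapping the four fields in the description to the names mocap_pos, mocap_quat, qacc_warmstart and ctrl is an assumption rather than something the description spells out.

```python
import copy

# Assumed names of the four fields that must be restored on reset.
RESET_FIELDS = ("mocap_pos", "mocap_quat", "qacc_warmstart", "ctrl")


class ToyData:
    """Hypothetical stand-in for mjData holding only the four fields."""

    def __init__(self):
        self.mocap_pos = [[0.0, 0.0, 0.5]]
        self.mocap_quat = [[1.0, 0.0, 0.0, 0.0]]
        self.qacc_warmstart = [0.0, 0.0]
        self.ctrl = [0.0, 0.0]


def snapshot(data):
    """Deep-copy the reset-relevant fields so later steps cannot alias them."""
    return {name: copy.deepcopy(getattr(data, name)) for name in RESET_FIELDS}


def restore(data, initial):
    """Write the stored initial values back into `data`."""
    for name, value in initial.items():
        setattr(data, name, copy.deepcopy(value))


data = ToyData()
initial = snapshot(data)

# Pretend a few simulation steps modified the fields.
data.mocap_pos[0][0] = 0.3
data.qacc_warmstart[1] = -2.0
data.ctrl[0] = 1.0

restore(data, initial)
restored = snapshot(data)
```

The review thread settles on calling mujoco.mj_resetData(self.model, self.data) rather than restoring fields one by one, which avoids having to maintain such a list by hand.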
Fixes #207
Type of change
Checklist:
pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)