Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

env_checker to use exact data_equivilance #953

Merged
merged 6 commits into from
Mar 14, 2024

Conversation

Kallinteris-Andreas
Copy link
Collaborator

@Kallinteris-Andreas Kallinteris-Andreas commented Mar 7, 2024

Description

fixes #927

Type of change

Please delete options that are not relevant.

  • Documentation only change (no code changed)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @RedTachyon said, there are cases where floating point makes exact a very difficult standard for projects to require.
I would prefer to reduce the allclose atol or rtol values than change to allequal

@Kallinteris-Andreas
Copy link
Collaborator Author

Kallinteris-Andreas commented Mar 8, 2024

As I previously said, the operations in check_env performed in deterministic order, so there is no concern of FP arithmetic ordering errors

While it is true that for example that:
a + b == b + a is not always true

a + b == a + b is always true

@RedTachyon
Copy link
Member

Nitpick:

image

(yes, this uses nan's - although those would also fail approximate equality tests)

@RedTachyon
Copy link
Member

Can't we do like a two-tiered check? If they're not allclose, then throw an error because that's a big issue (like what happens now). If it is allclose but isn't ==, then raise a warning.

Alternatively, maybe we can just add a switch in check_env controlling how strict the equality checks are.

@pseudo-rnd-thoughts
Copy link
Member

I agree with @RedTachyon on the two-tier testing

@Kallinteris-Andreas
Copy link
Collaborator Author

I'm sorry, I do not understand the benefit of a two-tier testing System. All deterministic environments. Should fail if they cannot pass allequal

note: NaN It is already tested before the determinism checks.

@pseudo-rnd-thoughts
Copy link
Member

From my perspective, we can take the check_env function in two directions.
The first is a general testing function for most environments that will generally test possible values but not provide a guarantee. To test determinism, we only test the first reset and step values, ignoring any roll out, meaning that in many ways the function is not that strict.

The second is a comprehensive testing tool that covers everything about the function.

In my opinion, we should test the essential core elements of the API comprehensively, while the non-essential elements (i.e., determinism) should be more lacks. Particularly for determinism that is very difficult to guarantee for some environment with external simulators, etc and as Gymnasium is a general API we should try to find a middle ground on this.

Therefore, a two-tier equivalence system could help provide a middle ground to purely alert users if strict equivalence doesn't hold but not failure that would prevent the rest of the testing suite to complete.

For non-essential items, I don't want the testing to fail and the user disregard the function when later parts of the function would show essential elements actually failing if they continued using it

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the warning and error messages be a bit different otherwise it could be difficult for users to understand what is happening

gymnasium/utils/env_checker.py Show resolved Hide resolved
gymnasium/utils/env_checker.py Outdated Show resolved Hide resolved
@pseudo-rnd-thoughts
Copy link
Member

My suggested changes

    assert data_equivalence(
        obs_0, obs_1
    ), "Deterministic step observations are not equivalent for the same seed and action"
    if not data_equivalence(obs_0, obs_1, exact=True):
        logger.warn(
            "Step observations are not equal although similar given the same seed and action"
        )

    assert data_equivalence(
        rew_0, rew_1
    ), "Deterministic step rewards are not equivalent for the same seed and action"
    if not data_equivalence(rew_0, rew_1, exact=True):
        logger.warn(
            "Step rewards are not equal although similar given the same seed and action"
        )

    assert data_equivalence(
        term_0, term_0, exact=True
    ), "Deterministic step termination are not equivalent for the same seed and action"
    assert (
        trunc_0 is False and trunc_1 is False
    ), "Environment truncates after 1 step, something has gone very wrong."

    assert data_equivalence(
        info_0,
        info_1,
    ), "Deterministic step info are not equivalent for the same seed and action"
    if not data_equivalence(info_0, info_1, exact=True):
        logger.warn(
            "Step info are not equal although similar given the same seed and action"
        )

and

            if env.spec is not None and env.spec.nondeterministic is False:
                assert data_equivalence(
                    obs_1, obs_2
                ), "Using `env.reset(seed=123)` is non-deterministic as the observations are not equivalent."
                if not data_equivalence(obs_1, obs_2, exact=True):
                    logger.warn(
                        "Using `env.reset(seed=123)` observations are not equal although similar."
                    )

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 17203a5 into main Mar 14, 2024
16 checks passed
@Kallinteris-Andreas Kallinteris-Andreas deleted the Kallinteris-Andreas-patch-4 branch March 23, 2024 14:36
@Kallinteris-Andreas Kallinteris-Andreas restored the Kallinteris-Andreas-patch-4 branch March 23, 2024 14:37
@Kallinteris-Andreas Kallinteris-Andreas deleted the Kallinteris-Andreas-patch-4 branch March 23, 2024 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Improve check_env
3 participants