`env_checker` to use exact data_equivilance #953

Kallinteris-Andreas · 2024-03-07T09:23:28Z

Description

fixes #927

Type of change

Documentation only change (no code changed)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

pseudo-rnd-thoughts

As @RedTachyon said, there are cases where floating-point arithmetic makes exact equality a very difficult standard for projects to meet.
I would prefer to reduce the allclose atol or rtol values rather than switch to allequal.
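To make the trade-off concrete, here is a minimal sketch of the two kinds of comparison under discussion. The helpers `all_close` and `all_equal` are hypothetical stand-ins written for this illustration; `all_close` follows the documented `numpy.allclose` rule `|a - b| <= atol + rtol * |b|` with NumPy's default tolerances, and neither helper is the code used by `check_env`.

```python
def all_close(xs, ys, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: element-wise tolerance check, NumPy-style formula."""
    if len(xs) != len(ys):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys))


def all_equal(xs, ys):
    """Hypothetical helper: element-wise exact equality."""
    return len(xs) == len(ys) and all(x == y for x, y in zip(xs, ys))


first = [1.0, 2.0]
second = [1.0 + 1e-7, 2.0]  # differs from `first` by a tiny amount

print(all_close(first, second))                        # passes with default tolerances
print(all_close(first, second, rtol=1e-9, atol=0.0))   # fails once tolerances are tightened
print(all_equal(first, second))                        # fails: not bit-identical
```

Tightening `rtol` and `atol` moves the approximate check toward the exact one without requiring bit-identical values.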

Kallinteris-Andreas · 2024-03-08T05:12:30Z

As I said previously, the operations in check_env are performed in a deterministic order, so there is no concern about FP arithmetic ordering errors.

While it is true that, for example,
a + b == b + a is not always true,

a + b == a + b is always true.
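A small sketch of the ordering argument, using plain Python floats. Note that under IEEE 754 the order dependence typically shows up through grouping (addition is not associative) rather than through swapping two operands, which is what the example uses; the values and the helper `total` are illustrative assumptions only.

```python
a, b, c = 0.1, 0.2, 0.3

# Different grouping of the same operands can round differently.
left = (a + b) + c
right = a + (b + c)
grouping_matters = left != right


def total(values):
    """Hypothetical helper: sum values strictly left to right."""
    acc = 0.0
    for value in values:
        acc += value
    return acc


values = [0.1, 0.2, 0.3, 0.4]

# Repeating the same operations in the same order gives bit-identical results.
same_order_matches = total(values) == total(values)

print(grouping_matters, same_order_matches)
```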

RedTachyon · 2024-03-08T10:12:50Z

Nitpick:

(yes, this uses nan's - although those would also fail approximate equality tests)
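The snippet this nitpick referred to is not preserved in this copy of the thread. As a hedged illustration of the kind of NaN counterexample the parenthetical describes (not necessarily the one originally posted), the same expression evaluated twice compares unequal when NaN is involved, and a tolerance-based comparison fails as well:

```python
import math

nan = float("nan")
b = 1.0

# NaN never compares equal to anything, including itself.
same_expression_equal = (nan + b) == (nan + b)

# Approximate comparison also rejects NaN against NaN.
approximately_equal = math.isclose(nan + b, nan + b)

print(same_expression_equal, approximately_equal)
```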

RedTachyon · 2024-03-08T10:17:14Z

Couldn't we do something like a two-tiered check? If the values are not allclose, throw an error, because that's a big issue (as happens now). If they are allclose but not ==, raise a warning.

Alternatively, maybe we can just add a switch in check_env controlling how strict the equality checks are.
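A minimal sketch of how the two ideas above could combine, assuming flat sequences of floats. The function name `check_determinism`, its `strict` switch, and the default tolerances are hypothetical choices for illustration; they are not the `check_env` or `data_equivalence` API.

```python
import math
import warnings


def check_determinism(first, second, strict=False, rel_tol=1e-5, abs_tol=1e-8):
    """Hypothetical two-tier check: error if not close, warn if close but not equal."""
    close = len(first) == len(second) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(first, second)
    )
    if not close:
        # Tier 1: values are not even approximately equal, always an error.
        raise AssertionError("Values are not equivalent for the same seed")

    exact = all(x == y for x, y in zip(first, second))
    if not exact:
        # Tier 2: close but not identical; severity depends on the switch.
        if strict:
            raise AssertionError("Values are similar but not exactly equal")
        warnings.warn("Values are not equal although similar")
    return exact
```

With `strict=False` a near match only warns, so the remaining checks can still run; with `strict=True` the same near match is treated as a failure.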

pseudo-rnd-thoughts · 2024-03-08T11:19:48Z

I agree with @RedTachyon on the two-tier testing

Kallinteris-Andreas · 2024-03-08T13:32:38Z

I'm sorry, I do not understand the benefit of a two-tier testing system. All deterministic environments should fail if they cannot pass allequal.

Note: NaN is already tested for before the determinism checks.

pseudo-rnd-thoughts · 2024-03-11T11:36:31Z

From my perspective, we can take the check_env function in two directions.
The first is a general testing function for most environments that tests likely values but does not provide a guarantee. To test determinism, we only test the first reset and step values, ignoring any rollout, so in many ways the function is not that strict.

The second is a comprehensive testing tool that covers everything about the function.

In my opinion, we should test the essential core elements of the API comprehensively, while the checks for non-essential elements (i.e., determinism) should be more lax. Determinism in particular is very difficult to guarantee for some environments, such as those with external simulators, and as Gymnasium is a general API we should try to find a middle ground on this.

Therefore, a two-tier equivalence system could provide a middle ground: it alerts users if strict equivalence doesn't hold, without a failure that would prevent the rest of the testing suite from completing.

For non-essential items, I don't want the testing to fail and the user to disregard the function, when later parts of the function would have shown essential elements actually failing had they continued using it.
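For reference, the "first reset and step only" determinism test described above can be sketched as follows. `ToyEnv` is a hypothetical stand-in that only mimics the reset/step return shapes of the Gymnasium API, and `first_step_is_deterministic` is an illustrative helper rather than the actual `check_env` logic.

```python
import random


class ToyEnv:
    """Hypothetical environment; not part of Gymnasium."""

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        self._state = self._rng.random()
        return self._state, {}

    def step(self, action):
        self._state += action * self._rng.random()
        reward = -abs(self._state)
        # observation, reward, terminated, truncated, info
        return self._state, reward, False, False, {}


def first_step_is_deterministic(make_env, seed=123, action=1.0):
    """Compare only the first reset and the first step of two fresh environments."""
    env_0, env_1 = make_env(), make_env()
    obs_0, _ = env_0.reset(seed=seed)
    obs_1, _ = env_1.reset(seed=seed)
    if obs_0 != obs_1:
        return False
    # No rollout: a single step is compared, so later divergence goes unnoticed.
    return env_0.step(action) == env_1.step(action)


print(first_step_is_deterministic(ToyEnv))
```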

pseudo-rnd-thoughts

Could the warning and error messages be a bit different? Otherwise it could be difficult for users to understand what is happening.

gymnasium/utils/env_checker.py

pseudo-rnd-thoughts · 2024-03-13T00:01:31Z

My suggested changes

    assert data_equivalence(
        obs_0, obs_1
    ), "Deterministic step observations are not equivalent for the same seed and action"
    if not data_equivalence(obs_0, obs_1, exact=True):
        logger.warn(
            "Step observations are not equal although similar given the same seed and action"
        )

    assert data_equivalence(
        rew_0, rew_1
    ), "Deterministic step rewards are not equivalent for the same seed and action"
    if not data_equivalence(rew_0, rew_1, exact=True):
        logger.warn(
            "Step rewards are not equal although similar given the same seed and action"
        )

    assert data_equivalence(
        term_0, term_1, exact=True
    ), "Deterministic step termination are not equivalent for the same seed and action"
    assert (
        trunc_0 is False and trunc_1 is False
    ), "Environment truncates after 1 step, something has gone very wrong."

    assert data_equivalence(
        info_0,
        info_1,
    ), "Deterministic step info are not equivalent for the same seed and action"
    if not data_equivalence(info_0, info_1, exact=True):
        logger.warn(
            "Step info are not equal although similar given the same seed and action"
        )

and

            if env.spec is not None and env.spec.nondeterministic is False:
                assert data_equivalence(
                    obs_1, obs_2
                ), "Using `env.reset(seed=123)` is non-deterministic as the observations are not equivalent."
                if not data_equivalence(obs_1, obs_2, exact=True):
                    logger.warn(
                        "Using `env.reset(seed=123)` observations are not equal although similar."
                    )

Kallinteris-Andreas added 2 commits March 7, 2024 11:23

env_checker to use exact data_equivilance

af81de4

pre-commit

894c255

pseudo-rnd-thoughts requested changes Mar 7, 2024

.

afe94d1

Kallinteris-Andreas requested a review from pseudo-rnd-thoughts March 12, 2024 15:08

pseudo-rnd-thoughts requested changes Mar 12, 2024

.

ba66d49

Kallinteris-Andreas requested a review from pseudo-rnd-thoughts March 12, 2024 19:43

Improve the error messaging and related tests

9955e99

Update env_checker.py

2c6b434

pseudo-rnd-thoughts approved these changes Mar 14, 2024

pseudo-rnd-thoughts merged commit 17203a5 into main Mar 14, 2024
16 checks passed

Kallinteris-Andreas deleted the Kallinteris-Andreas-patch-4 branch March 23, 2024 14:36

Kallinteris-Andreas restored the Kallinteris-Andreas-patch-4 branch March 23, 2024 14:37

Kallinteris-Andreas deleted the Kallinteris-Andreas-patch-4 branch March 23, 2024 14:37
