Bug: Unordered Unique Variables lead to non-deterministic data sampling despite seeding #30

Open
lfrommelt opened this issue Dec 3, 2024 · 0 comments · May be fixed by #33

To replicate, execute the following example, restart the runtime, and execute it again. In this two-variable example there is a 50% chance that the variables will be ordered differently between executions, leading to different evaluations:

from equation_tree import EquationTree
from sympy import symbols
import numpy as np

# Get an arbitrary EquationTree object with at least two variables
x1, x2 = symbols('x1 x2')
expr = x1**x2
equation = EquationTree.from_sympy(expr, variable_test=lambda x: "x" in x)

# Setting a global seed for numpy makes sure that we re-sample the same "crossings"
np.random.seed(10)

# However, the order of variables_unique is random, because set is an unordered data type
print(equation.variables_unique)

# Hence the re-sampled values for x1 and x2 might be swapped, changing the evaluation result as well
print(equation.get_evaluation(num_samples=2))
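
The need to restart the runtime is consistent with how CPython hashes strings: the hash is randomized once per interpreter process (unless `PYTHONHASHSEED` is set), so the iteration order of a set of strings can change between runs but not within one. The sketch below illustrates this with plain strings instead of `EquationTree`; the helper name `set_order` is made up for illustration, and which orders actually appear depends on the interpreter.

```python
import os
import subprocess
import sys


def set_order(hash_seed):
    """Hypothetical helper: print a two-element set in a fresh interpreter
    started with the given PYTHONHASHSEED and return the printed text."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    code = "print(list({'x1', 'x2'}))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Each hash seed stands in for one "restart the runtime".
orders = {set_order(seed) for seed in range(10)}
print(orders)  # usually contains both orderings of x1 and x2
```

Fixing `PYTHONHASHSEED` would hide the problem for a single machine, but it does not make the code itself order-independent.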

To make it work for me, I wrapped the return value of the variables_unique property in a call to sorted() (line 555 in tree.py). If that is not wanted because the property should keep returning the original set type, then at least the argument of enumerate in line 1370 should be sorted.
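
As a rough illustration of why sorting helps, the following sketch assigns seeded random samples to variable names. It is not the code from tree.py: `sample_values` is a hypothetical helper, and it uses the standard-library `random` module rather than numpy to stay dependency-free. The point is only that iterating over `sorted(variables)` ties each variable to the same part of the random stream on every run.

```python
import random


def sample_values(variables, num_samples, seed):
    """Hypothetical helper: draw num_samples values for each variable name."""
    rng = random.Random(seed)
    # sorted() fixes the iteration order, so the first draws always go to
    # the alphabetically first variable, regardless of the set's own order.
    return {
        name: [rng.random() for _ in range(num_samples)]
        for name in sorted(variables)
    }


first = sample_values({"x1", "x2"}, num_samples=2, seed=10)
second = sample_values({"x2", "x1"}, num_samples=2, seed=10)
assert first == second  # same seed, same assignment of samples to variables
```

Sorting at the point of use (here, inside the loop) keeps the set type available to callers, which corresponds to the second option mentioned above.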

This one was extremely hard to find, partly because I had basically the same bug in my own script as well (i.e. on sympy's free_symbols, which is a set just like in EquationTree), giving a very confusing 25% chance of successfully replicating my experiments xD

Cheers,
Leonard

lfrommelt linked a pull request on Dec 30, 2024 that will close this issue