-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concern around random number seeding #181
Comments
Using a global pseudo random number generator (plain python or numpy) is not really nice in the first place. Different code using the same PRNG may have different expectations of when it is (re)seeded. The best solution (I think) is to have a unique PRNG with its own state for each thing that needs one. That would allow anyone to seed their own PRNG for reproducible results without interfering with other code. I'm not sure how easy that would be to do though. |
Looks like (atleast for numpy), this would be a good solution: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.RandomState.html . The alternative would be fiddling with I would recommend that you use We should also switch to a similar approach, but that requires rewriting the random augmentation code. That should be done together with #68 and #150. |
Thanks for the suggestion. It seems that RandomState would be a good solution for me and would be good to tie in elsewhere. |
I'm assuming the problem is solved by using If the problem is not solved, feel free to request a re-open. Thanks for reporting the issue! |
I ran into an issue while trying to implement class balancing using the np.random number generator. Basically, I was seeing a weird repeating sequence in the number generation and it was causing poor sampling. I think the root of the cause comes from line 58 in utils/image.py. Here np.random.randint is used to get a seed. Then when the Keras method image_data_generator.random_transform gets called, it calls np.random.seed(seed). In doing this, it biases what seed will be selected next. The following code snippet illustrates this problem:
This is probably not causing many problems in the existing codebase because the python random library is used in most other places (using that library for my class balancing code also seemed to solve my issue). Additionally, not much augmentation is done in random_transform so repeating seeds may not be much of an issue. Still, it's probably worth considering alternatives for this because it could be a gotcha down the line as it was with me. I'm not sure what the best solution would be though.
As a side note, I wonder if in preprocessing/generator.py we should call random.seed(seed) in addition to np.random.seed(seed) since the goal is to get replicable sequences and random is used for the sequencing.
The text was updated successfully, but these errors were encountered: