Concern around random number seeding #181

Closed
mhwilder opened this issue Dec 13, 2017 · 4 comments

Comments

@mhwilder
I ran into an issue while trying to implement class balancing using the np.random number generator: I was seeing a strange repeating sequence in the generated numbers, and it was causing poor sampling. I think the root cause is line 58 in utils/image.py, where np.random.randint is used to get a seed. When the Keras method image_data_generator.random_transform is then called, it calls np.random.seed(seed), which biases which seed will be selected next. The following code snippet illustrates the problem:

import numpy as np

# Count how often each possible seed value gets drawn when the global PRNG
# is reseeded with its own output on every iteration.
n = 10000
selected = np.zeros(n)
for i in range(100000):
    seed = np.random.randint(n)  # draw a "random" seed from the global PRNG
    selected[seed] += 1
    np.random.seed(seed)         # reseeding biases the next randint draw
print('%d of %d selected' % (np.sum(selected > 0), n))
print('Mean count of those selected is %0.1f' % np.mean(selected[selected > 0]))

This is probably not causing many problems in the existing codebase, because the Python random library is used in most other places (switching to that library in my class balancing code also seemed to fix my issue). Additionally, not much augmentation is done in random_transform, so repeating seeds may not matter much there. Still, it's probably worth considering alternatives, because it could be a gotcha down the line, as it was for me. I'm not sure what the best solution would be, though.
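For contrast, here is a minimal sketch (mine, not from the codebase) of the same counting loop with an isolated numpy.random.RandomState. Because the private generator is never reseeded from its own output, there is no feedback loop and nearly all n values end up being selected:

```python
import numpy as np

# Same experiment as above, but seeds are drawn from a private generator
# that is never reseeded, so no feedback loop biases the draws.
rng = np.random.RandomState(0)
n = 10000
selected = np.zeros(n)
for i in range(100000):
    seed = rng.randint(n)
    selected[seed] += 1
print('%d of %d selected' % (np.sum(selected > 0), n))  # nearly all of n
```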

As a side note, I wonder whether preprocessing/generator.py should call random.seed(seed) in addition to np.random.seed(seed), since the goal is to get replicable sequences and the random module is what drives the sequencing.
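If both libraries are in play, a small helper along these lines could seed them together (the name seed_everything is my own, purely illustrative):

```python
import random
import numpy as np

def seed_everything(seed):
    # Illustrative helper: seed both global PRNGs so that code using the
    # Python random module and code using np.random replay the same way.
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
a, b = random.random(), np.random.rand()
seed_everything(42)
assert a == random.random() and b == np.random.rand()
```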

@de-vri-es
Contributor

Using a global pseudo random number generator (plain python or numpy) is not really nice in the first place. Different code using the same PRNG may have different expectations of when it is (re)seeded.

The best solution (I think) is to have a unique PRNG with its own state for each thing that needs one. That would allow anyone to seed their own PRNG for reproducible results without interfering with other code. I'm not sure how easy that would be to do though.

@de-vri-es
Contributor

de-vri-es commented Dec 13, 2017

Looks like (at least for numpy) this would be a good solution: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.RandomState.html. The alternative would be fiddling with get_state/set_state, but that would be horrible.

I would recommend that you use numpy.random.RandomState for your own algorithm so that you're not affected by others messing with the global PRNG.

We should also switch to a similar approach, but that requires rewriting the random augmentation code. That should be done together with #68 and #150.
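A per-object approach could look roughly like the sketch below (the Augmenter class and its method are hypothetical, not the actual keras-retinanet API): each instance owns a RandomState, so seeding one instance never disturbs other code, and reseeding the global PRNG never disturbs the instance.

```python
import numpy as np

class Augmenter:
    """Hypothetical augmenter that owns its PRNG instead of using np.random."""

    def __init__(self, seed=None):
        self.prng = np.random.RandomState(seed)

    def random_transform(self):
        # Draw augmentation parameters from the private generator only.
        return self.prng.uniform(-0.1, 0.1)

a = Augmenter(seed=1)
b = Augmenter(seed=1)
np.random.seed(0)  # messing with the global PRNG...
assert a.random_transform() == b.random_transform()  # ...changes nothing here
```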

@mhwilder
Author

Thanks for the suggestion. It seems that RandomState would be a good solution for me and would be good to tie in elsewhere.

@de-vri-es
Contributor

de-vri-es commented Dec 16, 2017

I'm assuming the problem is solved by using numpy.random.RandomState, so I'm closing this issue. Additionally, once #190 is merged, keras-retinanet itself will no longer (re)seed any global PRNG.

If the problem is not solved, feel free to request a re-open.

Thanks for reporting the issue!
