[Bug]: Sampling parameters from distributions not working as intended #406

alsnhll · 2024-11-26T15:22:48Z

Label

bug, gempyor

Priority Label

high priority

Describe the bug/issue

When specifying parameters in the seir or outcomes section of a config, there is an option to specify a distribution, e.g.:

seir:
  parameters:
    Ro:
      value: 
        distribution: truncnorm
        mean: 2.5
        sd: 0.1
        a: 1.1
        b: 3

The expected behavior is:

(1) For each slot, a new random value of the parameter is drawn.
(2) For multiple scenarios run from the same config, scenarios corresponding to the same slot should use the same value of all parameters.

For seir parameters, neither (1) nor (2) is working. One random parameter is drawn and used for all slots, and each scenario has its own value.
For outcomes parameters, (1) is working but (2) is not. All scenarios and all slots have unique parameters
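To make the expected behavior concrete, here is a minimal sketch, not gempyor's implementation. It assumes that `a` and `b` are absolute bounds on the parameter, uses plain rejection sampling in place of whatever sampler gempyor uses, and seeds by slot index purely for illustration. The names `draw_truncnorm`, `draw_ro`, and the scenario names are hypothetical.

```python
import numpy as np


def draw_truncnorm(rng: np.random.Generator, mean: float, sd: float, a: float, b: float) -> float:
    """Rejection-sample a normal(mean, sd) restricted to [a, b] (assumed absolute bounds)."""
    while True:
        x = rng.normal(mean, sd)
        if a <= x <= b:
            return float(x)


def draw_ro(slot: int) -> float:
    # Hypothetical: seeding by slot index means every scenario in a slot gets the same draw.
    rng = np.random.default_rng(slot)
    return draw_truncnorm(rng, mean=2.5, sd=0.1, a=1.1, b=3.0)


scenarios = ["no_vaccine", "vaccine"]  # hypothetical scenario names
slots = [1, 2, 3]
draws = {(slot, sc): draw_ro(slot) for slot in slots for sc in scenarios}

for slot in slots:
    print(slot, [draws[(slot, sc)] for sc in scenarios])
```

Each printed row should show one value repeated across scenarios (behavior 2), and the rows should differ from one another (behavior 1).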

I haven't checked the behavior for seir_modifier (snpi) or outcome_modifier (hnpi) parameters yet but will update when I do.

To Reproduce

To see this problem for seir and outcome parameters, run this config (using a single call of gempyor-simulate) and look at the spar or hpar output files by slot and by scenario: https://github.com/HopkinsIDD/flepimop_sample/blob/main/config_sample_2pop_vaccine_scenarios.yml

Environment, if relevant

No response


jcblemai · 2024-11-26T15:52:31Z

That might have changed but we really do not want to use several scenarios per config. Among the reasons for that:

It makes the code much more complicated than it should be, and the outputs confusing.
It makes the meaning of commands unclear (when I run a config, what do I want? Do I want to fit each scenario?).
It makes it hard to reproduce runs.
It's strange to think of it the way we do (if we have scenarios for modifiers, why not for initial conditions? And why not compartments? At least parameters would be good).

and so on. It has been this way in practice for a few years (I think we never used several scenarios per config for SMH), and we decided to stop supporting it last year (one config = one run). It's also how the config writer is set up.

What you mention second is a bug: in the one_run that is used in simulation, the seed is set in each process as:

# in seir.py
def onerun_SEIR(
    sim_id2write: int,
    modinf: model_info.ModelInfo,
    load_ID: bool = False,
    sim_id2load: int = None,
    config=None,
):
    np.random.seed()
    npi = None
    if modinf.npi_config_seir:
        npi = build_npi_SEIR(modinf=modinf, load_ID=load_ID, sim_id2load=sim_id2load, config=config)

But Python subprocesses inherit the parent's random state, so it should reseed the way it is done for emcee:

# somewhere
def run_simulation(snpi_df_in, hnpi_df_in, modinf, p_draw, unique_strings, transition_array, proportion_array, proportion_info, initial_conditions, seeding_data, seeding_amounts,outcomes_parameters, save=False):
    
    # We need to reseed because subprocesses inherit the same random generator state.
    np.random.seed(int.from_bytes(os.urandom(4), byteorder='little'))

This simple change (taking the random state from the OS random generator instead of NumPy) will make each slot draw a different configuration in simulate.
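As a rough illustration of the reseeding idea, the sketch below mimics a worker that starts from the parent's global NumPy state by restoring a saved state before each draw, instead of actually forking processes. The function names are hypothetical; only `np.random.get_state`, `np.random.set_state`, `np.random.seed`, and `os.urandom` are real calls.

```python
import os

import numpy as np


def draw_without_reseed(inherited_state) -> float:
    # Mimic a worker that starts from the parent's global random state.
    np.random.set_state(inherited_state)
    return float(np.random.normal(loc=2.5, scale=0.1))


def draw_with_reseed(inherited_state) -> float:
    # Same starting point, but reseed from the OS before drawing.
    np.random.set_state(inherited_state)
    np.random.seed(int.from_bytes(os.urandom(4), byteorder="little"))
    return float(np.random.normal(loc=2.5, scale=0.1))


np.random.seed(123)
parent_state = np.random.get_state()

same = [draw_without_reseed(parent_state) for _ in range(4)]
different = [draw_with_reseed(parent_state) for _ in range(4)]

print("no reseed:", same)
print("reseeded: ", different)
```

Without the reseed, every simulated worker returns the same draw; with it, the draws differ from one worker to the next.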

alsnhll · 2024-11-26T16:47:58Z

Sure! I understand it would be hard to completely remove the idea of scenarios from how seir_modifiers and outcome_modifiers are working in gempyor right now, but I suggest we remove the requirement for all configs to have the seir_modifiers::scenarios and outcome_modifiers::scenarios sections if we aren't using them. And if we don't have scenarios, we don't even need to use the concept of StackedModifiers, right? Shouldn't it just apply all the modifiers listed by default?

alsnhll · 2024-11-26T18:45:17Z

To summarize, the part of the issue that is definitely a bug and should be fixed ASAP is that a different SEIR parameter should be drawn for each slot when a range is specified (as is done for outcomes parameters). We can save for later the discussion/decision about how variable parameters should interact with multiple scenarios in the same config, and whether we should keep any functionality to run multiple scenarios per config.

Created a test that shows the issue of the SEIR parameters not being randomly drawn per slot whereas the outcome parameters are. Test currently fails.

jcblemai · 2024-12-03T06:25:53Z

> I understand it would be hard to completely remove the idea of scenarios from how seir_modifiers and outcome_modifiers are working in gempyor right now, but I suggest we remove the requirement for all configs to have the seir_modifiers::scenarios and outcome_modifiers::scenarios sections if we aren't using them.

> And if we don't have scenarios, we don't even need to use the concept of StackedModifiers, right? Shouldn't it just apply all the modifiers listed by default?

Yeah, that was the plan. For the record, in case this is useful for someone: internally it's a bit complicated (not too much, but enough to put that on hold). If you remove the scenarios section, you end up in this branch:

            if config["seir_modifiers"].exists():
                if config["seir_modifiers"]["scenarios"].exists():
                    self.npi_config_seir = config["seir_modifiers"]["modifiers"][seir_modifiers_scenario]
                    self.seir_modifiers_library = config["seir_modifiers"]["modifiers"].get()
                else:
                    self.seir_modifiers_library = config["seir_modifiers"]["modifiers"].get()
                    raise ValueError("Not implemented yet")  # TODO create a Stacked from all

As you can see, that supports the old syntax, and if you provide no scenarios it raises a ValueError where it should instead just create a StackedModifiers from all the modifiers. That's not too hard to do, but I never did it because I wanted to rewrite modifiers. The Modifiers code is awful: there should be just a single type (all are nested anyway) and it could be much shorter and more sequential. Instead, StackedModifiers are the necessary entry points to the modifiers, which are then called recursively. And their interface with EMCEE/classical inference is messy.
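For anyone picking this up, here is a rough sketch of what the fallback could look like, using plain dicts rather than gempyor's config objects. The helper name, the `"StackedModifier"` method string, and the example modifier entries are all assumptions for illustration, not the actual gempyor code.

```python
from typing import Optional


def resolve_seir_modifier_config(seir_modifiers: dict, scenario: Optional[str] = None) -> dict:
    """Hypothetical helper: choose which modifier config to apply."""
    library = seir_modifiers["modifiers"]
    if "scenarios" in seir_modifiers:
        # Old syntax: a named scenario selects one entry from the library.
        return library[scenario]
    # No scenarios section: stack every listed modifier instead of raising.
    return {"method": "StackedModifier", "modifiers": list(library)}


# Illustrative config with no scenarios section (modifier names and fields are made up).
config = {
    "modifiers": {
        "lockdown": {"method": "SinglePeriodModifier", "parameter": "Ro", "value": 0.7},
        "vaccination": {"method": "SinglePeriodModifier", "parameter": "Ro", "value": 0.9},
    }
}

stacked = resolve_seir_modifier_config(config)
print(stacked)

# Old syntax still resolves through the named scenario.
legacy = resolve_seir_modifier_config({**config, "scenarios": ["lockdown"]}, scenario="lockdown")
print(legacy)
```

One open design question this glosses over: if the library already contains stacked entries, naively stacking everything would apply their children twice, so a real version would probably need to stack only the top-level modifiers.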

Thanks, @TimothyWillard, for working on the other critical underlying bug, and Alison for raising it.

TimothyWillard added the bug, gempyor, and high priority labels Nov 26, 2024

TimothyWillard added a commit that referenced this issue Dec 2, 2024

Test to demonstrate bug GH-406

2f93ec3

TimothyWillard self-assigned this Dec 2, 2024
