Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Sampling parameters from distributions not working as intended #406

Open
alsnhll opened this issue Nov 26, 2024 · 4 comments
Open
Assignees
Labels
bug Defects or errors in the code. gempyor Concerns the Python core. high priority High priority.

Comments

@alsnhll
Copy link
Collaborator

alsnhll commented Nov 26, 2024

Label

bug, gempyor

Priority Label

high priority

Describe the bug/issue

when specifying parameters in the seir or outcomes section of a config, there is an option to specify a distribution, e.g.

seir:
  parameters:
    Ro:
      value: 
        distribution: truncnorm
        mean: 2.5
        sd: 0.1
        a: 1.1
        b: 3

The expected behavior is

  1. For each slot, a new random value of the parameter is drawn
  2. For multiple scenarios run from the same config, scenarios corresponding to the same slot should use the same value of all parameters

For seir parameters, neither (1) nor (2) is working. One random parameter is drawn and used for all slots, and each scenario has its own value.
For outcomes parameters, (1) is working but (2) is not. All scenarios and all slots have unique parameters

I haven't checked the behavior for seir_modifier (snpi) or outcome_modifier (hnpi) parameters yet but will update when I do.

To Reproduce

To see this problem for seir and outcome parameters, run this config (using a single call of gempyor-simulate) and look at the spar or hpar output files by slot and by scenario: https://github.com/HopkinsIDD/flepimop_sample/blob/main/config_sample_2pop_vaccine_scenarios.yml

Environment, if relevant

No response

@jcblemai
Copy link
Collaborator

jcblemai commented Nov 26, 2024

That might have changed but we really do not want to use several scenarios per config. Among the reasons for that:

  • It makes the code much more complicated than it should -- and outputs confusing
  • It makes commands meaning not clear (when I run a config, what do I want ? Do I want to fit each scenario ?)
  • It makes it hard to reproduce runs
  • it's strange to think of it like we do (if we have a scenario for modifiers, why not initial conditions ? and why not comparments ? at least parameters would be good).

and so one. It's been a few year that this has been the case practically (I think we never used several scenarios per config for SMH), and we decided to spot support it last year (one config = one run). It's also how the config writer is set up.

What you mention second is a bug: in the one_run that is used in simulation, the seed is drawn in each process as

# in seir.py
def onerun_SEIR(
    sim_id2write: int,
    modinf: model_info.ModelInfo,
    load_ID: bool = False,
    sim_id2load: int = None,
    config=None,
):
    np.random.seed()
    npi = None
    if modinf.npi_config_seir:
        npi = build_npi_SEIR(modinf=modinf, load_ID=load_ID, sim_id2load=sim_id2load, config=config)

But python subprocesses inherit the random process state so it should do like it is done for emcee

# somewhere
def run_simulation(snpi_df_in, hnpi_df_in, modinf, p_draw, unique_strings, transition_array, proportion_array, proportion_info, initial_conditions, seeding_data, seeding_amounts,outcomes_parameters, save=False):
    
    # We need to reseed because subprocess inherit of the same random generator state.
    np.random.seed(int.from_bytes(os.urandom(4), byteorder='little'))

this simple change (random state from the os random generator instead of numpy) will make each slot draw a different configuration in simulate.

@alsnhll
Copy link
Collaborator Author

alsnhll commented Nov 26, 2024

Sure ! I understand it would be hard to completely remove the idea of scenarios from how seir_modifiers and outcome_modifiers are working in gempyor right now, but I suggest we remove the requirement for all configs to have the sections seir_modifiers::scenarios and outcome_modifiers::scenarios sections if we aren't using them. And if we don't have scenarios, we don't even need to ever used the concept of StackedModifiers right? Shouldn't it just apply all the modifiers listed by default?

@alsnhll
Copy link
Collaborator Author

alsnhll commented Nov 26, 2024

To summarize, the part of the issue that is definitely a bug and should be fixed ASAP that a different SEIR parameter should be drawn for slot when a range is specified (as it is done for outcomes parameters). We can save for later the discussion/decision about how variable parameters should interact with multiple scenarios in the same config, and whether we should keep any functionality to run multiple scenarios per config

@TimothyWillard TimothyWillard added bug Defects or errors in the code. gempyor Concerns the Python core. high priority High priority. labels Nov 26, 2024
TimothyWillard added a commit that referenced this issue Dec 2, 2024
Created a test that shows the issue of the SEIR parameters not being
randomly drawn per a slot wheras the outcome parameters are. Test
currently fails.
@TimothyWillard TimothyWillard self-assigned this Dec 2, 2024
@jcblemai
Copy link
Collaborator

jcblemai commented Dec 3, 2024

I understand it would be hard to completely remove the idea of scenarios from how seir_modifiers and outcome_modifiers are working in gempyor right now, but I suggest we remove the requirement for all configs to have the sections seir_modifiers::scenarios and outcome_modifiers::scenarios sections if we aren't using them.

And if we don't have scenarios, we don't even need to ever used the concept of StackedModifiers right? Shouldn't it just apply all the modifiers listed by default?

yeah, that was the plan. For the record/in case, this is useful for someone: internally it's a bit (not too much, but enough to put that on hold) complicated. If you remove the scenario, you will arrive in this condition:

            if config["seir_modifiers"].exists():
                if config["seir_modifiers"]["scenarios"].exists():
                    self.npi_config_seir = config["seir_modifiers"]["modifiers"][seir_modifiers_scenario]
                    self.seir_modifiers_library = config["seir_modifiers"]["modifiers"].get()
                else:
                    self.seir_modifiers_library = config["seir_modifiers"]["modifiers"].get()
                    raise ValueError("Not implemented yet")  # TODO create a Stacked from all

As you see, that supports the old syntax, and if you provide no scenario it raises a ValueError where instead it should just create a StackedModifiers from all the modifiers. That's not too hard to do, but I never did it because I wanted to rewrite modifiers. The Modifiers code is awful: there should just be a single type (all are nested indeed) and it could be much shorter/sequential. Instead, StackedModifiers are necessary entry points to the modifiers which are then recursively called. And their interface with EMCEE/classical inference is messy.

Thanks, @TimothyWillard for working on the other critical underlying bug and alison for raising

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Defects or errors in the code. gempyor Concerns the Python core. high priority High priority.
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

3 participants