Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Different behaviour when running one_simulation(), compared to gempyor-simulate #373

Open
saraloo opened this issue Oct 29, 2024 · 6 comments
Labels
gempyor Concerns the Python core. low priority Low priority.

Comments

@saraloo
Copy link
Contributor

saraloo commented Oct 29, 2024

Label

gempyor

Priority Label

low priority

Describe the bug/issue

I have a config with a long chunk of outcomes. Not sure what is going on under the hood here but I get an error when running the config from a python notebook using one_simulation() but do not get any error when running gempyor-simulate from CLI.

To Reproduce

Config is https://github.com/HopkinsIDD/RSV_USA/blob/main/config_rsvnet_2024_vaccination_emcee_A.yml

When running the following command in python

gempyor_simulator = gempyor.GempyorInference(
    config_filepath="config_rsvnet_2024_vaccination_emcee_A.yml",
    run_id="test_run_id",
    prefix="test_prefix/",
    stoch_traj_flag=False,
    autowrite_seir=True,
)
gempyor_simulator.one_simulation(sim_id2write=0)

I get the following error

ValueError: ERROR with outcome incidH_age65to100: the specified source incidH_age65to100_v2 is not a dictionnary (for seir outcome) nor an existing pre-identified outcomes.

But instead, running the following works with no errors

gempyor-simulate -c config_rsvnet_2024_vaccination_emcee_A.yml -n 1

Environment, if relevant

No response

@saraloo saraloo added gempyor Concerns the Python core. low priority Low priority. labels Oct 29, 2024
@jcblemai
Copy link
Collaborator

jcblemai commented Dec 2, 2024

The link to the config does not work

@saraloo
Copy link
Contributor Author

saraloo commented Dec 2, 2024

updated, sorry

@jcblemai
Copy link
Collaborator

jcblemai commented Dec 2, 2024

That's interesting because the path of outcomes is roughly the same in the two cases:

  • parameters = read_parameters_from_config(modinf)
  • [NPI, loaded values, not relevant here]

and finally

        outcomes_df, hpar = compute_all_multioutcomes(
            modinf=modinf,
            sim_id2write=sim_id2write,
            parameters=parameters,
            loaded_values=loaded_values,
            npi=npi_outcomes,
        )

Though in the one_simulation case, the dictionary that is called parameters is not ordered in the same order as the config:

>>>  print(gempyor_simulator.outcomes_parameters.keys())
dict_keys(['incidH_age65to100', 'incidH_age50to64', 'incidH_age18to49', 'incidH_age5to17', 'incidH_age1to4', 'incidH_age0to11m', 'incidH_age65to100_v2', 'incidH_age50to64_v2', 'incidH_age18to49_v2', 'incidH_age5to17_v2', 'incidH_age1to4_v2', 'incidH_age0to11m_v2', 'incidH_0prior_age0to2m', 'incidH_1prior_age0to2m', 'incidH_2prior_age0to2m', 'incidH_0prior_age3to5m', 'incidH_1prior_age3to5m', 'incidH_2prior_age3to5m', 'incidH_0prior_age6to11m', 'incidH_1prior_age6to11m', 'incidH_2prior_age6to11m', 'incidH_0prior_age1to2', 'incidH_1prior_age1to2', 'incidH_2prior_age1to2', 'incidH_0prior_age3to4', 'incidH_1prior_age3to4', 'incidH_2prior_age3to4', 'incidH_0prior_age5to17', 'incidH_1prior_age5to17', 'incidH_2prior_age5to17', 'incidH_0prior_age18to49', 'incidH_1prior_age18to49', 'incidH_2prior_age18to49', 'incidH_0prior_age50to64', 'incidH_1prior_age50to64', 'incidH_2prior_age50to64', 'incidH_0prior_age65to100_none', 'incidH_0prior_age65to100_vaccsenior', 'incidH_0prior_age65to100_waned', 'incidH_1prior_age65to100_none', 'incidH_1prior_age65to100_vaccsenior', 'incidH_1prior_age65to100_waned', 'incidH_2prior_age65to100_none', 'incidH_2prior_age65to100_vaccsenior', 'incidH_2prior_age65to100_waned', 'incidH_age0to11m_v1', 'incidH_age1to4_v1', 'incidH_age5to17_v1', 'incidH_age18to49_v1', 'incidH_age50to64_v1', 'incidH_age65to100_v1', 'incidH'])

which cause this error. I have a zoom and will dig after what is causing this.

@saraloo
Copy link
Contributor Author

saraloo commented Dec 2, 2024

Oh, weird. Thanks for looking in to it. I had made sure to write it in order top to bottom of the config chunk. I'll try and dig into this too now that it's clearer where the error is.

jcblemai added a commit that referenced this issue Dec 3, 2024
@jcblemai
Copy link
Collaborator

jcblemai commented Dec 3, 2024

No no this is clearly a gempyor issue. what happens is

So both endpoints run the same, the only difference is the context in which it is run. Going through one_simulation, the parameter reading:

outcomes_parameters = outcomes.read_parameters_from_config(modinf)

is done two times instead of one:

  • once to generate the static arguments for EMCEE
  • a second time to build the model for classical inference

(there should just be one call but that's another issue).

Adding a print to these parameter keys (it's a dictionary) shows us that the first call returns the outcomes in the right order while the second has them shuffled.

I strongly believe this is caused by

def read_parameters_from_config(modinf: model_info.ModelInfo):
        |...|
        outcomes_config = modinf.outcomes_config["outcomes"]
        |...|
                if outcomes_config[new_comp]["delay"].exists():
                    |...|
                else:
                    logging.critical(f"No delay for outcome {new_comp}, using a 0 delay")
                    outcomes_config[new_comp]["delay"] = {"value": 0}
                    |...|

This line outcomes_config[new_comp]["delay"] = {"value": 0} modifies the underlying confuse object (everything here is passed by reference)... I think it causes a change of order and places the modified outcomes at the beginning of the object instead of modifying it in place. Very annoying. Indeed the outcomes affected are ones without delay so the delay is put to 0:

>>> gempyor_simulator.one_simulation(sim_id2write=0)
CRITICAL:root:No delay for outcome incidH_age0to11m_v2, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age1to4_v2, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age5to17_v2, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age18to49_v2, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age50to64_v2, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age65to100_v2, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age0to11m, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age1to4, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age5to17, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age18to49, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age50to64, using a 0 delay
CRITICAL:root:No delay for outcome incidH_age65to100, using a 0 delay

This is a big bug -- I guess we just encountered it now because we never ran two times these commands (within emcee, or within classical inference, the result is cached). So to fix this, you could use emcee_batch as it is and add a delay: 0 to these outcomes, or I've made a quick fix in the branch emcee_batch_fix, checked that it would work in your case. I'll let @pearsonca @TimothyWillard decide what to do about it. It's not a great fix, see #408.

thanks for including very easy commands to reproduce.

TimothyWillard added a commit that referenced this issue Dec 4, 2024
Move the call to `get_static_arguments` under the 'emcee' inference
branch. This avoids calling `outcomes.read_parameters_from_config`
twice. That function has a side effect of editing the outcome modifiers
config. See
#373 (comment)
for details.
@saraloo
Copy link
Contributor Author

saraloo commented Dec 5, 2024

Thanks joseph - this is a super clear explanation and really helpful to understand the code. The quick fix worked fine for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gempyor Concerns the Python core. low priority Low priority.
Projects
None yet
Development

No branches or pull requests

2 participants