Making theory order local per dataset #99

Merged
merged 16 commits into from
Nov 29, 2024

jacoterh
Collaborator

This PR removes the global pQCD order key from the runcard and replaces it with an order specified locally per dataset, e.g.:

datasets:

  ATLAS_CMS_SSinc_RunI: {"order": "NLO_QCD"}
  ATLAS_CMS_tt_AC_8TeV: {"order": "NLO_QCD"}
  ATLAS_SSinc_RunII: {"order": "NLO_QCD"}
  ATLAS_STXS_runII_13TeV: {"order": "NLO_QCD"}
  ATLAS_WH_Hbb_13TeV: {"order": "NLO_QCD"}

We need this feature because we now have different higher-order corrections that we would like to distinguish: NLO_QCD, NLO_EW, NLO_QCD_EW. The theory files need updating accordingly, which is done in PR#25 on smefit_database.

@jacoterh marked this pull request as ready for review November 26, 2024 11:40
@jacoterh added the enhancement and data labels Nov 26, 2024
@LucaMantani
Collaborator

@jacoterh Correct me if I'm wrong: this will be fully compatible with the current tables as long as we specify "NLO" or "LO" for each dataset, right?

@jacoterh
Collaborator Author

Yes, that's right, the only thing that matters is that the order specified for each dataset also appears in the theory JSON. This can be any key, as you point out, but PR#25 on smefit_database actually updates the keys in the JSONs to our new convention (i.e. NLO_QCD, NLO_EW, etc.).
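
A minimal sketch of the check described here, assuming the theory JSON of a dataset exposes one top-level key per perturbative order. The helper name `select_order` and the example dictionary layout are hypothetical and not taken from the smefit code:

```python
def select_order(theory: dict, dataset: str, order: str) -> dict:
    """Return the theory block stored under `order`, or fail with a clear message."""
    if order not in theory:
        # Hint only at keys that look like orders (assumed naming: LO*, NLO*).
        available = sorted(k for k in theory if k.startswith(("LO", "NLO")))
        raise KeyError(
            f"Order '{order}' not found in the theory file of {dataset}; "
            f"available orders: {available}"
        )
    return theory[order]


# Hypothetical content of a theory JSON after the key renaming.
theory_example = {
    "best_sm": [1.0, 2.0],
    "LO": {"OtG": [0.1, 0.2]},
    "NLO_QCD": {"OtG": [0.15, 0.25]},
}

nlo_block = select_order(theory_example, "dataset_2", "NLO_QCD")
```

Failing early with the list of available keys makes a mismatch between runcard and theory file easy to spot, for example when an old table still uses "NLO" instead of "NLO_QCD".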

@arossia94
Collaborator

@jacoterh, is there any default behavior for the case in which one forgets to specify the order for a dataset? That could be useful in case one reuses an old runcard.
I'd print a warning to the user and either exclude the dataset or use it with LO (assuming all datasets will have at least LO).

Collaborator

@arossia94 left a comment

Minor requests regarding default and limit behaviors.
Should we also update the documentation within this same PR before forgetting about this change?

@jacoterh
Collaborator Author

Should be ready to go. Updated behaviour

datasets:
  - dataset_1
  - dataset_2: {"order": "NLO_QCD"}
  - dataset_3: {"order": "NLO_EW"}

where the predictions for dataset_1 are at LO (default behaviour), those for dataset_2 at NLO_QCD, and those for dataset_3 at NLO_EW.

@jacoterh
Collaborator Author

Updated syntax (no mixed types anymore)

datasets:
  - name: dataset_1
  - name: dataset_2
    order: NLO_QCD
  - name: dataset_3
    order: NLO_EW

This syntax is future-proof, as it naturally supports additional properties we might want to specify per dataset, e.g. cuts.
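
A minimal sketch of how entries in this syntax could be resolved, assuming the runcard has already been parsed into a list of dictionaries. The function name `normalize_datasets` is hypothetical; the `"LO"` fallback mirrors the `default_order="LO"` argument discussed in the review:

```python
DEFAULT_ORDER = "LO"  # fallback used when a dataset entry has no "order" key


def normalize_datasets(entries: list[dict], default_order: str = DEFAULT_ORDER) -> list[dict]:
    """Fill in the default order while keeping the runcard ordering and any extra keys."""
    normalized = []
    for entry in entries:
        if "name" not in entry:
            raise ValueError(f"Dataset entry without a 'name' key: {entry}")
        # Copy the entry so the parsed runcard itself is left untouched.
        normalized.append({**entry, "order": entry.get("order", default_order)})
    return normalized


# Parsed equivalent of the runcard snippet above.
runcard_datasets = [
    {"name": "dataset_1"},
    {"name": "dataset_2", "order": "NLO_QCD"},
    {"name": "dataset_3", "order": "NLO_EW"},
]

resolved = normalize_datasets(runcard_datasets)
```

Because each entry is a dictionary, a further per-dataset property would simply travel along as an extra key without changing this function.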

@LucaMantani
Collaborator

> Updated syntax (no mixed types anymore)
>
> datasets:
>   - name: dataset_1
>   - name: dataset_2
>     order: NLO_QCD
>   - name: dataset_3
>     order: NLO_EW
>
> This syntax is future-proof, as it naturally supports additional properties we might want to specify per dataset, e.g. cuts.

So now you made it a list of dictionaries, right? It's more similar to the NNPDF format. I am in favour of this, but I thought we wanted to keep the old runcards working?

@jacoterh
Collaborator Author

That's right, it's a list of dictionaries, so it'd be equivalent to doing

datasets:
  - {name: dataset_1}
  - {name: dataset_2, order: NLO_QCD}
  - {name: dataset_3, order: NLO_EW}

This breaks compatibility with the old runcards, but for this we can provide a short script to convert them. In any case it's a small difference.
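
A minimal sketch of what such a conversion could look like, operating on an already parsed runcard. It assumes that the old format stores `datasets` as a plain list of names and the global order under a key called `order`; both the key name and the function `convert_runcard` are assumptions for illustration:

```python
def convert_runcard(old: dict, order_key: str = "order") -> dict:
    """Move a global order key into per-dataset entries of the form {name, order}."""
    # Keep every other runcard setting unchanged.
    new = {k: v for k, v in old.items() if k not in (order_key, "datasets")}
    global_order = old.get(order_key)

    converted = []
    for entry in old.get("datasets", []):
        if isinstance(entry, dict):
            # Entry is already in the new style: copy it as is.
            converted.append(dict(entry))
            continue
        item = {"name": entry}
        if global_order is not None:
            item["order"] = global_order
        converted.append(item)

    new["datasets"] = converted
    return new


# Hypothetical old-style runcard content.
old_runcard = {"order": "NLO", "use_quad": False, "datasets": ["dataset_1", "dataset_2"]}
new_runcard = convert_runcard(old_runcard)
```

The order string is copied verbatim, so a converted runcard only works with theory files whose keys still match it.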

Collaborator

@LucaMantani left a comment

I have one comment on the order in which datasets are looped through, but the rest looks fine.


  _logger.info(f"Applying cutoff scale: {cutoff_scale} GeV.")
- for sset in np.unique(datasets):
+ for sset in datasets:
Collaborator

Here and in similar loops (the one in load_rge_mat for example) datasets are looped without being put in alphabetical order now? I am a bit worried about having mismatches, are we guaranteed that things are ordered correctly?

Collaborator Author

@jacoterh Nov 28, 2024

I see your concern, but datasets is a list which has a well defined ordering, so there's no need to sort anything (as opposed to dictionaries). But it was a list also before, and yet it got sorted. Perhaps @giacomomagni can comment if/why this was necessary at the time?

Collaborator

I see, so the order is always the one put in the card basically.

Collaborator

I guess the unique was there just to be sure a dataset is not loaded twice, but yes from the DataTuple you can always infer the order.
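
For context, `np.unique` returns the sorted unique elements, which is why the old loop was effectively alphabetical. A minimal sketch of a duplicate guard that keeps the runcard order instead, using only the standard library (the helper name is hypothetical, and whether such a guard is needed in smefit is left open in this thread):

```python
def dedupe_keep_order(names: list[str]) -> list[str]:
    """Drop repeated dataset names while preserving first-appearance order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(names))


names = ["dataset_b", "dataset_a", "dataset_b"]

deduped = dedupe_keep_order(names)   # ['dataset_b', 'dataset_a']
sorted_unique = sorted(set(names))   # same ordering np.unique would produce
```

With the first variant the loop order always matches the runcard, so per-dataset arrays built in separate loops stay aligned as long as every loop iterates over the same list.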

use_quad,
use_theory_covmat,
use_t0,
use_multiplicative_prescription,
default_order="LO",
Collaborator

Why do you need to have a default "LO"? Do you need to load a dataset without any theory being loaded?

Collaborator Author

This is just the default theory order that gets loaded if one doesn't specify it for a particular dataset. Of course, the default can be set to NLO as well, but not all predictions are available at NLO.

Collaborator

Sure, but sorry to be picky: why do you want to have a default?
I mean, if you load a dataset, you need to specify an order.

Co-authored-by: Giacomo Magni <[email protected]>
@jacoterh merged commit ec2b607 into main Nov 29, 2024
5 checks passed
@jacoterh deleted the NLO-keys branch November 29, 2024 10:31
4 participants