
Reimplement DY #2248

Open · wants to merge 10 commits into master
Conversation

@peterkrack (Contributor) commented Dec 11, 2024

Overall, this is close to completion, except for DYE605_Z0_38P8GEV_DW_PXSEC, for which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

| Dataset | Status | Check CovMat | Check $t0$ CovMat | Comments |
| --- | --- | --- | --- | --- |
| DYE605_Z0_38P8GEV_DW_PXSEC | | | | The old implementation contains one extra source of uncertainties |
| DYE866_Z0_800GEV_PXSEC | | | | Slight numerical differences due to rawdata source (maxdiff of ~2%) |
| DYE866_Z0_800GEV_DW_RATIO_PDXSECRATIO | | | | - |
| DYE906_Z0_120GEV_DW_PDXSECRATIO | | | | - |

That being said, none of the remaining differences are actually visible in the data vs. theory comparison report.

@Radonirinaunimi (Member) left a comment

Hi @peterkrack! Happy new year and thanks for this!

Here are some preliminary comments:

Could you please take care of these first, before I move further into the details of the implementation?

@Radonirinaunimi (Member)

Hi @peterkrack, please let me know if there is anything I can help with in the meantime.

@Radonirinaunimi (Member)

I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

@peterkrack (Contributor, Author)

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy and reimplemented versions.

@Radonirinaunimi (Member)

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?
>
> For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy and reimplemented versions.

I can also have a look at the details, but would you be able to take care of #2248 (review) above?

@Radonirinaunimi Radonirinaunimi dismissed their stale review January 21, 2025 07:59

Already resolved!

@Radonirinaunimi Radonirinaunimi marked this pull request as ready for review January 22, 2025 08:13
@peterkrack (Contributor, Author)

> Overall, this is close to completion, except for DYE605_Z0_38P8GEV_DW_PXSEC, for which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

Concerning the extra uncertainty: the rawdata file contains replicas rep00001 to rep00999 only:
https://raw.githubusercontent.com/NNPDF/nnpdf/refs/tags/4.0.6/buildmaster/rawdata/DYE605/nuclear/output/tables/group_result_table.csv

In the old buildmaster, however, nrep is hard-coded to 1000:

int nrep=1000;

and later on the loop runs from irep=0 to irep=999, i.e. over 1000 replicas, which is one replica (and hence one uncertainty) too many:

for(int irep=0; irep<nrep; irep++)
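This kind of off-by-one can be avoided by deriving nrep from the rawdata header itself instead of hard-coding it. A minimal Python sketch (the in-memory CSV is a toy stand-in for group_result_table.csv, with replica columns rep00001 to rep00999 as in the file linked above):

```python
import csv
import io

# Toy stand-in for group_result_table.csv: one bin column plus
# replica columns rep00001..rep00999 (rep01000 is absent, as in
# the real rawdata file).
header = ["bin"] + [f"rep{i:05d}" for i in range(1, 1000)]
raw = io.StringIO(",".join(header) + "\n" + ",".join(["0.0"] * len(header)) + "\n")

rows = list(csv.DictReader(raw))
rep_cols = [c for c in rows[0] if c.startswith("rep")]

# Derive nrep from the file instead of `int nrep = 1000;`,
# so a loop over replicas cannot overrun the available columns.
nrep = len(rep_cols)
print(nrep)  # 999
```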

@enocera (Contributor) commented Jan 22, 2025

Dear @peterkrack @Radonirinaunimi, let me try to clarify the uncertainties for the DY E605 data set.

  • In the "old" commondata implementation, there were one statistical (uncorrelated) uncertainty and 1002 sources of systematic uncertainties. Of these 1002 sources, the first was a 10% uncorrelated (additive) uncertainty and the second a 15% correlated (multiplicative) normalisation uncertainty. The other 1000 uncertainties were "nuclear uncertainties", estimated as the difference between predictions obtained with proton and nuclear PDFs, taking the proton PDF fixed to the NNPDF4.0 central value and varying the nuclear predictions for each of the 1000 replicas in nNNPDF3.0.
  • It seems to me that this implementation is correctly propagated into the legacy data set. Indeed, if I look at uncertainties_legacy_PXSEC.yaml, there are one statistical uncertainty, the two aforementioned systematic uncertainties, and 1000 nuclear uncertainties. In uncertainties_reimplemented_PXSEC.yaml, I'd say that one nuclear uncertainty is missing. This is consistent with what @peterkrack noticed above.
  • Now, you're right in saying that there are only 999 nuclear uncertainties in the input file rawdata/nuclear/output/tables/group_result_table.csv. A replica must have gone missing in the generation process.
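The construction described in the first bullet can be sketched numerically. This is an illustrative NumPy example with made-up predictions (the names proton_central and nuclear_reps are assumptions, not the buildmaster variables): each of the 999 surviving nuclear replicas contributes one systematic source, defined as its difference from the fixed proton central prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
ndata, nrep = 5, 999  # 999 nuclear replicas survive in the rawdata

# Stand-in predictions: proton PDF fixed to the NNPDF4.0 central value,
# and one prediction per nNNPDF3.0 nuclear replica (made-up numbers).
proton_central = np.ones(ndata)
nuclear_reps = proton_central + 0.01 * rng.standard_normal((nrep, ndata))

# One systematic source per replica: prediction(nuclear) - prediction(proton).
nuclear_systematics = nuclear_reps - proton_central  # shape (nrep, ndata)
```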

I see two ways of proceeding forward.

  1. I re-generate the rawdata/nuclear/output/tables/group_result_table.csv with the missing nuclear uncertainty.
  2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

@enocera (Contributor) commented Jan 22, 2025

I have a preference for option 2.

@Radonirinaunimi (Member) commented Jan 22, 2025

> I see two ways of proceeding forward.
>
> 1. I re-generate the `rawdata/nuclear/output/tables/group_result_table.csv` with the missing nuclear uncertainty.
>
> 2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

Thanks for your reply @enocera! I agree with all of your points and the conclusions. As you can see in the report, this difference is negligible, so I tend to lean towards the 2nd option (on top of the reason you gave).

@scarlehoff (Member) left a comment

The `kinematics_override` needs to be set to the identity. The `result_transform` we can live with for the time being.

Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

@Radonirinaunimi (Member)

> Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

This is probably very minor, but how would you call the process types then?

And are you happy with how the variants are called?

@scarlehoff (Member) commented Jan 23, 2025

The variables are ok. For the process type... use DYP, otherwise you will need to dive into validphys to change a few things...

although FTDY would be better imho but 🤷‍♂️

@Radonirinaunimi (Member) commented Jan 23, 2025

> The variables are ok. For the process type... use DYP, otherwise you will need to dive into validphys to change a few things...

Yep, this I know. But I was wondering if you want something specific after the `_`, i.e. `DYP_XX`.

> although FTDY would be better imho but 🤷‍♂️

Accounting for this, I went for `DYP_FT`.

@scarlehoff (Member)

Yes, that sounds ok; as long as the first three letters are DYP it will go through vp fine. It needs to be added to the process options though.

@Radonirinaunimi (Member)

> Yes, that sounds ok; as long as the first three letters are DYP it will go through vp fine. It needs to be added to the process options though.

When modifying the process options, I went for the easiest solution, which is to simply add the variable `M2` to the `_dyboson_xq2map` instead of renaming the variables here to `m_Z2` (although that would be the proper variable name). Doing the latter turns out to be very messy, as it also involves modifying the filter rules.
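For context, the fixed-target DY map itself is standard leading-order kinematics: with scale Q² = M² and τ = √(M²/s), the momentum fractions are x₁,₂ = τ e^{±y}. A hedged Python sketch (the function `ftdy_xq2map` and its signature are illustrative, not the actual `_dyboson_xq2map` code in validphys):

```python
import numpy as np

def ftdy_xq2map(y, m2, sqrts):
    """Illustrative fixed-target DY map: rapidity y and invariant mass
    squared m2 (in the same units as s) to (x1, x2) and the scale Q2 = M2."""
    tau = np.sqrt(m2) / sqrts
    x1 = tau * np.exp(y)
    x2 = tau * np.exp(-y)
    return (x1, x2), m2

# At central rapidity x1 = x2, and x1*x2*s = M2 by construction
# (sqrts = 38.8 GeV as for the E605 dataset above).
(x1, x2), q2 = ftdy_xq2map(y=0.0, m2=64.0, sqrts=38.8)
```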

Final Report

```yaml
reimplemented:
  data_uncertainties:
  - uncertainties_reimplemented_PXSEC.yaml
  data_central: data_reimplemented_PXSEC.yaml
```
@scarlehoff (Member) commented Jan 24, 2025

Some final comments. This should not be `reimplemented`; this should just be the normal `data_uncertainties`, not a variant.

Also, if we have `data_reimplemented` and it is the same as the old one, the old one should be removed. And I think for one of the 866 datasets it needs to be kept, because there were small differences in the data (in the rawdata).

Same for all the others.

Member

For one out of the four datasets, the new implementation has slightly different central values (numerical fluctuations due to the rawdata source), so I am not sure if we want to keep `reimplemented` for that one (?). But for the rest, I'll do it asap.

Member

We want to keep the old data under `legacy` for that dataset, and the new implementation should be the default for all (so we don't keep `reimplemented` for any).
