
Reimplement DY #2248

Open · wants to merge 10 commits into master
Conversation

@peterkrack (Contributor) commented Dec 11, 2024

Overall, this is close to completion, except for DYE605_Z0_38P8GEV_DW_PXSEC, for which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

| Dataset | Status | Check CovMat | Check $t0$ CovMat | Comments |
| --- | --- | --- | --- | --- |
| DYE605_Z0_38P8GEV_DW_PXSEC | | | | The old implementation contains one extra source of uncertainties |
| DYE866_Z0_800GEV_PXSEC | | | | Slight numerical differences due to rawdata source (maxdiff of ~2%) |
| DYE866_Z0_800GEV_DW_RATIO_PDXSECRATIO | | | | - |
| DYE906_Z0_120GEV_DW_PDXSECRATIO | | | | - |

That being said, none of the remaining differences are actually visible in the data vs. theory comparison report.

@Radonirinaunimi (Member) left a comment

Hi @peterkrack! Happy new year and thanks for this!

Here are some preliminary comments:

Could you please take care of these first, before I move further into the details of the implementation?

@Radonirinaunimi (Member)

Hi @peterkrack, please let me know if there is anything I can help with in the meantime.

@Radonirinaunimi (Member)

I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

@peterkrack (Contributor, Author)

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?

For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy and reimplemented versions.

@Radonirinaunimi (Member)

> I was wondering if there are any updates regarding the pending issues above and whether any help is needed?
>
> For DYE605_Z0_38P8GEV_DW I suspect it is a bug in the old buildmaster. For the other two, I still have to figure out what exactly is causing the discrepancy between the legacy and reimplemented versions.

I can also have a look at the details, but would you be able to take care of #2248 (review) above?

@Radonirinaunimi Radonirinaunimi dismissed their stale review January 21, 2025 07:59

Already resolved!

@Radonirinaunimi Radonirinaunimi marked this pull request as ready for review January 22, 2025 08:13
@peterkrack (Contributor, Author)

> Overall, this is close to completion, except for DYE605_Z0_38P8GEV_DW_PXSEC, for which the legacy implementation contains one extra source of systematic uncertainties (cc @enocera).

Concerning the extra uncertainty: the rawdata file contains replicas rep00001 to rep00999 only:
https://raw.githubusercontent.com/NNPDF/nnpdf/refs/tags/4.0.6/buildmaster/rawdata/DYE605/nuclear/output/tables/group_result_table.csv

In the old buildmaster, however, nrep is hard-coded to 1000:

int nrep=1000;

and later on the loop runs from irep=0 to irep=999, i.e. over 1000 replicas, which is one replica (and hence one uncertainty) too many:

for(int irep=0; irep<nrep; irep++)
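This kind of off-by-one can be avoided by deriving nrep from the rawdata header itself instead of hard-coding it. A minimal Python sketch (the in-memory CSV is a toy stand-in for group_result_table.csv, with replica columns rep00001 to rep00999 as in the file linked above):

```python
import csv
import io

# Toy stand-in for group_result_table.csv: one bin column plus
# replica columns rep00001..rep00999 (rep01000 is absent, as in
# the real rawdata file).
header = ["bin"] + [f"rep{i:05d}" for i in range(1, 1000)]
raw = io.StringIO(",".join(header) + "\n" + ",".join(["0.0"] * len(header)) + "\n")

rows = list(csv.DictReader(raw))
rep_cols = [c for c in rows[0] if c.startswith("rep")]

# Derive nrep from the file instead of `int nrep = 1000;`,
# so a loop over replicas cannot overrun the available columns.
nrep = len(rep_cols)
print(nrep)  # 999
```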

@enocera (Contributor) commented Jan 22, 2025

Dear @peterkrack @Radonirinaunimi, let me try to clarify the uncertainties for the DY E605 data set.

  • In the "old" commondata implementation, there were one statistical (uncorrelated) uncertainty and 1002 sources of systematic uncertainties. Of these 1002 sources, the first was a 10% uncorrelated (additive) uncertainty and the second a 15% correlated (multiplicative) normalisation uncertainty. The other 1000 uncertainties were "nuclear uncertainties", estimated as the difference between predictions obtained with proton and nuclear PDFs, taking the proton PDF fixed to the NNPDF4.0 central value and varying the nuclear predictions for each of the 1000 replicas in nNNPDF3.0.
  • It seems to me that this implementation is correctly propagated into the legacy data set. Indeed, if I look at uncertainties_legacy_PXSEC.yaml, there are one statistical uncertainty, the two aforementioned systematic uncertainties, and 1000 nuclear uncertainties. In uncertainties_reimplemented_PXSEC.yaml, I'd say that one nuclear uncertainty is missing. This is consistent with what @peterkrack noticed above.
  • Now, you're right in saying that there are only 999 nuclear uncertainties in the input file rawdata/nuclear/output/tables/group_result_table.csv. A replica must have gone missing in the generation process.
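The construction described in the first bullet can be sketched numerically. This is an illustrative NumPy example with made-up predictions (the names proton_central and nuclear_reps are assumptions, not the buildmaster variables): each of the 999 surviving nuclear replicas contributes one systematic source, defined as its difference from the fixed proton central prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
ndata, nrep = 5, 999  # 999 nuclear replicas survive in the rawdata

# Stand-in predictions: proton PDF fixed to the NNPDF4.0 central value,
# and one prediction per nNNPDF3.0 nuclear replica (made-up numbers).
proton_central = np.ones(ndata)
nuclear_reps = proton_central + 0.01 * rng.standard_normal((nrep, ndata))

# One systematic source per replica: prediction(nuclear) - prediction(proton).
nuclear_systematics = nuclear_reps - proton_central  # shape (nrep, ndata)
```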

I see two ways of proceeding forward.

  1. I re-generate the rawdata/nuclear/output/tables/group_result_table.csv with the missing nuclear uncertainty.
  2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

@enocera (Contributor) commented Jan 22, 2025

I have a preference for option 2.

@Radonirinaunimi (Member) commented Jan 22, 2025

> I see two ways of proceeding forward.
>
> 1. I re-generate the `rawdata/nuclear/output/tables/group_result_table.csv` with the missing nuclear uncertainty.
>
> 2. We live with 999 nuclear uncertainties for the time being and forget about the 1000th. Nuclear uncertainties will have to be revisited anyway for NNPDF4.1.

Thanks for your reply @enocera! I agree with all of your points and the conclusions. As you can see in the report, this difference is negligible, so I tend to lean towards the 2nd option (on top of the reason you gave).

@scarlehoff (Member) left a comment

The `kinematics_override` needs to be set to the identity. The `result_transform` we can live with for the time being.

Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

@Radonirinaunimi (Member)

> Probably the process type needs to be changed as well; we shouldn't have a process type per dataset...

This is probably very minor, but how would you call the process types then?

And are you happy with how the variants are called?

@scarlehoff (Member) commented Jan 23, 2025

The variables are ok. For the process type... use DYP, otherwise you will need to dive into validphys to change a few things...

although FTDY would be better imho but 🤷‍♂️

@Radonirinaunimi (Member) commented Jan 23, 2025

> The variables are ok. For the process type... use DYP, otherwise you will need to dive into validphys to change a few things...

Yep, this I know. But I was wondering if you want something specific after the `_`, i.e. `DYP_XX`.

> although FTDY would be better imho but 🤷‍♂️

Accounting for this, I went for `DYP_FT`.

@scarlehoff (Member)

Yes, that sounds ok; as long as the first three letters are DYP it will go through vp fine. It needs to be added to the process options though.

@Radonirinaunimi (Member)

> Yes, that sounds ok; as long as the first three letters are DYP it will go through vp fine. It needs to be added to the process options though.

When modifying the process options, I went for the easiest solution, which is to simply add the variable `M2` to the `_dyboson_xq2map` instead of renaming the variables here to `m_Z2` (although that would be the proper variable name). Doing the latter turns out to be very messy, as it also involves modifying the filter rules.
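For context, the fixed-target DY map itself is standard leading-order kinematics: with scale Q² = M² and τ = √(M²/s), the momentum fractions are x₁,₂ = τ e^{±y}. A hedged Python sketch (the function `ftdy_xq2map` and its signature are illustrative, not the actual `_dyboson_xq2map` code in validphys):

```python
import numpy as np

def ftdy_xq2map(y, m2, sqrts):
    """Illustrative fixed-target DY map: rapidity y and invariant mass
    squared m2 (in the same units as s) to (x1, x2) and the scale Q2 = M2."""
    tau = np.sqrt(m2) / sqrts
    x1 = tau * np.exp(y)
    x2 = tau * np.exp(-y)
    return (x1, x2), m2

# At central rapidity x1 = x2, and x1*x2*s = M2 by construction
# (sqrts = 38.8 GeV as for the E605 dataset above).
(x1, x2), q2 = ftdy_xq2map(y=0.0, m2=64.0, sqrts=38.8)
```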

Final Report

```yaml
reimplemented:
  data_uncertainties:
  - uncertainties_reimplemented_PXSEC.yaml
  data_central: data_reimplemented_PXSEC.yaml
```
@scarlehoff (Member) commented Jan 24, 2025

Some final comments. This should not be `reimplemented`; this should just be the normal `data_uncertainties`, not a variant.

Also, if we have `data_reimplemented` and it is the same as the old one, the old one should be removed. And I think for one of the 866 datasets it needs to be kept, because there were small differences in the data (in the rawdata).

Same for all the others.

Member

For one out of the four datasets, the new implementation has slightly different central values (numerical fluctuations due to the rawdata source), so I am not sure if we want to keep `reimplemented` for that one (?). But for the rest, I'll do it asap.

Member

We want to keep the old data under `legacy` for that dataset, and the new implementation should be the default for all (so we don't keep `reimplemented` for any).
