-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add y labels, integrate utils in filters and remove comma in sys names #2020
Conversation
@@ -1,7 +1,59 @@ | |||
import yaml | |||
from validphys.commondata_utils import symmetrize_errors as se | |||
from validphys.commondata_utils import percentage_to_absolute as pta | |||
from validphys.commondata_utils import covmat_to_artunc as cta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the covmat_to_artunc
was supposed to become a generic function to be used by all (since there are some datasets by @Radonirinaunimi that also use it).
Unless you already talked among yourselves and decided to take this one out as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the commondata utils will have to include the cormat_to_covmat
, otherwise all of the collider DY datasets will break. This PR, unfortunately however, does not introduce/address the utils yet (in the form we discussed at the CM) - which will still have to be done in a separate PR.
And in that respect, I absolutely don't think such function should be re-implemented in all of the filters which require it as it is done here.
Is this supposed to be merged in favor of #1693, while closing that one? P.S. even if it's a lot of repetition, having descriptive names and docstrings beats the alternative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixes regarding the labels and sysnames look good to me. @t7phy could you attach in the description the corresponding data vs. theory comparisons for these datasets?
As for the utils, let's discuss at the CM the base functions that must crucially form the utils. For the time being, covmat_to_artunc
should be reverted back.
@@ -1,7 +1,59 @@ | |||
import yaml | |||
from validphys.commondata_utils import symmetrize_errors as se | |||
from validphys.commondata_utils import percentage_to_absolute as pta | |||
from validphys.commondata_utils import covmat_to_artunc as cta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a function for sure should be part of the utils and for the purpose of this PR it should be still called from validphys.commondata_utils
.
This has been discussed, and this was the agreed upon solution: #1693 (comment) At the same CM @comane also offered to take charge of fixing the |
I indeed had in mind @comane regarding the other utils as he has been implementing datasets which also require base functions (that I think should be part of the utils, such as the one below for example): nnpdf/validphys2/src/validphys/datafiles/new_commondata/CMS_WPWM_13TEV_ETA/filter_utils.py Line 82 in e945ba5
But if this doesn't happen soon, I'd happy for only having |
My idea behind the choice to decouple it from current utils PR was that once we have finalized the actual utils (i.e. taking ones we need from Mark and ones we need from my PR), we should then and only then reintroduce dependencies. It would then be a very simple task to remove a function and add an import, however keeping unworking filter files any longer is not a good idea. I suggest we merge this, and then once we finalize the actual utils, we can come back to filters to remove redundant functions, thus ensuring the filters always work. Please also see that since we are moving towards creation of a data package, we may actually have to import from nnpdf_data.utils etc, so I think it is important to only add dependencies once the dependencies are by themselves finalized. @Radonirinaunimi I would have to redo data theory comparison, if you would like that, I but I am not sure why that would change?data, kin, uncertainties are all the same. |
Personally I would still keep only a single version of each function and import from there, but fine, I won't insist on it (but of course I don't talk for @Radonirinaunimi). I would still like the function names and docstrings to be there in full. Please also close #1693 and write a helpful summary in the first comment of this PR. Oh and please run |
I was exactly about to write something along that line.
I don't think this stands at all. The idea is that once this is merged, hopefully we will not touch them anymore But I won't also insist on this.
Well the idea is to finally see the fixed labels on plots in case there are uncaught (latex/typographical) errors. |
@RoyStegeman I can add the docstrings but I would like to keep the names as is... I was using @Radonirinaunimi yes I forgot about the need to check the latex labels, I will share the DTC once I have it. |
A search-replace would solve that, so would putting them in one of the files and importing from there. |
The problem isn't applying search and replace, it's that sometimes multiple functions (as many as 3 or 4) are used in the same line because one requires output of another which requires output of yet another and using the full names makes the lines unnecessarily long. At this point it seems we are over complicating things for no reason. #1693 is dead. All these functions are now local to the filter files and used as they are needed. so complaining about their names is counter productive because then we may as well have a look at every single filter file in every single implementation to see how the functions are named. |
This should be avoided whenever possible. |
I am sorry but at this point I am unable to dedicate time to rewrite filter files. I am happy to produce a DTC, add docstrings and make the formatting changes using black and isort for this PR. If people prefer to change every single thing about the implementation, please feel free to delete this PR and reimplement the datasets. If I need to generate a covmat, based on corrmat provided by the experimentalists and based on unsymm percentage uncert. Clearly I need to convert from percentage to absolute, then symterize them, then use them to convert corrmat to covmat. The util functions in the utils PR were designed such that these functions worked perfectly with one another and the final artUnc could be obtained using very compact code. Complaining about all these points without looking at how data is provided from hepdata for particular edge cases is counter productive and I am not able to dedicate more time towards these implementations. |
I am not sure I understand any of this. Suggestions/request for modifications in PRs are meant to maintain a sort of reasonable standard throughout the codes in order to not descend into chaos. It is not just about getting things work!
Sure, and no one is opposing that! However, the way in which things are approached could be improved. A one-line complicated syntax is definitely not a good one and variables/functions naming should follow some convention.
Various people have implemented datasets so far and this has never arisen.... |
@t7phy this is a relatively big collaboration, in order to ensure that things work smoothly and that a lot of different people can work in the same (quite big) code base requires making some compromises and efforts. One of these efforts concerns Pull Requests and code reviews. Doing a review of someone else's code has 3 main goals:
For the person doing the review this is often not a rewarding job, it takes quite some time and it is difficult to justify as you are not "producing" anything new, but it is necessary since we are a big group of people contributing to the same codebase. This review process is not adversarial nor useless. It is what has allowed to keep this alive for about 8 years already (of the main 10 vp authors from before 2019 none of them continue in the collaboration* and the code is still alive, I think that's quite impressive!!! ) If you don't want to contribute that is perfectly ok. There are people who use the code as external users, that is fine and that's why the code is public. But as a contributor, however, it is important to treat code reviews not as a waste of time but as a way of improving the overall health of nnpdf. Also, a very important point to mention is that code reviews not only strengthen the quality of our code base but in addition they are one of the best ways to improve and hone your programming skills. The reviewer is not always right of course, and you will learn when it makes sense to argue and when it doesn't. You will also learn to write code (and this is crucial) differently depending on what you are doing. The code I write here is different to
I would instead say it is a productive thing. It is my fault that I didn't look into the filter files when some of the new data were merged, but it would be good to fix past mistakes instead of introducing new ones. Regarding the names of the functions, a search and replace + using black afterwards would be extremely helpful, since with only a couple of instructions (a few calls to this_is_the_first_function(some_function(on_some_other_function(on_yet_another_function(oh_my(input1, input2))))) into this much more readable tower: this_is_the_first_function(
some_function(
on_some_other_function(on_yet_another_function(oh_my(input1, input2)))
)
) (but really, going forward please don't apply more than two functions in the same line)
If you cannot dedicate time to this, that is also ok. Just work on this when you do have time. Just make this into a draft PR to signal that this is work in progress and we will come back at this when you can take care of it. We do this constantly (see for instance here) then people can make some comments (which can be useful even if they are about WIP code!) but there's no pressure on anyone to either finish it or review it. *contributing code, some have now become too senior for that :P |
@Radonirinaunimi @scarlehoff at no point did I mention that code review is not useful or should not be done, but a PR to fix a few things cannot be turned into a complete overhaul of a significant amount of code. Note that I explicitly mentioned that I would soon apply the suggestions including the use of black, and you are right, I didn't think earlier that find and replace together with black will solve the issue. But at the same time, it's also essential that we prioritize physics over the name of a function. 2 points here:
At first glance, this function looks much simpler than covmat_to_artunc, however there is no check if the matrix is positive definite, and in case of normalized distributions, one of the values could be 0, which furthermore could be less negative (but close to 0) due to numerical reasons and this needs to be checked and set to 0. And even more, in case of covmats that go over multiple normalized distributions, the function needs to allow for as many 0 eigenvalues as there are distributions. Without taking into account all these caveats, if you make a comparison, it serves no usefulness as you don't consider the different possibilities a utils function should be able to take on.
|
@Radonirinaunimi @scarlehoff here is the vp report: https://vp.nnpdf.science/MzdicX55Ti2M8PlOyp26Hg==/ and above I have pushed the changes as suggested. Please let me know if this is fine. |
closed in favor of #2021 |
No description provided.