feat: Optional support to add c++ typenames to parameters in uproot.dask #1375

Merged
merged 11 commits into from
Feb 21, 2025


prayagyadav
Contributor

Optional support to add c++ typenames to parameters with ak_add_typename in uproot.dask


Necessity:

  • Access to the C++ typename of each tree branch is essential for building a COFFEA schema for files generated with EDM4HEP.

  • Similar to how ak_add_doc is used to add the __doc__ parameter to forms, ak_add_typename could be used to add the C++ typename of the branch in parameters.

  • This is an attempt to address feature request #1369, posted by @ianna following discussions among @jpivarski, @ianna, @davidlange6 and @prayagyadav.


Outcome/Example/Expected output:

Bash:

wget https://github.com/prayagyadav/coffea/raw/refs/heads/test-typenames-swan/tests/samples/p8_ee_WW_ecm240_edm4hep.root

Python:

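import uproot

file = "p8_ee_WW_ecm240_edm4hep.root"
tree = "events"
# open_files=False builds the lazy dask-awkward array from the file's metadata
events = uproot.dask(file + ":" + tree, open_files=False, ak_add_doc=True, ak_add_typename=True)

# parameters attached to the form of the Particle branch
events.Particle.form.parameters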

Output:

{'__doc__': 'Particle_', 'typename': 'vector<edm4hep::MCParticleData>'}

Here is the live link to the notebook for the above example, where one can run these cells in real-time.


The bulk of the idea for these code changes was given by @davidlange6

Collaborator

@ianna ianna left a comment


@prayagyadav - Looks great, thanks! Could you please add a test? There is a skhep_testdata package with a number of ROOT files already in the repository. We usually name the tests with the corresponding PR number, for example:
tests/test_0652_dask_for_awkward.py

Thanks!

@ianna
Collaborator

ianna commented Feb 12, 2025

@prayagyadav - we want to keep the API to use ak_add_doc=.... For example, removing ak_add_typename, the behavior could support:

ak_add_doc=False (default behavior) → No metadata is added.
ak_add_doc=True → Adds only __doc__ with the TBranch title (preserving old behavior).
ak_add_doc="title" → Explicitly adds only __doc__.
ak_add_doc="typename" → Adds only typename (new behavior under ak_add_doc).
ak_add_doc="both" → Adds both __doc__ and typename.
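
As a rough illustration of the value-to-metadata mapping listed above, here is a minimal sketch. It is not uproot's implementation: FakeBranch and parameters_from_mode are hypothetical names introduced only for this example, and the branch is a plain stand-in object carrying just a title and a typename.

```python
from dataclasses import dataclass


@dataclass
class FakeBranch:
    """Hypothetical stand-in for a TBranch; only the attributes used below."""
    title: str
    typename: str


def parameters_from_mode(mode, branch):
    """Map one of the proposed ak_add_doc values to a parameters dict (sketch)."""
    if mode is False:
        return {}  # default: no metadata is added
    if mode is True or mode == "title":
        return {"__doc__": branch.title}  # old behavior: TBranch title only
    if mode == "typename":
        return {"typename": branch.typename}  # new: C++ typename only
    if mode == "both":
        return {"__doc__": branch.title, "typename": branch.typename}
    raise ValueError(f"unsupported ak_add_doc value: {mode!r}")


branch = FakeBranch(title="Particle_", typename="vector<edm4hep::MCParticleData>")
for mode in (False, True, "title", "typename", "both"):
    print(repr(mode), "->", parameters_from_mode(mode, branch))
```
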

@prayagyadav
Contributor Author

@prayagyadav - we want to keep the API to use ak_add_doc=.... For example, removing ak_add_typename, the behavior could support:

ak_add_doc=False (default behavior) → No metadata is added.
ak_add_doc=True → Adds only __doc__ with the TBranch title (preserving old behavior).
ak_add_doc="title" → Explicitly adds only __doc__.
ak_add_doc="typename" → Adds only typename (new behavior under ak_add_doc).
ak_add_doc="both" → Adds both __doc__ and typename.

Hi @ianna, sorry for the super-late reply.

As I understand it, ak_add_doc=True and ak_add_doc="title" perform the same operation here: both add the item {"__doc__": branch.title} to the parameters. Meanwhile, ak_add_doc="typename" adds {"typename": branch.typename} to the parameters. In other words, ak_add_doc="title" uses the string __doc__ as the key, while ak_add_doc="typename" uses the string typename as the key.
I think this might confuse users. Building on this, I suggest the following:

ak_add_doc=False and ak_add_doc=True as before.
And if the user wants any extra parameter other than this, they could pass a dictionary : {"their desired name": branch.property}.

For example:

ak_add_doc=False (default behavior)                       ---> No metadata is added.
ak_add_doc=True                                           ---> Adds only __doc__ with the TBranch title (preserving old behavior).
ak_add_doc={"typename": "typename"}                       ---> Adds {"typename": branch.typename} to the parameters.
ak_add_doc={"title": "title"}                             ---> Adds {"title": branch.title} to the parameters.
ak_add_doc={"__doc__": "title"}                           ---> Adds {"__doc__": branch.title} to the parameters.
ak_add_doc={"__doc__": "title", "typename": "typename"}   ---> Adds {"__doc__": branch.title, "typename": branch.typename} to the parameters.

Please let me know if this makes sense. One immediate drawback I can think of is that this approach could expose some unnecessary branch properties and methods to the user.
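
To make the dictionary proposal concrete, here is a minimal sketch of how such a mapping could be resolved. It is only an illustration under stated assumptions, not the code in this PR: FakeBranch, ALLOWED_ATTRIBUTES and parameters_from_ak_add_doc are hypothetical names, and the allowlist is just one possible way to limit which branch attributes get exposed, which is the drawback mentioned above.

```python
from dataclasses import dataclass


@dataclass
class FakeBranch:
    """Hypothetical stand-in for a TBranch; only the attributes used below."""
    title: str
    typename: str


# Hypothetical allowlist: the only branch attributes a user may request.
ALLOWED_ATTRIBUTES = frozenset({"title", "typename"})


def parameters_from_ak_add_doc(ak_add_doc, branch, allowed=ALLOWED_ATTRIBUTES):
    """Resolve a bool or {parameter name: branch attribute name} dict (sketch)."""
    if isinstance(ak_add_doc, bool):
        # True keeps the old behavior; False adds nothing.
        return {"__doc__": branch.title} if ak_add_doc else {}
    if isinstance(ak_add_doc, dict):
        parameters = {}
        for key, attribute in ak_add_doc.items():
            if attribute not in allowed:
                raise ValueError(f"branch attribute not allowed: {attribute!r}")
            parameters[key] = getattr(branch, attribute)
        return parameters
    raise TypeError("ak_add_doc must be a bool or a dict")


branch = FakeBranch(title="Particle_", typename="vector<edm4hep::MCParticleData>")
print(parameters_from_ak_add_doc({"__doc__": "title", "typename": "typename"}, branch))
```

Without the allowlist check, the getattr lookup would accept any attribute or method name on the branch object, so a real implementation would need to decide how permissive to be.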

@prayagyadav prayagyadav changed the title feat: Optional support to add c++ typenames to parameters with ak_add_typename in uproot.dask feat: Optional support to add c++ typenames to parameters in uproot.dask Feb 18, 2025
@prayagyadav
Contributor Author

Hi @ianna,
Following up on my previous comment: I went ahead and made the necessary changes to remove ak_add_typename and use ak_add_doc for the typename functionality (as explained in the previous comment).
I have also added a test that follows the naming convention.

Please let me know if this looks good.

Collaborator

@ianna ianna left a comment


@prayagyadav - could you please give me an example of how it works in the non-dask case? Thanks!

@ianna ianna merged commit fac3070 into scikit-hep:main Feb 21, 2025
26 checks passed
@prayagyadav
Contributor Author

@ianna Thanks!

5 participants