Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add templated links between arbitrary datatypes #257

Merged
merged 79 commits into from
Sep 30, 2024

Conversation

tmadlener
Copy link
Collaborator

@tmadlener tmadlener commented Jan 31, 2022

BEGINRELEASENOTES

  • Add a templated Link (and corresponding LinkCollection) class to allow for arbitrary links between two podio generated datatypes.
    • The Links work similar to a podio generated class of the form
Link:
  Members: [float weight]
  OneToOneRelations:
    - FromT from
    - ToT to
  • They offer templated access to the objects they link via get<T> on top of the usual "generated" access functions. The access functions are using a get prefix, e.g. getFrom.
  • They can be used in memory simply by declaring a collection via podio::LinkCollection<F, T>. In order to enable I/O for link collections it is necessary to register them via the PODIO_DECLARE_LINK macro

ENDRELEASENOTES

Fixes #255


NOTE (Jul 29 2024): The following discussion and descriptions span quite a bit of time and the original proposal used Association instead of Link as it's name (see key4hep/EDM4hep#344 for some reasoning of that). Refer to the included links.md file for the latest version of the documentation with usage examples. We leave the discussions unchanged below.


Preliminary (user facing) documentation

The main template that is introduced is the AssociationT and two type-aliases to have mutable and immutable Association types:

template<typename FromT, typename ToT, bool Mutable>
class AssociationT;

template<typename FromT, typename ToT>
using Association = typename AssociationT<FromT, ToT, false>;

template<typename FromT, typename ToT>
using MutableAssociation = typename AssociationT<FromT, ToT, true>;

The main interface for the user is that of the similarly templated AssociationCollection, which uses as the template parameters the immutable (i.e. default) types, like

#include "podio/AssociationCollection.h"
#include "edm4hep/MCParticle.h"
#include "edm4hep/ReconstructedParticle.h"

using MCRecoParticleAssociationCollection = podio::AssociationCollection<edm4hep::MCParticle, edm4hep::ReconstructedParticle>;

from here on the collection behaves the same as a normal collection (i.e. as if it would have been generated via a yaml definition file), i.e.

auto mcRecoAssocs = MCRecoParticleAssociationCollection();
auto mcParticles = edm4hep::MCParticleCollection();
auto recoParticles = edm4hep::ReconstructedParticleCollection();

// fill mc and reco collections

auto assoc = mcRecoAssocs.create(); // now have a mutable association
assoc.setWeight(3.14f); // set a weight for the association
assoc.setFrom(mcParticles[0]); // set a MCParticle as the from part of the association
assoc.setTo(recoParticles[0]); // set a ReconstructedParticle as the to part of the association

// getting associations from store
auto& associations = store.get<MCRecoParticleAssociationCollection()>("associations");
for (auto a : associations) {
  // a is now an immutable association
  auto mc = a.getFrom(); // get the MCParticle as the from part of the association
  auto reco = a.getTo(); // get the ReconstructedParticle as the to part of the association
  auto weight = a.getWeight(); // get the weight
}

It is also possible to use the relations as subset collections as they use the same mechanism under the hood as collections of "proper" datatypes.

Some internal technicalities

  • Each generated datatype now has DefT, MutT and CollT type-aliases to get the default (immutable), mutable and collection types respectively from them. This is used internally for making sure that we return the correct types to the user in the end. In order to not have to check in all templates, there are a few static_asserts at the "entry points" to the associations to make sure we can assume that the FromT and ToT template parameters are the default types internally.
  • The collection iterators use the same approach as the associations, i.e. there is a generalized template with a boolean parameter and two type-aliases to get to the immutable and mutable iterators. This avoids code duplication, but has the slight drawback that some enable_if code is necessary, to make sure that the conversion from mutable to immutable is possible but not vice-versa.

Includes changes from:

TODOs:

@tmadlener
Copy link
Collaborator Author

Should the association interface have "generic" get and set functions in addition to get(To|From) and set(To|From)? I.e. something that would look like:

using namespace edm4hep;
using MCRecoParticleAssociation = podio::Association<MCParticle, ReconstructedParticle>;
auto assoc = MCRecoParticleAssociation();

// get the MCParticle
auto mcP = assoc.getFrom(); // implicitly refers to the MCParticle

auto mcP2 = assoc.get<MCParticle>(); // get the MCParticle explicitly
auto reco = assoc.get<ReconstructedParticle>(); // get the ReconstrucedParticle explicitly

The untemplated versions are nice to use where the association is somewhat "directed", while for "symmetric" associations To and From do not really make sense, I think.

@andresailer
Copy link
Member

auto mcP2 = assoc.get<MCParticle>(); 

Having to and from be the same type isn't prohibited, right? So this wouldn't work

Something like the tuple get (https://en.cppreference.com/w/cpp/utility/tuple/get)

auto mcP1 = assoc.get<0>(); 
auto mcP2 = assoc.get<1>(); 
auto weight = assoc.get<2>(); 

@tmadlener
Copy link
Collaborator Author

Having to and from be the same type isn't prohibited, right? So this wouldn't work

Good point. I didn' think about that. In principle we could forbid associations between the same type (by failing this at compile time), but there probably are legitimate use cases for that.

Something like the tuple get

We could probably do that, but then I don't see the additional value compared to getTo, getFrom, and getWeight, because you will still have to know which index refers to which part of your association, same as with the From and To scheme.

Alternatively we could only make the templated get/set functionality available for associations which have different types? Or we leave that out for now and see how far we get with simply From, To?

@tmadlener tmadlener force-pushed the associations branch 2 times, most recently from fb1ad46 to 17dfba5 Compare April 7, 2022 10:11
@andresailer
Copy link
Member

@tmadlener tmadlener force-pushed the associations branch 2 times, most recently from e0b784d to 1f98dbf Compare December 5, 2022 18:10
@tmadlener
Copy link
Collaborator Author

Revived this via a rebase and then fixing up some of the things that have been broken in the meantime. On a very basic level this is now working, but I am not yet 100 % happy with it yet. My main concerns are related to having a somewhat "seamless" I/O experience for associations.

Currently to make either I/O backend work with all possible combinations I have to explicitly declare either a static podio::Association<FromT, ToT> (ROOT dictionary), or a static podio::AssociationSIOBlock<FromT, ToT> (SIO blocks) and compile all of them into the dictionary or the SIOBlocks shared lib respectively. This means that there will be N^2 of these static instances if there are N datatypes in the yaml file.

Currently this leads to roughly twice as long compilations (if run sequentially) compared to before. The main additional time is spent in dictionary and/or SioBlocks generation as can be seen in the following table:

translation unit before with associations
TestDataModel dict 3.4s 1m57.5s
ExensionDataModel dict 1.9s 40.6s
TestDataModel + ExtensionDataModel SIOBlocks 51.3s 2m48.5s
TestDataModel Associations SIOBlocks - 1m37.7s
ExensionDataModel Associations SIOBlocks - 32.3s

The table shows the time it takes to compile the Dict.cxx generated by genreflex, and the sum of all components of the SIOBlocks shared libraries (without linking). The fact that the dictionary compilation steps cannot be further parallelized and that the Associations SIOBlocks (last two rows), are also one translation unit also mean that these will become bottlenecks in parallel builds.

Additionally, these translation units and the preceeding genreflex step consume quite a bit of memory. Apparently enough, that we get killed by githubs CI runners when we try to build EDM4hep this way: https://github.com/AIDASoft/podio/pull/257/checks#step:6:1501

So to conclude: Generating all possible associations for I/O upfront seems to not be easily possible. This means that I/O dictionaries and SIOBlocks would have to be compiled / generated by users who actually want to store associations. Note that in-memory we don't have this problem, and users can simply declare whatever association they want. The main question is how far we want to / should go in facilitating I/O for associations.

@tmadlener
Copy link
Collaborator Author

Using the proposed functionality of having a central entity that hands out collection buffers, the need for building all possible combinations up front can be alleviated at the cost of having to use a macro (and another function call in the SIO case) to do this automatically at compile time. Usability is still not as good as desired, and I would like to improve that. Also the python side of things still needs to be handled as that currently does not have the templated things compiled into a library which it could simply load.

@tmadlener
Copy link
Collaborator Author

With #405 the name of the branches will also have to be considered in the whole "generatlity" aspect

doc/associations.md Outdated Show resolved Hide resolved
@tmadlener tmadlener changed the title [WIP] Add templated associations between arbitrary datatypes Add templated associations between arbitrary datatypes Jan 30, 2024
@tmadlener
Copy link
Collaborator Author

Revived this one more time (maybe this is the last now). This time mainly making sure to incorporate changes that came possible with the MaybeSharedPtr and the introduction of some member type defs (#465). No new features were added wrt the last revival.

include/podio/detail/Association.h Outdated Show resolved Hide resolved
include/podio/utilities/DeprecationMacros.h Outdated Show resolved Hide resolved
python/templates/MutableObject.h.jinja2 Outdated Show resolved Hide resolved
tests/read_test.h Outdated Show resolved Hide resolved
tests/unittests/associations.cpp Outdated Show resolved Hide resolved
tests/write_frame.h Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator Author

Is there anything left here to do? Otherwise I would merge this and then address issues in other PRs. Currently we have quite a few PRs that depend on this and that have very large diffs because they include this PR.

@hegner hegner self-requested a review September 30, 2024 11:16
Copy link
Collaborator

@hegner hegner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes!

@tmadlener
Copy link
Collaborator Author

I have no idea what makes the EDM4hep CI fail. The reason is that these workflows no longer discover the podio version that has just been built in the EDM4hep find_package(podio 1.1 REQUIRED HINTS $ENV{PODIO}), but rather the one from the underlying LCG stacks. However, nothing in this PR should change the behavior of that find_package call? Effectively, I just bump the patch version here, but that should not lead to any changes, I think. @jmcarcell @andresailer what am I missing here?

@tmadlener
Copy link
Collaborator Author

It actually looks like the correct podio version is found, but the wrong python bindings(?). I cannot reproduce this in a local key4hep stack though.

@jmcarcell
Copy link
Member

Isn't the test correct and the file was written with the previous version of podio which doesn't match the current one anymore?

@tmadlener
Copy link
Collaborator Author

No the expected podio version is (or at least should be) parse from the input filename. However, the regex matching for that was not working as expected and previously only passed because the podio and EDM4hep versions aligned with their current versions from HEAD. This PR changes the podio version and makes the test fail unexpectedly.

@tmadlener tmadlener merged commit 383c222 into AIDASoft:master Sep 30, 2024
18 checks passed
@tmadlener tmadlener deleted the associations branch September 30, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for relations between two arbitrary types
5 participants