Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in k4DataSvc for CaloHitContributions if not explicitly loaded #82

Closed
wdconinc opened this issue Apr 25, 2022 · 4 comments
Closed
Assignees

Comments

@wdconinc
Copy link
Contributor

  • OS version: ubuntu 21.10
  • Compiler version: gcc 11.2.0
  • Package version: 061f6b5 (HEAD -> master, tag: v01-00pre14, origin/master, origin/HEAD)
  • Reproduced by: (see below)
  • Input:
    sim_barrel_sciglass_electron_1.0_10.0.edm4hep.root uploaded to https://drive.google.com/open?id=1-GgZ9EFjEusFOjvTdSoaB2MTNGRndwcN (103 MB, 1000 events with 1-10 GeV electron into EIC SciGlass barrel calorimeter)
  • Output: (see below)
  • Goal: analysis of input files with podio/EDM4hep files with the gaudi framework

We are experiencing a memory leak in the EIC framework which we can reproduce in the key4hep framework as well. We take the same approach to the PodioDataSvc, so that's not very surprising, but I'm reporting it here so someone with expertise in podio can take a look too.

Steps to reproduce

  1. spack install [email protected] (or e.g. build an environment with [email protected])
  2. git clone https://github.com/key4hep/k4-project-template, build, install
  3. cp k4ProjectTemplate/options/readExampleEventData.py k4ProjectTemplate/options/readEICEventData.py and apply the following changes to read in the collection 'EcalBarrelHits' for all events in an existing file:
5,7c5
< podioevent.input = "output_k4test_exampledata.root"
< from Configurables import CreateExampleEventData
< producer = CreateExampleEventData()
---
> podioevent.input = "sim_barrel_sciglass_electron_1.0_10.0.edm4hep.root"
11c9
< inp.collections = ["MCParticles", "SimTrackerHit"]
---
> inp.collections = ["MCParticles", "EcalBarrelHits"]
16c14
<                 EvtMax=100,
---
>                 EvtMax=-1,
  1. Download sim_barrel_sciglass_electron_1.0_10.0.edm4hep.root (103 MB), generated by an unmodified DD4hep ddsim script against the EIC geometry.
  2. Back in build directory, run /usr/bin/time -v ./run k4run ../k4ProjectTemplate/options/readEICEventData.py.

Expected behavior

The value under Maximum resident set size (kbytes) should be independent of number of events (and about 500 MB)

Actual behavior

The memory use increases with number of events, and goes up to ~2 GB for this file (with 1000 events). Files with more than 10k events are not able to be run on typical systems due to memory limitations.

Workaround

The memory leak can be avoided by explicitly listing the linked collections, i.e. modifying the diff above to

11c9
< inp.collections = ["MCParticles", "SimTrackerHit"]
---
> inp.collections = ["MCParticles", "EcalBarrelHits", "EcalBarrelHitsContributions"]

in which case the EcalBarrelHitsContributions collection is properly deleted after each event.

This behavior may qualify as a bug since a data model may change without notice, and downstream options file will continue to work but appear to start leaking memory. It would be clearer if it just failed instead (or didn't leak the memory).

@tmadlener
Copy link
Contributor

I think I figured out the underlying problem for this, but I am not yet entirely sure if there is an easy solution to it.

The problem is the following: PodioDataSvc uses podio::EventStore to read collections from file. The podio::EventStore keeps track of all collections it has read, including the ones that are implicitly read when resolving relations (i.e. the CaloHitContributions in this case). It also takes ownership of all collections it reads(!). Calling EventStore::clear disposes of all read collections. The PodioDataSvc on the other hand hands over ownership to the Gaudi EventStore (since the DataWrapper takes ownership of the wrapped podio::CollectionBase*). To make the podio::EventStore "aware" of this fact it calls EventStore::clearCaches:

m_provider.clearCaches();
m_reader.endOfEvent();

That clears the internal vector of the EventStore, which in turn means that only explicitly read collections are now handled by the PodioDataSvc and implicitly read ones are no longer reachable from the EventStore and leak.

Given this interplay between the two, I am not sure there is an easy fix. Since overhauling the whole I/O is on our list once we have a prototype of the Frame in podio my preferred solution would actually be to wait until then, unless this is something that cannot be worked around by simply listing all (implicitly) read collections.

@wdconinc
Copy link
Contributor Author

Is there a way to get notified of implicitly read collections as a WARNING?

@tmadlener
Copy link
Contributor

I don't think there is an easy way to do that, since implicitly and explicitly read collections go through the same mechanism in the EventStore (especially in the way in which it is used here in k4FWCore). I will have a second look to make sure I didn't miss anything though.

@tmadlener
Copy link
Contributor

I think this should no longer apply with the introduction of Frame based I/O (#100). If this is still an issue, feel free to re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants