NanoAOD-tools output file is polluted with deleted TTrees (and can 5x inflate the size) #249
Comments
Did anyone ever try implementing the proposed solution? It would be kind of useful to have this fixed. |
Hi, can the experts have a look at this, please? I tried some naive solutions that might help point in the right direction, but more advanced or creative solutions may be needed.
Changing CWD
First idea was to replace
outputTree.SetDirectory(0) after the clone in line
nanoAOD-tools/python/postprocessing/framework/output.py Lines 175 to 176 in 4396652
Adding This test was run on nanoAOD files from the However, when I specified So there needs to be a general solution other than loading the intermediate The reason for this final
Deleting the temporary tree
In the nanoAOD-tools/python/postprocessing/framework/output.py Lines 175 to 176 in 4396652
I tried a different order, before, and after writing the tree and other objects. When using To use
I also tried renaming already just after cloning the input tree in L130, and then renaming the final output tree
Following a proposed solution from a ROOT forum thread, resetting the branch addresses before deleting the old tree did not solve the segmentation fault:
Following this ROOT forum thread, I tried
but it seems this list is empty:
Overwriting
The last idea was to specify the nanoAOD-tools/python/postprocessing/framework/output.py Lines 102 to 104 in 4396652
using
in the hopes it would completely remove the intermediate tree. But this did not seem to make a difference in the final file size... Please let me know what you think... |
In case anyone needs a quick fix, using
|
please avoid hadd; use hadd-nano, otherwise your trigger bits branches could be misaligned |
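Thanks, @arizzi! This also reduces the file size:
haddnano.py output_tmp.root output.root
mv output_tmp.root output.root
Enabling fwkJobReport=True in the postprocessor gives, besides a FrameworkJobReport.xml file, two output ROOT files: one large one like before, and one smaller one with the name tree.root and half the size. |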
This makes sense, as specifying the fwkJobReport triggers the internal
haddnano.py call, running over all of the output files produced (and
defaulting to 'tree.root' for the aggregate file).
…On Wed, May 12, 2021 at 2:41 PM Izaak ***@***.***> wrote:
Thanks, @arizzi <https://github.com/arizzi>!
This also reduces the file size:
haddnano.py output_tmp.root output.root
mv output_tmp.root output.root
Enabling fwkJobReport=True in the postprocessor gives, besides an
FrameworkJobReport.xml file, two output ROOT files: one large one like
before, and one smaller one with the name tree.root and half the size.
|
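The workaround quoted above (run haddnano.py into a temporary file, then move it over the original) could be wrapped in a small Python helper. This is only a minimal sketch: the function name `compact_in_place`, the `.tmp.root` suffix, and the injectable `run` argument are illustrative assumptions and not part of nanoAOD-tools. It also assumes `haddnano.py` is on the PATH and takes the output file first, as in the command quoted in this thread.

```python
import os
import subprocess


def compact_in_place(path, merger="haddnano.py", run=subprocess.check_call):
    """Rewrite `path` through a merge tool and replace the original file.

    Hypothetical helper: `merger` is assumed to accept the arguments
    <output> <input>, as in `haddnano.py output_tmp.root output.root`.
    Returns (size_before, size_after) in bytes.
    """
    tmp_path = path + ".tmp.root"  # illustrative temporary name
    try:
        # Output file first, then the input file.
        run([merger, tmp_path, path])
        size_before = os.path.getsize(path)
        size_after = os.path.getsize(tmp_path)
        # Swap the compacted copy in for the original.
        os.replace(tmp_path, path)
    except Exception:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return size_before, size_after
```

Calling `compact_in_place("output.root")` would return the file sizes before and after, which makes it easy to check whether the rewrite actually shrank the file. The `run` argument exists only so the merge command can be replaced in tests.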
Just to follow up on this. This issue was affecting my workflow, which is built on your master branch but adds the functionality to run C++ modules as well as Python ones. I think if you just add this line to your workflow it will solve this problem: https://github.com/UBParker/nanoAOD-tools/blob/cloops/python/postprocessing/framework/output.py#L130 I also made some other changes but I think this one is the most important. I can try to put in a PR soon. |
hello - recently me and @mseidel42 ran into the same issue. |
It has been reported on HN that some simple processing with nanoaod tools can inflate the file size as much as 5x: https://hypernews.cern.ch/HyperNews/CMS/get/physTools/3734.html
This has been tracked to be likely due to the usage of the output file as temporary storage for other ROOT trees and objects that are eventually deleted.
This is not visible when fwkJobReport is on (or hadd filename is specified) as a simple hadd(-nano) removes the duplication.
More details are given in the HN thread. A possible fix is to avoid leaving the CWD set to the output file after the TTree is initially cloned.
@osherson