Processing datasets

Bruno Alves edited this page Nov 26, 2024

Prepare Big Ntuples

Clone this repository on lxplus (using release CMSSW_10_6_29) and check out branch 106X_HH_UL. The files referenced below are in the LLRHiggsTauTau/NtupleProducer/test/ folder.

Submission with crab

  • The datasets are under NtupleProducer/test/datasets_UL18.txt (similar for other data periods). This file is picked up by NtupleProducer/test/
  • You might want to edit the script in the following places:
    • isMC flag set to True/False depending on the samples being processed
    • background MC samples potentially commented out
    • edit with isMC=True if needed, changing the YEAR variable
ssh lxplus # logs in to EL9 by default
cmssw-el7 # the CMSSW version of this repo is only compatible with SL7
PS1="${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[00;35m\]\w <Sing> \[\033[00m\]\$ " # improves CLI clarity
cd CMSSW_10_6_29/src/
cd LLRHiggsTauTau/NtupleProducer/test/;
source /cvmfs/;
  • Monitor the progress of the jobs with Grafana (sign in with your CERN credentials)
  • For LLR, the submission outputs are stored under root://${USER}/HHNtuples_res/ (browse them with the gfal-ls tool).

Note #1: Make sure the isMC flag is the same in NtupleProducer/test/ and NtupleProducer/test/

Note #2: Common CRAB commands: crab submit / crab submit -d <folder> / crab status
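
The consistency check from Note #1 can be scripted. The following is only a sketch: the helper names get_is_mc and check_is_mc are made up for this page, the two file paths are supplied by the caller, and the pattern assumes that both files set the flag with a Python-style assignment such as isMC = True, which may not hold for every file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).
# Assumption: each file sets the flag on its own line as `isMC = True` or `isMC = False`.

# Print the value of the first uncommented isMC assignment found in a file.
get_is_mc() {
    sed -n -E 's/^[ \t]*isMC[ \t]*=[ \t]*(True|False).*/\1/p' "$1" | head -n 1
}

# Return 0 when both files carry the same isMC value, 1 otherwise.
check_is_mc() {
    local first second
    first=$(get_is_mc "$1")
    second=$(get_is_mc "$2")
    if [ -z "${first}" ] || [ -z "${second}" ]; then
        echo "isMC not found in one of the files" >&2
        return 1
    fi
    if [ "${first}" != "${second}" ]; then
        echo "isMC mismatch: $1 has ${first}, $2 has ${second}" >&2
        return 1
    fi
    echo "isMC=${first} in both files"
}
```

After sourcing the snippet, run check_is_mc <first file> <second file> before crab submit. A non-zero exit status means the two settings differ or could not be found.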

Resubmission of failed jobs

Assuming the folder where the crab jobs were stored is crab3_Data_UL16_April2024, one can resubmit all failed jobs with:

folder=crab3_Data_UL16_April2024/; for i in $(ll ${folder} | awk '/crab_/ {print $9}'); do crab resubmit ${folder}/$i; done
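
The one-liner above relies on the interactive ll alias and on the column layout of its output. A possible variant that uses a shell glob instead is sketched below. The function name resubmit_all and the DRY_RUN switch are illustrative additions, not part of the repository or of CRAB; the crab resubmit call itself is the same as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resubmit every crab_* task directory inside a submission folder.
# DRY_RUN=1 (an assumption of this sketch) prints the commands instead of running them.

resubmit_all() {
    local folder="${1%/}"   # drop a trailing slash, if any
    local task
    for task in "${folder}"/crab_*; do
        # Skip plain files and the literal pattern left behind by an unmatched glob.
        [ -d "${task}" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "crab resubmit ${task}"
        else
            crab resubmit "${task}"
        fi
    done
}
```

For example, DRY_RUN=1 resubmit_all crab3_Data_UL16_April2024/ lists the commands that would be issued, and dropping DRY_RUN=1 runs them.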

EnrichedMiniAOD to LLRntuples (old instructions)

  1. : set path to the folder created by the CRAB submission (crab3_<tag>)
  2. python will print a list of published dataset names. Names reported as missing at the end of the list typically correspond to failed CRAB submissions
  3. datasets_Enriched.txt: copy the previous list and define a block name between === <whatever> ===
  4. tools/ define PROCESS and tag as before
  5. cmsenv ; source /cvmfs/ ; cd tools; python will create the file list for each published sample under inputFilesEMiniAOD<tag>. NOTE: this can take some time depending on how fast the CRAB server responds, and might need some retries. If it still does not work, do this by hand from the DAS interface.
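
To double-check what was entered in step 3, the dataset names of a given block can be printed with a few lines of shell. This is a sketch under assumptions: list_block is a made-up helper, and the file is assumed to contain one dataset name per line below each === <block name> === header, with # marking comment lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the dataset names listed under one block of a datasets file.
# Assumed layout: a header line "=== <block name> ===" followed by one dataset per line;
# empty lines and lines starting with '#' are ignored.

list_block() {
    local block="$1" file="$2"
    awk -v block="${block}" '
        /^===.*===[ \t]*$/ {
            # Strip the "===" markers and surrounding blanks to get the block name.
            name = $0
            gsub(/^=+[ \t]*|[ \t]*=+[ \t]*$/, "", name)
            inside = (name == block)
            next
        }
        inside && NF > 0 && $0 !~ /^[ \t]*#/ { print $1 }
    ' "${file}"
}
```

Usage: list_block "<whatever>" datasets_Enriched.txt, where <whatever> is the block name chosen in step 3.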
