Konstantin Androsov edited this page Jun 28, 2018 · 9 revisions

Production v4 instructions

Framework installation

  1. Install the framework on lxplus in a prod workspace directory, without creating a CMSSW release area

    curl -s https://raw.githubusercontent.com/hh-italian-group/h-tautau/prod_v4/install_framework.sh | bash -s prod17
  2. Check framework production functionality interactively for a few samples

    cd CMSSW_9_4_8/src
    cmsenv
    # Graviton 450 sample - if you are not in Pisa, download it and put the correct path in the txt file
    echo /store/mc/RunIIFall17MiniAODv2/GluGluToBulkGravitonToHHTo2B2Tau_M-450_narrow_13TeV-madgraph/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/90000/1CA54262-F442-E811-8262-0CC47A4D7694.root > Graviton_450.txt
    # Run locally in interactive mode
    cmsRun h-tautau/Production/python/Production.py fileList=Graviton_450_local.txt fileNamePrefix='file:/gpfs/ddn/cms/user/grippo/Run2017_analysis_hh_bbtautau/CMSSW_9_4_4/src/miniaod/' anaChannels=all applyTriggerMatch=True sampleType=Fall17MC globalTag=94X_mc2017_realistic_v13 runSVfit=False runKinFit=False tupleOutput=eventTuple_Graviton_450.root maxEvents=100
    # Tau Run2017B - if you are not in Pisa, download it and put the correct path in the txt file
    echo /store/data/Run2017B/Tau/MINIAOD/31Mar2018-v1/00000/040B1CD3-9437-E811-8D93-A4BF01158FE0.root > Tau_B.txt
    # Run locally in interactive mode
    cmsRun h-tautau/Production/python/Production.py fileList=Tau_RunB_v2.txt fileNamePrefix='file:/gpfs/ddn/cms/user/grippo/Run2017_analysis_hh_bbtautau/CMSSW_9_4_8/src/miniaod/' anaChannels=tauTau applyTriggerMatch=True sampleType=Run2017 globalTag=94X_dataRun2_v6 runSVfit=False runKinFit=False energyScales=Central tupleOutput=eventTuple_RunB_tauTau_v2.root lumiFile=/afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions17/13TeV/ReReco/Cert_294927-306462_13TeV_EOY2017ReReco_Collisions17_JSON.txt maxEvents=10
  3. Install the framework on the stage out site (e.g. Pisa)

    curl -s https://raw.githubusercontent.com/hh-italian-group/h-tautau/prod_v4/install_framework.sh | bash -s prod17
    cd CMSSW_9_4_8/src
    cmsenv
    ./run.sh MergeRootFiles --help
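
The interactive tests in step 2 read their input from the text file given as fileList=, so a missing or empty list only shows up once cmsRun starts. As a minimal sketch (check_file_list is a hypothetical helper, not part of the h-tautau framework), the list could be verified first:

```shell
# Hypothetical helper, not part of the h-tautau framework.
# Succeeds only if the given file list exists and names at least one .root file.
check_file_list() {
    local list="$1"
    if [ ! -f "$list" ]; then
        echo "file list '$list' not found" >&2
        return 1
    fi
    if ! grep -q '\.root[[:space:]]*$' "$list"; then
        echo "file list '$list' contains no .root entries" >&2
        return 1
    fi
    return 0
}

# Example: only continue if the list written by the echo command is usable.
# check_file_list Graviton_450.txt && echo "file list ok, ready for cmsRun"
```

The check only looks at the list itself; it does not verify that the listed files are reachable under fileNamePrefix.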

Setup CRAB working environment

Each time after login:

source /cvmfs/cms.cern.ch/crab3/crab.sh
voms-proxy-init --voms cms --valid 168:00
cd CMSSW_DIR/src
cmsenv
cd h-tautau/Production/crab
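
Since the proxy above is requested for 168 hours, it does not have to be recreated at every login. A minimal sketch of a renewal check follows; proxy_needs_renewal is a hypothetical helper, and the usage comment assumes that voms-proxy-info --timeleft prints the remaining proxy lifetime in seconds:

```shell
# Hypothetical helper, not part of the framework.
# $1: seconds left on the proxy, $2: minimum number of hours required (default 24).
# Returns success (0) if the proxy should be renewed.
proxy_needs_renewal() {
    local seconds_left="${1:-0}"
    local min_hours="${2:-24}"
    case "$seconds_left" in
        # Anything that is not a plain non-negative integer is treated as "no valid proxy".
        ''|*[!0-9]*) return 0 ;;
    esac
    [ "$seconds_left" -lt $(( min_hours * 3600 )) ]
}

# Usage after login (assumes the VOMS client tools are installed):
# left=$(voms-proxy-info --timeleft 2>/dev/null)
# if proxy_needs_renewal "$left" 24; then voms-proxy-init --voms cms --valid 168:00; fi
```

The 24 hour threshold is an arbitrary choice for illustration; pick a value that covers the time between two status checks.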

Production spreadsheet legend

  • done: all crab jobs have finished successfully
  • tuple: for a task whose crab jobs have all finished successfully, the tuples are merged and copied to the central storage (on the stage out site or in cernbox)

Production workflow

Steps 2-6 should be repeated periodically, 1-2 times per day, until the end of the production.

  1. Submit jobs

    ./submit.py --work-area work-area --cfg ../python/Production.py --site T2_IT_Pisa --output hh_bbtautau_prod_v4_2017 config/2017/config1 [config/2017/config2] ...
  2. Check jobs status

    ./multicrab.py --workArea work-area --crabCmd status

    Analyze output of the status command for each task:

    1. If a few jobs failed without any persistent pattern, resubmit them:
      crab resubmit -d work-area/TASK_AREA
    2. If a significant number of jobs are failing, investigate the reason and act accordingly. For more details, see the CRAB troubleshoot section below.
    3. If all jobs have finished successfully, move the task area from 'work-area' into the 'finished' directory (create it if needed).
      # mkdir -p finished
      mv work-area/TASK_AREA finished/

    Before moving the directory, make sure that every job has status either 'failed' or 'finished'; otherwise wait (use the kill command, if necessary).

    1. Create task lists for the tasks in the 'finished' directory and transfer them to the stage out server.

      if [ -d current-check ] ; then rm -rf prev-check ; mv current-check prev-check ; fi
      mkdir current-check
      for NAME in finished ; do ./create_job_list.sh $NAME | sort > current-check/$NAME.txt ; done
    2. Update the prod_v4_2017 spreadsheet accordingly.

      for NAME in finished ; do echo "$NAME:" ; if [ -f prev-check/$NAME.txt ] ; then diff current-check/$NAME.txt prev-check/$NAME.txt ; else cat current-check/$NAME.txt ; fi ; done
  3. Submit merge jobs for the output files on the stage out server (the software has to be installed there before running this procedure).

    1. If some merge jobs were already created during the previous iteration, use find_new_jobs.sh to create a list of new jobs to submit.

      • N.B. The files current-check/*.txt have to be transferred from lxplus to the src directory on the stage out server in order to run find_new_jobs.sh.
      ./h-tautau/Instruments/find_new_jobs.sh current-check/finished.txt output/tuples > finished.txt
      • N.B. Check that the created job lists contain no jobs that are still running in the batch system queue for merge (bjobs command on gridui). If there are any, remove them from the list.
    2. Submit merge jobs in the local queue, where CRAB_OUTPUT_PATH is the crab output path specified in the submit.py command. For example, for the Pisa stage out server it is /gpfs/ddn/srm/cms/store/user/#YOUR_USERNAME/hh_bbtautau_prod_v4_2017/.

      ./h-tautau/Instruments/submit_tuple_hadd.sh cms finished.txt output/merge CRAB_OUTPUT_PATH

      or do it interactively on the stage out server on a fai machine (bsub -Is -n 1 -q fai -a "docker-sl6" /bin/bash), using the following command line:

       ./h-tautau/Instruments/submit_tuple_hadd.sh interactive finished.txt output/merge CRAB_OUTPUT_PATH
    3. Collect finished jobs (this script can be run as many times as you want).

      ./h-tautau/Instruments/collect_tuple_hadd.sh output/merge output/tuples
  4. Split large merged files into several parts in order to satisfy the cernbox requirement that each file be smaller than 8 GB.

    # mkdir -p output/tuples_split
    ./h-tautau/Instruments/python/split_tuple_file.py --input output/tuples/SAMPLE.root --output output/tuples_split/SAMPLE.root
    # copy split files back into the original directory
    mv output/tuples_split/SAMPLE*.root output/tuples
  5. Transfer the tuple files into the local tuple storage. Pisa: /gpfs/ddn/cms/user/androsov/store/cms-it-hh-bbtautau/Tuples2017_v4/.

    • For 100% complete tasks use the Full sub-directory.
    # Full
    rsync -auv --chmod=g+rw --exclude '*sub[0-9].root' --exclude '*recovery[0-9].root' --dry-run output/tuples/*.root /gpfs/ddn/cms/user/androsov/store/cms-it-hh-bbtautau/Tuples2017_v4/Full
    # if everything looks ok, repeat without --dry-run
    rsync -auv --chmod=g+rw --exclude '*sub[0-9].root' --exclude '*recovery[0-9].root' output/tuples/*.root /gpfs/ddn/cms/user/androsov/store/cms-it-hh-bbtautau/Tuples2017_v4/Full
    • Update the prod_v4_2017 spreadsheet accordingly.
    • The tuples will then be transferred by the production coordinator into the central prod_v4_2017 cernbox directory: /eos/user/k/kandroso/cms-it-hh-bbtautau/Tuples2017_v4.
  6. When the whole production is over, after a few weeks of safety delay, delete the remaining crab output directories and root files in your area to reduce unnecessary storage usage.
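
The task-list bookkeeping in the workflow above comes down to comparing the list in current-check with the one saved in prev-check during the previous iteration. As a minimal sketch of that comparison (list_new_tasks is a hypothetical helper, not part of the framework and not a replacement for find_new_jobs.sh), the entries that appeared since the last check can be printed like this:

```shell
# Hypothetical helper, not part of the h-tautau framework.
# Prints the lines of the current list ($1) that are absent from the previous list ($2).
# If the previous list does not exist, every current line counts as new.
list_new_tasks() {
    local current="$1"
    local previous="$2"
    if [ ! -f "$previous" ]; then
        cat "$current"
        return
    fi
    # -F: fixed strings, -x: whole-line match, -v: keep lines NOT found in $previous.
    # grep exits with 1 when nothing is selected, which is not an error here.
    grep -F -x -v -f "$previous" "$current" || true
}

# Example: list_new_tasks current-check/finished.txt prev-check/finished.txt
```

Unlike diff, this prints only the new entries without markers, which may be easier to copy into the spreadsheet.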

CRAB troubleshoot

  1. Common failure reasons
    • Jobs are failing due to excessive memory usage.

      Solution: Resubmit the jobs requesting more memory per job, e.g.:

      crab resubmit --maxmemory 4000 -d work-area/TASK_AREA
    • Jobs are failing on some sites.

      Solution: Resubmit the jobs using a site blacklist or whitelist, e.g.:

      crab resubmit --siteblacklist=T2_IT_Pisa -d work-area/TASK_AREA
      # OR
      crab resubmit --sitewhitelist=T2_IT_Pisa -d work-area/TASK_AREA
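
The choice between the resubmit variants above can be sketched as a small dispatcher. The function resubmit_command and its failure-type names (memory, badsite, goodsite) are hypothetical and only for illustration; it prints the command instead of running it, and uses only the crab options already shown in this section:

```shell
# Hypothetical helper, not part of CRAB or the framework.
# Prints (does not run) the resubmit command for a given failure type.
# $1: failure type (memory | badsite | goodsite | anything else)
# $2: task area, $3: memory in MB (default 4000) or site name
resubmit_command() {
    local kind="$1" task="$2" value="$3"
    case "$kind" in
        memory)
            echo "crab resubmit --maxmemory ${value:-4000} -d $task"
            ;;
        badsite|goodsite)
            # A site name is mandatory for the black/white list variants.
            if [ -z "$value" ]; then
                echo "site name required for '$kind'" >&2
                return 1
            fi
            if [ "$kind" = "badsite" ]; then
                echo "crab resubmit --siteblacklist=$value -d $task"
            else
                echo "crab resubmit --sitewhitelist=$value -d $task"
            fi
            ;;
        *)
            # No persistent pattern: plain resubmission.
            echo "crab resubmit -d $task"
            ;;
    esac
}

# Example: resubmit_command memory work-area/TASK_AREA
```

Printing the command first leaves the final decision with the person running the production, which seems safer while the failure reason is still being investigated.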