Install framework on lxplus in a prod workspace directory without creating CMSSW release area
curl -s https://raw.githubusercontent.com/hh-italian-group/h-tautau/prod_v4/install_framework.sh | bash -s prod17
Check framework production functionality interactively for a few samples
cd CMSSW_9_4_8/src
cmsenv
# Graviton 450 sample - if not in Pisa download it and put the correct path in the txt
echo /store/mc/RunIIFall17MiniAODv2/GluGluToBulkGravitonToHHTo2B2Tau_M-450_narrow_13TeV-madgraph/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/90000/1CA54262-F442-E811-8262-0CC47A4D7694.root > Graviton_450.txt
# Run in local interactively
cmsRun h-tautau/Production/python/Production.py fileList=Graviton_450_local.txt fileNamePrefix='file:/gpfs/ddn/cms/user/grippo/Run2017_analysis_hh_bbtautau/CMSSW_9_4_4/src/miniaod/' anaChannels=all applyTriggerMatch=True sampleType=Fall17MC globalTag=94X_mc2017_realistic_v13 runSVfit=False runKinFit=False tupleOutput=eventTuple_Graviton_450.root maxEvents=100
# Tau Run2017B - if not in Pisa download it and put the correct path in the txt
echo /store/data/Run2017B/Tau/MINIAOD/31Mar2018-v1/00000/040B1CD3-9437-E811-8D93-A4BF01158FE0.root > Tau_B.txt
# Run in local interactively
cmsRun h-tautau/Production/python/Production.py fileList=Tau_RunB_v2.txt fileNamePrefix='file:/gpfs/ddn/cms/user/grippo/Run2017_analysis_hh_bbtautau/CMSSW_9_4_8/src/miniaod/' anaChannels=tauTau applyTriggerMatch=True sampleType=Run2017 globalTag=94X_dataRun2_v6 runSVfit=False runKinFit=False energyScales=Central tupleOutput=eventTuple_RunB_tauTau_v2.root lumiFile=/afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions17/13TeV/ReReco/Cert_294927-306462_13TeV_EOY2017ReReco_Collisions17_JSON.txt maxEvents=10
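Because cmsRun reads its input from the file passed as fileList, it may help to confirm that the list exists and looks sane before starting a job. The following is a minimal sketch; check_file_list is a hypothetical helper name and is not part of the framework.

```bash
# Hypothetical helper (not part of h-tautau): verify that a file list
# exists, is non-empty and contains only entries ending in ".root".
check_file_list() (
    list="$1"
    if [ ! -s "$list" ]; then
        echo "ERROR: '$list' is missing or empty" >&2
        exit 1
    fi
    # grep -v selects the lines that do NOT end in .root; it succeeds
    # only if at least one such line exists, which is an error here.
    if grep -v '\.root$' "$list" >&2; then
        echo "ERROR: '$list' has entries that are not .root files" >&2
        exit 1
    fi
    echo "OK: $(wc -l < "$list" | tr -d ' ') file(s) listed in '$list'"
)
```

For example, `check_file_list Graviton_450.txt` can be run right after the echo command that creates the list.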
Install framework on the stage out site (e.g. Pisa)
curl -s https://raw.githubusercontent.com/hh-italian-group/h-tautau/prod_v4/install_framework.sh | bash -s prod17
cd CMSSW_9_4_8/src
cmsenv
./run.sh MergeRootFiles --help
Each time after login:
source /cvmfs/cms.cern.ch/crab3/crab.sh
voms-proxy-init --voms cms --valid 168:00
cd CMSSW_DIR/src
cd h-tautau/Production/crab
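A proxy created with a validity of 168 hours does not need to be renewed at every login. As a sketch, the remaining lifetime can be checked first and the proxy renewed only when it is about to expire. proxy_is_fresh is a hypothetical helper name, and the 24-hour default threshold is an arbitrary assumption.

```bash
# Hypothetical helper: succeed if the proxy has at least MIN_HOURS left.
# SECONDS_LEFT is expected to be the output of `voms-proxy-info --timeleft`;
# an empty value (no proxy found) is treated as 0 seconds.
proxy_is_fresh() (
    seconds_left="${1:-0}"
    min_hours="${2:-24}"
    [ "$seconds_left" -ge $((min_hours * 3600)) ]
)

# Possible usage (commented out, since it needs the grid tools):
# proxy_is_fresh "$(voms-proxy-info --timeleft 2>/dev/null)" 24 \
#     || voms-proxy-init --voms cms --valid 168:00
```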
- done: all crab jobs are successfully finished
- tuple: the tuples of a task whose crab jobs have all finished successfully are merged and copied to the central storage (on the stage out site or in cernbox)
Steps 2-6 should be repeated periodically, 1-2 times per day, until the end of the production.
Submit jobs
./submit.py --work-area work-area --cfg ../python/Production.py --site T2_IT_Pisa --output hh_bbtautau_prod_v4_2017 config/2017/config1 [config/2017/config2] ...
Check jobs status
./multicrab.py --workArea work-area --crabCmd status
Analyze output of the status command for each task:
- If a few jobs failed without any persistent pattern, resubmit them:
crab resubmit -d work-area/TASK_AREA
- If a significant number of jobs are failing, investigate the cause and act accordingly. For more details, see the CRAB troubleshooting section below.
- If all jobs have finished successfully, move the task area from 'work-area' into the 'finished' directory (create it if needed).
# mkdir -p finished
mv work-area/TASK_AREA finished/
Before moving the directory, make sure that every job is in either the 'failed' or the 'finished' state; otherwise wait (use the kill command, if necessary).
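The move can be wrapped in a small function that creates the 'finished' directory when needed and refuses to act on a path that is not a directory. This is only a sketch: move_finished_task is a hypothetical helper name, and it does not query CRAB, so the job states still have to be checked with the status command first.

```bash
# Hypothetical helper: move a task area into 'finished', creating the
# directory if needed. It does NOT check the CRAB job states.
move_finished_task() (
    task_area="$1"
    if [ ! -d "$task_area" ]; then
        echo "ERROR: '$task_area' is not a directory" >&2
        exit 1
    fi
    mkdir -p finished
    mv "$task_area" finished/
)
```

It would be called from the crab directory as `move_finished_task work-area/TASK_AREA`.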
Create task lists for the tasks in the 'finished' directory and transfer them to the stage out server.
if [ -d current-check ] ; then rm -rf prev-check ; mv current-check prev-check ; fi
mkdir current-check
for NAME in finished ; do ./create_job_list.sh $NAME | sort > current-check/$NAME.txt ; done
Update prod_v4_2017 spreadsheet accordingly.
for NAME in finished ; do echo "$NAME:" ; if [ -f prev-check/$NAME.txt ] ; then diff current-check/$NAME.txt prev-check/$NAME.txt ; else cat current-check/$NAME.txt ; fi ; done
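The diff in the loop above reports changes in both directions. If only the entries that appeared since the previous check are of interest, comm can be used in place of diff. This is a swapped-in alternative, not what the framework scripts do; it assumes both lists are sorted (as produced by the sort in the previous step), and new_entries is a hypothetical helper name.

```bash
# Hypothetical helper: print the entries of current-check/NAME.txt that
# are not present in prev-check/NAME.txt. Both files must be sorted.
new_entries() (
    name="$1"
    current="current-check/$name.txt"
    previous="prev-check/$name.txt"
    if [ -f "$previous" ]; then
        # comm -13 hides lines unique to the first file and lines common
        # to both, leaving only the lines unique to the second file.
        comm -13 "$previous" "$current"
    else
        # No previous check: every entry is new.
        cat "$current"
    fi
)
```

For example, `new_entries finished` lists only the tasks added to the 'finished' list since the last iteration.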
Submit merge jobs for the output files on the stage out server (the software has to be installed there before running this procedure).
If some merge jobs were already created during the previous iteration, use find_new_jobs.sh to create a list of new jobs to submit.
- N.B. The files current-check/*.txt have to be transferred from lxplus to the src directory on the stage out server in order to run find_new_jobs.sh.
./h-tautau/Instruments/find_new_jobs.sh current-check/finished.txt output/tuples > finished.txt
- N.B. Check that the created job lists contain no jobs that are still running in the batch system queue for merging (bjobs command on gridui). If they do, remove these jobs from the lists.
Submit the merge jobs to the local queue, where CRAB_OUTPUT_PATH is the CRAB output path specified in the submit.py command. For example, for the Pisa stage out server it is /gpfs/ddn/srm/cms/store/user/#YOUR_USERNAME/hh_bbtautau_prod_v4_2017/.
./h-tautau/Instruments/submit_tuple_hadd.sh cms finished.txt output/merge CRAB_OUTPUT_PATH
or do it interactively on the stage out server on a fai machine (bsub -Is -n 1 -q fai -a "docker-sl6" /bin/bash), using the following command line:
./h-tautau/Instruments/submit_tuple_hadd.sh interactive finished.txt output/merge CRAB_OUTPUT_PATH
Collect finished jobs (this script can be run as many times as you want).
./h-tautau/Instruments/collect_tuple_hadd.sh output/merge output/tuples
Split large merged files into several parts to satisfy the cernbox requirement that each file be smaller than 8 GB.
# mkdir -p output/tuples_split
./h-tautau/Instruments/python/split_tuple_file.py --input output/tuples/SAMPLE.root --output output/tuples_split/SAMPLE.root
# copy split files back into the original directory
mv output/tuples_split/SAMPLE*.root output/tuples
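To find out which merged files need splitting, the file sizes can be compared against the limit. The sketch below uses a hypothetical helper, list_large_tuples, and assumes that 8 GB means 8 * 1024^3 bytes; the exact definition used by cernbox is not stated here, so the limit can be overridden with a second argument.

```bash
# Hypothetical helper: print the .root files in DIR that are larger than
# LIMIT bytes (default: 8 * 1024^3 = 8589934592, an assumed definition).
list_large_tuples() (
    dir="$1"
    limit="${2:-8589934592}"
    for f in "$dir"/*.root; do
        [ -f "$f" ] || continue            # the glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')   # file size in bytes
        if [ "$size" -gt "$limit" ]; then
            echo "$f"
        fi
    done
)
```

Running `list_large_tuples output/tuples` prints one path per oversized file; each of them would then be passed to split_tuple_file.py as SAMPLE.root.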
Transfer tuple files into the local tuple storage. Pisa: /gpfs/ddn/cms/user/androsov/store/cms-it-hh-bbtautau/Tuples2017_v4/.
- For 100% complete tasks use Full sub-directory.
# Full
rsync -auv --chmod=g+rw --exclude '*sub[0-9].root' --exclude '*recovery[0-9].root' --dry-run output/tuples/*.root /gpfs/ddn/cms/user/androsov/store/cms-it-hh-bbtautau/Tuples2017_v4/Full
# if everything ok
rsync -auv --chmod=g+rw --exclude '*sub[0-9].root' --exclude '*recovery[0-9].root' output/tuples/*.root /gpfs/ddn/cms/user/androsov/store/cms-it-hh-bbtautau/Tuples2017_v4/Full
- Update prod_v4_2017 spreadsheet accordingly.
- The tuples will then be transferred by the production coordinator into the central prod_v4_2017 cernbox directory: /eos/user/k/kandroso/cms-it-hh-bbtautau/Tuples2017_v4.
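The two --exclude patterns in the rsync commands above keep files whose names end in sub followed by a digit, or recovery followed by a digit, out of the Full directory. The sketch below expresses the same test as a shell case statement so that a single file name can be checked by hand. is_excluded_tuple is a hypothetical helper name; shell and rsync pattern rules are not identical in general, although for these simple patterns applied to a file name they should agree.

```bash
# Hypothetical helper: succeed if the file name matches one of the
# patterns excluded by rsync: '*sub[0-9].root' or '*recovery[0-9].root'.
is_excluded_tuple() {
    case "$(basename "$1")" in
        *sub[0-9].root|*recovery[0-9].root) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, `is_excluded_tuple output/tuples/SAMPLE_sub1.root` succeeds, while `is_excluded_tuple output/tuples/SAMPLE.root` fails, so only the latter would be copied.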
When the whole production is over, after a safety delay of a few weeks, delete the remaining crab output directories and root files in your area to reduce unnecessary storage usage.
- Common failure reasons
Jobs are failing because they exceed the memory limit.
Solution: resubmit the jobs requesting more memory per job, e.g.:
crab resubmit --maxmemory 4000 -d work-area/TASK_AREA
Jobs are failing on some sites.
Solution: resubmit the jobs using a site blacklist or whitelist, e.g.:
crab resubmit --siteblacklist=T2_IT_Pisa -d work-area/TASK_AREA
# OR
crab resubmit --sitewhitelist=T2_IT_Pisa -d work-area/TASK_AREA
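When the same fix has to be applied to several tasks, the resubmit commands can be generated in a loop and reviewed before anything is executed. The sketch below only prints the commands. print_resubmit_commands is a hypothetical helper name, and it assumes that every sub-directory of the work area is a CRAB task area.

```bash
# Hypothetical helper: print (do not run) one `crab resubmit` command per
# sub-directory of WORK_AREA, passing any extra options through unchanged.
print_resubmit_commands() (
    work_area="$1"
    shift
    for task in "$work_area"/*/; do
        [ -d "$task" ] || continue     # the glob matched nothing
        echo "crab resubmit $* -d ${task%/}"
    done
)
```

For example, `print_resubmit_commands work-area --maxmemory 4000` lists the commands for all tasks in work-area; after checking the list, the relevant lines can be run by hand.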