Switch to awkward arrays and boost histograms #5

Open · wants to merge 18 commits into base: main
46 changes: 35 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,14 @@
# ntuple-tools

The python scripts in this repository should help you get started analysing the [HGCAL L1 TP ntuples](https://github.com/PFCal-dev/cmssw/tree/hgc-tpg-devel-CMSSW_10_3_0_pre4/L1Trigger/L1THGCal/plugins/ntuples)
Python framework for the analysis of [ROOT](https://root.cern/) `TTree` data, using [uproot](https://uproot.readthedocs.io/en/latest/) for the I/O and [awkward-array](https://awkward-array.org/doc/main/) for the columnar data analysis.

The tool was originally developed for the analysis of [L1T ntuples for Phase-2 e/g](https://github.com/cerminar/Phase2EGTriggerAnalysis) but should work with any kind of flat ntuple.

## Pre-requisites: first time setup

The tool can be run on any private machine using just `python`, `pip` and `virtualenvwrapper`.
If you plan to run it on lxplus, you might want to follow point `1` below.

### 1. lxplus setup

This step is `lxplus` specific, giving access to a more recent `python` and `root` version.
@@ -39,7 +44,7 @@ The **first time** you will have to create the actual instance of the `virtualenv`:

and

[requirements_py3.8.txt](requirements_py3.10.txt)
[requirements_py3.10.txt](requirements_py3.10.txt)

for python 3.8 and 3.10 respectively.

Expand All @@ -57,7 +62,6 @@ Edit/skip it accordingly for your specific system.

`source setup_lxplus.sh`


### 2. setup `virtualenvwrapper`

To start using `virtualenvwrapper`:
Expand All @@ -75,16 +79,22 @@ After this initial (once in a time) setup is done you can just activate the virt

## Running the analysis

The main script is `analyzeHgcalL1Tntuple.py`:
The main script is `analyzeNtuples.py`:

`python analyzeHgcalL1Tntuple.py --help`
`python analyzeNtuples.py --help`

An example of how to run it:

`python analyzeHgcalL1Tntuple.py -f cfg/hgctps.yaml -i cfg/datasets/ntp_v81.yaml -c tps -s doubleele_flat1to100_PU200 -n 1000 -d 0`
`python analyzeNtuples.py -f cfg/hgctps.yaml -i cfg/datasets/ntp_v81.yaml -c tps -s doubleele_flat1to100_PU200 -n 1000 -d 0`

## General idea

Data are read into `collections` of objects, each corresponding to an `array`, and are processed by `plotters`, which create sets of histograms for different `selections` of the data `collections`.


### Configuration file
The configuration is handled by two YAML files.

One specifying
- output directories
- versioning of the plots
Expand All @@ -94,12 +104,13 @@ The other prividing
- details of the input samples (location of the ntuple files)

Examples of configuration files can be found in:
- [cfg/default.yaml](cfg/default.yaml)
- [cfg/datasets/ntp_v66.yaml](cfg/datasets/ntp_v66.yaml)
- [cfg/egplots.yaml](cfg/egplots.yaml)
- [cfg/datasets/ntp_v92.yaml](cfg/datasets/ntp_v92.yaml)
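For the dataset side, a minimal sketch in the same spirit (fields abbreviated from [cfg/datasets/ntp_v92.yaml](cfg/datasets/ntp_v92.yaml)):

```yaml
samples:
  input_dir: /eos/cms/store/cmst3/group/l1tr/cerminar/l1teg/ntuples/
  tree_name: l1EGTriggerNtuplizer_l1tCorr/L1TEGTriggerNtuple

  doubleele_flat1to100_PU200:
    input_sample_dir: DoubleElectron_FlatPt-1To100-gun/DoubleElectron_FlatPt-1To100_PU200_v92G/
    events_per_job: 200
```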


### Reading ntuple branches or creating derived ones
The list of branches to be read and converted in pandas `DataFrame` format is specified in the module

The list of branches to be read and converted to `awkward array` format is specified in the module:

[collections](python/collections.py)

Expand All @@ -111,7 +122,7 @@ Selections are defined as strings in the module:
[selections](python/selections.py)

Different collections are defined for different objects and/or different purposes. The selections have a `name` which is used for the histogram naming (see below). Selections are used by the plotters.

Selections can be combined and retrieved via regular expressions in the configuration of the plotters.
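The retrieval-by-regex idea can be sketched as follows (a toy re-implementation with made-up selection names and cuts; the real `Selector` lives in [selections](python/selections.py)):

```python
import re

# A registry of named selections (names and cut strings are illustrative).
SELECTIONS = {
    "all": "pt >= 0",
    "Pt15": "pt > 15",
    "Pt30": "pt > 30",
    "EtaBC": "1.52 < abs(eta) < 2.4",
}

def select(pattern):
    """Return the (name, cut) pairs whose name fully matches the regex."""
    return [(name, cut) for name, cut in SELECTIONS.items()
            if re.fullmatch(pattern, name)]

print([name for name, _ in select("Pt[1-3][05]")])  # → ['Pt15', 'Pt30']
```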

### Adding a new plotter
The actual functionality of accessing the objects, filtering them according to the `selections` and filling `histograms` is provided by the plotter classes defined in the module:
Expand All @@ -137,9 +148,22 @@ The histogram naming follows the convention:
This is assumed in all the `plotters` and in the code to actually draw the histograms.


## Histogram drawing

Of course you can use your favorite set of tools. I use my own, [plot-drawing-tools](https://github.com/cerminar/plot-drawing-tools), which is based on `jupyter notebooks`.

`cd ntuple-tools`
`git clone [email protected]:cerminar/plot-drawing-tools.git`
`jupyter-notebook`

## HELP

Can't figure out how to do some manipulation using `awkward array` or `uproot`? You can take a look at the examples and play with the arrays in:
[plot-drawing-tools/blob/master/eventloop-uproot-ak.ipynb](https://github.com/cerminar/plot-drawing-tools/blob/master/eventloop-uproot-ak.ipynb)

## Submitting to the batch system

Note that the script `analyzeHgcalL1Tntuple.py` can be used to submit the jobs to the HTCondor batch system invoking the `-b` option. A dag configuration is created and you can actually submit it following the script output.
Note that the script `analyzeNtuples.py` can be used to submit jobs to the HTCondor batch system by invoking the `-b` option. A DAG configuration is created, and you can submit it following the script output.

### Note about the `hadd` job
For each sample injected into the batch system a DAG is created. The DAG will submit an `hadd` job once all the jobs have succeeded.
17 changes: 7 additions & 10 deletions analyzeHgcalL1Tntuple.py → analyzeNtuples.py
@@ -30,7 +30,7 @@

# import root_numpy as rnp
import pandas as pd
import uproot4 as up
import uproot as up

from python.main import main
import python.l1THistos as histos
@@ -122,10 +122,9 @@ def analyze(params, batch_idx=-1):
if params.rate_pt_wps:
calib_manager.set_pt_wps_version(params.rate_pt_wps)


output = ROOT.TFile(params.output_filename, "RECREATE")
output.cd()
output = up.recreate(params.output_filename)
hm = histos.HistoManager()
hm.file = output

# instantiate all the plotters
plotter_collection = []
@@ -156,7 +155,8 @@ def analyze(params, batch_idx=-1):
for tree_file_name in files_with_protocol:
if break_file_loop:
break
tree_file = up.open(tree_file_name, num_workers=2)
# tree_file = up.open(tree_file_name, num_workers=2)
tree_file = up.open(tree_file_name, num_workers=1)
print(f'opening file: {tree_file_name}')
ttree = tree_file[params.tree_name.split('/')[0]][params.tree_name.split('/')[1]]

Expand All @@ -180,7 +180,7 @@ def analyze(params, batch_idx=-1):
# if tree_reader.global_entry % 100 == 0:
# tr.collect_stats()

if tree_reader.global_entry != 0 and tree_reader.global_entry % 1000 == 0:
if tree_reader.global_entry != 0 and tree_reader.global_entry % 10000 == 0:
print("Writing histos to file")
hm.writeHistos()

Expand All @@ -205,11 +205,8 @@ def analyze(params, batch_idx=-1):
# print("Processed {} events/{} TOT events".format(nev, ntuple.nevents()))

print("Writing histos to file {}".format(params.output_filename))

output.cd()
hm.writeHistos()

output.Close()
output.close()
# ROOT.ROOT.DisableImplicitMT()

return tree_reader.n_tot_entries
18 changes: 18 additions & 0 deletions cfg/compIDtuples.py
@@ -0,0 +1,18 @@
from __future__ import absolute_import
import python.plotters as plotters
import python.collections as collections
import python.selections as selections


# simple_selections = (selections.Selector('^EGq[4-5]$')*('^Pt[1-3][0]$|all'))()

comp_selections = (selections.Selector('^Pt15|all')&('^EtaABC$|^EtaBC$|all'))()
sim_selections = (selections.Selector('^GEN$')&('^Ee$|all')&('^Pt15|all')&('^EtaABC$|^EtaBC$|all'))()

compid_plotters = [
plotters.CompTuplesPlotter(collections.TkEleEE, comp_selections),
plotters.CompCatTuplePlotter(collections.TkEleEE, collections.sim_parts, comp_selections, sim_selections)
]

for sel in sim_selections:
print(sel)
44 changes: 44 additions & 0 deletions cfg/compIDtuples.yaml
@@ -0,0 +1,44 @@

common:
output_dir:
default: /eos/user/c/cerminar/hgcal/CMSSW1015/plots/
matterhorn: /Users/cerminar/cernbox/hgcal/CMSSW1015/plots/
Matterhorn: /Users/cerminar/cernbox/hgcal/CMSSW1015/plots/
triolet: /Users/cerminar/cernbox/hgcal/CMSSW1015/plots/
output_dir_local: /Users/cerminar/cernbox/hgcal/CMSSW1015/plots/
output_dir_lx: /eos/user/c/cerminar/hgcal/CMSSW1015/plots/
plot_version: v160A
run_clustering: False
run_density_computation: False
# +AccountingGroup = "group_u_CMS.u_zh.users"
# +AccountingGroup = "group_u_CMST3.all"

collections:

compid:
file_label:
compid
samples:
# - ele_flat2to100_PU0
# - ele_flat2to100_PU200
# - doubleele_flat1to100_PU0
- doublephoton_flat1to100_PU200
- doubleele_flat1to100_PU200
- nugun_alleta_pu200
# - photon_flat8to150_PU0
# - photon_flat8to150_PU200
# - dyll_PU200
plotters:
- !!python/name:cfg.compIDtuples.compid_plotters
htc_jobflavor:
microcentury
priorities:
doubleele_flat1to100_PU0: 2
doubleele_flat1to100_PU200: 7
doublephoton_flat1to100_PU200: 6
nugun_alleta_pu200: 6
events_per_job:
doubleele_flat1to100_PU0: 10000
doubleele_flat1to100_PU200: 10000
doublephoton_flat1to100_PU200: 10000
nugun_alleta_pu200: 10000
10 changes: 5 additions & 5 deletions cfg/datasets/ntp_v91.yaml
@@ -8,7 +8,7 @@ samples:

# tree_name: hgcalTriggerNtuplizer/HGCalTriggerNtuple
tree_name: l1EGTriggerNtuplizer_l1tCorr/L1TEGTriggerNtuple
rate_pt_wps: data/rate_pt_wps_v152B.90A.json
rate_pt_wps: data/rate_pt_wps_v160A.91G.json
# tree_name: l1CaloTriggerNtuplizer/HGCalTriggerNtuple

# doubleele_flat1to100_PU0:
@@ -58,10 +58,10 @@ samples:
# input_sample_dir: NuGunAllEta_PU200/NTP/v80A/
# input_sample_dir: NeutrinoGun_E_10GeV/NuGunAllEta_PU200_v47/191105_135050/0000/
events_per_job: 300
#
# ttbar_PU200:
# input_sample_dir: TT_TuneCP5_14TeV-powheg-pythia8/TT_PU200_v82B/
# events_per_job: 200

ttbar_PU200:
input_sample_dir: TT_TuneCP5_14TeV-powheg-pythia8/TT_PU200_FWTest10k
events_per_job: 200

# zprime_ee_PU200:
# input_sample_dir: ZprimeToEE_M-6000_TuneCP5_14TeV-pythia8/ZPrimeEE_PU200_v82
73 changes: 73 additions & 0 deletions cfg/datasets/ntp_v92.yaml
@@ -0,0 +1,73 @@
# NOTE: fix of track extrapolation (digitized tracks with bitwise extrapolation)
# branch:

samples:
input_dir: /eos/cms/store/cmst3/group/l1tr/cerminar/l1teg/ntuples/
calib_version: calib-v134C
version: 92G

# tree_name: hgcalTriggerNtuplizer/HGCalTriggerNtuple
tree_name: l1EGTriggerNtuplizer_l1tCorr/L1TEGTriggerNtuple
rate_pt_wps: data/rate_pt_wps_v152B.90A.json
# tree_name: l1CaloTriggerNtuplizer/HGCalTriggerNtuple

# doubleele_flat1to100_PU0:
# input_sample_dir: DoubleElectron_FlatPt-1To100/DoubleElectron_FlatPt-1To100_PU0_v64E/
# events_per_job : 500
# # gen_selections: !!python/name:python.selections.genpart_photon_selections

doubleele_flat1to100_PU200:
input_sample_dir: DoubleElectron_FlatPt-1To100-gun/DoubleElectron_FlatPt-1To100_PU200_v92G/
events_per_job : 200

doublephoton_flat1to100_PU200:
input_sample_dir: DoublePhoton_FlatPt-1To100-gun/DoublePhoton_FlatPt-1To100_PU200_v92G/
events_per_job : 200

# ele_flat2to100_PU0:
# input_sample_dir: SingleElectron_PT2to200/SingleE_FlatPt-2to200_PU0_v60G2/
# events_per_job : 500
# # gen_selections: !!python/name:python.selections.genpart_photon_selections
#
# ele_flat2to100_PU200:
# input_sample_dir: SingleElectron_PT2to200/SingleE_FlatPt-2to200_PU200_v60G2/
# events_per_job : 200
#
# photon_flat8to150_PU0:
# input_sample_dir: SinglePhoton_PT2to200/SinglePhoton_FlatPt-2to200_PU0_v60D/
# events_per_job : 500
#
# photon_flat8to150_PU200:
# input_sample_dir: SinglePhoton_PT2to200/SinglePhoton_FlatPt-2to200_PU200_v60D/
# events_per_job : 200
#
# pion_flat2to100_PU0:
# input_sample_dir: SinglePion_FlatPt-2to100/SinglePion_FlatPt-2to100_PU0_v33/190911_081445/0000/
# events_per_job : 500
#
# pion_flat2to100_PU200:
# input_sample_dir: SinglePion_FlatPt-2to100/SinglePion_FlatPt-2to100_PU200_v33/190911_081546/0000/
# events_per_job : 200
# #
# nugun_alleta_pu0:
# input_sample_dir: SingleNeutrino/NuGunAllEta_PU0_v14/190123_172948/0000/
# events_per_job: 500

nugun_alleta_pu200:
input_sample_dir: MinBias_TuneCP5_14TeV-pythia8/NuGunAllEta_PU200_v92G/
# input_sample_dir: NuGunAllEta_PU200/NTP/v80A/
# input_sample_dir: NeutrinoGun_E_10GeV/NuGunAllEta_PU200_v47/191105_135050/0000/
events_per_job: 300
#
# ttbar_PU200:
# input_sample_dir: TT_TuneCP5_14TeV-powheg-pythia8/TT_PU200_v82B/
# events_per_job: 200


dyll_PU200:
input_sample_dir: DYToLL_M-50_TuneCP5_14TeV-pythia8/DYToLL_PU200_v92G
events_per_job: 200

dyll_M10to50_PU200:
input_sample_dir: DYToLL_M-10To50_TuneCP5_14TeV-pythia8/DYToLL_M10To50_PU200_v92G
events_per_job: 200