
LHCb_PID_Compression

A benchmark for the challenge of compressing an obscured dataset containing Particle Identification (PID) data.

GANs and Flows are trained in the exp_3_features directory with Jupyter notebooks. Once run, the models and preprocessing scalers are saved in exp_3_features/weights and exp_3_features/gan_preprocessors, respectively. Once trained, the models can be used by running scripts such as generate_Flow_gencut_ksttrain_nspd.py.
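
For orientation, a minimal sketch of loading a saved generator and its preprocessing scaler, then sampling from it, might look like the following. The file names, the latent dimension, and the GAN-style sampling via model.predict are assumptions for illustration only; the generate_*.py scripts are the authoritative reference.

```python
# Minimal sketch (assumed file names and latent size) of loading a trained
# generator and its preprocessing scaler, then generating PID-like samples.
import pickle

import numpy as np
from keras.models import load_model

# Hypothetical file names; the real files live in exp_3_features/weights
# and exp_3_features/gan_preprocessors.
generator = load_model("exp_3_features/weights/flow_generator.h5")
with open("exp_3_features/gan_preprocessors/scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

latent_dim = 64  # assumed; must match the trained model
noise = np.random.normal(size=(10_000, latent_dim))

# Generate in the preprocessed space, then invert the scaling to recover
# the physical PID variables.
generated = generator.predict(noise)
samples = scaler.inverse_transform(generated)
print(samples.shape)
```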

Both the data provided for running and the processing of the generated data are handled with the following repo, https://github.com/weissercn/LUV_ML, and the script Efficiency.py within it.
For members of the Mike Williams group at MIT, the data for exp_3_features/data is available on geno in the following directory: /data/weisser_genmodels/LHCb_PID_Compression_exp_3_features_data.

OLD INSTRUCTIONS:

The data can be found here: https://zenodo.org/record/1231531#.WyZSQFOFO3V

Download it and put it into a folder called 'Data' in this directory.
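
If you prefer to script the download, a sketch along these lines should work. The file name pid_data.csv is hypothetical; consult the Zenodo record for the actual file list.

```python
# Sketch of scripting the download into Data/. The file name below is
# hypothetical -- check the Zenodo record for the actual files.
import os
import urllib.request

os.makedirs("Data", exist_ok=True)
url = "https://zenodo.org/record/1231531/files/pid_data.csv"  # hypothetical file name
urllib.request.urlretrieve(url, os.path.join("Data", "pid_data.csv"))
```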

Installing FunctionScaler (https://github.com/weissercn/FunctionScaler/) as a dependency is required; this can be done using pip (pip install FunctionScaler). Other requirements: keras, scikit-learn, pandas, matplotlib, pickle.
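
As a quick sanity check that the environment is complete, the imports below should all succeed. The FunctionScaler import path is an assumption; check its README if the last line fails.

```python
# Environment sanity check: every import below should succeed.
import pickle

import keras
import matplotlib
import pandas
import sklearn

# Assumed import path; see the FunctionScaler README if this differs.
from FunctionScaler import FunctionScaler
```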

Run the following notebooks in this order (a scripted run is sketched after the list):

  1. Prepare
  2. Train
  3. Analyse Output
  4. Cross Check
  5. ROOT compression
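
To execute the whole pipeline non-interactively, the notebooks can be run in order with jupyter nbconvert. The .ipynb file names below are assumed from the list above and may differ in the repository.

```python
# Run the pipeline notebooks in order via nbconvert. The file names are
# assumed from the list above; adjust them to match the repository.
import subprocess

notebooks = [
    "Prepare.ipynb",
    "Train.ipynb",
    "Analyse Output.ipynb",
    "Cross Check.ipynb",
    "ROOT compression.ipynb",
]

for nb in notebooks:
    # --execute runs the notebook top to bottom; --inplace stores outputs in it.
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb],
        check=True,
    )
```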
