Install xgboost and scikit-learn, e.g. via conda
Copy the training file (coffea output) into the local folder and run train.py
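For example, assuming a fresh conda environment (the environment name and the training-file path below are placeholders):

```bash
conda create -n bdt-training -c conda-forge python xgboost scikit-learn
conda activate bdt-training
cp /path/to/training_file .   # placeholder path to the coffea output
python train.py
```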
A simple algorithm to tune BDT trainings:
- Choose as large a number of estimators (a.k.a. boosting rounds or trees) as you find reasonable; typical final values are around 1000, and the training time grows roughly linearly with the number of estimators
- Tune the `eta` and `max_depth` parameters; I've also always had good experience with using `subsample` and setting its value to around 0.5, but this can be experimented with (see the sketch after this list)
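As an illustration, here is a minimal sketch of such a training using xgboost's scikit-learn API. The dataset is a stand-in (the real input comes from the coffea output used by train.py), and the parameter values are starting points for tuning, not recommendations:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in dataset; in practice the features come from the coffea output.
X, y = make_classification(n_samples=10_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = xgb.XGBClassifier(
    n_estimators=1000,  # boosting rounds; training time grows roughly linearly
    learning_rate=0.1,  # "eta" in xgboost's native parameter names
    max_depth=6,        # the other main knob to tune alongside eta
    subsample=0.5,      # fraction of events sampled per tree, as suggested above
)
clf.fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```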
Usually, the training should also work, and perform better, with weights. In first tries, however, the training didn't seem to perform properly when weights were passed. Possibly some parameter (e.g. `min_child_weight`, the minimum summed instance weight required in a leaf) prevents the training from working properly with the default weights. It may be worth understanding this better and possibly rescaling the weights so that the training performs well.
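A hedged sketch of what passing and rescaling weights could look like with the scikit-learn API; the rescaling to a mean weight of one and the `min_child_weight` angle are assumptions about what might help, not a verified fix:

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=10, random_state=42)
w = np.random.default_rng(42).lognormal(size=len(y))  # stand-in for per-event weights

# Assumption: rescaling so the mean weight is 1 keeps the summed leaf weights
# in a range where the default min_child_weight does not block splits.
w_scaled = w / w.mean()

clf = xgb.XGBClassifier(
    n_estimators=200,
    min_child_weight=1.0,  # minimum summed instance weight per leaf; a candidate
                           # reason why very small weights stall the training
)
clf.fit(X, y, sample_weight=w_scaled)
```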
More documentation is here: https://xgboost.readthedocs.io/en/stable/parameter.html