This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

Show how to perform fast retraining with LightGBM in different business cases

Fast Retraining

In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them on datasets from several domains and of different sizes.

On July 25, 2017, we published a blog post evaluating both libraries and discussing the benchmark results. The post is Lessons Learned From Benchmarking Fast Machine Learning Algorithms.

Installation and Setup

The installation instructions can be found here.

Project

In the folder experiments you can find the different experiments of the project. We developed 6 experiments with the CPU and GPU versions of the libraries.

  • Airline
  • BCI
  • Football
  • Planet Kaggle
  • Fraud Detection
  • HIGGS

The folder experiments/libs contains the common code for the project.

Benchmark

The following table summarizes the benchmark times (in seconds) and the time ratios between libraries for each experiment:

| Dataset | Experiment | Data size | Features | xgb time (s): CPU (GPU) | xgb_hist time (s): CPU (GPU) | lgb time (s): CPU (GPU) | ratio xgb/lgb: CPU (GPU) | ratio xgb_hist/lgb: CPU (GPU) |
|---|---|---|---|---|---|---|---|---|
| Football | Link CPU, Link GPU | 19673 | 46 | 2.27 (7.09) | 2.47 (4.58) | 0.58 (0.97) | 3.90 (7.26) | 4.25 (4.69) |
| Fraud Detection | Link CPU, Link GPU | 284807 | 30 | 4.34 (5.80) | 2.01 (1.64) | 0.66 (0.29) | 6.58 (19.74) | 3.04 (5.58) |
| BCI | Link CPU, Link GPU | 20497 | 2048 | 11.51 (12.93) | 41.84 (42.69) | 7.31 (2.76) | 1.57 (4.67) | 5.72 (15.43) |
| Planet Kaggle | Link CPU, Link GPU | 40479 | 2048 | 313.89 (-) | 2115.28 (2028.43) | 194.57 (317.68) | 1.61 (-) | 10.87 (6.38) |
| HIGGS | Link CPU, Link GPU | 11000000 | 28 | 2996.16 (-) | 121.21 (114.88) | 119.34 (71.87) | 25.10 (-) | 1.01 (1.59) |
| Airline | Link CPU, Link GPU | 115069017 | 13 | - (-) | 1242.09 (1271.91) | 1056.20 (645.40) | - (-) | 1.17 (1.97) |
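The ratio columns appear to be the training time of one library divided by the LightGBM time for the same device, so a value above 1 means LightGBM was faster. The sketch below recomputes a few ratios from the published times. The function and variable names are illustrative and not taken from the repository. Because it starts from the rounded times in the table, the results can differ from the published ratios in the second decimal.

```python
# Times in seconds copied from the benchmark table, as (CPU, GPU) pairs.
# None marks a run that the table reports as "-".
TIMES = {
    "Football": {"xgb": (2.27, 7.09), "xgb_hist": (2.47, 4.58), "lgb": (0.58, 0.97)},
    "HIGGS": {"xgb": (2996.16, None), "xgb_hist": (121.21, 114.88), "lgb": (119.34, 71.87)},
}


def time_ratio(numerator, denominator):
    """Return numerator / denominator, or None if either time is missing."""
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


def ratios_vs_lgb(entry):
    """Return {library: (cpu_ratio, gpu_ratio)} relative to the LightGBM times."""
    lgb_cpu, lgb_gpu = entry["lgb"]
    return {
        name: (time_ratio(cpu, lgb_cpu), time_ratio(gpu, lgb_gpu))
        for name, (cpu, gpu) in entry.items()
        if name != "lgb"
    }


if __name__ == "__main__":
    for dataset, entry in TIMES.items():
        for name, (cpu, gpu) in ratios_vs_lgb(entry).items():
            cpu_text = "-" if cpu is None else f"{cpu:.2f}"
            gpu_text = "-" if gpu is None else f"{gpu:.2f}"
            print(f"{dataset}: {name}/lgb = {cpu_text} ({gpu_text})")
```

Returning None for a missing run keeps the "-" entries from the table explicit instead of raising an error or reporting a misleading number.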

In the next table we summarize the performance results using the F1-Score.

| Dataset | Experiment | Data size | Features | xgb F1: CPU (GPU) | xgb_hist F1: CPU (GPU) | lgb F1: CPU (GPU) |
|---|---|---|---|---|---|---|
| Football | Link, Link | 19673 | 46 | 0.458 (0.470) | 0.460 (0.472) | 0.459 (0.470) |
| Fraud Detection | Link, Link | 284807 | 30 | 0.824 (0.821) | 0.802 (0.814) | 0.813 (0.811) |
| BCI | Link, Link | 20497 | 2048 | 0.110 (0.093) | 0.142 (0.120) | 0.137 (0.138) |
| Planet Kaggle | Link, Link | 40479 | 2048 | 0.805 (-) | 0.822 (0.822) | 0.822 (0.821) |
| HIGGS | Link, Link | 11000000 | 28 | 0.763 (-) | 0.767 (0.767) | 0.768 (0.767) |
| Airline | Link, Link | 115069017 | 13 | - (-) | 0.741 (0.745) | 0.732 (0.745) |
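For reference, the F1-Score is the harmonic mean of precision and recall. The following is a minimal sketch of the binary form in plain Python. It is not the evaluation code used in the experiments, which is not shown here and may use an averaged variant for the multi-class or multi-label datasets. The function name and the sample labels are made up for illustration.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # With no true positives, precision and recall are 0 (or undefined);
        # this sketch reports 0.0 in that case.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1]
    print(f1_score_binary(y_true, y_pred))
```

Unlike accuracy, this score ignores true negatives, which makes it more informative on imbalanced data where the positive class is rare.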

The experiments were run on an Azure NV24 VM with 24 cores, 224 GB of memory, and 4 NVIDIA M60 GPUs. Both the CPU and GPU runs used Ubuntu 16.04.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.