leaves is a library implementing prediction code for GBRT (Gradient Boosting Regression Trees) models in pure Go. The goal of the project is to make it possible to use models from popular GBRT frameworks in Go programs without C API bindings.
NOTE: Before the 1.0.0 release the API is subject to change.
- General Features:
  - support parallel predictions for batches
  - support sigmoid, softmax transformation functions
- Support LightGBM (repo) models:
  - read models from `text` format and from `JSON` format
  - support `gbdt`, `rf` (random forest) and `dart` models
  - support multiclass predictions (see the sketch after this section)
  - additional optimizations for categorical features (for example, one hot decision rule)
  - additional optimizations exploiting only prediction usage
- Support XGBoost (repo) models:
  - read models from binary format
  - support `gbtree`, `gblinear`, `dart` models
  - support multiclass predictions
  - support missing values (`nan`)
- Support scikit-learn (repo) tree models (experimental support):
  - read models from pickle format (protocol `0`)
  - support `sklearn.ensemble.GradientBoostingClassifier`
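For multiclass models, all class scores can be retrieved in one call. Below is a minimal sketch, assuming the `Predict` and `NOutputGroups` methods from the godoc API reference; the model file name is a placeholder:

```go
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// "multiclass_model.txt" is a placeholder file name for illustration.
	model, err := leaves.LGEnsembleFromFile("multiclass_model.txt", true)
	if err != nil {
		panic(err)
	}
	fvals := []float64{1.0, 2.0, 3.0}
	// One score per output group (class); 0 means "use all trees",
	// as in the minimal example below.
	predictions := make([]float64, model.NOutputGroups())
	if err := model.Predict(fvals, 0, predictions); err != nil {
		panic(err)
	}
	fmt.Printf("Class scores for %v: %v\n", fvals, predictions)
}
```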
In order to start, go get this repository:

```sh
go get github.com/dmitryikh/leaves
```
Minimal example:

```go
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}
```
In order to use an XGBoost model, just change `leaves.LGEnsembleFromFile` to `leaves.XGEnsembleFromFile`.
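For example, a sketch of the same flow with an XGBoost binary model (the file name `xgboost_model.bin` is a placeholder, and the second argument is assumed to mirror the LightGBM loader above):

```go
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// Load an XGBoost model saved in binary format.
	model, err := leaves.XGEnsembleFromFile("xgboost_model.bin", true)
	if err != nil {
		panic(err)
	}
	fvals := []float64{1.0, 2.0, 3.0}
	fmt.Printf("Prediction for %v: %f\n", fvals, model.PredictSingle(fvals, 0))
}
```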
Documentation is hosted on godoc (link). The documentation contains complex usage examples and a full API reference. Additional usage examples can be found in leaves_test.go.
Some additional information on new features and backward compatibility can be found in NOTES.md.
Below are comparisons of prediction speed on batches (~1000 objects per API call). Hardware: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3. C API implementations were called from Python bindings, but the large batch size should make the overhead of the Python bindings negligible. leaves benchmarks were run by means of the Go test framework: `go test -bench`. See benchmark for more details on measurements. See testdata/README.md for data preparation pipelines. A sketch of a batch prediction call follows the tables below.
Single thread:
Test Case | Features | Trees | Batch size | C API | leaves |
---|---|---|---|---|---|
LightGBM MS LTR | 137 | 500 | 1000 | 49ms | 51ms |
LightGBM Higgs | 28 | 500 | 1000 | 50ms | 50ms |
LightGBM KDD Cup 99* | 41 | 1200 | 1000 | 70ms | 85ms |
XGBoost Higgs | 28 | 500 | 1000 | 44ms | 50ms |
4 threads:
Test Case | Features | Trees | Batch size | C API | leaves |
---|---|---|---|---|---|
LightGBM MS LTR | 137 | 500 | 1000 | 14ms | 14ms |
LightGBM Higgs | 28 | 500 | 1000 | 14ms | 14ms |
LightGBM KDD Cup 99* | 41 | 1200 | 1000 | 19ms | 24ms |
XGBoost Higgs | 28 | 500 | 1000 | ? | 14ms |
(?) - currently I'm unable to utilize multithreading for XGBoost predictions by means of the Python bindings

(*) - the KDD Cup 99 problem involves continuous and categorical features simultaneously
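The batch timings above come from multi-row prediction calls. A minimal sketch of such a call, assuming the `PredictDense` method and its argument order as described in the godoc (file name and feature values are placeholders):

```go
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", true)
	if err != nil {
		panic(err)
	}

	// A batch of 3 rows with 3 features each, stored row-major.
	nrows, ncols := 3, 3
	vals := []float64{
		1.0, 2.0, 3.0,
		4.0, 5.0, 6.0,
		7.0, 8.0, 9.0,
	}

	// One prediction slot per row per output group; 0 estimators means
	// "use all trees", and the last argument is the number of threads.
	predictions := make([]float64, nrows*model.NOutputGroups())
	if err := model.PredictDense(vals, nrows, ncols, predictions, 0, 4); err != nil {
		panic(err)
	}
	fmt.Println(predictions)
}
```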
- LightGBM models:
  - limited support of transformation functions (only sigmoid and softmax are supported)
- XGBoost models:
  - limited support of transformation functions (only sigmoid and softmax are supported)
  - there could be a slight divergence between C API predictions and leaves because of floating point conversions and comparison tolerances
- scikit-learn tree models:
  - no support for transformation functions; output scores are raw scores (as from `GradientBoostingClassifier.decision_function`)
  - only pickle protocol `0` is supported
  - there could be a slight divergence between sklearn predictions and leaves because of floating point conversions and comparison tolerances
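Since scikit-learn scores come out untransformed, a raw score from a binary classifier can be mapped to a probability by applying the sigmoid manually. A minimal sketch (this mirrors the relation between `decision_function` and `predict_proba` for a binary `GradientBoostingClassifier`; the raw score value is a placeholder):

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid maps a raw log-odds score to a probability.
func sigmoid(rawScore float64) float64 {
	return 1.0 / (1.0 + math.Exp(-rawScore))
}

func main() {
	rawScore := 0.5 // placeholder: a raw score as returned by PredictSingle
	fmt.Printf("Probability of the positive class: %f\n", sigmoid(rawScore))
}
```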
If you are interested in the project or have questions, please contact me by email: khdmitryi at gmail.com