
Totally incorrect predictions: Python xgboost-trained model loaded and predicted with leaves #69

Open
luowencai opened this issue Nov 22, 2019 · 2 comments
Labels
question Further information is requested

Comments


luowencai commented Nov 22, 2019

We use Spark to generate a libsvm file, then use Python's sklearn to load it and xgboost to train and save a model, and finally use leaves to load that model and predict.
The prediction results are completely different between the Python demo and the Go code.
We just want to ask whether leaves does not support this, or whether we are using leaves incorrectly.
The Python code looks like:

from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier

my_workpath = 'D:\\project\\py\\train_demo\\'
X_train, y_train = load_svmlight_file(my_workpath + 'train')
X_test, y_test = load_svmlight_file(my_workpath + 'validation')
bst = XGBClassifier()
bst.fit(X_train, y_train)
bst.save_model(my_workpath + "train_model")
train_preds = [x[1] for x in bst.predict_proba(X_train)]
test_preds = [x[1] for x in bst.predict_proba(X_test)]

The Go code looks like:

model, e := leaves.XGEnsembleFromFile(model_path, true)
if e != nil {
	println(e)
}
if model.Transformation().Type() != transformation.Logistic {
	log.Fatalf("expected TransforType = Logistic (got %s)", model.Transformation().Name())
}
csr, err := mat.CSRMatFromLibsvmFile(validate_path, 0, true)
if err != nil {
	println(err)
}
predictions := make([]float64, csr.Rows()*model.NOutputGroups())
e = model.PredictCSR(csr.RowHeaders, csr.ColIndexes, csr.Values, predictions, 50, 5)
if e != nil {
	println(e)
}
fmt.Printf("Prediction for %v\n", predictions)
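For context on the `transformation.Logistic` check above: with `objective='binary:logistic'`, the raw ensemble margin is passed through the sigmoid to produce the probability that `predict_proba` returns. A minimal sketch in plain Python (the margin values below are made up for illustration, not real model output):

```python
import math

def sigmoid(margin: float) -> float:
    # Logistic transformation: maps a raw ensemble margin to a probability.
    return 1.0 / (1.0 + math.exp(-margin))

# A margin of 0 corresponds to a 50/50 prediction.
print(sigmoid(0.0))  # 0.5

# Positive margins push the probability toward 1, negative toward 0.
print(sigmoid(2.0) > 0.5, sigmoid(-2.0) < 0.5)  # True True
```

If the Go side returned raw margins while the Python side returned probabilities, the two result sets would look completely unrelated, which is why confirming the transformation type is a sensible first check.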
dmitryikh (Owner) commented
Hello! Thanks for your report.

e = model.PredictCSR(csr.RowHeaders, csr.ColIndexes, csr.Values, predictions, 50, 5)

Why do you use only 50 trees to predict? Try using all the trees in the ensemble, as in the Python script.

Also, if you can provide your train & test files, I can check the case precisely.
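To illustrate the point about the tree count: a boosted ensemble's margin is the sum of the per-tree outputs, so summing only the first 50 of, say, 100 trees generally yields a different prediction than using the full ensemble. A toy sketch with made-up per-tree margins (not real xgboost output):

```python
import math

def predict_proba(tree_margins, n_trees=None):
    # Sum the first n_trees per-tree margins (all trees if n_trees is None),
    # then apply the logistic transformation.
    margin = sum(tree_margins[:n_trees])
    return 1.0 / (1.0 + math.exp(-margin))

# 100 hypothetical per-tree contributions.
margins = [0.01 * (i % 7 - 3) for i in range(100)]

full = predict_proba(margins)         # all 100 trees
partial = predict_proba(margins, 50)  # only the first 50 trees
print(full != partial)  # True
```

In this thread's case the Python model was trained with `n_estimators=50`, so passing 50 happens to match; still, a mismatch between the count passed to the predict call and the actual ensemble size is a common source of diverging predictions.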

@dmitryikh dmitryikh added the question Further information is requested label Dec 2, 2019

luowencai commented Dec 3, 2019

Sorry for the mistaken Python code above; here's the correct Python code we actually use:

from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier


class train_classifier:
    bst = XGBClassifier(max_depth=8, n_estimators=50, learning_rate=0.1, silent=False, objective='binary:logistic',
                        min_child_weight=3, gamma=0, scale_pos_weight=45.1193405554875, subsample=0.9,
                        colsample_bytree=0.6, reg_alpha=3, reg_lambda=3, verbose=False)
    my_workpath = 'D:\\project\\py\\train_demo\\'

    def __init__(self):
        self.bst.load_model(self.my_workpath + "train_model")

    def train(self, train_path='train'):
        X_train, y_train = load_svmlight_file(self.my_workpath + train_path)
        self.bst.fit(X_train, y_train)
        self.bst.save_model(self.my_workpath + "train_model")

    def test_predict(self, test_file='validation'):
        X_test, y_test = load_svmlight_file(self.my_workpath + test_file)
        return [x[1] for x in self.bst.predict_proba(X_test)]
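One thing worth ruling out when the same libsvm file is read by two different libraries: feature index conventions. Each libsvm line is `label idx:value ...`, and tools differ on whether `idx` is 0-based or 1-based; a one-off shift remaps every feature and makes the two sets of predictions completely different. A minimal stdlib parser sketch (the sample line is made up, and `zero_based` here is a hypothetical parameter of this sketch, not of either library):

```python
def parse_libsvm_line(line, zero_based=True):
    # Returns (label, {feature_index: value}); shifts indices when the
    # file is 1-based so both readers agree on which feature is feature 0.
    parts = line.split()
    label = float(parts[0])
    feats = {}
    for tok in parts[1:]:
        idx, val = tok.split(":")
        feats[int(idx) - (0 if zero_based else 1)] = float(val)
    return label, feats

line = "1 1:0.5 3:2.0"  # hypothetical sample row
print(parse_libsvm_line(line, zero_based=True))   # feature indices stay 1 and 3
print(parse_libsvm_line(line, zero_based=False))  # shifted to 0 and 2
```

Comparing a few rows as parsed by both pipelines (sklearn's loader vs. leaves' `CSRMatFromLibsvmFile`) would quickly confirm or exclude this as the cause.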

Here are the prediction results from running the Python predict and the Go PredictCSR:
predict_result.zip
