Add number_samples to XGBoost ML Models #398

Open
wants to merge 1 commit into
base: main
from

Conversation

V1NAY8
Contributor

@V1NAY8 V1NAY8 commented Sep 30, 2021

Work on #243

A tree node in XGBoost carries the following information:

Tree               1
Node               0
ID               1-0
Feature           f1
Split      -0.030468
Yes              1-1
No               1-2
Missing          1-1
Gain       44.768673
Cover      23.395073
Name: 7, dtype: object

So, based on

I think Cover gives the number of training samples covered by the node or leaf, since "Gain" is already used as leaf_value. I might be wrong, though!
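A minimal sketch of what the proposed `int(row["Cover"])` conversion does to the node shown above. A plain dict stands in for the `pd.Series` row here (an assumption made only to keep the snippet dependency-free); the values are copied from the output above.

```python
# Stand-in for the pd.Series row printed above (plain dict, for illustration only).
row = {
    "Tree": 1,
    "Node": 0,
    "ID": "1-0",
    "Feature": "f1",
    "Split": -0.030468,
    "Yes": "1-1",
    "No": "1-2",
    "Missing": "1-1",
    "Gain": 44.768673,
    "Cover": 23.395073,
}

# The conversion proposed in this PR: int() truncates toward zero.
number_samples = int(row["Cover"])

# The fractional part that the conversion silently drops.
dropped = row["Cover"] - number_samples

print(number_samples, round(dropped, 6))
```

A non-integer Cover like this one suggests the value is not a plain sample count, so the truncation may hide a mismatch rather than fix it.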

@benwtrent Please give a review :)

@elasticmachine

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@benwtrent benwtrent self-requested a review September 30, 2021 16:15
@@ -103,6 +107,7 @@ def build_tree_node(self, row: pd.Series, curr_tree: int) -> TreeNode:
                 right_child=self.extract_node_id(row["No"], curr_tree),
                 threshold=float(row["Split"]),
                 split_feature=self.get_feature_id(row["Feature"]),
+                number_samples=int(row["Cover"]),
Member


I don't think this is gonna work.

cover is a float. I don't know how they calculate cover relative to the total number of samples that hit the tree. The docs indicate it's the average of the total number of samples affected by the split.

I wonder if cover * number of features ~= the total number of docs? Can you investigate and confirm?
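One possible starting point for that investigation, as a sketch rather than a confirmed answer: XGBoost's documentation describes cover as the sum of second-order gradients (hessians) of the training rows that reach a node. Under that assumption, cover equals the row count only when every hessian is 1, as with squared-error loss and unit sample weights. The helper names below are hypothetical and do not call XGBoost; they only reproduce the arithmetic.

```python
import math


def squared_error_hessian(margin: float) -> float:
    # Second derivative of 0.5 * (margin - label)^2 with respect to margin.
    return 1.0


def logistic_hessian(margin: float) -> float:
    # Second derivative of the logistic loss: p * (1 - p), independent of the label.
    p = 1.0 / (1.0 + math.exp(-margin))
    return p * (1.0 - p)


def cover(hessians) -> float:
    # Assumed definition: cover is the sum of hessians of the rows reaching a node.
    return sum(hessians)


n_rows = 100
margins = [0.0] * n_rows  # every row starts at a raw score of 0.0 (p = 0.5)

squared_error_cover = cover(squared_error_hessian(m) for m in margins)
logistic_cover = cover(logistic_hessian(m) for m in margins)

print(squared_error_cover)  # equals n_rows
print(logistic_cover)       # 0.25 per row, so a quarter of n_rows
```

If this holds for the models being converted, Cover could stand in for a sample count only with a squared-error objective; for a logistic objective it would undercount, which would fit the fractional value seen in the PR description.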


3 participants