-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Fix dump_model()
information for root node
#6569
Conversation
Currently the CI is not passing as #6574 is blocking. |
Tests should now be sufficient for the change. |
internal_value_
is not calculated properlyinternal_value_
for root node
@jameslamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shiyu1994 or @guolinke could you help with a review of this?
I'm not sure if this will correctly handle these cases:
- custom
init_score
provided (viaDataset
) boost_from_average=False
passed
@neNasko1 could you also look at #5962 and let us know if you think this change would fix the issue @thatlittleboy reported there?
Thank you for taking the time to look into the PR and linking a relevant issue.
I think those cases are handled as the results are consistent with what leaf values report, I also remade the test to boost from average.
I took the liberty to merge @thatlittleboy's WIP code into mine, additionally fixing the issues that they reported. I will also change the description of the PR to reflect both of the fixes. |
internal_value_
for root nodedump_model()
information for root node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
But I'll keep following the discussion about Dask Ranker test (#6569 (comment)).
@jameslamb can you submit a final review on the change, so that we can merge it? Sorry for the caused inconvenience! |
I will look when I can. I have spent most of my limited open source time in the last few weeks investigating and fixing multiple difficult, time-sensitive CI issues in this project, and there is yet another one that is still not done and a primary focus for me right now (#6651). If @StrikerRUS has time to re-review the commits and comments you've pushed since his approval, and if he approves, then my review can be dismissed and this can be merged without another review from me. Otherwise, you will have to be patient a bit longer. |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I've left some minor suggestions for your consideration, around making the tests stricter and easier to understand.
I've also triggered valgrind checks on this branch, to ensure no new memory-management issues have been introduced by this PR.
Unfortunately, this still needs a bit more investigation before I'm confident in it... I was finally able to investigate your comments in #6569 (comment), and found that that Dask test checking the trees_to_dataframe()
output in the presence of init_score
really was testing that the init_score
wasn't ignored. I am going to try right now to figure out why that was, and if it has implications for this PR.
By the way, most of the commits you've pushed here are not tied to your GitHub account.
Doesn't really matter in this repo because if this is merged we'll squash everything into one commit, and that'll be correctly tied to your account. But just making you aware of it as it might cause problems for you in other GitHub-based projects. You can fix that for future commits like this:
git config --global user.email "${EMAIL}"
replacing ${EMAIL}
with an email address tied to your GitHub account.
Sorry for making it seem like there is some rush around this. I understand that this project is run mainly by volunteers and I do not want to harass any of the maintainers.
Thanks for the suggestions, all of them are reasonable and are now merged. I grouped them into one commit as I wanted to test if everything is okay. Again, thanks to everyone for the time spent! |
Is this related to
I retract my comment of the test being a no-op previously, however after this fix there is a need for a change in this test. |
Thanks, changes look great. For your other questions, let's please stay in the thread you're quoting, so the conversation can all be grouped together. I've responded there: #6569 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! Just some small suggestions on the new Dask test, and then I think we can merge this. Thanks for all your hard work, I'm glad we'll be able to get this fix in.
Co-authored-by: James Lamb <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, thanks very much for the help!
Thanks for the all the time spent! Glad to see it merged 🚀 |
This PR corrects the output of
dump_model()
and other dump-related functions liketrees_to_dataframe()
. There are 2 fixes implemented:Tree::Split
implementation incorrectly saves the old leaf output value in theinternal_value_
array when called on the root node. This in turn makes inspecting the whole training process from python incomplete.Before:
After: