
Add functionality for early stopping rounds. #193

Merged 19 commits into dmlc:master on Nov 17, 2023

Conversation

david-sun-1
Contributor

Had a need for early stopping rounds and followed the discussion in #158 to implement this version of it.

@ExpandingMan
Collaborator

ExpandingMan commented Oct 8, 2023

Thanks for this.

I have a few concerns with this implementation. The first is that we had thus far carefully avoided parsing the output messages, for obvious reasons. There is apparently no way around the fact that this kind of functionality requires it, so I don't think we can disallow it. However, if we are going to do this, we should at the very least have dedicated log-message parsing functions and expose that useful functionality in the API, rather than cramming it into `update!`.

Ideally, we should probably provide the ability to insert arbitrary callbacks between calls to `updateone!`. That doesn't have to come in this PR, but we should think carefully about what the interface should look like so that we can add it without breaking changes (I have not thought it through carefully yet myself).

I also don't love the idea of removing the evaluations from the calls to `updateone!` itself, as they could be useful if someone wants to write a custom loop, though I can certainly see a good argument that they are getting in the way here.

Lastly, shouldn't we have some kind of tolerance for the score comparison? It would be really nice to be able to do a simple convergence check and stop when it is reached. In most of my cases I'd be far more likely to use the feature that way, and I think that is a good use case.
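The two things asked for here, a dedicated log-message parser and a tolerance-based convergence check, could be sketched roughly as follows. This is a hypothetical sketch only: `parse_eval_metric` and `converged` are invented names, and the evaluation-log format is assumed from typical XGBoost output, not taken from this PR.

```julia
# Hypothetical sketch: parse a metric value out of an evaluation log line
# such as "[3]\ttrain-rmse:0.12345\ttest-rmse:0.54321" (format assumed).
function parse_eval_metric(msg::AbstractString, metric::AbstractString)
    for field in split(msg)                       # split on whitespace/tabs
        startswith(field, metric * ":") || continue
        return parse(Float64, last(split(field, ":")))
    end
    return nothing                                # metric absent from this line
end

# Simple convergence check: stop when successive scores differ by less than
# `tol`, rather than waiting for k rounds with no improvement.
converged(prev::Real, curr::Real; tol::Real=1e-6) = abs(prev - curr) < tol
```

A parser like this could then be exposed in the API and reused by both the early-stopping logic and any user-supplied callbacks, instead of burying the string handling inside `update!`.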

@david-sun-1
Contributor Author

Hi @ExpandingMan, thanks for the feedback.

  • I've added a function to do the parsing.
  • Agreed with you on the callback (for the future!); it would make for a much cleaner implementation.
  • I've added evaluations back to `updateone!`. In exchange, I've had to return `msg` from `updateone!` and separate the `msg` assignment and the `@info` call into two lines.

Regarding the tolerance on the score comparison: that is extra functionality beyond what is currently available. The early stopping criteria in both the R and Python implementations are based on the number of rounds in which no improvement has been made, and that is how convergence is tested.

This is generally fine as long as you have an evaluation set, which guards against overfitting. If we were checking early stopping against a training set only, it would take much longer to reach a point where there is no improvement. I think your suggestion would be valuable in cases where the only other alternative is to define the number of rounds upfront.

I suggest we tackle that particular enhancement in a separate PR.
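For concreteness, the rounds-without-improvement rule described above (as in the R and Python implementations) could be sketched like this. The struct and function names are illustrative only, not the interface this PR adds:

```julia
# Illustrative sketch of rounds-based early stopping: stop after `rounds`
# consecutive iterations with no improvement in the evaluation score.
mutable struct EarlyStopper
    rounds::Int          # k: allowed rounds without improvement
    maximize::Bool       # whether higher scores are better
    best_score::Float64
    best_round::Int
    stall::Int           # rounds since the last improvement
end

EarlyStopper(rounds::Int; maximize::Bool=false) =
    EarlyStopper(rounds, maximize, maximize ? -Inf : Inf, 0, 0)

# Record this round's score; return true when training should stop.
function should_stop!(es::EarlyStopper, round::Int, score::Real)
    improved = es.maximize ? score > es.best_score : score < es.best_score
    if improved
        es.best_score = score
        es.best_round = round
        es.stall = 0
    else
        es.stall += 1
    end
    return es.stall >= es.rounds
end
```

With a minimizing metric and `rounds = 2`, the scores `0.9, 0.8, 0.8, 0.8` would trigger a stop at round 4 while remembering round 2 as the best.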

@ExpandingMan
Collaborator

Thanks. I'll have to take a more careful look at this later; right now I'm mostly concerned with making sure we can add a tolerance without breaking changes, though hopefully a keyword argument will suffice.

One quick comment I wanted to make: we can't have `updateone!` return this pair, as that would be breaking (and I don't love the idea of it returning a message anyway; it seems like a bit of a weird interface). I realize there is now an issue in that there is a log inside `updateone!` that we don't want to evaluate twice, so it's not clear what to do. If we are going to have this function for parsing messages, it's probably fine to move the info logs out of `updateone!` so that that function is silent on its own. Technically not breaking, but also not ideal 🤷 As you can tell, I don't know exactly how this should be done either, but we should try to do it without breaking.

`early_stopping_rounds` if 0, the early stopping function is not triggered. If set to a positive integer,
training with a validation set will stop if the performance doesn't improve for k rounds.

`maximize` If early_stopping_rounds is set, then this parameter must be set as well.
Collaborator

This seems to imply that the user must input the value whenever they input early_stopping_rounds. Maybe better to say something like "only used if early_stopping_rounds > 0".

@ExpandingMan
Collaborator

OK, comments are inline. I think the gist is that, in order not to return a message string from `updateone!` (if anything, I think it should return the actual metric, not a string, but even that would technically be breaking), logging will have to be moved into `update!`. It doesn't seem ideal, but I don't see a lot of options, so I'm OK with that.

@wilan-wong-1
Contributor

@ExpandingMan I've updated the early stopping rounds branch to incorporate some fixes. Predominantly: the watchlist previously defaulted to a `Dict`, which is an unordered data structure, so relying on the ordering of its elements could cause unintended behaviour. Modifications have been made to ensure that order is preserved based on user input (if an `OrderedDict` is provided).

To ensure backwards compatibility, I've kept the same `xgboost` interface but added a warning if a user passes a `Dict` watchlist with more than one element while using the early stopping rounds functionality.

To support early stopping rounds, I've also extended the `Booster` class with additional attributes which capture the best round (iteration) and best score, as in the other XGBoost language implementations. These can then be used directly with the already-defined `ntree_limit` parameter of the `predict()` method.

I've added test cases to give these enhancements coverage as well.
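The ordering pitfall described above can be illustrated without XGBoost.jl at all: Base `Dict` makes no guarantee about iteration order, so any code that treats a particular watchlist position as the early-stopping evaluation set needs an order-preserving structure such as `OrderedDict` (from OrderedCollections.jl) or, as in this minimal stand-in, a vector of pairs. The names below are arbitrary, not the package's API:

```julia
# Stand-in illustration: a Vector of Pairs preserves insertion order, which
# Base Dict does not guarantee. "train"/"eval" are arbitrary names.
watchlist = ["train" => :dtrain, "eval" => :deval]

# With a guaranteed order, "the last entry" is a well-defined choice for
# the early-stopping evaluation set.
eval_name = first(last(watchlist))
```

Iterating a `Dict` built from the same pairs could visit them in either order, which is exactly why relying on element order there was unsafe.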

@ExpandingMan
Collaborator

ExpandingMan commented Nov 1, 2023

Thanks, especially for all the testing. The way I see it, the major issue here is the return of the message string, as commented above. Even if it were not breaking, it's a bizarre interface, and there are lots of other ways of handling this. (See the inline comments above; they are still there.)

@ExpandingMan left a comment

Sorry guys, I just realized after doing the same thing on another PR that nobody could see the comments I wrote. I hate this entire github feature.

@ExpandingMan
Collaborator

This looks good to me, thanks! Let me give it a more thorough look when I get a chance, and hopefully I can merge.

@ExpandingMan
Collaborator

I haven't forgotten this but I haven't gone through it yet... feel free to ping me if I don't get back to it this week.

@ExpandingMan
Collaborator

Looks good to me, thanks for this.

@ExpandingMan ExpandingMan merged commit 4ead83f into dmlc:master Nov 17, 2023
4 of 5 checks passed