Adding Local Adaptive Streaming Tree #1610
Conversation
…e class Docs are also updated
Hi @danielnowakassis, good stuff! I finished a first pass in the code and left some suggestions/comments :D
I really like the idea of using change detectors to decide when to split, rather than making checks at predefined intervals.
My main suggestion is leveraging existing functionalities via class inheritance. I think that if you make your class inherit from tree.HoeffdingTreeClassifier
you could remove a lot of repeated code and only focus on the parts that really change from one type of tree to another.
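The inheritance-based refactor suggested above could look roughly like this. This is a minimal sketch with stand-in classes rather than river's real ones: the actual base class is `tree.HoeffdingTreeClassifier`, and its internal hook names almost certainly differ from the hypothetical `_should_split` used here.

```python
# Sketch of the inheritance idea with stand-in classes (not river's real API).

class HoeffdingTreeClassifier:
    """Stand-in for river's tree.HoeffdingTreeClassifier."""

    def __init__(self, grace_period=200):
        self.grace_period = grace_period

    def _should_split(self, leaf):
        # Base behaviour: check for splits at fixed intervals (grace period).
        return leaf.n_seen % self.grace_period == 0


class LASTClassifier(HoeffdingTreeClassifier):
    """Inherit everything and override only what actually changes:
    the split trigger."""

    def __init__(self, change_detector):
        # grace_period is irrelevant for LAST, so the default is left alone.
        super().__init__()
        self.change_detector = change_detector

    def _should_split(self, leaf):
        # Adaptive behaviour: split when the detector flags a change in the
        # leaf's error stream, rather than at predefined intervals.
        self.change_detector.update(leaf.last_error)
        return self.change_detector.drift_detected
```

The point is that learning, prediction, and tree maintenance all come from the parent class for free; only the split-trigger logic needs to live in the subclass.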
Hi Saulo, thank you for your fast and thorough review. I agree that some parts of the code can be inherited from other classes; I will put some effort into this. LAST replaces the parameters grace period, tau threshold, and confidence level with a change detector and error tracking, so I will leave them as None. I'm glad that you liked my idea. I will send you the pdf.
Hi @danielnowakassis, thanks for the changes already in place!
I left some additional comments in your code and noticed some problems with the tests:
- you need to run the pre-commit actions locally. It seems ruff is modifying some formatting of the code in the CI, which makes the code quality tests fail
- you will also need to implement `current_merit` for the remaining split criteria, as these classes currently cannot be instantiated with an abstract method (please check the test logs for more details).
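The instantiation failure mentioned above comes from Python's `abc` machinery: a class that leaves an abstract method unimplemented cannot be instantiated at all. A generic illustration follows; the class names and the toy merit formula are stand-ins, not river's actual split-criterion API.

```python
from abc import ABC, abstractmethod


class SplitCriterion(ABC):
    """Stand-in for river's split-criterion base class."""

    @abstractmethod
    def current_merit(self, children_stats):
        """Merit of the split currently in place at a node."""


class GiniSplitCriterion(SplitCriterion):
    """Concrete subclass: without current_merit defined,
    GiniSplitCriterion() would raise TypeError."""

    def current_merit(self, children_stats):
        # Toy Gini-style merit: 1 - sum of squared class proportions,
        # averaged over the children (illustrative only).
        merits = []
        for stats in children_stats:
            total = sum(stats.values())
            merits.append(1.0 - sum((n / total) ** 2 for n in stats.values()))
        return sum(merits) / len(merits)
```

Any subclass that skips the override stays abstract, which is exactly what the failing tests are reporting.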
Also, please do not forget to add an entry in the release notes (docs/releases/unreleased.md) with your contributions (including the changes in arff file handling).
Thank you so much for the contribution, @danielnowakassis. I really like the idea behind LAST and I am eager to see the next steps of this theoretical framework.
As a note for posterity: @danielnowakassis and I have identified a possible way to improve the tree module via some refactoring, much akin to what was done to the tree splitters a long time ago. The impact of these changes is yet to be measured. An issue will be opened to track this effort.
Thank you, @smastelini. We did a great job :)
Hi.
I've implemented my decision tree in River and run all the tests for adding a new estimator.
The published paper is here: https://dl.acm.org/doi/10.1145/3605098.3635899
We proposed an adaptive splitting mechanism: a change detector monitors either the error rate or the purity of the data distribution (chosen via a parameter) at each leaf node to decide when to split (provided the best split's merit is > 0). It achieved good results against state-of-the-art decision trees, and I have also tested it on real datasets in River.
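The split trigger described above can be sketched in a few lines. The moving-average detector below is a toy placeholder I made up for illustration; the actual approach would plug in one of river's change detectors (e.g. ADWIN), and the function name `maybe_split` is hypothetical.

```python
# Sketch of LAST's adaptive split trigger under simplified, assumed interfaces:
# a change detector watches the leaf's error stream, and a split is attempted
# only when the detector fires and the best candidate split has positive merit.

class SimpleDetector:
    """Toy detector: flags a change when the recent mean error drifts away
    from the long-run mean (stand-in for e.g. river's ADWIN)."""

    def __init__(self, window=20, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.history = []

    def update(self, error):
        self.history.append(error)
        if len(self.history) < 2 * self.window:
            return False  # not enough evidence yet
        recent = sum(self.history[-self.window:]) / self.window
        overall = sum(self.history) / len(self.history)
        return abs(recent - overall) > self.threshold


def maybe_split(leaf_errors, best_merit, detector):
    """Attempt a split only when the detector fires AND merit > 0."""
    drift = False
    for err in leaf_errors:
        drift = detector.update(err)
    return drift and best_merit > 0
```

The contrast with Hoeffding trees is that nothing happens at fixed instance counts: a stable leaf is never re-evaluated, while a leaf whose error behaviour shifts triggers a split check immediately.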
Accuracy: (results figure from the original comment)
I'm open to any questions. LAST is the variant that monitors the error rate; LAST_D is the variant that monitors the purity of the data distribution.