Adding Local Adaptive Streaming Tree #1610
Conversation
…e class Docs are also updated
Hi @danielnowakassis, good stuff! I finished a first pass in the code and left some suggestions/comments :D
I really like the idea of using change detectors to decide when to split, rather than making checks at predefined intervals.
My main suggestion is leveraging existing functionalities via class inheritance. I think that if you make your class inherit from tree.HoeffdingTreeClassifier
you could remove a lot of repeated code and only focus on the parts that really change from one type of tree to another.
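The inheritance-based refactor suggested above could look roughly like this. This is a minimal sketch with stand-in classes rather than river's real ones: the actual base class is `tree.HoeffdingTreeClassifier`, and its internal hook names almost certainly differ from the hypothetical `_should_split` used here.

```python
# Sketch of the inheritance idea with stand-in classes (not river's real API).

class HoeffdingTreeClassifier:
    """Stand-in for river's tree.HoeffdingTreeClassifier."""

    def __init__(self, grace_period=200):
        self.grace_period = grace_period

    def _should_split(self, leaf):
        # Base behaviour: check for splits at fixed intervals (grace period).
        return leaf.n_seen % self.grace_period == 0


class LASTClassifier(HoeffdingTreeClassifier):
    """Inherit everything and override only what actually changes:
    the split trigger."""

    def __init__(self, change_detector):
        # grace_period is irrelevant for LAST, so the default is left alone.
        super().__init__()
        self.change_detector = change_detector

    def _should_split(self, leaf):
        # Adaptive behaviour: split when the detector flags a change in the
        # leaf's error stream, rather than at predefined intervals.
        self.change_detector.update(leaf.last_error)
        return self.change_detector.drift_detected
```

The point is that learning, prediction, and tree maintenance all come from the parent class for free; only the split-trigger logic needs to live in the subclass.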
Hi Saulo, thank you for your fast and thorough review. I agree that some parts of the code can be inherited from other classes; I will put some effort into this. LAST replaces the parameters grace period, tau threshold, and confidence level with a change detector and error tracking, so I will leave them as None. I'm glad that you liked my idea. I will send you the pdf.
Hi @danielnowakassis, thanks for the changes already in place!
I left some additional comments in your code and noticed some problems with the tests:
- you need to run the pre-commit actions locally. It seems ruff is modifying some formatting of the code in the CI, which makes the code quality tests fail
- you will also need to implement `current_merit` for the remaining split criteria, as these classes currently cannot be instantiated with an abstract method (please check the test logs for more details).
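The instantiation failure mentioned above comes from Python's `abc` machinery: a class that leaves an abstract method unimplemented cannot be instantiated at all. A generic illustration follows; the class names and the toy merit formula are stand-ins, not river's actual split-criterion API.

```python
from abc import ABC, abstractmethod


class SplitCriterion(ABC):
    """Stand-in for river's split-criterion base class."""

    @abstractmethod
    def current_merit(self, children_stats):
        """Merit of the split currently in place at a node."""


class GiniSplitCriterion(SplitCriterion):
    """Concrete subclass: without current_merit defined,
    GiniSplitCriterion() would raise TypeError."""

    def current_merit(self, children_stats):
        # Toy Gini-style merit: 1 - sum of squared class proportions,
        # averaged over the children (illustrative only).
        merits = []
        for stats in children_stats:
            total = sum(stats.values())
            merits.append(1.0 - sum((n / total) ** 2 for n in stats.values()))
        return sum(merits) / len(merits)
```

Any subclass that skips the override stays abstract, which is exactly what the failing tests are reporting.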
Also, please do not forget to add an entry in the release notes (docs/releases/unreleased.md) with your contributions (including the changes in arff file handling).
Thank you so much for the contribution, @danielnowakassis. I really like the idea behind LAST and I am eager to see the next steps of this theoretical framework.
As a note for posterity: @danielnowakassis and I have identified a possible way to improve the tree module via some refactoring, much akin to what was done to the tree splitters a long time ago. The impact of these changes is yet to be measured. An issue will be opened to track this effort.
Thank you, @smastelini. We did a great job :)
Hi.
I've implemented my decision tree in River and run all the tests for adding a new estimator.
The published paper is here: https://dl.acm.org/doi/10.1145/3605098.3635899
We proposed an adaptive splitting mechanism: a change detector monitors either the error rate or the purity of the data distribution (chosen via a parameter) at each leaf node to decide when to split (provided the best split's merit is > 0). It achieved good results against state-of-the-art decision trees, and I have also tested it on real datasets in River.
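The split trigger described above can be sketched in a few lines. The moving-average detector below is a toy placeholder I made up for illustration; the actual approach would plug in one of river's change detectors (e.g. ADWIN), and the function name `maybe_split` is hypothetical.

```python
# Sketch of LAST's adaptive split trigger under simplified, assumed interfaces:
# a change detector watches the leaf's error stream, and a split is attempted
# only when the detector fires and the best candidate split has positive merit.

class SimpleDetector:
    """Toy detector: flags a change when the recent mean error drifts away
    from the long-run mean (stand-in for e.g. river's ADWIN)."""

    def __init__(self, window=20, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.history = []

    def update(self, error):
        self.history.append(error)
        if len(self.history) < 2 * self.window:
            return False  # not enough evidence yet
        recent = sum(self.history[-self.window:]) / self.window
        overall = sum(self.history) / len(self.history)
        return abs(recent - overall) > self.threshold


def maybe_split(leaf_errors, best_merit, detector):
    """Attempt a split only when the detector fires AND merit > 0."""
    drift = False
    for err in leaf_errors:
        drift = detector.update(err)
    return drift and best_merit > 0
```

The contrast with Hoeffding trees is that nothing happens at fixed instance counts: a stable leaf is never re-evaluated, while a leaf whose error behaviour shifts triggers a split check immediately.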
Accuracy: (results figure from the original comment)
I'm open to any questions. LAST is the variant that monitors the error rate; LAST_D is the variant that monitors the purity of the data distribution.