
ilof pietro #1317

Merged
merged 43 commits into from
Sep 11, 2023

Conversation

pietro-tanure
Contributor

…into ilof-pietro


@pietro-tanure
Contributor Author

pietro-tanure commented Aug 24, 2023

@hoanganhngo610 This is the same code as in the previous pull request, which I accidentally closed (#1232 (comment)) while trying to sync my branch with the current state of river. This branch is now up to date and compatible with the latest river commit.

@hoanganhngo610
Contributor

Thank you very much @pietro-tanure. I can see this is an editable PR, so wherever necessary I will make changes to the code directly, or leave comments for further discussion.
I will be able to dig deeper into this starting next week.

@hoanganhngo610 hoanganhngo610 self-requested a review August 25, 2023 04:44
@MaxHalford
Member

@hoanganhngo610 thanks for taking care of this. Is it going well?

I don't think we want to merge the notebook. What would be cool is to turn the notebook into a unit test, to show that the results are consistent with scikit-learn.

Regarding the name, I would call it LocalOutlierFactor, to be coherent with scikit-learn. Indeed, both implementations provide the same results. Also, I don't think many people can be expected to know what the acronym LOF corresponds to.

@hoanganhngo610
Contributor

hoanganhngo610 commented Sep 2, 2023

@MaxHalford Thank you so much for the comments!
First of all, it's going quite well IMO. What I'm actually doing is refactoring the code and modifying certain points in the algorithm to make sure that the output it generates aligns with our standards and with what we have done for previous algorithms. Apart from that, this is a really decent implementation of the online LOF.

Moreover, regarding the name, if you prefer to go with the full name of the LOF, i.e. LocalOutlierFactor, I would suggest going with IncrementalLocalOutlierFactor, to emphasize the fact that the algorithm implemented within River is actually the incremental version of the original LOF!

@MaxHalford
Member

Every algorithm in River is incremental, so I don't think there's a need to prepend the name with Incremental :)

@hoanganhngo610
Contributor

hoanganhngo610 commented Sep 3, 2023

Yep got it! In this case I will change the name to LocalOutlierFactor in future commits.

@hoanganhngo610
Contributor

@pietro-tanure @MaxHalford After going through the code, I am suggesting that we can have two functions, one for one instance and one for multiple instances, for the learning phase (learn_one and learn_many) and for the scoring phase (score_one and score_many). Do you think that is a good option?

@pietro-tanure
Contributor Author


@hoanganhngo610 Thanks a lot for what you have been doing. I think that's a good idea; the original code had learn_one and learn_many functions. It also had a score_one function with a window_score argument representing the size of the batch to score, so it could score many points at once as well, but a separate score_many function might be better for clarity.

@MaxHalford
Member

Hey there! Great work to both of you.

I'm aligned with learn_many. However, I would only add score_many if it's not a for loop over score_one.
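A minimal sketch of the distinction Max is drawing, using entirely hypothetical class and method names (this is not River's actual base class): the `*_many` methods below default to plain loops over their one-sample counterparts, so a subclass would only be worth overriding `learn_many`/`score_many` on when a genuinely batched implementation exists.

```python
# Hypothetical sketch: *_many methods that fall back to looping over
# the *_one methods. A concrete detector overrides them only when it
# has a truly vectorized version; otherwise the default loop suffices.

class BaseDetector:
    def learn_one(self, x: dict) -> None:
        raise NotImplementedError

    def score_one(self, x: dict) -> float:
        raise NotImplementedError

    def learn_many(self, X: list) -> None:
        # Default: plain loop; override for a real batched implementation.
        for x in X:
            self.learn_one(x)

    def score_many(self, X: list) -> list:
        # Default: plain loop over score_one.
        return [self.score_one(x) for x in X]


class MeanDistanceDetector(BaseDetector):
    """Toy detector: score = |value - running mean| of feature 'v'."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def learn_one(self, x: dict) -> None:
        self.n += 1
        self.total += x["v"]

    def score_one(self, x: dict) -> float:
        mean = self.total / self.n if self.n else 0.0
        return abs(x["v"] - mean)


model = MeanDistanceDetector()
model.learn_many([{"v": 1.0}, {"v": 3.0}])
print(model.score_one({"v": 2.0}))  # running mean is 2.0 -> score 0.0
```

Under this design, a `score_many` that is just the inherited loop adds no value over calling `score_one` repeatedly, which is the point being made above.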

@hoanganhngo610
Contributor

Thank you so much @pietro-tanure @MaxHalford for the comments! In that case, I will bring back learn_many so it works with data frames as input, while adding an if/else within score_one so it adapts to the input it is given (a dict for score_one, a pandas DataFrame for score_many). Does that sound like a good plan?
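The type-based dispatch described above could look something like the following sketch (hypothetical names; a list of dicts stands in for a DataFrame's rows here to avoid a pandas dependency):

```python
class ThresholdModel:
    """Toy scorer: the anomaly score is simply the feature value."""

    def score_one(self, x: dict) -> float:
        return float(x["v"])


def score(model, x):
    # Dispatch on input type: a single dict -> one score,
    # an iterable of dicts (a DataFrame stand-in) -> a list of scores.
    if isinstance(x, dict):
        return model.score_one(x)
    return [model.score_one(row) for row in x]


m = ThresholdModel()
print(score(m, {"v": 1.5}))                # 1.5
print(score(m, [{"v": 1.0}, {"v": 2.0}]))  # [1.0, 2.0]
```

With a real DataFrame, the batch branch would iterate over rows (e.g. converting each row to a dict) before scoring; the single-sample path stays unchanged.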

@hoanganhngo610
Contributor

@pietro-tanure @MaxHalford I think I have gone extensively through this PR. Taking into account the suggestion of Max, I have removed the notebook from the PR and replaced it with a test file. However, @pietro-tanure, please rest assured that the notebook is fascinating, and I am quite sure that I will be able to demonstrate it elsewhere (will update you on that later).
In the meantime, I think this PR is ready to be merged. The only remaining problem is the tests that keep failing, especially the ubuntu river build and the code quality checks. @MaxHalford, on my side the code quality tests all pass locally, but somehow one error remains in the tests running on GitHub, in the test_glm file.

@MaxHalford
Member

Thanks @hoanganhngo610! I can take over from here and take care of the tests. Congrats to both of you on the good work :)

@MaxHalford MaxHalford merged commit 11f6cf9 into online-ml:main Sep 11, 2023
6 of 8 checks passed
@pietro-tanure
Contributor Author

Thank you very much @hoanganhngo610 for all the work, I'm very happy with the result of the code. Thank you @MaxHalford as well for your readiness to help.

@MaxHalford
Member

@hoanganhngo610 @pietro-tanure FYI the tests were failing because LOF needs to be warm-started to function correctly. Indeed, if you run it with a simple progressive validation loop, it breaks:

from river import anomaly, utils, datasets

model = anomaly.LocalOutlierFactor()
dataset = datasets.Phishing()

for x, y in dataset:
    model.learn_one(x)
    model.score_one(x)

This isn't ideal, so for now I've disabled the tests. I'd love to see this fixed at some point.

@hoanganhngo610
Contributor

@MaxHalford I also noticed this when testing the algorithm locally, and that's also why I designed the test file so that it learns a certain amount of data points before actually doing the scoring. I know that this would not be ideal, but given the nature of the algorithm, I believe it is somewhat acceptable. However, we can try to fix this in the future.
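The warm-up pattern described above can be sketched generically. This is a toy model on a synthetic stream, not River's API: the detector refuses to score before it has seen any data, so a burn-in phase learns a fixed number of points before the usual score-then-learn loop begins.

```python
import itertools
import random


class RunningMeanDetector:
    """Toy detector: score = |x - running mean|; undefined before any data."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def learn_one(self, x: float) -> None:
        self.n += 1
        self.total += x

    def score_one(self, x: float) -> float:
        if self.n == 0:
            raise RuntimeError("model must be warm-started before scoring")
        return abs(x - self.total / self.n)


random.seed(42)
stream = (random.gauss(0.0, 1.0) for _ in range(200))
model = RunningMeanDetector()

WARMUP = 50
for x in itertools.islice(stream, WARMUP):  # burn-in: learn only, no scoring
    model.learn_one(x)

scores = []
for x in stream:                 # then progressive validation as usual
    scores.append(model.score_one(x))  # score before...
    model.learn_one(x)                 # ...learning on the same point
print(len(scores))  # 150
```

The same shape applies to LOF: pick a warm-up size, feed that many points through learn_one only, and only then start interleaving scoring with learning.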
