Implementation of the KNN-ICAD (KNN Inductive Conformal Anomaly Detection Algorithm) #1441

hoanganhngo610 · 2023-11-06T08:05:46Z

KNN-ICAD is a conformalized density- and distance-based anomaly detection algorithm for one-dimensional time-series data. It combines a feature extraction method, a score that assesses whether a new observation differs significantly from previously observed data, and a probabilistic interpretation of this score based on the conformal paradigm.
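To make the "probabilistic interpretation" step concrete, here is a minimal sketch of how a non-conformity score can be turned into a conformal p-value against a set of calibration scores. This is a generic illustration, not the code from this PR: the function names are hypothetical, and the plain count/n form is an assumption (some formulations use (count + 1) / (n + 1) instead).

```python
def conformal_p_value(calibration_scores, new_score):
    """Fraction of calibration scores that are at least as large as new_score.

    A small p-value means the new observation looks stranger than
    almost everything seen during calibration.
    """
    if not calibration_scores:
        raise ValueError("need at least one calibration score")
    at_least = sum(1 for s in calibration_scores if s >= new_score)
    return at_least / len(calibration_scores)


def anomaly_score(calibration_scores, new_score):
    """Assumed convention: anomaly score = 1 - p-value, so higher is more anomalous."""
    return 1.0 - conformal_p_value(calibration_scores, new_score)


calibration = [0.2, 0.5, 0.1, 0.4]
print(anomaly_score(calibration, 0.45))  # only 0.5 is >= 0.45, so p = 0.25
```

In an inductive setting the calibration scores would come from a held-out slice of the stream, scored against the proper training set.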

This implementation is adapted from the implementation within PySAD (Python Streaming Anomaly Detection) and NAB (Numenta Anomaly Benchmark).

This implementation relies heavily on numpy for the following reasons:

- The computational cost of handling 2-dimensional data (for example, the proper training matrix and the calibration matrix).
- The fact that River currently does not support a utility to calculate the inverse matrix, which is crucial in the calculation of the $\sigma$ matrix used in the calculation of the Non-Conformity Measure.
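For context on why a matrix inverse shows up at all, here is a rough numpy sketch of a KNN-style non-conformity measure that uses an inverted matrix as a Mahalanobis-like metric. It is an illustration under assumptions, not the PR's code: `delay_embed` and `knn_ncm` are hypothetical names, the scatter matrix `train.T @ train` is an assumed form for $\sigma$, and the pseudo-inverse is used only to avoid failing on a singular matrix.

```python
import numpy as np


def delay_embed(series, dim):
    """Turn a 1-D series into overlapping windows of length dim (one per row)."""
    series = np.asarray(series, dtype=float)
    return np.array([series[i:i + dim] for i in range(len(series) - dim + 1)])


def knn_ncm(train, inv_sigma, item, k):
    """Sum of the k smallest squared Mahalanobis-like distances from item to train rows."""
    diff = train - item  # shape (n_rows, dim)
    # Row-wise quadratic form diff_i^T @ inv_sigma @ diff_i
    dists = np.einsum("ij,jk,ik->i", diff, inv_sigma, diff)
    return float(np.sum(np.sort(dists)[:k]))


series = np.sin(np.linspace(0, 20, 200))
train = delay_embed(series, dim=2)   # the "proper training matrix" in this sketch
sigma = train.T @ train              # assumed form of the matrix to invert
inv_sigma = np.linalg.pinv(sigma)    # pseudo-inverse tolerates a singular sigma

typical = knn_ncm(train, inv_sigma, train[10], k=5)
outlier = knn_ncm(train, inv_sigma, np.array([5.0, -5.0]), k=5)
print(typical < outlier)
```

The scores produced this way would then be compared against calibration scores to obtain a p-value.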


MaxHalford · 2023-11-06T08:09:32Z

> The fact that River currently does not support a utility to calculate the inverse matrix, which is crucial in the calculation of the $\sigma$ matrix used in the calculation of the Non-Conformity Measure.

That's not true, we have EmpiricalPrecision :)

I'd really rather not add numpy-based implementations to River. It's just not our philosophy. This could go in river-extra though. In this case in particular, PySAD's implementation already supports mini-batches, so I don't see a lot of value in adding this to River.

@hoanganhngo610 I encourage you to ask before implementing stuff. I don't want you to work and spend time polishing methods that might not be suited to River.

hoanganhngo610 · 2023-11-06T08:34:41Z

@MaxHalford I initially thought that EmpiricalPrecision is the inverse matrix of the covariance matrix, which is not a direct approach to calculate the inverse of any static matrix. That I would say was the primary reason to implement this algorithm into River.

Although PySAD has had this within its ecosystem, I really want to have one in our ecosystem to conduct any anomaly benchmarking, particularly when I was intending to implement more algorithms to the anomaly submodule. If possible, I would hope that KNNICAD can be brought to river-extra, if you find it OK to do so!

MaxHalford · 2023-11-06T10:06:24Z

> I initially thought that EmpiricalPrecision is the inverse matrix of the covariance matrix, which is not a direct approach to calculate the inverse of any static matrix. That I would say was the primary reason to implement this algorithm into River.

You're right, it's just the inverse covariance matrix. But it could be generalized.

> Although PySAD has had this within its ecosystem, I really want to have one in our ecosystem to conduct any anomaly benchmarking, particularly when I was intending to implement more algorithms to the anomaly submodule. If possible, I would hope that KNNICAD can be brought to river-extra, if you find it OK to do so!

If the goal is to make a benchmark, what I would do is create a separate repository to do those benchmarks. You could create a wrapper in there to unify PySAD and River on the same API. I don't think porting their code to us just for the sake of benchmarking is a good idea. Every model we add to River is an added model we need to maintain. We already have a lot to maintain. For instance, we have unresolved issues for clustering methods that have been open for months.
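As a rough idea of what such a wrapper could look like, here is a minimal sketch of an adapter that exposes a PySAD-style model through River-style `learn_one` / `score_one` calls. The adapter class is hypothetical and belongs to neither library. It assumes the wrapped model offers `fit_partial(X)` and `score_partial(X)` on a single feature vector; the toy `RunningMeanDistance` model exists only to exercise the adapter without installing anything.

```python
from typing import Callable, Dict, Sequence


class PySADStyleAdapter:
    """Hypothetical adapter: River-style dict inputs -> PySAD-style vector inputs."""

    def __init__(self, model, features: Sequence[str],
                 to_vector: Callable[[list], object] = list):
        self.model = model
        self.features = list(features)  # fixed feature order for dict -> vector
        self.to_vector = to_vector      # e.g. numpy.asarray for an array-based model

    def _vectorize(self, x: Dict[str, float]):
        return self.to_vector([x[name] for name in self.features])

    def learn_one(self, x):
        self.model.fit_partial(self._vectorize(x))
        return self

    def score_one(self, x) -> float:
        return float(self.model.score_partial(self._vectorize(x)))


class RunningMeanDistance:
    """Toy stand-in with a PySAD-like interface (not a real PySAD model)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def fit_partial(self, X, y=None):
        self.n += 1
        self.mean += (X[0] - self.mean) / self.n  # incremental mean update
        return self

    def score_partial(self, X):
        return abs(X[0] - self.mean)  # distance to the running mean


detector = PySADStyleAdapter(RunningMeanDistance(), features=["value"])
for v in [1.0, 1.0, 1.0]:
    detector.learn_one({"value": v})
print(detector.score_one({"value": 4.0}))
```

With something like this living in the benchmark repository, the same evaluation loop could drive models from both libraries.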

Anyway, I think if PySAD is an active project, there isn't any justification to add their stuff to River, especially if it implies using NumPy. I would do it the other way round: make some benchmarks; if a model from PySAD really stands out, then yes maybe let's add it to River. But let's please not add stuff to River for the sake of it.

I'm sorry to be contrarian here, but I need to make sure we don't add too much stuff to River. Our users don't care too much if we have a lot of models. They just want a few models that work well.

hoanganhngo610 · 2023-11-06T10:22:25Z

@MaxHalford I totally understand your point. In that case, I will close the PR at the moment!

MaxHalford · 2023-11-06T10:28:29Z

Thanks a lot for your understanding, Hoang.

hoanganhngo610 added 10 commits November 6, 2023 07:49

Add the initial implementation of KNN-ICAD.

6cd838f

Add description for KNN-ICAD and modify tests.

f5584fc

Add descriptions for parameter within the algorithm.

7caef60

Rename to KNNInductiveCAD.

232860f

Update self.buffer directly instead of having to pass through self.new_item.

65f8afe

Refactor.

fc1ce50

Add description.

1ddfc93

Refactor code with pre-commit hook.

af6db0e

Refactor.

5cc26c4

Update knn_icad.py with pre-commit hook.

cf4af67

hoanganhngo610 requested review from MaxHalford and smastelini as code owners November 6, 2023 08:05

hoanganhngo610 closed this Nov 6, 2023

hoanganhngo610 deleted the knncad-implementation branch November 6, 2023 13:06
