Implementation of the KNN-ICAD (KNN Inductive Conformal Anomaly Detection Algorithm) #1441

Closed
wants to merge 10 commits into from

Conversation

hoanganhngo610
Contributor

KNN-ICAD is a conformalized density- and distance-based anomaly detection algorithm for one-dimensional time-series data. It combines a feature extraction method, a score that assesses whether a new observation differs significantly from previously observed data, and a probabilistic interpretation of this score based on the conformal paradigm.

This implementation is adapted from the implementation within PySAD (Python Streaming Anomaly Detection) and NAB (Numenta Anomaly Benchmark).

This implementation relies heavily on NumPy for the following reasons:

  • The computational cost of handling 2-dimensional data (for example, the proper training matrix and the calibration matrix).
  • River currently does not provide a utility for inverting a matrix, which is crucial for computing the $\sigma$ matrix used in the Non-Conformity Measure.
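The three ingredients described above (feature extraction, a non-conformity score, and a conformal interpretation of that score) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: it assumes plain Euclidean distance in place of the $\sigma$-based distance, and the function names, toy data, and parameter values are all hypothetical.

```python
import math


def embed(series, dim):
    """Feature extraction: overlapping windows of length `dim` from a 1-D series."""
    return [series[i:i + dim] for i in range(len(series) - dim + 1)]


def knn_ncm(point, reference, k):
    """Non-conformity measure: sum of distances to the k nearest reference windows.

    Assumption: Euclidean distance, chosen only to keep the sketch free of
    matrix inversion. The PR itself relies on a sigma-matrix-based distance.
    """
    distances = sorted(math.dist(point, ref) for ref in reference)
    return sum(distances[:k])


def conformal_anomaly_score(new_ncm, calibration_ncms):
    """Fraction of calibration scores strictly below the new score."""
    below = sum(1 for s in calibration_ncms if s < new_ncm)
    return below / len(calibration_ncms)


# Hypothetical toy data: a regular alternating signal.
series = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1]
windows = embed(series, dim=2)

# Split into a proper training set and a calibration set.
train, calibration = windows[:6], windows[6:]
calibration_ncms = [knn_ncm(w, train, k=2) for w in calibration]

# Score one window that matches the pattern and one that contains a spike.
normal_score = conformal_anomaly_score(knn_ncm([0.1, 0.0], train, k=2), calibration_ncms)
spike_score = conformal_anomaly_score(knn_ncm([0.1, 5.0], train, k=2), calibration_ncms)
```

A score near 1 means the new window is less conforming than almost every calibration window; a score near 0 means it looks like the data seen so far.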

@MaxHalford
Member

MaxHalford commented Nov 6, 2023

River currently does not provide a utility for inverting a matrix, which is crucial for computing the $\sigma$ matrix used in the Non-Conformity Measure.

That's not true, we have EmpiricalPrecision :)

I'd really rather not add NumPy-based implementations to River. It's just not our philosophy. This could go in river-extra though. In this case in particular, PySAD's implementation already supports mini-batches, so I don't see a lot of value in adding this to River.

@hoanganhngo610 I encourage you to ask before implementing stuff. I don't want you to work and spend time polishing methods that might not be suited to River.

@hoanganhngo610
Contributor Author

@MaxHalford I initially thought that EmpiricalPrecision is the inverse of the covariance matrix, which is not a direct way to compute the inverse of an arbitrary static matrix. That, I would say, was the primary reason for implementing this algorithm in River.

Although PySAD already has this in its ecosystem, I would really like to have one in ours for anomaly detection benchmarking, particularly since I intend to implement more algorithms in the anomaly submodule. If possible, I hope that KNNICAD can be brought to river-extra, if you find it OK to do so!

@MaxHalford
Member

I initially thought that EmpiricalPrecision is the inverse of the covariance matrix, which is not a direct way to compute the inverse of an arbitrary static matrix. That, I would say, was the primary reason for implementing this algorithm in River.

You're right, it's just the inverse covariance matrix. But it could be generalized.

Although PySAD already has this in its ecosystem, I would really like to have one in ours for anomaly detection benchmarking, particularly since I intend to implement more algorithms in the anomaly submodule. If possible, I hope that KNNICAD can be brought to river-extra, if you find it OK to do so!

If the goal is to make a benchmark, what I would do is create a separate repository for those benchmarks. You could create a wrapper in there to unify PySAD and River under the same API. I don't think porting their code to River just for the sake of benchmarking is a good idea. Every model we add to River is one more model we need to maintain. We already have a lot to maintain. For instance, we have unresolved issues for clustering methods that have been open for months.
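As a rough illustration of the wrapper idea, something like the sketch below might work. It assumes the wrapped model exposes PySAD-style `fit_partial` / `score_partial` methods that take a single feature vector, and that the benchmark calls River-style `learn_one` / `score_one` with a dict of features. The class names and the stand-in detector are hypothetical, and a real adapter would probably need to convert inputs to NumPy arrays.

```python
class MiniBatchToOnlineAdapter:
    """Expose a `fit_partial` / `score_partial` model through `learn_one` / `score_one`.

    Assumption: features arrive as a dict and the wrapped model wants a
    fixed-order sequence of values.
    """

    def __init__(self, model, feature_order):
        self.model = model
        self.feature_order = feature_order

    def _to_vector(self, x):
        # Fix the feature order so every call sees the same layout.
        return [x[name] for name in self.feature_order]

    def learn_one(self, x):
        self.model.fit_partial(self._to_vector(x))

    def score_one(self, x):
        return float(self.model.score_partial(self._to_vector(x)))


class RunningMeanDetector:
    """Hypothetical stand-in for a third-party detector, used only for this demo."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def fit_partial(self, X):
        # Incremental mean of the first feature.
        self.n += 1
        self.mean += (X[0] - self.mean) / self.n
        return self

    def score_partial(self, X):
        # Absolute deviation from the running mean.
        return abs(X[0] - self.mean)


detector = MiniBatchToOnlineAdapter(RunningMeanDetector(), feature_order=["value"])
for v in [1.0, 1.0, 1.0]:
    detector.learn_one({"value": v})
score = detector.score_one({"value": 4.0})
```

With an adapter like this, the benchmark loop only has to know about one interface, and no model code needs to move between projects.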

Anyway, I think if PySAD is an active project, there isn't any justification to add their stuff to River, especially if it implies using NumPy. I would do it the other way round: make some benchmarks; if a model from PySAD really stands out, then yes maybe let's add it to River. But let's please not add stuff to River for the sake of it.

I'm sorry to be contrarian here, but I need to make sure we don't add too much stuff to River. Our users don't care too much if we have a lot of models. They just want a few models that work well.

@hoanganhngo610
Contributor Author

@MaxHalford I totally understand your point. In that case, I will close the PR for now!

@MaxHalford
Member

Thanks a lot for your understanding, Hoang.

@hoanganhngo610 hoanganhngo610 deleted the knncad-implementation branch November 6, 2023 13:06