Hey there Sebastian. I think that's a really good topic. It's a solid idea: if you have a good model, then comparing its prediction to the ground truth is a good way to detect an anomaly. Then again, a ground truth isn't always available. But you could pick any feature and use it as the label. So yes, it's a solid approach to explore, especially online. I would separate the scoring from the thresholding: a good anomaly detector is not necessarily one that outputs 0/1 correctly. It outputs a score, and thresholding that score into a 0/1 is a separate concern. That's how all of River's anomaly detectors operate. See what I mean?
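To make the scoring/thresholding split concrete, here is a minimal sketch in plain Python. The class names `AbsoluteErrorScorer` and `FixedThreshold` are hypothetical and are not part of River; only the `score_one` naming is borrowed from River's convention. The scorer turns a prediction error into a continuous score, and a separate object decides where to cut.

```python
class AbsoluteErrorScorer:
    """Hypothetical scorer: the anomaly score is |prediction - truth|."""

    def score_one(self, y_pred, y_true):
        return abs(y_pred - y_true)


class FixedThreshold:
    """Hypothetical thresholder: turns a score into a True/False label."""

    def __init__(self, limit):
        self.limit = limit

    def classify(self, score):
        return score > self.limit


scorer = AbsoluteErrorScorer()

# (prediction, ground truth) pairs; the last one is far off.
pairs = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 9.0)]
scores = [scorer.score_one(y_pred, y_true) for y_pred, y_true in pairs]

# The same scores give different labels depending on the cut-off.
strict = [FixedThreshold(0.15).classify(s) for s in scores]
lenient = [FixedThreshold(1.0).classify(s) for s in scores]

print(strict)   # [False, False, True, True]
print(lenient)  # [False, False, False, True]
```

The scores are identical in both runs; only the labels change. That is why the quality of the score and the choice of threshold can be evaluated separately.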
-
Hey everyone,
I've been working on this idea for my bachelor's thesis for a while now, and I figured it might be interesting to add this kind of anomaly detection algorithm to River.
The idea is to use some sort of online estimator to learn the "normal behavior" of data (in the use case I am researching, this is always time series data). This "reconstruction model" of the data is then used to forecast/predict a target based on some input features, while the actual ground truth is known.
It then calculates the error between the prediction and the ground truth, while keeping track of the mean and standard deviation of all the errors it has made. When the estimator is good at "reconstructing" the data, that error should have a consistently low mean and a small standard deviation. But when there is an anomaly in the data, the prediction will be further from the ground truth, resulting in a larger error. With the squared error, the gap between the estimator's mean error and such a large error should be fairly easy to detect. My first approach was a simple threshold: when an error is larger than the mean plus 3 times the standard deviation, the example is classified as anomalous. I also have some other ideas, but this is the basis.
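A rough sketch of that basis in plain Python, under a few assumptions: the running mean and standard deviation are tracked with Welford's algorithm, the wrapped estimator only needs `predict_one` and `learn_one` methods, and `PersistenceModel` is a hypothetical stand-in that predicts the lagged value it is given. The names `RunningStats`, `ReconstructionDetector`, `n_std`, and `warm_up` are made up for illustration.

```python
import math


class RunningStats:
    """Welford's online algorithm for the mean and standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


class PersistenceModel:
    """Hypothetical stand-in estimator: predicts the lagged value."""

    def predict_one(self, x):
        return x["lag_1"]

    def learn_one(self, x, y):
        pass  # nothing to learn for a persistence forecast


class ReconstructionDetector:
    def __init__(self, model, n_std=3.0, warm_up=10):
        self.model = model
        self.n_std = n_std
        self.warm_up = warm_up  # errors to collect before flagging anything
        self.errors = RunningStats()

    def _squared_error(self, x, y):
        return (y - self.model.predict_one(x)) ** 2

    def is_anomaly(self, x, y):
        limit = self.errors.mean + self.n_std * self.errors.std
        return self.errors.n >= self.warm_up and self._squared_error(x, y) > limit

    def learn_one(self, x, y):
        # Record the error before the model sees the sample.
        self.errors.update(self._squared_error(x, y))
        self.model.learn_one(x, y)


series = [10.0 + math.sin(i) for i in range(60)]
series[40] = 25.0  # injected anomaly

detector = ReconstructionDetector(PersistenceModel())
flagged = []
for t in range(1, len(series)):
    x, y = {"lag_1": series[t - 1]}, series[t]
    if detector.is_anomaly(x, y):
        flagged.append(t)
    detector.learn_one(x, y)
```

With this stand-in model the spike at index 40 is flagged, and so is the step right after it, because that step's lag feature is the spike itself. Note also that every error, anomalous or not, is fed into the running statistics, as described above. A single large error therefore inflates the mean and standard deviation and makes the threshold less sensitive for a while.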
I would highly appreciate any input on this topic since this will be the basis of a large part of my bachelor's thesis. I hope that you guys find that this is a nice idea to add to the library.