how to scale rrcf to detect thousands of time series #59

Open
zcsh opened this issue Jul 9, 2019 · 1 comment

Comments

zcsh commented Jul 9, 2019

Dear mdb,
I have tried using rrcf to detect anomalies in a single time series and found it efficient for this task. But how can I build an application with rrcf that monitors several thousand time series, all arriving synchronously every minute, with reasonable CPU and memory consumption? Do you have any suggestions?

@mdbartos
Member

Here are a few suggestions:

  • Instead of shingling, I would recommend computing summary statistics that capture the type of anomaly you are looking for. This reduces the dimension of the points you insert into each tree, which in turn improves performance. For example, if you are looking for spikes, you may want your points to consist of second central differences. If you are looking for long-term trends, you may want your points to consist of rolling means at different window sizes, etc.

  • If data is arriving too quickly to be inserted, you can compute a rolling summary statistic over some buffered input (mean, median, max, etc., or some combination of these). This reduces the number of points that need to be inserted.

  • Use parallelization, as discussed in this thread: insert_point is slow #28

  • If the time series are independent, you could run different rrcf instances for each time series, using separate processes or server instances.
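The first suggestion (summary statistics instead of shingles) can be sketched in plain Python as below. This is a minimal illustration, not rrcf code: the class name `FeatureExtractor` and the default window sizes `(5, 15, 60)` are hypothetical choices. Each new value produces one small vector (a second central difference for spikes, plus rolling means for trends), and memory per series is bounded by the largest window.

```python
from collections import deque
from statistics import fmean


def second_central_difference(window):
    """x[t] - 2*x[t-1] + x[t-2] over the three most recent values; large magnitude suggests a spike."""
    x2, x1, x0 = window[-3], window[-2], window[-1]
    return x0 - 2 * x1 + x2


def rolling_means(window, sizes):
    """Mean of the last n values for each n in sizes; captures trends at several time scales."""
    values = list(window)  # deques do not support slicing, so copy to a list first
    return [fmean(values[-n:]) for n in sizes]


class FeatureExtractor:
    """Hypothetical helper: keeps a bounded history for ONE series and emits a low-dimensional point."""

    def __init__(self, sizes=(5, 15, 60)):  # hypothetical window sizes, tune per use case
        self.sizes = sizes
        self.needed = max(3, *sizes)  # enough history for the largest window and the difference
        self.history = deque(maxlen=self.needed)  # older values are dropped automatically

    def update(self, value):
        """Append one reading; return [2nd difference, mean_1, ..., mean_k] or None while warming up."""
        self.history.append(value)
        if len(self.history) < self.needed:
            return None
        return [second_central_difference(self.history)] + rolling_means(self.history, self.sizes)
```

With the default sizes each point has 4 dimensions regardless of how much history it summarizes; that vector is what you would pass to the tree's `insert_point` in place of a shingle.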
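The buffering suggestion can look like the sketch below, assuming a fixed-length buffer. `BufferedAggregator` and the buffer length of 10 are hypothetical; mean, median and max are the statistics named in the comment above. Only one point is emitted per full buffer, so the number of tree insertions drops by a factor equal to the buffer length.

```python
from statistics import fmean, median


class BufferedAggregator:
    """Hypothetical helper: collects raw readings and emits one summary point per `size` readings."""

    def __init__(self, size=10):  # hypothetical buffer length
        self.size = size
        self.buffer = []

    def add(self, value):
        """Buffer one reading; return [mean, median, max] when the buffer is full, else None."""
        self.buffer.append(value)
        if len(self.buffer) < self.size:
            return None
        batch, self.buffer = self.buffer, []  # start a fresh buffer for the next window
        # Combining several statistics keeps different kinds of anomalies visible.
        return [fmean(batch), median(batch), max(batch)]
```

Using non-overlapping windows (rather than sliding ones) is what actually reduces the insertion rate; a sliding window would still produce one point per reading.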
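For independent series, one way to split the work is to route each series to a fixed shard and let each shard keep its own per-series detectors. The sketch below assumes this design; `shard_for`, `partition`, `ShardWorker` and the `make_detector` factory are all hypothetical names, and the detector is any callable that takes a value and returns a score (for example, a wrapper around an rrcf forest). A stable checksum is used instead of Python's built-in `hash()`, whose value for strings is randomized per process.

```python
import zlib
from collections import defaultdict


def shard_for(series_id: str, num_shards: int) -> int:
    """Stable assignment: the same series always maps to the same shard, across processes and restarts."""
    return zlib.crc32(series_id.encode("utf-8")) % num_shards


def partition(batch, num_shards):
    """Split one minute's batch of (series_id, value) pairs into per-shard lists."""
    shards = defaultdict(list)
    for series_id, value in batch:
        shards[shard_for(series_id, num_shards)].append((series_id, value))
    return shards


class ShardWorker:
    """Owns the detector state for every series routed to one shard; shares nothing with other shards."""

    def __init__(self, make_detector):
        self.make_detector = make_detector  # hypothetical factory returning a callable value -> score
        self.detectors = {}

    def process(self, items):
        """Score each (series_id, value) pair, creating a detector the first time a series appears."""
        scores = {}
        for series_id, value in items:
            if series_id not in self.detectors:
                self.detectors[series_id] = self.make_detector()
            scores[series_id] = self.detectors[series_id](value)
        return scores
```

Because no state crosses shard boundaries, each `ShardWorker` could run in its own process or on its own server instance, which is the separation suggested in the last bullet; the number of shards is a deployment choice.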
