This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

Question about abilities #124

asafcombo opened this issue Feb 7, 2018 · 5 comments
@asafcombo

I intend to test Skyline to monitor anomalous CPU usage across several instances hosted in the company's DC.

I fully understand how to configure Skyline to find anomalous behavior of a server's CPU vs its own timeseries.

What I want to achieve is to find anomalous behavior of an instance vs the rest of the servers as a function of time. This will help us flag unwanted behavior of a distributed system that allocates more work (or harder computation) to that instance.

For example: the mean CPU for all servers is 30%, and one instance has now been at 40% for 1 hour.
If that CPU usage were judged only against the instance's own timeseries, then after 1 hour at that level it would be flagged as normal.
In my case, because the 40% is compared against the fleet's 30%, it would definitely be flagged as anomalous for the entire cycle.
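
To illustrate, the comparison I have in mind is roughly this (just a sketch with made-up numbers, not Skyline code; the 3-sigma threshold is an arbitrary example):

```python
# Rough illustration of the fleet-vs-instance check described above.
# Not Skyline code; the values and the 3-sigma threshold are arbitrary examples.
import statistics

def anomalous_vs_fleet(fleet_cpu, instance_cpu, sigmas=3.0):
    """True if instance_cpu deviates from the fleet mean by more than
    `sigmas` standard deviations of the fleet values."""
    mean = statistics.mean(fleet_cpu)
    spread = statistics.pstdev(fleet_cpu) or 1e-9  # guard against zero spread
    return abs(instance_cpu - mean) / spread > sigmas

# Most servers sit around 30% CPU, one instance is at 40%
fleet = [29.0, 31.0, 30.5, 28.5, 30.0]
print(anomalous_vs_fleet(fleet, 40.0))  # True: anomalous vs the fleet
```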

@astanway
Contributor

astanway commented Feb 7, 2018 via email

@astanway
Contributor

astanway commented Feb 7, 2018 via email

@asafcombo
Author

asafcombo commented Feb 7, 2018

Hard-coding won't suffice, as the server ids could be changed by the Resource manager.

I think what I could do is change the spin_process function in analyzer.py so that it takes several raw_series (one for each server) at the same time and works on them together.

But I would imagine that will require some more work. I can't have this happen for each metric: if I have 40 servers then I also have 40 metrics, and each time I check one metric I would have to check all the others as well, which is redundant.
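
Roughly what I am picturing is something like this (a rough sketch only, not the real spin_process; the data layout and names are assumptions): pull all the server series for the namespace once per run and compare each server against a fleet summary computed once from them.

```python
# Rough sketch only, not the real spin_process: summarise every server's
# recent samples once, then compare each summary against the fleet median,
# so the fleet is computed once per run rather than once per metric.
import statistics

def fleet_outliers(series_by_server, window=60, sigmas=3.0):
    """series_by_server: {server_id: [cpu_samples, ...]} covering the same
    period. Returns the server ids whose recent mean deviates from the fleet."""
    recent = {sid: statistics.mean(samples[-window:])
              for sid, samples in series_by_server.items()}
    fleet_median = statistics.median(recent.values())
    fleet_spread = statistics.pstdev(recent.values()) or 1e-9
    return [sid for sid, value in recent.items()
            if abs(value - fleet_median) / fleet_spread > sigmas]
```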

@earthgecko

@asafcombo, @astanway is correct, it could be done with some customisation. As for hard coding the server metric names, you could instead match on the namespace, and then server id changes are not an issue, as long as you have a common namespace for servers, e.g. metrics.servers.<server_id>.* (see the sketch below).
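
Something along these lines is what I mean by matching on the namespace rather than hard coding names (just a rough sketch; the pattern and metric names are examples only):

```python
# Rough sketch: select server CPU metrics by a namespace pattern rather than
# by hard coded server ids, so id changes are picked up automatically.
# The pattern and metric names below are illustrative examples only.
from fnmatch import fnmatch

NAMESPACE_PATTERN = 'metrics.servers.*.cpu.user'  # example namespace

metric_names = [
    'metrics.servers.i-0abc123.cpu.user',
    'metrics.servers.i-0def456.cpu.user',
    'metrics.servers.i-0abc123.memory.used',
]

server_cpu_metrics = [m for m in metric_names if fnmatch(m, NAMESPACE_PATTERN)]
# -> both cpu.user metrics, whatever server ids exist at the time
```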

In terms of the cost of checking the metrics, if done properly the penalty incurred should not be too steep, especially if it is only 40 metrics you are talking about compositing.

However, that said, even though all the servers may be the same, you may find that their metrics normally differ somewhat at times. As @astanway said, this is an interesting use case, and one I have been thinking about myself for some time in terms of metric clustering; however, I feel it would work better if Skyline learnt related metrics, clustered the namespaces, and did it all (or most) by itself :) Although I can tell you that it is probably not as easy as it sounds.

Skyline Mirage and Ionosphere may be able to help you out now in the interim, until you or I or someone else does that. I maintain an unforked version at https://github.com/earthgecko/skyline; the additional functionality is outlined here - #123 (comment)

And I shall definitely be looking at adding something similar to what you have outlined here in the not too .... future. I am currently adding an autocorrelations module, and with Skyline learning via autocorrelations and user-defined correlations, the addition of another module to analyse a metric in terms of clustered or composite metric medians, etc. is one of the next logical steps, as you nicely outlined here. I am not certain how well it will work, but it will definitely work in some way :)

Good luck with your endeavours with Skyline.

@asafcombo
Author

@earthgecko thanks.
I will take a look to see if there is a quick workaround I can do.
