
[Feature request] calculation of rate for counters #61

Open
liyichao opened this issue Apr 10, 2015 · 13 comments

Comments

@liyichao

It would be helpful if aggregations supported rate calculation.

For example, network inflow or outflow data collected by collectd is monotonically increasing. We need to subtract two points and divide by the time difference to get bps.

@grobian
Owner

grobian commented Apr 10, 2015

You mean a derivative or something? How exactly should the aggregation be computed in that case?

@liyichao
Author

Yes, a derivative. Given two datapoints (v1, t1) and (v2, t2), it can be calculated as (v2 - v1) / (t2 - t1).
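A minimal sketch of that formula, for illustration only. The function name `rate` and the sample numbers are hypothetical and not part of carbon-c-relay:

```python
def rate(v1, t1, v2, t2):
    """Per-second rate between datapoints (v1, t1) and (v2, t2)."""
    if t2 == t1:
        raise ValueError("datapoints must have distinct timestamps")
    return (v2 - v1) / (t2 - t1)


# Hypothetical byte-counter samples taken 10 seconds apart.
print(rate(1000, 60, 6000, 70))  # 500.0
```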

@grobian
Owner

grobian commented Apr 10, 2015

So I imagine you'd want to do something like an avg or sum of the derivatives of the input streams, is that correct?

Derivative in the Graphite context doesn't take timestamps into account, though, so this is a bit more complex than that.
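To illustrate the distinction, here is a small sketch contrasting a delta-only derivative (loosely how Graphite's derivative treats a series: consecutive differences with no normalization by time) with a timestamp-aware one. The function names and sample points are made up for this example; with irregularly spaced points the two disagree:

```python
def graphite_style_derivative(values):
    """Difference between consecutive values; timestamps play no role.
    The first point has no predecessor, so it yields None."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def time_aware_derivative(points):
    """Difference divided by elapsed time between consecutive (t, v) points.
    Assumes strictly increasing timestamps."""
    out = [None]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        out.append((v1 - v0) / (t1 - t0))
    return out


points = [(0, 100), (10, 200), (30, 400)]  # irregular spacing
print(graphite_style_derivative([v for _, v in points]))  # [None, 100, 200]
print(time_aware_derivative(points))                      # [None, 10.0, 10.0]
```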

@liyichao
Author

liyichao commented May 9, 2015

I think it's Graphite's perSecond function (http://graphite.readthedocs.org/en/latest/functions.html#graphite.render.functions.perSecond).
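A rough sketch of a perSecond-like computation, assuming (t, v) input points and treating a negative delta as a counter reset. The optional `max_value` wrap-around handling is modelled on the idea behind Graphite's maxValue parameter, but this is not Graphite's code and details may differ:

```python
def per_second(points, max_value=None):
    """Non-negative per-second rate over (t, v) points.

    A negative delta is treated as a counter reset: if max_value is given,
    the counter is assumed to have wrapped at max_value; otherwise the
    result for that point is None.
    """
    out = [None]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        delta = v1 - v0
        if delta < 0:
            if max_value is None:
                out.append(None)
                continue
            delta = (max_value - v0) + v1 + 1  # assume a wrap past max_value
        out.append(delta / (t1 - t0))
    return out


pts = [(0, 100), (10, 300), (20, 50), (30, 250)]
print(per_second(pts))                 # [None, 20.0, None, 20.0]
print(per_second(pts, max_value=399))  # [None, 20.0, 15.0, 20.0]
```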

@liyichao
Author

liyichao commented May 9, 2015

I think this feature is more helpful than the fancy hot reload. Graphite has derivative, nonNegativeDerivative, and perSecond. InfluxDB will not release 0.9 before it implements derivative and non-negative derivative (http://influxdb.com/blog/2015/05/01/InfluxDB-v0_9_0-release-update.html). Heka even calculates the rate for counters directly in its statsd plugin (https://github.com/mozilla-services/heka/blob/dev/pipeline%2Fstat_accum_input.go#L245). These facts show how important this function is.

In our case, we want to compute how many bytes a network interface receives/sends per second, but the data collected by collectd is monotonically increasing, which makes it useless as-is. We have to put the value into Graphite and use a Graphite function to make it usable. But we also have a realtime alerting system that carbon-c-relay feeds in realtime, and if we want to alert on network throughput, we must compute the rate on the fly.

@grobian
Owner

grobian commented May 9, 2015

Ok, I see how this is important. I just need to work out for myself how to (properly) define what the rate counter or derivative should look like.

Obviously, this will take an amazing amount of memory and CPU cycles (since we have to keep track of every metric that needs this). It looks a lot like a "special" aggregation, but perhaps that is too much overhead to make it doable.

@grobian
Owner

grobian commented May 9, 2015

derive seconds from metric.a.b.c ;
or maybe something more generic:
calculate derivative from metric.a.b.c ;
such that I could also think about doing
calculate movingaverage from metric.a.b.c ;
/me keeps thinking about it

@liyichao
Author

How about: calculate rate from metric.a.b.c into metric.a.b.c.rate interval 10s

You wait for 10s, then calculate some function over the data points received during those 10s. For rate, take the first and last points, subtract them, and divide by their time difference. For average, sum all values and divide by the count.

When the calculation is done, the result metric is put back into carbon-c-relay, so we can use it to calculate other metrics.
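A minimal sketch of this proposal, assuming tumbling buckets aligned to multiples of the interval and (t, v) input points with distinct timestamps. The function name is hypothetical, and writing the results back under a name such as metric.a.b.c.rate is left out:

```python
from collections import defaultdict


def tumbling(points, interval):
    """Group (t, v) points into [k*interval, (k+1)*interval) buckets and, per
    bucket, compute the rate (last - first) / (t_last - t_first) and the
    average sum / count. A bucket without two distinct timestamps gets rate None."""
    buckets = defaultdict(list)
    for t, v in sorted(points):
        buckets[t // interval * interval].append((t, v))
    results = {}
    for start in sorted(buckets):
        pts = buckets[start]
        (t_first, v_first), (t_last, v_last) = pts[0], pts[-1]
        rate = (v_last - v_first) / (t_last - t_first) if t_last > t_first else None
        avg = sum(v for _, v in pts) / len(pts)
        results[start] = (rate, avg)
    return results


# A counter sampled every 2 seconds for 20 seconds, evaluated per 10s interval.
points = list(zip(range(0, 20, 2), [0, 20, 40, 60, 80, 100, 130, 160, 190, 220]))
print(tumbling(points, 10))  # {0: (10.0, 40.0), 10: (15.0, 160.0)}
```

Because each bucket is evaluated on its own, the change between the last point of one bucket and the first point of the next is not counted; this is the issue raised further down in the thread.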

@grobian
Owner

grobian commented May 11, 2015

I actually think that's a good idea. I also thought about it some more and think a stop keyword on aggregations and calculations would make sense for those scenarios where you don't care about the original metric.

@grobian
Owner

grobian commented May 16, 2015

While implementing this:
values 1:10 2:11 3:13 4:14 5:16 6:17 (timestamp:value) with interval 3s would yield (13 - 10) / (3 - 1) = 1.5 and (17 - 14) / (6 - 4) = 1.5, in contrast to (17 - 13) / (6 - 3) ≈ 1.33.
That is, I cannot join with the previous value, which is a problem for an interval containing only a single value.
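The arithmetic above can be reproduced with a short sketch, assuming the pairs are timestamp:value and that buckets are aligned so they hold t = 1..3 and t = 4..6 as in the comment. A bucket containing a single value has t_first == t_last, so it cannot produce a rate at all without the previous bucket's last value:

```python
points = [(1, 10), (2, 11), (3, 13), (4, 14), (5, 16), (6, 17)]
interval = 3


def bucket_rate(pts):
    """(last - first) / (t_last - t_first) over one bucket of (t, v) points."""
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


# Tumbling buckets aligned at t=1, i.e. {1, 2, 3} and {4, 5, 6}.
buckets = {}
for t, v in points:
    buckets.setdefault((t - 1) // interval, []).append((t, v))
per_bucket = [bucket_rate(buckets[k]) for k in sorted(buckets)]
print(per_bucket)  # [1.5, 1.5]

# Joining the second bucket with the last value of the previous bucket.
joined = bucket_rate([buckets[0][-1]] + buckets[1])
print(round(joined, 2))  # 1.33
```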

@liyichao
Author

I think if the interval is 3s, then the buckets will be [0, 3], [3, 6], ...
You put the values in each bucket, calculate each rate from the first and last value in the bucket, and emit the rate with timestamp 0, 3, 6, ...
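A sketch of this scheme, assuming integer timestamps and that a point lying exactly on a boundary (t = 3, 6, ...) is placed in both adjacent buckets. The function name is hypothetical. Each rate is emitted with the bucket's start timestamp:

```python
def boundary_buckets(points, interval):
    """Buckets [0, i], [i, 2i], ... where a point exactly on a boundary
    belongs to both adjacent buckets. Returns (bucket_start, rate) pairs."""
    buckets = {}
    for t, v in points:
        k = t // interval
        buckets.setdefault(k * interval, []).append((t, v))
        if t % interval == 0 and k > 0:
            # Boundary point: it also closes the previous bucket.
            buckets.setdefault((k - 1) * interval, []).append((t, v))
    out = []
    for start in sorted(buckets):
        pts = sorted(buckets[start])
        if len(pts) < 2 or pts[-1][0] == pts[0][0]:
            continue  # a lone point cannot produce a rate
        (t0, v0), (t1, v1) = pts[0], pts[-1]
        out.append((start, (v1 - v0) / (t1 - t0)))
    return out


points = [(1, 10), (2, 11), (3, 13), (4, 14), (5, 16), (6, 17)]
print(boundary_buckets(points, 3))
```

With the values from the previous comment this yields a rate of 1.5 at timestamp 0 and about 1.33 at timestamp 3; the bucket starting at 6 only holds t = 6 so far, so it emits nothing until more data arrives.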

@grobian
Owner

grobian commented May 17, 2015

I need to think of how I can get the same entry in two buckets in a sane way, or make the operation work on buckets instead of values.

grobian added a commit that referenced this issue May 31, 2015
This is first work for issue #61.  Allow storing all values in a bucket
such that we can do calculations which need those (like e.g. true
stddev).  The objective to create a rate counter isn't met by this work
since we need to compare values from different buckets.  Perhaps this
needs sliding/moving window support versus the current tumbling window
support.
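The commit message mentions sliding/moving windows as an alternative to the current tumbling ones. A minimal sketch of a sliding-window rate, assuming a window length in seconds and in-order (t, v) points (the class name is hypothetical): each new point is compared with the oldest point still inside the window, so every value participates in several consecutive results:

```python
from collections import deque


class SlidingRate:
    """Keep (t, v) points from the last `window` seconds and report
    (newest - oldest) / (t_newest - t_oldest) over that window."""

    def __init__(self, window):
        self.window = window
        self.points = deque()

    def add(self, t, v):
        self.points.append((t, v))
        # Evict points older than the window; the newest point always stays.
        while self.points[0][0] < t - self.window:
            self.points.popleft()
        (t0, v0), (t1, v1) = self.points[0], self.points[-1]
        if t1 == t0:
            return None  # not enough distinct points for a rate yet
        return (v1 - v0) / (t1 - t0)


sr = SlidingRate(2)
print([sr.add(t, v) for t, v in [(1, 10), (2, 11), (3, 13), (4, 14), (5, 16), (6, 17)]])
# [None, 1.0, 1.5, 1.5, 1.5, 1.5]
```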
grobian added a commit that referenced this issue Oct 31, 2015
@grobian
Owner

grobian commented Nov 5, 2015

Note to self and others: the feature branch was merged into master, but it doesn't solve this particular issue. Perhaps an easier way to do this would be to add a new construct, like rewrite, that replaces the value with its value relative to the last-seen one. It might consume a lot of memory, obviously, and its consumption would be pretty much unbounded, so it sounds like something with a lot of sharp edges.
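A minimal sketch of such a "since last seen" construct, assuming the new value is the per-second change relative to the previous value of the same metric (a plain difference would be an equally valid reading). The class and metric names are hypothetical. The dictionary of last-seen values grows with every distinct metric and is never pruned here, which is the unbounded memory consumption mentioned above:

```python
class LastSeenRate:
    """Replace each incoming value with its per-second change since the
    previous value of the same metric. State is kept per metric name and
    never evicted, so memory grows with the number of distinct metrics."""

    def __init__(self):
        self.last = {}  # metric name -> (timestamp, value)

    def feed(self, name, t, v):
        prev = self.last.get(name)
        self.last[name] = (t, v)
        if prev is None or t <= prev[0]:
            return None  # first sighting, or duplicate / out-of-order timestamp
        t0, v0 = prev
        return (v - v0) / (t - t0)


tracker = LastSeenRate()
print(tracker.feed("host1.if.eth0.rx_bytes", 60, 1000))  # None
print(tracker.feed("host1.if.eth0.rx_bytes", 70, 6000))  # 500.0
```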

pkittenis pushed a commit to pkittenis/carbon-c-relay that referenced this issue Feb 3, 2016