Aggregate metric datapoints over time period #29461
Comments
@crobert-1 who is the sponsor for this issue? |
@djaglowski volunteered to be the sponsor here. |
This is something that is on our radar, and we would like to support it as much as possible. Our use case is to enable remote writing of metrics from the count connector to Thanos. |
Some clarifications of the spec for this one, given that #30479 will exist:
@djaglowski Do you have any thoughts for a name? |
@RichieSams, this all looks good to me. |
other name idea: intervalprocessor (because it emits at fixed intervals) for implementing this, you may want to take a look at |
Thanks for the pointer! I like that name as well; it's less of a mouthful. |
Just so I understand the current situation:
Is this correct? |
As an added comment: I have started working on this issue, PRs to follow in the coming days |
Thanks @RichieSams! |
There is no reason why we can't aggregate deltas over time too, though, right? |
It just duplicates the code of the new deltatocumulativeprocessor. Unless you mean something else like:
This could work. But I'd be curious about the use-cases for that, vs. just converting to cumulative and aggregating those. |
Yes exactly, many deltas in, one delta out. I think the batch processor does something similar, although the output trigger is batch size, not a clock period. I guess one use case would be where deltas are preferred/required downstream, but you want to reduce the data rate, e.g., sending a single delta of 10,000 over the wire after 15s is much more efficient than sending 10,000 deltas of 1. I'm not saying this has to be in scope for the initial implementation if there is no immediate need, but just trying to think about how all this might work. |
I'm assuming that the config across all these processors will be somewhat consistent. The Cumulative to Delta Processor can be configured with metric include and exclude rules. Isn't it more appropriate to simply pass through metric data that isn't matched? If a user wants to drop a certain metric, they should configure a filter processor, no? This gives maximum flexibility, and the pipeline is nice and explicit. Well, I guess it's best to align with what other processors do in comparable situations... |
It looks like the batchprocessor doesn't do any aggregation. It collects groups of metrics, and then sends them all at once in a single go. https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/batch_processor.go#L425 IE, it would collect 10,000 deltas, and send them all at once to the next component in the pipeline. Vs without it, the next component would get 10,000 individual requests with each single metric. So it's optimizing for things like TCP connection / request overhead.
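To make the distinction concrete, here is a minimal sketch contrasting the two behaviours for a single timeseries. The helper names `batch` and `aggregate_deltas` are hypothetical and are not taken from the batchprocessor or any collector code; they only illustrate grouping versus summing.

```python
from typing import List


def batch(points: List[int], batch_size: int) -> List[List[int]]:
    """Group points into batches; every original point is still forwarded."""
    return [points[i:i + batch_size] for i in range(0, len(points), batch_size)]


def aggregate_deltas(points: List[int]) -> int:
    """Collapse many delta points for one timeseries into a single delta."""
    return sum(points)


deltas = [1] * 10_000

# Batching: one request downstream, but it still carries 10,000 datapoints.
batches = batch(deltas, batch_size=10_000)
points_after_batching = sum(len(b) for b in batches)

# Aggregating: one datapoint downstream, carrying the summed delta.
aggregated = aggregate_deltas(deltas)
```

Batching reduces the number of requests, while aggregation reduces the number of datapoints, which is what matters for sinks that charge per datapoint.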
For sure. I think it could be added in a future scope if the need arises. I don't think that would impact the immediate design. |
I don't have a strong opinion either way. I'd go with whatever the convention is for other processors. So for delta metrics, we'd "consume" all the metrics and then periodically export on the interval. For cumulative metrics, we'd pass them to the next component in the chain untouched. Yes? |
I think it's the other way round😅. The principal use-case is for cumulative metrics, as produced by the new delta to cumulative processor. But otherwise, yes, exactly 👍 |
Right right, my brain is fried today. lol |
I think the first approach is correct, the exporter publishes the latest / current value every interval.
Yes, the timestamp should be updated to This is then similar to how things work if we were to publish a metric to a prometheus exporter, and then scrape that back in with a prometheus receiver. We can export datapoints to a prometheus exporter at any rate, but as the prometheus receiver scrapes at a set, steady interval, it just produces one datapoint per timeseries every scrape interval. This includes the case where no new datapoints were sent to the prometheus exporter: the next scrape will just output a new datapoint with the Doing it this way meets a key use-case of working with native OTLP metrics in the whole pipeline, until eventually sending to a prometheusremotewrite exporter. |
I think if/when we have the ability to work with delta datapoints, we would do something similar, or at least set the value to zero (we would still keep state for the timeseries). But as I understand it, we are just looking at cumulative datapoints for now, and will pass-through deltas, right? |
Wouldn't |
Looking at the spec, for cumulative datapoints the
(I will edit my comment above to correct it) So I suppose in our case, that is going to mean the cumulative datapoints' |
It is not clear to me, but it looks like a Sum consists of NumberDataPoints, and |
@sh0rez Can I get your opinion on the two export options I presented above? We have one vote for each atm :P |
imo the reason for this processor to exist is to limit the flowrate of datapoints (datapoints per minute, etc).
In the case where no new datapoints are received within We can leave the delta case out for now. |
Yes this makes sense 👍
I would say the reason is to fix the flowrate, rather than just limit it. I think we have an open question as to whether we should publish a new cumulative datapoint at every interval in the case where the cumulative value has not changed. e.g., would we publish a stream of I described my use case in more detail here: #30827 (comment) |
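The two export options under discussion can be sketched side by side. This is only an illustration under stated assumptions: the class `CumulativeInterval` and its `emit_unchanged` flag are hypothetical names, not part of the processor's config or code, and timestamps are left out entirely.

```python
from typing import Dict, Set


class CumulativeInterval:
    """Tracks the latest cumulative value per timeseries (hypothetical sketch)."""

    def __init__(self, emit_unchanged: bool) -> None:
        # emit_unchanged=True: fix the flow rate (one point per series per tick).
        # emit_unchanged=False: only limit it (skip series with no new data).
        self.emit_unchanged = emit_unchanged
        self.latest: Dict[str, float] = {}
        self.updated: Set[str] = set()

    def consume(self, series: str, value: float) -> None:
        # For cumulative data the newest value supersedes earlier ones.
        self.latest[series] = value
        self.updated.add(series)

    def tick(self) -> Dict[str, float]:
        if self.emit_unchanged:
            out = dict(self.latest)
        else:
            out = {s: self.latest[s] for s in self.updated}
        self.updated.clear()
        return out
```

With `emit_unchanged=True`, a series that received 7 and then 11 is emitted as 11 on every subsequent tick; with `emit_unchanged=False` it is emitted once and then stays silent until new data arrives.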
Following the discussion here: open-telemetry#29461 (comment)
Description: This PR implements the main logic of this new processor. Link to tracking Issue: #29461 This is a re-opening of [30827](#30827) Testing: I added a test for the main aggregation / export behavior, but I need to add more to test state expiry Documentation: I updated the README with the updated configs, but I should add some examples as well.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping |
Closed by: #32054 |
…code owners (#33019) **Description:** @jpkrohling and @djaglowski volunteered to be sponsors of the delta to cumulative processor, and @djaglowski also volunteered to be sponsor of the interval processor in relation to this. They should also be code owners. From [CONTRIBUTING.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#adding-new-components): ``` A sponsor is an approver who will be in charge of being the official reviewer of the code and become a code owner for the component. ``` **Link to tracking Issue:** #30479 - Delta to cumulative processor #29461 - Interval processor --------- Co-authored-by: Juraci Paixão Kröhling <[email protected]>
The purpose and use-cases of the new component
This processor would receive a stream of data points for a metric timeseries, and periodically emit an "aggregate" at a set interval.
We can achieve something similar already by exporting metrics to the Prometheus exporter, then periodically scraping the prom endpoint with a Prometheus receiver. However, this is clunky and somewhat less efficient than using a processor.
One concrete use case is where we want to send high frequency metric datapoints to the Prometheus remote write exporter, for example datapoints produced by the count connector. When counting spans, the count connector will produce a single delta datapoint (increment value 1) for each counted span, which could of course be many times per second. However, typically we would only want to remote write to Prometheus periodically, as we would if we were scraping, perhaps once every 30 seconds. This is especially true for downstream metric sinks that charge for datapoints per minute.
This proposed processor would be stateful, tracking metrics by identity, maintaining a single aggregate value. This aggregate will be output every interval.
For example, if the processor received the following delta datapoints: 1, 3, 5, then at the next "tick" of the interval clock, a single delta datapoint of 9 would be emitted.
Similarly, if the processor received the following cumulative datapoints: 7, 9, 11, then at the next "tick" of the interval clock, a single cumulative datapoint of 11 would be emitted.
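The two examples above can be sketched as follows. This is a minimal illustration, not the processor's implementation: `IntervalAggregator`, `consume`, and `tick` are hypothetical names, metric identity is reduced to a plain string, and clearing all state after each tick is a simplifying assumption.

```python
from typing import Dict, Tuple

DELTA = "delta"
CUMULATIVE = "cumulative"


class IntervalAggregator:
    """Keeps one aggregate value per (identity, temporality) pair."""

    def __init__(self) -> None:
        self.state: Dict[Tuple[str, str], float] = {}

    def consume(self, identity: str, temporality: str, value: float) -> None:
        key = (identity, temporality)
        if temporality == DELTA:
            # Deltas are increments, so the aggregate is their sum.
            self.state[key] = self.state.get(key, 0) + value
        elif temporality == CUMULATIVE:
            # Cumulative points already carry the running total; keep the latest.
            self.state[key] = value
        else:
            raise ValueError(f"unknown temporality: {temporality}")

    def tick(self) -> Dict[Tuple[str, str], float]:
        # Export the current aggregates and start the next interval empty
        # (assumption made for this sketch only).
        out = self.state
        self.state = {}
        return out


agg = IntervalAggregator()
for v in (1, 3, 5):
    agg.consume("span.count", DELTA, v)
for v in (7, 9, 11):
    agg.consume("span.count", CUMULATIVE, v)
exported = agg.tick()
```

After the tick, `exported` holds 9 for the delta series and 11 for the cumulative series, matching the two examples.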
There would be a "max_staleness" config option so that we can stop tracking metrics which don't receive any data for a given time.
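A rough sketch of how such expiry could work is shown below. Only the option name `max_staleness` comes from the proposal; the `StalenessTracker` class, its methods, and the use of plain float seconds for time are assumptions made for illustration.

```python
from typing import Dict, List


class StalenessTracker:
    """Forgets timeseries that have not received data for max_staleness seconds."""

    def __init__(self, max_staleness: float) -> None:
        self.max_staleness = max_staleness
        self.last_seen: Dict[str, float] = {}

    def observe(self, identity: str, now: float) -> None:
        # Record the arrival time of the most recent datapoint for this series.
        self.last_seen[identity] = now

    def expire(self, now: float) -> List[str]:
        # Collect series whose last datapoint is older than the threshold,
        # drop them from the tracked state, and report which were dropped.
        stale = [k for k, seen in self.last_seen.items()
                 if now - seen > self.max_staleness]
        for k in stale:
            del self.last_seen[k]
        return stale
```

Calling `expire` on each interval tick would keep the tracked state bounded when short-lived timeseries come and go.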
Example configuration for the component
max_staleness
Include/exclude MatchMetrics
Telemetry data types supported
Metrics
Is this a vendor-specific component?
Code Owner(s)
No response
Sponsor (optional)
@djaglowski
Additional context
No response