[processor/interval] Refactor with time-based partitioning strategy #34948
Conversation
Force-pushed from 20a61ca to d0aec7c
	md                 []pmetric.Metrics
	rmLookup           []map[identity.Resource]pmetric.ResourceMetrics
	smLookup           []map[identity.Scope]pmetric.ScopeMetrics
	mLookup            []map[identity.Metric]pmetric.Metric
	numberLookup       []map[identity.Stream]pmetric.NumberDataPoint
	histogramLookup    []map[identity.Stream]pmetric.HistogramDataPoint
	expHistogramLookup []map[identity.Stream]pmetric.ExponentialHistogramDataPoint
	summaryLookup      []map[identity.Stream]pmetric.SummaryDataPoint
instead of having 8 slices that must be used with the same index, can we have 1 struct slice?
type Partition struct {
	md       pmetric.Metrics
	rmLookup map[identity.Resource]pmetric.ResourceMetrics
	// ...
}

type Processor struct {
	// ...
	parts []Partition
}
would also eliminate p.partitions, as len(p.parts) has this info already
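For reference, a sketch of what the suggested consolidation could look like, with the field types taken from the diff above (the identity import path is assumed from the collector-contrib internals):

```go
package intervalprocessor

import (
	"github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics/identity"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// Partition bundles all per-partition state, replacing the eight
// parallel slices that had to be indexed together.
type Partition struct {
	md                 pmetric.Metrics
	rmLookup           map[identity.Resource]pmetric.ResourceMetrics
	smLookup           map[identity.Scope]pmetric.ScopeMetrics
	mLookup            map[identity.Metric]pmetric.Metric
	numberLookup       map[identity.Stream]pmetric.NumberDataPoint
	histogramLookup    map[identity.Stream]pmetric.HistogramDataPoint
	expHistogramLookup map[identity.Stream]pmetric.ExponentialHistogramDataPoint
	summaryLookup      map[identity.Stream]pmetric.SummaryDataPoint
}
```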
	// Find the ResourceMetrics
	resID := identity.OfResource(rm.Resource())
	partition := resID.Hash().Sum64() % uint64(len(p.rmLookup))
	rmClone, ok := p.rmLookup[partition][resID]
unless I'm totally mistaken, you can't do it like that.
iiuc the partition must be determined using identity.Stream only, or you end up writing to different partitions here, as resID.Sum64() % 60 is nearly guaranteed to be different than streamID.Sum64() % 60.
in exportMetrics you clear all slices belonging to a partition, so those indices must match
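A tiny standalone demo of that mismatch, with plain strings standing in for the real identity types (illustrative only, not the PR's code):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sum64 stands in for identity's Hash().Sum64(), using FNV-1a on a string.
func sum64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	resID := "resource{service=checkout}"
	streamID := resID + "|http.server.duration{path=/pay}"
	// The two hashes mod 60 almost never agree, so state written under the
	// resource-derived index would be missed when exportMetrics clears by
	// the stream-derived partition.
	fmt.Println("resource partition:", sum64(resID)%60)
	fmt.Println("stream partition:  ", sum64(streamID)%60)
}
```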
@sh0rez is correct. You have to always calculate the partition using the stream ID hash and the number of total partitions.
Which, unfortunately, will be a bit of a nuisance here in the code.
Thinking about it further: by partitioning on streamID, it does mean that on average, we'll have up to one "extra" instance per partition of resource metrics, scope metrics, and metric instances.
I.e., if you have 8 partitions and 8 datapoints (each with a different label set), then assuming "ideal" partition hashing, each datapoint will land in a separate partition, duplicating rm, sm, and m for each partition. This is "fine", but something to note for the algorithm.
I think we'll have to call getOrCloneMetric() inside the loop of aggregateDataPointsByPartition().
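To illustrate that note with a self-contained toy (string keys instead of the real pmetric/identity types, helper names hypothetical): the per-partition clone cache ends up with one copy of the metric in every partition its streams hash into.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cloneKey mimics a per-partition metric lookup key.
type cloneKey struct {
	partition uint64
	metricID  string
}

func partitionOf(key string, n uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64() % n
}

func main() {
	const numPartitions = 8
	metricID := "http.server.duration"
	clones := map[cloneKey]string{}
	for i := 0; i < 8; i++ { // eight datapoints with distinct label sets
		streamID := fmt.Sprintf("%s|path=/route-%d", metricID, i)
		p := partitionOf(streamID, numPartitions)
		k := cloneKey{p, metricID}
		if _, ok := clones[k]; !ok {
			// The getOrCloneMetric()-inside-the-loop step described above.
			clones[k] = "clone of " + metricID
		}
	}
	fmt.Printf("metric cloned %d time(s) across %d partitions\n", len(clones), numPartitions)
}
```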
I wouldn't trust RSS here too much to assess the effects of your changes. After all, we are not exactly optimizing for memory usage here, but for an even throughput on the network (long-term average memory usage shouldn't really change; it may go up slightly due to partition overhead). I'd suggest to compare
@@ -26,6 +26,9 @@ type Config struct {
 	// SummaryPassThrough is a flag that determines whether summary metrics should be passed through
 	// as they are or aggregated.
 	SummaryPassThrough bool `mapstructure:"summary_pass_through"`
+
+	// Experimental feature to enable aggregation of metrics with time-based partitioning.
+	Partitions int `mapstructure:"partitions"`
does this really need to be configurable?
technically having too many partitions on very low load might introduce some overhead. If I get to decide, I'd wait for this to be a problem instead of fixing it now.
having another knob brings another thing for users to think about. I believe setting this to 60 for everybody does a good-enough job, at least until someone opens an issue and complains because it didn't
Yeah, I have the same impression. I'd prefer not to make this configurable, but I was hoping to benchmark this with several different numbers of partitions to see what the sweet spot could be here.
	dp := dataPoints.At(i)

	streamID := identity.OfStream(metricID, dp)
	partition := streamID.Hash().Sum64() % uint64(len(dpLookup))
the partition is per-stream (aka per-datapoint); I believe this must be done here
Interesting, I had a different goal in mind. I've seen people complain that we keep way too much state in memory, making the intervalprocessor a no-go for folks because of memory issues. My idea with partitioning is to clear the state more often while ensuring the same stream is sent at the configured interval. What is our motivation to stabilize throughput over time?
We wanted to smooth out the overall data transfer downstream. With the previous design, we'd gather X amount of data and then, at the interval, flush it all at once. This means a potentially very large spike in CPU/network usage as we attempt to flush all that data. With this shard approach, we break up that large spike into multiple smaller spikes (60 in the current code), so we reduce the downstream load.
Force-pushed from a12a451 to 95b32a6
@sh0rez and I had the chance to meet at PromCon and he explained the problem we're trying to solve here :) I've made a few changes, but this is not working at the moment (not even compiling, actually). I admit I haven't spent too much time thinking here, but at first glance it was not clear to me what changes we need to make
Force-pushed from 37140a2 to 3fdd06a
I had a whack at it, and this is what I came up with: we delete
There's a fair amount of code duplication, which I'm not loving, but it's easy to follow at least. Maybe we can use some of the type wrangling magic that @sh0rez did for the deltatocumulative processor.
I got it working on my local machine; the code looks super similar to yours :) I'll push once I have a good connection (I'm commuting through airports at the moment)
Force-pushed from 3fdd06a to 35bd9da
Signed-off-by: Arthur Silva Sens <[email protected]>
Force-pushed from 35bd9da to c64e20d
Alright, I think it's ready! I still want to do macrobenchmarks to see the average throughput difference; not sure how to measure it yet, though.
 		nextConsumer: nextConsumer,
 	}
 }

 func (p *Processor) Start(_ context.Context, _ component.Host) error {
-	exportTicker := time.NewTicker(p.config.Interval)
+	exportTicker := time.NewTicker(time.Second)
I think this should be p.config.Interval / numPartitions instead of hardcoded to 1 second
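A minimal, self-contained sketch of that ticking scheme with toy state (names and structure assumed; the real processor exports pmetric data and guards its state with a mutex):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const numPartitions = 4          // 60 in the PR discussion
	interval := 2 * time.Second      // the configured processor interval
	tick := interval / numPartitions // the suggested ticker period

	// One toy partition per tick slot.
	partitions := make([][]string, numPartitions)
	for i := range partitions {
		partitions[i] = []string{fmt.Sprintf("stream-%d", i)}
	}

	ticker := time.NewTicker(tick)
	defer ticker.Stop()

	current := 0
	for i := 0; i < numPartitions; i++ { // one full pass over the interval
		<-ticker.C
		fmt.Printf("flushing partition %d: %v\n", current, partitions[current])
		partitions[current] = partitions[current][:0] // clear exported state
		current = (current + 1) % numPartitions
	}
}
```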
		return ErrInvalidIntervalValue
	}

	if config.Interval%time.Second != 0 {
I mean, choosing an interval of like 16s345ms is odd for sure. But I don't think it's necessarily a problem for the algorithm
	histogramLookup: map[identity.Stream]pmetric.HistogramDataPoint{},
	expHistogramLookup: map[identity.Stream]pmetric.ExponentialHistogramDataPoint{},
	summaryLookup: map[identity.Stream]pmetric.SummaryDataPoint{},
	numPartitions := int(config.Interval.Seconds())
Oh, I think this should be either hardcoded to something like 60, or made configurable. If someone chooses a large interval like 15 minutes, we don't want to create 900 partitions as a result.
This might sound like a dumb question, but why are 900 partitions a problem? 😅
I think it's just overkill. Also, I don't think we want to export every second; it's a lot of work to be doing every second.
I guess this highlights the issue with hardcoding such a value. If the interval is short, like 15 seconds, we probably don't need to partition at all. But if it's 15 minutes or an hour, then partitioning will be very impactful. But again, we don't want to export every second.
I.e., maybe we calculate the number of partitions to have based on some "ideal" export frequency (no idea what value to choose for that).
For example, say we choose the "ideal" export interval to be 15 seconds. If the processor interval is less than 15 seconds, we only have one partition. Between 15 and 30 seconds, two partitions. Between 30 and 45 seconds, three partitions. Etc.
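A sketch of that calculation under those assumptions (the 15-second "ideal" cadence is just the example value floated here, not a decided constant):

```go
package main

import (
	"fmt"
	"time"
)

// numPartitions derives the partition count from the processor interval so
// that exports never happen more often than the target cadence: one
// partition for intervals up to 15s, two up to 30s, and so on.
func numPartitions(interval, ideal time.Duration) int {
	n := int((interval + ideal - 1) / ideal) // ceiling division
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	for _, iv := range []time.Duration{10 * time.Second, 20 * time.Second, 15 * time.Minute} {
		fmt.Printf("interval=%v -> partitions=%d\n", iv, numPartitions(iv, 15*time.Second))
	}
}
```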
> I think it's just overkill. Also, I don't think we want to export every second; it's a lot of work to be doing every second.

Ok, I see your point. I guess our approach here is switching where the heavy computation load happens:
- No partitions = heavy load on whoever is receiving the aggregated metrics, since everything arrives at the same time
- More partitions = smoother load on the next consumer, but higher load on the intervalprocessor

> I.e., maybe we calculate the number of partitions to have based on some "ideal" export frequency (no idea what value to choose for that).

Not sure we'll be able to find a one-size-fits-all here; it all depends on the characteristics of the incoming metrics 😕.
@@ -109,16 +122,16 @@ func (p *Processor) ConsumeMetrics(ctx context.Context, md pmetric.Metrics) error
 		return false
 	}

-	mClone, metricID := p.getOrCloneMetric(rm, sm, m)
-	aggregateDataPoints(m.Summary().DataPoints(), mClone.Summary().DataPoints(), metricID, p.summaryLookup)
+	mClone, metricID, partition := p.getOrCloneMetric(rm, sm, m)
Do we want to partition on metricID or streamID? (cc @sh0rez)
The current code partitions on metricID, which is much simpler code-wise. But I do worry that it won't be enough. I.e., if all the data are the same metrics, differentiated only by datapoint labels, they'll all be partitioned into the same bucket, defeating the purpose.
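A quick toy comparison of the two choices, with string keys in place of the real identity types:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

func partitionOf(key string, n uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64() % n
}

func main() {
	const n = 60
	metricID := "http.server.duration"
	byMetric := map[uint64]bool{}
	byStream := map[uint64]bool{}
	for i := 0; i < 8; i++ {
		// Same metric, eight distinct label sets.
		streamID := fmt.Sprintf("%s|path=/route-%d", metricID, i)
		byMetric[partitionOf(metricID, n)] = true
		byStream[partitionOf(streamID, n)] = true
	}
	fmt.Printf("by metricID: %d bucket(s)\n", len(byMetric)) // always 1
	fmt.Printf("by streamID: %d bucket(s)\n", len(byStream)) // up to 8
}
```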
friendly ping @sh0rez :)
This PR was marked stale due to lack of activity. It will be closed in 14 days.
not stale, it's still on my todo list 🙃
I feel we're leaning towards making the number of partitions configurable. Unfortunately, this has decreased in priority for me in the last few days, but I'll try to get back to it in one or two weeks!
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Description:
Re-implementation of the interval processor using a time-based partitioning strategy. The goal is to smooth the throughput, avoiding CPU/memory spikes in the next processor/exporter when it receives the aggregated data.
We now split different metrics into different partitions, where the partitioning decision is made by hashing the metric identity. The number of partitions is determined by the processor interval, e.g. a 60s interval = 60 partitions.
Every second we export the metrics from one partition, one partition at a time, until the whole interval has passed and we cycle back to the initial partition.
Link to tracking Issue: #34906
I'm opening the PR early with a rough implementation and passing unit tests, but there's a lot to clean up before marking this as ready.
TODO:
- Remove Partitioned from config - it was just a workaround to have a clean git diff
- Remove the Processor interface and delete SimpleProcessor