Remove replace directive for golang.org/x/exp #5972
Conversation
Looks like we got rid of a lot of cruft! Awesome
@mattdurham unfortunately the Linux build doesn't work because Pyroscope's eBPF module has a replace directive for the same `golang.org/x/exp` dependency.
I raised a PR for Pyroscope.
(force-pushed from fc61ae3 to ea5d929)
I had to upgrade our Prometheus dependency, and this became a much bigger PR than expected.
| Name | Type | Description | Default | Required |
| ---- | ---- | ----------- | ------- | -------- |
| `enable_http2` | `bool` | Whether HTTP2 is supported for requests. | `true` | no |
| `honor_labels` | `bool` | Whether the scraped metrics should remain unmodified. | `false` | no |
| `honor_timestamps` | `bool` | Whether the scraped timestamps should be respected. | `true` | no |
| `track_timestamps_staleness` | `bool` | Whether to track the staleness of the scraped timestamps. | `false` | no |
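As a sketch of how the arguments in the table above might appear in a `prometheus.scrape` block (the block label, target address, and `forward_to` receiver are placeholders, not from this PR):

```river
// Hypothetical scrape configuration illustrating the documented arguments.
// The values shown here are the defaults from the table above.
prometheus.scrape "example" {
  targets    = [{"__address__" = "127.0.0.1:9090"}]
  forward_to = [prometheus.remote_write.default.receiver]

  enable_http2     = true
  honor_labels     = false
  honor_timestamps = true
}
```

`track_timestamps_staleness` is omitted here, since it was later taken out of this PR.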
@bboreham Regarding #5921 - for now I intend to default this to `false` for a few reasons:
- IIUC, `track_timestamps_staleness = true` only makes sense if out-of-order ingestion is allowed in the back-end database.
- Apparently there are implications for querying and alerting.

Please let me know if you disagree. I'm open to changing this to default to `true` prior to merging. If it causes a backwards-incompatible change for users, we'll have to mention it explicitly in our changelog and write docs with steps on what users need to change to get the behaviour they need (e.g. how to change their alerts or dashboards).
We usually list backwards incompatible changes here:
https://grafana.com/docs/agent/latest/flow/release-notes/
> IIUC, `track_timestamps_staleness = true` only makes sense if out-of-order ingestion is allowed in the back-end database.
Not as far as I know. This is to fix the long-standing irritation where cAdvisor metrics linger on for 5 minutes after the pod has gone.
The blog you cite relates to the last change in staleness handling 5 years ago; I don't think it is relevant here.
@bboreham I think the assumption behind setting `track_timestamps_staleness = true` is that if the scraper didn't get any samples, then that must be because there aren't any. But is this a good assumption to make in the general case? If a sample is exposed via explicit timestamps, and if there is no new value to report, then what sample should be exposed for its series the next time a scrape happens? Is the convention to just report the same value with a new timestamp?
I think it makes sense to default `track_timestamps_staleness` to `true` if:
- The convention is that the absence of a sample is considered enough evidence to decide that the series is stale.
- There is no need to enable out of order ingestion. This is so that we prevent a situation where a timestamp arrives late (e.g. because it takes a long time to generate) but it can't be ingested in the TSDB because there is already a staleness marker with a more recent timestamp.
This explicit-timestamp feature seems like a way to "push" metrics, so I'm not sure what assumptions are ok to make. I suspect that agents in "push" systems like OTel don't just declare a series stale if no samples were pushed within a certain time.
Hi @bboreham, would you mind getting back to us on the comment above please?
If no sample is supplied for a timestamp, PromQL (at query time) will use the preceding value up to 5 minutes old.
This creates a long-standing issue with cAdvisor (i.e. Kubelet container metrics).
It's got nothing to do with "push".
There is no expectation that explicit timestamps come out of order. This never worked historically in Prometheus, and there is no reason to suppose people started sending them.
> But is this a good assumption to make in the general case?
Yes, it is the standard behaviour when exporters do not supply the timestamp. Which is the vast majority.
By the way, I don't think we should change this default in the middle of an 800-line PR doing other things.
I can make a separate PR to fix it.
Ok, thank you, I'll take the `track_timestamps_staleness` parameter out of this PR. We can introduce it in a different PR. I don't want to add it now and change its default value later, because `prometheus.scrape` is a stable component and its defaults aren't meant to change often.
(force-pushed from ed0925d to d94e489)
looks good, just a few nits, thanks for taking care of this :)
- `SERVICE`: The OVHcloud service of the targets to retrieve.
- `PROMETHEUS_REMOTE_WRITE_URL`: The URL of the Prometheus remote_write-compatible server to send metrics to.
- `USERNAME`: The username to use for authentication to the remote_write API.
- `PASSWORD`: The password to use for authentication to the remote_write API.
nit: for the example, I would suggest also setting `refresh_interval` and `endpoint`, and using some realistic-looking data instead of the placeholders (the placeholders are already used in the ## Usage part).
Yes, I agree. I also don't like how here we repeat the definitions of the arguments. I did it this way because it's consistent with other discovery components, but I agree we should change this for all discovery components at a later point.
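A sketch of what the reviewer's suggestion could look like: a `discovery.ovhcloud` example with `endpoint` and `refresh_interval` set and realistic-looking (but invented) credential values. The exact argument set should be checked against the final component documentation.

```river
// Hypothetical example block; all credential values are placeholders.
discovery.ovhcloud "example" {
  endpoint           = "ovh-eu"
  application_key    = "Ax3Kp9examplekey"
  application_secret = "bGt7example5ecret"
  consumer_key       = "mN1exampleconsumer"
  service            = "dedicated_server"
  refresh_interval   = "60s"
}
```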
LGTM, this was a chunky one! Approving this to unblock you so we can move ahead once CI is green, and the discussion with Bryan has settled.
lgtm
Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD.
Add discovery.ovhcloud to "targets" compatible components.
Co-authored-by: William Dumont <[email protected]>
(force-pushed from 35ad5f9 to b1a9911)
@tpaschalis @mattdurham @wildum I am re-requesting a review, because I rebased the branch and, as discussed, removed the `track_timestamps_staleness` argument.
If you agree that it's ok to not support this argument for now, please feel free to approve the PR again.
LGTM! Let's follow up with discussion for the new argument in a new PR.
* Remove replace directive for golang.org/x/exp
* Update pyroscope/ebpf from 0.4.0 to 0.4.1
* Fill in missing docs about HTTP client options. Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD.
* Add discovery.ovhcloud
* Add a converter for discovery.ovhcloud
* Update cloudwatch_exporter docs
* Fix converter tests
* Mention Prometheus update in the changelog.

Co-authored-by: William Dumont <[email protected]>
Removing a replace directive for `golang.org/x/exp` from the Agent's go.mod file.

This is necessary because on a separate branch I am upgrading the Agent to a new OpenTelemetry version, and it requires a new version of `github.com/grafana/loki/pkg/push`, which needs the latest `golang.org/x/exp`.

The reason why `golang.org/x/exp` has been problematic is that the return type of `SortFunc` changed from `bool` (in the old version) to `int` (in the new version). Apparently some packages like to use `golang.org/x/exp` because it's more performant.

Fixes #5921