Exporter time series errors do not include metric names #443
Comments
We see this frequently as well. The API reports it as duplicate time series, when really multiple systems are sending identical metrics without a uniquely identifying resource attribute such as host.name.
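Cloud Monitoring identifies a time series by its metric type, its metric labels, and its monitored resource. The stdlib-only sketch below flattens all of those into one label map to show why two hosts sending identical data collide, and why adding `host.name` separates them. `seriesKey`, `duplicateKeys`, and the metric type string are hypothetical names for illustration, not part of the exporter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a simplified identity for a time series from its metric
// type and labels (hypothetical helper; resource and metric labels are merged).
func seriesKey(metricType string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable order so equal label sets give equal keys
	var b strings.Builder
	b.WriteString(metricType)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%s", k, labels[k])
	}
	return b.String()
}

// duplicateKeys returns each key that appears more than once, reported once.
func duplicateKeys(keys []string) []string {
	seen := map[string]int{}
	var dups []string
	for _, k := range keys {
		seen[k]++
		if seen[k] == 2 {
			dups = append(dups, k)
		}
	}
	return dups
}

func main() {
	metric := "workload.googleapis.com/requests" // hypothetical metric type
	// Two hosts report the same metric with identical labels: they collide.
	withoutHost := []string{
		seriesKey(metric, map[string]string{"service.name": "api"}),
		seriesKey(metric, map[string]string{"service.name": "api"}),
	}
	// Adding host.name gives each host its own series.
	withHost := []string{
		seriesKey(metric, map[string]string{"service.name": "api", "host.name": "vm-1"}),
		seriesKey(metric, map[string]string{"service.name": "api", "host.name": "vm-2"}),
	}
	fmt.Println(len(duplicateKeys(withoutHost)), len(duplicateKeys(withHost)))
}
```

Running it prints `1 0`: one colliding series without `host.name`, none with it.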
Regarding that second one, and really all errors in general, it is extremely helpful to know which metric is causing the error, especially in environments with hundreds of metrics.
I am running into this again today. The issue is clearly in my configuration, but it is impossible to narrow down: the error I am getting exceeds 65k characters.
Unfortunately, this long error comes from the Cloud Monitoring Metrics API. The only way to solve it in this project would be to parse the error message and attempt to produce a better one. Instead, we'll escalate this to the Cloud Monitoring API team.
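For reference, the "parse the error message" approach could look roughly like the sketch below. It assumes the API error text refers to rejected entries by their position in the request, as `timeSeries[3]` or as a range like `timeSeries[0-39]`; that format is an assumption, not a documented contract, and `failedIndices` and `metricTypesFor` are hypothetical helpers. The sample message in `main` is invented.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// indexRef matches "timeSeries[3]" or "timeSeries[0-39]" (assumed format).
var indexRef = regexp.MustCompile(`timeSeries\[(\d+)(?:-(\d+))?\]`)

// failedIndices returns the sorted, de-duplicated request indices that the
// error message refers to, expanding ranges.
func failedIndices(msg string) []int {
	set := map[int]bool{}
	for _, m := range indexRef.FindAllStringSubmatch(msg, -1) {
		lo, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		hi := lo
		if m[2] != "" {
			if v, err := strconv.Atoi(m[2]); err == nil {
				hi = v
			}
		}
		for i := lo; i <= hi; i++ {
			set[i] = true
		}
	}
	out := make([]int, 0, len(set))
	for i := range set {
		out = append(out, i)
	}
	sort.Ints(out)
	return out
}

// metricTypesFor maps indices back to the distinct metric types in the request.
func metricTypesFor(indices []int, requestTypes []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, i := range indices {
		if i < 0 || i >= len(requestTypes) {
			continue // index outside this request; ignore it
		}
		if t := requestTypes[i]; !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Metric types in the order they were sent in the request (hypothetical).
	request := []string{"custom.googleapis.com/a", "custom.googleapis.com/b", "custom.googleapis.com/c"}
	msg := "sample error: Field timeSeries[1].points[0] had an invalid value; timeSeries[2] duplicate"
	fmt.Println(metricTypesFor(failedIndices(msg), request))
}
```

This keeps the full request around until the response arrives, which the exporter would need anyway to map indices back to names. If the API's wording changes, the regular expression silently stops matching, which is the main reason to prefer a fix in the API itself.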
Any progress on this? I encountered it again yesterday.
Sorry, still no updates. I'll check with the Cloud Monitoring API team again to see if they have any.
Still no updates.
@dashpole Thanks. We're still seeing this, so it remains an important issue.
Still no updates. If others run into the same issue, feel free to give the original comment a thumbs up.
@dashpole We still see this regularly, and it actually causes a deeper issue. If the persistent queue is enabled, these failures are put into the queue and retried over and over again. This causes problems everywhere: repeated API failures, extreme log growth, and, of course, a persistent queue that keeps growing on disk.
If you are using the collector exporter, we do not recommend enabling the retry on failure setting (which we default to false). The exporter (really, the Cloud Monitoring client library) already has a relatively intelligent retry mechanism built in, which should avoid spamming logs. This issue only tracks making the error response more helpful. If you are experiencing other issues, feel free to open a new issue in this repo.
As you say, those other issues are preventable by tuning settings. However, I was not aware that the library had a separate retry mechanism. We should probably have an internal discussion about this. Thank you for the insight.
Source for the retry settings built into the client: https://github.com/googleapis/google-cloud-go/blob/main/monitoring/apiv3/metric_client.go#L63. The settings differ per API call; CreateTimeSeries does not retry.
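To illustrate what "different per API call" means, here is a simplified stdlib sketch of per-method retry settings. The policy table, `codedError`, and `call` are hypothetical, and the attempt count and backoff values are placeholders rather than values copied from the linked file (the real client configures retries through its own call options). The one property taken from the comment above is that CreateTimeSeries has no retry entry, so it is attempted exactly once.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryPolicy is a simplified stand-in for per-method retry settings.
type retryPolicy struct {
	retryOn     map[string]bool // status codes treated as transient
	maxAttempts int
	initial     time.Duration
	multiplier  float64
}

// Placeholder policies. CreateTimeSeries is deliberately absent: no retries.
var policies = map[string]retryPolicy{
	"ListMetricDescriptors": {
		retryOn:     map[string]bool{"Unavailable": true},
		maxAttempts: 3,
		initial:     time.Millisecond,
		multiplier:  1.3,
	},
}

// codedError carries a status code name (hypothetical).
type codedError struct{ code string }

func (e codedError) Error() string { return e.code }

// call runs fn under the policy for method and reports how many attempts ran.
func call(method string, fn func() error) (int, error) {
	p, ok := policies[method]
	if !ok {
		return 1, fn() // no policy: single attempt
	}
	delay := p.initial
	var err error
	for attempt := 1; attempt <= p.maxAttempts; attempt++ {
		err = fn()
		var ce codedError
		if err == nil || !errors.As(err, &ce) || !p.retryOn[ce.code] {
			return attempt, err // success or non-retryable failure
		}
		if attempt < p.maxAttempts {
			time.Sleep(delay) // exponential backoff between attempts
			delay = time.Duration(float64(delay) * p.multiplier)
		}
	}
	return p.maxAttempts, err
}

func main() {
	unavailable := func() error { return codedError{"Unavailable"} }
	n1, _ := call("ListMetricDescriptors", unavailable)
	n2, _ := call("CreateTimeSeries", unavailable)
	fmt.Println(n1, n2)
}
```

Running it prints `3 1`. Not retrying writes avoids sending the same points twice, which is one plausible reason a write call would be configured differently from read-only calls.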
When working with the Google exporter, it would be nice if time series errors returned the names of the metrics being rejected by the API. A system will sometimes have hundreds of metrics, with only a subset of them rejected. This is very difficult to track down, as it requires the user to open Metrics Explorer and inspect every single metric to find the ones with spotty data.
Error
This error indicates a real problem, but it does not include the names of the metrics being rejected.