failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; #408

Open
jicki opened this issue Oct 29, 2024 · 0 comments
Labels
bug Something isn't working

jicki commented Oct 29, 2024

What is the version?

3.3.8-3.6.0-ubuntu22.04

What happened?

dcgm-exporter-m9prp   0/1     CrashLoopBackOff
Logs:
time="2024-10-29T09:58:01Z" level=error msg="Failed to write response." error="failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724376 vs. 4194304)"
2024/10/29 09:58:01 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-10-29T09:58:26Z" level=error msg="Failed to collect metrics; err: failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724552 vs. 4194304)"
time="2024-10-29T09:58:31Z" level=error msg="Failed to write response." error="failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724492 vs. 4194304)"
2024/10/29 09:58:31 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-10-29T09:58:32Z" level=error msg="Failed to write response." error="failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724492 vs. 4194304)"
2024/10/29 09:58:32 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-10-29T09:58:56Z" level=error msg="Failed to collect metrics; err: failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724376 vs. 4194304)"
time="2024-10-29T09:59:01Z" level=error msg="Failed to write response." error="failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724376 vs. 4194304)"
2024/10/29 09:59:01 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-10-29T09:59:02Z" level=error msg="Failed to write response." error="failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724376 vs. 4194304)"
2024/10/29 09:59:02 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-10-29T09:59:26Z" level=error msg="Failed to collect metrics; err: failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4724492 vs. 4194304)"
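
A note for triage: the limit reported in every line, 4194304 bytes, is exactly 4 MiB, which is gRPC's default maximum receive message size, and the rejected responses are about 518 KiB over it. The following is a minimal Python sketch for extracting those numbers from log lines like the ones above. The helper `parse_overflow` and the constant name are hypothetical and not part of dcgm-exporter.

```python
import re

# gRPC's default maximum receive message size is 4 MiB.
DEFAULT_GRPC_MAX_RECV_BYTES = 4 * 1024 * 1024

# Matches the "(<received> vs. <limit>)" part of the gRPC error text.
SIZE_PATTERN = re.compile(r"received message larger than max \((\d+) vs\. (\d+)\)")


def parse_overflow(log_line):
    """Return (received, limit, excess) in bytes, or None if no size error is present."""
    match = SIZE_PATTERN.search(log_line)
    if match is None:
        return None
    received, limit = int(match.group(1)), int(match.group(2))
    return received, limit, received - limit


# Tail of one error line, copied from the logs in this issue.
sample = (
    "err: rpc error: code = ResourceExhausted desc = "
    "grpc: received message larger than max (4724376 vs. 4194304)"
)

received, limit, excess = parse_overflow(sample)
print(f"received={received} limit={limit} excess={excess}")
print("limit is the gRPC default:", limit == DEFAULT_GRPC_MAX_RECV_BYTES)
```

Running this over all of the lines above gives excess values between 530072 and 530248 bytes, so the response size is stable rather than growing between scrapes.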

What did you expect to happen?

dcgm-exporter should be running normally.

What is the GPU model?

No response

What is the environment?

No response

How did you deploy the dcgm-exporter and what is the configuration?

No response

How to reproduce the issue?

No response

Anything else we need to know?

No response

jicki added the bug label Oct 29, 2024