[1.12.1] node-agent metrics missing #7014

Closed
HaveFun83 opened this issue Oct 25, 2023 · 8 comments
Labels: Metrics (Related to prometheus metrics), target/1.12.2


@HaveFun83

HaveFun83 commented Oct 25, 2023

What steps did you take and what happened:

Since the rollout of Velero 1.12.0, the node-agent Prometheus metrics have been missing:

kubectl port-forward -n velero node-agent-cwhjc 8085
curl localhost:8085/metrics
curl: (52) Empty reply from server

Prometheus reports "connect: connection refused" when scraping the node-agent pods.
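
The manual curl check above can be repeated for more than one candidate port once a port-forward is running. This is a minimal Python sketch, not part of Velero; the helper names `metrics_reachable` and `first_reachable` are made up for illustration, and it assumes the endpoint is served over plain HTTP at `/metrics`:

```python
import http.client
import urllib.error
import urllib.request


def metrics_reachable(port, host="localhost", timeout=2.0):
    """Return True if GET http://host:port/metrics answers with HTTP 200."""
    url = f"http://{host}:{port}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, empty reply, timeout, or a non-2xx status.
        return False


def first_reachable(ports, host="localhost", timeout=2.0):
    """Return the first port that serves /metrics, or None if none does."""
    for port in ports:
        if metrics_reachable(port, host, timeout):
            return port
    return None
```

For example, after forwarding the ports of interest with `kubectl port-forward`, `first_reachable([8085])` returns `None` in the situation described here, since nothing answers on that port.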

What did you expect to happen:

The node-agent Prometheus metrics are available on the expected port.

Anything else you would like to add:

Environment:

  • Velero version (use velero version): v1.12.0 / v1.12.1
  • Velero features (use velero client config get features): -
  • Kubernetes version (use kubectl version): v1.27.4
  • Kubernetes installer & version: Kubeadm
  • Cloud provider or hardware configuration: hardware
  • OS (e.g. from /etc/os-release): Fedora CoreOS 38.20231002.3.1

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
HaveFun83 changed the title from "[1.12.0] node-agent metrics missing" to "[1.12.1] node-agent metrics missing" on Oct 25, 2023
@HaveFun83
Author

Found the issue: since release 1.12.0, the node-agent metrics server listens on port 8080.
Log from a node-agent:

time="2023-10-25T10:16:27Z" level=info msg="Setting log-level to INFO"
time="2023-10-25T10:16:27Z" level=info msg="Starting Velero node-agent server v1.12.1 (5c4fdfe147357ec7b908339f4516cd96d6b97c61-dirty)" logSource="pkg/cmd/cli/nodeagent/server.go:103"
1.698228987275461e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
time="2023-10-25T10:16:27Z" level=info msg="Starting metric server for node agent at address []" logSource="pkg/cmd/cli/nodeagent/server.go:229"
time="2023-10-25T10:16:39Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/nodeagent/server.go:245"
time="2023-10-25T10:16:39Z" level=info msg="Controllers starting..." logSource="pkg/cmd/cli/nodeagent/server.go:287"
1.698228999510614e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}

But the pod spec still points to 8085:

    ports:
    - containerPort: 8085
      name: http-monitoring
      protocol: TCP
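
The mismatch above (metrics server listening on 8080, pod spec declaring 8085) can also be checked mechanically. This is a minimal Python sketch that assumes the controller-runtime log format quoted above; the function names `metrics_port_from_log` and `port_mismatch` are hypothetical and not part of Velero:

```python
import re

# Matches the controller-runtime line quoted in the log above, e.g.
#   ... Metrics server is starting to listen {"addr": ":8080"}
LISTEN_RE = re.compile(
    r'Metrics server is starting to listen.*"addr":\s*"[^"]*:(\d+)"'
)


def metrics_port_from_log(log_text):
    """Return the port the metrics server reports listening on, or None."""
    for line in log_text.splitlines():
        match = LISTEN_RE.search(line)
        if match:
            return int(match.group(1))
    return None


def port_mismatch(log_text, declared_ports):
    """Return the listen port if it is not among the declared ports, else None."""
    port = metrics_port_from_log(log_text)
    if port is not None and port not in declared_ports:
        return port
    return None
```

Feeding it the node-agent log together with the `containerPort` values from the pod spec (here `[8085]`) flags the port that the scrape configuration would need to target instead.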

@HaveFun83
Author

This may be related to 069c280.

@qiuming-best
Contributor

What you describe is a bug that was not fixed in v1.12.1. Commit 069c280 has been merged into the main branch only.

If you use the velero/velero:main image, it will behave as you expect.

qiuming-best added the Metrics (Related to prometheus metrics) label on Oct 26, 2023
@HaveFun83
Author

@qiuming-best Thanks for your reply.
However, we run Velero in production environments and need stable, reliable releases.
Can 069c280 be cherry-picked for the next release?

@HaveFun83
Author

@qiuming-best @allenxu404
any news here?

@allenxu404
Contributor

Given that this issue did not originate from 1.12, we recommend including the fix(#6784) in Velero 1.13 rather than cherry-picking it into the 1.12 minor version.

@HaveFun83
Author

> Given that this issue did not originate from 1.12, we recommend including the fix(#6784) in Velero 1.13 rather than cherry-picking it into the 1.12 minor version.

This issue does originate from the 1.12 release. In the 1.11 release, the node-agent had a working metrics endpoint.

@allenxu404
Contributor

OK, if that's the case, I think we should cherry-pick it to 1.12.2.

@qiuming-best Could you add the target/1.12.2 label to this issue? I will cherry-pick the fix when we release 1.12.2.
