Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Etcd snapshot controller spews logs while waiting for nodes to annotate #8892

Closed
brandond opened this issue Nov 17, 2023 · 6 comments
Closed
Assignees
Milestone

Comments

@brandond
Copy link
Member

brandond commented Nov 17, 2023

Early in cluster startup, before any etcd nodes have reconciled their snapshots and set the annotations, the full sync loop will continually return errNotReconciled and be immediately reenqueued:

Unfortunately this creates a LOT of log spew, especially on RKE2 where startup takes a little longer to settle out.

We should move the log line further down the function, or reduce its verbosity.

@brandond
Copy link
Member Author

brandond commented Nov 17, 2023

A good example of this is available in the logs from rancher/rke2#5048

It continues on like this for a while until things settle down.

Nov 08 13:09:50 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:50Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:50 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:50Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:51 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:51Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:51 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:51Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:52 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:52Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:52 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:52Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:53 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:53Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:53 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:53Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:53 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:53Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:54 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:54Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:54 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:54Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:55 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:55Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:55 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:55Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:55 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:55Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:56 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:56Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:56 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:56Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:57 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:57Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:57 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:57Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:57 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:57Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:58 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:58Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:58 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:58Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:59 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:59Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:09:59 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:09:59Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:10:00 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:10:00Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:10:00 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:10:00Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"
Nov 08 13:10:00 test-node-xr6bx rke2[379]: time=\"2023-11-08T13:10:00Z\" level=info msg=\"Reconciling snapshot ConfigMap data\"

@mdrahman-suse
Copy link

Hi @brandond, is this issue and the corresponding backports ready to be moved to Test? I see the PR linked has been merged

@brandond brandond self-assigned this Nov 28, 2023
@brandond
Copy link
Member Author

Yes, sorry - I apparently missed assigning it to myself so I didn't see it in the project board.

@brandond brandond moved this from Peer Review to To Test in K3s Development Nov 28, 2023
@rancher-max rancher-max self-assigned this Nov 28, 2023
@rancher-max
Copy link
Contributor

I'm wondering if whatever rancher manager is doing in their CI is causing this because I cannot reproduce it whether through rancher or standalone, even when having the etcd-snapshots tuned to take one every minute. I have tried all versions available. I will confirm no regressions to this logging and then close this issue. Using journalctl -u rke2-server | grep "Reconciling snapshot ConfigMap data" for rke2 and journalctl -u k3s | grep "Reconciling snapshot ConfigMap data" for k3s.

@brandond
Copy link
Member Author

@rancher-max you'll see this spew when using embedded etcd, but no snapshots exist - either locally, or in the configured S3 bucket.

@rancher-max
Copy link
Contributor

rancher-max commented Nov 28, 2023

Validated on v1.28.4-rc1+k3s1

Environment Details

Infrastructure

  • Cloud (AWS)
  • Hosted

Node(s) CPU architecture, OS, and Version:

PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Cluster Configuration:

1 server

Config.yaml:

write-kubeconfig-mode: 644
cluster-init: true
etcd-s3: true
etcd-s3-bucket: mybucket
etcd-s3-folder: myfolder
etcd-s3-region: us-east-2
etcd-s3-access-key: <redacted>
etcd-s3-secret-key: <redacted>
token: sometoken

Testing Steps

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  1. Install k3s
$ curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.4-rc1+k3s1 sh -
  1. Check logs for unnecessary spew:
$ journalctl -u k3s | grep "Reconciling snapshot ConfigMap data"
Nov 28 23:09:32 ip-172-31-46-231 k3s[1605]: time="2023-11-28T23:09:32Z" level=info msg="Reconciling snapshot ConfigMap data"

Replication Results:

  • k3s version used for replication:
k3s version v1.28.3+k3s2 (bbafb86e)
go version go1.20.10
  • I was not able to replicate the issue
$ journalctl -u k3s | grep "Reconciling snapshot ConfigMap data"
Nov 28 23:09:21 ip-172-31-37-124 k3s[1598]: time="2023-11-28T23:09:21Z" level=info msg="Reconciling snapshot ConfigMap data"

Validation Results:

  • k3s version used for validation:
k3s version v1.28.4-rc1+k3s1 (3f237230)
go version go1.20.11
  • The results were the same:
$ journalctl -u k3s | grep "Reconciling snapshot ConfigMap data"
Nov 28 23:09:32 ip-172-31-46-231 k3s[1605]: time="2023-11-28T23:09:32Z" level=info msg="Reconciling snapshot ConfigMap data"
  • No regressions noticed. Attempted in SLES15 SP4 and RHEL 9.2 as well with multinode setups and without s3 snapshots configured
  • Also attempted with the same k8s versions in rke2
  • In one case during validation, I saw two logs of this spit out but never more than that which should be acceptable.

Based on the above, I'm going to close this out. I think the log spew must appear under different circumstances, and I'm not noticing any base functionality changes here so should be okay either way.

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants