Node(s) CPU architecture, OS, and Version:
Linux 5.10.0-27-cloud-amd64 #1 SMP Debian 5.10.205-2 (2023-12-31) x86_64 GNU/Linux
Cluster Configuration:
3 masters 5 agents
Describe the bug:
By default, the etcd backend quota is 2 GB. I raised it to 4 GB on all my master nodes by adding the following to /etc/rancher/rke2/config.yaml:
etcd-arg:
- "quota-backend-bytes=4294967296"
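For reference, the value above is 4 GiB expressed in bytes (4 × 1024³). etcd's default quota is 2 GiB (2147483648 bytes). The etcd startup line in the log below reports "quota-size-bytes":4294967296, so the new value did reach the failing node. Below is a minimal sketch for computing the flag value; quotaArg is a hypothetical helper, not part of RKE2 or etcd.

```go
package main

import "fmt"

// bytesPerGiB is 1024^3; etcd's quota flag takes a raw byte count.
const bytesPerGiB int64 = 1 << 30

// quotaArg is a hypothetical helper that renders the etcd-arg entry
// for a backend quota of n GiB.
func quotaArg(n int64) string {
	return fmt.Sprintf("quota-backend-bytes=%d", n*bytesPerGiB)
}

func main() {
	fmt.Println(quotaArg(2)) // etcd's default quota, 2 GiB
	fmt.Println(quotaArg(4)) // the value used in config.yaml above
}
```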
Steps To Reproduce:
Installed RKE2:
What Happened:
After I updated config.yaml, 2 of the 3 master nodes came up, but the third master node now fails to start.
Additional context / logs:
This is the etcd log from the failing master node, taken from the directory /var/log/pods/kube-system_etcd
2024-01-07T04:25:34.916035393Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/config.go:339","msg":"loaded server configuration, other configuration command line flags and environment variables will be ignored if provided","path":"/var/lib/rancher/rke2/server/db/etcd/config"}
2024-01-07T04:25:34.916101706Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--config-file=/var/lib/rancher/rke2/server/db/etcd/config"]}
2024-01-07T04:25:34.916107708Z stderr F {"level":"warn","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"config","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
2024-01-07T04:25:34.916112106Z stderr F {"level":"warn","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"name","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
2024-01-07T04:25:34.916116384Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/rancher/rke2/server/db/etcd","dir-type":"member"}
2024-01-07T04:25:34.916123317Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":[,"https://127.0.0.1:2380"]}
2024-01-07T04:25:34.916127906Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt, key = /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key, client-cert=, client-key=, trusted-ca = /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
2024-01-07T04:25:34.931013231Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.930Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":[,"https://127.0.0.1:2379"]}
2024-01-07T04:25:34.931230737Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.931Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.16.10b7","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"rke2-dev-s1-5da27fff","data-dir":"/var/lib/rancher/rke2/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/rke2/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["https://127.0.0.1:2380"],"advertise-client-urls":[],"listen-client-urls":[,"https://127.0.0.1:2379"],"listen-metrics-urls":,"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"existing","initial-cluster-token":"","quota-size-bytes":4294967296,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
2024-01-07T04:25:35.472023137Z stderr F {"level":"info","ts":"2024-01-07T04:25:35.471Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/db","took":"540.147935ms"}
2024-01-07T04:25:36.02201363Z stderr F {"level":"info","ts":"2024-01-07T04:25:36.021Z","caller":"etcdserver/server.go:508","msg":"recovered v2 store from snapshot","snapshot-index":1843155874,"snapshot-size":"408 kB"}
2024-01-07T04:25:36.065593705Z stderr F {"level":"warn","ts":"2024-01-07T04:25:36.065Z","caller":"snap/db.go:88","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":1843155874,"snapshot-file-path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/000000006ddc53a2.snap.db","error":"snap: snapshot file doesn't exist"}
2024-01-07T04:25:36.065666Z stderr F {"level":"panic","ts":"2024-01-07T04:25:36.065Z","caller":"etcdserver/server.go:515","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/go/src/go.etcd.io/etcd/server/etcdserver/server.go:515\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/go/src/go.etcd.io/etcd/server/embed/etcd.go:245\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/server/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/main.go:32\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
2024-01-07T04:25:36.067989744Z stderr F panic: failed to recover v3 backend from snapshot
2024-01-07T04:25:36.068003881Z stderr F
2024-01-07T04:25:36.06800898Z stderr F goroutine 1 [running]:
2024-01-07T04:25:36.06801401Z stderr F go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0005103c0, 0xc0000bca80, 0x1, 0x1)
2024-01-07T04:25:36.068018357Z stderr F /go/pkg/mod/go.uber.org/zap@…/zapcore/entry.go:234 +0x58d
2024-01-07T04:25:36.068022506Z stderr F go.uber.org/zap.(*Logger).Panic(0xc00002a0f0, 0x13c0efe, 0x2a, 0xc0000bca80, 0x1, 0x1)
2024-01-07T04:25:36.068026624Z stderr F /go/pkg/mod/go.uber.org/zap@…/logger.go:227 +0x85
2024-01-07T04:25:36.068031723Z stderr F go.etcd.io/etcd/server/v3/etcdserver.NewServer(0xc00018b890, 0x14, 0x0, 0x0, 0x0, 0x0, 0xc000325290, 0x1, 0x1, 0xc000324360, ...)
2024-01-07T04:25:36.068035761Z stderr F /go/src/go.etcd.io/etcd/server/etcdserver/server.go:515 +0x1656
2024-01-07T04:25:36.068040519Z stderr F go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc000030000, 0xc000030600, 0x0, 0x0)
2024-01-07T04:25:36.068044797Z stderr F /go/src/go.etcd.io/etcd/server/embed/etcd.go:245 +0xef8
2024-01-07T04:25:36.068048875Z stderr F go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0xc000030000, 0x1394c9c, 0x6, 0xc00017ac01, 0x2)
2024-01-07T04:25:36.068053123Z stderr F /go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:228 +0x32
2024-01-07T04:25:36.068057171Z stderr F go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2(0xc00003c060, 0x2, 0x2)
2024-01-07T04:25:36.068061218Z stderr F /go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:123 +0x257a
2024-01-07T04:25:36.068065186Z stderr F go.etcd.io/etcd/server/v3/etcdmain.Main(0xc00003c060, 0x2, 0x2)
2024-01-07T04:25:36.068069113Z stderr F /go/src/go.etcd.io/etcd/server/etcdmain/main.go:40 +0x13f
2024-01-07T04:25:36.068073872Z stderr F main.main()
2024-01-07T04:25:36.068078721Z stderr F go.etcd.io/etcd/server/main.go:32 +0x45
There are no snapshots in the directory
/var/lib/rancher/rke2/server/db/snapshots
but this particular file does exist:
/var/lib/rancher/rke2/server/db/etcd/member/snap/000000006ddc53a2.snap.db
What could have gone wrong, and how can I recover the failing master node?
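The file name is not arbitrary. In decimal, 1843155874 (the snapshot-index in the snap/db.go warning) is 0x6ddc53a2 in hex. etcd names database snapshot files as that index in 16 zero-padded lowercase hex digits followed by .snap.db. So the path etcd reported as missing is the same path listed above. Below is a minimal sketch that reproduces the name and stats the file from the host. snapDBPath is a hypothetical helper, and the directory and index are copied from the log.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// snapDBPath builds the file name seen in the etcd warning:
// the snapshot index as 16 zero-padded hex digits + ".snap.db".
func snapDBPath(snapDir string, index uint64) string {
	return filepath.Join(snapDir, fmt.Sprintf("%016x.snap.db", index))
}

func main() {
	// Directory and snapshot-index taken from the etcd log above.
	dir := "/var/lib/rancher/rke2/server/db/etcd/member/snap"
	p := snapDBPath(dir, 1843155874)
	fmt.Println(p)

	// Report whether the file is visible on this host, and its size.
	if fi, err := os.Stat(p); err != nil {
		fmt.Println("stat failed:", err)
	} else {
		fmt.Printf("exists, %d bytes\n", fi.Size())
	}
}
```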
You are on a very old release of rke2 that may not honor that argument properly at all times. Please upgrade to the latest v1.24 release at the very least, but preferably to a minor that is not end of life.
It also appears that you may have some corruption in your etcd datastore on one of the nodes; you may need to remove it from the cluster and rejoin it.
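To remove the broken member, you need its member ID. etcdctl member remove expects that ID in hexadecimal. By default, etcdctl member list -w json prints IDs as decimal numbers. Below is a minimal sketch that converts one to the other. It assumes the JSON shape { "members": [ { "ID": ..., "name": ... } ] }, and memberHexID is a hypothetical helper. The member name of the failing node appears in its etcd log ("name":"rke2-dev-s1-5da27fff").

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
)

// memberList models only the fields this sketch needs from
// `etcdctl member list -w json` (assumed output shape).
type memberList struct {
	Members []struct {
		ID   uint64 `json:"ID"`
		Name string `json:"name"`
	} `json:"members"`
}

// memberHexID returns the hex-encoded ID of the named member.
func memberHexID(raw []byte, name string) (string, error) {
	var ml memberList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return "", err
	}
	for _, m := range ml.Members {
		if m.Name == name {
			return strconv.FormatUint(m.ID, 16), nil
		}
	}
	return "", fmt.Errorf("member %q not found", name)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: memberid <member-list.json> <member-name>")
		os.Exit(2)
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	id, err := memberHexID(raw, os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(id) // value to pass to `etcdctl member remove`
}
```

Save the JSON from a healthy server, then run the sketch with that file and the failing node's member name. The printed hex value is what etcdctl member remove takes.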
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Environmental Info:
RKE2 Version: v1.24.4+rke2r1