failed to recover v3 backend from snapshot #5214

Closed
sachinshakya507 opened this issue Jan 7, 2024 · 2 comments


Environmental Info:
RKE2 Version: v1.24.4+rke2r1

Node(s) CPU architecture, OS, and Version:
Linux 5.10.0-27-cloud-amd64 #1 SMP Debian 5.10.205-2 (2023-12-31) x86_64 GNU/Linux

Cluster Configuration:
3 masters 5 agents

Describe the bug:
By default, the etcd backend quota is 2 GB. I raised it to 4 GB on all my master nodes by editing /etc/rancher/rke2/config.yaml:

etcd-arg:
  - "quota-backend-bytes=4294967296"
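As a quick sanity check on the number in that argument, 4294967296 is 4 GiB expressed in bytes, and it matches the `"quota-size-bytes":4294967296` field in the etcd startup log further down. A minimal sketch of the arithmetic follows; the helper name `gib_to_bytes` is made up for illustration and is not part of RKE2 or etcd.

```python
def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes (1 GiB = 1024**3 bytes)."""
    return gib * 1024 ** 3


default_quota = gib_to_bytes(2)  # the 2 GB default mentioned above
new_quota = gib_to_bytes(4)      # the value passed via etcd-arg

print(default_quota)  # 2147483648
print(new_quota)      # 4294967296
```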

Steps To Reproduce:

  • Installed RKE2:

What Happened:
After updating my config.yaml file, 2 of the 3 master nodes came up, but one master node is now failing to start.

Additional context / logs:
This is the etcd log from the failing master node, taken from the directory /var/log/pods/kube-system_etcd

2024-01-07T04:25:34.916035393Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/config.go:339","msg":"loaded server configuration, other configuration command line flags and environment variables will be ignored if provided","path":"/var/lib/rancher/rke2/server/db/etcd/config"}
2024-01-07T04:25:34.916101706Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--config-file=/var/lib/rancher/rke2/server/db/etcd/config"]}
2024-01-07T04:25:34.916107708Z stderr F {"level":"warn","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"config","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
2024-01-07T04:25:34.916112106Z stderr F {"level":"warn","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"name","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
2024-01-07T04:25:34.916116384Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/rancher/rke2/server/db/etcd","dir-type":"member"}
2024-01-07T04:25:34.916123317Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":[,"https://127.0.0.1:2380"]}
2024-01-07T04:25:34.916127906Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.913Z","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt, key = /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key, client-cert=, client-key=, trusted-ca = /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
2024-01-07T04:25:34.931013231Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.930Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":[,"https://127.0.0.1:2379"]}
2024-01-07T04:25:34.931230737Z stderr F {"level":"info","ts":"2024-01-07T04:25:34.931Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.16.10b7","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"rke2-dev-s1-5da27fff","data-dir":"/var/lib/rancher/rke2/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/rke2/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["https://127.0.0.1:2380"],"advertise-client-urls":[],"listen-client-urls":[,"https://127.0.0.1:2379"],"listen-metrics-urls":,"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"existing","initial-cluster-token":"","quota-size-bytes":4294967296,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
2024-01-07T04:25:35.472023137Z stderr F {"level":"info","ts":"2024-01-07T04:25:35.471Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/db","took":"540.147935ms"}
2024-01-07T04:25:36.02201363Z stderr F {"level":"info","ts":"2024-01-07T04:25:36.021Z","caller":"etcdserver/server.go:508","msg":"recovered v2 store from snapshot","snapshot-index":1843155874,"snapshot-size":"408 kB"}
2024-01-07T04:25:36.065593705Z stderr F {"level":"warn","ts":"2024-01-07T04:25:36.065Z","caller":"snap/db.go:88","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":1843155874,"snapshot-file-path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/000000006ddc53a2.snap.db","error":"snap: snapshot file doesn't exist"}
2024-01-07T04:25:36.065666Z stderr F {"level":"panic","ts":"2024-01-07T04:25:36.065Z","caller":"etcdserver/server.go:515","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/go/src/go.etcd.io/etcd/server/etcdserver/server.go:515\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/go/src/go.etcd.io/etcd/server/embed/etcd.go:245\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/server/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/main.go:32\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
2024-01-07T04:25:36.067989744Z stderr F panic: failed to recover v3 backend from snapshot
2024-01-07T04:25:36.068003881Z stderr F 
2024-01-07T04:25:36.06800898Z stderr F goroutine 1 [running]:
2024-01-07T04:25:36.06801401Z stderr F go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0005103c0, 0xc0000bca80, 0x1, 0x1)
2024-01-07T04:25:36.068018357Z stderr F 	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:234 +0x58d
2024-01-07T04:25:36.068022506Z stderr F go.uber.org/zap.(*Logger).Panic(0xc00002a0f0, 0x13c0efe, 0x2a, 0xc0000bca80, 0x1, 0x1)
2024-01-07T04:25:36.068026624Z stderr F 	/go/pkg/mod/go.uber.org/[email protected]/logger.go:227 +0x85
2024-01-07T04:25:36.068031723Z stderr F go.etcd.io/etcd/server/v3/etcdserver.NewServer(0xc00018b890, 0x14, 0x0, 0x0, 0x0, 0x0, 0xc000325290, 0x1, 0x1, 0xc000324360, ...)
2024-01-07T04:25:36.068035761Z stderr F 	/go/src/go.etcd.io/etcd/server/etcdserver/server.go:515 +0x1656
2024-01-07T04:25:36.068040519Z stderr F go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc000030000, 0xc000030600, 0x0, 0x0)
2024-01-07T04:25:36.068044797Z stderr F 	/go/src/go.etcd.io/etcd/server/embed/etcd.go:245 +0xef8
2024-01-07T04:25:36.068048875Z stderr F go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0xc000030000, 0x1394c9c, 0x6, 0xc00017ac01, 0x2)
2024-01-07T04:25:36.068053123Z stderr F 	/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:228 +0x32
2024-01-07T04:25:36.068057171Z stderr F go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2(0xc00003c060, 0x2, 0x2)
2024-01-07T04:25:36.068061218Z stderr F 	/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:123 +0x257a
2024-01-07T04:25:36.068065186Z stderr F go.etcd.io/etcd/server/v3/etcdmain.Main(0xc00003c060, 0x2, 0x2)
2024-01-07T04:25:36.068069113Z stderr F 	/go/src/go.etcd.io/etcd/server/etcdmain/main.go:40 +0x13f
2024-01-07T04:25:36.068073872Z stderr F main.main()
2024-01-07T04:25:36.068078721Z stderr F 	go.etcd.io/etcd/server/main.go:32 +0x45
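A log like this is easier to scan when filtered down to the entries at `warn` level or above. The sketch below assumes each line has the shape seen above (timestamp, stream, a one-letter tag, then the payload) and that etcd's own entries are JSON objects with `level`, `caller`, and `msg` keys. The function name `important_entries` is hypothetical, and the sample holds three lines copied from the log above.

```python
import json

SEVERE = {"warn", "error", "panic", "fatal"}


def important_entries(lines):
    """Yield (level, caller, msg) for JSON log entries at warn level or above."""
    for line in lines:
        # Layout: "<timestamp> <stream> <tag> <payload>"
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4:
            continue
        try:
            entry = json.loads(parts[3])
        except json.JSONDecodeError:
            continue  # plain-text lines, e.g. the Go panic trace
        if isinstance(entry, dict) and entry.get("level") in SEVERE:
            yield entry["level"], entry.get("caller"), entry.get("msg")


SAMPLE = '''\
2024-01-07T04:25:36.02201363Z stderr F {"level":"info","ts":"2024-01-07T04:25:36.021Z","caller":"etcdserver/server.go:508","msg":"recovered v2 store from snapshot","snapshot-index":1843155874,"snapshot-size":"408 kB"}
2024-01-07T04:25:36.065593705Z stderr F {"level":"warn","ts":"2024-01-07T04:25:36.065Z","caller":"snap/db.go:88","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":1843155874,"snapshot-file-path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/000000006ddc53a2.snap.db","error":"snap: snapshot file doesn't exist"}
2024-01-07T04:25:36.067989744Z stderr F panic: failed to recover v3 backend from snapshot
'''.splitlines()

for level, caller, msg in important_entries(SAMPLE):
    print(level, caller, msg)
```

Payloads that are not valid JSON are skipped rather than treated as errors, so the plain-text stack trace does not break the filter.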

There are no snapshots in this directory:
/var/lib/rancher/rke2/server/db/snapshots
But I do have this particular file:
/var/lib/rancher/rke2/server/db/etcd/member/snap/000000006ddc53a2.snap.db

What could have gone wrong, and how can I save my failing master node?
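For reference, the file name in the warning appears to be the snapshot index written as 16 zero-padded hexadecimal digits: the log reports `snapshot-index` 1843155874 alongside `000000006ddc53a2.snap.db`. The sketch below reproduces that mapping. The format is inferred from this one log line, not checked against the etcd source, and `snap_db_name` is a hypothetical helper.

```python
def snap_db_name(snapshot_index: int) -> str:
    """Build the .snap.db file name etcd appears to look for at a given index."""
    # Assumption: 16 lowercase hex digits, zero-padded, as seen in the log.
    return f"{snapshot_index:016x}.snap.db"


print(snap_db_name(1843155874))  # 000000006ddc53a2.snap.db
```

This can help when matching files in the `member/snap` directory against the index named in the panic.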

brandond (Member) commented Jan 7, 2024

You are on a very old release of rke2 that may not honor that argument properly at all times. Please upgrade to the latest v1.24 release at the very least, but preferably to a minor that is not end of life.

It also appears that you may have some corruption in your etcd datastore on one of the nodes; you may need to remove it from the cluster and rejoin it.

github-actions bot (Contributor) commented Mar 1, 2024

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 6, 2024