
Pod for kube-apiserver not synced (no current running pod found), retrying #4907

Closed
bataliero opened this issue Oct 17, 2023 · 5 comments

@bataliero

Environmental Info:
RKE2 Version:

rke2 version v1.26.9+rke2r1 (368ba42)
go version go1.20.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-18-26 5.19.0-1025-aws #26~22.04.1-Ubuntu SMP Mon Apr 24 01:58:15 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

Single server

Describe the bug:

It seems that the API server does not start.

Steps To Reproduce:

  • Installed RKE2:
    I just ran the following on a fresh Ubuntu 22.04 (AWS EC2) instance:
curl -sfL https://get.rke2.io | sudo sh -
sudo systemctl start rke2-server.service  #  <- this never ends
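
For reference, a way to follow progress from a second terminal while the start command blocks (standard systemd tooling; the same journalctl invocation appears later in this thread):

sudo journalctl -u rke2-server -f       # follow the server logs while the unit starts
systemctl status rke2-server.service    # check whether the unit is still activating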

Expected behavior:

I would expect kubectl (run locally on the server machine) to be able to connect to the API server.

>> sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
E1017 08:59:07.669658    1911 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": dial tcp 127.0.0.1:6443: connect: connection refused
E1017 08:59:07.669941    1911 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": dial tcp 127.0.0.1:6443: connect: connection refused
E1017 08:59:07.671214    1911 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": dial tcp 127.0.0.1:6443: connect: connection refused
E1017 08:59:07.674289    1911 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": dial tcp 127.0.0.1:6443: connect: connection refused
E1017 08:59:07.674538    1911 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": dial tcp 127.0.0.1:6443: connect: connection refused
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
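
As a quick sanity check that nothing is listening on the apiserver port yet (assuming ss and curl are available on the host; they are not part of this report):

sudo ss -tlnp | grep 6443                # should show kube-apiserver once the static pod is up
curl -k https://127.0.0.1:6443/healthz   # expect "connection refused" until then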

Actual behavior:

Pod for kube-apiserver not synced (no current running pod found), retrying

Additional context / logs:

Oct 17 08:54:47 ip-172-31-18-26 rke2[1497]: time="2023-10-17T08:54:47Z" level=info msg="Waiting for API server to become available"
Oct 17 08:54:47 ip-172-31-18-26 rke2[1497]: time="2023-10-17T08:54:47Z" level=info msg="Pod for etcd is synced"
Oct 17 08:54:47 ip-172-31-18-26 rke2[1497]: time="2023-10-17T08:54:47Z" level=info msg="Pod for kube-apiserver not synced (no current running pod found), retrying"
Oct 17 08:54:52 ip-172-31-18-26 rke2[1497]: time="2023-10-17T08:54:52Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Oct 17 08:54:57 ip-172-31-18-26 rke2[1497]: time="2023-10-17T08:54:57Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Oct 17 08:55:02 ip-172-31-18-26 rke2[1497]: time="2023-10-17T08:55:02Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"

logs:
journalctl_rke2_server.log
kubelet.log

@brandond
Member

It is normal to see that momentarily during startup while RKE2 is waiting for the pod to start. I don't see the message repeated; it looks like the apiserver is now running. You'd need to look at the apiserver pod logs (in /var/log/pods) to see why it's not ready yet.
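
A rough sketch of how to pull those pod logs; the kube-system directory and file names below are assumptions about the usual static-pod layout, so adjust the glob to whatever actually exists under /var/log/pods:

sudo ls /var/log/pods                                                    # one directory per static pod
sudo tail -n 100 /var/log/pods/kube-system_kube-apiserver-*/kube-apiserver/*.log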

@IsaSih

IsaSih commented Jan 27, 2024

Hello, I'm facing the same situation on an Ubuntu machine (AWS EC2), single server node. After enabling the service and running systemctl start rke2-server.service, I see the same error with rke2 version 1.26.11.

Looking into the API server pod logs, I see this error:

2024-01-24T13:35:14.689255139Z stderr F {"level":"info","ts":"2024-01-24T13:35:14.689002Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--config-file=/var/lib/rancher/rke2/server/db/etcd/config"]}
2024-01-24T13:35:14.68963836Z stderr F {"level":"warn","ts":"2024-01-24T13:35:14.689444Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"config","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
2024-01-24T13:35:14.689647893Z stderr F {"level":"warn","ts":"2024-01-24T13:35:14.689471Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"name","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
2024-01-24T13:35:14.690816649Z stderr F {"level":"info","ts":"2024-01-24T13:35:14.689519Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://127.0.0.1:2380","https://172.31.5.23:2380"]}
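
Those two warnings indicate files etcd does not expect inside its data directory; one way to see what is actually there (using the data-dir path from the log above) is:

sudo ls -la /var/lib/rancher/rke2/server/db/etcd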

Here are the full log files:

pod.log
rke2-server.log
journalctl-rke2-server.log

github-actions bot (Contributor) commented Apr 6, 2024

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 21, 2024
@andriy-bulynko

I have the same issue.

RKE2 Version:

rke2 version v1.30.6+rke2r1 (2959cd2)
go version go1.22.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

Arch: x86_64, OS: Rocky Linux, Version: 8.10 (Green Obsidian)

Cluster Configuration:

Single server

journalctl -u rke2-server -f keeps printing logs like:

Nov 18 18:32:28 rocky3 rke2[5945]: time="2024-11-18T18:32:28-05:00" level=info msg="Waiting for API server to become available"
Nov 18 18:32:28 rocky3 rke2[5945]: time="2024-11-18T18:32:28-05:00" level=info msg="Pod for etcd is synced"
Nov 18 18:32:28 rocky3 rke2[5945]: time="2024-11-18T18:32:28-05:00" level=info msg="Pod for kube-apiserver not synced (pod sandbox not found), retrying"
Nov 18 18:32:28 rocky3 rke2[5945]: time="2024-11-18T18:32:28-05:00" level=info msg="Waiting for API server to become available"
Nov 18 18:32:30 rocky3 rke2[5945]: time="2024-11-18T18:32:30-05:00" level=warning msg="Failed to list nodes with etcd role: runtime core not ready"
Nov 18 18:32:45 rocky3 rke2[5945]: time="2024-11-18T18:32:45-05:00" level=warning msg="Failed to list nodes with etcd role: runtime core not ready"

@brandond
Member

Check the kubelet and containerd logs. You might also ensure that your node has sufficient CPU and memory resources available for the kubelet to schedule all the static pods.
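
For reference, the kubelet and containerd logs usually live at the paths below on an RKE2 node (assuming the default data-dir; adjust if it was changed), and a quick resource check can be done with standard tools:

sudo tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log             # kubelet log
sudo tail -n 100 /var/lib/rancher/rke2/agent/containerd/containerd.log    # containerd log
free -h && nproc                                                          # available memory and CPUs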
