Unable to install RKE2 on Amazon Linux 2023 #4527
We don't technically support Amazon Linux at this time. Ref: https://docs.rke2.io/install/requirements#linux
Hey @brandond, definitely understand and didn't expect it to work without any troubleshooting or fixes! It may be referenced internally with a little bit more discussion due to some of our government customers.
Forgot to tag you earlier... @dweomer
For the various log files that "do not appear to have any useful information", can you attach them anyway? Along with whatever is in
For sure... I didn't want to overcrowd the GH Issue. I'll attach them now.
Aug 01 03:10:45 ip-172-31-42-40.ec2.internal rke2[26348]: {"level":"warn","ts":"2023-08-01T03:10:45.436Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000f0c540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
Aug 01 03:10:45 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:45Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Aug 01 03:10:46 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:46Z" level=info msg="Container for etcd not found (no matching container found), retrying"
Aug 01 03:10:46 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:46Z" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Aug 01 03:10:50 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:50Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Aug 01 03:10:51 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:51Z" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Aug 01 03:10:55 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:55Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Aug 01 03:10:56 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:56Z" level=info msg="Waiting for etcd server to become available"
Aug 01 03:10:56 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:56Z" level=info msg="Waiting for API server to become available"
Aug 01 03:10:56 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:10:56Z" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Aug 01 03:11:00 ip-172-31-42-40.ec2.internal rke2[26348]: time="2023-08-01T03:11:00Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
E0801 03:12:13.723095 26379 event.go:276] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ip-172-31-42-40.ec2.internal.1777230d62cd7dd6", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-31-42-40.ec2.internal", UID:"ip-172-31-42-40.ec2.internal", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node ip-172-31-42-40.ec2.internal status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"ip-172-31-42-40.ec2.internal"}, FirstTimestamp:time.Date(2023, time.August, 1, 2, 58, 45, 500091862, time.Local), LastTimestamp:time.Date(2023, time.August, 1, 2, 58, 45, 579890048, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://127.0.0.1:6443/api/v1/namespaces/default/events/ip-172-31-42-40.ec2.internal.1777230d62cd7dd6": dial tcp 127.0.0.1:6443: connect: connection refused'(may retry after sleeping)
E0801 03:12:13.758179 26379 kubelet.go:2448] "Error getting node" err="node \"ip-172-31-42-40.ec2.internal\" not found"
time="2023-08-01T03:13:11.996045748Z" level=info msg="cleaning up dead shim"
time="2023-08-01T03:13:12.011302269Z" level=warning msg="cleanup warnings time=\"2023-08-01T03:13:12Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=30134 runtime=io.containerd.runc.v2\ntime=\"2023-08-01T03:13:12Z\" level=warning msg=\"failed to read init pid file\" error=\"open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ff527d466f55f3f472de15bd68b23fdc4db058e9eaa4a1190c96d95576acc2cc/init.pid: no such file or directory\" runtime=io.containerd.runc.v2\n"
time="2023-08-01T03:13:12.011502689Z" level=error msg="copy shim log" error="read /proc/self/fd/20: file already closed"
time="2023-08-01T03:13:12.015109397Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-ip-172-31-42-40.ec2.internal,Uid:e18aa5e5b83a5a3c56d78e4054612394,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown"
time="2023-08-01T03:13:22.700218436Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-ip-172-31-42-40.ec2.internal,Uid:e18aa5e5b83a5a3c56d78e4054612394,Namespace:kube-system,Attempt:0,}"
time="2023-08-01T03:13:22.726881131Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2023-08-01T03:13:22.726961198Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2023-08-01T03:13:22.726976863Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-08-01T03:13:22.727130660Z" level=info msg="starting signal loop" namespace=k8s.io path=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/4248b813ddeb9cad229cb501b5e5dddc68efa178880387b81f08715a769d3706 pid=30154 runtime=io.containerd.runc.v2
time="2023-08-01T03:13:23.105979383Z" level=info msg="shim disconnected" id=4248b813ddeb9cad229cb501b5e5dddc68efa178880387b81f08715a769d3706
time="2023-08-01T03:13:23.106031080Z" level=warning msg="cleaning up after shim disconnected" id=4248b813ddeb9cad229cb501b5e5dddc68efa178880387b81f08715a769d3706 namespace=k8s.io
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
kube-system_etcd-ip-172-31-42-40.ec2.internal_e18aa5e5b83a5a3c56d78e4054612394
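As a triage aside, filtering logs like the containerd excerpt above down to error-level lines surfaces the decisive failure quickly. A minimal sketch, with a here-doc standing in for the real /var/lib/rancher/rke2/agent/containerd/containerd.log:

```shell
# Keep only error-level lines; the two sample lines below are taken
# from the log excerpt in this issue.
errors=$(grep 'level=error' <<'EOF'
time="2023-08-01T03:13:12.011502689Z" level=error msg="copy shim log" error="read /proc/self/fd/20: file already closed"
time="2023-08-01T03:13:22.700218436Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-ip-172-31-42-40.ec2.internal}"
EOF
)
echo "$errors"
```

On the real file, the same filter is what brings the OCI runtime / SELinux failure to the top of a multi-megabyte log.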
Can you attach (not paste inline) the full contents, not just the last few lines? Same for the pod logs - just knowing that there is a log file doesn't help me much.
I definitely missed the attach part of your previous comment. Attached: kubelet.log | containerd.log | journalctl.txt
From your containerd log file: write /proc/self/attr/keycreate: invalid argument
A little searching brings up #1539 (comment) and the following comments, which suggest that the rke2-selinux package you chose to install is not compatible with the version of container-selinux that Amazon Linux is providing. Did you see and ignore any errors from the postinstall script when installing the EL7 rke2-selinux package? Have you tried the EL8 or EL9 RPMs from https://github.com/rancher/rke2-selinux/releases/tag/v0.14.stable.1 ?
I let the install script handle it; I didn't install anything manually. I don't remember seeing any errors with the
Additionally, upgrading container-selinux to v2.205.0 should work. Errors when running the install script:
[root@ip-172-31-39-26 yum.repos.d]# curl -sfL https://get.rke2.io | sh
[INFO] finding release for channel stable
[INFO] using 1.25 series from channel stable
Rancher RKE2 Common Latest 5.3 kB/s | 2.6 kB 00:00
Rancher RKE2 1.18 Latest 12 kB/s | 6.0 kB 00:00
Rancher RKE2 Common (stable) 13 kB/s | 2.9 kB 00:00
Rancher RKE2 1.25 (stable) 13 kB/s | 2.9 kB 00:00
Error:
Problem: package rke2-server-1.25.12~rke2r1-0.el7.x86_64 requires rke2-common = 1.25.12~rke2r1-0.el7, but none of the providers can be installed
- package rke2-common-1.25.12~rke2r1-0.el7.x86_64 requires rke2-selinux >= 0.12-0, but none of the providers can be installed
- conflicting requests
- nothing provides container-selinux < 2:2.164.2 needed by rke2-selinux-0.12-1.el7.noarch
- nothing provides container-selinux < 2:2.164.2 needed by rke2-selinux-0.13-1.el7.noarch
- nothing provides container-selinux < 2:2.164.2 needed by rke2-selinux-0.14-1.el7.noarch
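The dnf errors above boil down to a version ceiling: the el7 rke2-selinux RPMs require container-selinux < 2:2.164.2, while AL2023 ships a newer build. A minimal sketch of that check, with versions hardcoded for illustration and the RPM epoch ("2:") ignored for simplicity:

```shell
# Illustrative versions; on a live host substitute:
#   installed=$(rpm -q --qf '%{VERSION}' container-selinux)
installed="2.205.0"
ceiling="2.164.2"
# sort -V does a version-aware comparison; if the installed version is
# not strictly below the ceiling, the el7 RPMs cannot be satisfied.
if [ "$(printf '%s\n%s\n' "$installed" "$ceiling" | sort -V | head -n1)" = "$installed" ] \
    && [ "$installed" != "$ceiling" ]; then
  echo "el7 rke2-selinux dependency satisfiable"
else
  echo "need the el8/el9 rke2-selinux RPMs instead"
fi
```

With AL2023's container-selinux, this lands on the el8/el9 branch, which matches the suggestion earlier in the thread to try those RPMs.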
That sounds about right. Glad to know it works when you install the correct selinux package. I don't believe we have any plans to support Amazon Linux, but we can leave this issue open so the next person who tries can find your steps.
Appreciate the help working through it! It's nice to have a workaround for it.
AL2023 Package Request: amazonlinux/amazon-linux-2023#409
Greetings from Amazon Linux land: it's important to note that AL2023 is not a CentOS clone, and does not claim any level of compatibility with any particular version of CentOS, thus using el7, el8, or el9 RPMs is going to be an uphill battle for anything non-trivial. If you're looking to build packages for AL2023, you can do so the standard way. Happy to chat as to what the build requirements could be to enable builds of AL2023 packages.
Hey @stewartsmith, I saw your comment on the other issue (amazonlinux/amazon-linux-2023#409) and replied to it a few minutes ago. Apologies for not seeing this comment! I definitely understand that
Validated on Version:
$ rke2 version v1.28.4+dev.c1494f5d (c1494f5de1d2f6ae26cbb7d8ec365344dc1209d8)
Environment Details
Infrastructure
Node(s) CPU architecture, OS, and Version: NAME="Amazon Linux"
Cluster Configuration:
Steps to validate the fix
Validation Results:
Environmental Info:
RKE2 Version: v1.25.12+rke2r1
Node(s) CPU architecture, OS, and Version: Amazon Linux 2023 (AL2023) with ami-0f34c5ae932e6f0e4
Cluster Configuration: Single Node (testing purposes)
Describe the bug: Unable to download, install, or activate RKE2 on Amazon Linux 2023 (AL2023).
Steps To Reproduce:
OR
After editing line 477 of the install.sh script to include [ -r /etc/amazon-linux-release ] ||, or creating a file at /etc/centos-release, RKE2 will successfully download and install the necessary requirements, but versions down to v1.25.4+rke2r1 and el7, when I would expect AL2023 to be more similar to el8. After this change, upon activating RKE2 with systemctl start rke2-server, it fails and does not produce any useful troubleshooting information.
Expected behavior: Download, install, and activate RKE2 on Amazon Linux 2023 (AL2023).
Actual behavior: RKE2 fails and errors when downloading, installing, and activating on Amazon Linux 2023 (AL2023).
Additional context / logs:
journalctl -xefu rke2-server does not produce any useful information (500 Internal Server Error).
/var/lib/rancher/rke2/agent/logs/kubelet.log does not appear to have any useful information.
/var/lib/rancher/rke2/agent/containerd/containerd.log does not appear to have any useful information.
/var/lib/rancher/rke2/bin/crictl ps does not appear to have any useful information.
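The install.sh change described under Steps To Reproduce (adding /etc/amazon-linux-release to the list of release files that select the RPM install path) can be sketched roughly like this. The detect_rpm_os helper and its configurable root are hypothetical, introduced only so the sketch can be exercised anywhere; the real script inlines this check:

```shell
# Returns "rpm" if an EL-style release file is present under $root,
# else "tar" (the tarball install path). Adding amazon-linux-release
# to the list is the tweak the issue describes.
detect_rpm_os() {
  root="${1:-}"
  for f in redhat-release centos-release oracle-release amazon-linux-release; do
    if [ -r "$root/etc/$f" ]; then
      echo "rpm"
      return 0
    fi
  done
  echo "tar"
}

# Exercise the check against a synthetic AL2023-like root.
tmp=$(mktemp -d)
mkdir -p "$tmp/etc"
touch "$tmp/etc/amazon-linux-release"
detect_rpm_os "$tmp"   # prints "rpm" on this synthetic layout
```

Note that, per the thread, taking the RPM path this way only gets you as far as the rke2-selinux / container-selinux dependency conflict; it does not by itself make the el7 packages installable on AL2023.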