Cluster stuck on server provisioning #2361

Closed
nicomfer opened this issue Jan 17, 2022 · 4 comments

Comments

@nicomfer

Environmental Info:
RKE2 Version:

v1.21.7+rke2r2
v1.21.8+rke2r2
v1.22.4+rke2r2

Node(s) CPU architecture, OS, and Version:

Ubuntu 18.04 LTS
4.15.0-2000-aws-fips #4-Ubuntu SMP Tue Jan 28 12:41:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 Server node
1 Agent node

Describe the bug:
The cluster is stuck on server provisioning. I'm using cloud-provider-name: aws in an AWS GovCloud region. The cluster is deployed as a Custom Cluster from Rancher v2.6.3 and bootstrapped via user-data. The server node is running with the proper IAM profile and policy attached.

Kubelet error:

I0117 07:42:08.757566   27761 kubelet_node_status.go:71] "Attempting to register node" node="ip-10-xxx-x-xx.us-gov-west-1.compute.internal"
E0117 07:42:08.775978   27761 kubelet.go:2412] "Error getting node" err="node \"ip-10-xxx-x-xx.us-gov-west-1.compute.internal\" not found"
E0117 07:42:08.781934   27761 controller.go:144] failed to ensure lease exists, will retry in 7s, error: leases.coordination.k8s.io "ip-10-xxx-x-xx.us-gov-west-1.compute.internal" is forbidden: User "system:node:rke2-master-sharedservices-us-gov-west-1-i-02232adc6218e3e3d.gov" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease": can only access node lease with the same name as the requesting node
E0117 07:42:08.783071   27761 kubelet_node_status.go:93] "Unable to register node with API server" err="nodes \"ip-10-xxx-x-xx.us-gov-west-1.compute.internal\" is forbidden: node \"rke2-master-sharedservices-us-gov-west-1-i-02232adc6218e3e3d.gov\" is not allowed to modify node \"ip-10-xxx-x-xx.us-gov-west-1.compute.internal\"" node="ip-10-xxx-x-xx.us-gov-west-1.compute.internal"

The server is not able to register itself:

kubectl get nodes
No resources found

Pods are stuck pending and cannot be scheduled:

Warning FailedScheduling 4m4s (x3040 over 2d2h) default-scheduler no nodes available to schedule pods

Steps To Reproduce:
Create AWS instance, attach IAM profile and policy.
Deploy custom cluster from Rancher 2.6.3

curl -fL https://rancher.xxxx.xxxx.com/system-agent-install.sh | sudo  sh -s - --server https://rancher.xxxx.xxxx.com --label 'cattle.io/os=linux' --token xxxxxxxxxxxxxxxxx --etcd --controlplane
  • Installed RKE2:

Expected behavior:
Server node provisioned

Actual behavior:
Server node is not being provisioned

Additional context / logs:
A possible workaround is to pass --node-name $(curl -s http://169.254.169.254/latest/meta-data/local-hostname) at server initialization, which avoids the error described above.
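
For anyone trying the workaround above, here is a minimal sketch of resolving the name before building the flag. It is an illustration, not part of RKE2 or Rancher: the function names fetch_local_hostname and node_name_flag and the IMDS_URL variable are made up for this example. It assumes the instance metadata service is reachable, and tries an IMDSv2 session token first in case IMDSv1 is disabled on the instance.

```shell
#!/bin/sh
# Sketch only: helper names and IMDS_URL are hypothetical, chosen for this example.
IMDS_URL="${IMDS_URL:-http://169.254.169.254/latest}"

# Fetch the instance's private DNS name from the EC2 metadata service.
# Tries IMDSv2 (session token) first, then falls back to a plain IMDSv1 request.
fetch_local_hostname() {
    token=$(curl -s -f -m 2 -X PUT "$IMDS_URL/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60" 2>/dev/null) || token=""
    if [ -n "$token" ]; then
        curl -s -f -m 2 -H "X-aws-ec2-metadata-token: $token" \
            "$IMDS_URL/meta-data/local-hostname"
    else
        curl -s -f -m 2 "$IMDS_URL/meta-data/local-hostname"
    fi
}

# Print the flag for a given hostname. An empty value is rejected so that a
# failed metadata lookup cannot silently yield "--node-name" with no argument.
node_name_flag() {
    if [ -z "$1" ]; then
        echo "empty hostname, refusing to build --node-name" >&2
        return 1
    fi
    printf '%s %s\n' '--node-name' "$1"
}
```

On the instance, node_name_flag "$(fetch_local_hostname)" would print the flag to append to the registration command. If the lookup fails, the function returns non-zero instead of producing a malformed command line.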

@stale

stale bot commented Jul 25, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 25, 2022
@stale stale bot closed this as completed Aug 12, 2022
@tbernacchi

same here

@tbernacchi tbernacchi reopened this Mar 29, 2023
@stale stale bot removed the status/stale label Mar 29, 2023
@brandond
Member

@tbernacchi please don't reopen stale issues without providing any additional info. I'm going to close this out; if you are still experiencing this, please open a new issue and fill out the issue template describing what specifically you're running into.

@rancher rancher locked and limited conversation to collaborators Sep 20, 2023
@brandond
Member

brandond commented Sep 20, 2023

E0117 07:42:08.781934 27761 controller.go:144] failed to ensure lease exists, will retry in 7s, error: leases.coordination.k8s.io "ip-10-xxx-x-xx.us-gov-west-1.compute.internal" is forbidden: User "system:node:rke2-master-sharedservices-us-gov-west-1-i-02232adc6218e3e3d.gov" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease": can only access node lease with the same name as the requesting node

For the record, this appears to be caused by the node name not being set consistently; I suspect it was overridden somewhere in the config. The kubelet certificate is using rke2-master-sharedservices-us-gov-west-1-i-02232adc6218e3e3d.gov as its name, which does not appear to be valid. The actual node name expected by the kubelet is ip-10-xxx-x-xx.us-gov-west-1.compute.internal, which matches the format expected by the AWS cloud controller.
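
To check for the same mismatch on another node, the two names can be pulled out of the "failed to ensure lease exists" line and compared. The sketch below assumes the log line has the same shape as the one quoted above. The function names are made up for this example and are not part of any RKE2 or Kubernetes tooling.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the patterns assume the
# kubelet log format quoted in this issue.

# Name embedded in the client identity: User "system:node:<name>"
cert_node_name() {
    printf '%s\n' "$1" | sed -n 's/.*User "system:node:\([^"]*\)".*/\1/p'
}

# Name of the lease the kubelet tried to access:
# leases.coordination.k8s.io "<name>"
lease_node_name() {
    printf '%s\n' "$1" | sed -n 's/.*leases\.coordination\.k8s\.io "\([^"]*\)".*/\1/p'
}

# Compare the two names found in a single log line.
check_node_identity() {
    cert=$(cert_node_name "$1")
    lease=$(lease_node_name "$1")
    if [ -z "$cert" ] || [ -z "$lease" ]; then
        echo "unparsed"
    elif [ "$cert" = "$lease" ]; then
        echo "match"
    else
        echo "mismatch: certificate=$cert lease=$lease"
    fi
}
```

Passing the quoted log line to check_node_identity should report a mismatch between the certificate name and the lease name, which is the condition the node authorizer rejects.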
