Agent config retrieval can cause excessive CPU load on servers during bulk restart of nodes #8862

Closed
brandond opened this issue Nov 15, 2023 · 2 comments
brandond commented Nov 15, 2023

K3s tracking issue for:

Improvements could include:

  • Agent config retrieval should jitter its retry interval to avoid hammering the servers
  • Node password secret retrieval should pull from cache if possible
@brandond

With regard to what to test: I think just confirming that the agent retries joins at non-fixed intervals (somewhere between 5 and 10 seconds, instead of exactly 5 seconds) is sufficient.

endawkins commented Nov 30, 2023

Validated on branch master with commit 3f23723 / version 1.28

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-4-130 5.14.21-150400.22-default #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222) x86_64 x86_64 x86_64 GNU/Linux
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp4"
DOCUMENTATION_URL="https://documentation.suse.com/"

Cluster Configuration:

3 servers
3 agents

Additional files

N/A

Testing Steps

  1. Copy config.yaml
     $ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s
  3. Reboot AWS worker instances
  4. Verify the worker instances come back up

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.28.4-rc1+k3s1 (3f237230)
go version go1.20.11
NAME               STATUS     ROLES                       AGE    VERSION            INTERNAL-IP     EXTERNAL-IP      OS-IMAGE                              KERNEL-VERSION              CONTAINER-RUNTIME
ip-172-31-0-101    Ready      control-plane,etcd,master   114m   v1.28.4-rc1+k3s1   172.31.0.101    18.118.9.118     SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-1-190    Ready      control-plane,etcd,master   117m   v1.28.4-rc1+k3s1   172.31.1.190    3.142.53.151     SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-1-241    NotReady   <none>                      114m   v1.28.4-rc1+k3s1   172.31.1.241    18.118.16.157    SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-11-162   NotReady   <none>                      113m   v1.28.4-rc1+k3s1   172.31.11.162   3.142.55.21      SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-15-22    NotReady   <none>                      113m   v1.28.4-rc1+k3s1   172.31.15.22    52.14.215.111    SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-8-134    Ready      control-plane,etcd,master   115m   v1.28.4-rc1+k3s1   172.31.8.134    18.119.235.227   SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1

NAME               STATUS   ROLES                       AGE    VERSION            INTERNAL-IP     EXTERNAL-IP      OS-IMAGE                              KERNEL-VERSION              CONTAINER-RUNTIME
ip-172-31-0-101    Ready    control-plane,etcd,master   116m   v1.28.4-rc1+k3s1   172.31.0.101    18.118.9.118     SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-1-190    Ready    control-plane,etcd,master   118m   v1.28.4-rc1+k3s1   172.31.1.190    3.142.53.151     SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-1-241    Ready    <none>                      115m   v1.28.4-rc1+k3s1   172.31.1.241    18.118.16.157    SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-11-162   Ready    <none>                      114m   v1.28.4-rc1+k3s1   172.31.11.162   3.142.55.21      SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-15-22    Ready    <none>                      114m   v1.28.4-rc1+k3s1   172.31.15.22    52.14.215.111    SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1
ip-172-31-8-134    Ready    control-plane,etcd,master   116m   v1.28.4-rc1+k3s1   172.31.8.134    18.119.235.227   SUSE Linux Enterprise Server 15 SP4   5.14.21-150400.22-default   containerd://1.7.7-k3s1

Additional context / logs:

N/A

github-project-automation bot moved this from To Test to Done Issue in K3s Development on Nov 30, 2023