copyright | lastupdated | keywords | subcollection | ||
---|---|---|---|---|---|
|
2024-01-18 |
kubernetes, node scaling, ca, autoscaler |
containers |
{{site.data.keyword.attribute-definition-list}}
{: #cluster-scaling-classic-vpc}
[Virtual Private Cloud]{: tag-vpc} [Classic infrastructure]{: tag-classic-inf}
With the cluster-autoscaler
add-on, you can scale the worker pools in your {{site.data.keyword.containerlong}} classic or VPC cluster automatically to increase or decrease the number of worker nodes in the worker pool based on the sizing needs of your scheduled workloads. The cluster-autoscaler
add-on is based on the Kubernetes Cluster-Autoscaler project{: external}. For a list of supported add-on versions by cluster version, see Supported cluster add-on versions.
{: shortdesc}
You can't enable the cluster autoscaler on worker pools that use reservations. {: important}
{: #ca_about}
The cluster autoscaler periodically scans the cluster to adjust the number of worker nodes within the worker pools that it manages in response to your workload resource requests and any custom settings that you configure, such as scanning intervals. Every minute, the cluster autoscaler checks for the following situations. {: shortdesc}
- Pending pods to scale up: A pod is considered pending when insufficient compute resources exist to schedule the pod on a worker node. When the cluster autoscaler detects pending pods, the autoscaler scales up your worker nodes evenly across zones to meet the workload resource requests.
- Underutilized worker nodes to scale down: By default, worker nodes that run with less than 50% of the total compute resources that are requested for 10 minutes or more and that can reschedule their workloads onto other worker nodes are considered underutilized. If the cluster autoscaler detects underutilized worker nodes, it scales down your worker nodes one at a time so that you have only the compute resources that you need. If you want, you can customize the default scale-down utilization threshold of 50% for 10 minutes.
Scanning and scaling up and down happens at regular intervals over time, and depending on the number of worker nodes might take a longer period of time to complete, such as 30 minutes.
The cluster autoscaler adjusts the number of worker nodes by considering the resource requests{: external} that you define for your deployments, not actual worker node usage. If your pods and deployments don't request appropriate amounts of resources, you must adjust their configuration files. The cluster autoscaler can't adjust them for you. Also, keep in mind that worker nodes use some compute resources for basic cluster functionality, default and custom add-ons, and resource reserves. {: note}
What does scaling up and down look like? : In general, the cluster autoscaler calculates the number of worker nodes that your cluster needs to run its workload. Scaling the cluster up or down depends on many factors, including the following. * The minimum and maximum worker node size per zone that you set. * Your pending pod resource requests and certain metadata that you associate with the workload, such as anti-affinity, labels to place pods only on certain flavors, or pod disruption budgets{: external}. * The worker pools that the cluster autoscaler manages, potentially across zones in a multizone cluster.
For more information, see the Kubernetes Cluster Autoscaler FAQs for How does scale-up work?{: external} and How does scale-down work?{: external}.
Can I change how scale-up and scale-down work? : You can customize settings or use other Kubernetes resources to affect how scaling up and down work.
Scale-up
: Customize the cluster autoscaler ConfigMap values such as scanInterval
, expander
, skipNodes
, or maxNodeProvisionTime
. Review ways to overprovision worker nodes so that you can scale up worker nodes before a worker pool runs out of resources. You can also set up Kubernetes pod budget disruptions and pod priority cutoffs to affect how scaling up works.
Scale-down
: Customize the cluster autoscaler ConfigMap values such as scaleDownUnneededTime
, scaleDownDelayAfterAdd
, scaleDownDelayAfterDelete
, or scaleDownUtilizationThreshold
.
Can I increase the minimum size per zone to trigger a scale up my cluster to that size?
: No, setting a minSize
does not automatically trigger a scale-up. The minSize
is a threshold so that the cluster autoscaler does not scale to fewer than a certain number of worker nodes per zone. If your cluster does not yet have that number per zone, the cluster autoscaler does not scale up until you have workload resource requests that require more resources. For example, if you have a worker pool with one worker node per three zones (three total worker nodes) and set the minSize
to 4
per zone, the cluster autoscaler does not immediately provision an additional three worker nodes per zone (12 worker nodes total). Instead, the scale-up is triggered by resource requests. If you create a workload that requests the resources of 15 worker nodes, the cluster autoscaler scales up the worker pool to meet this request. Now, the minSize
means that the cluster autoscaler does not scale down to fewer than four worker nodes per zone even if you remove the workload that requests the amount.
How is this behavior different from worker pools that are not managed by the cluster autoscaler? : When you create a worker pool, you specify how many worker nodes per zone it has. The worker pool maintains that number of worker nodes until you resize or rebalance it. The worker pool does not add or remove worker nodes for you. If you have more pods than can be scheduled, the pods remain in pending state until you resize the worker pool. When you enable the cluster autoscaler for a worker pool, worker nodes are scaled up or down in response to your pod spec settings and resource requests. You don't need to resize or rebalance the worker pool manually.
{: #scalable-practices}
Make the most out of the cluster autoscaler by using the following strategies for your worker node and workload deployment strategies. For more information, see the Kubernetes Cluster Autoscaler FAQs{: external}. {: shortdesc}
Try out the cluster autoscaler with a few test workloads to get a good feel for how scale-up and scale-down work, you might want to configure, and any other aspects that you might want, like overprovisioning worker nodes or limiting apps. Then, clean up your test environment and plan to include these custom values and additional settings with a fresh installation of the cluster autoscaler.
{: #scalable-practices-multiple}
Yes, after you install the cluster autoscaler, you can choose which worker pools within the cluster to autoscale in the ConfigMap. You can run only one autoscaler per cluster. Create and enable autoscaling on worker pools other than the default worker pool, because the default worker pool has system components that can prevent automatically scaling down. {: shortdesc}
{: #scalable-practices-resrequests}
The cluster autoscaler scales your cluster in response to your workload resource requests{: external}. As such, specify resource requests{: external} for all your deployments because the resource requests are what the cluster autoscaler uses to calculate how many worker nodes are needed to run the workload. Keep in mind that autoscaling is based on the compute usage that your workload configurations request, and does not consider other factors such as machine costs. {: shortdesc}
{: #scalable-practices-zero}
No, you can't set the cluster autoscaler minSize
to 0
. Additionally, unless you disable all public application load balancers (ALBs) in each zone of your cluster, you must change the minSize
to 2
worker nodes per zone so that the ALB pods can be spread for high availability. Additionally, you can taint your worker pool to achieve a scale to down a minimum of 1
.
{: shortdesc}
If your worker pool has zero (0) worker nodes, the worker pool can't be scaled. Disable cluster autoscaling for the worker pool, manually resize the worker pool to at least one, and re-enable cluster autoscaling.
{: #scalable-practices-apps}
Yes, you can add several Kubernetes features to your deployment to adjust how the cluster autoscaler considers your resource requests for scaling. {: shortdesc}
- Taint your worker pool to allow only the deployments or pods with the matching toleration to be deployed to your worker pool.
- Add a label to your worker pool other than the default worker pool. This label is used in your deployment configuration to specify
nodeAffinity
ornodeSelector
which limits the workloads that can be deployed on the worker nodes in the labeled worker pool. - Use pod disruption budgets{: external} to prevent abrupt rescheduling or deletions of your pods.
- If you're using pod priority, you can edit the priority cutoff{: external} to change what types of priority trigger scaling up. By default, the priority cutoff is zero (
0
).
{: #scalable-practices-taints}
Yes, but make sure to apply taints at the worker pool level so that all existing and future worker nodes get the same taint. Then, you must include a matching toleration in your workload configuration{: external} so that these workloads are scheduled onto your autoscaled worker pool with the matching taint. Keep in mind that if you deploy a workload that is not tolerated by the tainted worker pool, the worker nodes are not considered for scale-up and more worker nodes might be ordered even if the cluster has sufficient capacity. However, the tainted worker pool is still identified as underutilized if they have less than the threshold (by default 50%) of their resources utilized and thus are considered for scale-down. {: shortdesc}
{: #ca_update_worker_node_pool}
Before you can rebalance or resize your worker pool, you must remove the worker pool from the autoscaler configmap to disable autoscaling. {: shortdesc}
-
Edit
iks-ca-configmap
and disable the worker pool that you want to resize or rebalance by removing it from theworkerPoolsConfig.json
section.kubectl edit cm -n kube-system iks-ca-configmap
{: pre}
Example output
apiVersion: v1 data: workerPoolsConfig.json: | [ {"name": "","minSize": 1,"maxSize": 2,"enabled":false} ] kind: ConfigMap
{: screen}
-
Save the
iks-ca-configmap
. -
Optional Update your VPC worker nodes.
-
Add the worker pool to the
iks-ca-configmap
.kubectl edit cm -n kube-system iks-ca-configmap
{: pre}
Example
apiVersion: v1 data: workerPoolsConfig.json: | [ {"name": "<worker_pool>","minSize": 1,"maxSize": 2,"enabled":false} ] kind: ConfigMap
{: screen}
{: #prepare-autoscale-classic-vpc}
Before you install the {{site.data.keyword.cloud_notm}} cluster autoscaler add-on, you can set up your cluster to prepare the cluster for autoscaling. {: shortdesc}
The cluster autoscaler add-on is not supported for baremetal worker nodes. {: important}
-
Before you begin, Install the required CLI and plug-ins.
- {{site.data.keyword.cloud_notm}} CLI (
ibmcloud
) - {{site.data.keyword.containerlong_notm}} plug-in (
ibmcloud ks
) - {{site.data.keyword.registrylong_notm}} plug-in (
ibmcloud cr
) - Kubernetes (
kubectl
)
- {{site.data.keyword.cloud_notm}} CLI (
-
Confirm that your {{site.data.keyword.cloud_notm}} Identity and Access Management credentials are stored in the cluster. The cluster autoscaler uses this secret to authenticate credentials. If the secret is missing, create it by resetting credentials.
kubectl get secrets -n kube-system | grep storage-secret-store
{: pre}
-
Plan to autoscale a worker pool other than the
default
worker pool, because thedefault
worker pool has system components that can prevent automatically scaling down. Include a label for the worker pool so that you can set node affinity{: external} for the workloads that you want to deploy to the worker pool that has autoscaling enabled. For example, your label might beapp: nginx
. Choose from the following options:- Create a VPC or classic worker pool other than the
default
worker pool with the label that you want to use with the workloads to run on the autoscaled worker pool. - Add the label to an existing worker pool other than the
default
worker pool.
- Create a VPC or classic worker pool other than the
-
Confirm that your worker pool has the necessary labels for autoscaling. In the output, you see the required
ibm-cloud.kubernetes.io/worker-pool-id
label and the label that you previously created for node affinity. If you don't see these labels, add a worker pool, then add your label for node affinity.ibmcloud ks worker-pool get --cluster <cluster_name_or_ID> --worker-pool <worker_pool_name_or_ID> | grep Labels
{: pre}
Example output of a worker pool with the label.
Labels: ibm-cloud.kubernetes.io/worker-pool-id=a1aa111111b22b22cc3c3cc444444d44-4d555e5
{: screen}
-
Taint the worker pools that you want to autoscale so that the worker pool does not accept workloads except the ones that you want to run on the autoscaled worker pool. You can learn more about taints and tolerations in the community Kubernetes documentation{: external}. As an example, you might set a taint of
use=autoscale:NoExecute
. In this example, theNoExecute
taint evicts pods that don't have the toleration corresponding to this taint.
After install the add-on, update the ConfigMap values to enable autoscaling in your cluster.