Node Selector, Node Affinity, Pod Affinity, Taints, Tolerations
For an example pod config:
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    image: api:v1
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
When we apply this config, the scheduler in the control plane will:
- Filter out nodes that cannot satisfy the resource requests -> the remaining ones are the feasible nodes
- Score the feasible nodes and choose the highest-scoring one
- Bind the pod to the selected node, where the kubelet then creates the containers
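Once the pod is bound, you can check which node was chosen with kubectl get pod api -o wide (the NODE column), or look at the Scheduled event shown by kubectl describe pod api.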
In a cluster, there might be:
- Windows nodes
- high-CPU nodes
- high-memory nodes
- spot nodes
- ARM nodes
- GPU nodes
- storage (SSD) nodes
And in a default cluster, the scheduler might not be able to decide the best place to run the pods on its own, so we can give k8s some extra instructions to help it.
For example, label the node in its node config file: kubeletExtraArgs.node-labels = "key0=value0,environment=production,region=us-west"
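A minimal sketch of the relevant part of a kubeadm JoinConfiguration for this (only the nodeRegistration piece is shown; the label values are the ones from the line above):

apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-labels: "key0=value0,environment=production,region=us-west"

On a node that has already joined, you can also add labels directly: kubectl label nodes <node-name> environment=production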
And in Deployment/StatefulSet/Job/other objects (inside the pod template spec): nodeSelector: environment: production
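For example, a Deployment pinned to production nodes might look like this (the Deployment/app names are illustrative; the nodeSelector sits in the pod template spec):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      nodeSelector:
        environment: production
      containers:
      - name: api
        image: api:v1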
In the node config, taints are specified as key/value pairs plus an effect (key=value:effect) [ref] https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-NodeRegistrationOptions
And in Deployment/StatefulSet/Job/other objects, pods opt in to tainted nodes via spec.tolerations (a list of key/value/effect entries).
For node affinity, there are:
affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
- won't evict pods that are already running if node labels change later (the "IgnoredDuringExecution" part)
- won't schedule pods if the rule is not matched (they stay Pending)
affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution
- won't evict pods that are already running if node labels change later
- still schedules pods even if the preference is not matched (see the sketch after this list)
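A sketch of a Pod spec that combines both (the disktype label is illustrative and assumes you have labeled SSD nodes accordingly):

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  affinity:
    nodeAffinity:
      # hard requirement: only production nodes are feasible
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: environment
            operator: In
            values:
            - production
      # soft preference: prefer SSD nodes if any are available
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: api
    image: api:v1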
Pod affinity: deploy pods on the same node (or topology domain) as related pods where possible, to reduce latency when they need to communicate with each other. In Deployment/StatefulSet/Job/other objects: affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution, affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution. Warning: the label selector only matches pods in the same namespace unless you set namespaces/namespaceSelector explicitly.
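A sketch of a preferred pod affinity, placed under the pod (template) spec, that tries to co-locate with pods labeled app: cache on the same node (the app: cache label is illustrative):

affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cache
        topologyKey: "kubernetes.io/hostname"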
Pod anti-affinity: deploy pods on different nodes where possible, to reduce the impact of a node going down; frequently used when deploying Nginx and other ingress controllers. In Deployment/StatefulSet/Job/other objects: affinity.podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution, affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution. Warning: all nodes should carry the label used as topologyKey, e.g. topologyKey: "kubernetes.io/hostname".
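And a sketch of required pod anti-affinity, also under the pod (template) spec, that spreads ingress-controller replicas across nodes (the app: ingress-nginx label is illustrative):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: ingress-nginx
      topologyKey: "kubernetes.io/hostname"

With the required form, replicas beyond the number of matching nodes stay Pending; the preferred form degrades gracefully instead.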
For example, when scheduling non-control-plane pods on a control-plane node, you may get a warning like:
Warning FailedScheduling 4m30s (x4 over 19m) default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
This indicates that the scheduler is unable to schedule the pod because the only available node has a taint that prevents pods from being scheduled on it. The taint is:
# Taints: key=value:effect
taints:
- key: "node-role.kubernetes.io/control-plane"
  value: null
  effect: NoSchedule
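You can confirm this with kubectl describe node <control-plane-node-name> and checking the Taints field.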
To resolve this, you can:
- remove the taint: kubectl taint nodes <control-plane-node-name> node-role.kubernetes.io/control-plane:NoSchedule-
- add a new node, or
- edit the pod spec to add a toleration for that NoSchedule taint (see the example below)
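For the last option, a toleration like this in the pod spec matches that taint (a minimal sketch):

tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"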
In the case of deploying ArgoCD, the best practice is to deploy the ArgoCD pods on worker nodes, which keeps the control-plane components isolated with dedicated resources.
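One way to do that (a sketch, assuming you add a label such as node-type=worker to your worker nodes yourself, since kubeadm does not label workers by default) is to give the ArgoCD workloads a nodeSelector and simply not add any toleration for the control-plane taint:

spec:
  template:
    spec:
      nodeSelector:
        node-type: worker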