Device Resources Not Reflected in Allocatable Count After Pod Creation #11377

Closed
RonaldFletcher opened this issue Nov 27, 2024 · 2 comments

RonaldFletcher commented Nov 27, 2024

Environmental Info:

K3s Version:

k3s version v1.28.14+k3s1 (3ef2bdb)

Environment="K3S_ARGS= --cluster-cidr=10.44.0.0/16 --disable-network-policy --disable-cloud-controller --disable=local-storage,servicelb,traefik --kubelet-arg=eviction-hard=memory.available<5%,nodefs.available<10% --kubelet-arg=kube-reserved=cpu=200m,memory=250Mi,ephemeral-storage=1Gi --kubelet-arg=system-reserved=cpu=200m,memory=250Mi,ephemeral-storage=1Gi --kube-proxy-arg=proxy-mode=ipvs

Describe the bug:

After creating and running a Pod with resource requests, the allocatable device count on the node does not change as expected, even though the Pod is running successfully. Additionally, when attempting to create another Pod requesting the same resources, the new Pod remains in a Pending state.

pod resource details:

    imagePullPolicy: IfNotPresent
    name: compute
    resources:
      limits:
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        nvidia.com/TU104GL_TESLA_T4: "1"
      requests:
        cpu: 200m
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        ephemeral-storage: 50M
        memory: "5439885824"
        nvidia.com/TU104GL_TESLA_T4: "1"

node details:

    allocatable:
      cpu: 47600m
      memory: "127036041936"
      nvidia.com/TU104GL_TESLA_T4: "1"

    capacity:
      cpu: "48"
      memory: 131126984Ki
      nvidia.com/TU104GL_TESLA_T4: "1"

Steps To Reproduce:

  • Installed K3s:

Expected behavior:

    allocatable:
      cpu: 47600m
      memory: "127036041936"
      nvidia.com/TU104GL_TESLA_T4: "0"

    capacity:
      cpu: "48"
      memory: 131126984Ki
      nvidia.com/TU104GL_TESLA_T4: "1"

Actual behavior:
The allocatable resources on the node did not change after the first Pod was running successfully. The second Pod, which requested the same resources, remained in a Pending state, suggesting that the resources were not available, but the allocatable count did not reflect this.
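
One way to confirm what the scheduler is actually tracking (a rough sketch; the pod and node names below are placeholders) is to check the Pending Pod's scheduling events and the node's "Allocated resources" section, which sums the requests of running pods, including extended resources:

    # The scheduler records a FailedScheduling event naming the resource it
    # could not satisfy (e.g. "Insufficient nvidia.com/TU104GL_TESLA_T4"):
    kubectl describe pod <pending-pod-name>

    # "Allocated resources" in the node description shows what running pods
    # have already requested, which is where the consumed GPU should appear:
    kubectl describe node <node-name>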

brandond (Member) commented Nov 27, 2024

I don't see any evidence that this is a k3s issue. None of the affected code lives in this project. Have you verified that this only affects k3s?

Most likely you should report this to the upstream Kubernetes project or the NVIDIA operator, probably the latter, since that is what is responsible for tracking the GPU resource.

brandond (Member) commented Nov 27, 2024

That said, I don't think that is how allocatable resources work. If I remember correctly, it is not a measure of what remains after the resources currently in use by pods are subtracted. It is the total usable by the kubelet, after the kube-reserved and system-reserved resources are subtracted.

You should read the Kubernetes docs for more information on what these fields indicate.
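
As a rough sketch of that accounting (using the kube-reserved and system-reserved values from the k3s args above; the node name is a placeholder):

    # Allocatable is derived once from capacity, not decremented as pods schedule:
    #   allocatable = capacity - kube-reserved - system-reserved - eviction-thresholds
    # For CPU on this node: 48 cores - 200m (kube-reserved) - 200m (system-reserved)
    # = 47600m, matching the node status above. Extended resources such as
    # nvidia.com/TU104GL_TESLA_T4 remain at the count advertised by the device plugin.
    # Requests consumed by running pods show up under "Allocated resources" instead:
    kubectl describe node <node-name>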

brandond closed this as not planned on Nov 27, 2024
The github-project-automation bot moved this from New to Done Issue in K3s Development on Nov 27, 2024