
"Timed out waiting for the condition" when binding task volumes #4050

Open
lubex96 opened this issue Feb 27, 2025 · 5 comments
Labels
kind/question Categorizes issue related to a new question
Comments


lubex96 commented Feb 27, 2025

Please describe your problem in detail

Hello,

I'm experiencing an issue with Volcano version 1.8.0 when a PodGroup task references a PVC that has its volume dynamically created by a CSI driver.

Environment:

  • Cluster provider: AWS / eu-central-1
  • Kubernetes version: 1.30.5
  • Volcano version: v1.8.0

The PVC is created before the VolcanoJob is launched in the Kubernetes cluster. It uses a StorageClass whose volume binding mode is "WaitForFirstConsumer".
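For reference, a StorageClass with this binding mode looks roughly like the following. This is a hedged sketch: the class name and provisioner mirror the setup described in this issue, but the real parameters of the `hc-default` class are not shown here and are an assumption.

```yaml
# Hypothetical sketch of a StorageClass comparable to "hc-default";
# the real class's parameters are not visible in this issue.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hc-default
provisioner: ebs.csi.aws.com
# Delay volume binding until a pod using the PVC is scheduled, so the
# CSI driver can provision the volume in the pod's availability zone.
volumeBindingMode: WaitForFirstConsumer
```

With this mode, PV provisioning only starts once a consumer pod has been assigned a node, which is why the provisioning events below begin after Volcano's bind attempt.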

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
    volume.kubernetes.io/selected-node: ip-172-30-3-23.eu-central-1.compute.internal
    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
  creationTimestamp: "2025-02-27T14:42:39Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    spark-app-selector: fca6f8d4-3cd2-44b5-9a8c-da840123d24c
  name: 8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2
  namespace: 882d154f-757f-42e3-bc40-4ffbbfb04dd3
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: 8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0
    uid: 1ea14a33-78f4-41ed-bd4e-21f1b9cc9403
  resourceVersion: "587063403"
  uid: 041cee8f-83a0-4ae4-b5fe-4c9ed4e94b13
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: hc-default
  volumeMode: Filesystem
  volumeName: pv-shoot--hc-canary--prod-bds-041cee8f-83a0-4ae4-b5fe-4c9ed4e94b13
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  phase: Bound

Volcano seems to use a fairly aggressive timeout when waiting for volumes to be bound. After finding a node for the task, it waits for the volume to be bound and times out a few seconds later. Here are the observed events:

1 - Volcano finds a node for the task, and starts the binding process:

I0227 11:17:49.455862       1 event_handlers.go:229] Added pod <882d154f-757f-42e3-bc40-4ffbbfb04dd3/8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2> into cache.

I0227 11:17:50.589825       1 allocate.go:248] Binding Task <882d154f-757f-42e3-bc40-4ffbbfb04dd3/8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2> to node <ip-172-30-1-174.eu-central-1.compute.internal>

2 - A Kubernetes "ExternalProvisioning" event for the PVC is observed a few seconds after that, thanks to the "WaitForFirstConsumer" StorageClass policy:

2025-02-27 11:18:14.000
"involvedObject": {
      "kind": "PersistentVolumeClaim",
      "namespace": "882d154f-757f-42e3-bc40-4ffbbfb04dd3",
      "name": "8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2-pvc-0",
      "uid": "41a45fee-47af-451e-ba57-1907915fcf18",
      "apiVersion": "v1",
      "resourceVersion": "586057719"
    },
    "reason": "ExternalProvisioning",
    "message": "Waiting for a volume to be created either by the external provisioner 'ebs.csi.aws.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.",
    "source": {
      "component": "persistentvolume-controller"
    },

3 - At 11:18:20, Volcano reports a timeout for the volume binding process:


2025-02-27 11:18:20.760 | E0227 11:18:20.759620       1 cache.go:1042] task 882d154f-757f-42e3-bc40-4ffbbfb04dd3/8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2 bind Volumes failed: &fmt.wrapError{msg:"binding volumes: timed out waiting for the condition", err:(*errors.errorString)(0x341f790)} |  
2025-02-27 11:18:20.760 | I0227 11:18:20.759662       1 cache.go:305] Revert assumed volumes for task 882d154f-757f-42e3-bc40-4ffbbfb04dd3/8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2 on node ip-172-30-1-174.eu-central-1.compute.internal

4 - But a few seconds later, at 11:18:23, we notice that the external volume provisioner (the AWS EBS CSI driver) was able to provision the volume:

11:18:23
"involvedObject": {
      "kind": "PersistentVolumeClaim",
      "namespace": "882d154f-757f-42e3-bc40-4ffbbfb04dd3",
      "name": "8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2-pvc-0",
      "uid": "41a45fee-47af-451e-ba57-1907915fcf18",
      "apiVersion": "v1",
      "resourceVersion": "586057719"
    },
    "reason": "ProvisioningSucceeded",
    "message": "Successfully provisioned volume pv-shoot--hc-canary--prod-bds-41a45fee-47af-451e-ba57-1907915fcf18",
    "source": {
      "component": "ebs.csi.aws.com_csi-driver-controller-7784cc5bcc-42n4t_c107abf2-b060-49a5-9d26-4282b83a0529"
    },

If Volcano had waited a few more seconds, I believe the volume binding would have succeeded for the task. Note that the external provisioner doesn't even take long to provision the PV: only ~9s.

5 - Because of this timeout, combined with the number of VcJobs in the corresponding queue, fair scheduling, etc., Volcano "discards" the task and only picks it up again several minutes later, at 11:43:58.

2025-02-27 11:43:58.960 | I0227 11:43:58.960498       1 allocate.go:248] Binding Task <882d154f-757f-42e3-bc40-4ffbbfb04dd3/8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2> to node <ip-172-30-1-174.eu-central-1.compute.internal> |  

2025-02-27 11:43:58.960 | I0227 11:43:58.960531       1 statement.go:266] After allocated Task <882d154f-757f-42e3-bc40-4ffbbfb04dd3/8a4e914a-0f84-4f65-b269-458f41cd2269-1740654350-driver-0-exec-2> to Node <ip-172-30-1-174.eu-central-1.compute.internal>: idle <cpu 2835.00, memory 73648390146.00, ephemeral-storage 99128253772000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, used <cpu 13085.00, memory 58710949886.00>, releasing <cpu 0.00, memory 0.00>

From what I understand, if volume binding fails for a task, the task gets unassigned from its node and has to wait for another scheduling loop to be allocated again. Is that correct?

6 - Finally, the pod is started on the node after that second attempt.

My question is: can we tweak the timeout that Volcano uses when waiting for task volumes to be bound? I checked the code, and it seems to default to 30s:

30*time.Second,

Some further questions that I would like to check with Volcano experts:

  • If we increased that timeout, would it help Volcano withstand extended provisioning times from the external provisioner?
  • Is there something that addresses this situation in more recent versions of Volcano (e.g., v1.11.0)?
  • Would enabling the CSI feature gate help in this situation as well?
  • Finally, do you have any good practices or recommendations for the situation where a task scheduled by Volcano waits for a PV to be created?

Any other relevant information

No response

@lubex96 lubex96 added the kind/question Categorizes issue related to a new question label Feb 27, 2025
@lubex96 lubex96 changed the title "Timeout waiting for the condition" when binding task volumes "Timed out waiting for the condition" when binding task volumes Feb 27, 2025
@Monokaix
Member

I think a custom bind timeout setting is a good catch, as the actual bind time depends on the underlying storage :)
Fortunately we are preparing to refactor the volume binding logic in the next version, see #3618 for more details.

Actually Volcano schedules and retries every reverted task very quickly, so a short bind timeout will usually not be a bottleneck. I think the problem here is just that the PV provisioning time roughly coincides with the bind timeout, causing the task's bind attempt to end first; the task is then processed in the next scheduling cycle.

Hope this information helps.

@Monokaix
Member

/milestone v1.12

@Monokaix Monokaix added this to the v1.12 milestone Feb 28, 2025
@lubex96
Author

lubex96 commented Feb 28, 2025

Thanks @Monokaix !

I think the problem here is just that the PV provisioning time roughly coincides with the bind timeout, causing the task's bind attempt to end first

Exactly, that was my impression as well: if the volume provisioning done by the external provisioner exceeds the 30s default timeout, Volcano considers the volume binding unsuccessful and retries in the next scheduling session.

Fortunately we are preparing to refactor the volume binding logic in the next version, see #3618 for more details.

While we wait for v1.12, would it be advisable for us to increase the bind timeout from 30s to a larger value? We can fork Volcano in our project and change it manually while we wait for the volume binding refactoring.

@Monokaix
Member

Monokaix commented Mar 4, 2025

While we wait for v1.12, would it be advisable for us to increase the bind timeout from 30s to a larger value?

I think it's ok.

@JesseStutler
Member

Great scenario! Volume binding has to be refactored in this version; this scenario is a good catch.
