[BUG] PVC Resize Fails with "juicefs[31] <FATAL>: no such file or directory [main.go:31]" #1254

Closed
traviswt opened this issue Jan 23, 2025 · 8 comments
Labels
kind/bug Something isn't working

@traviswt

What happened: I create a PVC, and the PV is provisioned and bound to it. When I increase the size of the PVC and apply the change, the resize fails over and over: the "/usr/local/bin/juicefs quota set" command errors with "juicefs[31] <FATAL>: no such file or directory [main.go:31]".

What you expected to happen: The PVC is resized to the new quota and the associated PV becomes larger.

How to reproduce it (as minimally and precisely as possible):

kubectl create ns testing
echo "apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: temp001
  namespace: testing
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: juicefs-delete
  volumeMode: Filesystem" | kubectl apply -f -

Wait until the PV is bound to the PVC.
Update the PVC to request 2Gi:

echo "apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: temp001
  namespace: testing
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: juicefs-delete
  volumeMode: Filesystem" | kubectl apply -f -

Describe the PVC and you will see the following errors:

kubectl -n testing describe pvc/temp001
Name:          temp001
Namespace:     testing
StorageClass:  juicefs-delete
Status:        Bound
Volume:        pvc-00f1aca4-80e1-49fb-8fde-7f3baca76fb7
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.juicefs.com
               volume.kubernetes.io/storage-provisioner: csi.juicefs.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       <none>
Conditions:
  Type       Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----       ------  -----------------                 ------------------                ------  -------
  Resizing   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 23 Jan 2025 15:25:47 -0800           
Events:
  Type     Reason                 Age                    From                                                                           Message
  ----     ------                 ----                   ----                                                                           -------
  Normal   Provisioning           5m53s                  csi.juicefs.com_juicefs-csi-controller-0_f565e782-8f19-4181-b44a-a68db6c1f878  External provisioner is provisioning volume for claim "testing/temp001"
  Normal   ExternalProvisioning   5m53s (x2 over 5m53s)  persistentvolume-controller                                                    Waiting for a volume to be created either by the external provisioner 'csi.juicefs.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal   ProvisioningSucceeded  5m53s                  csi.juicefs.com_juicefs-csi-controller-0_f565e782-8f19-4181-b44a-a68db6c1f878  Successfully provisioned volume pvc-00f1aca4-80e1-49fb-8fde-7f3baca76fb7
  Warning  ExternalExpanding      32s                    volume_expand                                                                  waiting for an external controller to expand this PVC
  Warning  VolumeResizeFailed     32s                    external-resizer csi.juicefs.com                                               resize volume "pvc-00f1aca4-80e1-49fb-8fde-7f3baca76fb7" by resizer "csi.juicefs.com" failed: rpc error: code = Internal desc = set quota: 2025/01/23 23:25:25.853598 juicefs[210] <INFO>: Meta address: rediss://:****@sm-master.juicefs-system.svc.cluster.local:6379/1 [interface.go:504]
2025/01/23 23:25:25.877153 juicefs[210] <INFO>: Ping redis latency: 592.717µs [redis.go:3515]
2025/01/23 23:25:25.878480 juicefs[210] <FATAL>: no such file or directory [main.go:31]
: exit status 1
  Warning  VolumeResizeFailed  32s  external-resizer csi.juicefs.com  resize volume "pvc-00f1aca4-80e1-49fb-8fde-7f3baca76fb7" by resizer "csi.juicefs.com" failed: rpc error: code = Internal desc = set quota: 2025/01/23 23:25:25.940762 juicefs[226] <INFO>: Meta address: rediss://:****@sm-master.juicefs-system.svc.cluster.local:6379/1 [interface.go:504]
2025/01/23 23:25:25.971914 juicefs[226] <INFO>: Ping redis latency: 562.143µs [redis.go:3515]
2025/01/23 23:25:25.973157 juicefs[226] <FATAL>: no such file or directory [main.go:31]
: exit status 1

Anything else we need to know?

This is the StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-delete
parameters:
  csi.storage.k8s.io/controller-expand-secret-name: juicefs-delete-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: juicefs-system
  csi.storage.k8s.io/node-publish-secret-name: juicefs-delete-secret
  csi.storage.k8s.io/node-publish-secret-namespace: juicefs-system
  csi.storage.k8s.io/provisioner-secret-name: juicefs-delete-secret
  csi.storage.k8s.io/provisioner-secret-namespace: juicefs-system
  juicefs/mount-cpu-request: 100m
  juicefs/mount-image: docker.io/juicedata/mount:ce-v1.2.1
  juicefs/mount-memory-request: 512Mi
  pathPattern: ${.PVC.namespace}-${.PVC.name}
provisioner: csi.juicefs.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
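With the pathPattern above, each PVC gets its own subdirectory inside the JuiceFS volume, and that subdirectory is what the resize step later expects to exist. The helper below is a minimal sketch, assuming the driver substitutes ${.PVC.namespace} and ${.PVC.name} literally; render_path_pattern is a hypothetical name, not the CSI driver's own templating code.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of the JuiceFS CSI driver): expand the two
# placeholders used by the juicefs-delete StorageClass's pathPattern.
# Assumption: namespace and name are plain Kubernetes names (no '/' or '&'),
# so they are safe to use as sed replacement text.
render_path_pattern() {
  # $1 = pattern, $2 = PVC namespace, $3 = PVC name
  printf '%s\n' "$1" |
    sed -e 's/\${\.PVC\.namespace}/'"$2"'/g' \
        -e 's/\${\.PVC\.name}/'"$3"'/g'
}
```

For the reproduction above, render_path_pattern '${.PVC.namespace}-${.PVC.name}' testing temp001 prints testing-temp001, presumably the subdirectory the resizer's quota call targets.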

Environment:

  • JuiceFS CSI Driver version (which image tag did your CSI Driver use): v0.26.3
  • Kubernetes version (e.g. kubectl version): v1.31.2
  • Object storage (cloud provider and region): Azure westus3
  • Metadata engine info (version, cloud provider managed or self maintained): Redis 7.4.1, self-maintained; the same problem happens with managed Postgres 15.8
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
  • Others:
@traviswt added the kind/bug (Something isn't working) label on Jan 23, 2025
@zxh326 (Member) commented Jan 24, 2025

You need to use this PVC (i.e. mount it in a pod) at least once.

@traviswt (Author)

That would be the case if the StorageClass had:
volumeBindingMode: WaitForFirstConsumer

but not when the StorageClass has:
volumeBindingMode: Immediate

https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode

@zxh326 (Member) commented Jan 24, 2025

The subpath is only created when the volume is first used, so if the PVC has never been used, resizing it fails with this error.
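The ordering problem can be modelled without a cluster. In this toy sketch (all names hypothetical, no JuiceFS involved), a local temporary directory stands in for the JuiceFS root, and fake_quota_set reproduces only the behaviour discussed here: it refuses a subpath that does not exist yet, which corresponds to the "no such file or directory" in the resizer events.

```shell
#!/bin/sh
# Toy simulation (hypothetical, NOT JuiceFS code): a temp directory stands
# in for the JuiceFS root, and fake_quota_set mimics one behaviour only:
# failing when the target subpath does not exist.
fake_quota_set() {
  # $1 = volume root, $2 = subpath relative to the root
  if [ ! -d "$1/$2" ]; then
    echo "no such file or directory" >&2
    return 1
  fi
  echo "quota set on /$2"
}

root=$(mktemp -d)
subpath=testing-temp001

# Before any pod has mounted the PVC, the subpath does not exist yet.
fake_quota_set "$root" "$subpath" || echo "resize fails: subpath missing"
# Per this thread, the first mount is what creates the subpath.
mkdir -p "$root/$subpath"
fake_quota_set "$root" "$subpath"
rm -rf "$root"
```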

@traviswt (Author)

I understand what you are saying, but this makes the JuiceFS CSI driver non-compliant with the StorageClass spec, so I believe this is a bug.

The existence of a bound PersistentVolume is what makes this a bug. Creating the PV should probably also create the directory, so that subsequent processes such as the resizer work.

Would you be open to a PR along these lines, or to some other solution that doesn't require an admission controller to stop consumers from accidentally changing a PVC that is actually ready and bound?

@zxh326 (Member) commented Jan 24, 2025

Yeah, we have another issue tracking this problem, but fixing it may cause other issues: for example, the volume would need to be mounted before the directory can be created, which would make PV creation very slow.

@zxh326 (Member) commented Jan 24, 2025

We are also considering an improvement to the client (juicefs), such as automatically creating the subpath when setting a quota. Of course, this is still under consideration.

If you have better suggestions, feel free to reply.
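The client-side idea floated above, creating the subpath when a quota is set, could look roughly like this toy version. It is only a sketch: quota_set_autocreate is a hypothetical name, a local directory stands in for the JuiceFS root, and a real juicefs client would presumably create the directory through its metadata engine rather than with mkdir.

```shell
#!/bin/sh
# Toy model (hypothetical, NOT JuiceFS code) of "create the subpath on
# demand when setting a quota" instead of failing with ENOENT.
# A local temp directory stands in for the JuiceFS root.
quota_set_autocreate() {
  # $1 = volume root, $2 = subpath, $3 = capacity in GiB (assumed unit)
  mkdir -p "$1/$2" || return 1   # idempotent: succeeds if it already exists
  echo "quota on /$2: ${3}GiB"
}
```

Because mkdir -p is idempotent, the external-resizer's repeated retries would not fail on a subpath that an earlier attempt already created.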

@traviswt (Author)

> yeah, we have another issue to track this problem, but it may cause some issues, such as needing to mount it before creating the directory, which will make the process of creating pv very slow.

It is expected that binding takes some time when the PV is created; with EBS or other cloud disks it can take minutes, depending on the cloud provider's configuration. I like the idea of tying all the bootstrapping (mountOptions and so on) to the PV provisioning step rather than to the quota step. "juicefs quota" appears to be behaving correctly: if the --path doesn't exist, it simply reports "no such file or directory".

@traviswt (Author)

I'm going to close this issue. On further testing, the errors keep happening and the PV does not grow, even after hours. But once a pod actually mounts the volume, the resize succeeds, as @zxh326 pointed out repeatedly. Thanks for your patience and for discussing this with me. I still think it would be better to perform the necessary steps during bootstrapping, but in the end it does work as intended once a consumer of the volume actually mounts it.
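The workaround that resolved the issue is to mount the volume once before resizing. A minimal sketch, assuming the busybox image is pullable in the cluster and the pod name is free: first_mount_pod is a hypothetical helper that prints a one-shot Pod mounting the PVC, in the same echo-into-kubectl style as the reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper: print a throwaway Pod that mounts a PVC once, so the
# JuiceFS CSI driver creates the volume's subpath before a resize.
# Assumptions: busybox is pullable, and "<pvc>-first-mount" is an unused
# pod name in the target namespace.
first_mount_pod() {
  # $1 = namespace, $2 = PVC name
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $2-first-mount
  namespace: $1
spec:
  restartPolicy: Never
  containers:
  - name: probe
    image: busybox
    command: ["sh", "-c", "ls -la /data"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: $2
EOF
}
```

For the reproduction above: run first_mount_pod testing temp001 | kubectl apply -f -, wait until the Pod has completed, then remove it with kubectl -n testing delete pod temp001-first-mount. Based on this thread, the pending resize should then succeed on a later resizer retry.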
