error: chmod <file> Operation not permitted #2645

Closed
sfrolich opened this issue Oct 31, 2024 · 21 comments
Labels: known-issues (Issues that are known or not supported yet.), p2, pending customer action

Comments

@sfrolich

sfrolich commented Oct 31, 2024

Describe the issue
When trying to run a chmod command on a file, I will get the error:
error: chmod <file> Operation not permitted
What I was actually trying to do is git clone a repo and the error I get from that is:

git clone [email protected]:xxx/yyy.git
Cloning into 'yyy'...
error: chmod on /home/gcs_shared/yyy/.git/config.lock failed: Operation not permitted
fatal: could not set 'core.filemode' to 'false'

System & Version (please complete the following information):

  • OS: Pod is Ubuntu 22.04
  • Platform: GKE
  • Version: GCS Fuse: v2.3.2 GKE: 1.30.5-gke.1014001
  • Version: GCS Fuse CSI Driver: gke.gcr.io/gcs-fuse-csi-driver:v1.7.0-gke.0

Steps to reproduce the behavior with following information:

  1. Mount a bucket using GCS Fuse v2.3.2
  2. Touch a file
  3. chmod the file

Additional context
I was able to git clone with v2.1.0, but with v2.3.2 at least, I cannot.

SLO:
We strive to respond to all bug reports within 24 business hours provided the information mentioned above is included.

@sfrolich sfrolich added the p2 and question labels Oct 31, 2024
@Tulsishah
Collaborator

Tulsishah commented Nov 2, 2024

Hi @sfrolich,

The git clone feature is supported by default in gcsfuse releases from v2.3.0 onward. I can clone other repositories using git clone, but I don't have permission to clone your repository (e.g. [email protected]:xxx/yyy.git). Could you please share your gcsfuse logs to help me debug this issue?

Thanks,
Tulsi Shah

@sfrolich
Author

sfrolich commented Nov 2, 2024

Hi @Tulsishah, I was using a random public GitHub project to clone, so you should be able to pick any one and try it. Also you can just try to chown any file and see roughly the same error.

@gargnitingoogle
Collaborator

gargnitingoogle commented Nov 4, 2024

@sfrolich

I was using a random public GitHub project to clone, so you should be able to pick any one and try it

Cloning a repo inside a gcsfuse-mounted directory worked for me without any problem with both the latest gcsfuse version (v2.5.0) and v2.3.2.

$ git clone https://github.com/jacobsa/fuse
Cloning into 'fuse'...
remote: Enumerating objects: 6856, done.
remote: Counting objects: 100% (1256/1256), done.
remote: Compressing objects: 100% (189/189), done.
remote: Total 6856 (delta 1132), reused 1100 (delta 1066), pack-reused 5600 (from 1)
Receiving objects: 100% (6856/6856), 1.20 MiB | 22.00 KiB/s, done.
Resolving deltas: 100% (4248/4248), done.
Updating files: 100% (89/89), done.

Also you can just try to chown any file and see roughly the same error.

chown and chmod are not supported in gcsfuse-mounted directories, as documented here.

As that documentation describes, all files and directories in a gcsfuse mount are owned by the UID of the process that created the mount. This ownership can be set only once, at mount time, by passing --uid and/or --gid to the gcsfuse mount command. For other users to be able to access the mount, pass -o allow_other in the gcsfuse mount command.
The default file and directory permissions can likewise be set only at mount time, using --file-mode and/or --dir-mode in the gcsfuse mount command.

If git clone somehow needs to call chown, then there is likely a permission issue.
Please share the gcsfuse command you used to mount the bucket, the user account you ran git clone as, and the permissions of the directory where you ran it. That will be very helpful for debugging the issue.
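
For illustration, the mount-time options mentioned above could be combined roughly as follows. This is only a sketch: the function name, bucket name, and mount point are hypothetical, the mode values are examples, and it assumes gcsfuse is installed and invoked directly rather than through the GKE CSI driver.

```shell
# Hypothetical helper: mount a bucket so that the calling user owns every
# file and directory in the mount. Ownership and modes are fixed at mount
# time; a later chown/chmod inside the mount will not change them.
mount_bucket_as_current_user() {
  bucket="$1"        # e.g. my-bucket (placeholder name)
  mount_point="$2"   # e.g. /mnt/gcs (placeholder path)
  gcsfuse \
    --uid "$(id -u)" \
    --gid "$(id -g)" \
    --file-mode 664 \
    --dir-mode 775 \
    -o allow_other \
    "$bucket" "$mount_point"
}
```

A call would look like `mount_bucket_as_current_user my-bucket /mnt/gcs`, run as the user who should own the files.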

@sfrolich
Author

sfrolich commented Nov 5, 2024

Hmmm, interesting. I read the doc you linked above, but my chmod is not being ignored. It does this:

(base) jovyan@scott1-0:~/gcs_shared$ ls -l
total 1
-rw-rw-r-- 1 root users 3 Apr 15  2024 foo
-rw-rw-r-- 1 root users 0 Mar 29  2024 foo2
-rw-rw-r-- 1 root users 0 Oct 30 16:57 foo3
-rw-rw-r-- 1 root users 0 Oct 30 16:57 foo4
(base) jovyan@scott1-0:~/gcs_shared$ chmod 0777 foo
chmod: changing permissions of 'foo': Operation not permitted
(base) jovyan@scott1-0:~/gcs_shared$ id
uid=1000(jovyan) gid=100(users) groups=100(users)

I'm using the GCS Fuse CSI Driver gke.gcr.io/gcs-fuse-csi-driver:v1.7.0-gke.0. I think that is my problem that chmod is not being ignored.

@gargnitingoogle
Collaborator

gargnitingoogle commented Nov 5, 2024

thanks @sfrolich for sharing more info.

my chmod is not being ignored.

For me it consistently passes without actually changing the permissions. I think it is not being ignored for you because the error comes from the Linux kernel itself: the file in question (foo) is owned by root (UID=0), and it is being chmod-ed by another user, jovyan (UID=1000). So this error isn't specific to a gcsfuse mount; it is the generic Linux file-system rule that you can't change the permissions or ownership of files you don't own.
In a gcsfuse mount, the only difference is that chmod is silently ignored when the mount-owner user runs it on one of the files.

I think that is my problem that chmod is not being ignored.

Yes, that looks like the blocker here. You definitely have write permission on all the files here, so I am assuming you also have permission to create new files and directories here; ideally chmod shouldn't even come into the picture.
If chown/chmod can be avoided somehow, that's the best option. If it's unavoidable, you need to either mount the gcsfuse file system yourself as jovyan or access the file system as the root user. I tried other approaches, such as adding -o allow_other and/or --file-mode=777 and/or --dir-mode=777 to the gcsfuse mount command, but none of them allows chmod to work for a non-owner user.
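
A quick way to check whether this ownership rule is what rejects a chmod is to compare the file's owner with the calling user. The following is a minimal sketch, assuming GNU stat (as on Ubuntu); the function name is made up for illustration, and it simplifies the kernel's rule to "owner or root".

```shell
# Hypothetical helper: predict whether chmod on FILE can succeed for the
# current user. Simplified rule: only the file's owner or root may chmod.
can_chmod() {
  file="$1"
  owner_uid="$(stat -c '%u' "$file")"   # numeric UID of the file's owner (GNU stat)
  my_uid="$(id -u)"                     # effective UID of the caller
  if [ "$my_uid" -eq 0 ] || [ "$my_uid" -eq "$owner_uid" ]; then
    echo "yes"
  else
    echo "no: owner uid is $owner_uid, caller uid is $my_uid"
  fi
}
```

For the listing shared earlier in this thread, `can_chmod foo` run as jovyan (uid 1000) would report "no", because foo is owned by root (uid 0).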

Also, could you share the gcsfuse command that you're using to mount your bucket (feel free to hide the real bucket-name or dir-names) ?

@gargnitingoogle gargnitingoogle added the pending customer action and known-issues labels and removed the question label Nov 5, 2024
@sfrolich
Author

sfrolich commented Nov 8, 2024

@gargnitingoogle you are right. I thought if you were part of the group on a file you could run chown but apparently you need to be user/owner. Sorry about that.

The gke-fuse-sidecar is mounting like this from the log:
I1108 19:25:05.347309 1 sidecar_mounter.go:75] gcsfuse mounting with args [--foreground --gid 100 --file-mode 664 --implicit-dirs --app-name gke-gcs-fuse-csi --uid 0 --temp-dir /gcsfuse-buffer/.volumes/kubeflow-scott-gcs-fuse-csi-shared-volume-pv/temp-dir --config-file /gcsfuse-tmp/.volumes/kubeflow-scott-gcs-fuse-csi-shared-volume-pv/config.yaml --prometheus-port 8080 --dir-mode 775 kflow-gke-dev-cakefs /dev/fd/3]...

The strange thing is that something has changed in 2 separate environments, resulting in the same issue. Where git clone used to work, it no longer does. I suspected it was a fuse change.

Is there a way to specify the user jovyan instead of --uid=0 using the CSI driver? Any docs on changing that? It might work if we change the user.

@gargnitingoogle
Collaborator

gargnitingoogle commented Nov 9, 2024

@sfrolich My bad. I missed out on the fact that you're accessing gcsfuse mounts through gke csi driver.

For passing uid and gid to gcsfuse mounts through the CSI driver, you need to pass them through the mountOptions attribute of the gcsfuse volume in the pod config YAML. Refer to https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#consume-ephemeral-volume-pod for how and where to change the gcsfuse mountOptions. You need to append uid=1000 to the value of mountOptions (a username is not supported). It's a comma-separated string value, e.g. uid=1000,gid=1000.

Setting uid and gid in mountOptions might cause other problems, though. For example, file operations performed on the gcsfuse mount by the container's command may fail, because the container would probably still be running as the root user, and the container run may then hit errors. If it's possible for you, I'd suggest moving the git clone (and any other sources of chmod calls) into your pod's container's command rather than changing the gcsfuse volume's uid/gid.
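
Because mountOptions only accepts numeric IDs, the user name has to be resolved first. Here is a small sketch of building the comma-separated value; the function name is hypothetical, and it assumes it is run somewhere the user actually exists (for example, inside the pod's container).

```shell
# Hypothetical helper: append the numeric uid/gid of USER to an existing
# comma-separated mountOptions value.
append_uid_gid() {
  existing="$1"   # e.g. implicit-dirs
  user="$2"       # e.g. jovyan
  printf '%s,uid=%s,gid=%s\n' "$existing" "$(id -u "$user")" "$(id -g "$user")"
}
```

With the id output shown earlier in the thread (uid=1000, gid=100), `append_uid_gid implicit-dirs jovyan` would print `implicit-dirs,uid=1000,gid=100`.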

@sfrolich
Author

@gargnitingoogle do you have any idea what has changed between the two versions? Is this a CSI issue and not a driver issue? Without us changing anything in how we use the CSI driver, git went from working to not working.

@gargnitingoogle
Collaborator

@gargnitingoogle do you have any idea what has changed between the two versions? Is this a CSI issue and not a driver issue? Without us changing anything in how we use the CSI driver, git went from working to not working.

I am not aware of any recent changes in the CSI driver that might have affected file system permissions in the mounted directory. Please share your old GKE cluster version and CSI version, so that I can check with the GKE team.

@sfrolich
Author

sfrolich commented Nov 15, 2024

The CSI Driver was: v1.3.2
GKE Version was around v1.28.12. Sorry, since GKE auto-upgraded itself, I don't have the exact version, but I do know that the original fix was checked into CSI 1.3.2 and 1.4.1 here: GoogleCloudPlatform/gcs-fuse-csi-driver@760a4aa, and here is the driver fix: #1016
At that point in time, git clone worked.

@raj-prince
Collaborator

raj-prince commented Nov 17, 2024

Hi @sfrolich ,

This issue is easily reproducible by running git clone or chmod as a non-root user who has read/write access to the mounted filesystem but is not the owner. I was able to reproduce this error in gcsfuse versions v2.1.0, v2.3.0, and the latest build.

This behavior makes sense to me; otherwise, it would create multiple security risks, as anyone could change the permission bits. I'm not sure how it worked before. Are you sure there were no user- or group-related changes in the old working system?

The recommendation would be to change the owner using the 'uid' option to get git clone working, following the instructions @gargnitingoogle provided in the comment above. In the meantime, I'll confirm the behavior with the gcs-fuse-csi-driver team.

Regards,
Prince Kumar.

@sfrolich
Author

sfrolich commented Nov 21, 2024

I changed my uid and gid to match the current user's uid and gid, and I got further. Now I'm running into a different issue, where git clone throws this error: fatal: 'origin' does not appear to be a git repository. It is indeed a valid repo with origin defined. If I take the same repo and clone it outside the GCS share, it works.

https://superuser.com/questions/1079126/git-clone-fatal-origin-does-not-appear-to-be-a-git-repository-for-vmware-vmhg

I tried git init and then git remote add origin, and if I look into the .git/config file I see that the definition of [origin] is missing the url. I can add the url and then git works properly but still there's an issue.

I also tried running git clone as the root user and got the same error.
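
The missing url can be checked without opening the file by hand. Below is a minimal sketch that reads the url of the origin remote straight from a .git/config file, assuming git's usual layout where the section header is `[remote "origin"]`. The function name is hypothetical; `git config --get remote.origin.url` is the built-in equivalent when run inside the repository.

```shell
# Hypothetical helper: print the url configured for the "origin" remote in
# the given git config file; prints nothing if the url line is missing.
origin_url_from_config() {
  config_file="$1"
  awk '
    # Every section header switches the "inside origin section" flag.
    /^\[/ { in_origin = ($0 ~ /^\[remote "origin"\]/) }
    # Inside the origin section, strip "url =" and print the value.
    in_origin && /^[ \t]*url[ \t]*=/ {
      sub(/^[ \t]*url[ \t]*=[ \t]*/, "")
      print
      exit
    }
  ' "$config_file"
}
```

An empty result from `origin_url_from_config yyy/.git/config` would match the symptom described above, where the origin section exists but has no url.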

@gargnitingoogle
Collaborator

If I look into the .git/config file I see the definition of [origin] is missing the url

Probably the git clone process or one of its sub-processes failed to write to the .git/config file, perhaps because of some other permission issue.

To debug this further, we need to look at your gcsfuse debug logs. Otherwise we're just shooting in the dark, I think.

  1. Please enable debug logging in your mount by setting gcsfuseLoggingSeverity to trace (find details on this page).
  2. Get the relevant logs from the logs explorer page in your cloud console by filtering on time, pod_name, etc. The container name to search for is gke-gcsfuse-sidecar.
  3. Share the relevant debug logs with us so we can see what's going on.

I can add the url and then git works properly but still there's an issue.

What is this issue?

@sfrolich
Author

sfrolich commented Nov 22, 2024

Can't seem to get the trace logs to show. I tried the gcsfuseLoggingSeverity setting in both places below. Am I setting it wrong?

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-shared-volume-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 500Gi
  storageClassName: gcs-storage-class
  mountOptions:
  - implicit-dirs
  - uid=1000
  - gid=100
  - file-mode=0640
  - dir-mode=0750
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: {{ .Values.gcp.gcs.cakefs }}
    gcsfuseLoggingSeverity: trace
    volumeAttributes:
      gcsfuseLoggingSeverity: trace

@vadlakondaswetha
Collaborator

Can you try adding - log-severity=trace under mountOptions, similar to the other flags?
Also, when I passed file-mode=777 and dir-mode=777, git clone worked for me, and I got permission denied with file-mode=640 and dir-mode=750.

Can you try with 777 and check whether it works?

@sfrolich
Author

file-mode and dir-mode 777 didn't work either. Running as root and as user 1000 (to match the uid) didn't work either. I've attached the logs from doing a git clone. Thanks for going above and beyond on this.

downloaded-logs-20241126-152017.json

@kislaykishore
Collaborator

@sfrolich could you share the complete logs? I don't see the initial few log lines that log the GCSFuse configuration.
Also, it'd be helpful to summarize the correct versions of GKE, CSI and GCSFuse, since the description says that you are using CSI v1.7.0; however, the GCSFuse version that it comes with is v2.5.0, not v2.3.2.

@sfrolich
Author

sfrolich commented Dec 2, 2024

downloaded-logs-20241202-135927.json
At the time of this log I am running:

  • GKE: v1.30.3-gke.1969001
  • CSI: gke.gcr.io/gcs-fuse-csi-driver:v1.7.0-gke.0@sha256:049688ed328d6b90ad7311bfadebc17ed3c71f34798b978498a09db3ccc3013f
  • GCSFuse: What is the best way to definitively determine this? I was just looking at git commits before.

@sfrolich
Author

sfrolich commented Dec 2, 2024

Here's another logs download with all the containers instead of just the gcs one.
downloaded-logs-20241202-140518.json

@gargnitingoogle
Collaborator

gargnitingoogle commented Dec 24, 2024

@sfrolich apologies for the long gap in the responses. Thanks for sharing logs. I looked into them, but did not find any error logs that would help move us in the direction of a diagnosis. You could help us with the following info.

  1. The logs are still missing the gcsfuse mount command that we asked for in an earlier comment on this issue (#2645). It will help us determine whether GKE passed the mount options down to gcsfuse correctly. The relevant lines look like the following, and they appear in the first few minutes after the pod comes up, in the container gke-gcsfuse-sidecar:
gcsfuse config file content: map[cache-dir: logging:map[file-path:/dev/fd/1 format:json] metadata-cache:map[stat-cache-max-size-mb:-1 ttl-secs:-1 type-cache-max-size-mb:-1]]

gcsfuse mounting with args [--uid 0 --temp-dir /gcsfuse-buffer/.volumes/data-vol/temp-dir --config-file /gcsfuse-tmp/.volumes/data-vol/config.yaml --max-retry-attempts 5 --app-name gke-gcs-fuse-csi --foreground --gid 0 --client-protocol grpc --implicit-dirs --log-severity trace <bucket-name> /dev/fd/3]...
  2. Could you share whether this bucket/object already has a partially git-cloned directory? If yes, git itself might not handle the clone command very well. I am asking because in the logs shared above I don't see any calls for the creation of "test2/skypilot/.git/config"; rather, it already exists when it is read the first time. So either the logs you uploaded aren't from the start, or the bucket already had a pre-created .git folder, which might cause issues with git clone.

  3. I see that we have spent a long time on this issue and still don't have enough data or direction to triage it. A logical next step could be to set up a meeting between you and gcsfuse engineering for a live debugging session. For that, you need to open a support ticket.
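
To locate those startup lines, the sidecar's log stream can be filtered for the two messages shown above. This is a rough sketch, assuming kubectl access to the pod; the function name is hypothetical and POD_NAME is a placeholder.

```shell
# Hypothetical helper: keep only the log lines in which the CSI sidecar
# reports the gcsfuse config and mount arguments. Reads logs on stdin.
find_mount_lines() {
  grep -F -e 'gcsfuse mounting with args' -e 'gcsfuse config file content'
}

# Example (not run here): POD_NAME is a placeholder for your pod's name.
# kubectl logs POD_NAME -c gke-gcsfuse-sidecar | find_mount_lines
```

If nothing is printed, the captured logs most likely start after the mount happened, and a fresh pod start would be needed to capture them.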


Closing this issue as we haven't received any response in 14 days. Please reopen if you are still experiencing this issue.

6 participants