Backups stuck Deleting because of "filename too long" #8434
Comments
Who decided to use Ubuntu 22.04? Is it Velero, or your workload? |
Thanks for the detailed analysis. If the reason is that the backup metadata file name is too long, is it possible to shorten the backup name to resolve the issue? |
I think 255 is a common limit. |
The goal is to make sure this line is less than 255 characters, because it is the path+filename that is going to be opened: velero/pkg/archive/extractor.go, Line 87 in f5c159c
A possible workaround: instead of writing directly to this filename, generate a UUID short enough that path+UUID fits within 255. We would then look up the full file name in a map (if the file name matters at all). If the file name does not matter, because it's read in and then thrown away, we can just write to files named with single digits; a sketch of that idea follows. The resulting dir with filenames is read here.
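A minimal sketch of that idea (the names here are hypothetical and it uses sequential integers rather than UUIDs; nothing like this exists in Velero today):

```go
package archive

import (
	"fmt"
	"path/filepath"
)

// shortNamer hands out short sequential on-disk names ("0", "1", ...) and
// remembers the original tarball entry name for each, so later stages can
// still look up the real name if it matters.
type shortNamer struct {
	next  int
	names map[string]string // short on-disk name -> original entry name
}

func newShortNamer() *shortNamer {
	return &shortNamer{names: map[string]string{}}
}

// pathFor returns a path under dir whose final component is guaranteed to
// be short, regardless of how long the original entry name is.
func (s *shortNamer) pathFor(dir, original string) string {
	short := fmt.Sprintf("%d", s.next)
	s.next++
	s.names[short] = original
	return filepath.Join(dir, short)
}
```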
It does not appear that the Parse func cares about the file name, just the dir name: Lines 121 to 141 in 74db209
OK, but looking further, the file name is used here: Line 159 in 74db209
Depending on how common this is, we could certainly do something about it.
|
It is the length of the filename IN the backup, not the length of the backup name itself. I will communicate with the customer to see if they can reduce the length of the names and skip the affected workloads, so the rest can be backed up normally and the retention deletes succeed. Long term, it may be worthwhile not to add overly long filenames to the tarball, to prevent issues with restores and retention deletes. Or perhaps change the container image to one with a higher filename length limit. Definitely interesting, because the container image of the application we are backing up can itself support the longer filenames, but the Velero pod cannot untar the long file. |
enlighten us? |
I stand corrected... The container being backed up is RHEL 8.6, and it has the same filename length limit. These files are part of Kafka topics, so I think they must be less than 255 characters, but since the Velero pod adds a temp directory path, it is exceeding the limit when trying to untar. |
@ameer2rock how far are you beyond 255? i.e., is this something where Velero untarring into a very short mountpoint name would help? |
The tempdir name comes from velero/pkg/archive/extractor.go, Line 70 in f5c159c, which calls velero/pkg/util/filesystem/file_system.go, Line 58 in f5c159c.
An example run produces: |
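As an aside, a small illustration of that call chain (the entry name below is made up; this is not Velero's code), combining a temp dir with a long entry name and printing the lengths involved:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Roughly analogous to the TempDir call reached from extractor.go.
	tmp, err := os.MkdirTemp("", "velero-extract-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(tmp)

	entry := strings.Repeat("k", 254) + ".json" // a 259-byte final component
	target := filepath.Join(tmp, entry)
	fmt.Println("total path bytes:", len(target))
	fmt.Println("final component bytes:", len(filepath.Base(target))) // 259, over ext4's 255-byte NAME_MAX
}
```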
Looks like the tmp path is 80 characters, and the filename is 259. I am going to check with the app team, as it's possible that the file is problematic within the container itself. It may be a good idea to implement a container filesystem that can handle longer filenames, to prevent this sort of thing as well. This is the path: This is the filename: |
@ameer2rock try |
@ameer2rock any updates? |
Sorry for the delay getting back to you. Working with the application owner, we found Kafka topics that were at 254 characters, just under the limit of their container. When Velero backs that up, it is saved as long-file-name.json, so the .json suffix is what pushes it over the 255-character limit. They are working on fixing those topics right now. |
@ameer2rock please close the issue if fixing the topics works for you. |
Do we still want this fix? At least the PR is done and ready for when this becomes needed again. |
We can get around this by modifying Kafka, so I did not test the fix. I still think it's valuable, because the failure condition causes backups to get stuck deleting and can build up etcd objects. Closing the issue; thank you for your help. |
@ameer2rock thanks for confirming. We'll reopen the issue though, to track the #8449 fix for the 1.16 release.
What steps did you take and what happened:
Backups fail to delete from the cluster (stuck in the Deleting state) and have to be manually removed from S3 and with:
kubectl delete backups.velero.io <backup-name>
What did you expect to happen:
Backups maintenance to occur without getting stuck.
The following information will help us better understand what's going on:
I cannot attach the debug logs due to privacy issues, but the issue is easily repeatable by backing up a file whose name is longer than 255 characters. The backup succeeds, but the backup will not delete. I assume a restore would fail in a similar fashion. With debug logging on while deleting, the below error is in the log:
error invoking delete item actions: error extracting backup: open file name too long
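The error itself is easy to reproduce outside Velero; a minimal sketch (the exact message varies by platform, but on ext4 creating a file whose name exceeds 255 bytes fails with ENAMETOOLONG):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	name := strings.Repeat("a", 256) // final path component over ext4's 255-byte limit
	f, err := os.Create(filepath.Join(os.TempDir(), name))
	if err != nil {
		fmt.Println(err) // expect: "... file name too long"
		return
	}
	f.Close()
}
```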
Anything else you would like to add:
It appears that the container image is based on Ubuntu 22.04, which has a filename limit of 255 characters that gets triggered when Velero tries to extract the backup bundle.
The error is coming from here:
https://github.com/vmware-tanzu/velero/blob/main/internal/delete/delete_item_action_handler.go
And from the os package here:
https://github.com/golang/go/blob/a3c068c57ae3f71a7720fe68da379143bb579362/src/os/getwd.go#L57
Ubuntu 22.04 ext4 filename limit:
https://help.ubuntu.com/stable/ubuntu-help/files-rename.html.he
Is it possible to have a container image that supports longer filenames? If not, would it be possible to not back up files with filenames > 255 characters, skipping them with an error, to prevent the issue I describe?
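A hypothetical guard along those lines (not Velero's actual code; the function name and the fixed 255-byte constant are assumptions), which a backup or extraction loop could use to skip an entry with a warning instead of failing outright:

```go
package archive

import "path/filepath"

// nameMax is the per-component filename limit on ext4 and most Linux
// filesystems.
const nameMax = 255

// writable reports whether an entry's final path component fits within
// nameMax; callers could log and skip entries that do not, rather than
// failing the whole backup, restore, or retention delete.
func writable(entry string) bool {
	return len(filepath.Base(entry)) <= nameMax
}
```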
Environment:
- Velero version (use velero version): v1.13.2
- Velero features (use velero client config get features):
- Kubernetes version (use kubectl version): 1.25.13
- OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.