ArtifactGC fails if output artifact with non-existent file is configured with optional: true
#13583
More info:
Logs of succeeded ArtGC pod, when non-existent artifact is commented out:
$ kubectl logs -n argo test-artgc-mqhjs-artgc-wfcomp-2166136261
time="2024-09-10T07:02:51.393Z" level=info msg="S3 Delete artifact: key: test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-2830858817/123artifact-2.tgz"
time="2024-09-10T07:02:51.393Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-09-10T07:02:51.393Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-2830858817/123artifact-2.tgz
time="2024-09-10T07:02:51.479Z" level=info msg="S3 Delete artifact: key: test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-2830858817/abcartifact-2.tgz"
time="2024-09-10T07:02:51.479Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-09-10T07:02:51.479Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-2830858817/abcartifact-2.tgz
time="2024-09-10T07:02:51.483Z" level=info msg="S3 Delete artifact: key: test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-4098831283/123artifact-3.tgz"
time="2024-09-10T07:02:51.483Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-09-10T07:02:51.484Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-4098831283/123artifact-3.tgz
time="2024-09-10T07:02:51.487Z" level=info msg="S3 Delete artifact: key: test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-4098831283/abcartifact-3.tgz"
time="2024-09-10T07:02:51.487Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-09-10T07:02:51.487Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-4098831283/abcartifact-3.tgz
time="2024-09-10T07:02:51.492Z" level=info msg="S3 Delete artifact: key: test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-668465731/123artifact-1.tgz"
time="2024-09-10T07:02:51.492Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-09-10T07:02:51.492Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-668465731/123artifact-1.tgz
time="2024-09-10T07:02:51.495Z" level=info msg="S3 Delete artifact: key: test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-668465731/abcartifact-1.tgz"
time="2024-09-10T07:02:51.495Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-09-10T07:02:51.495Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=test-artgc-mqhjs/test-artgc-mqhjs-prepare-data-668465731/abcartifact-1.tgz
Logs of failed ArtGC pod when the Workflow fails due to timeout and the non-existent artifact is included in the Workflow YAML:
$ kubectl logs -n argo test-artgc-pv7h2-artgc-wfcomp-2166136261
Error: You need to configure artifact storage. More information on how to do this can be found in the docs: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/
You need to configure artifact storage. More information on how to do this can be found in the docs: https://argo-workflows.readthedocs.io/en/latest/configure-artifact-repository/ |
I have to admit that I wasn't aware of the Now I see what you're saying, and see that my PR has some issues both with:
I assigned it to myself to address sometime soon. (Or otherwise let me know if you'd like to work on it @encigem ) |
I'm afraid w.r.t ArgoWF, my skills are for finding the bugs, but not fixing them :) |
You know, I'm thinking about this comment you wrote after I wrote up that PR, @agilgur5. If I did this (i.e. just ignoring the error on the Artifact GC side) and basically reverted my previous PR, then the What do you think? |
Ah I thought I had a more specific comment on this exact scenario, thanks for finding it! I still think parallelized deletion and saving would be more optimal and would force us to properly handle these scenarios instead of a premature return. Although if you're looking for a quick fix, yes, that sounds like it could handle this scenario |
Just started to work on this and realized that if I do revert the earlier change I made, it means that all artifacts would be included in the For both Optional and non-Optional artifacts, it seems that we only want to attempt deletion of whatever exists and we don't want to fail Artifact GC just because we're trying to delete some artifact that doesn't exist. If it was a non-Optional artifact, we will have Failed the Workflow itself, but that doesn't mean we should also fail Artifact GC. Therefore, I'm thinking of instead maintaining the notion that a @agilgur5 @encigem feel free to differ with anything I'm saying here if I'm mistaken, thanks. |
Thanks for going through the logic! I agree, that sounds like the most correct way to handle it. |
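The behavior agreed on above (only attempt deletion of artifacts that actually exist, for both Optional and non-Optional artifacts) could be sketched roughly as follows. This is a minimal illustration, not the Argo Workflows implementation: the `Artifact` type, its `Key` field, and `artifactsToDelete` are hypothetical names, and the sketch assumes an artifact that was never saved has an empty key.

```go
package main

import "fmt"

// Artifact is a hypothetical, simplified stand-in for a workflow output
// artifact. It is NOT the real Argo Workflows type.
type Artifact struct {
	Name     string
	Key      string // object key in the artifact repository; assumed empty if never saved
	Optional bool
}

// artifactsToDelete returns only the artifacts that were actually saved.
// Artifacts without a key are skipped regardless of the Optional flag, so
// a missing file never causes garbage collection itself to fail.
func artifactsToDelete(outputs []Artifact) []Artifact {
	var toDelete []Artifact
	for _, a := range outputs {
		if a.Key == "" {
			// Nothing was uploaded for this artifact, so there is nothing to delete.
			continue
		}
		toDelete = append(toDelete, a)
	}
	return toDelete
}

func main() {
	outputs := []Artifact{
		{Name: "abcartifact", Key: "wf/pod/abcartifact.tgz"},
		{Name: "missing-optional", Optional: true}, // file never existed
		{Name: "missing-required"},                 // workflow failed before saving
	}
	for _, a := range artifactsToDelete(outputs) {
		fmt.Println("would delete:", a.Key)
	}
}
```

The filter deliberately ignores `Optional`: whether a missing non-Optional artifact fails the Workflow is a separate decision from whether Artifact GC should fail.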
Pre-requisites
:latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
What happened?:
optional: true parameter, then the ArtGC pod which runs after the steps are terminated fails with the message: Error: You need to configure artifact storage. More information on how to do this can be found in the docs: https://argoproj.github.io/argo-workflows/configure-artifact-repository/
optional: true config from the offending artifact and re-run, then no artifact GC pod is created when the workflow terminates from deadline exceeded.
Error: You need to configure artifact storage messages appear and the other artifacts are cleaned as expected when timeout occurs.
What did I expect to happen:
Version(s)
v3.4.10, v3.5.10, latest(sha256:4f03ff7ecaef4061dddd2c08f80de4d766b253aa3a57a87e69dd3a797bb42b1e)
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container