-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kubeflow Pipelines: Non-transient error: failed to resolve {{
ppln-from-vsc-xkhhr}}
#13512
Comments
ppln-from-vsc-xkhhr
}}Non-transient error: failed to resolve {{
ppln-from-vsc-xkhhr}}
Non-transient error: failed to resolve {{
ppln-from-vsc-xkhhr}}
Non-transient error: failed to resolve {{\
ppln-from-vsc-xkhhr}}
Non-transient error: failed to resolve {{\
ppln-from-vsc-xkhhr}}
Non-transient error: failed to resolve {{
ppln-from-vsc-xkhhr}}
I've tried with hello-world workflow and issue is the same: apiVersion: argoproj.io/v1alpha1
kind: Workflow # new type of k8s spec
metadata:
generateName: hello-world- # name of the workflow spec
spec:
entrypoint: hello-world # invoke the hello-world template
templates:
- name: hello-world # name of the template
container:
image: busybox
command: [ echo ]
args: [ "hello world" ]
resources: # limit the resources
limits:
memory: 32Mi
cpu: 100m
tolerations:
- effect: NoSchedule
key: ComputeResources
value: reservedFor Logs: workflow-controller time="2024-08-27T13:51:18.919Z" level=info msg="Processing workflow" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="Get configmaps 404"
workflow-controller time="2024-08-27T13:51:18.924Z" level=warning msg="Non-transient error: configmaps \"artifact-repositories\" not found"
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="resolved artifact repository" artifactRepositoryRef=default-artifact-repository
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="Updated phase -> Running" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="Pod node hello-world-2jk42 initialized Pending" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.924Z" level=warning msg="Non-transient error: failed to resolve {{`hello-world-2jk42`}}"
workflow-controller time="2024-08-27T13:51:18.924Z" level=error msg="Mark error node" error="failed to resolve {{`hello-world-2jk42`}}" namespace=tomasz nodeName=hello-world-2jk42 workflow=
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="node hello-world-2jk42 phase Pending -> Error" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="node hello-world-2jk42 message: failed to resolve {{`hello-world-2jk42`}}" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.924Z" level=info msg="node hello-world-2jk42 finished: 2024-08-27 13:51:18.924977442 +0000 UTC" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.924Z" level=error msg="error in entry template execution" error="failed to resolve {{`hello-world-2jk42`}}" namespace=tomasz workflow=hello-wor
workflow-controller time="2024-08-27T13:51:18.924Z" level=warning msg="Non-transient error: failed to resolve {{`hello-world-2jk42`}}"
workflow-controller time="2024-08-27T13:51:18.925Z" level=info msg="Updated phase Running -> Error" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.925Z" level=info msg="Updated message -> error in entry template execution: failed to resolve {{`hello-world-2jk42`}}" namespace=tomasz workfl
workflow-controller time="2024-08-27T13:51:18.925Z" level=info msg="Marking workflow completed" namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.925Z" level=info msg="Checking daemoned children of " namespace=tomasz workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.925Z" level=info msg="Workflow to be dehydrated" Workflow Size=1254
workflow-controller time="2024-08-27T13:51:18.930Z" level=info msg="cleaning up pod" action=deletePod key=tomasz/hello-world-2jk42-1340600742-agent/deletePod
workflow-controller time="2024-08-27T13:51:18.936Z" level=info msg="Queueing Error workflow tomasz/hello-world-2jk42 for delete in 168h0m0s due to TTL"
workflow-controller time="2024-08-27T13:51:18.936Z" level=info msg="Delete pods 404"
workflow-controller time="2024-08-27T13:51:18.938Z" level=info msg="Update workflows 200"
workflow-controller time="2024-08-27T13:51:18.938Z" level=info msg="Workflow update successful" namespace=tomasz phase=Error resourceVersion=724983576 workflow=hello-world-2jk42
workflow-controller time="2024-08-27T13:51:18.939Z" level=info msg="Create events 201"
workflow-controller time="2024-08-27T13:51:18.943Z" level=info msg="DeleteCollection workflowtaskresults 200" |
3.3.8 is outdated and unsupported. KFP recently added support for Argo 3.4.x in kubeflow/pipelines#10568, which is supported
This is not an Argo image, that is a Kubeflow fork, so Argo cannot help you with that.
If it works in a different environment, that sounds like an environment issue and not an Argo bug. Every Workflow failing in one environment but not another sounds very much like an environment issue.
Similarly, Argo runs many tests in CI and has many users; the hello-world workflow certainly works, so this sounds like an environment issue as well. To be explicit, I cannot reproduce that.
The Controller is not an Executor, so that is an incorrect configuration and possibly the source of your errors. You also did not provide your Controller ConfigMap, which could also have bugs in it.
You also have some other issues popping up in your logs that are indicative of misconfigurations. |
Pre-requisites
:latest
image tag (i.e.quay.io/argoproj/workflow-controller:latest
) and can confirm the issue still exists on:latest
. If not, I have explained why, in detail, in my description below.What happened? What did you expect to happen?
Cannot run any workflow via Kubeflow Pipeline. Every attemp ends with
Non-transient error: failed to resolve <name>
.Currently every pipeline/workflow ends with above error. My core version is 3.3.8, but I've also tried with the one below. We have the same version on other environments and it works there.
Workflow-controller Pod core definition:
Workflow-controller Pod logs for given pipeline:
Sample Kubeflow pipeline:
Version(s)
v3.3.8. v3.5.10
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.
Logs from the workflow controller
Logs from in your workflow's wait container
The text was updated successfully, but these errors were encountered: