Job array causes error message on older versions of k8s #65

Open
dakoner opened this issue Jan 20, 2023 · 0 comments

dakoner commented Jan 20, 2023

The k8s executor depends on Indexed Jobs, a feature that reached stable in k8s v1.24: https://kubernetes.io/docs/tasks/job/indexed-parallel-processing-static/

When I run a job with the default settings (where max_array_size > 1) on my EKS cluster, which is running v1.21, I see these errors (or are they warnings?):

[redun] Executor[k8s]: Pod redun-job-d64219e107664faab6f1223c52909c0a-array-888pz is missing job-completion-index: {'kubernetes.io/psp': 'rafay-privileged-psp'}

The k8s jobs are all in Error state, and the workflow never finishes because it gets that error.

We already have code that should be detecting versions less than v1.21:
https://github.com/insitro/redun/blob/main/redun/executors/k8s.py#L418
but I think these code paths still execute:
https://github.com/insitro/redun/blob/main/redun/executors/k8s.py#L478
and
https://github.com/insitro/redun/blob/main/redun/executors/k8s.py#L771
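
For illustration, a version gate along these lines could guard both code paths. This is only a sketch, not redun's actual code: `parse_k8s_version` and `supports_indexed_jobs` are hypothetical helper names, the input is assumed to be the version string the API server reports (EKS appends a suffix such as `-eks-...`), and the default minimum of (1, 24) is taken from the version mentioned above and may need adjusting.

```python
import re
from typing import Tuple


def parse_k8s_version(git_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'v1.21.14-eks-fb459a0'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"Unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_indexed_jobs(
    git_version: str, minimum: Tuple[int, int] = (1, 24)
) -> bool:
    """Return True if the cluster version is at least `minimum`.

    The default minimum is an assumption based on this issue, not a value
    read from redun.
    """
    return parse_k8s_version(git_version) >= minimum
```

If the check returns False, the executor could skip array jobs entirely and submit each task as its own k8s job.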

To repro, I think you could use minikube to start a cluster running v1.23 or earlier and then run redun against it.
To fix, I think you could remove the warning at
https://github.com/insitro/redun/blob/main/redun/executors/k8s.py#L771
and properly handle tasks that are missing that field.
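
As a rough sketch of what handling the missing field could look like: the helper below returns `None` when a pod has no completion index, so the caller can decide what to do instead of only logging a warning. `get_completion_index` is a hypothetical name, and I am assuming the annotation key is `batch.kubernetes.io/job-completion-index` as described in the Kubernetes Indexed Job docs; the key redun actually reads may differ.

```python
from typing import Dict, Optional

# Assumed annotation key for Indexed Jobs; not copied from redun's source.
COMPLETION_INDEX_KEY = "batch.kubernetes.io/job-completion-index"


def get_completion_index(annotations: Optional[Dict[str, str]]) -> Optional[int]:
    """Return the pod's completion index, or None if it is absent or invalid."""
    value = (annotations or {}).get(COMPLETION_INDEX_KEY)
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        # Treat a malformed annotation the same as a missing one.
        return None
```

With the annotations from the log line above (`{'kubernetes.io/psp': 'rafay-privileged-psp'}`), this returns `None`, which the executor could use to mark the array job as failed rather than waiting forever.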
