Improve jobrunner wait code to retry on recoverable errors #45

Open
dsimansk opened this issue Jul 22, 2021 · 8 comments
Labels
kind/enhancement triage/accepted Issues which should be fixed (post-triage)

Comments

@dsimansk
Contributor

Per PR discussion: https://github.com/knative-sandbox/kn-plugin-event/pull/42/files#r674008140

Let's investigate the RetryWatcher from client-go tools and the knative/client waiting loop, with the aim of introducing retry on error in the waiting loop.

In some cases ResultChan may be closed prematurely because of a recoverable failure. Detecting such cases and re-creating the watcher instance would be very desirable.

https://github.com/knative-sandbox/kn-plugin-event/blob/0e1fe36db37682a10548808563a38c7e5ac3814c/pkg/k8s/jobrunner.go#L68-L96

@dsimansk
Contributor Author

/kind enhancement

@cardil
Contributor

cardil commented Oct 15, 2021

I think #68 is directly impacted by this.

For visibility, linking the discussion that led to opening this issue: #42 (comment)

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2022
@cardil
Contributor

cardil commented Jan 14, 2022

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2022
@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2022
@cardil
Contributor

cardil commented Apr 26, 2022

/remove-lifecycle stale
/triage accepted

@knative-prow knative-prow bot added triage/accepted Issues which should be fixed (post-triage) and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 26, 2022
@cardil
Contributor

cardil commented Mar 23, 2023

@dsimansk Do you think this issue should be part of the Stable release (GA) milestone?

@dsimansk
Contributor Author

@cardil It depends on whether we see too many intermittent failures in tests or in "production" use. However, my original proposal was to use the client-go library, which is probably better tested and covers more recoverable failures.
Bottom line, I'd like the wait in kn to be based on that library as well. Historically, though, that wasn't possible because of Cloud Run support.
