Implement new improved retry logic #1282

Merged
merged 5 commits into from
May 29, 2024

r4victor (Collaborator) commented May 29, 2024

Closes #1200

This PR introduces a retry property for profiles and run configurations as a replacement for retry_policy. It lets users specify which events should trigger a retry. For example, users may retry provisioning the compute if there is no capacity:

type: dev-environment
ide: vscode
retry:
  on_events: [no-capacity]
  duration: 5m

Supported events are no-capacity, interruption, and error.
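
To make the shape of the retry block concrete, here is a minimal Python sketch of how such a block might be validated. It is not dstack's actual implementation: the names Retry, parse_duration, and parse_retry are hypothetical, and the duration format (an integer followed by s, m, h, or d) is an assumption based only on the 5m example above.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# The three events named in this PR description.
SUPPORTED_EVENTS = {"no-capacity", "interruption", "error"}

# Assumption: durations look like "90s", "5m", "2h", or "1d".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


@dataclass(frozen=True)
class Retry:
    """Hypothetical in-memory form of the retry block."""
    on_events: Tuple[str, ...]
    duration_seconds: int


def parse_duration(value: str) -> int:
    """Convert a string such as "5m" into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_retry(raw: dict) -> Retry:
    """Validate the event names and normalize the duration."""
    events = tuple(raw["on_events"])
    unknown = [e for e in events if e not in SUPPORTED_EVENTS]
    if unknown:
        raise ValueError(f"unsupported retry events: {unknown}")
    return Retry(on_events=events, duration_seconds=parse_duration(raw["duration"]))
```

Rejecting unknown event names up front means a typo such as no_capacity fails at configuration time rather than silently disabling retries.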

The duration is now measured differently depending on the event. For no-capacity, it is measured as the run's age. For interruption and error, it is measured as the time elapsed since the last interruption or error. For example, when specifying

retry:
  on_events: [interruption]
  duration: 5m

dstack will keep trying to provision a new instance for up to 5m after the interruption. If there is still no capacity after more than 5m, the run fails.
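
The two ways of measuring the duration can be sketched in Python as follows. This is an illustration of the rule described above, not code from this PR: the function names and the run_submitted_at and last_event_at parameters are hypothetical, and treating exactly 5m as still inside the window is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def retry_window_start(
    event: str,
    run_submitted_at: datetime,
    last_event_at: Optional[datetime],
) -> datetime:
    """Return the moment from which the retry duration is counted."""
    if event == "no-capacity":
        # Measured as the run's age.
        return run_submitted_at
    if event in ("interruption", "error"):
        # Measured from the last interruption or error.
        if last_event_at is None:
            raise ValueError(f"{event} requires the time of the last event")
        return last_event_at
    raise ValueError(f"unsupported retry event: {event}")


def should_retry(
    event: str,
    on_events: Iterable[str],
    duration: timedelta,
    now: datetime,
    run_submitted_at: datetime,
    last_event_at: Optional[datetime] = None,
) -> bool:
    """Retry only for configured events that are still inside the window."""
    if event not in set(on_events):
        return False
    start = retry_window_start(event, run_submitted_at, last_event_at)
    return now - start <= duration
```

With on_events: [interruption] and duration: 5m, a run that was submitted hours ago still gets a fresh 5m window after each interruption, whereas a no-capacity retry under the same duration would already have expired because its window starts at submission.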

TODO:

  • Update the CLI run attach mode to support retrying runs – currently the CLI exits after a run errors.
  • Update the CLI retry syntax to support specifying retry events.

@r4victor r4victor marked this pull request as ready for review May 29, 2024 07:08
@r4victor r4victor merged commit 042f12c into master May 29, 2024
16 checks passed
Development

Successfully merging this pull request may close these issues.

Improve retry policy