-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-46137: Enforce timeouts in GitHub Checks runs #81
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This minor refactoring is just to reduce the length of the method.
jonathansick
added a commit
to lsst-sqre/phalanx
that referenced
this pull request
Sep 6, 2024
If a noteburst job is still in progress after 5 minutes in a check run then we report a new timeout failure. This ought to be configurable but demonstrates a pattern for making sure we don't get stuck with hanging notebook executions.
Rather than checking the runtime time of each notebook, which doesn't work well because we may not get the in-progress status of the notebook we can instead use wait_for to put a timeout on the noteburst polling call.
jonathansick
force-pushed
the
tickets/DM-46137
branch
from
September 6, 2024 20:42
79e53fa
to
2d8ff1f
Compare
We want these check runs to be slightly longer than the configuration so that the worker's code has a change to report the timeout before arq kills the job itself and we lose the opportunity to report the check run as failed to GitHub. It seems that there are typing issues with using arq.worker.func, so we'll try to ignore those for now.
jonathansick
force-pushed
the
tickets/DM-46137
branch
from
September 6, 2024 20:59
2d8ff1f
to
286678e
Compare
jonathansick
added a commit
to lsst-sqre/phalanx
that referenced
this pull request
Sep 9, 2024
Previously if the notebook ran at all, but raised an exception, we still marked it as a success. By looking at the ipynb_error field we now can indicate if the notebook itself is broken.
This uses the new "error" field in the noteburst response that classifies the source of the noteburst execution error for better user feedback. This will show the user whether the error was a result of a timeout or an issue with noteburst itself.
jonathansick
force-pushed
the
tickets/DM-46137
branch
from
September 12, 2024 16:50
d6aa088
to
84ad1ab
Compare
This configuration sets an app-wide default configuration for the timeout to set when executing a notebook with noteburst (either during a check run for a PR or in actual use). Using a check run should help ensure that we don't have hanging executions or check runs and can also encourage authors to keep pages fast. In the future we'll add per-page customizations to the timeout so authors can set longer timeouts when needed, or even set shorter timeouts to fail fast in quick notebooks.
This is now handled by noteburst when we set the notebook execution timeout in the execution request. This is a better approach because we don't actually see the in_progress state for a notebook execution (again, because arq doesn't expose that to the job metadata, only to the job result it seems(?)
This results in better printing e.g. in error messages or logs.
jonathansick
force-pushed
the
tickets/DM-46137
branch
from
September 12, 2024 18:54
8439b41
to
74c2136
Compare
jonathansick
added a commit
to lsst-sqre/phalanx
that referenced
this pull request
Sep 12, 2024
jonathansick
added a commit
to lsst-sqre/phalanx
that referenced
this pull request
Sep 12, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Times Square now specifies a timeout for notebook executions with Noteburst. This provides better feedback for notebook executions that have hung, either when rendering new pages for users, for when doing a GitHub Check Run. Currently this timeout is set app-wide with the
TS_DEFAULT_EXECUTION_TIMEOUT
environment variable. The default timeout is 60 seconds. In the future we intend to add per-notebook timeout configuration.Times Square enforces a timeout when polling for Noteburst results during a GitHub Check run. This prevents the Check Run from hanging indefinitely if the Noteburst service is unable to time out a notebook execution or fails for any reason. This timeout is configurable via the
TS_CHECK_RUN_TIMEOUT
environment variable. The default timeout is 600 seconds.