Skip to content

Commit

Permalink
Merge pull request #81 from lsst-sqre:tickets/DM-46137
Browse files Browse the repository at this point in the history
DM-46137: Enforce timeouts in GitHub Checks runs
  • Loading branch information
jonathansick authored Sep 12, 2024
2 parents 4f77663 + 74c2136 commit a52110f
Show file tree
Hide file tree
Showing 13 changed files with 739 additions and 464 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ repos:
- id: check-toml

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.6.3
rev: v0.6.4
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,16 @@ Collect fragments into this file with: scriv collect --version X.Y.Z

<!-- scriv-insert-here -->

<a id='changelog-0.13.0'></a>

## 0.13.0 (2024-09-12)

### New features

- Times Square now specifies a timeout for notebook executions with Noteburst. This provides better feedback for notebook executions that have hung, either when rendering new pages for users, for when doing a GitHub Check Run. Currently this timeout is set app-wide with the `TS_DEFAULT_EXECUTION_TIMEOUT` environment variable. The default timeout is 60 seconds. In the future we intend to add per-notebook timeout configuration.

- Times Square enforces a timeout when polling for Noteburst results during a GitHub Check run. This prevents the Check Run from hanging indefinitely if the Noteburst service is unable to time out a notebook execution or fails for any reason. This timeout is configurable via the `TS_CHECK_RUN_TIMEOUT` environment variable. The default timeout is 600 seconds.

<a id='changelog-0.12.0'></a>

## 0.12.0 (2024-09-04)
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
# - Runs a non-root user.
# - Sets up the entrypoint and port.

FROM python:3.12.5-slim-bookworm as base-image
FROM python:3.12.6-slim-bookworm as base-image

Check warning on line 17 in Dockerfile

View workflow job for this annotation

GitHub Actions / build

The 'as' keyword should match the case of the 'from' keyword

FromAsCasing: 'as' and 'FROM' keywords' casing do not match More info: https://docs.docker.com/go/dockerfile/rule/from-as-casing/

# Update system packages
COPY scripts/install-base-packages.sh .
Expand Down
304 changes: 155 additions & 149 deletions requirements/dev.txt

Large diffs are not rendered by default.

494 changes: 249 additions & 245 deletions requirements/main.txt

Large diffs are not rendered by default.

38 changes: 19 additions & 19 deletions requirements/tox.txt
Original file line number Diff line number Diff line change
Expand Up @@ -192,25 +192,25 @@ urllib3==2.2.2 \
# -c requirements/dev.txt
# docker
# requests
uv==0.4.1 \
--hash=sha256:0e1c65e266bbac763d63b7d80542b12d6cf841f2db56a069ef9583e6ef1d9fab \
--hash=sha256:0f0daa94e6d95b9ebd62420aa576a6a5792e03716763f27fa3430a57606de122 \
--hash=sha256:132fa34c5813b4a2970d3ab74b700a02eb59d14bdea322a7037be1c0bd2b81c7 \
--hash=sha256:1a253ed3d03a4a9a3d4c54feb55f3cda93d75a4038b6d9665c9b5c05e51ea594 \
--hash=sha256:35992e42c1d52cbdc3db10af147818f5a440be1a2be3dfe042dfae7f0a05d71a \
--hash=sha256:3a26d911b72848062af1d0525dda85d538c57002c10c5965d591bdc774e701a2 \
--hash=sha256:596b820cf452cb4a43df886043d9b881f894efbeb076184c29f5b50222164926 \
--hash=sha256:6068ad80a23011ef508351ddc377172d98cebfdd0e9c88d06d5b13c67d2053c3 \
--hash=sha256:6af4e8a605aefabee333e110c93a198e1fb304f054bee822912952bf546a4896 \
--hash=sha256:7c129dc549be0a36bc0ce9540ebc981755af5a7c07076cce6564a2f10d193220 \
--hash=sha256:7eaeeda1ee71671981274fbacac3258e8f878a6de254e6c38d4d8f0e7055aa2a \
--hash=sha256:ad566cd468f98ae33ea475c1bf47ec6ea6dd3be0b6b9065c4a23def54b837919 \
--hash=sha256:bc11b55a15255deea1541dfe81455df52ad74483539a20b6515db753ec721f52 \
--hash=sha256:cf17b70464b4e632cb3f3193206a231d94be98e520a254515f28abcb4110c738 \
--hash=sha256:d09a1888dc9580c29c8deeeaf8a991e051c1b73650e897df606ef06b8622f544 \
--hash=sha256:d6eb1d759a6a23d6b9d8ccb900a49814bdeeee32dbcd41bd0ed312373b8b2beb \
--hash=sha256:f81f89a2c0cce1480816f5172edc39c3a80ff668404a72c3d6b7a7cdd90cd5ac \
--hash=sha256:fcdc6503100e86fecf6a727d3149e54581e8e9ad6c10e814fc5d22c1e80fab8d
uv==0.4.5 \
--hash=sha256:04463593c881b3b723cc5b7d5fafe1c2e173ed82b7f91a58de3331c3cf99abed \
--hash=sha256:16f74a04de2e91fdcea10a452d033d951a9475dd3ba0ea27da6b39d0514edfbd \
--hash=sha256:1e6a642a2c7336ce8d11e6619820ad28018f2c1e3fb195ea725026e58fc80304 \
--hash=sha256:294a62feeb97778163b4dda410c8057c5d995fbe8083c903456d9be14531e814 \
--hash=sha256:31baeda83a270ff3a3be0f0c422d47668fc89e85a770b58ed7cb4ed4e3b3113e \
--hash=sha256:3693abedfd3e2dcb770eebaa0c3329e87e2ba2fcc8fc634fa0aa9bf221d553e9 \
--hash=sha256:58582b81d6534a6c514fb00d52b28c817ef21e4597501db6cc31c15d743fb3da \
--hash=sha256:77486c10b0881cb9ad298170df2874f97642e6fc3e3a1eed9d64db076bf3f556 \
--hash=sha256:85dc5f40cb0915f0980df5620d7759d48fc5bd0dcfe375923236e37544aa1c31 \
--hash=sha256:985a84122d8cfc09d04fbbe58e93ef2dfe8951c950deb362b7b10609f54b1afd \
--hash=sha256:a364d561d40757d3cc386441f289fc6ed36a35bc6c0e7dfe26cfee803f00fd56 \
--hash=sha256:b658ca04e82e58d60bfaa6e537bc217c4d466b9caec941b1d2173ca8d95260d1 \
--hash=sha256:c04216f0e92caaa6587e1429d7e6ed4677e38f560634a592018211b902e18503 \
--hash=sha256:d293f7276f484dfbc805c7f8880c8cd64769dad8dee540bb832c03146aacf167 \
--hash=sha256:d7545f93eee535a3b5cd79cea96184f8d68221021f13adf865fb4f383612dbf0 \
--hash=sha256:de61baceb95f1ea00224492a74ce5aec45a0566dc720dc0934e5d251f0cf25e5 \
--hash=sha256:eb2109dca9cc20f3540a02619783358a10ec776bac47e44b143ba4cff9ff633e \
--hash=sha256:fbe541ba6f462b10925438769080d1487fa84c2647610cb9f57f11ee190a5658
# via tox-uv
virtualenv==20.26.3 \
--hash=sha256:4c43a2a236279d9ea36a0d76f98d84bd6ca94ac4e0f4a3b9d46d05e10fea542a \
Expand Down
24 changes: 24 additions & 0 deletions src/timessquare/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,30 @@ class Config(BaseSettings):
),
] = "lsst-sqre"

github_checkrun_timeout: Annotated[
int,
Field(
gt=0,
alias="TS_CHECK_RUN_TIMEOUT",
description=(
"The maximum time in seconds to wait for a check run to "
"complete."
),
),
] = 600

default_execution_timeout: Annotated[
int,
Field(
gt=0,
alias="TS_DEFAULT_EXECUTION_TIMEOUT",
description=(
"The default execution timeout for notebook execution jobs, "
"in seconds."
),
),
] = 60

redis_queue_url: EnvRedisDsn = Field(
alias="TS_REDIS_QUEUE_URL",
description=("URL for the redis instance, used by the worker queue."),
Expand Down
71 changes: 68 additions & 3 deletions src/timessquare/domain/githubcheckrun.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,11 @@
from timessquare.config import config
from timessquare.exceptions import PageJinjaError

from ..storage.noteburst import NoteburstJobResponseModel, NoteburstJobStatus
from ..storage.noteburst import (
NoteburstErrorCodes,
NoteburstJobResponseModel,
NoteburstJobStatus,
)
from .githubcheckout import (
GitHubRepositoryCheckout,
NotebookSidecarFile,
Expand Down Expand Up @@ -493,15 +497,76 @@ def report_noteburst_completion(
raise RuntimeError("Page execution has no notebook source path")
self.notebook_paths_checked.append(notebook_path)
if not job_result.success:
if job_result.error:
if (
job_result.error.code == NoteburstErrorCodes.timeout
and job_result.timeout
):
title = "Notebook execution timeout"
message = (
"The notebook execution timed out "
f"({job_result.timeout:.0f} sec.)."
)
else:
title = f"Noteburst error: {job_result.error.code}"
message = (
job_result.error.message
or "We couldn't run this notebook successfully."
)
else:
title = "Noteburst error"
message = "We couldn't run this notebook successfully."
annotation = Annotation(
path=notebook_path,
start_line=1,
message="We couldn't run this notebook successfully.",
title="Notebook execution error",
message=message,
title=title,
annotation_level=GitHubCheckRunAnnotationLevel.failure,
)
self.annotations.append(annotation)

if job_result.ipynb_error:
# An exception occured in the notebook execution
annotation = Annotation(
path=notebook_path,
start_line=1,
message=job_result.ipynb_error.message,
title=f"Notebook exception: {job_result.ipynb_error.name}",
annotation_level=GitHubCheckRunAnnotationLevel.failure,
)

def report_noteburst_timeout(
self,
*,
page_execution: PageExecutionInfo,
job_result: NoteburstJobResponseModel | None = None,
) -> None:
"""Report that the notebook execution failed to complete in time."""
path = page_execution.page.repository_source_path
if path is None:
raise RuntimeError("Page execution has no notebook path")
message = "The notebook execution timed out."
if job_result:
if job_result.status == NoteburstJobStatus.in_progress:
message += (
" The notebook execution is still in progress "
f"after {job_result.runtime.total_seconds()} seconds."
)
elif job_result.status == NoteburstJobStatus.queued:
message += (
" The notebook execution is still in the Noteburst queue."
f"after {job_result.runtime.total_seconds()} seconds."
)
annotation = Annotation(
path=path,
start_line=1,
message=message,
title="Noteburst timeout",
annotation_level=GitHubCheckRunAnnotationLevel.failure,
)
self.annotations.append(annotation)
self.notebook_paths_checked.append(path)

@property
def summary(self) -> str:
notebooks_count = len(self.notebook_paths_checked)
Expand Down
16 changes: 15 additions & 1 deletion src/timessquare/domain/page.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from base64 import b64encode
from collections.abc import Mapping
from dataclasses import asdict, dataclass, field
from datetime import UTC, datetime
from datetime import UTC, datetime, timedelta
from pathlib import PurePosixPath
from typing import Any
from uuid import uuid4
Expand All @@ -29,6 +29,7 @@
ParameterSchemaError,
)

from ..config import config
from ..storage.noteburst import NoteburstJobModel

NB_VERSION = 4
Expand Down Expand Up @@ -332,6 +333,19 @@ def repository_sidecar_path(self) -> str | None:
)
)

@property
def execution_timeout(self) -> timedelta:
"""The execution timeout for the page, in seconds.
Note
----
Currently this timeout defaults to the `default_execution_timeout`
configuration value. In the future, this setting may be configurable
per-page. At that time, this property will use the per-page setting,
falling back to the Times Square default if not set.
"""
return timedelta(seconds=config.default_execution_timeout)

@staticmethod
def read_ipynb(source: str) -> nbformat.NotebookNode:
"""Parse Jupyter Notebook source into `~NotebookNode` for
Expand Down
Loading

0 comments on commit a52110f

Please sign in to comment.