Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SD-838: timeouts #34

Merged
merged 8 commits into from
Oct 28, 2024
Merged

SD-838: timeouts #34

merged 8 commits into from
Oct 28, 2024

Conversation

hnnsgstfssn
Copy link
Contributor

@hnnsgstfssn hnnsgstfssn commented Oct 24, 2024

Description

Recently some promotion pull requests have logged that they are reaching
the timeout of handling the request, causing updates of the Github
checks to get stuck in a pending state.

Follow commits for details.

It may be easier reviewing with an alternative diff tool, like difftastic (brew install difftastic and GIT_EXTERNAL_DIFF=difft git diff main).

Be sure to remove whitespace changes in the diff view.

Type of Change

  • Bug Fix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (please describe)

@hnnsgstfssn
Copy link
Contributor Author

hnnsgstfssn commented Oct 28, 2024

It's not the best end state, but it was the smallest change I could come up with on short notice that should do what we wanted. More refactoring is needed in future.

When the event handling takes to long the previous context is canceled
and the request fails and the commit status ends up in a pending state.

Using a separate context will allow the status to always be set,
regardless of the event handling timing out.
Since it is very specific, might as well make it operate directly on
labels. This should make it slightly clearer and easier to read.

This reverts commit fcd2aeffd5b2752c7274514b7c78be7fc1bc60fd.
@hnnsgstfssn hnnsgstfssn marked this pull request as ready for review October 28, 2024 11:34
@hnnsgstfssn
Copy link
Contributor Author

After a call we decided to make a few additional changes to log levels and returning and error, skipping continued execution when the config can not be read.

@hnnsgstfssn
Copy link
Contributor Author

hnnsgstfssn commented Oct 28, 2024

Three test cases were tried manually:

Oded-B
Oded-B previously approved these changes Oct 28, 2024
This change refactors the event handling logic such that a deferred
panic handler can log panics in the downstream handler logic. This
should avoid crashing when such panics occur, and instead it would log
the panic using the logger.

Additionally, the parsing of the event payload to determine which
handling logic to invoke is separated out, and now also indicates
whether a match was found. This is to allow PR status updates to be
applied once, when appropriate, and to enable ensuring that the
success/failure update is always applied.

To achieve this the individual downstream logic is factored out into
separate functions, and errors encountered in them are returned where
prHandleError were previously set.

Getting the default branch and Telefonistka config is duplicated in each
handler as needed.
The message seems to be informative only to developers i.e. better
suited as a debug message.

Moving it to the function that it is actually logging for ensures that
it is always logged, instead of putting this burden on the caller.
Now that the earlier error is returned, the else is not needed and can
be dropped, reducing the indentation of the happy path [1, 2].

[1] https://maelvls.dev/go-happy-line-of-sight/
[2] https://medium.com/@matryer/line-of-sight-in-code-186dd7cdea88
* Change metric type log level to debug

  Logging metric style info is better to handle properly; since this
  is one of very few instances it is changed to debug level for now.
  Future goal is to add tracing and metric instrumentation for such
  information.

* Log event type once at start of handle function

* Drop now duplicate log lines of the event type

* Consistently add event type into PR logger

  Once PR logger includes the event_type. For consistency add the same
  field to the other PR loggers.

* Remove now unused eventType argument
Prior when the configuration was not successfully fetched, the error was
only logged but execution continued. Not fetching the configuration is
an unrecoverable error that should result in upstream failure.

Instead return the error to caller and let them log it.
@hnnsgstfssn hnnsgstfssn merged commit b2bd6de into main Oct 28, 2024
5 checks passed
@hnnsgstfssn hnnsgstfssn deleted the SD-838-timeouts branch January 13, 2025 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants