Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add post prod deploy smoke test and alert #7057

Merged
merged 24 commits into from
Dec 21, 2023
Merged

Conversation

fzhao99
Copy link
Contributor

@fzhao99 fzhao99 commented Dec 8, 2023

DEVOPS PULL REQUEST

Related Issue

Changes Proposed

  • Adds a backend health actuator endpoint that returns
    • Down if a db table call or an Okta client health call errors
    • Up otherwise
  • Adds a frontend page that pings that endpoint and displays up/down accordingly
  • A workflow that triggers on a prod deploy that sends a Slack alert if things error

Additional Information

Setting up the Pagerduty integration for this was a bit complicated from the ops / infra side, so we opted to use a Slack alert instead in the near term. Followup ticket to swap this out for an eventual Pagerduty alert is here

Testing

The alert is set up in this branch to be triggered post-prod deploy, but workflows can't be triggered by a deploy until the branch defining them gets into main. I've put up this branch
with a push trigger / hard coded env var for dev6 to prove that the script invocation / failure state works for a hard-coded environment variable.

There's not a great way for us to test the "prod deploy" trigger part until this branch gets in, so after merging, I'll make sure to keep an eye on the next prod deploy to make sure everything's working.


@fzhao99 fzhao99 changed the title get custom health actuator working post-prod deploy smoke test Dec 8, 2023
frontend/prod-smoke.js Fixed Show fixed Hide fixed
prod-smoke.js Fixed Show fixed Hide fixed
@@ -78,6 +78,7 @@ management:
endpoint.health.probes.enabled: true
endpoint.info.enabled: true
endpoints.web.exposure.include: health, info
endpoint.health.show-components: always
Copy link
Contributor Author

@fzhao99 fzhao99 Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need this tweak since the new actuator endpoint is a component of the health actuator. Without it, the API endpoint (very frustratingly) doesn't load

https://www.baeldung.com/spring-boot-health-indicators#customhealthindicators

try {
_repo.findAll();
return Health.up().build();
} catch (JDBCConnectionException e) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure if there was a better error to throw for "database didn't connect". If folks have suggestions happy to change this!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget if I mentioned this or not while we were pairing, but my thought was to grab the most recent entry from the databasechangelog table and report something from that table (maybe the md5checksum?) back to the frontend that might be useful to know. Ideally some value that is useful but also definitely not sensitive information.

cc @alismx for the idea of having the checksum be visible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change the db call to the checksum table, but if we want to pass that info back to the frontend, we'll need to configure the show-details flag on the health check config to always since we'd be hitting it unauthed. Those details include info about the filepath of the application, state of DB, etc that someone would theoretically be able to see if they found that endpoint.

Don't think there's anything too sensitive here, but since passing data back to the frontend was more of a nice to have, I decided to lean against flipping the flag. If we feel strongly that the checksum is worth having on the frontend / we get a security consult that the extra info isn't an issue, happy to make the relevant changes.

Let me know what you think!

@fzhao99 fzhao99 force-pushed the bob/7019-prod-e2e-health-check branch from 3d2a0a0 to 7f0bd63 Compare December 18, 2023 16:55
@fzhao99 fzhao99 marked this pull request as ready for review December 18, 2023 17:11
@fzhao99 fzhao99 requested a review from emyl3 December 19, 2023 15:42
Copy link
Collaborator

@emyl3 emyl3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM code-wise! Thank you for all your work on this and adding test coverage as well! :D

Copy link

sonarcloud bot commented Dec 19, 2023

Quality Gate Failed Quality Gate failed

Failed conditions

41.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

Copy link
Collaborator

@mpbrown mpbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding the tests!

@fzhao99 fzhao99 added this pull request to the merge queue Dec 21, 2023
Merged via the queue into main with commit 00a691b Dec 21, 2023
36 of 37 checks passed
@fzhao99 fzhao99 deleted the bob/7019-prod-e2e-health-check branch December 21, 2023 20:55
emyl3 added a commit that referenced this pull request Dec 21, 2023
github-merge-queue bot pushed a commit that referenced this pull request Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create prod E2E health check
4 participants