add post prod deploy smoke test and alert #7057

fzhao99 · 2023-12-08T16:23:13Z

DEVOPS PULL REQUEST

Related Issue

Resolves Create prod E2E health check #7019

Changes Proposed

- Adds a backend health actuator endpoint that returns
  - Down if a db table call or an Okta client health call errors
  - Up otherwise
- Adds a frontend page that pings that endpoint and displays up/down accordingly
- Adds a workflow that triggers on a prod deploy and sends a Slack alert if the check errors
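
For readers skimming the description, the up/down decision described above can be sketched roughly as follows. This is a minimal illustration and not the contents of `prod-smoke.js`: the function names, the `/actuator/health` path, and the injected `fetchImpl` parameter are assumptions made for the example, and it assumes the check reduces to reading the `status` field of the health endpoint's JSON.

```javascript
// Hypothetical sketch: names and URL shape are illustrative, not taken from the PR.

// Interpret a Spring Boot-style health payload such as {"status": "UP"}.
function isHealthy(payload) {
  return Boolean(payload) && payload.status === "UP";
}

// Call the health endpoint and resolve to "up" or "down".
// fetchImpl is injected so the function can be exercised without a network.
async function checkHealth(baseUrl, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(`${baseUrl}/actuator/health`);
    const payload = await response.json();
    return response.ok && isHealthy(payload) ? "up" : "down";
  } catch (err) {
    // Network failures and unparseable bodies both count as down.
    return "down";
  }
}
```

In a workflow step, a script built on something like `checkHealth` could set a non-zero exit code on `"down"`, which a later step can use as the condition for sending the Slack alert.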

Additional Information

Setting up the PagerDuty integration for this was a bit complicated on the ops / infra side, so we opted to use a Slack alert in the near term. A follow-up ticket tracks swapping this out for a PagerDuty alert.

Testing

The alert is set up in this branch to be triggered post-prod deploy, but workflows can't be triggered by a deploy until the branch defining them gets into main. I've put up this branch with a push trigger and a hard-coded env var for dev6 to prove that the script invocation and failure state work.

There's not a great way for us to test the "prod deploy" trigger part until this branch gets in, so after merging, I'll make sure to keep an eye on the next prod deploy to make sure everything's working.

frontend/prod-smoke.js

prod-smoke.js

fzhao99 · 2023-12-12T14:11:15Z

backend/src/main/resources/application.yaml

@@ -78,6 +78,7 @@ management:
  endpoint.health.probes.enabled: true
  endpoint.info.enabled: true
  endpoints.web.exposure.include: health, info
+  endpoint.health.show-components: always


This tweak is needed because the new actuator endpoint is a component of the health actuator. Without it, the API endpoint (very frustratingly) doesn't load.

https://www.baeldung.com/spring-boot-health-indicators#customhealthindicators
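
To illustrate why the flag matters to a client: when components are shown, a Spring Boot health response lists each contributor under a `components` key, so one indicator's status can be read individually. Below is a small sketch of that lookup. The component name `backendAndDatabase` and the sample payload are illustrative assumptions, not output captured from this service.

```javascript
// Hypothetical payload shaped like a Spring Boot health response
// when components are shown; the values are made up for illustration.
const samplePayload = {
  status: "UP",
  components: {
    backendAndDatabase: { status: "UP" },
    ping: { status: "UP" },
  },
};

// Return the status of one named component, or "UNKNOWN" if it is absent
// (for example when components are hidden from the caller).
function componentStatus(payload, name) {
  const component = payload && payload.components && payload.components[name];
  return component && component.status ? component.status : "UNKNOWN";
}
```

Falling back to `"UNKNOWN"` rather than throwing keeps a caller from treating a hidden component as a failure it can't distinguish from a real outage.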

fzhao99 · 2023-12-12T14:11:48Z

...rc/main/java/gov/cdc/usds/simplereport/api/heathcheck/BackendAndDatabaseHealthIndicator.java

+    try {
+      _repo.findAll();
+      return Health.up().build();
+    } catch (JDBCConnectionException e) {


Wasn't sure if there was a better exception to catch for "database didn't connect". If folks have suggestions, happy to change this!

I forget if I mentioned this or not while we were pairing, but my thought was to grab the most recent entry from the databasechangelog table and report something from that table (maybe the md5checksum?) back to the frontend that might be useful to know. Ideally some value that is useful but also definitely not sensitive information.

cc @alismx for the idea of having the checksum be visible.

Happy to change the db call to the checksum table, but if we want to pass that info back to the frontend, we'll need to set the show-details flag on the health check config to always, since we'd be hitting the endpoint unauthenticated. Those details include the application's file path, the state of the DB, etc., which anyone who found the endpoint would theoretically be able to see.

Don't think there's anything too sensitive here, but since passing data back to the frontend was more of a nice to have, I decided to lean against flipping the flag. If we feel strongly that the checksum is worth having on the frontend / we get a security consult that the extra info isn't an issue, happy to make the relevant changes.

Let me know what you think!

.github/actions/post-deploy-smoke-test/action.yml

frontend/src/app/HealthChecks.tsx

...rc/main/java/gov/cdc/usds/simplereport/api/heathcheck/BackendAndDatabaseHealthIndicator.java

.github/workflows/smokeTestDeployProd.yml

emyl3

LGTM code-wise! Thank you for all your work on this and adding test coverage as well! :D

sonarcloud · 2023-12-19T16:05:47Z

Quality Gate failed

Failed conditions

41.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

mpbrown

LGTM! Thanks for adding the tests!

fzhao99 changed the title ~~get custom health actuator working~~ post-prod deploy smoke test Dec 8, 2023

fzhao99 temporarily deployed to dev2 December 8, 2023 17:25 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev2 December 8, 2023 17:31 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev2 December 8, 2023 17:36 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev2 December 8, 2023 17:41 — with GitHub Actions Inactive

github-advanced-security bot found potential problems Dec 11, 2023

frontend/prod-smoke.js (fixed)

fzhao99 temporarily deployed to dev4 December 11, 2023 15:02 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev4 December 11, 2023 15:10 — with GitHub Actions Inactive

fzhao99 had a problem deploying to dev4 December 11, 2023 15:12 — with GitHub Actions Failure

github-advanced-security bot found potential problems Dec 11, 2023

prod-smoke.js (fixed)

fzhao99 temporarily deployed to dev4 December 11, 2023 16:59 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev4 December 11, 2023 17:10 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev4 December 11, 2023 17:45 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev4 December 11, 2023 17:51 — with GitHub Actions Inactive

fzhao99 commented Dec 12, 2023

fzhao99 temporarily deployed to dev7 December 12, 2023 14:51 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev7 December 12, 2023 15:00 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev7 December 12, 2023 15:06 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev7 December 12, 2023 15:23 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev7 December 12, 2023 15:46 — with GitHub Actions Inactive

fzhao99 temporarily deployed to dev7 December 12, 2023 15:55 — with GitHub Actions Inactive

fzhao99 force-pushed the bob/7019-prod-e2e-health-check branch from 9a7e3f8 to e754b8d Compare December 13, 2023 15:23

alismx mentioned this pull request Dec 14, 2023

Update slack alert to a Pagerduty alert #7089

Open

DanielSass reviewed Dec 14, 2023

.github/actions/post-deploy-smoke-test/action.yml (outdated, resolved)

DanielSass reviewed Dec 14, 2023

frontend/src/app/HealthChecks.tsx (outdated, resolved)

DanielSass reviewed Dec 14, 2023

...rc/main/java/gov/cdc/usds/simplereport/api/heathcheck/BackendAndDatabaseHealthIndicator.java (outdated, resolved)

fzhao99 force-pushed the bob/7019-prod-e2e-health-check branch from 03a42d6 to 67e9a1c Compare December 15, 2023 15:34

fzhao99 temporarily deployed to dev7 December 15, 2023 15:36 — with GitHub Actions Inactive

fzhao99 added 8 commits December 18, 2023 10:54

use existing status check instead

c3b8098

string format and equality

a69cb21

move literal to left

a83b4e5

lol it's friday alright

647a3b1

add comment to document workflow

6e1c448

better comment

993b21a

use base domain env var instead

14ecdf0

set env var

7f0bd63

fzhao99 force-pushed the bob/7019-prod-e2e-health-check branch from 3d2a0a0 to 7f0bd63 Compare December 18, 2023 16:55

fzhao99 marked this pull request as ready for review December 18, 2023 17:11

fzhao99 requested review from DanielSass, emyl3, mehansen, mpbrown and alismx December 18, 2023 17:11

emyl3 reviewed Dec 19, 2023

.github/workflows/smokeTestDeployProd.yml (outdated, resolved)

don't hard code node version

281798a

fzhao99 requested a review from emyl3 December 19, 2023 15:42

emyl3 approved these changes Dec 19, 2023

emyl3 mentioned this pull request Dec 21, 2023

Azure alert for when frontend not getting response from backend #6888

Closed

mpbrown approved these changes Dec 21, 2023

fzhao99 added this pull request to the merge queue Dec 21, 2023

Merged via the queue into main with commit 00a691b Dec 21, 2023
36 of 37 checks passed

fzhao99 deleted the bob/7019-prod-e2e-health-check branch December 21, 2023 20:55

emyl3 added a commit that referenced this pull request Dec 21, 2023

Revert "add post prod deploy smoke test and alert (#7057)"

00b4f39

This reverts commit 00a691b.

fzhao99 mentioned this pull request Dec 21, 2023

Revert "add post prod deploy smoke test and alert" #7119

Merged

github-merge-queue bot pushed a commit that referenced this pull request Dec 21, 2023

Revert "add post prod deploy smoke test and alert (#7057)" (#7119)

3df3157

This reverts commit 00a691b. Co-authored-by: elisa lee

fzhao99 mentioned this pull request Jan 10, 2024

Bob/7019 prod e2e health check #7145

Merged
