-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add post prod deploy smoke test and alert #7057
Conversation
@@ -78,6 +78,7 @@ management: | |||
endpoint.health.probes.enabled: true | |||
endpoint.info.enabled: true | |||
endpoints.web.exposure.include: health, info | |||
endpoint.health.show-components: always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need this tweak since the new actuator endpoint is a component of the health
actuator. Without it, the API endpoint (very frustratingly) doesn't load
https://www.baeldung.com/spring-boot-health-indicators#customhealthindicators
try { | ||
_repo.findAll(); | ||
return Health.up().build(); | ||
} catch (JDBCConnectionException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure if there was a better error to throw for "database didn't connect". If folks have suggestions happy to change this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget if I mentioned this or not while we were pairing, but my thought was to grab the most recent entry from the databasechangelog
table and report something from that table (maybe the md5checksum?) back to the frontend that might be useful to know. Ideally some value that is useful but also definitely not sensitive information.
cc @alismx for the idea of having the checksum be visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change the db call to the checksum table, but if we want to pass that info back to the frontend, we'll need to configure the show-details
flag on the health check config to always
since we'd be hitting it unauthed. Those details include info about the filepath of the application, state of DB, etc that someone would theoretically be able to see if they found that endpoint.
Don't think there's anything too sensitive here, but since passing data back to the frontend was more of a nice to have, I decided to lean against flipping the flag. If we feel strongly that the checksum is worth having on the frontend / we get a security consult that the extra info isn't an issue, happy to make the relevant changes.
Let me know what you think!
9a7e3f8
to
e754b8d
Compare
...rc/main/java/gov/cdc/usds/simplereport/api/heathcheck/BackendAndDatabaseHealthIndicator.java
Outdated
Show resolved
Hide resolved
03a42d6
to
67e9a1c
Compare
3d2a0a0
to
7f0bd63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM code-wise! Thank you for all your work on this and adding test coverage as well! :D
Quality Gate failedFailed conditions 41.4% Coverage on New Code (required ≥ 80%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for adding the tests!
This reverts commit 00a691b. Co-authored-by: elisa lee <[email protected]>
DEVOPS PULL REQUEST
Related Issue
Changes Proposed
Additional Information
Setting up the Pagerduty integration for this was a bit complicated from the ops / infra side, so we opted to use a Slack alert instead in the near term. Followup ticket to swap this out for an eventual Pagerduty alert is here
Testing
The alert is set up in this branch to be triggered post-prod deploy, but workflows can't be triggered by a deploy until the branch defining them gets into main. I've put up this branch
with a push trigger / hard coded env var for dev6 to prove that the script invocation / failure state works for a hard-coded environment variable.
There's not a great way for us to test the "prod deploy" trigger part until this branch gets in, so after merging, I'll make sure to keep an eye on the next prod deploy to make sure everything's working.