Azure alert for when frontend not getting response from backend #6888
Comments
Run the query in App Insights prod and see what number we get back.
The query in the issue description assumed that the URL would be improperly set to
However, I had changed that query to be more permissive for the following scenarios:
However, I did not re-check this more permissive query in prod, and it looks like the feature-flags endpoint fails fairly frequently. See this search of our logs. We will have to revisit which query we can reliably base an alert on. 😓
haha this would work: https://github.com/CDCgov/prime-simplereport/pull/7057/files 🤩
Closing in favor of the more robust solution #7057.
so much for our shorter-term fix while we work on #7019 lol
May not be necessary if we can create a probe that will alert us via PagerDuty; waiting on the outcome of #6890.
Background
During our recent production outage, our frontend could not connect to our backend due to a misconfigured environment variable. We received no alerts about this. There is an action item to create a test that acts more like a health check after a prod deploy and alerts us if it fails, regardless of whether anyone is using the app. This ticket is for a shorter-term fix.
Action requested
Create an Azure alert based on logs that will trigger a page when the frontend is not able to connect to the backend while users are using the app.
Potential query for alerts:
Count of failed API retrievals
^ Check whether results are cached; we should see successful responses every 1-2 min.
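As a rough illustration of the two signals above (a count of failed API calls, and no successful responses for more than a couple of minutes), here is a minimal Python sketch of the decision an alert like this would make. It assumes each frontend-to-backend call is available as a record with a timestamp, URL, and success flag; the RequestRecord shape, the function name should_page, the thresholds (5-minute window, 10 failures, 2-minute gap), and the substring-based exclusion of the feature-flags endpoint (which the comments note fails frequently) are all hypothetical choices. The real alert would be a log query defined in Terraform, not Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Sequence


@dataclass(frozen=True)
class RequestRecord:
    """One frontend-to-backend call (hypothetical log shape)."""
    timestamp: datetime
    url: str
    success: bool


def should_page(
    records: Iterable[RequestRecord],
    now: datetime,
    window: timedelta = timedelta(minutes=5),
    failure_threshold: int = 10,
    max_gap_without_success: timedelta = timedelta(minutes=2),
    excluded_substrings: Sequence[str] = ("feature-flags",),  # hypothetical noisy endpoint filter
) -> bool:
    """Return True if recent calls suggest the frontend cannot reach the backend."""
    # Keep calls inside the look-back window, skipping noisy endpoints.
    recent = [
        r
        for r in records
        if now - window <= r.timestamp <= now
        and not any(s in r.url for s in excluded_substrings)
    ]
    if not recent:
        # No traffic: a log-based check has nothing to go on, so stay quiet.
        return False

    # Signal 1: too many failed API calls in the window.
    failures = sum(1 for r in recent if not r.success)
    if failures >= failure_threshold:
        return True

    # Signal 2: calls are failing, but nothing has succeeded for too long.
    # With no success in the window, measure the gap from the window start.
    last_success = max(
        (r.timestamp for r in recent if r.success), default=now - window
    )
    failed_since = any(
        not r.success and r.timestamp > last_success for r in recent
    )
    return failed_since and now - last_success > max_gap_without_success
```

Returning False when there is no traffic reflects the limitation described in the Background: a log-based alert can only fire while someone is using the app, which is why a separate post-deploy health check is also planned.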
Acceptance criteria
Additional notes
Suggestion for testing: set REACT_APP_BASE_URL to an incorrect URL in a lower environment, attempt to access the app, and verify the alert is triggered.
An incident should be created in PagerDuty (but it will be low-urgency since it's in a lower env).
Written in Terraform; use existing alerts we've written for context.