Improve incident management in Instatus check script #346

Merged
merged 3 commits into from
Dec 5, 2024

Conversation

matifali
Member

  • Change incident naming for better clarity.
  • Ensure new incidents are not created if unresolved ones exist.

@matifali matifali self-assigned this Nov 22, 2024
@matifali matifali requested a review from bcpeinhardt November 22, 2024 11:25
@matifali matifali enabled auto-merge (squash) November 27, 2024 06:47
@matifali matifali disabled auto-merge November 27, 2024 06:47
@matifali matifali requested a review from Parkreiner November 27, 2024 06:48
Contributor

@Parkreiner Parkreiner left a comment

The Bash seemed straightforward. The only thing I'm wondering about (and this might be because I don't know Instatus yet and haven't dealt with the old version of the registry much) is whether there are situations where we would want multiple incidents.

As in, if a service is experiencing multiple problems, wouldn't we want reports on each category of problem we're experiencing? If I'm understanding the current code right, if one problem gets reported, that also silences any future problems, even if they're not directly related.

I'm definitely out of my element here, though. If you think that's not a problem, I can go ahead and approve
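
To make the alternative concrete, here is a minimal sketch of a duplicate check scoped to a problem category rather than to the whole service. It is an illustration only, not the script from this PR: the function name, the `STATUS|CATEGORY` input format, and the category names are hypothetical, and treating `RESOLVED` as the only closed status is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only (not the PR's script): per-category duplicate check.
# Input on stdin: one incident per line as "STATUS|CATEGORY" (hypothetical format).
# Assumption: an incident is closed only when its status is "RESOLVED".
# Prints "update" if an unresolved incident exists for the given category,
# otherwise prints "create".
decide_action_for_category() {
  local wanted
  local status
  local category
  wanted=$1
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS='|' read -r status category || [ -n "$status" ]; do
    # Skip blank lines so empty input means "no incidents".
    [ -z "$status" ] && continue
    if [ "$category" = "$wanted" ] && [ "$status" != "RESOLVED" ]; then
      echo "update"
      return 0
    fi
  done
  echo "create"
}
```

With a check like this, an open incident in one category would not suppress a new incident in another, at the cost of having to assign every detected problem to a category.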

@matifali
Member Author

matifali commented Dec 3, 2024

> As in, if a service is experiencing multiple problems, wouldn't we want reports on each category of problem we're experiencing? If I'm understanding the current code right, if one problem gets reported, that also silences any future problems, even if they're not directly related

Yes, you are right. Currently, while the registry is down, the flow creates a new incident every 15 minutes. Here I am trying to prevent that and update the existing incident instead. But yes, there is a chance that subsequent events could be unrelated.
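
The intended flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the merged script: the function name and the one-status-per-line input are hypothetical, fetching the incident list from the status page API is left out, and treating `RESOLVED` as the only closed status is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only (not the merged script): choose between creating a new incident
# and updating an existing one.
# Input on stdin: one incident status per line (hypothetical format; a real
# script would extract these from the status page API response).
# Assumption: an incident is closed only when its status is "RESOLVED".
decide_incident_action() {
  local status
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r status || [ -n "$status" ]; do
    # Skip blank lines so empty input means "no incidents".
    [ -z "$status" ] && continue
    if [ "$status" != "RESOLVED" ]; then
      # At least one incident is still open: update it instead of creating another.
      echo "update"
      return 0
    fi
  done
  # No open incident found: a new one can be created.
  echo "create"
}
```

Run on each 15-minute check, a guard like this yields `create` only for the first failed check and `update` for every later one until the incident is resolved.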

Instatus has a very flexible API that we can explore to improve this flow, and this script should probably move into the registry repo.

@matifali matifali merged commit cbd06b1 into main Dec 5, 2024
2 checks passed
@matifali matifali deleted the atif/registry-status-imrprovements branch December 5, 2024 09:24