👋 My team recently started using this project, and we're definitely enjoying it so far. One UX issue we do have, though, is that our workflow can't quite be expressed using the current status values (resolved and unresolved). I think another possible state, "ignored", along with the ability to filter the errors page to omit such entries, would solve this for us. A bit more detail on our workflow below 👇
We integrate with error-tracker's telemetry events to notify ourselves when a new or regressed (resolved -> unresolved) error occurs. We don't notify on subsequent occurrences of known errors. Zooming out, we use error-tracker to quickly alert us to new kinds of errors for triage, in conjunction with metrics-based alerting on a moving-average error ratio (e.g. "alert when >1% of HTTP responses on an endpoint have status 5XX in the last 30 minutes"). We believe this balances MTTD (mean time to detect) with a workable signal-to-noise ratio.
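For illustration, the ratio rule above amounts to a trailing-window check along the lines of this rough Python sketch. Only the 30-minute window and the 1% threshold come from the example; `ErrorRatioAlert`, `record` and `should_alert` are hypothetical names, not part of error-tracker or of our actual metrics stack.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class ErrorRatioAlert:
    """Hypothetical trailing-window 5XX ratio check (not part of error-tracker)."""

    window_seconds: float = 30 * 60  # "in the last 30 minutes"
    threshold: float = 0.01  # "> 1% of HTTP responses"
    # (timestamp, is_5xx) pairs, oldest first
    _events: Deque[Tuple[float, bool]] = field(default_factory=deque, init=False)

    def record(self, timestamp: float, status: int) -> None:
        """Record one HTTP response; timestamps are assumed non-decreasing."""
        self._events.append((timestamp, 500 <= status <= 599))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop responses that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def should_alert(self, now: float) -> bool:
        """True when strictly more than `threshold` of in-window responses were 5XX."""
        self._evict(now)
        if not self._events:
            return False
        errors = sum(1 for _, is_5xx in self._events if is_5xx)
        return errors / len(self._events) > self.threshold
```

With these defaults, one 5XX in 100 responses (exactly 1%) stays quiet, and a second 5XX in the same window tips it over the threshold.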
As such, we tend to leave error-tracker entries related to transient, retryable errors (e.g. network flakes) open. This isn't because we don't care about them, but because a very low ratio of such errors, below some small percentage threshold, doesn't meet our urgency criteria for alerting. Leaving the error-tracker entry open (unresolved) gives us the notification behaviour we want (i.e. no notification).
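The notification rule we apply boils down to a simple predicate. Here's a hypothetical Python sketch of it (`Status` and `should_notify` are made-up names, not error-tracker's API, and the telemetry wiring is left out): an occurrence notifies only if the error is new or was previously resolved, so an entry left unresolved stays quiet.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UNRESOLVED = "unresolved"
    RESOLVED = "resolved"


def should_notify(previous_status: Optional[Status]) -> bool:
    """Decide whether a new occurrence should notify us (hypothetical helper).

    previous_status is None when the error has never been seen before.
    """
    if previous_status is None:
        return True  # brand-new error
    # A resolved error occurring again is a regression (resolved -> unresolved).
    # Occurrences of an error that is still unresolved stay quiet.
    return previous_status is Status.RESOLVED
```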
I'd like to be able to mark such entries as "ignored" to de-clutter the front page of error-tracker, which would make triaging new entries easier.
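Roughly, the proposal could look like the following hypothetical Python sketch (not error-tracker's actual code or schema): an `IGNORED` status alongside the existing two, and a front-page filter that hides ignored entries unless explicitly requested. `ErrorEntry`, `front_page` and `include_ignored` are illustrative names, and the sketch assumes the front page lists unresolved entries by default.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Status(Enum):
    RESOLVED = "resolved"
    UNRESOLVED = "unresolved"
    IGNORED = "ignored"  # proposed new state


@dataclass
class ErrorEntry:
    reason: str
    status: Status


def front_page(entries: Iterable[ErrorEntry], include_ignored: bool = False) -> List[ErrorEntry]:
    """Entries shown for triage; ignored ones are hidden unless requested."""
    visible = {Status.UNRESOLVED}
    if include_ignored:
        visible.add(Status.IGNORED)
    return [e for e in entries if e.status in visible]
```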
Is this something you'd be interested in as a maintainer? I'd be happy to send a PR if so.
Cheers!
Craig