Alerting Condition Server
The alerting condition server stores user specifications that define evaluation rules over data held in Alerting datasources.
It also handles querying the state of those specifications and returning it in a human-readable format.
- Architecture
- Condition CRUD
- Condition Status
- Scale and performance
- Security
- High availability
- Testing
Condition CRUD
Create, Read, Update and Delete user configurations for data evaluations that should send alerts.
The condition server accepts AlertCondition specs even if the Alerting Backend is not enabled, but it does not run any evaluations against datasources until the Alerting Backend is installed.
Create, Read, Update and Delete Alerting Condition specs from user inputs. If the Alerting Backend is installed, also create the dependencies needed to evaluate the conditions (a sketch of this flow follows the list below).
- Alerting/Alarms page
- (K,V) stores that maintain many/all revisions (like etcd) can lead to update performance issues
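As a rough sketch of what this CRUD surface could look like, the Go snippet below models a condition store and the install-gated provisioning step; all type, interface and function names here are hypothetical illustrations, not the actual opni APIs.

```go
package conditions

import "context"

// AlertConditionSpec is a simplified, hypothetical stand-in for the
// user-supplied specification persisted by the condition server.
type AlertConditionSpec struct {
	ID          string
	Name        string
	Description string
	// Query is the datasource expression to evaluate, e.g. a PromQL query.
	Query string
}

// ConditionStore captures the CRUD operations described above. The backing
// storage is abstracted away; a (K,V) store could sit behind it.
type ConditionStore interface {
	Create(ctx context.Context, spec *AlertConditionSpec) error
	Get(ctx context.Context, id string) (*AlertConditionSpec, error)
	Update(ctx context.Context, spec *AlertConditionSpec) error
	Delete(ctx context.Context, id string) error
	List(ctx context.Context) ([]*AlertConditionSpec, error)
}

// CreateCondition always persists the spec, but only provisions the
// evaluation dependencies when the Alerting Backend is installed.
func CreateCondition(ctx context.Context, store ConditionStore, spec *AlertConditionSpec, backendInstalled bool) error {
	if err := store.Create(ctx, spec); err != nil {
		return err
	}
	if !backendInstalled {
		// The spec is accepted, but no evaluation runs until the backend exists.
		return nil
	}
	return provisionEvaluation(ctx, spec)
}

// provisionEvaluation is a hypothetical placeholder for creating whatever
// the Alerting Backend needs to evaluate this condition.
func provisionEvaluation(ctx context.Context, spec *AlertConditionSpec) error {
	return nil
}
```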
Condition Status
Determine status based on downstream cluster dependencies, datasource dependencies & the active state of the condition in the Alerting Backend.
An invalidated state means that the specification can no longer be evaluated reliably, or cannot be evaluated at all.
Delegate API calls to the management server, external dependencies & the alerting cluster in order to determine the state that best matches the condition (see the sketch after this list).
- Alerting/Alarms admin UI page: state badge next to the alarm name
- Status is evaluated on a per-condition basis, but much of the information it relies on is queried by opni alerting in batches, across all dependencies and all active condition states in the Alerting Backend
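To make that resolution step concrete, here is a minimal sketch of how a per-condition state could be derived from a batched dependency snapshot; the state names, structs and field names are assumptions for illustration only.

```go
package status

// ConditionState is a hypothetical, simplified status enum.
type ConditionState int

const (
	StateInvalidated ConditionState = iota // spec can no longer be reliably evaluated
	StatePending                           // backend has not registered the condition yet
	StateOK
	StateFiring
)

// ClusterDeps is a hypothetical snapshot of batched dependency information:
// one round of lookups against the management server and Alerting Backend
// answers for every condition, and per-condition status is derived from it.
type ClusterDeps struct {
	ClusterOnline     map[string]bool // downstream cluster id -> reachable
	DatasourceHealthy map[string]bool // datasource id -> healthy
	ActiveConditions  map[string]bool // condition id -> currently firing in the backend
}

// Condition holds the minimal references needed to resolve a status.
type Condition struct {
	ID           string
	ClusterID    string
	DatasourceID string
}

// Resolve picks the state that best matches the condition given the batched
// dependency snapshot.
func Resolve(c Condition, deps ClusterDeps) ConditionState {
	if !deps.ClusterOnline[c.ClusterID] || !deps.DatasourceHealthy[c.DatasourceID] {
		return StateInvalidated
	}
	firing, known := deps.ActiveConditions[c.ID]
	if !known {
		return StatePending
	}
	if firing {
		return StateFiring
	}
	return StateOK
}
```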
Scale and performance
- Scale and performance concerns are delegated to the Alerting Backend and Datasources
High availability
Tied to Opni Gateway High Availability.
Testing
- Alerting storage clientset: covers persisting alerting condition user configurations
- Covers the CRUD Alerting Condition APIs, Silence APIs & Status APIs.
N/A
In a Kubernetes cluster, verify:
- that we receive a notification when clicking `test endpoint` on a valid endpoint. This verifies that the entire alerting + gateway logic is functional.
- Install Alerting & Monitoring with at least 1 metrics agent:
  - create a prometheus query alarm with the query `sum(scrape_samples_scraped) != 0` -- after a couple of minutes this should switch to the firing state. This verifies the entire alerting + metrics logic is functional (a query sanity-check sketch follows this list).
  - create a prometheus query alarm with a query
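Before creating the metrics alarm, it can help to confirm that the query returns samples at all. The sketch below issues the query against a standard Prometheus-compatible HTTP API; the endpoint URL is an assumption and should be replaced with whatever query endpoint your monitoring install exposes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Hypothetical Prometheus-compatible query endpoint; adjust to your install.
	base := "http://localhost:9090/api/v1/query"
	q := url.Values{"query": []string{"sum(scrape_samples_scraped)"}}

	resp, err := http.Get(base + "?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// A non-empty result vector here means the alarm query
	// `sum(scrape_samples_scraped) != 0` has data to evaluate against.
	fmt.Println(string(body))
}
```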
Architecture
- Backends
- Core Components