Alerting Condition Server

Alexandre Lamarre edited this page Feb 17, 2023 · 14 revisions

Summary:

The alerting condition server stores user specifications that define evaluation rules over data held in Alerting datasources.

The alerting condition server also handles querying the state of those specifications and returning it in a human-readable format.

Architecture:

Alerting Gateway

Description

Create, Read, Update and Delete user configurations for data evaluations that should send alerts.

The condition server accepts AlertCondition specs even if the Alerting Backend is not enabled, but it does not evaluate them against datasources until the Alerting Backend is installed.
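The gating behavior can be sketched as follows; `ConditionServer`, its map-backed store, and the field names are hypothetical stand-ins for illustration, not the actual Opni types:

```go
package main

import (
	"errors"
	"fmt"
)

// AlertCondition is a hypothetical stand-in for the user-supplied spec.
type AlertCondition struct {
	Name  string
	Query string
}

// ConditionServer sketches the gating behavior: specs are always persisted,
// but evaluation only proceeds once the Alerting Backend is installed.
type ConditionServer struct {
	backendInstalled bool
	store            map[string]AlertCondition
}

func NewConditionServer(backendInstalled bool) *ConditionServer {
	return &ConditionServer{
		backendInstalled: backendInstalled,
		store:            map[string]AlertCondition{},
	}
}

// CreateCondition always accepts and stores the spec.
func (s *ConditionServer) CreateCondition(c AlertCondition) {
	s.store[c.Name] = c
}

// Evaluate refuses to run until the backend is installed.
func (s *ConditionServer) Evaluate(name string) error {
	if !s.backendInstalled {
		return errors.New("alerting backend not installed: spec stored but not evaluated")
	}
	// ... dispatch the stored query to the datasource here ...
	return nil
}

func main() {
	srv := NewConditionServer(false)
	srv.CreateCondition(AlertCondition{Name: "cpu-high", Query: "cpu > 0.9"})
	fmt.Println(len(srv.store))                  // spec is stored: 1
	fmt.Println(srv.Evaluate("cpu-high") != nil) // but evaluation errors: true
}
```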

Condition CRUD

(Diagram: Alerting APIs Dataflow - Conditions CRUD)

Responsibilities

Create, Read, Update & Delete Alerting Condition specs from user inputs. If the Alerting Backend is installed, also create the dependencies needed to evaluate conditions.

Corresponding UI element(s)

Description

  • Alerting/Alarms page

Screenshots

(Screenshot: Alerting Alarms)

Performance Issues

  • Key-value stores that retain many or all revisions (such as etcd) can suffer degraded update performance as revision history grows

Condition Status

Description

Determine status based on downstream cluster dependencies, datasource dependencies & the active state of the condition in the Alerting Backend.

An invalidated state means that the specification can no longer be evaluated reliably, or at all.
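One way the status resolution might be sketched; the state names and the `resolveStatus` helper are hypothetical, assuming dependency failures take precedence over the condition's active state in the backend:

```go
package main

import "fmt"

type ConditionState int

const (
	StateOK ConditionState = iota
	StateFiring
	StateInvalidated
)

func (s ConditionState) String() string {
	switch s {
	case StateFiring:
		return "Firing"
	case StateInvalidated:
		return "Invalidated"
	default:
		return "OK"
	}
}

// resolveStatus sketches the precedence described above: a missing cluster
// or datasource dependency invalidates the condition before its active
// state in the Alerting Backend is even consulted.
func resolveStatus(clusterReachable, datasourceInstalled, activeInBackend bool) ConditionState {
	if !clusterReachable || !datasourceInstalled {
		return StateInvalidated
	}
	if activeInBackend {
		return StateFiring
	}
	return StateOK
}

func main() {
	fmt.Println(resolveStatus(true, false, true)) // Invalidated
	fmt.Println(resolveStatus(true, true, true))  // Firing
	fmt.Println(resolveStatus(true, true, false)) // OK
}
```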

Dataflow

(Diagram: Alerting APIs Dataflow - Condition Status)

Responsibilities

Delegate API calls to the management server, external dependencies & the alerting cluster in order to determine the state that best matches the condition.

Corresponding UI element(s)

Description

  • Alerting/Alarms admin UI page: state badge next to the alarm name

Screenshots

(Screenshot: Alerting Alarms State)

Performance Issues

  • Status is evaluated on a per-condition basis, but much of the state information queried by Opni Alerting is batched across all dependencies and all active condition states in the Alerting Backend
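A sketch of the batching pattern: one bulk fetch of all active states can answer many per-condition lookups. `fetchAllActiveStates` and its hard-coded data are hypothetical stand-ins for the backend's batched query:

```go
package main

import "fmt"

// fetchAllActiveStates is a hypothetical stand-in for the batched query
// the Alerting Backend exposes; it returns every condition's active state
// in a single round trip.
func fetchAllActiveStates() map[string]bool {
	return map[string]bool{"cpu-high": true, "disk-full": false}
}

// statusBatch answers per-condition status questions from a single batched
// fetch instead of issuing one backend call per condition.
type statusBatch struct {
	active map[string]bool
}

func newStatusBatch() *statusBatch {
	return &statusBatch{active: fetchAllActiveStates()}
}

func (b *statusBatch) IsActive(conditionID string) bool {
	return b.active[conditionID]
}

func main() {
	batch := newStatusBatch() // one backend round trip
	fmt.Println(batch.IsActive("cpu-high"))  // true
	fmt.Println(batch.IsActive("disk-full")) // false
}
```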

Scale and performance:

  • Scale and performance concerns are delegated to the Alerting Backend and Datasources

High availability:

Tied to Opni Gateway High Availability.

Testing:

Testplan

Unit tests

  • Alerting storage clientset tests cover persisting user configurations for alerting conditions

Integration tests

  • Cover the Alerting Condition CRUD APIs, Silence APIs & Status APIs.

e2e tests

N/A

Manual testing

In a Kubernetes cluster, verify:

  • that a notification is received when clicking test endpoint on a valid endpoint. This verifies that the entire alerting + gateway logic is functional.

  • Install Alerting & Monitoring with at least 1 metrics agent:

    • create a Prometheus query alarm with the query sum(scrape_samples_scraped) != 0; after a couple of minutes it should switch to the firing state. This verifies that the entire alerting + metrics logic is functional.
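For reference, the alarm created in this step corresponds roughly to a raw Prometheus alerting rule like the following sketch (the group and alert names are hypothetical, and the 2m window stands in for "a couple of minutes"):

```yaml
groups:
  - name: opni-manual-test        # hypothetical group name
    rules:
      - alert: ScrapeSamplesNonZero   # hypothetical alert name
        expr: sum(scrape_samples_scraped) != 0
        for: 2m                       # hold before switching to firing
        labels:
          severity: test
```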