Merge pull request #25 from nixwiz/keepalives
Add special case for keepalives
Todd Campbell authored Jul 27, 2020
2 parents 7eeab2b + 62aac93 commit c2f28c6
Showing 5 changed files with 549 additions and 34 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,12 @@ Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Special case handling for keepalive checks

### Changed
- Reformatted and cleaned up README

## [0.6.2] - 2020-05-10

### Changed
144 changes: 113 additions & 31 deletions README.md
@@ -6,7 +6,6 @@
## Sensu Go Fatigue Check Filter

- [Overview](#overview)
- [Files](#files)
- [Usage examples](#usage-examples)
- [Configuration](#configuration)
- [Sensu Go](#sensu-go)
@@ -16,18 +15,22 @@
- [Handler definition](#handler-definition)
- [Check definition](#check-definition)
- [Entity definition](#entity-definition)
- [Keepalives](#keepalives)
- [Annotations](#annotations)
- [Arguments](#arguments)
- [Sensu Core](#sensu-core)
- [Non-repeating alerts](#non-repeating-alerts)
- [Installation from source](#installation-from-source)
- [Additional notes](#additional-notes)
- [Contributing](#contributing)

### Overview

The Sensu Go Fatigue Check Filter is a [Sensu Event Filter][1] for managing
alert fatigue.

A typical use of filters is to reduce [alert fatigue][2]. One of the most
typical examples of this is to create the following filter that only passes
through events on their first occurrence and every hour after that.

```yml
---
@@ -44,13 +47,16 @@ spec:
runtime_assets: []
```
However, the use of the filter above creates some limitations. Suppose you
have one check in particular that you want to change to only alert after three
(3) occurrences. Typically that might mean creating another handler and filter
pair to assign to that check. If you have to do this often enough, you start to
have an unwieldy mass of handlers and filters.
### Files
N/A
That's where this Fatigue Check Filter comes in. Using annotations, it makes
the number of occurrences and the interval tunable on a per-check or per-entity
basis. It also allows you to control whether or not resolution events are
passed through.
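As a rough illustration of what those tunables control, the occurrence test the filter applies to a normal check can be sketched as below. This is a simplified, hypothetical helper (`passesFatigueCheck` is not part of `lib/fatigue_check.js`) that leaves out the resolution and flapping handling, and it assumes a 5-minute (300s) check with `occurrences` set to 3 and `interval` set to 1800 seconds.

```javascript
// Hypothetical helper (not part of lib/fatigue_check.js) mirroring the
// occurrence-based decision the filter makes for a non-keepalive check.
// Resolution and flapping handling are deliberately left out.
function passesFatigueCheck(checkOccurrences, checkInterval, occurrences, interval) {
  // Alert on the configured initial occurrence...
  if (checkOccurrences === occurrences) {
    return true;
  }
  // ...then whenever the occurrence count is a multiple of the number of
  // check executions that fit in the interval (rounded up).
  var step = Math.ceil(interval / checkInterval);
  return checkOccurrences > occurrences && checkOccurrences % step === 0;
}

// Which of the first 12 occurrences of a 300s check pass with
// occurrences = 3 and interval = 1800?
var passing = [];
for (var n = 1; n <= 12; n++) {
  if (passesFatigueCheck(n, 300, 3, 1800)) {
    passing.push(n);
  }
}
console.log(passing.join(", ")); // 3, 6, 12
```

Here 1800 / 300 gives a step of 6, so after the third occurrence the event passes on occurrences 6 and 12.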
## Usage examples
@@ -60,15 +66,21 @@ N/A
### Sensu Go
#### Asset registration
Assets are the best way to make use of this plugin. If you're not using an
asset, please consider doing so! If you're using sensuctl 5.13 or later, you
can use the following command to add the asset:
`sensuctl asset add nixwiz/sensu-go-fatigue-check-filter --rename fatigue-check-filter`

Note that the `--rename` flag is not necessary, but without it, references to
the runtime asset in the filter definition (as in the example below) would need
to be updated to match.

If you're using an earlier version of sensuctl, you can download the asset
definition from [this project's Bonsai asset index page][7].

You can create your own [asset][3] by creating a tar file containing
`lib/fatigue_check.js` and creating your asset definition accordingly.

#### Asset definition

@@ -167,24 +179,48 @@ Via the agent.yml:
annotations:
fatigue_check/occurrences: "3"
fatigue_check/interval: "900"
fatigue_check/keepalive_occurrences: "1"
fatigue_check/keepalive_interval: "300"
fatigue_check/allow_resolution: "false"
[...]
```
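Annotation values such as the ones above are strings. Going by the filter code in this commit, numeric settings are read with `parseInt(value, 10)`, and `allow_resolution` is treated as true unless the value matches `false` case-insensitively. A minimal sketch of that parsing, using a hypothetical `parseAllowResolution` helper:

```javascript
// Hypothetical helper: anything other than an explicit "false" (any case)
// is treated as true, as in the filter's allow_resolution handling.
function parseAllowResolution(value) {
  return !(/false/i).test(value);
}

// Numeric annotations are parsed as base-10 integers.
console.log(parseInt("900", 10));           // 900
console.log(parseAllowResolution("false")); // false
console.log(parseAllowResolution("FALSE")); // false
console.log(parseAllowResolution("no"));    // true
```

Note that a value like `"no"` still allows resolutions; only a string containing `false` disables them.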
#### Keepalives
Keepalives do not have check resources with annotations that can be used to tune
this filter. Using standard entity annotations would override the settings for
**all** other checks. To address this specific case, two additional tunables
exist for customizing this filter for keepalive events. These can be set as
arguments to the `fatigue_check()` function in the filter definition or as
entity annotations to override the defaults on a per-entity basis.
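The filter decides which pair of tunables to use by comparing the check name to `keepalive`. The hypothetical `selectTunables` helper below sketches that branch using the values from the agent.yml example above; the `check-disk` check name is made up for illustration.

```javascript
// Hypothetical helper: pick the tunables that apply to an event, branching
// on the check name "keepalive" the way the filter does.
function selectTunables(event, settings) {
  if (event.check.name === "keepalive") {
    return { occurrences: settings.keepalive_occurrences, interval: settings.keepalive_interval };
  }
  return { occurrences: settings.occurrences, interval: settings.interval };
}

// Values taken from the agent.yml annotations example.
var settings = {
  occurrences: 3,
  interval: 900,
  keepalive_occurrences: 1,
  keepalive_interval: 300
};

var keepalive = selectTunables({ check: { name: "keepalive" } }, settings);
var disk = selectTunables({ check: { name: "check-disk" } }, settings);
console.log(keepalive.occurrences, keepalive.interval); // 1 300
console.log(disk.occurrences, disk.interval);           // 3 900
```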

#### Annotations
The Fatigue Check Filter makes use of four annotations within the check and/or
entity metadata for normal checks, with an additional two keepalive annotations
available only in the entity metadata. Entity annotations take precedence over
check annotations, and all annotations take precedence over the `fatigue_check()`
function arguments and defaults.

|Annotation|Default|Usage|
|----------|-------|-----|
|fatigue_check/occurrences|1|For normal checks, on which occurrence to allow the initial event to pass through|
|fatigue_check/interval|1800|For normal checks, the interval in seconds at which to allow subsequent events to pass through; ideally a multiple of the check interval|
|fatigue_check/allow_resolution|true|Determines whether or not a resolution event is passed through|
|fatigue_check/suppress_flapping|true|Determines whether or not to suppress events for checks that are marked as flapping|
|fatigue_check/keepalive_occurrences|1|For keepalives, on which occurrence to allow the initial event to pass through (**entity only**)|
|fatigue_check/keepalive_interval|1800|For keepalives, the interval in seconds at which to allow subsequent events to pass through; ideally a multiple of the keepalive interval (**entity only**)|
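The precedence order described above can be sketched roughly as follows. `resolveSettings` and `applyAnnotations` are hypothetical helpers, the event is a minimal mock rather than a full Sensu event, and only the numeric settings are covered (the boolean ones are parsed differently).

```javascript
var PREFIX = "fatigue_check/";

// Hypothetical helper: copy any annotated keys onto settings,
// parsing the string values as base-10 integers.
function applyAnnotations(settings, annotations, keys) {
  if (!annotations) {
    return;
  }
  keys.forEach(function (key) {
    if (Object.prototype.hasOwnProperty.call(annotations, PREFIX + key)) {
      settings[key] = parseInt(annotations[PREFIX + key], 10);
    }
  });
}

// Hypothetical helper: defaults < check annotations < entity annotations.
// The keepalive_* keys are only read from the entity.
function resolveSettings(event, defaults) {
  var settings = {};
  for (var key in defaults) {
    settings[key] = defaults[key];
  }
  applyAnnotations(settings, event.check.annotations, ["occurrences", "interval"]);
  applyAnnotations(settings, event.entity.annotations,
    ["occurrences", "interval", "keepalive_occurrences", "keepalive_interval"]);
  return settings;
}

// Minimal mock event: the check sets occurrences and interval, the entity
// overrides interval and sets a keepalive interval.
var event = {
  check: { annotations: { "fatigue_check/occurrences": "2", "fatigue_check/interval": "600" } },
  entity: { annotations: { "fatigue_check/interval": "900", "fatigue_check/keepalive_interval": "300" } }
};

var resolved = resolveSettings(event, {
  occurrences: 1, interval: 1800, keepalive_occurrences: 1, keepalive_interval: 1800
});
console.log(JSON.stringify(resolved));
// {"occurrences":2,"interval":900,"keepalive_occurrences":1,"keepalive_interval":300}
```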

#### Arguments
The `fatigue_check()` function can take up to five arguments.

```javascript
fatigue_check(event, occurrences, interval, keepalive_occurrences, keepalive_interval)
```

The first one is the event and is required. The remaining four are optional and
allow you to override the built-in defaults for occurrences, interval,
keepalive_occurrences, and keepalive_interval, respectively. Because the
keepalive settings default to whatever `occurrences` and `interval` resolve to,
overriding those two also affects keepalives unless the keepalive arguments are
given as well. For example, if you'd like a version of the filter that, by
default, matches on the second occurrence instead of the first, you could create
a filter similar to below:

```yml
@@ -202,8 +238,10 @@ spec:
- fatigue-check-filter
```

If you'd like one that overrides the default 30-minute interval with a 10-minute
one, you could create one similar to below (note that in order to specify the
third argument, you have to provide the second):

```yml
---
@@ -220,29 +258,72 @@ spec:
- fatigue-check-filter
```
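Because the arguments are positional and each falls back with `||`, leaving one out (or passing `0`) yields its default. The hypothetical `resolveArguments` helper below mirrors the defaulting lines at the top of `fatigue_check()` in this commit, including the keepalive settings inheriting `occurrences` and `interval` when they are not given.

```javascript
// Hypothetical helper mirroring the defaulting at the top of fatigue_check().
function resolveArguments(occurrences, interval, keepalive_occurrences, keepalive_interval) {
  occurrences = occurrences || 1;     // only the first occurrence
  interval = interval || 1800;        // and every 30 minutes thereafter
  // keepalive settings inherit the (possibly overridden) normal settings
  keepalive_occurrences = keepalive_occurrences || occurrences;
  keepalive_interval = keepalive_interval || interval;
  return [occurrences, interval, keepalive_occurrences, keepalive_interval];
}

console.log(resolveArguments(2).join(" "));          // 2 1800 2 1800
console.log(resolveArguments(1, 600).join(" "));     // 1 600 1 600
console.log(resolveArguments(1, 1800, 2).join(" ")); // 1 1800 2 1800
```

The first two lines show why an override of `occurrences` or `interval` alone also changes keepalive behavior.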

### Sensu Core
If you'd like one that overrides the default occurrences for keepalives
and alerts on the second occurrence rather than the first, you could create one
similar to below (note that in order to specify the fourth argument, you have to
provide the second and third):

N/A
```yml
---
type: EventFilter
api_version: core/v2
metadata:
name: fatigue_check_two_occurrences
namespace: default
spec:
action: allow
expressions:
- fatigue_check(event, 1, 1800, 2)
runtime_assets:
- fatigue-check-filter
```

If you'd like one that overrides the default 30-minute interval for
keepalives with a 10-minute one, you could create one similar to below
(note that in order to specify the fifth argument, you have to provide
the second, third, and fourth as well, even if you want the defaults):

```yml
---
type: EventFilter
api_version: core/v2
metadata:
name: fatigue_check_10m_interval
namespace: default
spec:
action: allow
expressions:
- fatigue_check(event, 1, 1800, 1, 600)
runtime_assets:
- fatigue-check-filter
```
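To see what the keepalive override changes, the hypothetical `passingOccurrences` helper below lists which keepalive occurrences would pass, assuming a 20-second keepalive interval (the Sensu Go agent's default `keepalive-interval`). It reimplements the occurrence test in simplified form and is not the filter itself.

```javascript
// Hypothetical helper: list the occurrences (up to maxOccurrence) that the
// simplified occurrence test would let through.
function passingOccurrences(maxOccurrence, checkInterval, occurrences, interval) {
  var step = Math.ceil(interval / checkInterval);
  var result = [];
  for (var n = 1; n <= maxOccurrence; n++) {
    if (n === occurrences || (n > occurrences && n % step === 0)) {
      result.push(n);
    }
  }
  return result;
}

// Default keepalive_interval of 1800s: 1800 / 20 = 90 failures between alerts.
console.log(passingOccurrences(100, 20, 1, 1800).join(", ")); // 1, 90
// fatigue_check(event, 1, 1800, 1, 600): 600 / 20 = 30 failures between alerts.
console.log(passingOccurrences(100, 20, 1, 600).join(", "));  // 1, 30, 60, 90
```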

#### Non-repeating alerts

If you need to have alerts which will not repeat, meaning the alert is only ever
sent on the first occurrence and none after (aside from the resolution, if
`allow_resolution` is true, which is the default), then you will need to set the
`interval` (or `keepalive_interval`) to zero (0) via an annotation. A zero passed
as a `fatigue_check()` argument is treated as unset and falls back to the
default.
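The reason a zero interval suppresses repeats, and why it has to come from an annotation, can be sketched with plain JavaScript arithmetic. The 60-second check interval and occurrence count of 120 are hypothetical values chosen for illustration.

```javascript
// An annotation of "0" is parsed without a `||` fallback, so it stays 0.
var interval = parseInt("0", 10);
var checkInterval = 60; // hypothetical check interval, in seconds

var step = Math.ceil(interval / checkInterval);
console.log(step);             // 0
// Any count modulo 0 is NaN, and NaN === 0 is false, so the repeat branch never fires.
console.log(120 % step === 0); // false

// A zero function argument, by contrast, is replaced by the default.
var argumentInterval = 0 || 1800;
console.log(argumentInterval); // 1800
```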

## Installation from source

### Sensu Go

See the instructions above for [asset registration][6].

### Sensu Core

Install and setup plugins on [Sensu Core][5].

## Additional notes

* This filter makes use of the occurrences_watermark attribute that was buggy
up until Sensu Go 5.9. Your mileage may vary on prior versions.

* If the interval is not a multiple of the check's interval, the number of
occurrences between alerts is the interval divided by the check's interval,
rounded up to the next whole number. For example, an interval of 180s with a
check interval of 25s (180 / 25 = 7.2, rounded up to 8) would pass the event
through every 8 occurrences, that is, every 200s.
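The rounding in the example above works out as follows (a small arithmetic sketch; the variable names are illustrative only):

```javascript
var interval = 180;     // requested fatigue_check interval, in seconds
var checkInterval = 25; // check interval, in seconds

// 180 / 25 = 7.2, rounded up to 8 check executions
var step = Math.ceil(interval / checkInterval);
var effectiveInterval = step * checkInterval;

console.log(step);              // 8
console.log(effectiveInterval); // 200
```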

## Contributing

Please submit an [issue][8] if you have problems or suggestions.

[1]: https://docs.sensu.io/sensu-go/latest/reference/filters/
[2]: https://docs.sensu.io/sensu-go/latest/guides/reduce-alert-fatigue/
@@ -251,3 +332,4 @@ N/A
[5]: https://docs.sensu.io/sensu-core/latest/installation/installing-plugins/
[6]: #asset-registration
[7]: https://bonsai.sensu.io/assets/nixwiz/sensu-go-fatigue-check-filter
[8]: https://github.com/nixwiz/sensu-go-fatigue-check-filter/issues
25 changes: 22 additions & 3 deletions lib/fatigue_check.js
@@ -1,11 +1,15 @@
function fatigue_check(event, occurrences, interval, keepalive_occurrences, keepalive_interval) {

// my defaults
var occurrences = occurrences || 1; // only the first occurrence
var interval = interval || 1800; // and every 30 minutes thereafter
var allow_resolution = true; // allow resolution events through
var suppress_flapping = true; // suppress when flapping

// set keepalive defaults to match initially, if not provided
var keepalive_occurrences = keepalive_occurrences || occurrences;
var keepalive_interval = keepalive_interval || interval;

// Use the above variable name with the check annotations fatigue_check/
// (e.g. fatigue_check/occurrences) to override the defaults above

@@ -36,6 +40,7 @@ function fatigue_check(event, occurrences, interval) {
}

// Entity annotations second, to take precedence over check annotations
// keepalive overrides only exist here
try {
if (event.entity.hasOwnProperty("annotations")) {
if (event.entity.annotations.hasOwnProperty("fatigue_check/occurrences")) {
@@ -44,6 +49,12 @@ function fatigue_check(event, occurrences, interval) {
if (event.entity.annotations.hasOwnProperty("fatigue_check/interval")) {
interval = parseInt(event.entity.annotations["fatigue_check/interval"], 10);
}
if (event.entity.annotations.hasOwnProperty("fatigue_check/keepalive_occurrences")) {
keepalive_occurrences = parseInt(event.entity.annotations["fatigue_check/keepalive_occurrences"], 10);
}
if (event.entity.annotations.hasOwnProperty("fatigue_check/keepalive_interval")) {
keepalive_interval = parseInt(event.entity.annotations["fatigue_check/keepalive_interval"], 10);
}
if (event.entity.annotations.hasOwnProperty("fatigue_check/allow_resolution")) {
// anything other than explicitly false == true
allow_resolution = !(/false/i).test(event.entity.annotations["fatigue_check/allow_resolution"]);
@@ -78,12 +89,20 @@ function fatigue_check(event, occurrences, interval) {
} else if (event.is_resolution) { // allow_resolution must be false, don't allow
return false;
}
if (event.check.name == 'keepalive' && event.check.occurrences == keepalive_occurrences) {
return true;
}
if (event.check.name != 'keepalive' && event.check.occurrences === occurrences) {
return true;
}
// The Math.ceil rounds up in the event that the interval requested is not
// a multiple of the check interval
// Keepalives
if (event.check.name == 'keepalive' && event.check.occurrences > keepalive_occurrences && (event.check.occurrences % (Math.ceil(keepalive_interval / event.check.interval))) === 0) {
return true;
}
// All other checks
if (event.check.name != 'keepalive' && event.check.occurrences > occurrences && (event.check.occurrences % (Math.ceil(interval / event.check.interval))) === 0) {
return true;
}
