nagios

Monitor Type: nagios (Source)

Accepts Endpoints: No

Multiple Instances Allowed: Yes

Overview

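This monitor is a wrapper that runs existing Nagios status check scripts through the SignalFx Smart Agent, which takes over the role of NRPE or SNMP exec.

It runs the script set in the command parameter and reports the state of the check based on the command's exit code. By Nagios plugin convention, exit codes 0, 1, 2 and 3 mean OK, WARNING, CRITICAL and UNKNOWN respectively.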

It is very similar to the telegraf/exec monitor configured with dataFormat: nagios, but:

  • it does not retrieve perfdata metrics, only the state of the script, for alerting purposes.
  • it overrides the state if the exit code is 0 but the output string starts with warn, crit or unkn (case-insensitive).

The main advantage and purpose of this monitor is to add more context to the status check state through SignalFx events. In addition to the state metric, it sends an event that includes the output and any error captured from the command execution.

This should make troubleshooting more efficient: when a check reports an abnormal state, the user can stay in SignalFx to understand what is happening instead of connecting to the machine. It also makes it possible to build a dashboard similar to what Nagios users are accustomed to.

Note: the last sent event is cached in memory to avoid resending the same event at every collection interval. Restarting the agent clears this cache, so events that were already sent will be sent again. If your check normally produces different output on every run, as the uptime check does, you can set ignoreStdOut: true to exclude stdout from the comparison.
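As a minimal sketch of the note above, a check whose output changes on every run could be configured with the ignoreStdOut option from the configuration table so that stdout differences alone do not trigger new events. The plugin path and the service name are illustrative assumptions, not values taken from this documentation.

```yaml
monitors:
 - type: nagios
   # Hypothetical plugin path; adjust to wherever your uptime check is installed.
   command: "/usr/lib/nagios/plugins/check_uptime"
   # Illustrative service name.
   service: uptime
   # stdout differs on every run, so leave it out of the event comparison.
   ignoreStdOut: true
```

With this setting, stderr is still compared because ignoreStdErr keeps its default of false.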

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: nagios
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| command | yes | string | The command to execute, with any arguments, for example: "LC_ALL=\"en_US.utf8\" /usr/lib/nagios/plugins/check_ntp_time -H pool.ntp.typhon.net -w 0.5 -c 1" |
| service | yes | string | Corresponds to the Nagios service column. It makes it possible to aggregate all instances of the same service (when the same check script is called with different arguments). |
| timeout | no | integer | Maximum execution time, in seconds, before the command is sent SIGKILL (default: 9) |
| ignoreStdOut | no | bool | If false, a new event is sent whenever stdout differs from that of the last sent event. Set to true to ignore changes on stdout (default: false) |
| ignoreStdErr | no | bool | If false, a new event is sent whenever stderr differs from that of the last sent event. Set to true to ignore changes on stderr (default: false) |
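Putting the options above together, a complete monitor entry might look like the following sketch. The command is the example from the table; the service name is an illustrative assumption, and the remaining values simply spell out the documented defaults.

```yaml
monitors:
 - type: nagios
   # Example command taken from the option table above.
   command: "LC_ALL=\"en_US.utf8\" /usr/lib/nagios/plugins/check_ntp_time -H pool.ntp.typhon.net -w 0.5 -c 1"
   # Illustrative service name (required option).
   service: ntp_time
   # Documented default: the command is killed after 9 seconds.
   timeout: 9
   # Documented defaults: changes on stdout or stderr produce a new event.
   ignoreStdOut: false
   ignoreStdErr: false
```

Only command and service are required; the other three lines can be omitted if the defaults suit your check.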

Metrics

These are the metrics available for this monitor. All of the metrics emitted from this monitor are categorized as custom but the ones that are emitted by default from the monitor are in bold and italics in the list below.

  • nagios.state (gauge)
    Nagios status check state.

Non-default metrics (version 4.7.0+)

To emit metrics that are not default, you can add those metrics in the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.

To see a list of metrics that will be emitted you can run agent-status monitors after configuring this monitor in a running agent instance.

Dimensions

The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.

| Name | Description |
| --- | --- |
| command | The configured command for this monitor. |
| plugin | The name of this monitor: nagios. |
| service | The configured service for this monitor. |
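Because multiple instances of this monitor are allowed, the same check script can be run with different arguments under one service. In the sketch below, the second NTP host and the service name are illustrative assumptions. Based on the dimension descriptions above, both instances should report nagios.state with the same service and plugin dimensions and differ only in the command dimension.

```yaml
monitors:
 - type: nagios
   command: "/usr/lib/nagios/plugins/check_ntp_time -H pool.ntp.typhon.net -w 0.5 -c 1"
   # Illustrative service name shared by both instances.
   service: ntp_time
 - type: nagios
   # Hypothetical second host, used only to show a different command value.
   command: "/usr/lib/nagios/plugins/check_ntp_time -H ntp.example.com -w 0.5 -c 1"
   service: ntp_time
```

Grouping by service in a chart or detector would then aggregate both checks, while filtering on command isolates one of them.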