Monitor Type: collectd/cassandra
Accepts Endpoints: Yes
Multiple Instances Allowed: Yes
Monitors Cassandra using the Collectd GenericJMX plugin. This is essentially a wrapper around the collectd-genericjmx monitor that comes with a set of predefined MBean definitions that a standard Cassandra deployment will expose.
Use this integration to monitor the following types of information from Cassandra nodes:
- read/write/range-slice requests
- read/write/range-slice errors (timeouts and unavailable)
- read/write/range-slice latency (median, 99th percentile, maximum)
- compaction activity
- hint activity
Supports Cassandra 2.0.10+.
To activate this monitor in the Smart Agent, add the following to your agent config:
```yaml
monitors:  # All monitor config goes under this key
 - type: collectd/cassandra
   ...  # Additional config
```
For a list of monitor options that are common to all monitors, see Common Configuration.
Config option | Required | Type | Description |
---|---|---|---|
`host` | yes | string | Host to connect to -- JMX must be configured for remote access and accessible from the agent |
`port` | yes | integer | JMX connection port (NOT the RMI port) on the application. This corresponds to the `com.sun.management.jmxremote.port` Java property that should be set on the JVM when running the application. |
`name` | no | string | |
`serviceName` | no | string | This is how the service type is identified in the SignalFx UI so that you can get built-in content for it. For custom JMX integrations, it can be set to whatever you like, and metrics will get the special property `sf_hostHasService` set to this value. |
`serviceURL` | no | string | The JMX connection string. This is rendered as a Go template and has access to the other values in this config. NOTE: under normal circumstances it is not advised to set this string directly; setting the host and port as specified above is preferred. (default: `service:jmx:rmi:///jndi/rmi://{{.Host}}:{{.Port}}/jmxrmi`) |
`instancePrefix` | no | string | Prefixes the generated plugin instance with this prefix. If a second `instancePrefix` is specified in a referenced MBean block, the prefix specified in the Connection block will appear at the beginning of the plugin instance, and the prefix specified in the MBean block will be appended to it. |
`username` | no | string | Username to authenticate to the server |
`password` | no | string | User password to authenticate to the server |
`customDimensions` | no | map of strings | Key/value pairs of custom dimensions applied at the connection level. |
`mBeansToCollect` | no | list of strings | A list of the MBeans defined in `mBeanDefinitions` to actually collect. If not provided, all defined MBeans will be collected. |
`mBeansToOmit` | no | list of strings | A list of the MBeans to omit. This comes in handy in cases where only a few MBeans need to be omitted from the default list. |
`mBeanDefinitions` | no | map of objects (see below) | Specifies how to map JMX MBean values to metrics. Service-specific monitors such as cassandra, kafka, or activemq come pre-loaded with a set of mappings, and any that you add in this option will be merged with those. See collectd GenericJMX for more details. |
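As an illustration (a sketch, not an exhaustive configuration), a monitor block combining several of the connection-level options above might look like the following; the host, port, credentials, dimension values, and MBean name are placeholders you would replace with your own:

```yaml
monitors:
  - type: collectd/cassandra
    host: 127.0.0.1        # placeholder; JMX remote access must be enabled and reachable from the agent
    port: 7199             # placeholder; matches com.sun.management.jmxremote.port on the Cassandra JVM
    username: cassandra    # only needed if JMX authentication is enabled (placeholder)
    password: changeme     # placeholder credential
    customDimensions:
      cluster: my-cassandra-cluster   # example dimension key/value
    mBeansToOmit:
      - some-defined-mbean            # hypothetical name of an MBean defined in mBeanDefinitions
```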
The nested `mBeanDefinitions` config object has the following fields:
Config option | Required | Type | Description |
---|---|---|---|
`objectName` | no | string | Sets the pattern which is used to retrieve MBeans from the MBeanServer. If more than one MBean is returned you should use the `instanceFrom` option to make the identifiers unique |
`instancePrefix` | no | string | Prefixes the generated plugin instance with this prefix |
`instanceFrom` | no | list of strings | The object names used by JMX to identify MBeans include so-called "properties", which are basically key-value pairs. If the given object name is not unique and multiple MBeans are returned, the values of those properties usually differ. You can use this option to build the plugin instance from the appropriate property values. This option is optional and may be repeated to generate the plugin instance from multiple property values |
`values` | no | list of objects (see below) | The value blocks map one or more attributes of an MBean to a value list in collectd. There must be at least one value block within each MBean block |
`dimensions` | no | list of strings | |
The nested `values` config object has the following fields:
Config option | Required | Type | Description |
---|---|---|---|
`type` | no | string | Sets the data set used within collectd to handle the values of the MBean attribute |
`table` | no | bool | Set this to true if the returned attribute is a composite type. If set to true, the keys within the composite type are appended to the type instance. (default: `false`) |
`instancePrefix` | no | string | Works like the option of the same name directly beneath the MBean block, but sets the type instance instead |
`instanceFrom` | no | list of strings | Works like the option of the same name directly beneath the MBean block, but sets the type instance instead |
`attribute` | no | string | Sets the name of the attribute from which to read the value. You can access the keys of composite types by using a dot to concatenate the key name to the attribute name. For example: "attrib0.key42". If `table` is set to true, the path must point to a composite type; otherwise it must point to a numeric type. |
`attributes` | no | list of strings | The plural form of the `attribute` config above. Used to derive multiple metrics from a single MBean. |
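To make the nesting concrete, here is a minimal sketch of a custom mapping added under `mBeanDefinitions`; the mapping key, objectName, and instancePrefix shown are illustrative and not part of the monitor's built-in definitions:

```yaml
monitors:
  - type: collectd/cassandra
    host: 127.0.0.1   # placeholder
    port: 7199        # placeholder
    mBeanDefinitions:
      cassandra-pending-compactions:   # arbitrary key naming this MBean block
        objectName: "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks"
        values:
          - type: gauge                          # collectd data set used for the value
            instancePrefix: pending-compactions  # becomes part of the type instance
            attribute: Value                     # MBean attribute to read
```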
These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.
This monitor will also emit by default any metrics that are not listed below.
- `counter.cassandra.ClientRequest.CASRead.Latency.Count` (cumulative): Count of transactional read operations since server start.
- `counter.cassandra.ClientRequest.CASRead.TotalLatency.Count` (cumulative): The total number of microseconds elapsed in servicing client transactional read requests. It can be divided by `counter.cassandra.ClientRequest.CASRead.Latency.Count` to find the real-time transactional read latency.
- `counter.cassandra.ClientRequest.CASWrite.Latency.Count` (cumulative): Count of transactional write operations since server start.
- `counter.cassandra.ClientRequest.CASWrite.TotalLatency.Count` (cumulative): The total number of microseconds elapsed in servicing client transactional write requests. It can be divided by `counter.cassandra.ClientRequest.CASWrite.Latency.Count` to find the real-time transactional write latency.
- `counter.cassandra.ClientRequest.RangeSlice.Latency.Count` (cumulative): Count of range slice operations since server start. This typically indicates a server overload condition. If this value is increasing across the cluster then the cluster is too small for the application range slice load. If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
  - one or more clients are directing more load to this server than the others
  - the server is experiencing hardware or software issues and may require maintenance.
- `counter.cassandra.ClientRequest.RangeSlice.Timeouts.Count` (cumulative): Count of range slice timeouts since server start. This typically indicates a server overload condition. If this value is increasing across the cluster then the cluster is too small for the application range slice load. If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
  - one or more clients are directing more load to this server than the others
  - the server is experiencing hardware or software issues and may require maintenance.
- `counter.cassandra.ClientRequest.RangeSlice.TotalLatency.Count` (cumulative): The total number of microseconds elapsed in servicing range slice requests.
- `counter.cassandra.ClientRequest.RangeSlice.Unavailables.Count` (cumulative): Count of range slice unavailables since server start. A non-zero value means that insufficient replicas were available to fulfil a range slice request at the requested consistency level. This typically means that one or more nodes are down. To fix this condition, any down nodes must be restarted, or removed from the cluster.
- `counter.cassandra.ClientRequest.Read.Latency.Count` (cumulative): Count of read operations since server start.
- `counter.cassandra.ClientRequest.Read.Timeouts.Count` (cumulative): Count of read timeouts since server start. This typically indicates a server overload condition. If this value is increasing across the cluster then the cluster is too small for the application read load. If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
  - one or more clients are directing more load to this server than the others
  - the server is experiencing hardware or software issues and may require maintenance.
- `counter.cassandra.ClientRequest.Read.TotalLatency.Count` (cumulative): The total number of microseconds elapsed in servicing client read requests. It can be divided by `counter.cassandra.ClientRequest.Read.Latency.Count` to find the real-time read latency.
- `counter.cassandra.ClientRequest.Read.Unavailables.Count` (cumulative): Count of read unavailables since server start. A non-zero value means that insufficient replicas were available to fulfil a read request at the requested consistency level. This typically means that one or more nodes are down. To fix this condition, any down nodes must be restarted, or removed from the cluster.
- `counter.cassandra.ClientRequest.Write.Latency.Count` (cumulative): Count of write operations since server start.
- `counter.cassandra.ClientRequest.Write.Timeouts.Count` (cumulative): Count of write timeouts since server start. This typically indicates a server overload condition. If this value is increasing across the cluster then the cluster is too small for the application write load. If this value is increasing for a single server in a cluster, then one of the following conditions may be true:
  - one or more clients are directing more load to this server than the others
  - the server is experiencing hardware or software issues and may require maintenance.
- `counter.cassandra.ClientRequest.Write.TotalLatency.Count` (cumulative): The total number of microseconds elapsed in servicing client write requests. It can be divided by `counter.cassandra.ClientRequest.Write.Latency.Count` to find the real-time write latency.
- `counter.cassandra.ClientRequest.Write.Unavailables.Count` (cumulative): Count of write unavailables since server start. A non-zero value means that insufficient replicas were available to fulfil a write request at the requested consistency level. This typically means that one or more nodes are down. To fix this condition, any down nodes must be restarted, or removed from the cluster.
- `counter.cassandra.Compaction.TotalCompactionsCompleted.Count` (cumulative): Number of compaction operations since node start. If this value does not increase steadily over time then the node may be experiencing problems completing compaction operations.
- `counter.cassandra.Storage.Exceptions.Count` (cumulative): Number of internal exceptions caught. Under normal conditions this should be zero.
- `counter.cassandra.Storage.Load.Count` (cumulative): Storage used for Cassandra data in bytes. Use this metric to see how much storage is being used for data by a Cassandra node. The value of this metric is influenced by:
  - total data stored in the database
  - compaction behavior
- `counter.cassandra.Storage.TotalHints.Count` (cumulative): Total hints since node start. Indicates that write operations cannot be delivered to a node, usually because a node is down. If this value is increasing and all nodes are up then there may be some connectivity issue between nodes in the cluster.
- `counter.cassandra.Storage.TotalHintsInProgress.Count` (cumulative): Total pending hints. Indicates that write operations cannot be delivered to a node, usually because a node is down. If this value is increasing and all nodes are up then there may be some connectivity issue between nodes in the cluster.
- `gauge.cassandra.ClientRequest.CASRead.Latency.50thPercentile` (gauge): 50th percentile (median) of Cassandra transactional read latency.
- `gauge.cassandra.ClientRequest.CASRead.Latency.99thPercentile` (gauge): 99th percentile of Cassandra transactional read latency.
- `gauge.cassandra.ClientRequest.CASRead.Latency.Max` (gauge): Maximum Cassandra transactional read latency.
- `gauge.cassandra.ClientRequest.CASWrite.Latency.50thPercentile` (gauge): 50th percentile (median) of Cassandra transactional write latency.
- `gauge.cassandra.ClientRequest.CASWrite.Latency.99thPercentile` (gauge): 99th percentile of Cassandra transactional write latency.
- `gauge.cassandra.ClientRequest.CASWrite.Latency.Max` (gauge): Maximum Cassandra transactional write latency.
- `gauge.cassandra.ClientRequest.RangeSlice.Latency.50thPercentile` (gauge): 50th percentile (median) of Cassandra range slice latency. This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster then they may have more connected clients or may be experiencing heavier than usual compaction load.
- `gauge.cassandra.ClientRequest.RangeSlice.Latency.99thPercentile` (gauge): 99th percentile of Cassandra range slice latency. This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster then they may have more connected clients or may be experiencing heavier than usual compaction load.
- `gauge.cassandra.ClientRequest.RangeSlice.Latency.Max` (gauge): Maximum Cassandra range slice latency.
- `gauge.cassandra.ClientRequest.Read.Latency.50thPercentile` (gauge): 50th percentile (median) of Cassandra read latency. This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster then they may have more connected clients or may be experiencing heavier than usual compaction load.
- `gauge.cassandra.ClientRequest.Read.Latency.99thPercentile` (gauge): 99th percentile of Cassandra read latency. This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster then they may have more connected clients or may be experiencing heavier than usual compaction load.
- `gauge.cassandra.ClientRequest.Read.Latency.Max` (gauge): Maximum Cassandra read latency.
- `gauge.cassandra.ClientRequest.Write.Latency.50thPercentile` (gauge): 50th percentile (median) of Cassandra write latency. This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster then they may have more connected clients or may be experiencing heavier than usual compaction load.
- `gauge.cassandra.ClientRequest.Write.Latency.99thPercentile` (gauge): 99th percentile of Cassandra write latency. This value should be similar across all nodes in the cluster. If some nodes have higher values than the rest of the cluster then they may have more connected clients or may be experiencing heavier than usual compaction load.
- `gauge.cassandra.ClientRequest.Write.Latency.Max` (gauge): Maximum Cassandra write latency.
- `gauge.cassandra.Compaction.PendingTasks.Value` (gauge): Number of compaction operations waiting to run. If this value is continually increasing then the node may be experiencing problems completing compaction operations.
All of the following metrics are part of the `jvm` metric group. All of the non-default metrics below can be turned on by adding `jvm` to the monitor config option `extraGroups`:

- `gauge.jvm.threads.count` (gauge): Number of JVM threads
- `gauge.loaded_classes` (gauge): Number of classes loaded in the JVM
- `invocations` (cumulative): Total number of garbage collection events
- `jmx_memory.committed` (gauge): Amount of memory guaranteed to be available in bytes
- `jmx_memory.init` (gauge): Amount of initial memory at startup in bytes
- `jmx_memory.max` (gauge): Maximum amount of memory that can be used in bytes
- `jmx_memory.used` (gauge): Current memory usage in bytes
- `total_time_in_ms.collection_time` (cumulative): Amount of time spent garbage collecting in milliseconds
To emit metrics that are not emitted by default, add them to the generic monitor-level `extraMetrics` config option. Metrics that are derived from specific configuration options and do not appear in the above list do not need to be added to `extraMetrics`.
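For example, a configuration along the following lines (a sketch with placeholder host and port) would enable the whole `jvm` metric group plus one individual metric:

```yaml
monitors:
  - type: collectd/cassandra
    host: 127.0.0.1   # placeholder
    port: 7199        # placeholder
    extraGroups:
      - jvm           # emit all metrics in the jvm metric group listed above
    extraMetrics:
      - gauge.cassandra.ClientRequest.CASRead.Latency.Max   # emit this metric if it is not already emitted by default
```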
To see a list of metrics that will be emitted, you can run `agent-status monitors` after configuring this monitor in a running agent instance.