docker-container-stats

Monitor Type: docker-container-stats (Source)

Accepts Endpoints: No

Multiple Instances Allowed: Yes

Overview

This monitor reads container stats from a Docker API server. It is meant as a metric-compatible replacement for our docker-collectd plugin, which scales poorly when there are many containers.

This monitor does not currently support CPU share/quota metrics.

For more information on block IO metrics, see the Linux cgroup block IO controller documentation.

If you run the agent directly on a host (outside of a container) and use the default Docker UNIX socket URL, you will probably need to add the signalfx-agent user to the docker group so that it has permission to access the Docker API through the socket.

Requires Docker API version 1.22+.

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: docker-container-stats
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

Config option	Required	Type	Description
`enableExtraBlockIOMetrics`	no	`bool`	Whether it will send all extra block IO metrics as well. (default: `false`)
`enableExtraCPUMetrics`	no	`bool`	Whether it will send all extra CPU metrics as well. (default: `false`)
`enableExtraMemoryMetrics`	no	`bool`	Whether it will send all extra memory metrics as well. (default: `false`)
`enableExtraNetworkMetrics`	no	`bool`	Whether it will send all extra network metrics as well. (default: `false`)
`dockerURL`	no	`string`	The URL of the Docker server (default: `unix:///var/run/docker.sock`)
`timeoutSeconds`	no	`integer`	The maximum amount of time, in seconds, to wait for Docker API requests (default: `5`)
`cacheSyncInterval`	no	`int64`	The time to wait before resyncing the list of containers that the monitor maintains through the Docker event listener. Example: `cacheSyncInterval: "20m"` (default: `60m`)
`labelsToDimensions`	no	`map of strings`	A mapping of container label names to dimension names. The corresponding label values will become the dimension value for the mapped name. E.g. `io.kubernetes.container.name: container_spec_name` would result in a dimension called `container_spec_name` that has the value of the `io.kubernetes.container.name` container label.
`envToDimensions`	no	`map of strings`	A mapping of container environment variable names to dimension names. The corresponding env var values become the dimension values on the emitted metrics. E.g. `APP_VERSION: version` would result in datapoints having a dimension called `version` whose value is the value of the `APP_VERSION` envvar configured for that particular container, if present.
`excludedImages`	no	`list of strings`	A list of filters of images to exclude. Supports literals, globs, and regex.
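
As an illustration of how these options fit together, the following sketch sets several of them on one monitor instance. The option names and the two mapping examples come from the table above; the timeout value and the image filters are assumptions chosen for the example, not recommendations.

```yaml
monitors:
 - type: docker-container-stats
   dockerURL: unix:///var/run/docker.sock   # the documented default, shown explicitly
   timeoutSeconds: 10                       # assumed value; the default is 5
   cacheSyncInterval: "20m"                 # example value from the table; the default is 60m
   enableExtraCPUMetrics: true              # also send the extra CPU metrics
   labelsToDimensions:
     io.kubernetes.container.name: container_spec_name
   envToDimensions:
     APP_VERSION: version
   excludedImages:
     - my-registry/debug-tools   # hypothetical literal image name
     - '*pause*'                 # hypothetical glob pattern, quoted so YAML does not read * as an alias
```

Any option left out keeps the default listed in the table.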

Metrics

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

Group blkio

All of the following metrics are part of the blkio metric group. All of the non-default metrics below can be turned on by adding blkio to the monitor config option extraGroups:

blkio.io_merged_recursive.async (cumulative)
blkio.io_merged_recursive.read (cumulative)
blkio.io_merged_recursive.sync (cumulative)
blkio.io_merged_recursive.total (cumulative)
blkio.io_merged_recursive.write (cumulative)
blkio.io_queue_recursive.async (cumulative)
blkio.io_queue_recursive.read (cumulative)
blkio.io_queue_recursive.sync (cumulative)
blkio.io_queue_recursive.total (cumulative)
blkio.io_queue_recursive.write (cumulative)
blkio.io_service_bytes_recursive.async (cumulative)
Volume, in bytes, of asynchronous block I/O
blkio.io_service_bytes_recursive.read (cumulative)
Volume, in bytes, of reads from block devices
blkio.io_service_bytes_recursive.sync (cumulative)
Volume, in bytes, of synchronous block I/O
blkio.io_service_bytes_recursive.total (cumulative)
Total volume, in bytes, of all block I/O
blkio.io_service_bytes_recursive.write (cumulative)
Volume, in bytes, of writes to block devices
blkio.io_service_time_recursive.async (cumulative)
blkio.io_service_time_recursive.read (cumulative)
blkio.io_service_time_recursive.sync (cumulative)
blkio.io_service_time_recursive.total (cumulative)
blkio.io_service_time_recursive.write (cumulative)
blkio.io_serviced_recursive.async (cumulative)
Number of asynchronous block I/O requests
blkio.io_serviced_recursive.read (cumulative)
Number of read requests from block devices
blkio.io_serviced_recursive.sync (cumulative)
Number of synchronous block I/O requests
blkio.io_serviced_recursive.total (cumulative)
Total number of block I/O requests
blkio.io_serviced_recursive.write (cumulative)
Number of write requests to block devices
blkio.io_time_recursive.async (cumulative)
blkio.io_time_recursive.read (cumulative)
blkio.io_time_recursive.sync (cumulative)
blkio.io_time_recursive.total (cumulative)
blkio.io_time_recursive.write (cumulative)
blkio.io_wait_time_recursive.async (cumulative)
blkio.io_wait_time_recursive.read (cumulative)
blkio.io_wait_time_recursive.sync (cumulative)
blkio.io_wait_time_recursive.total (cumulative)
blkio.io_wait_time_recursive.write (cumulative)
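
A minimal sketch of turning on this group through extraGroups, assuming the option takes a list of group names as the sentence above implies:

```yaml
monitors:
 - type: docker-container-stats
   extraGroups:
     - blkio   # enables the non-default metrics in the blkio group
```

The other group names in this document (cpu, memory, network) would presumably be added to the same list in the same way.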

Group cpu

All of the following metrics are part of the cpu metric group. All of the non-default metrics below can be turned on by adding cpu to the monitor config option extraGroups:

cpu.percent (gauge)
Percentage of host CPU resources used by the container
cpu.percpu.usage (cumulative)
Jiffies of CPU time spent by the container, per CPU core
cpu.throttling_data.periods (cumulative)
Number of periods
cpu.throttling_data.throttled_periods (cumulative)
Number of periods throttled
cpu.throttling_data.throttled_time (cumulative)
Throttling time in nanoseconds
cpu.usage.kernelmode (cumulative)
Jiffies of CPU time spent in kernel mode by the container
cpu.usage.system (cumulative)
Jiffies of CPU time used by the system
cpu.usage.total (cumulative)
Jiffies of CPU time used by the container
cpu.usage.usermode (cumulative)
Jiffies of CPU time spent in user mode by the container

Group memory

All of the following metrics are part of the memory metric group. All of the non-default metrics below can be turned on by adding memory to the monitor config option extraGroups:

memory.percent (gauge)
Percent of memory (0-100) used by the container relative to its limit (excludes page cache usage)
memory.stats.active_anon (gauge)
Amount of memory that has been identified as active by the kernel. Anonymous memory is memory that is not linked to disk pages.
memory.stats.active_file (gauge)
Amount of active file cache memory. Cache memory = active_file + inactive_file + tmpfs
memory.stats.cache (gauge)
The amount of memory used by the processes of this control group that can be associated with a block on a block device. Also accounts for memory used by tmpfs.
memory.stats.dirty (gauge)
The amount of memory waiting to get written to disk
memory.stats.hierarchical_memory_limit (gauge)
The memory limit in place by the hierarchy cgroup
memory.stats.hierarchical_memsw_limit (gauge)
The memory+swap limit in place by the hierarchy cgroup
memory.stats.inactive_anon (gauge)
Amount of memory that has been identified as inactive by the kernel. Anonymous memory is memory that is not linked to disk pages.
memory.stats.inactive_file (gauge)
Amount of inactive file cache memory. Cache memory = active_file + inactive_file + tmpfs
memory.stats.mapped_file (gauge)
Indicates the amount of memory mapped by the processes in the control group. It doesn’t give you information about how much memory is used; it rather tells you how it is used.
memory.stats.pgfault (cumulative)
Number of times that a process of the cgroup triggered a page fault. Page faults occur when a process accesses part of its virtual memory space which is nonexistent or protected. See https://docs.docker.com/config/containers/runmetrics for more info.
memory.stats.pgmajfault (cumulative)
Number of times that a process of the cgroup triggered a major page fault. Page faults occur when a process accesses part of its virtual memory space which is nonexistent or protected. See https://docs.docker.com/config/containers/runmetrics for more info.
memory.stats.pgpgin (cumulative)
Number of charging events to the memory cgroup. Charging events happen each time a page is accounted to the cgroup as either a mapped anon page (RSS) or a cache page.
memory.stats.pgpgout (cumulative)
Number of uncharging events to the memory cgroup. Uncharging events happen each time a page is unaccounted from the cgroup.
memory.stats.rss (gauge)
The amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps.
memory.stats.rss_huge (gauge)
Amount of memory due to anonymous transparent hugepages.
memory.stats.shmem (gauge)
Amount of Shared Memory used by the container, in bytes.
memory.stats.swap (gauge)
Bytes of swap memory used by container
memory.stats.total_active_anon (gauge)
Total amount of memory that has been identified as active by the kernel. Anonymous memory is memory that is not linked to disk pages.
memory.stats.total_active_file (gauge)
Total amount of active file cache memory. Cache memory = active_file + inactive_file + tmpfs
memory.stats.total_cache (gauge)
Total amount of memory used by the processes of this control group that can be associated with a block on a block device. Also accounts for memory used by tmpfs.
memory.stats.total_dirty (gauge)
Total amount of memory waiting to get written to disk
memory.stats.total_inactive_anon (gauge)
Total amount of memory that has been identified as inactive by the kernel. Anonymous memory is memory that is not linked to disk pages.
memory.stats.total_inactive_file (gauge)
Total amount of inactive file cache memory. Cache memory = active_file + inactive_file + tmpfs
memory.stats.total_mapped_file (gauge)
Total amount of memory mapped by the processes in the control group. It doesn’t give you information about how much memory is used; it rather tells you how it is used.
memory.stats.total_pgfault (cumulative)
Total number of page faults
memory.stats.total_pgmajfault (cumulative)
Total number of major page faults
memory.stats.total_pgpgin (cumulative)
Total number of charging events
memory.stats.total_pgpgout (cumulative)
Total number of uncharging events
memory.stats.total_rss (gauge)
Total amount of memory that doesn’t correspond to anything on disk: stacks, heaps, and anonymous memory maps.
memory.stats.total_rss_huge (gauge)
Total amount of memory due to anonymous transparent hugepages.
memory.stats.total_shmem (gauge)
Total amount of shared memory used by the container, in bytes.
memory.stats.total_swap (gauge)
Total amount of swap memory available to this container
memory.stats.total_unevictable (gauge)
Total amount of memory that cannot be reclaimed
memory.stats.total_writeback (gauge)
Total amount of memory from file/anon cache that is queued for syncing to the disk
memory.stats.unevictable (gauge)
The amount of memory that cannot be reclaimed.
memory.stats.writeback (gauge)
The amount of memory from file/anon cache that is queued for syncing to the disk
memory.usage.limit (gauge)
Memory usage limit of the container, in bytes
memory.usage.max (gauge)
Maximum measured memory usage of the container, in bytes
memory.usage.total (gauge)
Bytes of memory used by the container. Note that this excludes the buffer cache accounted to the process by the kernel from files that have been read by processes in the container, as well as tmpfs usage. If you want to count that when monitoring containers, enable the metric memory.stats.total_cache and add it to this metric in SignalFlow.

Group network

All of the following metrics are part of the network metric group. All of the non-default metrics below can be turned on by adding network to the monitor config option extraGroups:

network.usage.rx_bytes (cumulative)
Bytes received by the container via its network interface
network.usage.rx_dropped (cumulative)
Number of inbound network packets dropped by the container
network.usage.rx_errors (cumulative)
Errors receiving network packets
network.usage.rx_packets (cumulative)
Network packets received by the container via its network interface
network.usage.tx_bytes (cumulative)
Bytes sent by the container via its network interface
network.usage.tx_dropped (cumulative)
Number of outbound network packets dropped by the container
network.usage.tx_errors (cumulative)
Errors sending network packets
network.usage.tx_packets (cumulative)
Network packets sent by the container via its network interface

Non-default metrics (version 4.7.0+)

To emit metrics that are not default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options and do not appear in the list above do not need to be added to extraMetrics.
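
A minimal sketch of enabling individual metrics this way, assuming extraMetrics takes a list of metric names and that the two metrics chosen here are not already emitted by default in your agent version:

```yaml
monitors:
 - type: docker-container-stats
   extraMetrics:
     - memory.stats.total_cache             # from the memory group listed above
     - cpu.throttling_data.throttled_time   # from the cpu group listed above
```

This differs from extraGroups in that only the named metrics are added, not every metric in their group.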

To see the list of metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.
