kubernetes-cluster

Monitor Type: kubernetes-cluster (Source)

Accepts Endpoints: No

Multiple Instances Allowed: Yes

Overview

If you are using OpenShift, use the openshift-cluster monitor instead of this monitor; it includes additional OpenShift-specific metrics.

This monitor collects cluster-level metrics from the Kubernetes API server. It uses the watch functionality of the K8s API to listen for updates about the cluster and maintains a cache of metrics that is sent on a regular interval.

Since the agent is generally running in multiple places in a K8s cluster and since it is generally more convenient to share the same configuration across all agent instances, this monitor by default makes use of a leader election process to ensure that it is the only agent sending metrics in a cluster. All of the agents running in the same namespace that have this monitor configured will decide amongst themselves which should send metrics for this monitor, and the rest will stand by ready to activate if the leader agent dies. You can override leader election by setting the config option alwaysClusterReporter to true, which will make the monitor always report metrics.
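For example, a minimal sketch of overriding leader election so that every agent instance reports cluster metrics (any other config is assumed to come from your existing agent setup):

```yaml
monitors:
  - type: kubernetes-cluster
    # Skip leader election; this agent always reports cluster metrics
    alwaysClusterReporter: true
```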

This monitor is similar to kube-state-metrics and sends many of the same metrics, but in a way that is less verbose and better suited to the SignalFx backend.

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

```yaml
monitors:  # All monitor config goes under this key
 - type: kubernetes-cluster
   ...  # Additional config
```

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| `alwaysClusterReporter` | no | `bool` | If `true`, leader election is skipped and metrics are always reported. (**default:** `false`) |
| `namespace` | no | `string` | If specified, only resources within the given namespace will be monitored. If omitted (blank), all supported resources across all namespaces will be monitored. |
| `kubernetesAPI` | no | `object (see below)` | Config for the K8s API client |
| `nodeConditionTypesToReport` | no | `list of strings` | A list of node status condition types to report as metrics. The metrics will be reported as datapoints of the form `kubernetes.node_<type_snake_cased>` with a value of `0` corresponding to "False", `1` to "True", and `-1` to "Unknown". (**default:** `[Ready]`) |
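As an illustration, the following sketch restricts the monitor to a single namespace and reports an extra node condition; the namespace value here is a placeholder:

```yaml
monitors:
  - type: kubernetes-cluster
    # Placeholder namespace; omit to watch all namespaces
    namespace: production
    # Emits kubernetes.node_ready and kubernetes.node_memory_pressure datapoints
    nodeConditionTypesToReport:
      - Ready
      - MemoryPressure
```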

The nested kubernetesAPI config object has the following fields:

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| `authType` | no | `string` | How to authenticate to the K8s API server. This can be one of `none` (for no auth), `tls` (to use manually specified TLS client certs, not recommended), `serviceAccount` (to use the standard service account token provided to the agent pod), or `kubeConfig` (to use credentials from `~/.kube/config`). (**default:** `serviceAccount`) |
| `skipVerify` | no | `bool` | Whether to skip verifying the TLS cert from the API server. Almost never needed. (**default:** `false`) |
| `clientCertPath` | no | `string` | The path to the TLS client cert on the pod's filesystem, if using `tls` auth. |
| `clientKeyPath` | no | `string` | The path to the TLS client key on the pod's filesystem, if using `tls` auth. |
| `caCertPath` | no | `string` | Path to a CA certificate to use when verifying the API server's TLS cert. Generally this is provided by K8s alongside the service account token and is picked up automatically, so it should rarely be necessary to specify. |
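For instance, a sketch of using manually specified TLS client certs; the file paths are assumptions about where certs are mounted in the agent pod, and the default serviceAccount auth needs none of these fields:

```yaml
monitors:
  - type: kubernetes-cluster
    kubernetesAPI:
      authType: tls
      # Placeholder paths to certs mounted into the agent pod
      clientCertPath: /certs/client.crt
      clientKeyPath: /certs/client.key
      caCertPath: /certs/ca.crt
```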

Metrics

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

This monitor will also emit by default any metrics that are not listed below.

  • kubernetes.container_cpu_limit (gauge)
    Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.

  • kubernetes.container_cpu_request (gauge)
    CPU requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.

  • kubernetes.container_ephemeral_storage_limit (gauge)
    Maximum ephemeral storage set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details.

  • kubernetes.container_ephemeral_storage_request (gauge)
    Ephemeral storage requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details

  • kubernetes.container_memory_limit (gauge)
    Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.

  • kubernetes.container_memory_request (gauge)
    Memory requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.

  • kubernetes.container_ready (gauge)
    Whether a container has passed its readiness probe (0 for no, 1 for yes)

  • kubernetes.container_restart_count (gauge)
    How many times the container has restarted in the recent past. This value is pulled directly from the K8s API and the value can go indefinitely high and be reset to 0 at any time depending on how your kubelet is configured to prune dead containers. It is best to not depend too much on the exact value but rather look at it as either == 0, in which case you can conclude there were no restarts in the recent past, or > 0, in which case you can conclude there were restarts in the recent past, and not try and analyze the value beyond that.

  • kubernetes.cronjob.active (gauge)
    The number of actively running jobs for a cronjob.

  • kubernetes.daemon_set.current_scheduled (gauge)
    The number of nodes that are running at least 1 daemon pod and are supposed to run the daemon pod

  • kubernetes.daemon_set.desired_scheduled (gauge)
    The total number of nodes that should be running the daemon pod (including nodes currently running the daemon pod)

  • kubernetes.daemon_set.misscheduled (gauge)
    The number of nodes that are running the daemon pod, but are not supposed to run the daemon pod

  • kubernetes.daemon_set.ready (gauge)
    The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready

  • kubernetes.daemon_set.updated (gauge)
    The total number of nodes that are running the updated daemon pod

  • kubernetes.deployment.available (gauge)
    Total number of available pods (ready for at least minReadySeconds) targeted by this deployment.

  • kubernetes.deployment.desired (gauge)
    Number of desired pods in this deployment

  • kubernetes.deployment.updated (gauge)
    Total number of non-terminated pods targeted by this deployment that have the desired template spec

  • kubernetes.job.active (gauge)
    The number of actively running pods for a job.

  • kubernetes.job.completions (gauge)
    The desired number of successfully finished pods the job should be run with.

  • kubernetes.job.failed (cumulative)
    The number of pods which reached phase Failed for a job.

  • kubernetes.job.parallelism (gauge)
    The max desired number of pods the job should run at any given time.

  • kubernetes.job.succeeded (cumulative)
    The number of pods which reached phase Succeeded for a job.

  • kubernetes.namespace_phase (gauge)
    The current phase of namespaces (1 for active and 0 for terminating)

  • kubernetes.node_allocatable_cpu (gauge)
    How many CPU cores remaining that the node can allocate to pods

  • kubernetes.node_allocatable_ephemeral_storage (gauge)
    How many bytes of ephemeral storage remaining that the node can allocate to pods

  • kubernetes.node_allocatable_memory (gauge)
    How many bytes of RAM memory remaining that the node can allocate to pods

  • kubernetes.node_allocatable_storage (gauge)
    How many bytes of storage remaining that the node can allocate to pods

  • kubernetes.node_ready (gauge)
    Whether this node is ready (1), not ready (0) or in an unknown state (-1)

  • kubernetes.pod_phase (gauge)
    Current phase of the pod (1 - Pending, 2 - Running, 3 - Succeeded, 4 - Failed, 5 - Unknown)

  • kubernetes.replica_set.available (gauge)
    Total number of available pods (ready for at least minReadySeconds) targeted by this replica set

  • kubernetes.replica_set.desired (gauge)
    Number of desired pods in this replica set

  • kubernetes.replication_controller.available (gauge)
    Total number of available pods (ready for at least minReadySeconds) targeted by this replication controller.

  • kubernetes.replication_controller.desired (gauge)
    Number of desired pods (the spec.replicas field)

  • kubernetes.resource_quota_hard (gauge)
    The upper limit for a particular resource in a specific namespace. Will only be sent if a quota is specified. CPU requests/limits will be sent as millicores.

  • kubernetes.resource_quota_used (gauge)
    The usage for a particular resource in a specific namespace. Will only be sent if a quota is specified. CPU requests/limits will be sent as millicores.

  • kubernetes.stateful_set.current (gauge)
    The number of pods created by the StatefulSet controller from the StatefulSet version indicated by the current_revision property on the kubernetes_uid dimension for this StatefulSet.

  • kubernetes.stateful_set.desired (gauge)
    Number of desired pods in the stateful set (the spec.replicas field)

  • kubernetes.stateful_set.ready (gauge)
    Number of pods created by the stateful set that have the Ready condition

  • kubernetes.stateful_set.updated (gauge)
    The number of pods created by the StatefulSet controller from the StatefulSet version indicated by the update_revision property on the kubernetes_uid dimension for this StatefulSet.

Group hpa

All of the following metrics are part of the hpa metric group. These non-default metrics can be turned on by adding hpa to the monitor config option extraGroups (see the example after this list):

  • kubernetes.hpa.spec.max_replicas (gauge)
    The upper limit for the number of replicas to which the autoscaler can scale up. It cannot be less than minReplicas.
  • kubernetes.hpa.spec.min_replicas (gauge)
    The lower limit for the number of replicas to which the autoscaler can scale down. It defaults to 1 pod.
  • kubernetes.hpa.status.condition.able_to_scale (gauge)
    A status value that indicates the autoscaler status in reference to the AbleToScale condition. A value of 1 means that the autoscaler is in the AbleToScale condition, a 0 value means that it is not, and -1 means that the status of the AbleToScale condition is unknown. AbleToScale indicates a lack of transient issues which prevent scaling from occurring, such as being in a backoff window, or being unable to access/update the target scale.
  • kubernetes.hpa.status.condition.scaling_active (gauge)
    A status value that indicates the autoscaler status in reference to the ScalingActive condition. A value of 1 means that the autoscaler is in the ScalingActive condition, a 0 value means that it is not, and -1 means that the status of the ScalingActive condition is unknown. ScalingActive indicates that the HPA controller is able to scale if necessary.
  • kubernetes.hpa.status.condition.scaling_limited (gauge)
    A status value that indicates the autoscaler status in reference to the ScalingLimited condition. A value of 1 means that the autoscaler is in the ScalingLimited condition, a 0 value means that it is not, and -1 means that the status of the ScalingLimited condition is unknown. ScalingLimited indicates that the calculated scale based on metrics would be above or below the range for the HPA, and has thus been capped.
  • kubernetes.hpa.status.current_replicas (gauge)
    The current number of pod replicas managed by this autoscaler.
  • kubernetes.hpa.status.desired_replicas (gauge)
    The desired number of pod replicas managed by this autoscaler.
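A minimal config sketch that enables the hpa metric group:

```yaml
monitors:
  - type: kubernetes-cluster
    # Turn on the non-default HPA metrics listed above
    extraGroups:
      - hpa
```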

Non-default metrics (version 4.7.0+)

To emit metrics that are not emitted by default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options and do not appear in the list above do not need to be added to extraMetrics.
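For example, a config sketch that emits two individual non-default metrics by name (these particular metric names are just illustrative picks from the hpa group above):

```yaml
monitors:
  - type: kubernetes-cluster
    extraMetrics:
      - kubernetes.hpa.status.current_replicas
      - kubernetes.hpa.status.desired_replicas
```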

To see a list of the metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.

Dimensions

The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.

| Name | Description |
| --- | --- |
| `kubernetes_name` | The name of the resource that the metric describes |
| `kubernetes_namespace` | The namespace of the resource that the metric describes |
| `kubernetes_node_uid` | The UID of the node, as defined by the `uid` field of the node resource. |
| `kubernetes_pod_uid` | The UID of the pod that the metric describes |
| `machine_id` | The machine ID from /etc/machine-id. This should be unique across all nodes in your cluster, but some cluster deployment tools don't guarantee this. This will not be sent if the `useNodeName` config option is set to `true`. |
| `metric_source` | This is always set to `kubernetes` |
| `quota_name` | The name of the k8s ResourceQuota object that the quota is part of |
| `resource` | The k8s resource that the quota applies to |

Properties

The following properties are set on the dimension values of the dimension specified.

| Name | Dimension | Description |
| --- | --- | --- |
| `<node label>` | `kubernetes_node_uid` | All non-blank labels on a given node will be synced as properties to the `kubernetes_node_uid` dimension value for that node. Any blank values will be synced as tags on that same dimension. |
| `<pod label>` | `kubernetes_pod_uid` | Any labels with non-blank values on the pod will be synced as properties to the `kubernetes_pod_uid` dimension. Any blank labels will be synced as tags on that same dimension. |
| `container_status` | `container_id` | The status of the container, such as running, waiting, or terminated, synced to the `container_id` dimension. |
| `container_status_reason` | `container_id` | The reason why a container is in a particular state. This property is synced to `container_id` only if the value of `container_status` is either waiting or terminated. |
| `cronjob_creation_timestamp` | `kubernetes_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the cron job was created. This property is synced onto `kubernetes_uid`. |
| `daemonset_creation_timestamp` | `kubernetes_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the daemon set was created. This property is synced onto `kubernetes_uid`. |
| `deployment_creation_timestamp` | `kubernetes_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the deployment was created. This property is synced onto `kubernetes_uid`. |
| `job_creation_timestamp` | `kubernetes_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the job was created. This property is synced onto `kubernetes_uid`. |
| `node_creation_timestamp` | `kubernetes_node_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the node was created. This property is synced onto `kubernetes_node_uid`. |
| `pod_creation_timestamp` | `kubernetes_pod_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the pod was created. This property is synced onto `kubernetes_pod_uid`. |
| `replicaset_creation_timestamp` | `kubernetes_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the replica set was created. This property is synced onto `kubernetes_uid`. |
| `statefulset_creation_timestamp` | `kubernetes_uid` | Timestamp (in RFC3339 format, UTC) representing the server time when the stateful set was created. This property is synced onto `kubernetes_uid`. |