Troubleshooting Panel User Guide

The troubleshooting panel displays a graph of resources and observability signals related to whatever is shown in the main console window. Each node in the graph represents a type of resource or signal, and each edge represents a relationship between them.

Clicking on a node in the graph opens the console page showing details of that resource or signal. Clicking the "Focus" button re-calculates the graph starting from the current contents of the main window.

The panel provides a map of related information to help you navigate more quickly to relevant data, or to discover relevant data you may not have been aware of.

We will show an example of troubleshooting an Alert.

Note
You can re-create this example alert on your own cluster by following the instructions in "Creating the example alert" below. You can also experiment by using the panel with existing resources in your own cluster.

Opening the panel

Open the troubleshooting panel with the "Signal Correlation" entry in the troubleshooting section of the "launcher" menu, found at the top right of the screen:

[Image: launcher]

Opening the panel shows a neighbourhood of the resource currently displayed in the console. A neighbourhood is a graph that starts at the current resource, and includes related objects up to 3 steps away from the starting point.
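The idea of a neighbourhood can be sketched as a depth-limited breadth-first search. The sketch below is only an illustration and is not the panel's actual implementation: the `EDGES` graph, the `neighbourhood` function and the `Endpoints` node are hypothetical, loosely modelled on the example in this guide, and edges are treated as one-directional for simplicity.

```python
from collections import deque

# Hypothetical relationship graph, loosely modelled on the example in this guide.
# An entry "A": ["B"] means that A is related to B.
EDGES = {
    "Alert": ["Pod"],
    "Pod": ["Event", "Logs", "Metrics", "Network"],
    "Network": ["Service", "Deployment", "DaemonSet"],
    "Service": ["Endpoints"],  # hypothetical node, 4 steps from Alert
}


def neighbourhood(edges, start, depth=3):
    """Return {node: steps from start} for every node within `depth` steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand nodes that are already at the depth limit
        for neighbour in edges.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


result = neighbourhood(EDGES, "Alert", depth=3)
```

With a depth of 3, `result` contains the nine nodes from `Alert` out to `Service`, `Deployment` and `DaemonSet`. The hypothetical `Endpoints` node is left out because it is 4 steps from the starting point.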

Note
Not all resource types are currently supported; more will be added in future. For an unsupported resource, the panel will be empty.

For example, here is the panel for a KubeContainerWaiting alert:

[Image: panel graph, with numbered callouts described below]
  1. Alert(1): This node represents the starting point, a KubeContainerWaiting alert that was displayed in the console.

  2. Pod(1): This node indicates there is a single Pod resource associated with this alert. Clicking on this node will show the pod details in the console.

  3. Event(2): There are two Kubernetes events associated with the Pod. Click this node to see them.

  4. Logs(74): The pod has emitted 74 lines of logs. Click to show them.

  5. Metrics(105): There are always many metrics associated with every Pod.

  6. Network(6): There are network events associated with the pod, which means it has communicated with other resources in the cluster. The remaining Service, Deployment and DaemonSet nodes are the resources that the pod has communicated with.

  7. Focus: Clicking this button will re-calculate the graph starting from the current contents of the main console window. This may have changed by clicking nodes in the graph, or by using any other links, menus or navigation features of the console.

  8. Show Query: enables experimental features detailed below.

Note
Clicking on a node may sometimes show fewer results than are indicated on the graph. This is a known issue that will be addressed in future.

Experimental features

[Image: query details, with numbered callouts described below]
  1. Hide Query hides the experimental features.

  2. The query that identifies the starting point for the graph. This is normally derived automatically from the contents of the main console window. You can enter queries manually, but the format of this query language is experimental and likely to change in future. [1] The "Focus" button updates the query to match the resources in the main console window.

  3. Neighbourhood depth: increase or decrease to see a smaller or larger neighbourhood. Note: setting a large value in a large cluster may cause the query to fail if the number of results is too big.

  4. Goal class: Selecting this option performs a goal-directed search instead of a neighbourhood search. A goal-directed search shows all paths from the starting point to the goal class, which indicates a type of resource or signal.
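As a rough illustration of a goal-directed search, the sketch below collects every cycle-free path from a starting node to a goal node. It is not the panel's actual implementation: `paths_to_goal` and the `EDGES` graph are hypothetical names and data chosen for this example.

```python
def paths_to_goal(edges, start, goal):
    """Return every cycle-free path from `start` to `goal` as a list of node labels."""
    found = []

    def walk(node, path):
        if node == goal:
            found.append(path)
            return
        for neighbour in edges.get(node, []):
            if neighbour not in path:  # never revisit a node already on this path
                walk(neighbour, path + [neighbour])

    walk(start, [start])
    return found


# Hypothetical relationship graph: "A": ["B"] means that A is related to B.
EDGES = {
    "Alert": ["Pod", "Deployment"],
    "Deployment": ["Pod"],
    "Pod": ["Event", "Logs"],
}

paths = paths_to_goal(EDGES, "Alert", "Logs")
```

Unlike a neighbourhood search, the result keeps only the nodes that lie on a path to the goal. In this hypothetical graph, `Event` is related to `Pod` but does not appear in any path to `Logs`.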

The format of the goal class is experimental and may change. Currently the valid goal classes are:

k8s:resource[.version.[group]]

Kind of Kubernetes resource. For example k8s:Pod or k8s:Deployment.v1.apps.

alert:alert

Any alert.

metric:metric

Any metric.

netflow:network

Any network observability event.

log:log_type

Stored logs. log_type must be application, infrastructure or audit.
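The list above can be summarised as a small validity check. This is a minimal sketch that only mirrors the goal classes listed in this guide; it is not the parser used by the panel, and `is_valid_goal_class` is a hypothetical helper. For k8s classes it only checks that a resource kind is present and does not interpret the optional version and group parts.

```python
LOG_TYPES = {"application", "infrastructure", "audit"}
FIXED_CLASSES = {"alert:alert", "metric:metric", "netflow:network"}


def is_valid_goal_class(text):
    """Check `text` against the goal class forms listed in this guide."""
    domain, sep, name = text.partition(":")
    if not sep or not name:
        return False  # a goal class needs both a domain and a name
    if domain == "k8s":
        # Only require a non-empty resource kind; version and group are optional.
        kind = name.split(".", 1)[0]
        return kind != ""
    if domain == "log":
        return name in LOG_TYPES
    return text in FIXED_CLASSES


checks = {c: is_valid_goal_class(c) for c in ["k8s:Pod", "log:audit", "log:debug", "Pod"]}
```

Here `k8s:Pod` and `log:audit` are accepted, while `log:debug` (not a listed log type) and `Pod` (no domain prefix) are rejected.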

Optional signal stores

The troubleshooting panel relies on the observability signal stores installed in your cluster. Kubernetes resources, alerts and metrics are available by default in an OpenShift Container Platform (OCP) cluster.

Other types of signal require optional components to be installed:

  • Logs: "Red Hat OpenShift Logging" (collection) and "Loki Operator provided by Red Hat" (store)

  • Network Events: "Network Observability provided by Red Hat" (collection) and "Loki Operator provided by Red Hat" (store)

Creating the example alert

You can reproduce the example alert shown above as follows.

Procedure
  1. Run the following command to create a broken deployment in a system namespace:

    kubectl apply -f - << EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: bad-deployment
      namespace: default # (1)
    spec:
      selector:
        matchLabels:
          app: bad-deployment
      template:
        metadata:
          labels:
            app: bad-deployment
        spec:
          containers: # (2)
          - name: bad-deployment
            image: quay.io/openshift-logging/vector:5.8
    EOF

    1. The deployment must be in a system namespace (such as default) to cause the desired alerts.

    2. This container deliberately tries to start a vector server with no configuration file. The server will log a few messages, and then exit with an error. Any container could be used for this.

  2. View the alerts:

    1. Go to Observe → Alerting and click "Clear all filters". View the Pending alerts.

      Important

      Alerts first appear in the Pending state. They do not start Firing until the container has been crashing for some time. By showing Pending alerts you can see them much more quickly.

    2. Look for KubeContainerWaiting, KubePodCrashLooping, or KubePodNotReady alerts.

    3. Select one such alert and open the troubleshooting panel, or click the "Focus" button if it is already open.


1. This query language is part of Korrel8r, the correlation engine used to create the graphs.