
Split tutorial for CAPI, add capi troubleshooting pages
berkayoz committed Jan 31, 2025
1 parent 12f659e commit 341b7c0
Showing 5 changed files with 364 additions and 80 deletions.
2 changes: 2 additions & 0 deletions docs/src/capi/howto/index.md
@@ -14,13 +14,15 @@ Overview <self>
:glob:
:titlesonly:
Provision a Canonical Kubernetes cluster <provision>
Install custom Canonical Kubernetes <custom-ck8s>
Use external etcd <external-etcd.md>
Upgrade the Kubernetes version <rollout-upgrades>
Perform an in-place upgrade <in-place-upgrades>
Upgrade the providers of a management cluster <upgrade-providers>
Migrate the management cluster <migrate-management>
Refresh workload cluster certificates <refresh-certs>
Troubleshooting <troubleshooting>
```

---
100 changes: 100 additions & 0 deletions docs/src/capi/howto/provision.md
@@ -0,0 +1,100 @@
# Provisioning a {{product}} cluster with CAPI

This guide covers how to deploy a {{product}} multi-node cluster
using Cluster API (CAPI).

## Prerequisites

This guide assumes the following:

- A CAPI management cluster initialised with the infrastructure, bootstrap and
  control plane providers of your choice. Please refer to the
  [getting-started guide] for instructions (an illustrative example follows).
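
For illustration only, initialising a management cluster could look like the
following. This is a hedged sketch: it assumes AWS as the infrastructure
provider and that the {{product}} bootstrap and control plane providers are
available to `clusterctl` under the name `canonical-kubernetes`; adjust the
names to your environment and treat the [getting-started guide] as the
authoritative reference.

```
clusterctl init --infrastructure aws \
  --bootstrap canonical-kubernetes \
  --control-plane canonical-kubernetes
```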

## Generate a cluster spec manifest

You can generate a cluster manifest for a selected set of commonly used
infrastructures via templates provided by the {{product}} team.
Ensure you have initialised the desired infrastructure provider, then fetch
the {{product}} provider repository:

```
git clone https://github.com/canonical/cluster-api-k8s
```

Review the list of variables needed for the cluster template:

```
cd cluster-api-k8s
export CLUSTER_NAME=yourk8scluster
clusterctl generate cluster ${CLUSTER_NAME} --from ./templates/<infrastructure-provider>/cluster-template.yaml --list-variables
```

Set the respective environment variables by editing the rc file as needed
before sourcing it. Then generate the cluster manifest:

```
source ./templates/<infrastructure-provider>/template-variables.rc
clusterctl generate cluster ${CLUSTER_NAME} --from ./templates/<infrastructure-provider>/cluster-template.yaml > cluster.yaml
```

Each provisioned node is associated with a `CK8sConfig`, through which you can
set the cluster’s properties. Available configuration fields can be listed in detail with:

```
sudo k8s kubectl explain CK8sConfig.spec
```

Review the available options and edit the generated cluster manifest
(`cluster.yaml` above) to match your needs.
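
For example, you can locate the `CK8sConfig`-related sections of the generated
manifest before editing them. This is a minimal sketch using standard shell
tools; adjust the amount of context to taste:

```
# Show every CK8sConfig occurrence with some surrounding context
grep -n -B 2 -A 10 "CK8sConfig" cluster.yaml
```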

## Deploy the cluster

To deploy the cluster, run:

```
sudo k8s kubectl apply -f cluster.yaml
```

For an overview of the cluster status, run:

```
clusterctl describe cluster ${CLUSTER_NAME}
```

To get the list of provisioned clusters:

```
sudo k8s kubectl get clusters
```

To see the deployed machines:

```
sudo k8s kubectl get machine
```

After the first control plane node is provisioned, you can get the kubeconfig
of the workload cluster:

```
clusterctl get kubeconfig ${CLUSTER_NAME} > ./${CLUSTER_NAME}-kubeconfig
```

You can then see the workload nodes using:

```
sudo k8s kubectl --kubeconfig ./${CLUSTER_NAME}-kubeconfig get node
```

## Delete the cluster

To delete a cluster, run:

```
sudo k8s kubectl delete cluster ${CLUSTER_NAME}
```

<!-- LINKS -->

[getting-started guide]: ../tutorial/getting-started
252 changes: 252 additions & 0 deletions docs/src/capi/howto/troubleshooting.md
@@ -0,0 +1,252 @@
# How to troubleshoot {{product}}

Identifying issues in a Kubernetes cluster can be difficult, especially for
new users. With {{product}} we aim to make deploying and managing your cluster
as easy as possible. This how-to guide walks you through the steps to
troubleshoot your {{product}} cluster.

## Check the cluster status

Verify that the cluster status is ready by running:

```
sudo k8s kubectl get cluster,ck8scontrolplane,machinedeployment,machine
```

You should see a command output similar to the following:

```
NAME                                  CLUSTERCLASS   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/my-cluster                  Provisioned   16m

NAME                                                                       INITIALIZED   API SERVER AVAILABLE   VERSION   REPLICAS   READY   UPDATED   UNAVAILABLE
ck8scontrolplane.controlplane.cluster.x-k8s.io/my-cluster-control-plane    true          true                   v1.32.1   1          1       1

NAME                                                         CLUSTER      REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
machinedeployment.cluster.x-k8s.io/my-cluster-worker-md-0    my-cluster   1          1       1         0             Running   16m   v1.32.1

NAME                                                          CLUSTER      NODENAME                                           PROVIDERID      PHASE     AGE   VERSION
machine.cluster.x-k8s.io/my-cluster-control-plane-j7w6m       my-cluster   my-cluster-cp-my-cluster-control-plane-j7w6m       <provider-id>   Running   16m   v1.32.1
machine.cluster.x-k8s.io/my-cluster-worker-md-0-8zlzv-7vff7   my-cluster   my-cluster-wn-my-cluster-worker-md-0-8zlzv-7vff7   <provider-id>   Running   80s   v1.32.1
```

## Check the providers' status

Provisioning failures in a {{product}} cluster can originate from any of the
providers involved in CAPI.

Check the {{product}} bootstrap provider logs:

```
k8s kubectl logs -n cabpck-system deployment/cabpck-bootstrap-controller-manager
```

Examine the {{product}} control-plane provider logs:

```
k8s kubectl logs -n cacpck-system deployment/cacpck-controller-manager
```

Review the CAPI controller logs:

```
k8s kubectl logs -n capi-system deployment/capi-controller-manager
```

Check the logs for the infrastructure provider by running:

```
k8s kubectl logs -n <infrastructure-provider-namespace> <infrastructure-provider-deployment>
```

## Test the API server health

Fetch the kubeconfig file for a {{product}} cluster provisioned through CAPI by running:

```
clusterctl get kubeconfig ${CLUSTER_NAME} > ./${CLUSTER_NAME}-kubeconfig.yaml
```

Verify that the API server is healthy and reachable by running:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get all
```

This command lists resources that exist under the default namespace. If the API
server is healthy you should see a command output similar to the following:

```
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   29m
```

A typical error message may look like this if the API server cannot be reached:

```
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
```

Such a failure can mean that (a quick connectivity check is sketched after
this list):

* The API server is not reachable due to network issues or firewall limitations
* The API server on the particular node is unhealthy
* All control plane nodes are down
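
To quickly tell a connectivity problem apart from an unhealthy API server, you
can probe the control plane endpoint directly. This is a sketch only;
substitute the address and port of your workload cluster's control plane:

```
# An HTTP response (even 401 or 403) proves the endpoint is reachable;
# connection refused or a timeout points at network issues or a down API server
curl -k https://<control-plane-endpoint>:6443/healthz
```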

## Check the cluster nodes' health

Confirm that the nodes in the cluster are healthy by looking for the `Ready`
status:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get nodes
```

You should see a command output similar to the following:

```
NAME                                               STATUS   ROLES                  AGE     VERSION
my-cluster-cp-my-cluster-control-plane-j7w6m       Ready    control-plane,worker   17m     v1.32.1
my-cluster-wn-my-cluster-worker-md-0-8zlzv-7vff7   Ready    worker                 2m14s   v1.32.1
```

## Troubleshoot an unhealthy node

Every healthy {{product}} node has certain services up and running. The
required services depend on the type of node.

Services running on both the control plane and worker nodes:

* `k8sd`
* `kubelet`
* `containerd`
* `kube-proxy`

Services running only on the control plane nodes:

* `kube-apiserver`
* `kube-controller-manager`
* `kube-scheduler`
* `k8s-dqlite`

Services running only on the worker nodes:

* `k8s-apiserver-proxy`

Depending on your infrastructure provider, make the necessary adjustments for
SSH access and SSH into the unhealthy node:

```
ssh <user>@<node>
```

Check the status of the services on the failing node by running:

```
sudo systemctl status snap.k8s.<service>
```
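
To check all of the expected services in one go, you can loop over them. This
is a convenience sketch for a control plane node; adjust the list of services
for a worker node:

```
for service in k8sd kubelet containerd kube-proxy \
    kube-apiserver kube-controller-manager kube-scheduler k8s-dqlite; do
  echo "${service}: $(sudo systemctl is-active snap.k8s.${service})"
done
```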

Check the logs of a failing service by executing:

```
sudo journalctl -xe -u snap.k8s.<service>
```

If the issue indicates a problem with the configuration of the services on the
node, examine the arguments used to run them.

The arguments of a service on the failing node are listed in the file
`/var/snap/k8s/common/args/<service>`.
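
For example, to review the arguments of the kubelet on a node and restart it
after a change (the kubelet is used purely as an illustration; the same
applies to any of the services listed above):

```
# Show the arguments the kubelet is started with
sudo cat /var/snap/k8s/common/args/kubelet

# After editing an argument file, restart the corresponding service
sudo systemctl restart snap.k8s.kubelet
```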

## Investigate system pods' health

Check whether all of the cluster's pods are `Running` and `Ready`:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get pods -n kube-system
```

The pods in the `kube-system` namespace belong to {{product}} features such as
`network`. Unhealthy pods could be related to configuration issues or nodes not
meeting certain requirements.
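
Recent events in the `kube-system` namespace often point at the root cause,
for example failed scheduling or image pulls. To list them sorted by creation
time:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get events -n kube-system \
  --sort-by=.metadata.creationTimestamp
```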

## Troubleshoot a failing pod

Look at the events on a failing pod by running:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml describe pod <pod-name> -n <namespace>
```

Check the logs on a failing pod by executing:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml logs <pod-name> -n <namespace>
```

You can check out the upstream [debug pods documentation][] for more
information.

## Use the built-in inspection script

{{product}} ships with a script to compile a complete report on {{product}} and
its underlying system. This is an essential tool for bug reports and for
investigating whether a system is (or isn’t) working.

The inspection script can be executed on a specific node by running the following
commands:

```
ssh -t <user>@<node> -- sudo k8s inspect /home/<user>/inspection-report.tar.gz
scp <user>@<node>:/home/<user>/inspection-report.tar.gz ./
```

The command output is similar to the following:

```
Collecting service information
Running inspection on a control-plane node
INFO: Service k8s.containerd is running
INFO: Service k8s.kube-proxy is running
INFO: Service k8s.k8s-dqlite is running
INFO: Service k8s.k8sd is running
INFO: Service k8s.kube-apiserver is running
INFO: Service k8s.kube-controller-manager is running
INFO: Service k8s.kube-scheduler is running
INFO: Service k8s.kubelet is running
Collecting registry mirror logs
Collecting service arguments
INFO: Copy service args to the final report tarball
Collecting k8s cluster-info
INFO: Copy k8s cluster-info dump to the final report tarball
Collecting SBOM
INFO: Copy SBOM to the final report tarball
Collecting system information
INFO: Copy uname to the final report tarball
INFO: Copy snap diagnostics to the final report tarball
INFO: Copy k8s diagnostics to the final report tarball
Collecting networking information
INFO: Copy network diagnostics to the final report tarball
Building the report tarball
SUCCESS: Report tarball is at /home/ubuntu/inspection-report.tar.gz
```

Use the report to verify that all the necessary services are running and to
examine every aspect of the system.
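
The report is a regular tarball, so you can extract it locally and browse the
collected logs, service arguments and diagnostics. A minimal sketch:

```
mkdir -p inspection-report
tar -xzf inspection-report.tar.gz -C inspection-report
ls -R inspection-report
```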

## Report a bug

If you cannot solve your issue and believe that the fault may lie in
{{product}}, please [file an issue on the project repository][].

Help us deal effectively with issues by including the report obtained from the
inspection script, any additional logs, and a summary of the issue.

You can check out the upstream [debug documentation][] for more details on
troubleshooting a Kubernetes cluster.

<!-- Links -->

[file an issue on the project repository]: https://github.com/canonical/cluster-api-k8s/issues/new/choose
[debug pods documentation]: https://kubernetes.io/docs/tasks/debug/debug-application/debug-pods
[debug documentation]: https://kubernetes.io/docs/tasks/debug
1 change: 1 addition & 0 deletions docs/src/capi/reference/index.md
@@ -15,6 +15,7 @@ annotations
Ports and services <ports-and-services>
Community <community>
configs
troubleshooting
```
