Split tutorial for CAPI, add capi troubleshooting pages

Showing 5 changed files with 364 additions and 80 deletions.
# Provisioning a {{product}} cluster with CAPI

This guide covers how to deploy a {{product}} multi-node cluster
using Cluster API (CAPI).

## Prerequisites

This guide assumes the following:

- A CAPI management cluster initialised with the infrastructure, bootstrap and
  control plane providers of your choice. Please refer to the
  [getting-started guide] for instructions.

## Generate a cluster spec manifest

You can generate a cluster manifest for a selected set of commonly used
infrastructures via templates provided by the {{product}} team.
Ensure you have initialised the desired infrastructure provider and fetch
the {{product}} provider repository:

```
git clone https://github.com/canonical/cluster-api-k8s
```

Review the list of variables needed for the cluster template:

```
cd cluster-api-k8s
export CLUSTER_NAME=yourk8scluster
clusterctl generate cluster ${CLUSTER_NAME} --from ./templates/<infrastructure-provider>/cluster-template.yaml --list-variables
```
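
The exact variables depend on the infrastructure provider you chose. As an
illustrative sketch only (your list will differ), the output groups required
and optional variables along these lines:

```
Required Variables:
  - CLUSTER_NAME
  - CONTROL_PLANE_MACHINE_COUNT
  - KUBERNETES_VERSION
  - WORKER_MACHINE_COUNT

Optional Variables: []
```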

Set the respective environment variables by editing the rc file as needed
before sourcing it. Then generate the cluster manifest:

```
source ./templates/<infrastructure-provider>/template-variables.rc
clusterctl generate cluster ${CLUSTER_NAME} --from ./templates/<infrastructure-provider>/cluster-template.yaml > cluster.yaml
```
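
Alternatively, you can export the variables directly in your shell before
running `clusterctl generate`. A minimal sketch, assuming the standard CAPI
variable names from the listing above:

```
# Illustrative values; use the variables reported by --list-variables
export KUBERNETES_VERSION=v1.32.1
export CONTROL_PLANE_MACHINE_COUNT=1
export WORKER_MACHINE_COUNT=1
```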

Each provisioned node is associated with a `CK8sConfig`, through which you can
set the cluster's properties. Available configuration fields can be listed in
detail with:

```
sudo k8s kubectl explain CK8sConfig.spec
```
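
To print the full field tree in one pass, `kubectl explain` also accepts the
standard `--recursive` flag:

```
sudo k8s kubectl explain CK8sConfig.spec --recursive
```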

Review the available options in the respective definitions file and edit the
cluster manifest (`cluster.yaml` above) to match your needs.

## Deploy the cluster

To deploy the cluster, run:

```
sudo k8s kubectl apply -f cluster.yaml
```

For an overview of the cluster status, run:

```
clusterctl describe cluster ${CLUSTER_NAME}
```
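
This prints a tree of the cluster's objects and their readiness. The output
below is an illustrative sketch only; the names, infrastructure objects and
timings will match your own cluster:

```
NAME                                                            READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/yourk8scluster                                          True                     10m
├─ClusterInfrastructure - <infrastructure-specific object>
├─ControlPlane - CK8sControlPlane/yourk8scluster-control-plane  True                     10m
└─Workers
  └─MachineDeployment/yourk8scluster-worker-md-0                True                     9m
```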

To get the list of provisioned clusters:

```
sudo k8s kubectl get clusters
```

To see the deployed machines:

```
sudo k8s kubectl get machine
```

After the first control plane node is provisioned, you can get the kubeconfig
of the workload cluster:

```
clusterctl get kubeconfig ${CLUSTER_NAME} > ./${CLUSTER_NAME}-kubeconfig
```

You can then see the workload nodes using:

```
kubectl --kubeconfig ./${CLUSTER_NAME}-kubeconfig get node
```

## Delete the cluster

To delete a cluster, run:

```
sudo k8s kubectl delete cluster ${CLUSTER_NAME}
```
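
Deletion waits for the cluster's machines to be torn down first, which can
take a few minutes. You can watch progress with kubectl's standard `-w`
(watch) flag:

```
sudo k8s kubectl get clusters -w
```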

<!-- LINKS -->

[getting-started guide]: ../tutorial/getting-started

# How to troubleshoot {{product}}

Identifying issues in a Kubernetes cluster can be difficult, especially for
new users. With {{product}} we aim to make deploying and managing your cluster
as easy as possible. This how-to guide walks you through the steps to
troubleshoot your {{product}} cluster.

## Check the cluster status

Verify that the cluster status is ready by running:

```
sudo k8s kubectl get cluster,ck8scontrolplane,machinedeployment,machine
```

You should see a command output similar to the following:

```
NAME                                  CLUSTERCLASS   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/my-cluster                  Provisioned   16m

NAME                                                                      INITIALIZED   API SERVER AVAILABLE   VERSION   REPLICAS   READY   UPDATED   UNAVAILABLE
ck8scontrolplane.controlplane.cluster.x-k8s.io/my-cluster-control-plane   true          true                   v1.32.1   1          1       1

NAME                                                        CLUSTER      REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
machinedeployment.cluster.x-k8s.io/my-cluster-worker-md-0   my-cluster   1          1       1         0             Running   16m   v1.32.1

NAME                                                          CLUSTER      NODENAME                                           PROVIDERID      PHASE     AGE   VERSION
machine.cluster.x-k8s.io/my-cluster-control-plane-j7w6m       my-cluster   my-cluster-cp-my-cluster-control-plane-j7w6m       <provider-id>   Running   16m   v1.32.1
machine.cluster.x-k8s.io/my-cluster-worker-md-0-8zlzv-7vff7   my-cluster   my-cluster-wn-my-cluster-worker-md-0-8zlzv-7vff7   <provider-id>   Running   80s   v1.32.1
```
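
For a more detailed, condition-level view of the same objects, `clusterctl`
can expand conditions via its standard `--show-conditions` flag:

```
clusterctl describe cluster my-cluster --show-conditions all
```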

## Check providers status

{{product}} cluster provisioning failures can originate in any of the
providers used by CAPI.

Check the {{product}} bootstrap provider logs:

```
k8s kubectl logs -n cabpck-system deployment/cabpck-bootstrap-controller-manager
```

Examine the {{product}} control plane provider logs:

```
k8s kubectl logs -n cacpck-system deployment/cacpck-controller-manager
```

Review the CAPI controller logs:

```
k8s kubectl logs -n capi-system deployment/capi-controller-manager
```

Check the logs for the infrastructure provider by running:

```
k8s kubectl logs -n <infrastructure-provider-namespace> <infrastructure-provider-deployment>
```
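
If you are unsure of the infrastructure provider's namespace or deployment
name, you can list the controller deployments first. A sketch (the `cap`
pattern matches the usual CAPI provider naming convention):

```
k8s kubectl get deployments -A | grep -i cap
```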

## Test the API server health

Fetch the kubeconfig file for a {{product}} cluster provisioned through CAPI
by running:

```
clusterctl get kubeconfig ${CLUSTER_NAME} > ./${CLUSTER_NAME}-kubeconfig.yaml
```

Verify that the API server is healthy and reachable by running:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get all
```

This command lists resources that exist under the default namespace. If the
API server is healthy, you should see a command output similar to the
following:

```
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   29m
```

A typical error message may look like this if the API server cannot be
reached:

```
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
```

A failure can mean that:

* The API server is not reachable due to network issues or firewall limitations
* The API server on the particular node is unhealthy
* All control plane nodes are down

## Check the cluster nodes' health

Confirm that the nodes in the cluster are healthy by looking for the `Ready`
status:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get nodes
```

You should see a command output similar to the following:

```
NAME                                               STATUS   ROLES                  AGE     VERSION
my-cluster-cp-my-cluster-control-plane-j7w6m       Ready    control-plane,worker   17m     v1.32.1
my-cluster-wn-my-cluster-worker-md-0-8zlzv-7vff7   Ready    worker                 2m14s   v1.32.1
```

## Troubleshoot an unhealthy node

Every healthy {{product}} node has certain services up and running. The
required services depend on the type of node.

Services running on both the control plane and worker nodes:

* `k8sd`
* `kubelet`
* `containerd`
* `kube-proxy`

Services running only on the control plane nodes:

* `kube-apiserver`
* `kube-controller-manager`
* `kube-scheduler`
* `k8s-dqlite`

Services running only on the worker nodes:

* `k8s-apiserver-proxy`

Make the necessary adjustments for SSH access depending on your
infrastructure provider and SSH into the unhealthy node with:

```
ssh <user>@<node>
```

Check the status of the services on the failing node by running:

```
sudo systemctl status snap.k8s.<service>
```

Check the logs of a failing service by executing:

```
sudo journalctl -xe -u snap.k8s.<service>
```

If the issue indicates a problem with the configuration of the services on
the node, examine the arguments used to run these services.

The arguments of a service on the failing node can be examined by reading the
file located at `/var/snap/k8s/common/args/<service>`.
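
For example, to check the kubelet (an illustrative choice from the service
list above) and inspect its arguments:

```
sudo systemctl status snap.k8s.kubelet
cat /var/snap/k8s/common/args/kubelet
```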

## Investigate system pods' health

Check whether all of the cluster's pods are `Running` and `Ready`:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml get pods -n kube-system
```

The pods in the `kube-system` namespace belong to {{product}} features such
as `network`. Unhealthy pods could be related to configuration issues or
nodes not meeting certain requirements.

## Troubleshoot a failing pod

Look at the events on a failing pod by running:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml describe pod <pod-name> -n <namespace>
```

Check the logs on a failing pod by executing:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml logs <pod-name> -n <namespace>
```
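
If the pod has restarted, the logs of the previous container instance are
often more telling; `kubectl logs` exposes them via the standard `--previous`
flag:

```
kubectl --kubeconfig ${CLUSTER_NAME}-kubeconfig.yaml logs <pod-name> -n <namespace> --previous
```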

You can check out the upstream [debug pods documentation][] for more
information.

## Use the built-in inspection script

{{product}} ships with a script to compile a complete report on {{product}}
and its underlying system. This is an essential tool for bug reports and for
investigating whether a system is (or isn't) working.

The inspection script can be executed on a specific node by running the
following commands:

```
ssh -t <user>@<node> -- sudo k8s inspect /home/<user>/inspection-report.tar.gz
scp <user>@<node>:/home/<user>/inspection-report.tar.gz ./
```

The command output is similar to the following:

```
Collecting service information
Running inspection on a control-plane node
INFO: Service k8s.containerd is running
INFO: Service k8s.kube-proxy is running
INFO: Service k8s.k8s-dqlite is running
INFO: Service k8s.k8sd is running
INFO: Service k8s.kube-apiserver is running
INFO: Service k8s.kube-controller-manager is running
INFO: Service k8s.kube-scheduler is running
INFO: Service k8s.kubelet is running
Collecting registry mirror logs
Collecting service arguments
INFO: Copy service args to the final report tarball
Collecting k8s cluster-info
INFO: Copy k8s cluster-info dump to the final report tarball
Collecting SBOM
INFO: Copy SBOM to the final report tarball
Collecting system information
INFO: Copy uname to the final report tarball
INFO: Copy snap diagnostics to the final report tarball
INFO: Copy k8s diagnostics to the final report tarball
Collecting networking information
INFO: Copy network diagnostics to the final report tarball
Building the report tarball
SUCCESS: Report tarball is at /home/ubuntu/inspection-report.tar.gz
```

Use the report to ensure that all necessary services are running and dive
into every aspect of the system.

## Report a bug

If you cannot solve your issue and believe that the fault may lie in
{{product}}, please [file an issue on the project repository][].

Help us deal effectively with issues by including the report obtained from
the inspection script, any additional logs, and a summary of the issue.

You can check out the upstream [debug documentation][] for more details on
troubleshooting a Kubernetes cluster.

<!-- Links -->

[file an issue on the project repository]: https://github.com/canonical/cluster-api-k8s/issues/new/choose
[debug pods documentation]: https://kubernetes.io/docs/tasks/debug/debug-application/debug-pods
[debug documentation]: https://kubernetes.io/docs/tasks/debug