Adds first draft at a cluster debugging page #2736

Draft
wants to merge 1 commit into
base: main
38 changes: 38 additions & 0 deletions website/docs/cluster-management/debugging.mdx
@@ -0,0 +1,38 @@
---
title: Debugging Clusters
sidebar_position: 42
---

Let's work through some common steps to take when a cluster is not behaving as expected.

## Pull request did not appear in GitHub/GitLab
Contributor

Could GitHub/GitLab permissions also affect this?


Make sure you're looking at the correct repository

## Cluster does not appear in the UI after merging the PR

- Check that the path the cluster definition was merged to is being reconciled by Flux
Contributor

Can we provide instructions here?

- Check that there were no errors in the Kustomization resource that applied the cluster definition
Contributor

Can we provide an appropriate kubectl command for users to run?

- There may be a Kubernetes validation error, such as a bad namespace in the cluster definition
- The CAPI provider may not be installed
Contributor

How about we link to the documentation that explains how to install CAPI providers?

Contributor Author

Suggested change
- The CAPI provider may not be installed
- The CAPI provider may not be installed resulting in a missing CRD error

Contributor

How would they find this?

- Check that the template created a `GitopsCluster` resource; this is what the UI looks for.
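
The checks in this section could be scripted roughly as follows. This is only a sketch: it assumes Flux runs in the `flux-system` namespace, that Cluster API CRDs are registered under the `cluster.x-k8s.io` API group, and that the `GitopsCluster` CRD is exposed with the plural name `gitopsclusters`. The function names are hypothetical helpers, not part of any tool.

```shell
# List every Flux Kustomization with its Ready status and last message,
# so a failed apply of the cluster definition is visible.
check_kustomizations() {
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io --all-namespaces
}

# Show conditions and events for one Kustomization.
# $1 = Kustomization name, $2 = namespace (defaults to flux-system, an assumption).
describe_kustomization() {
  kubectl describe kustomizations.kustomize.toolkit.fluxcd.io "$1" --namespace "${2:-flux-system}"
}

# List installed CRDs that belong to Cluster API. An empty result suggests
# that no CAPI provider is installed, which would explain a missing CRD error.
list_capi_crds() {
  kubectl get crds --output name | grep -F 'cluster.x-k8s.io'
}

# Confirm that the template produced a GitopsCluster resource.
list_gitops_clusters() {
  kubectl get gitopsclusters --all-namespaces
}
```

For example, `describe_kustomization clusters` would describe a Kustomization named `clusters` (a made-up name) in `flux-system`; validation errors such as a bad namespace normally show up in its status message.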

## Cluster does not transition to ready

- Check the logs of the CAPI controllers; they may be failing to create the cluster
Contributor

Let's include an appropriate kubectl command to get these logs?

- Some providers, like CAPD, can be quite sensitive to your Docker state
  - Make sure you don't have a lot of other old clusters running.
Contributor

How about we provide instructions?

- Try a different cluster name; some old resources may not have been cleaned up
Contributor

How can they check this?

- No CNI may have been installed on your clusters. Make sure a ClusterResourceSet is configured to do this.
Contributor

We can link to the documentation for this?
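
As a rough illustration of the checks in this section, the helpers below wrap the relevant `kubectl` and `docker` calls. They are a sketch under stated assumptions: the `capi-system` namespace and `capi-controller-manager` deployment are the names a default `clusterctl init` usually creates and may differ in your setup, and the Docker check only applies to CAPD. The function names are hypothetical.

```shell
# Print recent logs from a CAPI controller.
# $1 = namespace, $2 = deployment; both default to the core CAPI controller
# as installed by clusterctl (an assumption).
capi_controller_logs() {
  kubectl logs --namespace "${1:-capi-system}" "deployment/${2:-capi-controller-manager}" --all-containers --tail=100
}

# CAPD only: list Docker containers (running or stopped) whose name contains
# the given cluster name, to spot leftovers from an earlier cluster.
leftover_capd_containers() {
  docker ps --all --filter "name=$1" --format '{{.Names}}'
}

# Check that a ClusterResourceSet exists that could install a CNI on new clusters.
list_cluster_resource_sets() {
  kubectl get clusterresourcesets --all-namespaces
}
```

To read the logs of a provider instead of the core controller, pass its namespace and deployment, for example `capi_controller_logs capd-system capd-controller-manager` for CAPD, again assuming the default names.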


## Cluster resources do not appear in Applications/sources

- Bootstrapping may have failed
- No ClusterBootstrapConfiguration may be loaded into the cluster
- Check the GitHub repo to see if Flux has made a commit to bootstrap the new cluster
- Check the logs of the pods of the bootstrap job. They are named `default/run-gitops-${cluster-name}`. Flux may have failed to clone the repo.

The bootstrap jobs are created in the same namespace as the CAPI cluster.

- Check that a `GITHUB_TOKEN` is available to Flux
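
A minimal sketch of how the bootstrap job could be inspected. It relies on the `run-gitops-${cluster-name}` naming pattern mentioned in this section and takes the namespace as a parameter, defaulting to `default`; the function names are hypothetical.

```shell
# List jobs in the namespace where the bootstrap job is expected.
# $1 = namespace (defaults to "default").
list_bootstrap_jobs() {
  kubectl get jobs --namespace "${1:-default}"
}

# Print the logs of the bootstrap job for one cluster.
# $1 = cluster name, $2 = namespace (defaults to "default").
bootstrap_job_logs() {
  kubectl logs --namespace "${2:-default}" "job/run-gitops-$1"
}
```

For a cluster called `demo` (a made-up name) in the `default` namespace, `bootstrap_job_logs demo` prints the job's pod logs, where a failed repository clone would be reported.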

## x509: certificate signed by unknown authority error on Applications/Sources page

- You may have an old load balancer from a previous cluster. Delete it and recreate the cluster
Contributor

"recreate the cluster" is a pretty harsh fix?