
Enabling Monitoring in airgapped environment #25

Open
valentin-nasta opened this issue Oct 22, 2024 · 12 comments

Comments

@valentin-nasta
Contributor

I was looking into enabling the Monitoring application in an airgapped environment according to this documentation:
https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/monitoring-alerting-guides/enable-monitoring

Is this a separate Helm chart, or is it part of the existing stack?
How would this setup look when integrated inside the rke_airgap_install script?
Any tips or guidance on configuring this in an airgapped environment would be greatly appreciated.

Thank you!

@valentin-nasta changed the title from "Enabling Monitoring in airgapped anvironment" to "Enabling Monitoring in airgapped environment" on Oct 22, 2024
@clemenko
Owner

By default or script? I think the images may already be there. I need to test a cluster tonight/tomorrow. Actually, it should be fairly easy to add to the script.

@clemenko
Owner

So, good news: all the images are included already. I was able to go into Rancher, use the catalog for Monitoring, and everything worked.

In https://github.com/clemenko/rke_airgap_install/blob/main/hauler_all_the_things.sh#L489, --set useBundledSystemChart=true tells Rancher to use the bundled charts locally. And since all the images are already stored in Hauler, everything works.
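
For reference, the Rancher install with that flag looks roughly like this (a sketch; the hostname and registry address below are placeholders, the script has the exact values):

helm upgrade -i rancher rancher-latest/rancher \
  --namespace cattle-system --create-namespace \
  --set hostname=rancher.example.com \
  --set systemDefaultRegistry=192.168.100.107:5000 \
  --set useBundledSystemChart=true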

Is there something more that you are looking for?

@valentin-nasta
Contributor Author

valentin-nasta commented Oct 23, 2024

Thank you for the quick reply.

By default or script?

By default would be nice, if there is some kind of Rancher-side activation of monitoring similar to the govmessage. Otherwise, adding it to the script would also work fine. The scenario is to have the system already prepared and delivered to the customer, without needing to fiddle with the setup afterward.

I also discovered which Helm chart is actually being used by inspecting the UI (rancher-monitoring-103.1.1-up45.31.1.tgz). Initially, I thought it was this one: kube-prometheus-stack.

I tried installing it "manually," but it fails. Do you have any idea why this might happen?

helm upgrade --install=true --namespace=cattle-monitoring-system --timeout=10m0s --values=/home/shell/helm/values-rancher-monitoring-103.1.1-up45.31.1.yaml --version=103.1.1+up45.31.1 --wait=true rancher-monitoring /home/shell/helm/rancher-monitoring-103.1.1-up45.31.1.tgz
Release "rancher-monitoring" does not exist. Installing it now.
Starting delete for "rancher-monitoring-admission" ServiceAccount
Ignoring delete failure for "rancher-monitoring-admission" /v1, Kind=ServiceAccount: serviceaccounts "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" ClusterRole
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" ClusterRoleBinding
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" Role
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=Role: roles.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" RoleBinding
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=RoleBinding: rolebindings.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission-create" Job
Ignoring delete failure for "rancher-monitoring-admission-create" batch/v1, Kind=Job: jobs.batch "rancher-monitoring-admission-create" not found
creating 1 resource(s)
Watching for changes to Job rancher-monitoring-admission-create with timeout of 10m0s
Add/Modify event for rancher-monitoring-admission-create: ADDED
rancher-monitoring-admission-create: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
Add/Modify event for rancher-monitoring-admission-create: MODIFIED
rancher-monitoring-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed pre-install: 1 error occurred:
        * timed out waiting for the condition

full log:
helm-operation-v68w4_undefined.log

@valentin-nasta
Contributor Author

I think I found the root cause of the error:

kubectl -n cattle-monitoring-system get job rancher-monitoring-admission-create
NAME                                  COMPLETIONS   DURATION   AGE
rancher-monitoring-admission-create   0/1           93m        93m

kubectl -n cattle-monitoring-system get pod --selector=job-name=rancher-monitoring-admission-create
NAME                                        READY   STATUS             RESTARTS   AGE
rancher-monitoring-admission-create-snvlv   0/1     ImagePullBackOff   0          91m

kubectl -n cattle-monitoring-system get pod --selector=job-name=rancher-monitoring-admission-create -oyaml | grep image
      image: 192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
      imagePullPolicy: IfNotPresent
    - image: 192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
      imageID: ""
          message: Back-off pulling image "192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6"
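
One way to confirm whether the registry actually has that tag is to query it directly (assuming the registry serves the standard Docker Registry v2 API over plain HTTP):

curl -s http://192.168.100.107:5000/v2/rancher/mirrored-ingress-nginx-kube-webhook-certgen/tags/list

That lists every tag stored for the image, so a missing or differently named tag shows up immediately.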

@clemenko
Owner

If you want the upstream https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack chart, you will have to add the images yourself.
Right now, out of the box, all the images you need are there for the Rancher Monitoring app. You can deploy Rancher and install it from the catalog. I will look into adding it from a curl shortly.
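
If you did want to go that route, the rough shape with Hauler would be something like this (a sketch; the image below is just one example, you would need to enumerate every image the chart references, e.g. by grepping the output of helm template):

hauler store add chart kube-prometheus-stack --repo https://prometheus-community.github.io/helm-charts
hauler store add image quay.io/prometheus/prometheus:v2.48.0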

@clemenko
Owner

After looking into this, there is no easy way to do it. The charts Rancher uses are baked in, and the chart versions are hard-coded. The simplest way is to deploy it through the GUI.

@valentin-nasta
Contributor Author

Thank you for taking a look at it!
Even using the GUI, it fell short with the error from the previous comment.
I need to troubleshoot it and make sure the images are loaded beforehand.

@clemenko
Owner

I was not able to reproduce the error. Did you deploy Rancher with the script?

@valentin-nasta
Contributor Author

Yes, I deployed Rancher with the script, with these versions:

export RKE_VERSION=1.28.12
export CERT_VERSION=v1.15.3
export RANCHER_VERSION=v2.8.5
export LONGHORN_VERSION=v1.7.0
export NEU_VERSION=2.7.7

I think I am getting closer; there is a version mismatch somewhere:

hauler store info | grep mirrored-ingress-nginx-kube

| rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20230312-helm-chart-4.5.2-28-g66a760794 | image | linux/amd64 |        2 | 20.1 MB  |

vs

message: Back-off pulling image "192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6"
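
If that is the mismatch, one stopgap until the script is fixed would be to add the missing tag to the Hauler store and push it to the registry (a sketch, assuming a connected machine to pull on, with the store then synced across the airgap):

hauler store add image rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
hauler store copy registry://192.168.100.107:5000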

@clemenko
Owner

I think I know what is going on. Updating the script now for it.

clemenko added a commit that referenced this issue on Oct 24, 2024
@clemenko
Owner

Take a look at 64054ec

@clemenko
Owner

@valentin-nasta did you get a chance to look at this?
