Kapp-controller did not find change in cronjob initcontainer #1637

Open
qeqar opened this issue Oct 24, 2024 · 13 comments

@qeqar

qeqar commented Oct 24, 2024

What steps did you take:
We install some CronJobs in remote clusters via the kapp-controller.

Our setup: oci-bundle -> App CR -> kapp-controller -> remote cluster

What happened:
We updated spec.jobTemplate.spec.template.spec.initContainers[0]: I added .securityContext.runAsUser.
But the controller did not find the change and reported no diffs.

For testing purposes we did the same with spec.jobTemplate.spec.template.spec.containers[0], which worked just fine.

What did you expect:
The CronJob gets updated.


Environment:
K8S 1.31.1
Kapp-controller: v0.53.1

kbld.k14s.io/images: |
  - origins:
    - local:
        path: /home/runner/work/kapp-controller/kapp-controller
    - git:
        dirty: true
        remoteURL: https://github.com/carvel-dev/kapp-controller
        sha: 00aa728d6823620c03e3f4917cd565119b17c7d2
        tags:
        - v0.53.1
    url: ghcr.io/carvel-dev/kapp-controller@sha256:da1ac76b07c0961ec0a1573615cb8c121fd0a4c443a0bb7f73780242d05161f0

This is the template used for the bundle:

#@ load("@ytt:data", "data")
#@ if data.values.k8s_version.startswith("v1.31."):
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup-restic
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  schedule: '0,30 * * * *'
  successfulJobsHistoryLimit: 0
  suspend: false
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            effect: NoSchedule
            operator: Exists
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
            operator: Exists
          restartPolicy: OnFailure
          volumes:
          - name: etcd-backup
            emptyDir: {}
          - name: host-pki
            hostPath:
              path: /etc/kubernetes/pki
          initContainers:
          - name: snapshoter
            image: #@ data.values.oci_registry_1 + "/bitnami/etcd:3.5.16"
            securityContext:
              runAsUser: 0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          containers:
          - name: uploader
            image: #@ data.values.oci_registry_2 + "/restic/restic:0.17.1"
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --group-by tag --keep-daily 3 --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: #@ "s3:" + str(data.values.s3_endpoint) + "/" + str(data.values.bucket_name)
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: s3-restic-credentials
                  key: restic_password
            - name: AWS_DEFAULT_REGION
              value: #@ str(data.values.default_region)
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-restic-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-restic-credentials
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
#@ end

@qeqar qeqar added the bug and carvel-triage labels Oct 24, 2024
@100mik
Contributor

100mik commented Oct 24, 2024

To narrow down the issue, what happens when you run something along the lines of ytt -f config | kbld -f - | kapp deploy -f - -a <app name>? Can you reproduce this using the CLIs as well?

@qeqar
Author

qeqar commented Oct 24, 2024

Tried it with kapp too, same thing.

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
Target cluster 'https://xxx:6443' (nodes: provision-test-me-provision-test-me-ix1-md-jwhhm-m78b9, 11+)
resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:27d447e33d5788dac3367ee170667ef6a2113f8bf8cfdf8b98308bce6d5894cc
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb

Changes

Namespace  Name  Kind  Age  Op  Op st.  Wait to  Rs  Ri  

Op:      0 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 0 noop

Succeeded

But no runAsUser in the CronJob.

@100mik
Contributor

100mik commented Oct 24, 2024

Definitely suspicious. I am going to try and mock this, but it looks like a rebase rule is at play.
I do not think we have default rebase rules that would do this. However, could you confirm that you do not have any additional rebase rules that are causing this?
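(For reference, a custom rebase rule that would mask this kind of change could look roughly like the sketch below; the path and matcher here are only illustrative, to show what to look for in any kapp Config shipped alongside the app.)

apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# copies the value already present on the cluster over the incoming one,
# which would hide any diff under initContainers
- path: [spec, jobTemplate, spec, template, spec, initContainers]
  type: copy
  sources: [existing]
  resourceMatchers:
  - apiVersionKindMatcher: {apiVersion: batch/v1, kind: CronJob}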

@mamachanko
Contributor

Definitely suspicious. I am going to try and mock this, but it looks like a rebase rule is at play. I do not think we have default rebase rules that would do this. However, could you confirm that you do not have any additional rebase rules that are causing this?

@100mik no custom rebase rules are involved according to @qeqar

@cppforlife
Contributor

@mamachanko is there any other section of custom kapp config that might be at play? Is it possible to share?

@qeqar
Author

qeqar commented Oct 25, 2024

I can add some more files, but I don't see a place where I changed the rules.

The App CR:

#@ load("@ytt:data", "data")
#@ if not data.values.backup_bucket_name == "myBucket":
---
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: gks-cluster-backup
  namespace: #@ str(data.values.cluster_namespace)
spec:
  paused: false
  cluster:
    kubeconfigSecretRef:
      name: #@ str(data.values.cluster_name) + "-kubeconfig"
      key: value
  fetch:
  - imgpkgBundle:
      image: #@ str(data.values.oci_bundle_registry) + "/gks/bundles/gks-cluster-backup-bundle:" + str(data.values.gks_cluster_backup_version)
      secretRef:
          name: artifactory
  template:
    - ytt:
        paths:
          - templates
          - schemas
          - values
        inline:
          paths:
            #@yaml/text-templated-strings
            config/inline.yaml: |
              #@data/values
              ---
              access_key_id: "(@= data.values.backup_access_key_id @)"
              secret_access_key: "(@= data.values.backup_secret_access_key @)"
              restic_password: "(@= data.values.backup_restic_password @)"
              default_region: "(@= data.values.backup_region @)"
              s3_endpoint: "(@= data.values.backup_s3_endpoint @)"
              bucket_name: "(@= data.values.backup_bucket_name @)"
        valuesFrom:
          - secretRef:
              name: #@ str(data.values.cluster_name) + "-kapp-val-k8s-version"
    - kbld:
        paths:
          - '-'
          - .imgpkg/images.yml
  deploy:
    - kapp:
        rawOptions: ["--diff-changes=true"]
#@ end

We don't have any special config for the kapp-controller; these are its container args:

      containers:
      - args:
        - -packaging-global-namespace=kapp-controller-packaging-global
        - -enable-api-priority-and-fairness=True
        - -tls-cipher-suites=

And that's all.

Tell me if I should look into any other special places.

@mamachanko
Contributor

Tried it with kapp too, same thing.

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
Target cluster 'https://xxx:6443' (nodes: provision-test-me-provision-test-me-ix1-md-jwhhm-m78b9, 11+)
resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:27d447e33d5788dac3367ee170667ef6a2113f8bf8cfdf8b98308bce6d5894cc
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb

Changes

Namespace  Name  Kind  Age  Op  Op st.  Wait to  Rs  Ri  

Op:      0 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 0 noop

Succeeded

But no runAsUser in the CronJob.

@qeqar this suggests that either, as you mentioned, kapp(-controller) is dropping the change, or the change is already applied. The latter might be possible, because kapp-controller may already have applied it.

Can you assert whether the respective CronJob on your live cluster is really missing spec.jobTemplate.spec.template.spec.initContainers[0].securityContext.runAsUser?

Use any tool of your choice, but this may lead you right to it, I think:

kapp inspect -a gks-cluster-backup.app --filter-kind CronJob --filter-name etcd-backup-restic --raw
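Or, roughly equivalent with kubectl against the remote cluster (just a sketch; the jsonpath pulls exactly the field in question):

kubectl -n kube-system get cronjob etcd-backup-restic \
  -o jsonpath='{.spec.jobTemplate.spec.template.spec.initContainers[0].securityContext}'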

@qeqar
Author

qeqar commented Oct 25, 2024

I looked more than once in the target cluster, and it only gets added when the CronJob resource is newly created.

@renuy renuy moved this to To Triage in Carvel Oct 25, 2024
@renuy renuy added the helping with an issue label and removed the bug and carvel-triage labels Oct 25, 2024
@praveenrewar
Member

@qeqar Would you be able to share the output of the following?

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f -

(Just trying to ensure that there's no kapp config that might have seeped into the bundle config.)

@qeqar
Author

qeqar commented Nov 11, 2024

@praveenrewar


ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f -

resolve | final: xxx/bitnami/etcd:3.5.16 -> xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
resolve | final: xxx/restic/restic:0.17.1 -> xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
---
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    kbld.k14s.io/images: |
      - origins:
        - resolved:
            tag: 3.5.16
            url: xxx/bitnami/etcd:3.5.16
        url: xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
      - origins:
        - resolved:
            tag: 0.17.1
            url: xxx/restic/restic:0.17.1
        url: xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
  name: etcd-backup-restic
  namespace: kube-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - command:
            - /bin/sh
            - -c
            - |-
              set -euf
              restic snapshots -q || restic init -q
              restic backup --tag=etcd --host=${ETCD_HOSTNAME} /backup
              restic forget --prune --group-by tag --keep-daily 3 --keep-last 48
            env:
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: RESTIC_REPOSITORY
              value: s3:/
            - name: RESTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: restic_password
                  name: s3-restic-credentials
            - name: AWS_DEFAULT_REGION
              value: ""
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: s3-restic-credentials
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: s3-restic-credentials
            image: xxx/restic/restic@sha256:424a4e1fcc6fe2557b5614239dc71a2c793acb33a83ea217171bd7edc1862dcb
            imagePullPolicy: IfNotPresent
            name: uploader
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
          dnsPolicy: ClusterFirstWithHostNet
          hostNetwork: true
          initContainers:
          - command:
            - /bin/sh
            - -c
            - |-
              set -euf
              mkdir -p /backup/pki/kubernetes
              mkdir -p /backup/pki/etcd
              cp -a /etc/kubernetes/pki/etcd/ca.crt /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/etcd/ca.key /backup/pki/etcd/
              cp -a /etc/kubernetes/pki/ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.crt /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/front-proxy-ca.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.key /backup/pki/kubernetes
              cp -a /etc/kubernetes/pki/sa.pub /backup/pki/kubernetes
              etcdctl snapshot save /backup/etcd-snapshot.db
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_DIAL_TIMEOUT
              value: 3s
            - name: ETCDCTL_CACERT
              value: /etc/kubernetes/pki/etcd/ca.crt
            - name: ETCDCTL_CERT
              value: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            - name: ETCDCTL_KEY
              value: /etc/kubernetes/pki/etcd/healthcheck-client.key
            - name: ETCD_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            image: xxx/bitnami/etcd@sha256:c1419aec942eae324576cc4ff6c7af20527c8b2e1d25d32144636d8b61dfd986
            imagePullPolicy: IfNotPresent
            name: snapshoter
            securityContext:
              runAsUser: 0
            volumeMounts:
            - mountPath: /backup
              name: etcd-backup
            - mountPath: /etc/kubernetes/pki
              name: host-pki
              readOnly: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          restartPolicy: OnFailure
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
            operator: Exists
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          volumes:
          - emptyDir: {}
            name: etcd-backup
          - hostPath:
              path: /etc/kubernetes/pki
            name: host-pki
  schedule: 0,30 * * * *
  successfulJobsHistoryLimit: 0
  suspend: false
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-restic-credentials
  namespace: kube-system
stringData:
  AWS_ACCESS_KEY_ID: ""
  AWS_SECRET_ACCESS_KEY: ""
  restic_password: ""
type: Opaque

Succeeded

@praveenrewar
Member

@qeqar Sorry, I missed the notification. Could you also share all the files present in the bundle? Are the two resources that you shared above the only ones in the bundle?

@qeqar
Author

qeqar commented Nov 18, 2024

@praveenrewar Yes, it will render only the CronJob and the Secret.

I have the schema, default.yaml, and bundle.yaml:

apiVersion: imgpkg.carvel.dev/v1alpha1
kind: Bundle
metadata:
  name: gks-cluster-backup-bundle
authors:
- name: GKS
  email: mail
websites:
- url: url

That's all.

And I use these two commands to create/upload the bundle:
ytt -f bundle/templates -f bundle/schemas -f bundle/values $(LOCAL_TEST_VALUES) --data-value k8s_version=v$$k8sver ; done | kbld --imgpkg-lock-output bundle/.imgpkg/images.yml -f -

imgpkg push -b ${REPO_URL}/gks/bundles/${BUNDLE_NAME}:v${RP_VERSION_SHORT} -f ./bundle --registry-password="${DEPLOY_PASSWORD}" --registry-username="${DEPLOY_USER}"

That is all!

@praveenrewar
Member

That is indeed a bit weird, because I am not able to reproduce the issue with a CronJob, and I can't think of any other way kapp rebase rules could have been passed in.
Could you try deleting the App and then deploying the resources directly using kapp? (i.e. run the following command twice, first without the securityContext and then with it)

ytt -f bundle/templates -f bundle/schemas -f bundle/values --data-value k8s_version=v1.31.1 | kbld -f - | kapp deploy --kubeconfig ~/xxx/kubeconfig -c -f - -a gks-cluster-backup.app --yes
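If that still shows no diff, a stripped-down repro outside the bundle might also help isolate it. Just a sketch with placeholder names (not from the original report): deploy the manifest below once as-is, then uncomment the securityContext and deploy again, and check whether kapp reports the change.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: runasuser-diff-test
  namespace: default
spec:
  schedule: '*/30 * * * *'
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          initContainers:
          - name: init
            image: busybox:1.36
            command: ["sh", "-c", "exit 0"]
            # uncomment on the second deploy:
            # securityContext:
            #   runAsUser: 0
          containers:
          - name: main
            image: busybox:1.36
            command: ["sh", "-c", "exit 0"]

For both runs, something like kapp deploy -a runasuser-diff-test --diff-changes -f cronjob.yaml --yes should do.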
