
Support ImagePullSecret for pulling from private registry #584

Merged
merged 2 commits into redhat-cop:main from gniltaws:imagePullSecret
Jan 17, 2025

Conversation

gniltaws
Contributor

What is this PR About?

Adds the ability to specify an image pull secret for pulling a custom image from a private container registry

How do we test this?

It should run as-is if pulling the default image, since it only adds the secret if installPlanApproverAndVerifyJobsImagePullSecret is specified.

Testing that the secret works requires the following (a minimal sketch follows this list):

  • Update values.yaml installPlanApproverAndVerifyJobsImage to point to an image in a private registry
  • Create an image pull secret with credentials for the private registry
  • Update values.yaml installPlanApproverAndVerifyJobsImagePullSecret with that secret's name
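
A minimal sketch of those steps, assuming a hypothetical secret name my-pull-secret and a hypothetical registry registry.example.com:

# create the pull secret in the namespace the chart installs into (all names are placeholders)
kubectl create secret docker-registry my-pull-secret \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<token> \
  --namespace=<target-namespace>

# values.yaml (placeholder image reference)
installPlanApproverAndVerifyJobsImage: registry.example.com/<org>/<image>:<tag>
installPlanApproverAndVerifyJobsImagePullSecret: my-pull-secret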

cc: @redhat-cop/day-in-the-life

@itewk
Contributor

itewk commented Nov 21, 2024

@gniltaws would you be willing to add a test to https://github.com/redhat-cop/helm-charts/tree/main/charts/operators-installer/_integration-tests as part of this?

I am working to get permissions so I can approve the workflow run, but you can run the tests locally too.

@gniltaws
Contributor Author

@itewk The default image pulls from the private registry registry.redhat.io, so I think the only thing that's needed to test this update is to add a pull secret to the namespace and put its name in the values file.

Does this repository have a way to handle secret data (like a token for registry.redhat.io)? If not, the rest of this comment is moot.

The simplest way I can see to test this is to modify the existing _integration-tests/test-install-operator-1-automatic-intermediate-manual-upgrades-values.yaml that does the upgrade to use the default image from the private registry and add an installPlanApproverAndVerifyJobsImagePullSecret. That would test the pull secret and the update functionality at the same time. I'm new to automated testing like this, so I'm not sure if testing multiple things in the same step is a good idea.

@garethahealy
Contributor

@itewk The default image pulls from the private registry registry.redhat.io, so I think the only thing that's needed to test this update is to add a pull secret to the namespace and put its name in the values file.

Does this repository have a way to handle secret data (like a token for registry.redhat.io)? If not, the rest of this comment is moot.

No, we don't currently have a secret for registry.redhat.io, but if you build a test in your fork, I'll get a token created in this repo.

@garethahealy
Contributor

garethahealy commented Nov 22, 2024

echo "${{ secrets.REDHAT_REGISTRY_TOKEN }}" | podman login registry.redhat.io --username "6340056|redhat-cop-helm-charts" --password-stdin
Login Succeeded!

If you want it as a k8s Secret, let me know and I'll create another secret for that.

@itewk
Contributor

itewk commented Nov 22, 2024

@garethahealy @gniltaws Rather than putting a token in this repo which gives access to the Red Hat private registry, which puts us at risk of violating some rules about its use... why not create a quay.io repo and add a pull secret to that?

@garethahealy
Contributor

@garethahealy @gniltaws Rather than putting a token in this repo which gives access to the Red Hat private registry, which puts us at risk of violating some rules about its use... why not create a quay.io repo and add a pull secret to that?

I don't see the difference. We are not redistributing any of the redhat.io images, which would be a concern for the RH usage agreement. All this is doing is pulling and testing the image, and once the test is finished, throwing away the environment.

Creating a mirror (which I think is what you are suggesting) makes the situation worse, as we would then actually be redistributing the image.

@itewk
Contributor

itewk commented Nov 22, 2024

Creating a mirror (which I think is what you are suggesting) makes the situation worse, as we would then actually be redistributing the image.

I wouldn't mirror one of the redhat.io images, but rather have this test run against some other image that is already publicly available.

@garethahealy
Contributor

garethahealy commented Nov 22, 2024

I wouldn't mirror one of the redhat.io images, but rather have this test run against some other image that is already publicly available.

I see a few issues with that:

  • The chart is packaged and released with a custom image ref
  • Is the custom image maintained, and by whom?
  • If customers or RH services were to use this chart, task one is to replace that reference with the redhat.io image

Feels like that's causing more issues than the concern of pulling from the redhat.io registry...

@itewk
Contributor

itewk commented Nov 22, 2024

The chart wouldn't be packaged with a custom image. Only the test would be set to reference the custom image/repo, which the tests actually already do, for precisely this reason, so the tests can run without access to redhat.io (https://github.com/redhat-cop/helm-charts/blob/main/charts/operators-installer/_integration-tests/test-install-operator-0-automatic-intermediate-manual-upgrades-values.yaml#L3)

Really, we just need the test to set the secret; we don't need to verify the secret actually works. You could copy that exact test above and just add the parameter to set the secret.

But anyway, if you all want to have the pull secret for redhat.io here, okay.

@itewk itewk added the enhancement New feature or request label Nov 22, 2024
@itewk itewk self-requested a review November 22, 2024 17:56
@garethahealy
Contributor

Linting is failing due to:

charts/operators-installer/values.yaml
  Error: 16:77 [trailing-spaces] trailing spaces

@gniltaws
Contributor Author

I don't understand why the secret creation is failing in the integration test. Am I specifying the secret incorrectly? @garethahealy

error: either --from-file or the combination of --docker-username, --docker-password and --docker-server is required
$ grep -e 'docker-username' -e 'docker-password' -e 'docker-server' .github/workflows/install-integration-tests-operators-installer.yaml 
          --docker-password="${{ secrets.REDHAT_REGISTRY_TOKEN }}" \
          --docker-server=registry.redhat.io \
          --docker-username="6340056|redhat-cop-helm-charts"
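
For context, the workflow step is presumably along these lines (reconstructed from the grep output above; the secret name here is a guess). If a fork PR cannot read secrets.REDHAT_REGISTRY_TOKEN, --docker-password expands to an empty string, which kubectl then rejects with exactly that error:

        # sketch of the workflow step; "redhat-registry" is a hypothetical secret name
        kubectl create secret docker-registry redhat-registry \
          --docker-password="${{ secrets.REDHAT_REGISTRY_TOKEN }}" \
          --docker-server=registry.redhat.io \
          --docker-username="6340056|redhat-cop-helm-charts"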

@garethahealy
Contributor

It'll be because your fork/pull request can't access the secret, so this test would only work on the main branch.

@itewk
Contributor

itewk commented Dec 2, 2024

It'll be because your fork/pull request can't access the secret, so this test would only work on the main branch.

It doesn't have to be on the main branch; it just has to be a PR from a branch on this repo rather than from a fork of this repo.

So, options:

  1. Update the test to pull from a dummy quay.io repo with a dummy read-only robot account, where we can just put the access token right in the test rather than using GitHub secrets
  2. Take the branch for this PR, put it on this repo, and re-do the PR from that branch
  3. YOLO it and assume it will pass when we merge, and if it doesn't, do another merge

Do you all have any preferences?

I am fine with any of them.

@gniltaws
Contributor Author

gniltaws commented Dec 3, 2024

I think strategy number 2 is the best strategy, but I don't want to make more work for you.

I'm a little nervous about option 3 (YOLO strategy) working the first time, but I think it would only need minor changes, if anything. If it goes poorly, issue a bugfix and remove the tag for the bad version?

Thoughts? @garethahealy @itewk

@gniltaws gniltaws requested a review from garethahealy December 6, 2024 16:49
@sabre1041
Contributor

The reliance on content stored in GitHub secrets presents a challenge, not only for the management of credentials but also for what can be tested within pull requests.

Regardless of the approach, I would not maintain an actual, valid credential to any registry. With that in mind, here are two approaches I might suggest:

  1. Include credentials for a dummy repository. The image pull secret will be added, but will not actually get used.
  2. Spin up a registry:2 registry in the kind cluster, secure it with htpasswd, and then copy the publicly available cli image to it. The credentials used to secure it do not need to be stored as a GitHub secret since the registry only contains an already publicly available image.

@gniltaws
Contributor Author

gniltaws commented Dec 9, 2024

@sabre1041 I like this idea. Can you point me to a doc on what you mean by registry:2 and/or how to set it up?

  1. Spin up a registry:2 registry in the kind cluster, secure it with htpasswd, and then copy the publicly available cli image to it. The credentials used to secure it do not need to be stored as a GitHub secret since the registry only contains an already publicly available image.

@itewk
Contributor

itewk commented Dec 10, 2024

@sabre1041 I like this idea. Can you point me to a doc on what you mean by registry:2 and/or how to set it up?

We actually already have a kind cluster installed as part of the CI: https://github.com/redhat-cop/helm-charts/blob/main/.github/workflows/install-unit-test.yaml#L57-L61

So in theory you just need to add a step to install the registry:2 container image (it comes from Docker Hub so you can just use the short name) and then do the htpasswd configuration for it, which I don't know how to do, but maybe something from this doc helps: https://distribution.github.io/distribution/about/deploying/
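
For reference, the flow in that doc is roughly the following (a sketch; the user/password are placeholders and htpasswd comes from apache2-utils/httpd-tools):

mkdir -p auth
# registry:2 only accepts bcrypt-hashed htpasswd entries, hence -B
htpasswd -Bbn testuser testpassword > auth/htpasswd
podman run -d --name registry -p 5000:5000 \
  -v "$(pwd)/auth:/auth" \
  -e REGISTRY_AUTH=htpasswd \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  registry:2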

@sabre1041
Contributor

@sabre1041 I like this idea. Can you point me to a doc on what you mean by registry:2 and/or how to set it up?

We actually already have a kind cluster installed as part of the CI: https://github.com/redhat-cop/helm-charts/blob/main/.github/workflows/install-unit-test.yaml#L57-L61

So in theory you just need to add a step to install the registry:2 container image (it comes from Docker Hub so you can just use the short name) and then do the htpasswd configuration for it, which I don't know how to do, but maybe something from this doc helps: https://distribution.github.io/distribution/about/deploying/

I have a deployment (OCP based, but I'm sure you can tweak it for KIND) that uses htpasswd. @gniltaws @itewk feel free to hit me up on Wednesday and I'll share.

@sabre1041
Contributor

@itewk @gniltaws I spent a little bit of time and tested the following in a KIND cluster

Namespace

apiVersion: v1
kind: Namespace
metadata:
  name: registry

PersistentVolumeClaim

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: registry
  namespace: registry
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi

Service

apiVersion: v1
kind: Service
metadata:
  labels:
    app: registry
  name: registry
  namespace: registry
spec:
  ports:
    - name: http
      port: 5000
      protocol: TCP
      targetPort: 5000
  selector:
    app: registry
  sessionAffinity: None
  type: ClusterIP

Secret
Add an htpasswd file to a Secret (one way to generate it is sketched after the manifest)

apiVersion: v1
data:
  htpasswd: <base64_htpasswd>
kind: Secret
metadata:
  name: htpasswd
  namespace: registry
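
One way to produce that Secret without hand-encoding the htpasswd value, as a sketch (the user and password are placeholders):

# generate a bcrypt htpasswd entry and create the Secret from the file
htpasswd -Bbn registry-user registry-password > htpasswd
kubectl create secret generic htpasswd --from-file=htpasswd=./htpasswd -n registry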

Ingress
May not be needed if you want to use internal Kubernetes services

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: registry
  namespace: registry
spec:
  rules:
  - host: registry.<ip_address>.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: registry
            port:
              number: 5000

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
  namespace: registry
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: registry
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
        - env:
            - name: REGISTRY_HTTP_ADDR
              value: :5000
            - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
              value: /var/lib/registry
            - name: REGISTRY_AUTH
              value: htpasswd
            - name: REGISTRY_AUTH_HTPASSWD_REALM
              value: Registry Realm
            - name: REGISTRY_AUTH_HTPASSWD_PATH
              value: /auth/htpasswd
          image: registry:2
          imagePullPolicy: IfNotPresent
          name: registry
          ports:
            - containerPort: 5000
              name: http
              protocol: TCP
          resources:
            limits:
              cpu: 500m
              memory: 128Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /var/lib/registry
              name: registry
            - mountPath: /auth
              name: htpasswd
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - name: registry
          persistentVolumeClaim:
            claimName: registry
        - name: htpasswd
          secret:
            defaultMode: 420
            secretName: htpasswd

@itewk
Contributor

itewk commented Dec 12, 2024

@sabre1041 you are 1 in a billion.

@gniltaws can you incorporate that config into the CI testing pipeline as part of this change with your test?

@sabre1041 I am going to be out for the next 4 weeks, so if this all gets sorted, please merge it; my only other comment is to bump the version to 3.1.0 since this is a non-breaking feature add.

@gniltaws
Contributor Author

First, a big thank-you to @sabre1041! That saved me so much trial and error! Thanks also to @itewk for the guidance.

I added my work-in-progress so I could ask a few questions on tooling:

  1. Is skopeo available in the container where the integration testing happens? If not, what's the preferred tool for copying the image into the private repo?
  2. I got as far as I could without adding an Ingress (I was trying to avoid it since I wasn't sure what should go in place of "registry.<ip_address>.nip.io"), but I realize it's needed in place of registry.registry.svc.cluster.local when pushing/pulling the image to/from the private repository. What should I use for the host in the Ingress?

@sabre1041
Contributor

First, a big thank-you to @sabre1041! That saved me so much trial and error! Thanks also to @itewk for the guidance.

I added my work-in-progress so I could ask a few questions on tooling:

  1. Is skopeo available in the container where the integration testing happens? If not, what's the preferred tool for copying the image into the private repo?
  2. I got as far as I could without adding an Ingress (I was trying to avoid it since I wasn't sure what should go in place of "registry.<ip_address>.nip.io"), but I realize it's needed in place of registry.registry.svc.cluster.local when pushing/pulling the image to/from the private repository. What should I use for the host in the Ingress?
  1. Assuming we are using any of the standard GitHub Actions Ubuntu images, skopeo, buildah, and podman are all installed.
  2. If you are just pushing from the host itself, set the Ingress with the hostname registry.localhost. It will resolve locally and route into the kind cluster.

@itewk
Contributor

itewk commented Dec 13, 2024

@gniltaws now to clean up the pipeline jobs :)

Thanks for sticking with this! I know we made it much more complicated for you, but this thing is used in production at many a client, so testing is important.

@gniltaws
Contributor Author

Is someone able to trigger the tests again? @itewk @garethahealy

@gniltaws
Contributor Author

If the skopeo command in this latest version doesn't work, I'm going to need some advice on using skopeo and/or Ingress in a kind cluster

@sabre1041
Contributor

If the skopeo command in this latest version doesn't work, I'm going to need some advice on using skopeo and/or Ingress in a kind cluster

What specifically?

@gniltaws
Contributor Author

If the skopeo command in this latest version doesn't work, I'm going to need some advice on using skopeo and/or Ingress in a kind cluster

What specifically?

How to address the registry in the kind cluster in the skopeo copy command. When I used this:

  skopeo copy \
    --dest-creds ${registry_user}:${registry_password} \
    docker://$(awk '$1 ~/Image:/ {print $2}' charts/operators-installer/_integration-tests/test-install-operator-0-automatic-intermediate-manual-upgrades-values.yaml) \
    docker://registry.localhost/origin-cli

I got this error: pinging container registry registry.localhost: Get "https://registry.localhost/v2/": dial tcp [::1]:443: connect: connection refused

I tried adding the port on (docker://registry.localhost/origin-cli:5000) and switching from docker:// to https:// and got the same error. This time I'm trying http://, but I'm not sure what to do if that doesn't work. I don't have experience with non-OpenShift k8s (and thus Ingresses) and skopeo.
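
(As an aside, in an image reference the port belongs to the registry host rather than the tag, so if the registry were reachable on 5000 the destination would look roughly like this hypothetical form; --dest-tls-verify=false would also be needed for a plain-HTTP registry:)

  skopeo copy \
    --dest-creds "${registry_user}:${registry_password}" \
    docker://<source-image> \
    docker://registry.localhost:5000/origin-cli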

@sabre1041
Contributor

If the skopeo command in this latest version doesn't work, I'm going to need some advice on using skopeo and/or Ingress in a kind cluster

What specifically?

How to address the registry in the kind cluster in the skopeo copy command. When I used this:

  skopeo copy \
    --dest-creds ${registry_user}:${registry_password} \
    docker://$(awk '$1 ~/Image:/ {print $2}' charts/operators-installer/_integration-tests/test-install-operator-0-automatic-intermediate-manual-upgrades-values.yaml) \
    docker://registry.localhost/origin-cli

I got this error: pinging container registry registry.localhost: Get "https://registry.localhost/v2/": dial tcp [::1]:443: connect: connection refused

I tried adding the port on (docker://registry.localhost/origin-cli:5000) and switching from docker:// to https:// and got the same error. This time I'm trying http://, but I'm not sure what to do if that doesn't work. I don't have experience with non-OpenShift k8s (and thus Ingresses) and skopeo.

Steps to set up port mapping to expose 80/443 are found here

https://kind.sigs.k8s.io/docs/user/ingress
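
For reference, the key piece from that page is exposing host ports 80/443 into the kind node via extraPortMappings in the cluster config, roughly (a sketch; see the linked doc for the current form):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP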

@gniltaws
Contributor Author

I think I figured out how to expose the registry so skopeo can use it. Could someone trigger the tests again?

@gniltaws
Contributor Author

The error should be fixed now, is someone able to trigger the tests again?

@gniltaws
Contributor Author

Thank you! I finally found the issue that I had been blind to. Could you run the tests workflow again?

@gniltaws
Contributor Author

The integration test only failed because it hit the timeout after 30 minutes, not because of a yaml error or something like that.

Is there a way to see more detail about what went wrong, or is the best course of action just to increase the timeout on the update job?

@gniltaws
Contributor Author

It still timed out after 30 minutes despite my change to 35 minutes. Any ideas?

@gniltaws
Contributor Author

gniltaws commented Jan 6, 2025

It still timed out after 30 minutes despite my change to 35 minutes. Any ideas?

Is there a hard limit of 30 minutes, or do I need to change it in a different way?

@gniltaws
Contributor Author

I'm sorry to be a pest, but we have a rapidly approaching deadline where authentication will be required on our local registry. I'm hoping someone has ideas for how to get past/around the 30 minute timeout.

Thanks!

@ckavili
Contributor

ckavili commented Jan 15, 2025

I'm happy to merge this if you already tested the chart. We can figure out the timeout issue separately, no worries!

@itewk
Contributor

itewk commented Jan 15, 2025

I'm digging in; I'll spend a bit of time today seeing if I can fix the test timeout issue. If not, I'm fine to "cheat" and merge this without the new test and create a new issue to add a test.

add support for pulling the operator approver job image from a private registry
@itewk
Contributor

itewk commented Jan 16, 2025

Heyo all. I have been poking at this for a while, and I have a refactored version of this off on my fork. I am close, but there is still some issue with the job pulling the image from the local registry that I have not been able to figure out. I think I need to try to re-create it locally, but that's a whole other host of issues trying to run kind and images on my arm64 Mac... sigh.

itewk#1

i'll keep at it tomorrow

@itewk
Contributor

itewk commented Jan 16, 2025

Okay. After far too much fighting and stupidity on my part, the problem is certs.

9s          Warning   Failed             Pod/external-secrets-operator-v0-8-1-approver-fhsdx   Failed to pull image "registry.localhost/origin-cli:4.15": failed to pull and unpack image "registry.localhost/origin-cli:4.15": failed to resolve reference "registry.localhost/origin-cli:4.15": failed to do request: Head "https://registry.localhost/v2/origin-cli/manifests/4.15": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not registry.localhost

which means we either have to go through the nonsense of adding an untrusted registry to the kind cluster (I don't remember how it's done, I just remember it's a pain) or the pain of creating a CSR that kind can sign and then injecting that cert into the registry.

In any case, I'm fried. Tomorrow's problem.
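
For what it's worth, one way to do the former is a containerdConfigPatches entry in the kind cluster config; a sketch (the exact plugin path can vary with the containerd version bundled in the kind node image):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  # skip TLS verification for the in-cluster test registry only
  [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.localhost".tls]
    insecure_skip_verify = true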

@gniltaws
Contributor Author

@itewk Thanks for all of your efforts on this!

I haven't yet tested this in our cluster since we had been prioritizing other facets of the registry authentication project. I'll hopefully be able to get something together to test this tomorrow.

implement/refactor unit tests for the operators-installer operator approver job image pull from a private registry
@itewk
Contributor

itewk commented Jan 16, 2025

Ignoring the story of how I wasted hours being an idiot...

@gniltaws I have it working over here: itewk#1

If you want to reset your PR branch to the branch from that PR (it is already rebased on the latest from main), then we can go ahead and merge (assuming the tests pass over here as well).

@gniltaws gniltaws force-pushed the imagePullSecret branch 5 times, most recently from 4d4d791 to ae57277 on January 17, 2025 00:02
@gniltaws
Contributor Author

@itewk

If you want to reset your PR branch to the branch from that PR (it is already rebased on the latest from main), then we can go ahead and merge (assuming the tests pass over here as well).

I think I finally did this after many unsuccessful attempts, but this is my first attempt at grabbing a branch from another repo, so please tell me if I did something wrong. Thanks again!

@itewk
Contributor

itewk commented Jan 17, 2025

@gniltaws It would be nice to have a cleaner history. From scratch, this is what I would do:

git clone git@github.com:redhat-cop/helm-charts.git
cd helm-charts
git remote add gniltaws git@github.com:gniltaws/rhcop-helm-charts.git
git remote add itewk git@github.com:itewk/helm-charts.git
git fetch --all
git switch -c imagePullSecret gniltaws/imagePullSecret
git reset --hard itewk/imagePullSecret
git push gniltaws imagePullSecret --force

This will update your branch to be the same as the one on my repo, which will then update this PR to have a clean history on top of the current main from this repo.

@itewk itewk self-requested a review January 17, 2025 16:14
@itewk itewk merged commit 73e6173 into redhat-cop:main Jan 17, 2025
4 of 5 checks passed
@itewk
Contributor

itewk commented Jan 17, 2025

@gniltaws thanks for the contribution and for sticking with us!
