Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add check on 2 MultiKueue CRD if v1alpha1 version in the cluster and not in termination, error out #1616

Merged
merged 5 commits into from
Feb 6, 2025

Conversation

zdtsw
Copy link
Member

@zdtsw zdtsw commented Feb 5, 2025

  • DSC condition on kueueReady:
    reason: MultiKueueCRDV1Alpah1Exist
    message: MultiKueue CRDs MultiKueueConfig v1alpha1 and MultiKueueCluster v1alpah1 exist, please remove them before proceed
  • reconcile gets triggered by user's deletion on these 2 CRD

Description

ref: https://issues.redhat.com/browse/RHOAIENG-18251

How Has This Been Tested?

local build: quay.io/wenzhou/opendatahub-operator-catalog:v2.23.205

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

// For the case when user manually delete old MultiKueueConfig/MultiKueueCluster CRDs v1alpha1
predicate.Funcs{
DeleteFunc: func(e event.DeleteEvent) bool {
return e.Object.GetName() == gvk.MultiKueueConfigV1Alpha1.Kind || e.Object.GetName() == gvk.MultikueueClusterV1Alpha1.Kind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably already covered by the component.ForLabel predicate (https://github.com/opendatahub-io/opendatahub-operator/blob/main/pkg/controller/predicates/component/component.go#L15-L17), unless the crd were not deployed by the operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, if the CRD has app.opendatahub.io/kueue: 'true'

i dont know if we should cover it or not, for the case, kueue team can run "make deploy" to get CRD into cluster but these wont have labels/annotations.
maybe we can remove DeleteFuncs here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be strict here and only support what the operator expects, if you do something outside of the operator, I guess there's not so much we can do/support

pkg/cluster/operator.go Outdated Show resolved Hide resolved
…r, error out

- DSC condition on kueueReady:
	reason: MultiKueueCRDV1Alpah1Exist
	message: MultiKueue CRDs MultiKueueConfig v1alpha1 and MultiKueueCluster v1alpah1 exist, please remove them before proceed
- reconcile gets triggered by user's deletion on these 2 CRD

Signed-off-by: Wen Zhou <[email protected]>
Copy link
Contributor

@ChristianZaccaria ChristianZaccaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zdtsw , I've tested this and works as expected:

  1. Old version of Kueue deployed with v1alpha1 CRDs.
  2. Deleted deployment, and checked that the v1alpha1 CRDs remained.
  3. Installed this version of opendatahub-operator.
  4. Kueue pods were blocked from creation, and found the message in the DSC resource.
  5. After deletion of both v1alpha1 CRDs, the Kueue pod starts and installs newer v1beta1 CRDs.

controllers/status/status.go Outdated Show resolved Hide resolved
- rename function to postive
- remove prediate on non-operator created CRD

Signed-off-by: Wen Zhou <[email protected]>
Signed-off-by: Wen Zhou <[email protected]>
@lburgazzoli
Copy link
Contributor

@zdtsw it looks good to me, we'd need an e2e test for this I guess

Copy link
Contributor

@ChristianZaccaria ChristianZaccaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested a new clean installation of opendatahub-operator when cluster has none of the 2 CRDs. Kueue pods came up as expected and installed the new CRDs.

@zdtsw
Copy link
Member Author

zdtsw commented Feb 5, 2025

@zdtsw it looks good to me, we'd need an e2e test for this I guess

i will add one on that

Copy link

codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 0% with 41 lines in your changes missing coverage. Please review.

Project coverage is 20.28%. Comparing base (cc7bf04) to head (c9054ac).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...llers/components/kueue/kueue_controller_actions.go 0.00% 23 Missing ⚠️
pkg/cluster/operator.go 0.00% 17 Missing ⚠️
controllers/components/kueue/kueue_controller.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1616      +/-   ##
==========================================
- Coverage   20.35%   20.28%   -0.08%     
==========================================
  Files         163      163              
  Lines       11096    11137      +41     
==========================================
  Hits         2259     2259              
- Misses       8597     8638      +41     
  Partials      240      240              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zdtsw
Copy link
Member Author

zdtsw commented Feb 5, 2025

/test opendatahub-operator-e2e

@lburgazzoli
Copy link
Contributor

I think we probably need a test to validate that the component goes into a non ready state if v1alpha1 is installed, and then gets back to ready when the CRD is deleted & reinstalled

@openshift-ci openshift-ci bot added the lgtm label Feb 6, 2025
Copy link

openshift-ci bot commented Feb 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lburgazzoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 6, 2025
@zdtsw
Copy link
Member Author

zdtsw commented Feb 6, 2025

/test opendatahub-operator-e2e

@zdtsw
Copy link
Member Author

zdtsw commented Feb 6, 2025

/test image-build

Copy link

openshift-ci bot commented Feb 6, 2025

@zdtsw: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test ci-index
/test images
/test opendatahub-operator-e2e
/test opendatahub-operator-pr-image-mirror

Use /test all to run all jobs.

In response to this:

/test image-build

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@zdtsw
Copy link
Member Author

zdtsw commented Feb 6, 2025

/test all

@zdtsw
Copy link
Member Author

zdtsw commented Feb 6, 2025

/test opendatahub-operator-e2e

@openshift-merge-bot openshift-merge-bot bot merged commit b1c2524 into opendatahub-io:main Feb 6, 2025
10 checks passed
zdtsw added a commit to zdtsw-forking/opendatahub-operator that referenced this pull request Feb 10, 2025
…r and not in termination, error out (opendatahub-io#1616)

* feat: add check on 2 MultiKueue CRD if v1alpha1 version in the cluster, error out

- DSC condition on kueueReady:
	reason: MultiKueueCRDV1Alpah1Exist
	message: MultiKueue CRDs MultiKueueConfig v1alpha1 and MultiKueueCluster v1alpah1 exist, please remove them before proceed
- reconcile gets triggered by user's deletion on these 2 CRD

Signed-off-by: Wen Zhou <[email protected]>

* update:

- rename function to postive
- remove prediate on non-operator created CRD

Signed-off-by: Wen Zhou <[email protected]>

* update: message

Signed-off-by: Wen Zhou <[email protected]>

* update: add test to check muttikueue api version

Signed-off-by: Wen Zhou <[email protected]>

* update: add e2e test to mimic CRD API version

Signed-off-by: Wen Zhou <[email protected]>

---------

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit b1c2524)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants