
Uninstallation flow for DFC offering #63

Open · wants to merge 1 commit into base: main

Conversation

@bindrad (Member) commented May 23, 2023

Raised this PR as a replacement for #55.

This PR implements the uninstallation flow for the DFC offering:

  • checks for PVs using OCS storage classes and halts uninstallation if any are found
  • issues a delete for all storageClassClaims
  • issues a delete for the storageClient

@bindrad bindrad force-pushed the dfc_uninstall branch 6 times, most recently from d838477 to 9e958e0 Compare May 29, 2023 09:11
@@ -40,11 +43,16 @@ const (
csiKMSConnectionDetailsConfigMapName = "csi-kms-connection-details"
storageClientCRDName = "storageclients.ocs.openshift.io"
storageClassClaimCRDName = "storageclassclaims.ocs.openshift.io"
defaultStorageClientName = "ocs-storageclient"
Collaborator:

Why "default"? This is the only storageClient we support right now.

Suggested change
defaultStorageClientName = "ocs-storageclient"
storageClientName = "ocs-storageclient"

Member Author:

updated

Comment on lines 146 to 165
for i := range storageClassClaimList.Items {
	storageClassClaim := &storageClassClaimList.Items[i]
	r.Log.Info("Issuing a delete for storageClassClaim %s", storageClassClaim.Name)
	if err := r.unrestrictedDelete(storageClassClaim); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to issue a delete for storageClassClaim %s: %v", storageClassClaim.Name, err)
	}
}
Collaborator:

Two issues I see with this loop:

  1. If one SCC delete fails, you will not issue a delete for anything after it.
  2. You might be issuing a delete for an already "marked for deletion" SCC again and again.

Member Author:

updated

Comment on lines +154 to +170
r.Log.Info("issuing a delete for StorageClient")
if err := r.delete(&r.storageClient); err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to issue a delete for Storage Client: %v", err)
}
Collaborator:

Do we need this? We are the controller owner of this resource, so it will get deleted when the offering CR is deleted. Am I wrong here?

Member Author:

I will try to explain using a scenario:

  • Assume the ocs-client operator adds a finalizer to a secret for some reason when a storageClient is created.
  • That finalizer should be removed by the ocs-client operator once the storageClient is in a deleting state.

Now, if we delete the offering CR that owns the storageClient, the storageClient will be garbage-collected as soon as its owner (the offering CR) is removed. At that moment the ocs-client operator may not have reconciled quickly enough to remove the finalizer from the secret, which would block deletion of the namespace. So in my view the agent should initiate the deletion of the storageClient, and only after the storageClient is successfully removed should we remove the offering CR.

Member Author:

We do the same for the storageCluster in the DF offering.

Collaborator:

What you are describing is a bug in the uninstallation/deletion flow of the client. If the client added a finalizer on a resource, it needs to act in one of two consistent ways:

  1. remove the finalizer when the client is marked for deletion, or
  2. stop uninstallation until the secret is removed.

Which one depends on what it is trying to achieve. Any other behavior will be considered a bug, and the client developers will need to fix it.

@nb-ohad (Collaborator) commented May 29, 2023:

I believe we do the same because we copy-pasted the current behavior from the old deployer, where there was no governing offering CR holding an owner ref. We need to evaluate whether a change is also needed there.

It all depends on what you want the behavior to look like: do we implement an eventually consistent model for offering removal, or a synchronous one?

Member Author:

> where there was no governing offering CR that held an owner ref

We had the managedOCS CR owning the storageCluster, right?

> eventually consistent model for offering removal or a sync one

Can you elaborate on this?

Member Author:

For now, we can follow the behaviour of the deployer; if required, we can evaluate whether a change is needed for both offerings. I have updated the PR to follow an eventually consistent model for offering removal.

- checks for PVs using ocs storage classes and halt uninstallation if found
- issue delete for all storageclass claims
- issue delete for storageclient

Signed-off-by: Dhruv Bindra <[email protected]>
@bindrad bindrad requested a review from nb-ohad May 30, 2023 08:34
@rchikatw rchikatw mentioned this pull request Jul 13, 2023