Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DFC uninstall flow #55

Closed

Conversation

ezio-auditore
Copy link
Collaborator

@ezio-auditore ezio-auditore commented May 4, 2023

  • checks for PVs using ocs storage classes and restricts delete if found
  • delete default storageclass claims
  • delete storageclient

leelavg
leelavg previously approved these changes May 8, 2023
@ezio-auditore ezio-auditore requested a review from bindrad May 8, 2023 11:12
Copy link
Member

@bindrad bindrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezio-auditore I don't see a function that will indicate whether the offering is safe to be removed or not. We should have dfcIsReadyToBeRemoved function which will be called in pluginIsReadyToBeRemoved function.

@ezio-auditore ezio-auditore force-pushed the uninstall_dfc branch 2 times, most recently from 09c5614 to fea9ac2 Compare May 8, 2023 12:27
@ezio-auditore ezio-auditore requested a review from bindrad May 8, 2023 12:29
@leelavg
Copy link
Collaborator

leelavg commented May 9, 2023

I don't see a function that will indicate

  • good call, seems I need to look at agent code (standalone) before reviewing next PRs 😅

bindrad
bindrad previously approved these changes May 9, 2023
}
}
pvList := &corev1.PersistentVolumeList{}
if err := r.list(pvList); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this will only list PVs from the set of namespaces the operator group allows for it.
We need to list all PVs! for that I believe you can use the Unrestricted Client

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to fetch using UnrestrictedClient

Comment on lines 284 to 292
ocsStorageClass := map[string]bool{}
for i := range storageClassList.Items {
storageClass := storageClassList.Items[i]
if storageClass.Provisioner == r.offering.Namespace+ocsCephFsProvisionerSuffix || storageClass.Provisioner == r.offering.Namespace+ocsRBDProvisionerSuffix {
ocsStorageClass[storageClass.Name] = true
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to calculate the provisioners name on each loop iteration?

Suggested change
ocsStorageClass := map[string]bool{}
for i := range storageClassList.Items {
storageClass := storageClassList.Items[i]
if storageClass.Provisioner == r.offering.Namespace+ocsCephFsProvisionerSuffix || storageClass.Provisioner == r.offering.Namespace+ocsRBDProvisionerSuffix {
ocsStorageClass[storageClass.Name] = true
}
}
ocsStorageClass := map[string]bool{}
rbdProvisionerName = r.offering.Namespace + ocsRBDProvisionerSuffix
fsProvisionerName := r.offering.Namespace + ocsCephFsProvisionerSuffix
for i := range storageClassList.Items {
storageClass := storageClassList.Items[i]
if storageClass.Provisioner == rbdProvisionerName || storageClass.Provisioner == fsProvisionerName {
ocsStorageClass[storageClass.Name] = true
}
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@ezio-auditore ezio-auditore force-pushed the uninstall_dfc branch 2 times, most recently from 9b9151c to e00697f Compare May 18, 2023 05:08
Comment on lines +142 to +148
r.Log.Info("issuing a delete for Default StorageClassClaims")
if err := r.delete(&r.defaultBlockStorageClassClaim); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to issue a delete for defaultBlockStorageClassClaim: %v", err)
}
if err := r.delete(&r.defaultFSStorageClassClaim); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to issue a delete for defaultFSStorageClassClaim: %v", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we delete non-default storageClassClaims as well that are provisioned by cephfs.csi.ceph.com or rbd.csi.ceph.com?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good question! @ezio-auditore what do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I think
If the user creates his custom storageClassClaim using the the default storageclient spec only then we need to delete those storageClassClaims,else we need to ignore it.

@nb-ohad @bindrad If this sounds ok I will implement the same here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default storageclient spec

Are we even allowing customer to create non-default storageClient? Or do we have plans to allow customer to create non-default storageClients?

Copy link
Collaborator

@nb-ohad nb-ohad May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user creates his custom storageClassClaim using the the default storageclient spec

@ezio-auditore that does not make any sense. The reason someone will create another claim is if they want some aspects of the storageclass to be different

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nb-ohad, the ocs-client-operator is already checking for any PV before deleting the SCC. Why do we need to recheck the PV here again?

The question here was about SCC, I think we should just delete the lot of them

You mean list all the SCC, delete them, and wait for them to be deleted before deleting the storage client right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support Users creating custom storageclass and static provisioning without SCC in managed service?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support Users creating custom storageclass and static provisioning without SCC in managed service?

On consumer clusters, customer can do anything they want to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean list all the SCC, delete them, and wait for them to be deleted before deleting the storage client right?

I think we can issue a delete for all SCCs and then issue a delete for storageClient. No need to wait for the removal of all SCCs before deleting storageClient because client-operator won't delete storageClient unless all SCCs are removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1 I am aware the ocs-client-operator is already checking for that.
But a k8s delete operation is unrevertable so we check for potential lock conditions, from the operator that owns the lifecycle of the offering (in this case the client operator) before issuing the delete request.

Signed-off-by: Kaustav Majumder <[email protected]>
)

//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclients,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclassclaims,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=operators.coreos.com,resources=clusterserviceversions,verbs=get;list;watch;delete;update;patch
//+kubebuilder:rbac:groups="",resources=persistentvolumes,verbs=get;list;watch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this give us enough privileges to read and delete over all namespaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it gives us enough privileges

)

//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclients,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclassclaims,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=operators.coreos.com,resources=clusterserviceversions,verbs=get;list;watch;delete;update;patch
//+kubebuilder:rbac:groups="",resources=persistentvolumes,verbs=get;list;watch
//+kubebuilder:rbac:groups="storage.k8s.io",resources=storageclass,verbs=get;list;watch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need access to storageclass? Doesn't uninstallation of ocs-client-operator already take care of deletion of storageclassclaims and as a result the attached storageclasses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need access to get, list and watch storageClass to fetch the PVs that are using ocs storageclasses

Comment on lines +91 to +93
r.defaultBlockStorageClassClaim.Name = defaultBlockStorageClassClaimName

r.defaultFSStorageClassClaim.Name = "ocs-storagecluster-cephfs"
r.defaultFSStorageClassClaim.Name = defaultFSStorageClassClaimName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we are treating these storageclassclaims differently then ant other storagelcassclaim?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the default storageClassClaims that the agent is creating.

Comment on lines +142 to +148
r.Log.Info("issuing a delete for Default StorageClassClaims")
if err := r.delete(&r.defaultBlockStorageClassClaim); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to issue a delete for defaultBlockStorageClassClaim: %v", err)
}
if err := r.delete(&r.defaultFSStorageClassClaim); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to issue a delete for defaultFSStorageClassClaim: %v", err)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existence of storageclassclaims or even storageclasses does not infer usage of storage. I am against blocking uninstallation based on that

rbdProvisionerName := r.offering.Namespace + ocsRBDProvisionerSuffix
fsProvisionerName := r.offering.Namespace + ocsCephFsProvisionerSuffix
for i := range storageClassList.Items {
storageClass := storageClassList.Items[i]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent a copy operation, please take a reference

Suggested change
storageClass := storageClassList.Items[i]
storageClass := &storageClassList.Items[i]

Comment on lines +285 to +286
rbdProvisionerName := r.offering.Namespace + ocsRBDProvisionerSuffix
fsProvisionerName := r.offering.Namespace + ocsCephFsProvisionerSuffix
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ocsRBDProvisionerSuffix and ocsCephFsProvisionerSuffix are only used once here. Can we just embed the values?

}
pvList := &corev1.PersistentVolumeList{}
pvCount := 0
if err := r.UnrestrictedClient.List(r.ctx, pvList); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we had an r.UnrestrictedList(pvList) operation.

if err := r.list(storageClassList); err != nil {
return false, fmt.Errorf("failed to list Storage Classes: %v", err)
}
ocsStorageClass := map[string]bool{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ocsStorageClass := map[string]bool{}
ocsStorageClasses := map[string]bool{}

Comment on lines +304 to +309
if pvCount > 0 {
r.Log.Info(fmt.Sprintf("Current count of PVs blocking uninstallation: %v", pvCount))
return true, nil
}

return false, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense to have logic/logging of messages base on the result of this func inside the func itself.
Let's have this function be countOCSVolumes which returns a number and handles the decision-making process outside.

Log: ctrl.Log.WithName("controllers").WithName("ManagedFusionOffering"),
Scheme: mgr.GetScheme(),
Client: mgr.GetClient(),
UnrestrictedClient: getUnrestrictedClient(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just create the unrestricted client once and rescue it in all controllers?

@ezio-auditore
Copy link
Collaborator Author

closing since another PR #63 already addressing this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants