No affinity on pv when using topology #566
Comments
Can you reproduce this with the latest 1.11.3 release?
Hi, we also observe this behavior with topologies. We are currently running the ibm-block CSI 1.12.0 on a 4.17 OpenShift cluster. The topology is respected on creation, but we expect this affinity to also be taken into account when rescheduling occurs; instead, pods still try to schedule in the other zone, where it is technically impossible (and intentional) to find their volume.

Are we missing something?
IBM CSI pods filter access to systems/volumes in both the host definer and volume operations. Each node in the cluster should only be able to access the volumes allowed by the zoning labels. If you want your own pods to have filtered access, you need to do that with K8S topology (node affinity), not CSI topology. Below is an example; can you please confirm that you defined topology similarly? If not, I'll need the secret file (of course, replace any confidential information with fake values), the label information from the node description, the PV definition used by the pod you're trying to run, and how the pod configures access to the mount. The following is an example of one element in a topology-aware secret definition (taken from the IBM CSI documentation):
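The documentation example itself was not captured above, so the following is only a minimal sketch of one system entry in a topology-aware secret, assuming the per-system config layout described in the IBM CSI documentation; the system name, credentials, management address, and the `topology.block.csi.ibm.com/zone` label key and value are all placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: demo-secret            # placeholder secret name
  namespace: default
type: Opaque
stringData:
  config: |
    demo-system-1:                                   # storage system identifier (placeholder)
      username: demo-username
      password: demo-password
      management_address: demo-management-address
      supported_topologies:
        - topology.block.csi.ibm.com/zone: zone1     # only nodes carrying this label may use this system
```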
If you define your pod with a volume mount definition, the corresponding PV must conform to CSI zoning restrictions. To allow your pod to run only on nodes that have access to a specific PV, you also need to use K8S zoning (node affinity). Of course, CSI cannot control or define the affinity of user pods. If you defined your pods as above and still see an issue, I'll need more details as requested above. Thanks!
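As an illustration of the K8S-side zoning described above (not taken from the thread), a pod restricted to zone1 nodes via required node affinity might look like the sketch below; the standard `topology.kubernetes.io/zone` label, the image, and the PVC name are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone   # standard Kubernetes zone label (illustrative)
                operator: In
                values:
                  - zone1                          # keep the pod on nodes that can reach the zone1 storage
  containers:
    - name: app
      image: registry.example.com/demo:latest      # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-pvc                        # PVC bound to a PV provisioned in zone1 (placeholder)
```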
Hi @lechapitre, thanks for your reply! I agree that it is not the role of the CSI to define pod affinity or topology constraints; in our case this is natively done through Kubernetes topology labels, and this one is used:

Here is the setup of the node labels for topology usage:
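The actual label setup was not captured above; purely as a hypothetical illustration, the worker nodes might carry both the native Kubernetes zone label and a CSI-specific one, along these lines (node name, label keys, and zone values are assumptions):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-zone1-01                        # placeholder node name
  labels:
    topology.kubernetes.io/zone: zone1         # native Kubernetes topology label
    topology.block.csi.ibm.com/zone: zone1     # CSI-specific topology label (assumed key)
```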
By the way, why use specific labels for the CSI instead of the native ones? We expected something consistent here; maybe I don't have the use case in mind, but we would appreciate being able to rely on our own labels or the native ones, as other CSIs do. We use the topology-aware setup according to the documentation; here is the specific secret configuration. As you can see, a different
Also, a volume can only be remapped onto another worker in the same zone.
To summarize with a "user story": a pod is created in zone1 (Kubernetes decides this for us, or we assume the zone was specified) and claims a PVC via a topology-aware StorageClass in the correct zone, zone1; the CSI creates the volume in the expected zone (see the StorageClass sketch below). Then, let's assume we lose the zone1 workers and the pod is allowed to schedule in the other zone, zone2 (it could be a configuration error or whatever); it then tries to find its volume, which was already created in zone1. Of course it does not find it, and the pod ends up in an error state instead of Pending.
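As a sketch of the topology-aware StorageClass step in this user story (not the reporter's actual configuration), something like the following could be used; the `block.csi.ibm.com` provisioner name, the pool, and the secret parameters are assumptions, and WaitForFirstConsumer delays provisioning until the pod is scheduled so the volume lands in the pod's zone.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-block-zone1                                         # placeholder name
provisioner: block.csi.ibm.com                                  # assumed provisioner name for the IBM block CSI driver
volumeBindingMode: WaitForFirstConsumer                         # provision only after the pod is scheduled, in its zone
parameters:
  pool: demo-pool                                               # placeholder storage pool
  csi.storage.k8s.io/provisioner-secret-name: demo-secret       # topology-aware secret (placeholder)
  csi.storage.k8s.io/provisioner-secret-namespace: default
```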
Some points about your answer:

Thanks!
@dje4om, as suspected, your use case is not supported by CSI. The purpose of topology in IBM CSI is to restrict access to volumes from certain nodes; it is not meant to limit resource usage. It is unclear to me why your use case needs IBM CSI zoning in the first place: if I understand correctly, you want the pod to run on either node selected by OpenShift, and that pod should have access to the same volume regardless of the node selected. Why did you define zoning? I mentioned the host definer because it also needs to be topology aware: if only node1 is able to access volume1, then only node1's ports need to be defined on volume1's storage. I don't know the historical reason for IBM CSI using its own labels, but it does allow more flexibility: only the subset of pods restricted by affinity can have access to certain volumes (restricted by IBM CSI labels).
Hello @lechapitre, our case is a very standard one; the confusion probably comes from the way I tried to describe how to reproduce it. We need zoning because both workers are physical nodes (it's a simplified example) in different datacenters, each with its own storage array; this is why we need CSI topology. We absolutely don't want a pod to be able to move from one zone to another, and that is what's actually happening: of course the pod can't schedule successfully (CrashLoopBackOff) because it can't find its volume, but it tried, and this is the issue here. I'm wondering how you define Deployments or StatefulSets to ensure the topology is respected?
@dje4om, so the pod starts running on node1 with access to volume1. Can you explain how K8S/OpenShift allows the pod to be rescheduled on node2 if the pod affinity is "node1 only"? (Which you should have defined yourself with node annotations, not IBM CSI labels.) You asked about our implementation: we use the labels so that our pods (CSI and host definer) only "see" the volumes allowed by the labels. If the topology doesn't allow a node's instance of CSI/host definer to access a volume, then it's simply as if that storage doesn't exist on that node.
Yes, but we don't expect it to fail or to create a new volume; we expect it to remain Pending. We observed this behaviour on a StatefulSet with one replica during a node maintenance that we had to force due to a PDB: we expected the pod to remain Pending, but it kept crash-looping even after the node returned from maintenance. FYI, I have since checked another CSI (from a well-known vendor) that supports topologies, and this configuration is well defined there. Here is some additional information: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#node-affinity
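Following the linked Kubernetes documentation, below is a sketch of the `spec.nodeAffinity` stanza the reporter expects to find on a dynamically provisioned PV; the driver name, volume handle, label key, and zone value are illustrative assumptions, not output from the driver.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv                                    # placeholder; dynamically provisioned PVs get generated names
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: block.csi.ibm.com                      # assumed driver name
    volumeHandle: demo-volume-id                   # placeholder volume identifier
  nodeAffinity:                                    # the part the reporter expects the driver to populate
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.block.csi.ibm.com/zone # assumed CSI topology label key
              operator: In
              values:
                - zone1                            # the zone where the volume was actually created
```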
Hi,

The IBM block CSI driver does not add nodeAffinity information to the PV when using CSI topology. This behavior causes pods to be scheduled on, and to make unsuccessful attempts to mount the PV from, nodes outside of those described in the secret's supported_topologies.
Versions:
ibm-block-csi-operator.v1.10.0
Openshift 4.11.0 (k8s v1.24.0+9546431)