-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Topology CRD #325
base: main
Are you sure you want to change the base?
Add Topology CRD #325
Conversation
// APIAffinity exposes PodAffinity and PodAntiaffinity overrides that are applied | ||
// to the StatefulSet | ||
// +optional | ||
APIAffinity *affinity.Overrides `json:",inline"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuggi APIAffinity
(we can change the name eventually) comes from [1], where the k8s struct is wrapped to expose what we already had.
We have two choices here:
- Expose affinity/antiaffinity from PodSpec as is
- Refine [1] to match how we want the user to interact w/ this CR.
A Topology CR would be like the following:
apiVersion: topology.openstack.org/v1beta1
kind: Topology
metadata:
name: default-sample
namespace: openstack
spec:
topologySpreadConstraint:
- maxSkew: 1
topologyKey: "topology.kubernetes.io/zone" # Spread evenly across zones
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: foo
affinity:
preferred:
weight: 80
topologyKey: "topology.kubernetes.io/zone"
selectKey: "foo"
selectValues:
- "bar"
You can put only a subset of them (e.g. only define topologySpreadConstraint and skip affinity/antiaffinity).
@stuggi @olliewalsh it can be consumed by something like [2]. It would put a finalizer on it and watches for changes in the spec (with the Predicate
approach we just discussed last time).
[1] openstack-k8s-operators/lib-common#582
[2] openstack-k8s-operators/glance-operator@530a2bc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't have much size constraints here, I'm wondering if it would be a better choice to expose Affinity/AntiAffinity as provided by k8s and avoid the need of documenting these parameters. The lib-common wrapping made more sense in a scenario where we passed affinity
as override in a form like:
...
glanceAPIs:
default:
override:
affinity:
preferred:
weight: 70
selectorKey: glance
selectorValues:
- "glance"
- "foo"
- "bar"
topologyKey: "topology.kubernetes.io/zone"
In our case we can rely on [1] (which is the same syntax exposed in StatefulSets/Deployments/etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot comment on the glance commit directly so noting it here. There is a k8s API guide section about object references that worth reading. One thing that I noted from it is that we should call the field on the glance side as topologyRef instead of just topology.
Also looking that the examples there they always using a nested struct not just a simple string field even if the only useful field in that struct is a name field. So I'm wondering if this is a way to make the ref forward compatible when new fields are needed as adding fields to a struct is easier than replacing a string field with a struct field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in Glance we can follow the pattern referenced in [1] and use topologyRef
as struct w/ all the required parameters (e.g. Namespace):
topologyRef:
name: foo
namespace: foo-namespace
If this proposal moves forward, I will base the Glance PR on [1].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, what would be the way to define a default topology for Glance? On the Glance CRD side we cannot define a default directly now.
- We can define a default in the glance controller code and document it at the ref field but that feels a bit hacky (even though mariadbaccount handling works similarly as if not provided the service controller creates a default one).
- We can define a default name for the ref in Glance CRD, e.g. glance-topology and in our DTs/VAs define that topology CR with the defaults we need. (this will be a bit error prone as if somebody miss the creation of that CR Glance will refuse to deploy)
- We can add a layer of logic in the openstack-operator controller to that creates default Topology CR(s) automatically if it detects that no user defined one are passed to Glance. (this creates a bit of reversed logic that the openstack-controller defines the default of a service level field, but might be fixed to move the defaulting logic to a function in the glance api module)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we already have a default Affinity
policy driven by what we defined in lib-common [1], Topology today is not generated at all.
In my mind, and because I don't need a default because [1] and NodeSelectors
cover the default scenario, until the openstack-operator
side of things is sorted, this would be an "advanced configuration" that you can only apply creating a separated CR and consuming it in the service operator.
Being that said:
- point 1 would be a no go
- We might need something in architectures repo to match the reference architecture we would like to propose ( stretched ctlplane vs DCN vs the combination of the two)
- This is interesting: the simple use case would be: I want the same distribution applied to the whole ctlplane; in this case it would make sense to generate a
default
policy from openstack-operator and pass it down if there's no override at the service level.
On point 1, if I define my topologyRef
like:
// +kubebuilder:validation:Optional
// Topology to apply the Policy defined by the associated CR referenced by name
Topology *TopologyRef `json:"topologyRef,omitempty"`
...
// TopologyRef -
type TopologyRef struct {
// +kubebuilder:validation:Optional
// Name -
Name string `json:"name,omitempty"`
// Namespace -
Namespace string `json:"namespace,omitempty"`
}
I can still create my Glance
CR without any Topology reference, and keep it backward compatible with my previous version (this is why it would be an advanced configuration).
Also I was thinking that this struct
(TopologyRef
) is something we could define here or in lib-common
, so we hide this detail from the service operator APIs, and any change on that struct only need a bump of infra-operators/apis
.
[1] https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/common/affinity/affinity.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, right now at least, I think there is no need to create a default topology, or the openstack-operator could create a default topology, which maps to what we have today. If a user wants/needs to have a specific topology for the service, they'd have to created and set the ref on the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I'm also ok to not go too far with the implementation until we have a clear view of the direction we want to take: if we define a proposal that allows the services API to grow without necessarily hitting the size limit issues we have today, and we decide to inline
this struct into v2, we risk to double pay the price of an opinionated implementation. I think the first step in my mind is to allow services to consume a certain topology/affinity, and being able to create a CR that can be used for that purpose is the first step.
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/733699b4849940629a1ec927e6f8dbc0 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 43m 19s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I understand that we cannot add this input to all of our service CRDs as then OpenStackOperator CRD will be over the size limit. So we need to start adding new input to the control plane by reference instead of by containment. If we go with this route then I suggest to take a step back and think about what we really want to add as a ref and what we want to add directly to our CRD as a field. Right now I think using the ref approach here just because we happened to run out of CRD space around here/now. I would like to see a more methodical approach where we decide what type of information should be contained and what can be referred based on the usage / lifecycle / meaning of the fields not just based on luck.
An alternative cut up could be to stop including service CRDs into the OpenStackControlPlane CRD just reference them by name and ask the human deployer to create each service CRD one by one. I think the OpenStackControlPlane and Version CRD still can orchestrate the minor update via references. This structure would feel more natural to me as each service CRD has high cohesion while the coupling across service CRDs is low so they don't need to be tight together into the OpenStackControlPlane struct.
Log.Info("Input maps hash", "HashName", common.InputHashName, "Hash", hash) | ||
} | ||
|
||
instance.Status.Conditions.MarkTrue(condition.ServiceConfigReadyCondition, condition.InputReadyMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels mixing InputReady with ServiceConfigReady.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should simply get rid of the controller if we agree there's nothing we need to compute here. Maybe can this be useful to run "validations" against the input parameters of the CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there are complex validations that needs to read the state of the cluster then sure we need a controller for that. If it are CR level validations then we are good with a webhook and no controller is needed. I agree to delete the controller if we don't see a use for it now.
// TopologyStatus defines the observed state of Topology | ||
type TopologyStatus struct { | ||
// Conditions | ||
Conditions condition.Conditions `json:"conditions,omitempty" optional:"true"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is no controller logic (as today) then we probably simplifly the status struct heavily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my previous comment, probably we don't need a status, but let's revisit this part further.
That is an approach we could take in a long term run, but I think today we're not there yet (maybe it will take ~6months to align what we have and not break existing deployments). |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d6ff072498354c4aa25f05fc905ea216 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 34m 58s |
352fd95
to
7e07d12
Compare
"Find a solution for this problem is a matter of time," I would like to challenge that. The whole 18.0 GA was a rush and we are keep rushing forward event though we already see the limits of our current approach. Adding the topology as a separate CRD now just because we have no time to fix our strategy will build up more technical debt for the future, creating even more necessary change in the future when we want to correct our structure and therefore making it less likely that we will be able to actually do a proper shift of strategy later. I know that it is not you and me to decide this alone and I happy to argue this on bigger forums as well. I will be similarly against adding the Watcher CRD as a ref while keeping all other service CRDs as a containment just because Watcher is created now where there is no more space to contain it. |
Regardless of the rush for GA (that btw still continue for FR2), I'm wondering if this approach prevents or is in conflict with an eventual review of the ctlplane based on referencing CRs. In this proposal I tried to put |
I cannot decide if having a topology CRD will conflict with a control plane v2 as I don't have a control plane v2 plan in my head so I cannot judge this move based on that. What I feel is that if we need to cut then I would cut across clear functional boundaries. And for me the clearest boundary is between the ControlPlane CRD and the Service CRDs today, not between the ControPlane CRD and the Topology CRD. And if we go with the Sevice CRDs cutout then having a Topology CRD does not make much sense to me. The topology of an openstack service is highly coupled to the CRD representing the given openstack service. So If we need further cuts after the Service CRDs cut out then maybe the next step would be cutting out per openstack service CRDs like NovaAPI from the Nova CRD or maybe decomposing a NovaAPI CRD into a StatefulSet CRD and NovaAPI helper CRD. Another reason I suggest to do the top level Service CRD cutout from the ControPlance CRD is that the pieces we cut out are already exists and well defined (i.e. Nova CRD). This means we does not need to design and evolve new structs. And upgrading an existing deployment to v2 has a limited impact on the control plane as the existing Service CRs does not need to change, just the ownership of them would change. In contrast dissolving the Topology CRD in v2 and having contained the info in the Service CRDs seems like a more complicated upgrade path as the Service CRs need to get a new schema and the filling of the new fields needs to come from the Topology CRs. "I don't see a ctlplane v2 as something that will be here soon." "I can put everything on hold," So yeah I'm -1 on this approach at least until it is discussed further and a reasonable commitment / preliminary plan for v2 is in place. But I guess my -1 can be overwritten by the core team. |
Are you suggesting to split the CRD and include only one of them (e.g. the helper CRD) in the openstack-operator? Or apply the byRef patter entirely on the top-level Nova CR? I think I see your point but we should clarify how the user experience works in this case. If I understand you correctly the top-level CRD has stable structs (especially for core fields) and they are pretty defined and won't change the size of the openstack control plane during the time. What I'm not sure is how the helper CRD would work in that case (I assume you want the user to patch the Nova CR for any change applied to Nova components, right?)
I'm not saying this: even in a v2 world no one prevent us to still use the
I think @dprince (who is going to hate me because of this long thread :D) has already plans for a v2: even if the implementation would require time (fwiw, is something we can help with to speedup the process) - an hypothetical FR3 milestone - with the
No need to override the -1, I think this conversation represents the most important part of the patch as we are explicitly raising the problem, writing ideas (either in English or by POCs) and look for a solution that won't present code refactoring challenges in the future. I think >1y ago ( |
iirc in our last thu meeting we discussed to talk about a possible v2 this week. it might help to wait for this meeting to see how this PR could fit? |
I'm ok with that. A potential problem with this patch could be that we start splitting the |
I don't have a fully detailed out plan in my head, more like a direction to plan towards. I think the main configuration interface after the cut would be the Nova CR not the OpenStackControlPlane CR. but I acknowledge that there are inputs to coordinate across multiple top level service CRs so we need a solution for that coordination. Either it is top down, and then it means we still need a service Core spec included in OpenStackControlPlane or it is bottom up (like in the case of you proposal) when the service CR controller needs to look up some other CRs to read the common configuration out. I feel (but without more details I cannot be certain) that the bottom up approach is actually leads to a more scaleable design due to less centralization.
My point here are twofold. i) I feel that Topology CRD will not be a main cut point in v2. ii) I think we need to be intentional about the cost of inlineing it later. I haven't written any conversion webhook so for me it is hard to judge the cost of it, so I feel this as a risky move. If other did such conversion before then I would be happy see their judgement about the cost of it.
In my eyes it is a long thread, but a valid discussion to have. Maybe we can do the discussion in other format. But personally I like async, written discussion (and I know other might not like this form). I'm happy to hear that the Toplogy issue triggered the overall v2 discussion. I'm looking forward to know more about the planned v2 changes to see the big picture.
For me k8s stack was new too. But to be clear we saw this CRD limit already a year ago, and so far we only worked around the problem instead of trying to solve it. I'm happy to see that this time we will not just trying to workaround the problem but actually discuss a different structure that potentially playing nicer with the constraints of the tech stack we choose. tl;dr: My goal here is not just to solve the immediate problem, but look at the big picture and create a solution that works with the given CRD size limits long term. And for that this discussion is helpful as well as looking at the v2 proposal from Dan thihs week. |
This change depends on a change that failed to merge. Change openstack-k8s-operators/lib-common#587 is needed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: fmount The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4bc86cfea9964df1b240db2ef5631623 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 00m 47s |
Topology CRD is made by two different structs imported and abstracted from PodSpec [1]: 1. TopologySpreadConstraint 2. Affinity/AntiAffinity The above seems enough to draft a dedicated CR instead of exposing those parameters through the service operators' API. In addition, Affinity/AntiAffinity is imported from PodSpec [1] and can override the default affinity policy defined in lib-common. [1] https://pkg.go.dev/k8s.io/api/core/v1#PodSpec Signed-off-by: Francesco Pantano <[email protected]>
Topology CRD is made by two different structs imported and abstracted from PodSpec [1]:
The above seems enough to draft a dedicated CR instead of exposing those parameters through the service operators' API. In addition, Affinity/AntiAffinity is wrapped in lib-common and not imported as is from PodSpec.
NodeSelector can be imported here in future.
Depends-On: openstack-k8s-operators/lib-common#587
[1] https://pkg.go.dev/k8s.io/api/core/v1#PodSpec