Make sure MariaDBAccount gets created with MariaDBDatabase #858
Currently the MariaDBAccount gets created in an early step, before the password secret is validated to exist. If the service password is missing, the deployment stops after the MariaDBAccount has been created.

If one deletes the ctlplane at this point, the nova-api MariaDBAccount won't be deleted, because loadDatabaseAndAccountCRs() will not return the account when the MariaDBDatabase object was never created. As a result, the nova-api MariaDBAccount remains with a finalizer.

When the password secret is then created with a new ctlplane, the old nova-api MariaDBAccount conflicts with the new deployment: the account will not be created in the DB instance, and all nova tasks that initialize its DB fail with an access error.

This change moves creation of the nova-api MariaDBAccount to right before creation of the MariaDBDatabase. This reduces the chance of ending up with a MariaDBAccount for nova-api without its MariaDBDatabase.

Currently this situation could also happen when the service password is there but galera is not created properly, for example when the DB root password is missing.

Jira: OSPRH-10167

Signed-off-by: Martin Schuppert <[email protected]>
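To make the ordering problem in the description concrete, here is a minimal sketch in Go. It is not the nova-operator code: the `cluster` type, both `reconcile...` functions, and the secret name `osp-secret` are hypothetical, and an in-memory map stands in for the Kubernetes API. It only illustrates why creating the account before the secret check can leave an account without a database.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical in-memory stand-in for the Kubernetes API;
// it only records which objects exist.
type cluster struct {
	secrets   map[string]bool
	accounts  map[string]bool
	databases map[string]bool
}

func newCluster() *cluster {
	return &cluster{
		secrets:   map[string]bool{},
		accounts:  map[string]bool{},
		databases: map[string]bool{},
	}
}

var errSecretMissing = errors.New("service password secret not found")

// reconcileAccountFirst mimics the old ordering: the account is created
// before the secret is validated, so an early return leaves it behind.
func reconcileAccountFirst(c *cluster, name, secret string) error {
	c.accounts[name] = true
	if !c.secrets[secret] {
		return errSecretMissing
	}
	c.databases[name] = true
	return nil
}

// reconcileAccountWithDatabase mimics the proposed ordering: the account
// is created right before the database, after the secret check.
func reconcileAccountWithDatabase(c *cluster, name, secret string) error {
	if !c.secrets[secret] {
		return errSecretMissing
	}
	c.accounts[name] = true
	c.databases[name] = true
	return nil
}

// orphaned reports whether an account exists without its database.
func orphaned(c *cluster, name string) bool {
	return c.accounts[name] && !c.databases[name]
}

func main() {
	old := newCluster()
	_ = reconcileAccountFirst(old, "nova-api", "osp-secret")
	fmt.Println("old ordering, orphaned account:", orphaned(old, "nova-api"))

	fixed := newCluster()
	_ = reconcileAccountWithDatabase(fixed, "nova-api", "osp-secret")
	fmt.Println("new ordering, orphaned account:", orphaned(fixed, "nova-api"))
}
```

In this sketch the two creations cannot fail independently. In a real deployment the database creation can still fail after the account exists, which is why the description says the change only reduces the window.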
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: stuggi
so here's the thing (well two things).
afaik this is currently specific to nova because:
I guess I'm not following "In case the service password is missing". What's "the service password" here? EnsureMariaDBAccount generates a password and a secret, so that part is not missing.
Or otherwise I'm just curious to know how to reproduce the problem. The PR is fine.
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/76adc48b245a4e0d92e1bb9310369723 ✔️ openstack-meta-content-provider SUCCESS in 1h 57m 56s
Sorry, by service password I meant the OSP user/password you provide in the OSP secret. You can easily reproduce it as I mentioned in the Jira. just
So the keystone password is meant to come from the secret, but the database password will eventually not use that at all and will be passed in to us.
By the way, as @zzzeek noted, this code should not be required in the GA release assuming we have I'm inclined to just delete this code entirely, as none of our docs should be using this and we did not intend to support creating them this way in 18.0 GA.
We put MariaDBAccount creation in openstack-operator? I missed that.
You can't remove this code entirely, as it handles the case where the MariaDBAccount CR attribute changes to a new name; this routine then generates a new username/password for that new name. This is in the documentation.
Unless we move that to openstack-operator also...
I don't think any code landed in the openstack-operator to pre-create the accounts.
Yeah, I don't think either that the MariaDBAccount creation was moved to the openstack-operator.
Looking into the problem, I think the issue is that loadDatabaseAndAccountCRs() assumes that the MariaDBDatabase and the MariaDBAccount objects either both exist or are both absent. But as these are two separate CRs, they can exist independently. One example of such a situation is described by @stuggi in this PR above, but there could be other cases: any failure that prevents MariaDBDatabase creation, followed by a ctrlplane deletion, will lead to stuck MariaDBAccounts. Therefore this change is just a band-aid and does not remove the root cause of the issue. I think we should separate the handling of the two CRs into two sets of functions, so that one CR being missing does not cause the other to be ignored. I think we can keep the Database struct if we want, but we should at least make GetAccount() and GetDatabase() load the CRs independently and replace the usage of GetDatabaseByNameAndAccount() in the service operators with those independent calls.
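The difference between a coupled lookup and independent lookups can be sketched as follows. This is an illustration only: `store`, `loadCoupled`, `getDatabase` and `getAccount` are hypothetical names operating on in-memory maps, not the mariadb-operator API, and the coupled variant just models the behaviour described in this thread.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the cluster state.
type store struct {
	databases map[string]bool
	accounts  map[string]bool
}

// loadCoupled models the behaviour described above: when the database
// is missing, the account is never looked up and is reported as absent.
func loadCoupled(s *store, dbName, accName string) (dbFound, accFound bool) {
	if !s.databases[dbName] {
		return false, false
	}
	return true, s.accounts[accName]
}

// getDatabase looks up only the database object.
func getDatabase(s *store, name string) bool {
	return s.databases[name]
}

// getAccount looks up only the account object, whether or not a
// database with a matching name exists.
func getAccount(s *store, name string) bool {
	return s.accounts[name]
}

func main() {
	// State after a failed deployment: account exists, database does not.
	s := &store{
		databases: map[string]bool{},
		accounts:  map[string]bool{"nova-api": true},
	}

	_, accCoupled := loadCoupled(s, "nova-api", "nova-api")
	fmt.Println("coupled lookup finds account:", accCoupled)
	fmt.Println("independent lookup finds account:", getAccount(s, "nova-api"))
	fmt.Println("independent lookup finds database:", getDatabase(s, "nova-api"))
}
```

With the coupled lookup, the leftover account is invisible to the caller, so nothing ever removes its finalizer. The independent lookups report each object on its own.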
Yes, that's actually what I tried to describe in the PR description: the issue is with loadDatabaseAndAccountCRs().
That is correct, we have to fix the underlying issue, but this PR would at least make the nova-operator behave like the others.
recheck
If there is an agreement that we will fix the underlying issue then I'm fine merging this PR as a temporary measure.
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/29beb6e4303f4042b0ba810fee60b6d1 ✔️ openstack-meta-content-provider SUCCESS in 1h 35m 53s
So, good news: it looks like GetDatabaseByNameAndAccount across the various operators is now only used in the delete flow, which is a pretty limited spot. If the actual issue here is simply that the deletion needs to locate the MariaDBDatabase and MariaDBAccount separately, so that the MariaDBAccount gets deleted in the absence of the MariaDBDatabase, why don't we change that directly here and leave the EnsureMariaDBAccount part of things where it is? Basically, I can undertake the task of splitting out GetDatabaseByNameAndAccount across all operators and we can deprecate that function call.
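A rough sketch of what such a delete flow could look like, assuming each object is located on its own and the finalizer is released from whichever ones exist. Everything here is hypothetical (the `object` type, `removeFinalizer`, `cleanup`, and the finalizer name `nova-finalizer`); it is not the change that was actually implemented.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a CR; only its finalizers
// matter for this illustration.
type object struct {
	finalizers []string
}

// removeFinalizer drops name from the object's finalizers and reports
// whether it was present.
func removeFinalizer(o *object, name string) bool {
	for i, f := range o.finalizers {
		if f == name {
			o.finalizers = append(o.finalizers[:i], o.finalizers[i+1:]...)
			return true
		}
	}
	return false
}

// cleanup locates the database and the account separately and releases
// the finalizer from each one that exists. A missing database does not
// prevent the account from being handled.
func cleanup(databases, accounts map[string]*object, dbName, accName, finalizer string) []string {
	var released []string
	if db, ok := databases[dbName]; ok && removeFinalizer(db, finalizer) {
		released = append(released, "database/"+dbName)
	}
	if acc, ok := accounts[accName]; ok && removeFinalizer(acc, finalizer) {
		released = append(released, "account/"+accName)
	}
	return released
}

func main() {
	// Leftover state: an account with a finalizer, and no database.
	databases := map[string]*object{}
	accounts := map[string]*object{
		"nova-api": &object{finalizers: []string{"nova-finalizer"}},
	}

	fmt.Println(cleanup(databases, accounts, "nova-api", "nova-api", "nova-finalizer"))
}
```

Once the finalizer is released, the leftover account no longer blocks deletion, which is the part the coupled lookup could not reach.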
I've implemented this approach in #862 for nova, making use of a new it needs a new mariadb API version but once we confirm that's good to merge and get it in, it's a two line change for all operators that are using the old pattern.
closing in favor of #862