Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement seed mechanism #4

Merged
merged 14 commits into from
Jun 7, 2024
Merged

implement seed mechanism #4

merged 14 commits into from
Jun 7, 2024

Conversation

domcyrus
Copy link

No description provided.

@domcyrus domcyrus changed the base branch from database-request-crd to main April 29, 2024 08:38
@domcyrus domcyrus requested a review from shreddedbacon May 6, 2024 07:44
@domcyrus
Copy link
Author

domcyrus commented May 6, 2024

@shreddedbacon is there anything missing here?

@domcyrus domcyrus self-assigned this May 6, 2024
@shreddedbacon
Copy link
Member

@shreddedbacon is there anything missing here?

Need to set aside some time for a review

@shreddedbacon
Copy link
Member

@domcyrus can you rebase/merge main latest into this so the e2e action can run?

@shreddedbacon
Copy link
Member

shreddedbacon commented May 21, 2024

I can't get this to work, I don't think the implementation is complete. The e2e tests don't have appear to have a test where a DatabaseRequest is created to consume the seed, otherwise some errors would have been discovered:

The seed example has host but the code is looking for hostname in the secret. Creating a DatabaseRequest that uses this seed results in the controller returning an error.

If I update the secret to use hostname and create a new DatabaseRequest to use this seed secret the controller errors with the following:

mysql db drop failed due to connection reference is missing

This is because the provider reference is missing?. I guess there is a missing step where the seed gets correctly checked against the available providers and will add the required reference.

I think the seed logic should probably check if the requested scope + the hostname/port in the seed have a match, if so, add the provider reference

For example, if i have the following providers

apiVersion: crd.lagoon.sh/v1alpha1
kind: DatabaseMySQLProvider
metadata:
  name: development-providers
  namespace: lagoon
spec:
  scope: development
  connections:
    - name: development1
      hostname: mysql1-dev.local
      <snip>
    - name: development2
      hostname: mysql2-dev.local
      <snip>

But then I want to create 2 Database requests to either use existing credentials that may exist, or have the controller create these credentials on the requested host:

apiVersion: v1
kind: Secret
metadata:
  name: mysql-seed-secret
  namespace: default
type: Opaque
data:
  database: c2VlZC1kYXRhYmFzZQ==
  password: c2VlZC1wYXNzd29yZA==
  username: c2VlZC11c2VybmFtZQ==
  #hostname: mysql2-dev.local
  hostname: bXlzcWwyLWRldi5sb2NhbA==
  port: MzMwNg==
---
apiVersion: crd.lagoon.sh/v1alpha1
kind: DatabaseRequest
metadata:
  name: dbaas-request-development
  namespace: default
spec:
  scope: development
  type: mysql
  name: mysql-db-service
  seed: 
    name: mysql-seed-secret
    namespace: default
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-seed-secret2
  namespace: default
data:
  database: c2VlZC1kYXRhYmFzZQ==
  password: c2VlZC1wYXNzd29yZA==
  username: c2VlZC11c2VybmFtZQ==
  #hostname: mysql3-dev.local
  hostname: bXlzcWwzLWRldi5sb2NhbA==
  port: MzMwNg==
---
apiVersion: crd.lagoon.sh/v1alpha1
kind: DatabaseRequest
metadata:
  name: dbaas-request-development-broken
  namespace: default
spec:
  scope: development
  type: mysql
  name: mysql-db-service2
  seed: 
    name: mysql-seed-secret2
    namespace: default

Then I would expect that the dbaas-request-development would correctly be seeded and the appropriate provider reference added.

The dbaas-request-development-broken should fail, because there is no provider in the development scope that matches mysql3-dev.local

@domcyrus
Copy link
Author

I'm on holiday this week. I'll have a look next week.

@domcyrus
Copy link
Author

@shreddedbacon I didn't want to make the seed mechanism depend on a provider but if you want this I can adapt the code to do that.

@domcyrus
Copy link
Author

I've updated the test files so that it does actually create a full test. Note that I didn't change anything in the code but the resulting secret and service do look fine for me:

apiVersion: v1
items:
- apiVersion: v1
  data:
    database: c2VlZC1kYXRhYmFzZQ==
    hostname: bXlzcWwtc2VydmljZS5teXNxbA==
    password: c2VlZC1wYXNzd29yZA==
    port: MzMwNg==
    username: c2VlZC11c2VybmFtZQ==
  kind: Secret
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"v1","data":{"database":"c2VlZC1kYXRhYmFzZQ==","hostname":"bXlzcWwtc2VydmljZS5teXNxbA==","password":"c2VlZC1wYXNzd29yZA==","port":"MzMwNg==","username":"c2VlZC11c2VybmFtZQ=="},"kind":"Secret","metadata":{"annotations":{},"name":"mysql-seed-secret","namespace":"default"},"type":"Opaque"}
    creationTimestamp: "2024-05-27T14:46:53Z"
    name: mysql-seed-secret
    namespace: default
    resourceVersion: "1032"
    uid: 4bb29c0a-7e5e-4d20-bdc1-f0eb585c78c1
  type: Opaque

Checking base64 values:

base64 -d <<< bXlzcWwtc2VydmljZS5teXNxbA==
mysql-service.mysql%                                                                                                                        
base64 -d <<< c2VlZC1kYXRhYmFzZQ==        
seed-database%                                                                                                                              
base64 -d <<< MzMwNg==            
3306%                                                                                                                                       
base64 -d <<< c2VlZC11c2VybmFtZQ==
seed-username%

Therefore I'm not really sure why it shouldn't have worked for you...

@shreddedbacon
Copy link
Member

@shreddedbacon I didn't want to make the seed mechanism depend on a provider but if you want this I can adapt the code to do that.

The main usage of the seed was primarily to make it easy for a Lagoon build to update references from an old MariaDBConsumer to the new DatabaseRequest without having to transfer any data. Sort of a low effort migration, but mainly used as a way to start leveraging the new controller sooner.

Currently Lagoon and the dbaas operator provisions databases like so:

  • old dbaas operator has providers:
    • development-cluster1 of type development
    • development-cluster2 of type development
  • Lagoon creates a MariaDBConsumer with a type of development which requests the old dbaas operator to provision a DB against one of the 2 development providers.
  • Lagoon waits for the consumer to be populated with the data, and then sets some environment variables into the lagoon-env configmap
  • deployments have the lagoon-env mounted into all pods.

What we want is to use the new DatabaseRequest mechanism and seed it with the data from the MariaDBConsumer

The result would be something like this:

  • new dbaas controller has providers:
    • development-cluster1 of scope development
    • development-cluster2 of scope development
  • Lagoon will check if a MariaDBConsumer already exists in the namespace
    • If it does, Lagoon build will use it to create the seed secret pulling from it the hostname, port, username, password, username
      • Creates the DatabaseRequest using the seed, since the dbaas controller will know that the hostname requested is the same as one that the old dbaas operator used.
    • If no existing MariaDBConsumer, it will just create an unseeded DatabaseRequest
  • new dbaas controller will then create the services and secret like it normally would
  • Lagoon build will wait for the DatabaseRequest to "succeed"
  • Lagoon will remove the old variables that the MariaDBConsumer added to the lagoon-env and associate the new secret that the dbaas controller created to the environment instead
    • Lagoon will add a label to the new secret and then deployments will automatically consume it because of this label, rather than having to populate the lagoon-env file.
  • Lagoon will handle removing the old MariaDBConsumer from the namespace somehow

All that the dbaas controller needs to do is:

  • Check that the seed secrets hostname, and the DatabaseRequest scope have a match in the dbaas controller list of providers
    • If there is no matching provider, then the DatabaseRequest should fail. How can it provision and deprovision against something it has no knowledge/access to?
    • If there is a matching provider
      • Check if the database, username, password exist in the provider already
        • If the username and password work, but the database doesn't exist, then the request should fail
        • If the username and password fail to authenticate, but the database does exist, then the request should fail
        • If the username, password, and database exist, then then request should succeed (no need to create anything)
        • If the username, password, and database don't exist, then the request should succeed (create the user, pass, db)
      • Create the services and secret that would normally be created using the seed data
      • optionally: delete the seed secret once complete? (not fussed either way, just one less thing hanging around after it is done)

@shreddedbacon
Copy link
Member

shreddedbacon commented May 28, 2024

I've updated the test files so that it does actually create a full test. Note that I didn't change anything in the code but the resulting secret and service do look fine for me:

Therefore I'm not really sure why it shouldn't have worked for you...

I could create the seed secret fine. But when I created a DatabaseRequest using that seed, the dbaas controller failed to actually do anything with the seed.

I don't see anywhere obvious in the tests where the DatabaseRequest status is checked to confirm that the request was completed and a kind: Service and kind: Secret that would normally be created, are created?

Edit: I think I see it now (github didn't refresh properly for me initially so I missed the last commit), I will have another run through when I get some more time

@domcyrus
Copy link
Author

@shreddedbacon as mentioned on slack I've merged the other two PRs and rebased it. Let me know if you need some more information on this PR.

@shreddedbacon
Copy link
Member

Yep, did you see my previous comment: #4 (comment)

@domcyrus
Copy link
Author

A sorry I see... This (#4) sounds like a good plan to me. I'll update the PR today.

@domcyrus
Copy link
Author

@shreddedbacon I've now implemented it in the way you've described above #4

@domcyrus
Copy link
Author

domcyrus commented Jun 3, 2024

@shreddedbacon let me know if you need more details about something or if something else is still missing. Thanks.

@domcyrus
Copy link
Author

domcyrus commented Jun 4, 2024

After today discussion:

  • test for checking when the seed username+password work but the database is not existing
  • test if the database is available

Copy link
Member

@shreddedbacon shreddedbacon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all good. Only concern I have is that a DatabaseRequest that has fails to seed continues to be reconciled flooding the logs with reconcile errors.

This seems unnecessary. The DatabaseRequest should also probably go into a failed status that is easier to see

status:
  conditions:
    - lastTransitionTime: '2024-06-05T03:46:20Z'
      message: >-
        mysql db find connection from seed failed
      reason: Failed
      status: 'True'

@@ -112,6 +117,32 @@ func (ri *RelationalDatabaseImpl) Ping(ctx context.Context, dsn string, dbType s
}

if err := db.PingContext(ctx); err != nil {
if dbType == mysql {
log.FromContext(ctx).Error(err, "Failed to ping MMMMMySQL database")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo on MySQL

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove... It was some leftover from some testing.

@domcyrus
Copy link
Author

domcyrus commented Jun 5, 2024

This is all good. Only concern I have is that a DatabaseRequest that has fails to seed continues to be reconciled flooding the logs with reconcile errors.

Regarding flooding the logs:

The DatabaseRequestReconciler will get retried using the normal controller-runtime ratelimit. If we want to tweak that we could set the ratelimiter RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(baseDelay, maxDelay).

Another idea would be to not log anything if nothing in the seed has changed and the error in the status condition is related to the seed. In that case the DatabaseRequestReconciler needs to store the seed and compare it which is possible but maybe a bit brittle.

I would vote to rather keep the current default runtime behavior which will solve handle the same error by increasing the time to retry. It will log a few times the same error at the beginning but later on it won't do that often anymore. I think this is a good behavior because it fixes issues automatically and plays nice with kubernetes eventual consistency concept.

This seems unnecessary. The DatabaseRequest should also probably go into a failed status that is easier to see

status:
  conditions:
    - lastTransitionTime: '2024-06-05T03:46:20Z'
      message: >-
        mysql db find connection from seed failed
      reason: Failed
      status: 'True'

I've added an additional Error status condition.

@domcyrus domcyrus requested a review from shreddedbacon June 5, 2024 06:25
@shreddedbacon
Copy link
Member

This is all good. Only concern I have is that a DatabaseRequest that has fails to seed continues to be reconciled flooding the logs with reconcile errors.

Regarding flooding the logs:

The DatabaseRequestReconciler will get retried using the normal controller-runtime ratelimit. If we want to tweak that we could set the ratelimiter RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(baseDelay, maxDelay).

Another idea would be to not log anything if nothing in the seed has changed and the error in the status condition is related to the seed. In that case the DatabaseRequestReconciler needs to store the seed and compare it which is possible but maybe a bit brittle.

I would vote to rather keep the current default runtime behavior which will solve handle the same error by increasing the time to retry. It will log a few times the same error at the beginning but later on it won't do that often anymore. I think this is a good behavior because it fixes issues automatically and plays nice with kubernetes eventual consistency concept.

This seems unnecessary. The DatabaseRequest should also probably go into a failed status that is easier to see

status:
  conditions:
    - lastTransitionTime: '2024-06-05T03:46:20Z'
      message: >-
        mysql db find connection from seed failed
      reason: Failed
      status: 'True'

I've added an additional Error status condition.

I think this behaviour and lagoon are a bad combination generally. But I think we can be a bit smart about the things that retry and continue trying to reconcile until success, and things that are deemed a hard failure and to not retry them with.

Some parts of a Lagoon build are more suited to a hard failure than a "wait for" condition, that's all.

@domcyrus
Copy link
Author

domcyrus commented Jun 5, 2024

I understand. If we prefer to return success and not retry on an invalid or broken seed, that works fine for me.

I was thinking that it might make sense to retry. For example, if a platform engineer is alerted, they could inspect the seed secret and fix it manually. In that case retrying would allow the system to automatically correct itself. Without retrying, the system will not have the opportunity to autocorrect.

@shreddedbacon
Copy link
Member

I was thinking that it might make sense to retry. For example, if a platform engineer is alerted, they could inspect the seed secret and fix it manually. In that case retrying would allow the system to automatically correct itself. Without retrying, the system will not have the opportunity to autocorrect.

In theory this is fine. But it could end very badly if something is done incorrectly. In this case, if a seed fails it should not retry. The lagoon build will drop a warning and continue to use the MariaDBConsumer. The warning will inform of a failure to convert and require the customer to ask for help.

If it was to continue to retry, and someone from platform stepped in and resolved the issue incorrectly, then it could potentially result in outages or downtimes. A failure to convert from MariaDBConsumer to DatabaseRequest is not something that would be resolvable without involving a customer basically.

Because of some very old bugs in Lagoon builds, there are some MariaDBConsumers out there that are incorrectly provisioned.

I'll have to put together documentation about how to recover from these situations though when we get closer to doing this within a build though, because it could impact opensource users as well as AIO directly.

@domcyrus
Copy link
Author

domcyrus commented Jun 6, 2024

I've changed the code to not retry on specific seed errors. This triggered tests on seed to randomly fail, because of a race condition when creating the seed database. I've fixed it with a sleep. I think it makes a good example of why retries are useful.

PS: I'll certainly remove the comments about the retry it is just to illustrate on why they might be useful.

@shreddedbacon
Copy link
Member

I've changed the code to not retry on specific seed errors. This triggered tests on seed to randomly fail, because of a race condition when creating the seed database. I've fixed it with a sleep. I think it makes a good example of why retries are useful.

PS: I'll certainly remove the comments about the retry it is just to illustrate on why they might be useful.

Since the sleep is only in the test suite to give the mysql pod time to start up for the test to run, then this is somewhat acceptable to me, I'm sure the test could also be refactored to ensure the pod is running though instead of an arbitrary sleep too.

@domcyrus
Copy link
Author

domcyrus commented Jun 7, 2024

@shreddedbacon Is there anything else needed?

Copy link
Member

@shreddedbacon shreddedbacon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some quick verifications with emulating how a build may create the seed and request and this works great. The failure case with the seed is ideal for how Lagoon builds work to prevent unexpected behaviours.

Thanks Marco!

@shreddedbacon shreddedbacon merged commit 378d037 into main Jun 7, 2024
3 checks passed
@shreddedbacon shreddedbacon deleted the database-request-crd3 branch June 7, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants