Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial take on ADR reasoning the move from Azure managed db to in-cl… #583

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

ITViking
Copy link
Contributor

…uster db

What does this PR do?

Add an ADR describing why we are moving the database into the cluster instead of moving to another managed DB.

Should this be tested by the reviewer and how?

It should be read. Please request some changes if you think something is missing.

Any specific requests for how the PR should be reviewed?

What are the relevant tickets?

https://reload.atlassian.net/jira/software/c/projects/DDFDRIFT/boards/464?selectedIssue=DDFDRIFT-285

@ITViking ITViking self-assigned this Jan 20, 2025
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
Copy link
Contributor

@kasperg kasperg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the feelings going into this but to me this is too biased. The switch away from Azure Database for MariaDB is the initiator of this change. Recent experiences pushed the urgency of this change. The ADR should reflect this.

I also have some more comments but as this point I want to signal that I do not think that this can be accepted as is.

@ITViking
Copy link
Contributor Author

ITViking commented Jan 21, 2025

@kasperg please provide your thoughts. This needs to get done by today as agreed on the last meeting.

Copy link
Contributor

@hypesystem hypesystem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the context section is way too long. It gets rambly and doesn't stick to the relevant background for the concrete decision.

It needs to be useful for future readers looking back at what the factors were, when we made this decision. Right now it reads as if the primary reason was frustration. I don't think that's accurate: there are significant objective factors that disappear in the impassioned appeal.

The pro/con list and final decision are fine imo.

docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
Copy link
Contributor

@hypesystem hypesystem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, and is much more to the point!

I added some tiny notes on language and request for a few clarifications. With those, in my opinion, we're golden! ⭐

docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
Copy link
Contributor

@kasperg kasperg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read this through a couple of times now and provided some suggestions for you to consider.

My intention is to add documentation, improve structure and reduce emotional appeal.

docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
Comment on lines 8 to 39
The project has experienced project wide down time due to database crashes on
multiple occasions:

- 17.01.2025: Database went down due to a restart initiated by the Azure Product
Group team taking unsanctioned action
- 15.01.2025: Database went down due to a restart initiated by the Azure Product
Group team taking unsanctioned action
- 9.01.2025: Database went down as a result of to many login requests
- 28.11.2024: Database went down due to too many active connection attempts
- 7.11.2024: Database went down due to too many connections
- Summer 2024: Database was restarted by Azure for unscheduled and unnotified
maintenance

Microsoft's database server logs are prohibitively expensive.
Microsoft misconfigured the database provision-time.
The current Microsoft support tier is useless and useful support is
prohibitively expensive.
Microsoft support has made unsanctioned changes on the databse, such as
maintenance work during business hours and restarts in attempts to fix support
tickets.
It is not possible to access the database server.

Installing a MariaDB Sql operator in the cluster provides the following
benefits:

- access to the database server logs
- access to the server itself
- a guarantee that no one but the platform team can take action on the server
- ability to right-size the database
- Running clusters of databases
- Database recovery
- The ability to start recovering from a crash immediately
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it these are pros and cons are partly duplicated below. I appreciate the pros & cons structure so I suggest restructuring this to fit into the sections below.

If we do this I think the current content should be replaced with something like:

"The database is a critical part of the sites managed by the platform. Downtime or degradation of service for the database is usually reflected to all applications running on the platform.

Our experience with using Azure MariaDB up until now requires us to consider alternative solutions instead of just going with the recommended solution. The two solutions we have considered are Azure Database for MySQL and moving to running the database ourselves within our Kubernetes cluster."

Can we provide any reasoning for why we do not seriously consider other alternatives?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: I think it is worth adding a bit more context into what we mean by "In-cluster database".

I am not an expert here but as I see it there might be multiple ways to run a database in a cluster. I think it would be good if we could provide more info regarding what we actually have in mind, preferably combined with a link or two if we are referring to specific solutions/products/architectures e.g. Kubernetes Operator for MariaDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The database is a critical part of the sites managed by the platform. Downtime or degradation of service for the database is usually reflected to all applications running on the platform.
This is already in the document. Do you find it unclear?

Can we provide any reasoning for why we do not seriously consider other alternatives?
I don't think I know of any architecturally sound alternatives to the two options we have at hand.

Also: I think it is worth adding a bit more context into what we mean by "In-cluster database".
Please elaborate on what you think should be mentioned

I think it would be good if we could provide more info regarding what we actually have in mind,
I agree, but we haven't decided on an architecture yet. We should circle back to this when we have made a decision.

docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
docs/architecture/adr/adr-006-in-cluster-db.md Outdated Show resolved Hide resolved
@ITViking
Copy link
Contributor Author

@kasperg a lot of good comments, suggestions and revisions you provided here. I appreciate the effort put into this.
I believe I have answered all you comments and suggestions. A few require and answer from you

@ITViking ITViking requested a review from kasperg January 28, 2025 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants