initial take on ADR reasoning the move from Azure managed db to in-cl… #583

Open
wants to merge 31 commits into
base: main
Changes from 8 commits
Commits
1286f5c
initial take on ADR reasoning the move from Azure managed db to in-cl…
ITViking Jan 20, 2025
417b2cb
add another pro
ITViking Jan 20, 2025
60115b4
space around headers
ITViking Jan 20, 2025
03e079f
space around lists
ITViking Jan 20, 2025
3147881
add edits requested by Fini
ITViking Jan 21, 2025
2dfa856
rewrite context section more objectively
ITViking Jan 21, 2025
5c5c8be
remove old context and a two more points
ITViking Jan 21, 2025
f3a0d73
add a specific section on recovery and restore of the database(s)
ITViking Jan 21, 2025
927ac36
shorten and objectify context section further
ITViking Jan 22, 2025
9c9d30c
rework section on benefits of installing a MariaDB operator
ITViking Jan 22, 2025
a3b7a25
remove recovery and database sections since theses benefits are expla…
ITViking Jan 22, 2025
0141a55
adding requested changes
ITViking Jan 22, 2025
18143b1
remove line lamenting inability to access managed database server
ITViking Jan 28, 2025
e3553f3
link to page that informs MariaDB users that Managed MariaDB is being…
ITViking Jan 28, 2025
94f7fe0
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
575f69a
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
4bf0ad5
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
fceead2
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
ab452c4
add link to high-availability feature of MariaDB
ITViking Jan 28, 2025
8f43795
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
129c0fe
remove bullet on migration need
ITViking Jan 28, 2025
b5d558f
add high-availabililty as pro to manged db
ITViking Jan 28, 2025
16cf0ef
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
dafb7c2
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
856ab92
we also have to do updates ourselves, which is a con
ITViking Jan 28, 2025
7d08972
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
177eefe
Update docs/architecture/adr/adr-006-in-cluster-db.md
ITViking Jan 28, 2025
20a03bd
make it clear that we save money on more powerfull databases when we …
ITViking Jan 28, 2025
319659e
fix typos an akward sentences
ITViking Jan 28, 2025
ff1f105
requested changes to the ADR
ITViking Jan 28, 2025
a06317b
fix formatting issues
ITViking Jan 28, 2025
170 changes: 170 additions & 0 deletions docs/architecture/adr/adr-006-in-cluster-db.md
@@ -0,0 +1,170 @@
# ADR 006: In-cluster Database

## Context

We have known since summer 2024 that Microsoft has decided to sunset Azure
Database for MariaDB. Their recommendation has been to migrate to their Flexible
MySQL database, a migration they have documented fairly well. Back then we had
already experienced database downtime, which prompted us to start a discussion
about whether to move the database to Azure Flexible MySQL server or to an
in-cluster database. This conversation has been running for the better part of
6-8 months.

The main points we kept debating were:

- Can we operate an in-cluster database on par with or better than Azure?
- Are we able to deal with problems that might arise on the databases without
Azure's Technical Support?
- If we self-host the database, are we then able to optimize the performance?
- Can we guarantee uptime on par with or better than Azure's managed database if
we self-host the database?
- Would we get better support if we moved to the suggested managed database?
- Would the performance be better on the database suggested by Azure?
- Would downtime be minimized if we moved to the suggested managed database?

Since the summer we have seen continually more downtime, and to a worsening
degree. We have been cut off from insight into the database logs
due to an extremely steep price on server logs, which we tried enabling during
the summer. This more than doubled the operations cost, which the
operations budget can in no way bear.
We have on several occasions found the database to have been misconfigured, to
such a degree that it has caused unreasonable downtime.

The expertise and technical support of Azure have been a factor in favor of
migrating to Azure's managed MySQL database. As mentioned before, we have
already seen Azure neglect to configure the database according to the specs
readily available in MariaDB's documentation on operating a database.
We have on multiple occasions experienced the database crashing due to, e.g.,
too many connections or too many slow queries, which is exactly what the
database should have been configured to handle in a reasonable way.
The reasonable behaviour in the aforementioned cases would have been errors
such as "too many connections" and "resource busy", which should have resulted
in end-user experiences such as 500 errors and not project/nation-wide downtime.
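
To illustrate the "reasonable behaviour" described above, here is a minimal
Python sketch of a connection cap that rejects excess connections instead of
letting the server fall over. It is not how Azure or MariaDB implement this;
the names `ConnectionLimiter`, `TooManyConnections` and `handle_request` are
hypothetical and exist only for this example.

```python
import threading


class TooManyConnections(Exception):
    """Raised when the connection cap is reached (hypothetical name)."""


class ConnectionLimiter:
    """Caps concurrent connections and rejects the excess instead of crashing."""

    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)

    def acquire(self) -> None:
        # Non-blocking: if no slot is free, fail fast with a clear error.
        if not self._slots.acquire(blocking=False):
            raise TooManyConnections("too many connections")

    def release(self) -> None:
        self._slots.release()


def handle_request(limiter: ConnectionLimiter) -> int:
    """Return an HTTP-style status code for one request (illustrative only)."""
    try:
        limiter.acquire()
    except TooManyConnections:
        # The individual request fails, but the service as a whole stays up.
        return 500
    try:
        return 200  # Assumption: the query itself succeeds.
    finally:
        limiter.release()
```

With a cap of two and both slots taken, `handle_request` returns 500 for the
next request rather than taking every site down; once a slot is released,
requests succeed again.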

Conversations with Azure Technical Support have rarely led to effective
solutions. This is due to:

- Language barriers, as Azure support is handled by people whose English
proficiency varies widely.
- The tiered support, which bars us from talking to the actual technical staff.
This results in a knowledge relay bottleneck, often resulting in
lost-in-translation situations.
- Transparency issues stemming from somewhere inside the technical support,
where information is withheld in such a way that it builds mistrust of Azure.

Lastly, we have experienced Azure Technical Support taking unsanctioned actions
on the database, resulting in project-wide downtime on several occasions. The
first time we experienced this was summer 2024, when they restarted the database
while doing some configuration. The last time was January 2025, when they
restarted the database multiple times trying to fix a bad database recovery.

Deranged has dealt with managed databases in the past, and has never experienced
a level of service as bad as this.

Deranged has been managing databases in-cluster for the last 7 years. During
this time, Deranged has managed the following database types:

- PostgreSQL
- MongoDB
- MariaDB/MySQL
- Elasticsearch
- Redis
- InfluxDB

By moving the database into the cluster, we gain:

- access to the database server logs
- access to the server itself
- a guarantee that no one but the platform team takes actions on the server
- access to the server logs and thus insight into how to tweak the database to
perform better
- agency
- the ability to start recovering from a crash, should the need arise.

### Pros & Cons

#### Azure Managed DB

##### Cons

- Experienced to be poorly managed
- Server logs are disproportionately priced = too expensive for this project
- We don't have access to the server
- The wait time for support can be long, and the answers can be incorrect.
Contributor

We have had bad experiences but is this really one of them? 🤔

Contributor Author

In part this is an attempt to shorten the description of our experience with Azure Support. I do believe one of the issues with the support is that we have to wait for them to have time to help us. In contrast, having the databases in-cluster will allow immediate troubleshooting.

- Server might be (was in our case) misconfigured
Contributor

This is a hypothetical. I think it is worth moving the mention of the misconfigured database into the bullet above.

Contributor Author

The server was configured to match that of a much more powerful server than the one we have, so I would say they definitely misconfigured our server.

Contributor Author

Is it the first list going over the bad experiences we've had?

- Will require a migration to a service that is potentially just as bad

##### Pros

- One-click setup
- Minimal configuration needed from us
- Azure support
- Azure can be blamed for downtime they are directly or indirectly responsible
for.
- MySQL, which is not set to be sunset, might have a noticeably better service
as that is their chosen flavor to keep offering.
Contributor

I think all this can be reduced to something like "Continual use of managed database product".

Contributor Author

ITViking, Jan 28, 2025

If the ADRs are to be readable to non-technical staff, the suggested one-liner leaves out crucial information that would enable non-technical staff to understand the trade-offs. Leaving it as is elaborates on some of the points for using a managed database.

#### In-Cluster DB

##### Cons

- We have no one but ourselves to blame if something goes awry
- Logging and monitoring have to be set up manually
- All configuration has to be done by ourselves
- Point-in-time recovery (PITR) is not possible at the moment.

##### Pros

- Ability to investigate logs
- Logging and monitoring can be set up to alert us to noteworthy changes
- A little cheaper on the face of it
- Ability to split out databases such that one database having issues doesn't
cause all sites to crash.
- Possible to run in an HA setup if we should decide to do so.
- Performance can be optimized and fine-tuned to our use case
- Databases will be located closer, logically as well as physically, to
workloads relying on them = faster response time, which should be noticeable for
end-users.
- We can right-size the database, thereby getting the maximum performance for
the buck.

### Recovery and database restores

The in-cluster database will be deployed using MariaDB's operator.
This gives us the following advantages:

- Recovery is done by the operator, representing MariaDB's expertise on how to
best recover a crashed DB
- Database server splitting, enabling us to divide the project's collective
sites across several database servers, so one server crashing only affects
sites on that database server and not every site.
- No changes to single-site database restores
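
The effect of database server splitting can be sketched in a few lines of
Python. This is only an illustration of the idea, not the operator's behaviour:
the round-robin placement, the function names `assign_sites` and
`affected_sites`, and the site and server names are all assumptions made for
the example.

```python
from collections import defaultdict


def assign_sites(sites: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Spread sites over database servers round-robin (hypothetical policy)."""
    placement: dict[str, list[str]] = defaultdict(list)
    for index, site in enumerate(sorted(sites)):
        placement[servers[index % len(servers)]].append(site)
    return dict(placement)


def affected_sites(placement: dict[str, list[str]], crashed: str) -> list[str]:
    """Return only the sites hosted on the crashed database server."""
    return placement.get(crashed, [])


# Hypothetical example: four sites split across two database servers.
placement = assign_sites(
    ["site-a", "site-b", "site-c", "site-d"], ["db-0", "db-1"]
)
```

If `db-1` crashes in this example, only the sites placed on `db-1` are
affected; with a single shared server, every site would be.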

## Decision

We have made the decision to implement the in-cluster database based on the
aforementioned pros and cons lists.
We are tired of having a poorly managed database with bad support.
Additionally, we, the DDF, and the libraries themselves are
tired of having preventable downtime.
This way we take our agency back and ensure that we will not have an incoming
task of migrating the database later in the year, when Azure end-of-lifes
managed MariaDB.

We will start by implementing a PoC, where we can test for a good setup of the
in-cluster database before migrating all the databases to the in-cluster
databases. The PoC will also be used for testing the migration against.

Every step taken towards moving to the in-cluster database shall be done
transparently.

The in-cluster database must:

- have resource monitoring set up
- be able to have backups taken
- be able to be restored
- have log monitoring set up

## Status

Approved