Preventing deadlocks in Galera #1691
Wouter0100
started this conversation in
Development discussions
Replies: 2 comments 4 replies
-
I'm not very technical, but we "solve" the clustering issue by running two installs of Postal with a proxy in front.
3 replies
-
This somewhat old blog post suggests that deadlocks are a limitation of Galera, but I don't know enough about multi-write SQL clusters to know whether there is an easy resolution.
1 reply
-
Last week I posted in Discord about deadlocks in MySQL, but I'd love to discuss this in more depth, as I may contribute a fix at some point (or hire someone, as I have no experience with Ruby). Nothing is sadder than building something that doesn't get merged in the end. Let me explain the issue first.
For redundancy, we nearly always run MySQL in a Galera (multi-master) setup. This has worked pretty well with Postal, but we've seen one major issue: when we ramp up the sending of e-mails, queries deadlock. We have a load balancer in front of Postal's SMTP and web servers that randomly forwards SMTP connections to a set of SMTP servers. If these SMTP servers run certain queries at exactly the same time, a deadlock occurs, mainly in the statistics updates. We could set the load balancer to use only a single server, but that greatly reduces capacity.
A good example is here. At line 571 the following query is run:
UPDATE `statistics` SET `total_outgoing` = COALESCE(`total_outgoing`, 0) + 1 WHERE `statistics`.`id` = 1
This query locks that specific row (due to how InnoDB works), and if it is run twice at the same time by different SMTP servers, it creates a deadlock and throws an error in Ruby. As a result, add_to_message_queue is not called and I must call it manually. As a workaround I created a new Docker image where add_to_message_queue is called before the stats update, but this is of course a nasty workaround, not feasible for real production use with Postal.
Another issue I've seen is the update of the last_used_at timestamp for the credentials. Here a specific row is updated, and if 2 API calls arrive at exactly the same time, this creates another deadlock. Luckily, none of our customers currently use the API, so this is not that important for me.
Regardless, as you can see, this is quite an issue. I've thought about potential proper fixes for it. Normally I would use some kind of temporary key-value store, but that is not part of Postal's current requirements and thus not ideal. In the end I'd love to suggest the following:
A new (optional?) queue for updating single rows. As far as I can see, it should support incrementing a number and setting a value. This queue should be processed by a single, dedicated worker (something like cron and req-queuer, except that it processes queue items rather than creating them). New queue items are then submitted in the code, instead of actually incrementing or setting the values.
Would this be a suitable solution, or do you guys have any other suggestions?
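To make the proposal a bit more concrete, here is a rough sketch of how such a queue might behave. It is an illustration only, not Postal code: RowUpdateQueue, enqueue_increment, enqueue_set, drain and increment_sql are all hypothetical names, and the in-memory array stands in for what would more realistically be an insert-only database table that several processes can write to without touching the same row. The point is that the dedicated worker folds many pending increments into one UPDATE per row, so the contended row only ever has a single writer.

```ruby
# Hypothetical sketch; none of these names exist in Postal.
class RowUpdateQueue
  Item = Struct.new(:table, :id, :column, :op, :value)

  def initialize
    @items = []        # stand-in for an insert-only queue table (assumption)
    @lock = Mutex.new
  end

  # Called by SMTP/web processes instead of running the UPDATE directly.
  def enqueue_increment(table, id, column, by = 1)
    push(Item.new(table, id, column, :increment, by))
  end

  def enqueue_set(table, id, column, value)
    push(Item.new(table, id, column, :set, value))
  end

  # Called only by the single dedicated worker. Removes all pending items
  # and folds them into at most one change per (table, id, column).
  def drain
    batch = @lock.synchronize { @items.slice!(0, @items.length) }
    increments = Hash.new(0)
    sets = {}
    batch.each do |item|
      key = [item.table, item.id, item.column]
      if item.op == :increment
        increments[key] += item.value
      else
        sets[key] = item.value # last write wins
      end
    end
    { increments: increments, sets: sets }
  end

  private

  def push(item)
    @lock.synchronize { @items << item }
  end
end

# Builds the single UPDATE the worker would run for one coalesced increment.
# Table and column names are assumed to be trusted constants, not user input.
def increment_sql(table, id, column, amount)
  format(
    "UPDATE `%<t>s` SET `%<c>s` = COALESCE(`%<c>s`, 0) + %<n>d WHERE `%<t>s`.`id` = %<i>d",
    t: table, c: column, n: Integer(amount), i: Integer(id)
  )
end

# Usage: three SMTP servers each record one outgoing message.
queue = RowUpdateQueue.new
3.times { queue.enqueue_increment("statistics", 1, "total_outgoing") }
queue.drain[:increments].each do |(table, id, column), amount|
  puts increment_sql(table, id, column, amount)
end
```

The sketch does not handle a set and an increment queued for the same column, and a real worker would also need to decide how often to drain and what to do if its own UPDATE fails.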