-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry delete when deadlocked #91
base: master
Are you sure you want to change the base?
Retry delete when deadlocked #91
Conversation
Have been using this in production for the last few hours. Seems to be working great. |
This fix looks specific to one database. MySQL perhaps? |
yes, it is, it should be safe on other databases, and other databases do not have the problem (to my knowledge). If we want to retry on deadlocks w/ other databases we can look for these: https://github.com/mperham/deadlock_retry/blob/master/lib/deadlock_retry.rb#L16-L20 |
Are 100 retires really necessary? What about a small sleep? Are there any plans to merge this? |
No, probably not. I don't remember why I picked 100 but it shouldn't matter as if it's a deadlock it should go through eventually and it should have every opportunity w/o being infinite. A small sleep would be ok. |
@danielmorrison Any chance of getting this or something similar merged? |
+1 We also had problems with this in the past and had to downgrade to v0.3.3 which is working fine. |
b3f2a9b
to
b3adaee
Compare
did you all see this approach ? ndbroadbent@d311b55 as discussed in #63 |
Since we got #89 merged and using a monkey-patch to use plain sql instead of the "optimized" mysql-query version everything works just fine for us. No more deadlocks. |
so then close this one here, no? |
👍 It's a hack that maybe solves the symptom but not the actual problem. |
I'm not sure that I would agree that this is a hack. If the method of reservation is faster but occasionally deadlocks (and the deadlocks could not be prevented because locking order cannot be specified for the particular locks that are being acquired) then retrying is the right thing to do. If there were a solution that were faster and did not have the deadlock issue then that would be better, of course, but if not then this solution is best. Retrying on deadlock in general is not a hack. |
well it's not a hack but its no solution as @klausmeyer said. The pr feels like a "try-catch" because you don't trust the code. This responsibility should be implemented by the host-application. |
Or at least there should be a switch to turn it on or off. |
The failed job will automatically be retried as usual with Dj. And I don't like prs which just cure the symptoms like you said. |
That's true. I just wanted to say that in case this would be included in the official gem I'd like to have a way to turn it off but I still would prefer to not have it included at all. |
Sorry, but I still disagree. Either the optimization for selecting records needs to be changed completely to something that cannot possibly deadlock or we need to just accept the fact that this particular query can and will deadlock, and that is OK. You cannot always prevent deadlocks and the thing to do when you cannot is to retry. This is how it works. This should not be monkey patched in on the host application. The only solutions I know of are:
There are no other solutions. If you have any others I'm open to them. Making the code easier to monkey patch is not a solution. The job locking algorithm is either susceptible to deadlocks or it isn't. This one is.
I'm not sure if you're referring to with this, but if the job was retried at this point that would be a bad thing since the job already happened. This retry is on the delete. I don't actually remember the effect of this delete failing (if the job retries, the row just gets stuck, or what).
Would you mind saying what you're concerned about here? This retry logic is specifically around deleting the delayed job record and only responds to deadlocks (deadlocks which are known to happen and to my knowledge unpreventable given the order in which mysql acquires locks and the way that this gem is selecting the next job.) |
I agree that deleting the job after successful execution is a concern of this gem. |
1240e8d
to
639c9e5
Compare
Version 4.0.3
+1 for this fix, we've applied it as a monkeypatch and it seems to have smoothed out our issue w/ deadlocks-on-delete crashing workers. (Our environment is multiple workers on multiple machines, with a large queue of fast-running jobs.) |
+1 for the PR, 100% valid strategy for deadlocks, not a hack. For heavy throughput with many workers, this is a big deal. From my experience, this problem is one of the reasons people move out from DJs. |
Fixes #63
Unlike #90, this does not retry reservations because that is handled by delayed_job: https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/worker.rb#L281