fix: increase workflow run pop timeout, fix broken concurrency query #1189
How often do we call this function in regular operation? I feel that 30 seconds is probably way too long for a DB timeout: if we start going that long, something is definitely hosed, and we'll probably have a ton of other operations queued up behind us, so it's not going to help us.
I think there is a potential for starvation / unfairness here if we have a very large number of concurrency groups (which I think is the issue here, right?). I think our limit needs to be large enough that we don't ignore any concurrency groups that are at the end (say the 501st concurrency group, if we had one for everything). Also, I think our limit needs to be a multiple of COUNT(DISTINCT "concurrencyGroupId") FROM workflow_runs, because otherwise we'll unfairly schedule the first groups in the ordering.
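To make the starvation concern concrete, here is a minimal sketch in plain Python rather than SQL. It is not the actual query from this PR: the function names `naive_pop` and `round_robin_pop`, the workload of 501 groups with 3 queued runs each, and the limit of 500 are all assumptions chosen for illustration. It compares a flat "first N rows" limit against ranking runs within each group before applying the limit.

```python
from collections import defaultdict
from itertools import count


def naive_pop(queued, limit):
    """Take the first `limit` runs ordered by (group, seq).

    Groups early in the ordering consume the whole limit.
    """
    return sorted(queued)[:limit]


def round_robin_pop(queued, limit):
    """Rank runs within each group, then order by (rank, group).

    Every group's oldest run is considered before any group's second run.
    """
    next_rank = defaultdict(count)  # one 0-based counter per group
    ranked = []
    for group, seq in sorted(queued):
        ranked.append((next(next_rank[group]), group, seq))
    ranked.sort()
    return [(group, seq) for _, group, seq in ranked[:limit]]


# Hypothetical workload: 501 concurrency groups, 3 queued runs each.
queued = [(g, s) for g in range(501) for s in range(3)]
limit = 500

naive_groups = {g for g, _ in naive_pop(queued, limit)}
fair_groups = {g for g, _ in round_robin_pop(queued, limit)}

print(len(naive_groups), len(fair_groups))  # prints: 167 500
```

Under these assumptions the flat limit reaches only 167 of the 501 groups, while the ranked version reaches 500 and still leaves the 501st group waiting for the next pop, which is the case described above.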
Also, I feel like I'll need to add an obligatory nit about the GitHub Actions not passing to save @grutt the hassle 😆
This method is called very often with high usage of concurrency keys -- the problem with this query is that it scales linearly with the number of queued workflow runs and concurrency keys. So we'd likely expect to see > 5 seconds if we've enqueued > 1 million workflow runs that have very distinct concurrency keys. And the thing that concerns me about a low timeout is that we may be cancelling work after the database has spent a bunch of cycles nearly completing a query (and if this query times out, there's no recovering the queue).
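A rough worked example of that argument, assuming a purely linear cost model. The constant of 5 microseconds per queued row is an assumption, back-derived from the "5 seconds at 1 million runs" figure above and not measured; `estimated_us` and `completes` are hypothetical helper names.

```python
PER_ROW_US = 5  # assumed cost per queued row, in microseconds (not measured)


def estimated_us(queued_rows):
    """Estimated query duration in microseconds under a linear cost model."""
    return queued_rows * PER_ROW_US


def completes(queued_rows, timeout_s):
    """True if the estimated duration fits within the timeout."""
    return estimated_us(queued_rows) <= timeout_s * 1_000_000


for rows in (100_000, 2_000_000):
    print(rows, estimated_us(rows) / 1_000_000, completes(rows, 5), completes(rows, 30))
```

With 2 million queued rows the estimate is 10 seconds, so under this model a 5 second timeout fails on every attempt, and retrying repeats the same work without making progress, whereas a 30 second timeout lets the query finish.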
This should be written in a way where there aren't other operations queued up behind us, so I'm less concerned about this.
I'm not sure I follow -- yes, the limit will only pop 500 workflow runs at a time, but it shouldn't be unfair because...
Just leaving a review because I think I only commented before. I think with the changes we talked about on the call today this should be good.
Description
Fixes:
- PopWorkflowRuns... query so we don't accidentally update too many rows (we should potentially increase this further)
- seqnum cross product issue which was making the query slow

Type of change