RunTask does not properly remove / update the executed task #51
base: master
Conversation
@aertje this is the last thing on my list for the moment, which we found after tracing some odd race conditions in our end-to-end test suites (our app code queues deferred tasks, then we had our tests calling RunTask). Obviously in practice this really only happens coupled with a delayed schedule time. As with the others, this is based on the earlier work, so the diff for this branch alone is https://github.com/aertje/cloud-tasks-emulator/pull/51/files/0519806882ebabbd9e7da18fedb5862f20863ad0..9762c38728f6fd8679e207854e984f58c2c7c64d and I'll update / rebase as required.
Per documentation for RunTask https://cloud.google.com/tasks/docs/reference/rpc/google.cloud.tasks.v2#google.cloud.tasks.v2.CloudTasks.RunTask

> If Cloud Tasks receives a successful response from the task's target, then the task will be deleted; otherwise the task's schedule_time will be reset to the time that RunTask was called plus the retry delay specified in the queue's RetryConfig.

In other words, it essentially internally works the same as if you'd set `ScheduleTime` to `now` (albeit that it also bypasses rate limits / queue pause state / etc).

However, the emulator is treating RunTask as a separate execution: the task will still run at the original scheduled time, even though it immediately disappears from ListTasks / GetTasks. Additionally, the emulator does not retry if a task triggered by RunTask fails.
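To make the documented behaviour concrete, here is a minimal standalone sketch of those semantics. The types and helper names below are hypothetical illustrations only, not the emulator's actual internals:

```go
package main

import (
	"fmt"
	"time"
)

// Minimal, hypothetical stand-ins for the emulator's internals.
type retryConfig struct {
	maxAttempts int
	backoff     time.Duration
}

type task struct {
	name         string
	scheduleTime time.Time
	attempts     int
	deleted      bool
}

// runTask models the documented semantics: a successful response deletes the
// task; a failure reschedules it at "now + retry delay", exactly as if
// ScheduleTime had been set to now.
func runTask(t *task, cfg retryConfig, callTarget func(*task) bool) {
	t.attempts++
	if callTarget(t) {
		t.deleted = true // removed from the queue, never fires again
		return
	}
	if t.attempts >= cfg.maxAttempts {
		t.deleted = true // retries exhausted per the queue's RetryConfig
		return
	}
	t.scheduleTime = time.Now().Add(cfg.backoff)
}

func main() {
	tk := &task{name: "example", scheduleTime: time.Now().Add(10 * time.Minute)}
	runTask(tk, retryConfig{maxAttempts: 3, backoff: 30 * time.Second},
		func(*task) bool { return false }) // simulate a failing target
	fmt.Println(tk.deleted, tk.scheduleTime) // rescheduled ~30s from now, not in 10 minutes
}
```

The point is simply that RunTask shares the task's normal lifecycle (one record, one retry counter, one schedule) rather than being an independent execution.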
Makes the behaviour work the same as in production. I considered writing to the `cancel` channel instead of using the boolean flag, but that would have triggered a second `task.onDone()` call, which might have produced unexpected side effects.
This was previously only used to trigger the no-retry behaviour for `RunTask`, and is therefore now always true. Tasks will always be considered for retry after a failure response; they are only skipped if the queue's retry config prevents it, e.g. if they have reached the maximum number of retries.
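As a rough illustration of the flag-under-mutex idea these two changes describe, here is a self-contained sketch (hypothetical names, not the actual diff): whichever path fires the task first marks it done, so the other path becomes a no-op instead of producing a second execution.

```go
package main

import "sync"

type task struct {
	stateMutex sync.Mutex
	isDone     bool
}

// tryFire reports whether this caller won the right to execute the task.
// Both the scheduled fire and a RunTask-triggered fire go through it.
func (t *task) tryFire() bool {
	t.stateMutex.Lock()
	defer t.stateMutex.Unlock()
	if t.isDone {
		return false // already executed (e.g. via RunTask): skip
	}
	t.isDone = true
	return true
}

func main() {
	t := &task{}
	if t.tryFire() {
		// first fire wins: dispatch the task's request here
	}
	if t.tryFire() {
		// never reached: the second fire is a no-op
	}
}
```

Using a plain flag rather than writing to the `cancel` channel avoids re-entering the done callback, per the note above.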
Hi @acoulton; I'm not quite sure how this resolves the underlying problem, apart from hiding the symptoms a bit more. Isn't a lock around the whole of Run and Attempt needed to ensure that only the scheduled or RunTask run is allowed to execute?
task.stateMutex.Lock()
defer task.stateMutex.Unlock()
task.isDone = true
task.onDone(task)
Why does this need to be included inside the lock?
task.queue.fire <- task
// It's possible the task might already be `done` if e.g. it was called by RunTask before the originally
// scheduled execution time. In that case this is a no-op, otherwise, run the task.
if !task.isDone {
I don't quite get how this check avoids that problem - what if RunTask is called at exactly this point?
That's a fair question; however, I think the production behaviour when the calls are actually concurrent is undefined / at-least-once. The docs say (my emphasis):
The docs don't specify what result is stored if there are concurrent executions; my hunch is that the task will be removed if either succeeds, or reflect whichever attempt completes last if both fail.

Equally, the major issue at the moment is that tasks run several minutes later (despite not being visible in ListTasks). This PR reliably solves that, AFAIK. There is still a small risk of duplicate execution at the exact time of the RunTask call, but IMO that is probably acceptable, since Cloud Tasks doesn't guarantee exactly-once delivery anyway, so apps should already be able to cope with a duplicate despatch. The issue we have in CI is that by the time the originally scheduled task fires, our suite has moved on, so the database is no longer in a valid state to handle that duplicate.

Let me see if I can do a bit more testing of exactly what production does for concurrent cases, and then I'll come back to you.
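For context on the "apps should already cope with duplicates" point, a hedged sketch of what a duplicate-tolerant handler might look like. Cloud Tasks sends the task name in the `X-CloudTasks-TaskName` header for HTTP-target tasks; the in-memory map here is purely illustrative, a real app would dedupe in its database:

```go
package main

import (
	"net/http"
	"sync"
)

// Illustrative dedupe store keyed by task name. In practice this would be a
// uniqueness check in the application's own datastore, not process memory.
var (
	mu        sync.Mutex
	processed = map[string]bool{}
)

func handleTask(w http.ResponseWriter, r *http.Request) {
	name := r.Header.Get("X-CloudTasks-TaskName")

	mu.Lock()
	already := processed[name]
	processed[name] = true
	mu.Unlock()

	if already {
		// Duplicate dispatch (e.g. RunTask plus the original schedule):
		// acknowledge it without repeating the side effects.
		w.WriteHeader(http.StatusOK)
		return
	}

	// ... perform the actual work here ...
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/task", handleTask)
	http.ListenAndServe(":8080", nil)
}
```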
Per documentation for RunTask
https://cloud.google.com/tasks/docs/reference/rpc/google.cloud.tasks.v2#google.cloud.tasks.v2.CloudTasks.RunTask

In other words, it essentially internally works the same as if you'd set `ScheduleTime` to `now` (albeit that it also bypasses rate limits / queue pause state / etc).

However the emulator is treating RunTask as a separate execution - the task will still run at the original scheduled time. Unfortunately, it is immediately removed from ListTasks / GetTasks / etc, so it's not possible to see this is happening or cancel the task manually. This causes odd race conditions in testing (especially if coupled with delayed schedule times) as the queue looks empty but you get an unexpected duplicate delivery some time later.