Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RunTask does not properly remove / update the executed task #51

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

acoulton
Copy link
Contributor

Per documentation for RunTask
https://cloud.google.com/tasks/docs/reference/rpc/google.cloud.tasks.v2#google.cloud.tasks.v2.CloudTasks.RunTask

If Cloud Tasks receives a successful response from the task's target, then the task will be deleted; otherwise the task's
schedule_time will be reset to the time that RunTask was called plus the retry delay specified in the queue's RetryConfig.

In other words, it essentially internally works the same as if you'd set ScheduleTime to now (albeit that it also bypasses rate limits / queue pause state / etc).

However the emulator is treating RunTask as a separate execution - the task will still run at the original scheduled time. Unfortunately, it is immediately removed from ListTasks / GetTasks / etc, so it's not possible to see this is happening or cancel
the task manually. This causes odd race conditions in testing (especially if coupled with delayed schedule times) as the queue looks empty but you get an unexpected duplicate delivery some time later.

@acoulton acoulton marked this pull request as ready for review August 30, 2021 14:51
@acoulton
Copy link
Contributor Author

@aertje this is the last thing on my list for the moment, which we found after tracing some odd race conditions in our end-to-end test suites (our app code queues deferred tasks, then we had our tests calling RunTasks to allow us to move on to assertions, but that was causing a duplicate execution a while later during an unrelated test).

Obviously in practice this really only happens coupled with a delayed ScheduleTime because in other cases the task fires faster than you can call RunTask anyway. In theory it could also happen if you were using RunTask to force a task that was delayed by queue rate limits, but I suspect that's unlikely when using the emulator.

As with the others this is based on the earlier work so the diff for this branch alone is https://github.com/aertje/cloud-tasks-emulator/pull/51/files/0519806882ebabbd9e7da18fedb5862f20863ad0..9762c38728f6fd8679e207854e984f58c2c7c64d and I'll update / rebase as required.

Per documentation for RunTask
https://cloud.google.com/tasks/docs/reference/rpc/google.cloud.tasks.v2#google.cloud.tasks.v2.CloudTasks.RunTask

> If Cloud Tasks receives a successful response from the task's
> target, then the task will be deleted; otherwise the task's
> schedule_time will be reset to the time that RunTask was called
> plus the retry delay specified in the queue's RetryConfig.

In other words, it essentially internally works the same as
if you'd set `ScheduleTime` to `now` (albeit that it also
bypasses rate limits / queue pause state / etc).

However the emulator is treating RunTask as a separate
execution - the task will still run at the original
scheduled time, albeit it immediately disappears from
ListTasks / GetTasks.

And additionally, the emulator is not retrying if a task
triggered by RunTasks fails.
Makes the behaviour work the same as in production.

I considered writing to the `cancel` channel instead of the boolean
flag, but that would have triggered a second `task.onDone()` call
which might have produced unexpected side-effects.
This was previously only used to trigger the no-retry behaviour
for `RunTask`, and is therefore now always true. Tasks will
always be considered for retry after a failure response.
They are only not retried if the queue retry config
prevents it e.g. if they have reached the max retries.
@acoulton acoulton force-pushed the bug-run-task-not-clearing-tasks branch from 9762c38 to 0453b91 Compare September 13, 2021 08:42
@acoulton
Copy link
Contributor Author

@aertje and this is also now good to go, there's no dependency / conflict with #50 so both should be mergeable.

Copy link
Owner

@aertje aertje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @acoulton; I'm not quite sure how this resolves the underlying problem, apart from hiding the symptoms a bit more. Isn't a lock around the whole of Run and Attempt needed to ensure only the scheduled or runtask run is allowed to execute?

task.stateMutex.Lock()
defer task.stateMutex.Unlock()
task.isDone = true
task.onDone(task)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be included inside the lock?

task.queue.fire <- task
// It's possible the task might already be `done` if e.g. it was called by RunTask before the originally
// scheduled execution time. In that case this is a no-op, otherwise, run the task.
if !task.isDone {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get how this check avoids that problem - what if runtask is called at exactly this point?

@acoulton
Copy link
Contributor Author

Hi @acoulton; I'm not quite sure how this resolves the underlying problem, apart from hiding the symptoms a bit more. Isn't a lock around the whole of Run and Attempt needed to ensure only the scheduled or runtask run is allowed to execute?

That's a fair question, however I think that the production behaviour if the calls are actually concurrent is undefined / at-least-once. The docs say (my emphasis):

Forces a task to run now.
When this method is called, Cloud Tasks will dispatch the task, even if the task is already running, the queue has reached its RateLimits or is PAUSED.

The docs don't specify what result is stored if there are concurrent executions, my hunch is it will be removed if either is successful, or the last to complete if failure.

Equally, the major issue at the moment is that tasks run several minutes later (despite not being visible in ListTasks). This PR reliably solves that AFAIK.

There is still a small risk of duplicate execution at the exact time of the RunTask call, but IMO that is probably acceptable, since CloudTasks doesn't guarantee exactly-once delivery anyway so apps should already be able to cope with a duplicate despatch. The issue we have in CI is that by the time the originally scheduled task fires our suite has moved on so the database is no longer in valid state to handle that duplicate.

Let me see if I can do a bit more testing of exactly what production does for concurrent cases, and then I'll come back to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants