Occasional Duplicate Runs of System Jobs #6030
Labels
Status: Confirmed
It's clear what the subject of the issue is about, and what the resolution should be.
Status: In Dev Queue
This issue is being worked on, and has someone assigned.
Type: Bug
Confirmed bugs or reports that are very likely to be bugs.
Description
We're seeing what appear to be occasional, random duplicate runs of system jobs. The screenshots below, taken from pre-alpha, show what seems to be the most common manifestation of this issue. In this case, the Update Persisted DataView job was launched twice at 2:24 pm. An apparent duplicate entry appears in the service job history log and was never updated from a running status, even after subsequent job runs completed successfully.
Duplicate job launches seem more common (and understandable) during an overlapped IIS app pool recycle. However, we're seeing this occur seemingly at random, unrelated to app pool recycles or Rock restarts. In the pre-alpha example shown above, Rock reported that it was last restarted at 2 am, while the duplicated job run occurred at 2:24 pm.
There doesn't seem to be a pattern to this (we've seen it affect multiple job types), but it seems more common with the most frequently run job types (such as persisted datasets and persisted data views).
A scenario we do see occasionally is multiple web worker processes running and causing all job runs to be duplicated. That usually seems to happen when an application pool is recycled and the old worker process doesn't seem to exit properly, though it happens very infrequently. That doesn't seem to be what's happening here, since we've only seen a single job appear to be affected at one time.
Actual Behavior
System jobs are occasionally launched multiple times, seemingly at random. One of the "duplicates" often appears to get stuck in the running state.
Expected Behavior
System jobs should launch only once per scheduled run.
Steps to Reproduce
I don't have concrete steps to reproduce this, since it seems to happen at random. I used the SQL query below to look for this across several Rock environments, found a handful of examples, and posted the example I found on pre-alpha above, which shows the pattern identified. The affected environments were on various point releases of v16, plus pre-alpha on v17 as described above.
This query looks in service job history for any examples of jobs with the same service job id starting within 10 seconds of each other within the last week.
Note that I couldn't find an example of this on Rock Solid Demo but did on pre-alpha, which is why I've shown that example above.
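As a rough illustration of the check described above (this is not the query that was originally run), the same logic can be sketched as a self-join on the job history table. The sketch below runs against an in-memory SQLite database so it is self-contained. The table and column names (`ServiceJobHistory`, `ServiceJobId`, `StartDateTime`) are assumptions about the schema, and the sample rows are made up for the demonstration.

```python
import sqlite3
from datetime import datetime, timedelta


def find_duplicate_starts(conn, now, window_seconds=10, lookback_days=7):
    """Return pairs of history rows for the same job whose start times
    are within window_seconds of each other, limited to the lookback window."""
    cutoff = (now - timedelta(days=lookback_days)).strftime("%Y-%m-%d %H:%M:%S")
    # Self-join: a.Id < b.Id ensures each pair is reported once and a row
    # is never paired with itself.
    query = """
        SELECT a.ServiceJobId, a.Id, b.Id, a.StartDateTime, b.StartDateTime
        FROM ServiceJobHistory AS a
        JOIN ServiceJobHistory AS b
          ON a.ServiceJobId = b.ServiceJobId
         AND a.Id < b.Id
         AND ABS(CAST(strftime('%s', a.StartDateTime) AS INTEGER)
               - CAST(strftime('%s', b.StartDateTime) AS INTEGER)) <= ?
        WHERE a.StartDateTime >= ? AND b.StartDateTime >= ?
        ORDER BY a.StartDateTime
    """
    return conn.execute(query, (window_seconds, cutoff, cutoff)).fetchall()


def build_demo_db():
    """Create an in-memory table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ServiceJobHistory ("
        "Id INTEGER PRIMARY KEY, ServiceJobId INTEGER, StartDateTime TEXT)"
    )
    rows = [
        (1, 10, "2024-05-01 14:24:00"),  # normal launch of job 10
        (2, 10, "2024-05-01 14:24:03"),  # second launch 3 seconds later
        (3, 10, "2024-05-01 14:39:00"),  # next scheduled run of job 10
        (4, 11, "2024-05-01 14:24:01"),  # different job, not a duplicate
    ]
    conn.executemany("INSERT INTO ServiceJobHistory VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for pair in find_duplicate_starts(demo, now=datetime(2024, 5, 2)):
        print(pair)
```

On SQL Server, the time comparison would more likely be written with `DATEDIFF(SECOND, a.StartDateTime, b.StartDateTime)` rather than SQLite's `strftime`, but the join conditions would stay the same.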
Issue Confirmation
Rock Version
v16.3
Client Culture Setting
en-US