Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional Duplicate Runs of System Jobs #6030

Open
2 tasks done
billdeitrick opened this issue Oct 4, 2024 · 0 comments
Open
2 tasks done

Occasional Duplicate Runs of System Jobs #6030

billdeitrick opened this issue Oct 4, 2024 · 0 comments
Labels
Status: Confirmed It's clear what the subject of the issue is about, and what the resolution should be. Status: In Dev Queue This issue is being worked on, and has someone assigned. Type: Bug Confirmed bugs or reports that are very likely to be bugs.

Comments

@billdeitrick
Copy link

billdeitrick commented Oct 4, 2024

Description

We're seeing what appear to be occasional and random duplicate runs of system jobs. The below screenshots, taken from pre-alpha, show what seems to be the most common manifestation of this issue. In this case, the Update Persisted DataView job was launched twice at 2:24 pm. An apparent duplicate entry appears in the service job history log and was never updated from a running status, even after subsequent job runs completed successfully.

2024-10-04_11-40-26

Duplicate job launches seem more common (and understandable) during an overlapped IIS app pool recycle. However, we're seeing this appear to occur at random, unrelated to app pool recycles or Rock restarts. In the case of the example shown above from pre-alpha, Rock reports last being restarted at 2 am when this occurred, and the duplicated job run occurred at 2:24 pm.

2024-10-04_11-45-15

There doesn't seem to be a pattern with this (we've seen it affecting multiple job types), but it seems more common with the very frequently processed job types (such as persisted datasets and persisted data views).

A scenario we do see occasionally is multiple web worker processes running and causing all job runs to be duplicated. That usually seems to happen when an application pool is recycled and the old worker process doesn't seem to exit properly, though it happens very infrequently. That doesn't seem to be what's happening here, since we've only seen a single job appear to be affected at one time.

Actual Behavior

System jobs occasionally launched multiple times, seemingly at random. One of the "duplicates" often appears to get stuck in the running state.

Expected Behavior

System jobs would only launch once when scheduled.

Steps to Reproduce

I don't have concrete steps to reproduce this, since it seems to happen at random. I used the SQL query below to look for this across several Rock environments, found a handful of examples, and posted the example I found on pre-alpha above showing pattern idenfied. Environments with examples were on different point releases of v16, including pre-alpha on v17 as described above.

This query looks in service job history for any examples of jobs with the same service job id starting within 10 seconds of each other within the last week.

SELECT
      [ServiceJobId]      = [sj].[Id]
    , [ServiceJobName]    = [sj].[Name]
    , [History1StartTime] = [sjh].[StartDateTime]
    , [History2StartTime] = [sjh2].[StartDateTime]
    , [History1Id]        = [sjh].[Id]
    , [History1Status]    = [sjh].[Status]
    , [History1Duration]  = DATEDIFF(MINUTE, [sjh].[StopDateTime], [sjh].[StartDateTime])
    , [History2Id]        = [sjh2].[Id]
    , [History2Status]    = [sjh2].[Status]
    , [History2Duration]  = DATEDIFF(MINUTE, [sjh].[StopDateTime], [sjh].[StartDateTime])
FROM
    [ServiceJobHistory] [sjh]
    JOIN [ServiceJobHistory] [sjh2]
            ON [sjh2].[ServiceJobId] = [sjh].[ServiceJobId] AND
               ABS(DATEDIFF(SECOND, [sjh].[StartDateTime], [sjh2].[StartDateTime])) <= 10 AND [sjh].[Id] != [sjh2].[Id]
    JOIN [ServiceJob] [sj]
            ON [sj].[Id] = [sjh].[ServiceJobId]
WHERE
    [sjh].[StartDateTime] >
    DATEADD(WEEK, -1, CAST(SYSDATETIMEOFFSET() AT TIME ZONE 'Eastern Standard Time' AS DATETIME))
ORDER BY
    [sjh].[ServiceJobId]
    , [sjh].[StartDateTime] DESC

Note that I couldn't find an example of this on Rock Solid Demo but did on pre-alpha, which is why I've shown that example above.

Issue Confirmation

  • Perform a search on the Github Issues to see if your bug or enhancement is already reported.
  • Reproduced the problem on a fresh install or on the demo site.

Rock Version

v16.3

Client Culture Setting

en-US

@sparkdevnetwork-service sparkdevnetwork-service added Status: Confirmed It's clear what the subject of the issue is about, and what the resolution should be. Type: Bug Confirmed bugs or reports that are very likely to be bugs. labels Oct 4, 2024
@sparkdevnetwork-service sparkdevnetwork-service added the Status: In Dev Queue This issue is being worked on, and has someone assigned. label Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Confirmed It's clear what the subject of the issue is about, and what the resolution should be. Status: In Dev Queue This issue is being worked on, and has someone assigned. Type: Bug Confirmed bugs or reports that are very likely to be bugs.
Projects
None yet
Development

No branches or pull requests

2 participants