Implement Job Scheduler logic #103

Open
wants to merge 53 commits into
base: master

Conversation

f-galland
Member

Description

This PR adds job scheduler logic to the command-manager plugin.

Issues Resolved

Resolves #87

@f-galland f-galland marked this pull request as ready for review November 5, 2024 15:20
@f-galland f-galland requested a review from a team as a code owner November 5, 2024 15:20
Member

@mcasas993 mcasas993 left a comment

LGTM

@mcasas993
Member

mcasas993 commented Nov 7, 2024

When I tested this branch with ./gradlew run or ./gradlew run --debug, the following error appeared in the terminal after a couple of minutes:

[2024-11-07T09:48:36,565][ERROR][o.o.a.s.TransportCreatePitAction] [integTest-0] PIT creation failed while updating PIT ID for indices [[.commands]]
[2024-11-07T09:48:36,565][ERROR][c.w.c.j.PointInTime      ] [integTest-0] all shards failed

This is the complete log file:
integTest.log

@f-galland
Member Author

An issue was observed where the plugin entered an infinite loop whenever API calls were received while outgoing HTTP requests were being performed.
The following logs were output in quick succession:

[2024-11-07T12:03:52,564][ERROR][c.w.c.j.PointInTime      ] [integTest-0] all shards failed
[2024-11-07T12:03:52,566][ERROR][o.o.a.s.TransportCreatePitAction] [integTest-0] PIT creation failed while updating PIT ID for indices [[.commands]]

I haven't been able to pinpoint the exact cause, but I took the following measures to try to fix it:

  • Made all the job-scheduler-related code synchronous by avoiding ActionListeners wherever possible.
    Where this was not possible (i.e. for methods such as client.createPit(), which only have an async version), I forced synchronous execution by blocking until the return value is obtained.
  • I also found that a PointInTimeBuilder was being instantiated per page, instead of once per job execution.
  • Lastly, the SearchJob class was a singleton, but there seems to be no need for this.
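
For reference, the "block until the return value is obtained" approach from the first point can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Listener` is a hypothetical stand-in that mimics the `onResponse`/`onFailure` shape of an ActionListener, `createPitAsync` is a hypothetical stand-in for an async-only call such as client.createPit(), and `await` is a made-up helper name. The timeout is an assumption added so that a call that never completes cannot block the job forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class BlockingCallSketch {

    /** Hypothetical stand-in for a callback interface shaped like ActionListener. */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Hypothetical async-only API; it answers on another thread, as a real client would. */
    static void createPitAsync(String index, Listener<String> listener) {
        new Thread(() -> listener.onResponse("pit-for-" + index)).start();
    }

    /**
     * Bridges a callback-style call to a blocking one.
     * The caller's thread waits until the listener fires or the timeout expires.
     */
    static <T> T await(Consumer<Listener<T>> asyncCall, long timeoutSeconds)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<T> future = new CompletableFuture<>();
        asyncCall.accept(new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        });
        // Blocks here; throws TimeoutException if no callback arrives in time.
        return future.get(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        String pitId = BlockingCallSketch.<String>await(
                l -> createPitAsync(".commands", l), 10);
        System.out.println(pitId);
    }
}
```

One caveat with this pattern: if the blocking call runs on the same thread pool that has to deliver the callback, it can starve that pool, so it is safer to run it from a dedicated job thread.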

With these changes, the problem no longer seems to occur. However, this introduces a new issue: we now need to handle the lifecycle of the SearchJob object. I will research how this is usually done in OpenSearch and report back.

Development

Successfully merging this pull request may close these issues.

Implement the job-scheduler logic
3 participants