Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support timeout based search request cancellation #817

Open
sohami opened this issue Jun 4, 2021 · 17 comments · May be fixed by #16681
Open

Support timeout based search request cancellation #817

sohami opened this issue Jun 4, 2021 · 17 comments · May be fixed by #16681
Labels
enhancement Enhancement or improvement to existing feature or request

Comments

@sohami
Copy link
Collaborator

sohami commented Jun 4, 2021

Is your feature request related to a problem? Please describe.
Currently the optional "timeout" parameter in the SearchRequest applies to the individual child shard level search requests not at the parent search request. The shard request to a node is sent in multiple batches based on "maxConcurrentRequestsPerNode" parameter. So if a search request results in sending N such batches, the parent request timeout will essentially be N*batchNumber. Also the timeout is only honored in query phase and not in Fetch phase. If there is a long running search for which client doesn't want to wait for the result anymore, they have to use the task API to cancel such request. In cases, when user doesn't initiate the cancellation the previous search will still be consuming the cluster resources until it completes.

Describe the solution you'd like
The proposal is to have a separate parameter in search request like "cancel_after_timeinterval" which can be set by the user both at request level and at cluster level. Based on this new parameter, after the timeout expiry the search requests will be cancelled automatically using the cancellation framework. This will help: 1) to reduce the wasted resource usage. 2) automatic cancellation mechanism for the search request, where client doesn't have to explicitly use the task API to cancel it.

Describe alternatives you've considered

  1. Use the existing timeout parameter in search request instead of new parameter. This may create confusion for current users because of change in behavior and may break certain client applications. Current timeout behavior is to return partial results, however with cancellation no results will be returned since fetch phase will error out for cancelled search requests.

Additional context
Add any other context or screenshots about the feature request here.

@sohami sohami added the enhancement Enhancement or improvement to existing feature or request label Jun 4, 2021
@Bukhtawar
Copy link
Collaborator

I guess we should also have a mechanism to support partial results we have gotten before the timeout unless
allow_partial_search_results is set to false

@sohami
Copy link
Collaborator Author

sohami commented Jul 2, 2021

@Bukhtawar: Cancellation mechanism will be helpful to terminate the workload in events when it is of no use to the client and it wants to stop the workload from consuming any more resources ASAP. If client submits a cancellation request for a task externally using task API then also the search request is failed in fetch phase irrespective of allow_partial_search_results flag. I would prefer to keep the behavior same in both the use cases.

To provide more context, the allow_partial_search_results flag is only honored in query phase after getting results (docId) from all or subset of the shards. If the search request is cancelled before fetch phase then even send of fetch request from coordinator to shards will be failed with task cancellation exception.

@AmiStrn
Copy link
Contributor

AmiStrn commented Jul 2, 2021

I was looking into a similar task recently, I support this requirement 100%. What do we need to get started on this? @sohami are you in the process of writing the code for this yet?

@sohami
Copy link
Collaborator Author

sohami commented Jul 2, 2021

@AmiStrn - yes. I was waiting on some initial feedback. Will raise the PR as soon as I am done

@sohami
Copy link
Collaborator Author

sohami commented Jul 19, 2021

@AmiStrn - I am dividing the change into 2 separate PRs. First one introduces the request level parameter support. Second one will consume the parameter to schedule cancellation task (WIP). Would be great if you can help review this.

@AmiStrn
Copy link
Contributor

AmiStrn commented Jul 19, 2021

Would be great if you can help review this.

Gladly:)

@Bukhtawar
Copy link
Collaborator

Bukhtawar commented Jul 19, 2021

@sohami I don't think size 0 aggregations have a fetch phase. Does partial results make sense here

I guess it should work out of the box since we plan on simply cancelling

@sohami
Copy link
Collaborator Author

sohami commented Jul 19, 2021

@Bukhtawar - Yes in cases when there is no fetch phase and some of the shards successfully completed the execution before cancellation, then partial results will be returned if allow_partial_search_results is set to true. But note that the cancellation on timeout is basically an intent from client that it is not waiting on any results anymore and just wants the request to terminate. In case client wants to get the partial results, then current search timeout parameter should be used. That will enforce timeout only in search phase and irrespective of query type with size >=0, it will return the partial results.

@AmiStrn
Copy link
Contributor

AmiStrn commented Jul 27, 2021

@dblock how do we get assignee's from the maintainers to review and approve PR's such as #986 ? (I had commented on the PR, but have not approved since I'm not a maintainer)

edit: didn't notice that there is quite some lag with the PRs :) Please let us know what to expect in terms of a timeline on this.

@dblock
Copy link
Member

dblock commented Jul 27, 2021

@dblock how do we get assignee's from the maintainers to review and approve PR's such as #986 ? (I had commented on the PR, but have not approved since I'm not a maintainer)
edit: didn't notice that there is quite some lag with the PRs :) Please let us know what to expect in terms of a timeline on this.

Can't promise an SLA rn, but we do also have some automation that is nagging maintainers that PRs are open for longer than we would like without action. For now, if you feel that no traction has been had on an issue, feel free to tag me and I'll go find someone to review.

@dblock
Copy link
Member

dblock commented Jul 27, 2021

edit: didn't notice that there is quite some lag with the PRs :) Please let us know what to expect in terms of a timeline on this.

While you're not a maintainer you did a solid review of that PR, so nobody (including myself) felt the need to jump in. Thank you. I will click buttons after the next iteration to get more of the tests to run, and take a closer look at the code as well if needed.

@dblock
Copy link
Member

dblock commented Aug 12, 2021

#986 implements much of this, leaving to @sohami to close when you feel like it does everything you want or close and open smaller issues for the max_-related business.

@kkewwei
Copy link
Contributor

kkewwei commented Apr 5, 2024

I guess we should also have a mechanism to support partial results we have gotten before the timeout unless allow_partial_search_results is set to false

@sohami @AmiStrn we also have the same need: OS should returns partial results after the timeout in coordinate node. The ideal situation is that the coordinate node returns partial results(if allow_partial_search_results is true) and send cancel request when timeout. If possible, I would like to try to implement it.

ritty27 pushed a commit to ritty27/OpenSearch that referenced this issue May 12, 2024
…pensearch-project#817)

* Bump org.owasp.dependencycheck from 9.0.8 to 9.0.9 in /java-client

Bumps org.owasp.dependencycheck from 9.0.8 to 9.0.9.

---
updated-dependencies:
- dependency-name: org.owasp.dependencycheck
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update changelog

Signed-off-by: dependabot[bot] <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
@kkewwei
Copy link
Contributor

kkewwei commented Jun 26, 2024

@sohami, @AmiStrn please have a look, whether it should be supported, I'm pleasure to implement it.

@AmiStrn
Copy link
Contributor

AmiStrn commented Jul 8, 2024

i think it should be supported. sorry i was not more clear about that, i did a 👍 on your commend in agreement with it. I think @sohami is the one to say if they are not going to work on this.
@sohami are you actively working on this task?

@kkewwei
Copy link
Contributor

kkewwei commented Sep 30, 2024

i think it should be supported. sorry i was not more clear about that, i did a 👍 on your commend in agreement with it. I think @sohami is the one to say if they are not going to work on this. @sohami are you actively working on this task?

@sohami, If you are not working on it, I will try to implement it?

@kkewwei
Copy link
Contributor

kkewwei commented Nov 19, 2024

@Bukhtawar @AmiStrn @sohami, please have a look when you are free. #16681

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request
Projects
None yet
5 participants