-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coordinator can return partial results after the timeout when allow_partial_search_results is true #16681
base: main
Are you sure you want to change the base?
Conversation
3b9454f
to
27eed03
Compare
27eed03
to
61d84d1
Compare
❌ Gradle check result for 61d84d1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
long leftTimeMills; | ||
if (queryPhase) { | ||
// it's costly in query phase. | ||
leftTimeMills = task.queryPhaseTimeout() - (System.currentTimeMillis() - task.startTimeMills()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation behind the queryPhaseTimeoutPercentage
concept? I think it's going to depend on the query and the setup whether query or fetch phase takes longer and it doesn't seem super intuitive for a user to understand how to use this. For example a query that matches a lot of sparse documents using searchable snapshots might spend much longer in the fetch phase while a query that performs complex aggregations might spend a lot longer in the query phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope to reserve some time for the subsequent phase as a backup measure, to ensure each stage can be allocated a certain amount of time. Of course, if the previous stage takes a very short time, it won't affect the remaining time available for the subsequent phases either.
If no such reservation is made, and a shard is blocked in query phase and uses up all the time, even if it returns after the timeout, there won't be any executable time left for the subsequent stages, and the timeout would be meaningless in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm should we have separate timeouts for the coordinator and the shard level search tasks then? I still think it's pretty unintuitive to use a % like this.
61d84d1
to
5172db6
Compare
❌ Gradle check result for 5172db6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
5172db6
to
5638e3c
Compare
❌ Gradle check result for 5638e3c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 17cef4f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
17cef4f
to
e3dda9a
Compare
…artial_search_results is true Signed-off-by: kkewwei <[email protected]> Signed-off-by: kkewwei <[email protected]>
e3dda9a
to
f2cb9f7
Compare
❌ Gradle check result for f2cb9f7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
In query phase, the coordinate concurrently search each shard, If any shard is blocked or responds very slowly, the coordination node will be stuck even if the timeout is set.
The pr supports timeout waiting, if the timeout is exceeded, the coordinator considers the shard as failed and gos on the fetch phase.
Related Issues
Resolves #817 (comment)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.