-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve relevance scoring for titles and object-name matches in search results #12441
Improve relevance scoring for titles and object-name matches in search results #12441
Conversation
Related to sphinx-doc#12391. Intended as a proof of concept, not a full solution.
Commit cb0f6e7 -- regenerating a search index file from scratch -- seems to be have been necessary because I had a stale That seems like a bug; re-using an existing |
…earch-scoring-adjustment Conflicts: tests/js/searchtools.js (no edit conflict; unit test failure)
* Minimize diff relative to mainline codebase to ease review/historic viewing. * Move updated main-title score variable into a ``Scorer`` constant.
I've a slight preference for #12047 to be merged before this, to make the code and diff history easier to follow, if+when either of them are considered ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Some minor optional comments.
This should be ready for further review / merge; I've no changes planned on this branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of extra comments but in general this LGTM. I think this will be a nice incremental improvement. @picnixz may you could take a look too (and merge when you think it ready)?
let score = Math.round(Scorer.title * queryLower.length / title.length); | ||
let boost = titles[file] === title ? 1 : 0; // add a boost for document titles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look, could these be declared as const
?
let score = Math.round(Scorer.title * queryLower.length / title.length); | |
let boost = titles[file] === title ? 1 : 0; // add a boost for document titles | |
const score = Math.round(Scorer.title * queryLower.length / title.length); | |
const boost = titles[file] === title ? 1 : 0; // add a small boost for document titles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thinkg it's better using a const as well. But on a second thought, I'm wondering whether a +1 is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously a title and subsection title with the same text would have equal scores, leaving their relative ranking undefined.
Any positive value here should have the effect of elevating the main-document titles above same-named subsection titles in the search results.
A single-integer increment is used because ideally we don't want the main document titles to move up in the rankings 'too much' and overtake other matches. That is possible, though, especially given that some scores are fractional. So I have the opposite worry: that +1 might be too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a good way to figure these out could be to develop counterexamples and add test cases for them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?
let score = Math.round(Scorer.title * queryLower.length / title.length); | ||
let boost = titles[file] === title ? 1 : 0; // add a boost for document titles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thinkg it's better using a const as well. But on a second thought, I'm wondering whether a +1 is sufficient.
Yep, the thinking here was to replicate the |
This PR uses a similar approach to what was described/shown in #12393 (comment) (edit: original link was wrong) so it should. However, it would be good to another test before landing to be sure. I tested there by checking out the cpython repository and regenerating the |
Co-authored-by: Will Lachance <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: Will Lachance <[email protected]>
… search relevance tests.
@jayaddison are you happy with this // ready to review & merge? A |
@AA-Turner yep, I think this is ready. |
Thanks all! A |
Thank you @AA-Turner! |
Feature or Bugfix
Purpose
Detail
Relates