Improve relevance scoring in HTML search results (#12441)
Co-authored-by: Will Lachance
Co-authored-by: Bénédikt Tran
3 people authored Jul 11, 2024
1 parent e7beb8b commit 91c5cd3
Showing 8 changed files with 132 additions and 3 deletions.
4 changes: 4 additions & 0 deletions CHANGES.rst
@@ -112,6 +112,10 @@ Bugs fixed
* #12425: Use Docutils' SVG processing in the HTML builder
  and remove Sphinx's custom logic.
  Patch by Tunç Başar Köse.
* #12391: Adjust scoring of matches during HTML search so that document main
  titles tend to rank higher than subsection titles. In addition, boost matches
  on the name of programming domain objects relative to title/subtitle matches.
  Patch by James Addison and Will Lachance.

Testing
-------
5 changes: 3 additions & 2 deletions sphinx/themes/basic/static/searchtools.js
@@ -328,13 +328,14 @@ const Search = {
     for (const [title, foundTitles] of Object.entries(allTitles)) {
       if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) {
         for (const [file, id] of foundTitles) {
-          let score = Math.round(100 * queryLower.length / title.length)
+          const score = Math.round(Scorer.title * queryLower.length / title.length);
+          const boost = titles[file] === title ? 1 : 0;  // add a boost for document titles
           normalResults.push([
             docNames[file],
             titles[file] !== title ? `${titles[file]} > ${title}` : title,
             id !== null ? "#" + id : "",
             null,
-            score,
+            score + boost,
             filenames[file],
           ]);
         }
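
As a worked illustration of the new title scoring, the sketch below reproduces the arithmetic from this hunk outside of Sphinx. The helper name `titleScore` is hypothetical, and the value `Scorer.title = 15` is an assumption made for the example (the diff references `Scorer.title` but does not show its value). Under that assumption, an exact match on a document's main title scores 16, which agrees with the figure expected by the updated test in tests/js/searchtools.js, while the same match on a subsection heading scores 15.

```javascript
// Hypothetical stand-in for the Scorer object in searchtools.js.
// The value 15 is an assumption; it is not shown in this diff.
const Scorer = { title: 15 };

// Hypothetical helper mirroring the arithmetic in the hunk:
// scale the title weight by how much of the title the query covers,
// then add 1 when the matched title is the document's main title.
function titleScore(queryLower, title, isDocumentTitle) {
  const score = Math.round(Scorer.title * queryLower.length / title.length);
  const boost = isDocumentTitle ? 1 : 0;
  return score + boost;
}

const mainTitleScore = titleScore("relevance", "Relevance", true);
const subsectionScore = titleScore("relevance", "Relevance", false);

console.log(mainTitleScore, subsectionScore);
```

The boost is deliberately small: it only breaks ties between a main title and a subsection heading that match the query equally well, rather than reordering results whose base scores already differ.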
1 change: 1 addition & 0 deletions tests/js/fixtures/titles/searchindex.js

(Generated file; diff not rendered by default.)

6 changes: 6 additions & 0 deletions tests/js/roots/titles/conf.py
@@ -0,0 +1,6 @@
import os
import sys

sys.path.insert(0, os.path.abspath('.'))

extensions = ['sphinx.ext.autodoc']
20 changes: 20 additions & 0 deletions tests/js/roots/titles/index.rst
@@ -0,0 +1,20 @@
Main Page
=========

This is the main page of the ``titles`` test project.

In particular, this test project is intended to demonstrate how Sphinx
can handle scoring of query matches against document titles and subsection
heading titles relative to other document matches such as terms found within
document text and object names extracted from code.

Relevance
---------

In the context of search engines, we can say that a document is **relevant**
to a user's query when it contains information that seems likely to help them
find an answer to a question they're asking, or to improve their knowledge of
the subject area they're researching.

.. automodule:: relevance
   :members:
7 changes: 7 additions & 0 deletions tests/js/roots/titles/relevance.py
@@ -0,0 +1,7 @@
class Example:
    """Example class"""
    num_attribute = 5
    text_attribute = "string"

    relevance = "testing"
    """attribute docstring"""
13 changes: 13 additions & 0 deletions tests/js/roots/titles/relevance.rst
@@ -0,0 +1,13 @@
Relevance
=========

In some domains, it can be straightforward to determine whether a search result
is relevant to the user's query.

For example, if we are in a software programming language domain, and a user
has issued a query for the term ``printf``, then we could consider a document
in the corpus that describes a built-in language function with the same name
as (highly) relevant. A document that only happens to mention the ``printf``
function name as part of some example code that appears on the page would
also be relevant, but likely less relevant than the one that describes the
function itself in detail.
79 changes: 78 additions & 1 deletion tests/js/searchtools.js
@@ -7,6 +7,23 @@ describe('Basic html theme search', function() {
return req.responseText;
}

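function checkRanking(expectedRanking, results) {
  // Consume each expected [page, title, target] entry, in order, as it is
  // found among the results. Note that reverse() reorders `results` in place.
  let [nextExpected, ...remainingItems] = expectedRanking;

  for (const result of results.reverse()) {
    if (!nextExpected) break;

    let [expectedPage, expectedTitle, expectedTarget] = nextExpected;
    let [page, title, target] = result;

    if (page == expectedPage && title == expectedTitle && target == expectedTarget) {
      [nextExpected, ...remainingItems] = remainingItems;
    }
  }

  // Both checks are needed: remainingItems is already empty while the last
  // expected entry is still waiting to be matched.
  expect(remainingItems.length).toEqual(0);
  expect(nextExpected).toEqual(undefined);
}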

describe('terms search', function() {

it('should find "C++" when in index', function() {
@@ -76,7 +93,7 @@ describe('Basic html theme search', function() {
'Main Page',
'',
null,
- 100,
+ 16,
'index.rst'
]
];
@@ -85,6 +102,66 @@ describe('Basic html theme search', function() {

});

describe('search result ranking', function() {

/*
 * These tests should not prescribe a precise expected ordering of search
 * results; instead each test case should describe a single relevance rule
 * that helps users to locate relevant information efficiently.
 *
 * If you think that one of the rules seems to be poorly defined or is
 * limiting the potential for search algorithm improvements, please check
 * for existing discussion/bug reports related to it on GitHub[1] before
 * creating one yourself. Suggestions for possible improvements are also
 * welcome.
 *
 * [1] - https://github.com/sphinx-doc/sphinx.git/
 */

it('should score a code module match above a page-title match', function() {
  eval(loadFixture("titles/searchindex.js"));

  const expectedRanking = [
    ['index', 'relevance', '#module-relevance'],  /* py:module documentation */
    ['relevance', 'Relevance', ''],  /* main title */
  ];

  const searchParameters = Search._parseQuery('relevance');
  const results = Search._performSearch(...searchParameters);

  checkRanking(expectedRanking, results);
});
});

it('should score a main-title match above an object member match', function() {
  eval(loadFixture("titles/searchindex.js"));

  const expectedRanking = [
    ['relevance', 'Relevance', ''],  /* main title */
    ['index', 'relevance.Example.relevance', '#module-relevance'],  /* py:class attribute */
  ];

  const searchParameters = Search._parseQuery('relevance');
  const results = Search._performSearch(...searchParameters);

  checkRanking(expectedRanking, results);
});
});

it('should score a main-title match above a subheading-title match', function() {
  eval(loadFixture("titles/searchindex.js"));

  const expectedRanking = [
    ['relevance', 'Relevance', ''],  /* main title */
    ['index', 'Main Page > Relevance', '#relevance'],  /* subsection heading title */
  ];

  const searchParameters = Search._parseQuery('relevance');
  const results = Search._performSearch(...searchParameters);

  checkRanking(expectedRanking, results);
});
});

});

});

describe("htmlToText", function() {
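
The `checkRanking` helper added in this commit asserts relative order only: each expected entry must appear somewhere in the results, after the entry listed before it, with any number of unrelated results in between. A minimal standalone sketch of that idea follows. The function name `appearsInOrder` and the sample result rows (including their scores) are made up for illustration, and the sketch assumes its input is already ordered best-first, so it does not reverse the array as the helper in the diff does.

```javascript
// Hypothetical pure variant of the ordering check: returns true when every
// expected [page, title, target] triple occurs in `results` in the given order.
function appearsInOrder(expectedRanking, results) {
  let next = 0; // index of the expected entry we are currently looking for
  for (const [page, title, target] of results) {
    if (next === expectedRanking.length) break;
    const [expectedPage, expectedTitle, expectedTarget] = expectedRanking[next];
    if (page === expectedPage && title === expectedTitle && target === expectedTarget) {
      next += 1;
    }
  }
  return next === expectedRanking.length;
}

// Made-up result rows, ordered best-first for this example only.
const sampleResults = [
  ["index", "relevance", "#module-relevance", null, 26, "index.rst"],
  ["relevance", "Relevance", "", null, 16, "relevance.rst"],
  ["index", "Main Page > Relevance", "#relevance", null, 15, "index.rst"],
];

const mainTitle = ["relevance", "Relevance", ""];
const subheading = ["index", "Main Page > Relevance", "#relevance"];

const inOrder = appearsInOrder([mainTitle, subheading], sampleResults);
const swapped = appearsInOrder([subheading, mainTitle], sampleResults);

console.log(inOrder, swapped);
```

Checking a subsequence rather than exact positions keeps each test focused on one relevance rule, in line with the comment at the top of the 'search result ranking' block.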