-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML Search: omit anchor reference from document titles in the search index. #12047
Changes from all commits
e951f65
5e05a27
d44dc11
a978110
ed47ab9
26d6e3c
5c75c81
3c557cd
b8177c5
d3a40d8
f794764
7c06fe9
e947f6b
9fef43b
ad3488e
dcdd743
30d626e
86fd24c
c7245d2
2857c7b
520a205
64387c4
b349764
b256267
964d179
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -71,6 +71,9 @@ def is_registered_term(index, keyword): | |||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
.. test that comments are not indexed: boson | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
another_title | ||||||||||||||||||||||||||||||||||||||
============= | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
test that non-comments are indexed: fermion | ||||||||||||||||||||||||||||||||||||||
''' | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
@@ -168,6 +171,10 @@ def test_IndexBuilder(): | |||||||||||||||||||||||||||||||||||||
'docname2_1': 'title2_1', 'docname2_2': 'title2_2'} | ||||||||||||||||||||||||||||||||||||||
assert index._filenames == {'docname1_1': 'filename1_1', 'docname1_2': 'filename1_2', | ||||||||||||||||||||||||||||||||||||||
'docname2_1': 'filename2_1', 'docname2_2': 'filename2_2'} | ||||||||||||||||||||||||||||||||||||||
# note: element iteration order (sort order) is important when the index | ||||||||||||||||||||||||||||||||||||||
# is frozen (serialized) during build -- however, the _mapping-related | ||||||||||||||||||||||||||||||||||||||
# dictionaries below may be iterated in arbitrary order by Python at | ||||||||||||||||||||||||||||||||||||||
# runtime. | ||||||||||||||||||||||||||||||||||||||
assert index._mapping == { | ||||||||||||||||||||||||||||||||||||||
'ar': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
'fermion': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
|
@@ -176,7 +183,10 @@ def test_IndexBuilder(): | |||||||||||||||||||||||||||||||||||||
'index': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
'test': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
assert index._title_mapping == {'section_titl': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}} | ||||||||||||||||||||||||||||||||||||||
assert index._title_mapping == { | ||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here, I see that the values are sets and thus can be unordered. Is it possible to actually have a more reliable guarantee on the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To check that I understand correctly: use an assertion that checks not only the contents of the title mapping, but also the ordering of the elements inside it? (makes sense to me, order can be significant for index data) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If my understanding there was correct: I'd prefer to handle that in a separate issue thread and PR, if that's OK. Or, if I misunderstood: let's make sure to resolve that first. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there is more than one occurrence of that, yes we could discuss it in another thread. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep; I think it would be applicable for both sphinx/sphinx/search/__init__.py Lines 447 to 464 in b70578f
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correction: any runtime (in-memory) differences in the ordering of objects within the I'm not aware of any other current bugs causing nondeterminism in our There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, pretty much. I'll be a bit verbose:
What you suggest about a comment makes sense... ...however.. perhaps I could attempt a separate PR to refactor the code so that sorting occurs in-place and in-memory, and to adjust the unit tests to assert on that. Those would provide stronger properties than a comment in the code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need for that. It's better that at runtime, the objects are represented in sets since they are more efficient. It's only for serialization that you need to sort to ensure reproducibility. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, agreed (storing the lists in-place for ordered iteration would probably add various types of overhead). I've pushed a commit to add some commentary as a docstring on the relevant method and also in the unit tests. |
||||||||||||||||||||||||||||||||||||||
'another_titl': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
jayaddison marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||||||||||
'section_titl': {'docname1_1', 'docname1_2', 'docname2_1', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
assert index._objtypes == {} | ||||||||||||||||||||||||||||||||||||||
assert index._objnames == {} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
@@ -196,8 +206,14 @@ def test_IndexBuilder(): | |||||||||||||||||||||||||||||||||||||
'non': [0, 1, 2, 3], | ||||||||||||||||||||||||||||||||||||||
'test': [0, 1, 2, 3]}, | ||||||||||||||||||||||||||||||||||||||
'titles': ('title1_1', 'title1_2', 'title2_1', 'title2_2'), | ||||||||||||||||||||||||||||||||||||||
'titleterms': {'section_titl': [0, 1, 2, 3]}, | ||||||||||||||||||||||||||||||||||||||
'alltitles': {'section_title': [(0, 'section-title'), (1, 'section-title'), (2, 'section-title'), (3, 'section-title')]}, | ||||||||||||||||||||||||||||||||||||||
'titleterms': { | ||||||||||||||||||||||||||||||||||||||
'another_titl': [0, 1, 2, 3], | ||||||||||||||||||||||||||||||||||||||
'section_titl': [0, 1, 2, 3], | ||||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||||
'alltitles': { | ||||||||||||||||||||||||||||||||||||||
'another_title': [(0, 'another-title'), (1, 'another-title'), (2, 'another-title'), (3, 'another-title')], | ||||||||||||||||||||||||||||||||||||||
'section_title': [(0, None), (1, None), (2, None), (3, None)], | ||||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||||
'indexentries': {}, | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
assert index._objtypes == {('dummy1', 'objtype1'): 0, ('dummy2', 'objtype1'): 1} | ||||||||||||||||||||||||||||||||||||||
|
@@ -238,7 +254,10 @@ def test_IndexBuilder(): | |||||||||||||||||||||||||||||||||||||
'index': {'docname1_2', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
'test': {'docname1_2', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
assert index._title_mapping == {'section_titl': {'docname1_2', 'docname2_2'}} | ||||||||||||||||||||||||||||||||||||||
assert index._title_mapping == { | ||||||||||||||||||||||||||||||||||||||
'another_titl': {'docname1_2', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
'section_titl': {'docname1_2', 'docname2_2'}, | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
assert index._objtypes == {('dummy1', 'objtype1'): 0, ('dummy2', 'objtype1'): 1} | ||||||||||||||||||||||||||||||||||||||
assert index._objnames == {0: ('dummy1', 'objtype1', 'objtype1'), 1: ('dummy2', 'objtype1', 'objtype1')} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
@@ -257,8 +276,14 @@ def test_IndexBuilder(): | |||||||||||||||||||||||||||||||||||||
'non': [0, 1], | ||||||||||||||||||||||||||||||||||||||
'test': [0, 1]}, | ||||||||||||||||||||||||||||||||||||||
'titles': ('title1_2', 'title2_2'), | ||||||||||||||||||||||||||||||||||||||
'titleterms': {'section_titl': [0, 1]}, | ||||||||||||||||||||||||||||||||||||||
'alltitles': {'section_title': [(0, 'section-title'), (1, 'section-title')]}, | ||||||||||||||||||||||||||||||||||||||
'titleterms': { | ||||||||||||||||||||||||||||||||||||||
'another_titl': [0, 1], | ||||||||||||||||||||||||||||||||||||||
'section_titl': [0, 1], | ||||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||||
'alltitles': { | ||||||||||||||||||||||||||||||||||||||
'another_title': [(0, 'another-title'), (1, 'another-title')], | ||||||||||||||||||||||||||||||||||||||
'section_title': [(0, None), (1, None)], | ||||||||||||||||||||||||||||||||||||||
}, | ||||||||||||||||||||||||||||||||||||||
'indexentries': {}, | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
assert index._objtypes == {('dummy1', 'objtype1'): 0, ('dummy2', 'objtype1'): 1} | ||||||||||||||||||||||||||||||||||||||
|
@@ -347,7 +372,8 @@ def assert_is_sorted(item, path: str): | |||||||||||||||||||||||||||||||||||||
assert_is_sorted(value, f'{path}.{key}') | ||||||||||||||||||||||||||||||||||||||
elif isinstance(item, list): | ||||||||||||||||||||||||||||||||||||||
if not is_title_tuple_type(item) and path not in lists_not_to_sort: | ||||||||||||||||||||||||||||||||||||||
assert item == sorted(item), f'{err_path} is not sorted' | ||||||||||||||||||||||||||||||||||||||
# sort nulls last; http://stackoverflow.com/questions/19868767/ | ||||||||||||||||||||||||||||||||||||||
assert item == sorted(item, key=lambda x: (x is None, x)), f'{err_path} is not sorted' | ||||||||||||||||||||||||||||||||||||||
for i, child in enumerate(item): | ||||||||||||||||||||||||||||||||||||||
assert_is_sorted(child, f'{path}[{i}]') | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the shape of this value, i.e., to what corresponds
list[tuple[int, str | None]
(what is the integer, what is thestr | None
and) and what is the keystr
of thedict
, and why does it change from theall_titles
? (the list has pairs of int and strings and not pairs of strings anymore).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mm, this is not very clear at the moment, but I believe that the integer values here are what I'd call
documentId
values - sequential IDs assigned to each of the source.rst
files at build-time. The second value in each tuple -- thestr
or empty value -- is a target reference (akaanchor
when talking about HTML) -- it's an indicator about what destination on the page/document the user should be taken to were they to navigate using this entry.Perhaps this is a situation where we should refactor and extract out meaningful type aliases (a bit like the recent Intersphinx refactorings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use aliases, but the
int
always seem to me like corresponding to the index in the list, so that you could possibly have multiple times the same element but with different indices. However, I'd be happy to know if this is really the reason why we duplicate that or if we could just live with a list of strings.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess an example to exlain that could be a title like
Recent Changes
that could appear in multiple documents and might have different headings (or perhaps even be the document title in some cases).alltitles: {"Recent Changes": [0, None], [1, "coolmodule-recentchanges"], [5, "anothermodule-recentchanges"], ...}
(a bit hypothetical, but it's to handle situations like that I believe. again, a good case for checking and documenting)