Phrase Searching for Blacklight Search Results #6171

Open
2 tasks
joncameron opened this issue Jan 15, 2025 · 5 comments

Comments

@joncameron
Contributor

joncameron commented Jan 15, 2025

Description

Phrase searching is possible in Avalon's current configuration by enclosing a search phrase in quote characters, but the results also include records that do not appear to contain the phrase.

Phrase searching isn't working as expected for the metadata, section data, and transcripts that are searched as part of Blacklight results in MCO: a search matches on a substring of the phrase rather than on the exact phrase string.

"Found in" tokenization is a different case; this ticket only relates to the blacklight search and results, not the "Found In" count/report.

Feedback on search and default "OR":

When searching for a phrase on MCO, it appears to search for Term OR Term rather than Term AND Term. I did a search for Rich Searles (no quotes), then Richard Searles, and then tried "Richard Searles". All bring up lots of results not relevant to my need. In addition to doing an OR search, it's also truncating the search terms: the first hit for the 2nd search was https://media.dlib.indiana.edu/media_objects/1544bq372, which has Richard and then, elsewhere in the record, Searle (no "s" at the end). (The recording I actually wanted came up as the 3rd result.)

We'll need to look at the case above: deleting the transcript document for 1544bq372 in mco-staging removed it from the search results.
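
To make the feedback above concrete, here is a toy Python sketch (not Avalon or Solr code) of how a default OR operator combined with stemming can produce this kind of hit. The `crude_stem` function is a hypothetical stand-in for whatever stemming the real analyzer does, and the record text is made up for illustration.

```python
import re

def crude_stem(token):
    # Hypothetical stand-in for an analyzer's stemmer: drop one trailing "s".
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def analyze(text, stem=True):
    # Lowercase, split on non-word characters, optionally stem each token.
    tokens = re.findall(r"\w+", text.lower())
    return [crude_stem(t) for t in tokens] if stem else tokens

def matches(query, text, operator="OR", stem=True):
    # Term-level matching only: positions are ignored, so this is not a phrase match.
    doc_terms = set(analyze(text, stem))
    hits = [term in doc_terms for term in analyze(query, stem)]
    return any(hits) if operator == "OR" else all(hits)

# Made-up record: "Richard" in one place, "Searle" (no "s") elsewhere.
record = "Interview conducted by Richard Jones. Also mentions a Mr. Searle."

or_hit = matches("Richard Searles", record, operator="OR")
and_hit = matches("Richard Searles", record, operator="AND")
and_unstemmed_hit = matches("Richard Searles", record, operator="AND", stem=False)
```

Under these assumptions the record matches with OR and also with AND once stemming is applied; only the unstemmed AND rejects it. If the real index behaves similarly, switching the default operator alone would not remove this result, and the match would need to take term positions into account.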

To Reproduce

MCO

First example

avalon-dev

  • Search "venice edit" on avalon-dev
  • The first hit (https://avalon-dev.dlib.indiana.edu/media_objects/6d56zw687) is a genuine match: the phrase "venice edit" appears in the section data.
  • The Found In counts are definitely counting each individual term match, not phrases.
  • The second result (https://avalon-dev.dlib.indiana.edu/media_objects/0k225b04p) does not have the phrase "venice edit" anywhere in the metadata, section, or transcript. One of the terms, "venice", appears in one of the transcripts. The word "edit" does not appear anywhere in the metadata, section, or transcript, which is consistent with Found In reporting 1 hit in the transcript.

Done Looks Like

  • Query terms wrapped in quotes are treated as phrases
  • Results that match only substrings of the phrase in the metadata, section, or transcript are not returned
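
The acceptance criteria above can be sketched as a small Python check, useful for reasoning about expected results by hand. This is an illustration only: it assumes straight double quotes delimit phrases, ignores stemming and punctuation, and requires each quoted phrase to appear in full within a single field. The function names and sample records are hypothetical.

```python
import re

def quoted_phrases(query):
    # Text between straight double quotes is treated as one phrase.
    return [p for p in re.findall(r'"([^"]+)"', query) if re.search(r"\w", p)]

def tokens(text):
    return re.findall(r"\w+", text.lower())

def field_contains_phrase(phrase, text):
    # True only if the phrase tokens appear adjacent and in order.
    p, d = tokens(phrase), tokens(text)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

def record_satisfies(query, fields):
    # Every quoted phrase must occur in full in at least one field
    # (metadata, section, or transcript); a phrase split across fields fails.
    return all(
        any(field_contains_phrase(p, text) for text in fields.values())
        for p in quoted_phrases(query)
    )

# Made-up records for illustration.
exact = {"metadata": "Documentary", "section": "Venice edit, reel 2", "transcript": ""}
partial = {"metadata": "Documentary", "section": "", "transcript": "we arrived in Venice"}
split = {"metadata": "Venice", "section": "edit notes", "transcript": ""}

results = [record_satisfies('"venice edit"', r) for r in (exact, partial, split)]
```

Here only the first record would be returned; the other two contain at most one of the terms in any single field.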
@elynema changed the title from "Phrase Searching for Metadata" to "Phrase Searching for Blacklight Results" on Jan 29, 2025
@elynema
Contributor

elynema commented Jan 29, 2025

Resolving this may also affect searching on the collections page. In theory it uses the same API, so it would also pick up phrase searching.

@joncameron changed the title from "Phrase Searching for Blacklight Results" to "Phrase Searching for Metadata" on Jan 29, 2025
@elynema changed the title from "Phrase Searching for Metadata" to "Phrase Searching for Blacklight Search Results" on Jan 29, 2025
@elynema
Contributor

elynema commented Jan 29, 2025

I searched for “goodman, benny.” The second result contains "benny goodman" but not "goodman, benny". No transcripts are involved, which suggests that, at least in MCO, the metadata search is not respecting phrases.

It does look like MDPI (which is still running 7.7??) may be respecting phrase searching.

@joncameron
Contributor Author

An OR clause is added to the main query: if there are any "Found in" matches, the record as a whole is matched. We could redefine this main clause; the worst case would be a large reimplementation, but it may turn out not to matter.

@joncameron
Contributor Author

We could look at the search_builder model's search_section_transcripts method to possibly change this behavior.
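
As a rough illustration of the two comments above, the Python sketch below shows how OR-ing a per-term "Found in" clause onto the main query could let a record through even when the main clause respects the phrase. It is not the code in `search_section_transcripts`: the field names `metadata_field` and `transcript_field` and all helper functions are hypothetical, and only the Lucene-style syntax (`field:"a b"` for a phrase, `field:(a b)` for grouped terms) is assumed to carry over.

```python
import re

def per_term_clause(user_query, field):
    # Drops the quotes, so every word becomes an independent term.
    terms = user_query.replace('"', " ").split()
    return f"{field}:({' '.join(terms)})"

def phrase_aware_clause(user_query, field):
    # Keeps each quoted phrase together as a single quoted unit.
    parts = re.findall(r'"[^"]+"|\S+', user_query)
    return f"{field}:({' '.join(parts)})"

def combine(main_clause, found_in_clause):
    # A record matches if EITHER clause matches.
    return f"({main_clause}) OR ({found_in_clause})"

query = '"venice edit"'
main = phrase_aware_clause(query, "metadata_field")      # hypothetical field name
loose = per_term_clause(query, "transcript_field")       # hypothetical field name
strict = phrase_aware_clause(query, "transcript_field")

current_style = combine(main, loose)
phrase_style = combine(main, strict)
```

In this sketch `current_style` still accepts a transcript containing only "venice", while `phrase_style` requires the full phrase on both sides of the OR. Whether the real method builds its clause this way would need to be confirmed against the code.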

@elynema
Contributor

elynema commented Feb 7, 2025

@joncameron Was this moved back to Backlog because more examples are needed?
