fix: Split compound terms via VectorTermsService #2081

Merged
merged 1 commit into from
Oct 22, 2024

dustinbyrne
Contributor

@dustinbyrne dustinbyrne commented Oct 21, 2024

The LLM is now instructed to split compound words into individual search terms. This improves search quality for lowercase search terms, which cannot be split on word boundaries without a word list.

For example, given the input:

@vector-terms htmlconsumer phpclient llmconfiguration

The following vector terms are generated:

html consumer php client llm configuration +htmlconsumer +phpclient +llmconfiguration

This is preferred over the current behavior, which outputs the following:

htmlconsumer phpclient llmconfiguration +htmlconsumer +phpclient +llmconfiguration

Without proper casing, these terms cannot be split on word boundaries, which lowers the quality of search results. Previously, this meant that two inputs differing only in casing, such as HTTPServer and httpserver, were treated differently.
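To illustrate the casing disparity described above, here is a minimal sketch of a case-based word splitter. The function name `splitOnWordBoundary` and its regular expressions are assumptions made for illustration; this is not the code in VectorTermsService, and it only shows why a purely case-based split cannot separate an all-lowercase compound.

```typescript
// Hypothetical helper (not from the PR): split an identifier on
// case-based word boundaries and lowercase the resulting words.
function splitOnWordBoundary(term: string): string[] {
  return term
    // lowercase letter or digit followed by uppercase: "htmlConsumer" -> "html Consumer"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // acronym followed by a capitalized word: "HTTPServer" -> "HTTP Server"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    // treat whitespace, underscores, and hyphens as separators
    .split(/[\s_\-]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

console.log(splitOnWordBoundary("HTTPServer"));   // [ 'http', 'server' ]
console.log(splitOnWordBoundary("httpserver"));   // [ 'httpserver' ]
console.log(splitOnWordBoundary("htmlconsumer")); // [ 'htmlconsumer' ]
```

With casing present, the boundary is recoverable; without it, the compound stays whole. That is the gap this change closes by asking the LLM to perform the split instead.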

Resolves #2067

@dustinbyrne dustinbyrne changed the title from "Fix/llm split compound terms" to "fix: Split compound terms via VectorTermsService" Oct 21, 2024
@dustinbyrne dustinbyrne changed the base branch from main to feat/support-vertex-ai October 21, 2024 17:59
Base automatically changed from feat/support-vertex-ai to main October 22, 2024 12:17
@dustinbyrne dustinbyrne merged commit a831259 into main Oct 22, 2024
23 checks passed
@dustinbyrne dustinbyrne deleted the fix/llm-split-compound-terms branch October 22, 2024 20:21
@appland-release
Contributor

🎉 This PR is included in version @appland/navie-v1.34.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

Context search misses matches due to case sensitivity & camel case analysis