feat: improve late chunking and optimize pgvector settings #51

Merged
lsorber merged 8 commits into main from ls-increase-late-chunking-context
Dec 4, 2024

Conversation

lsorber
Member

@lsorber lsorber commented Nov 25, 2024

Changes:

  • Add a workaround for #24 (Improve late interaction/late chunking context window size) that increases the embedder's context size from 512 tokens to a user-definable size.
  • Increase the default embedder context size to 1024 tokens (larger sizes degrade bge-m3's performance).
  • Upgrade llama-cpp-python to the latest version.
  • More robust testing of rerankers with Kendall's rank correlation coefficient.
  • Optimise pgvector's settings.
  • Offer better control of oversampling in hybrid and vector search.
  • Upgrade to PostgreSQL 17.
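
For readers unfamiliar with the metric in the reranker testing item, here is a minimal sketch of Kendall's rank correlation coefficient (the tau-a variant, where ties count as neither concordant nor discordant). It is an illustration only, not the test code from this PR: the function name `kendall_tau` and the sample ranks are hypothetical, and the actual tests may use a library implementation or a different variant.

```python
from itertools import combinations


def kendall_tau(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y):
        raise ValueError("Both rankings must have the same length.")
    n = len(x)
    if n < 2:
        raise ValueError("At least two items are needed to compare rankings.")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order items i and j the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in at least one ranking; tau-a ignores it.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: expected ranks of four chunks vs. the ranks a reranker produced.
expected_ranks = [1, 2, 3, 4]
reranked_ranks = [1, 3, 2, 4]
tau = kendall_tau(expected_ranks, reranked_ranks)
```

A value of 1 means the reranker reproduces the expected order exactly, -1 means it reverses it, and values near 0 mean the two orders are unrelated. A test built on this would typically assert that tau exceeds some threshold rather than require an exact match, which tolerates small swaps between near-equal items.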

@lsorber lsorber self-assigned this Nov 25, 2024
@lsorber lsorber force-pushed the ls-increase-late-chunking-context branch from c0e4abc to 463b54e Compare December 3, 2024 19:56
@lsorber lsorber force-pushed the ls-increase-late-chunking-context branch from 463b54e to 60312bf Compare December 3, 2024 20:05
@lsorber lsorber changed the title feat: improve late chunking context size feat: improve late chunking and pgvector config Dec 3, 2024
@lsorber lsorber force-pushed the ls-increase-late-chunking-context branch from 414ee81 to fa15e3f Compare December 3, 2024 22:00
@lsorber lsorber changed the title feat: improve late chunking and pgvector config feat: improve late chunking and optimize pgvector settings Dec 4, 2024
@lsorber lsorber merged commit 2680b74 into main Dec 4, 2024
2 checks passed
@lsorber lsorber deleted the ls-increase-late-chunking-context branch December 4, 2024 16:28