
fix(mongodb): remove embeddings from top_n lookup #115

Merged: 6 commits merged into main from fix/mongodb-embeddings-fix on Nov 22, 2024

@0xMochan (Contributor) commented Nov 18, 2024

Removes the embeddings field from the `top_n` lookup results to slim down the response structure. Also adds a `DocumentResponse` type, which omits the embeddings field, for use in `top_n` responses.
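
A minimal std-only sketch of the idea, assuming a hypothetical stored document with `id`, `content`, and `embedding` fields. The PR does not show the actual fields of `DocumentResponse`, and the real type is deserialized with serde rather than converted by hand; this only illustrates "same document, minus the embedding":

```rust
/// Hypothetical shape of a document as stored in the collection,
/// including its embedding vector. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredDocument {
    pub id: String,
    pub content: String,
    pub embedding: Vec<f64>,
}

/// Response type for `top_n`: the stored document without the embedding.
/// Field names are assumptions, not rig-mongodb's actual definition.
#[derive(Debug, Clone, PartialEq)]
pub struct DocumentResponse {
    pub id: String,
    pub content: String,
}

impl From<StoredDocument> for DocumentResponse {
    fn from(doc: StoredDocument) -> Self {
        // The (potentially large) embedding vector is simply dropped here.
        DocumentResponse {
            id: doc.id,
            content: doc.content,
        }
    }
}
```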

Additionally, this PR removes the hardcoded `embeddings.vec` path and instead looks up the embedded field dynamically from the vector search index, which also properly verifies during construction that the field exists.

Implementation

We remove the `embeddings.vec` field with an aggregation pipeline stage that runs after the vector search. I also noticed that `embeddings.vec` is hardcoded in the MongoDB implementation. Rather than abstract that in this PR, I reused the same constant for the filtering.
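
A rough sketch of the resulting pipeline shape, written as plain JSON text so it needs only the standard library. The stage names (`$vectorSearch`, `$addFields` with the `vectorSearchScore` meta field, `$project`) are standard MongoDB Atlas aggregation syntax, but the function name `top_n_pipeline` and the `numCandidates = 10 * n` heuristic are assumptions for illustration, not rig-mongodb's code. Real code would typically build these stages with the `bson` crate's `doc!` macro instead of string formatting.

```rust
/// Build a `top_n`-style aggregation pipeline as JSON text.
/// Hypothetical helper: the stage layout follows MongoDB Atlas syntax, but the
/// function itself is an illustration, not rig-mongodb's implementation.
pub fn top_n_pipeline(index: &str, embedded_field: &str, query: &[f64], n: usize) -> String {
    // Serialize the query embedding as the body of a JSON array.
    let query_vector = query
        .iter()
        .map(|x| x.to_string())
        .collect::<Vec<_>>()
        .join(",");
    // Assumed heuristic: consider 10x as many candidates as results requested.
    let candidates = n * 10;

    // Stage 1: approximate nearest-neighbour search over the embedded field.
    let search = format!(
        r#"{{"$vectorSearch":{{"index":"{index}","path":"{embedded_field}","queryVector":[{query_vector}],"numCandidates":{candidates},"limit":{n}}}}}"#
    );
    // Stage 2: expose the similarity score on each result document.
    let score = r#"{"$addFields":{"score":{"$meta":"vectorSearchScore"}}}"#;
    // Stage 3: exclude the embedding vector so it is not returned to the client.
    let project = format!(r#"{{"$project":{{"{embedded_field}":0}}}}"#);

    format!("[{search},{score},{project}]")
}
```

Because the exclusion runs as the last pipeline stage on the server, the embedding vectors are dropped before results are returned to the client.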

Update: We now call MongoDB to list all search indexes. Only one index with one indexed embedded field is currently handled, but this could be extended in the future if needed.
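
The lookup can be sketched as follows, assuming the index definitions have already been fetched (for example with MongoDB's `$listSearchIndexes` aggregation stage) and reduced to simple structs. The types `SearchIndex`, `IndexField`, `LookupError`, and the function `embedded_field_path` are hypothetical. The sketch mirrors the stated limitation by accepting exactly one vector field in the named index and returning an error otherwise, so a missing or misconfigured index surfaces at construction time rather than at query time.

```rust
/// Simplified model of one field entry in a vector search index definition.
#[derive(Debug, Clone)]
pub struct IndexField {
    pub field_type: String, // e.g. "vector" or "filter"
    pub path: String,       // dotted path, e.g. "embeddings.vec"
}

/// Simplified model of one search index returned by the server.
#[derive(Debug, Clone)]
pub struct SearchIndex {
    pub name: String,
    pub fields: Vec<IndexField>,
}

#[derive(Debug, PartialEq)]
pub enum LookupError {
    IndexNotFound(String),
    NoVectorField,
    MultipleVectorFields(usize),
}

/// Find the named index and return the path of its single vector field.
pub fn embedded_field_path(
    indexes: &[SearchIndex],
    index_name: &str,
) -> Result<String, LookupError> {
    // Fail early if the configured index does not exist.
    let index = indexes
        .iter()
        .find(|idx| idx.name == index_name)
        .ok_or_else(|| LookupError::IndexNotFound(index_name.to_string()))?;

    // Keep only vector fields; filter fields are not embeddings.
    let vector_fields: Vec<&IndexField> = index
        .fields
        .iter()
        .filter(|f| f.field_type == "vector")
        .collect();

    // Only the single-embedded-field case is supported.
    match vector_fields.as_slice() {
        [] => Err(LookupError::NoVectorField),
        [field] => Ok(field.path.clone()),
        many => Err(LookupError::MultipleVectorFields(many.len())),
    }
}
```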

@0xMochan requested a review from @cvauclair November 19, 2024 01:27
Review comment on rig-mongodb/src/lib.rs (outdated, resolved)
Review comment on rig-mongodb/src/lib.rs (outdated, resolved)
Review comment on rig-core/src/vector_store/mod.rs (outdated, resolved)
@0xMochan requested a review from @cvauclair November 20, 2024 21:52
@cvauclair merged commit b55075e into main Nov 22, 2024
4 checks passed
This was referenced Nov 20, 2024
@0xMochan deleted the fix/mongodb-embeddings-fix branch November 22, 2024 19:32
mateobelanger added a commit that referenced this pull request Dec 2, 2024
* fix: exclude embedding properties from top_n node query

* refactor: more ergonomic index creation

* docs(neo4j): update examples

* fix: unused import in example

* feat(provider): xAI (grok) integration (#106)

* feat(xai): initial xai (grok) implementation

* fix(xai): renamings + tests

* style(xai): Update rig-core/src/providers/xai/client.rs

Co-authored-by: Mathieu Bélanger

* style(xai): adds various comments and README improvements

* fix(xai): add some print statements to the grok example

* docs(xai): fix readme

---------

Co-authored-by: Mathieu Bélanger

* fix(rig-mongodb): remove embeddings from `top_n` lookup (#115)

* fix(mongodb): remove embeddings from `top_n` lookup

* fix(mongodb): filter embeddings within agg pipeline

* style(mongodb): clippy moment

* fix(mongodb): dynamically get embedded fields from mongodb

* fix(mongodb): apply fixes from comments

* style(mongodb): fmt

* docs(readme): add perplexity logo to integrations (#112)

* docs(readme): add perplexity logo to integrations
* fix: perplexity logo size

* fix(readme): perplexity logo size

* feat: embeddings API overhaul (#120)

* feat: setup derive macro

* test: test out writing embeddable macro

* test: continue testing custom macro implementation

* feat: macro generate trait bounds

* refactor: split up macro into multiple files

* refactor: move macro derive crate inside rig-core

* feat: replace embedding logic with new embeddable trait and macro

* refactor: refactor rag examples, delete document embedding struct

* feat: remove document embedding from in memory store

* refactor: remove DocumentEmbeddings from in memory vector store

* refactor(examples): combine vector store with vector store index

* docs: add and update docstrings

* fix(examples): fix bugs in examples

* style: cargo fmt

* revert: revert vector store to main

* docs: update embeddings builder docstrings

* refactor: derive macro

* tests: add unit tests on in memory store

* fix(ci): asterisk on pull requests to accommodate epic branches

* fix(ci): double asterisk

* feat: add error type on embeddable trait

* refactor: move embeddings to its own module and separate embeddable

* refactor: split up macro into more files, fix all imports

* fix: revert logging change

* feat: handle tools with embeddingsbuilder

* bug(macro): fix error when embed tags missing

* style: cargo fmt

* fix(tests): clippy

* docs&revert: revert embeddable trait error type, add docstrings

* style: cargo clippy

* clippy(lancedb): fix unused function error

* fix(test): remove useless assert false statement

* cleanup: split up branch into 2 branches for readability

* cleanup: revert certain changes during branch split

* docs: revert doc string

* fix: add embedding_docs to embeddable tool

* refactor: use OneOrMany in Embeddable trait, make derive macro crate feature flag

* tests: add some more tests

* clippy: cargo clippy

* docs: add docstring to oneormany

* fix(macro): update error handling

* refactor: reexport EmbeddingsBuilder in rig and update imports

* feat: implement IntoIterator and Iterator for OneOrMany

* refactor: rename from methods

* tests: fix failing tests

* refactor&fix: make PR review changes

* fix: fix tests failing

* test: add test on OneOrMany

* style: cargo fmt

* docs&fix: fix doc strings, implement iter_mut for OneOrMany

* fix: update borrow and owning of macro

* clippy: add back print statements

* fix: fix issues caused by merge of derive macro branch

* fix: fix cargo toml of lancedb and mongodb

* refactor: use thiserror for OneOrMany::EmptyListError

* feat: add OneOrMany to in memory vector store

* style: cargo fmt

* fix: update embeddingsbuilder import path

* tests: add tests for embeddingsbuilder

* clippy: add is empty method

* fix: add feature flag to examples in mongodb and lancedb crates

* fix: move lancedb fixtures into its own file

* fix: add dummy main function in fixtures.rs for compiler

* fix: revert fixture file, remove fixtures from cargo toml examples

* fix: update fixture import in lancedb examples

* refactor: rename D to T in embeddingsbuilder generics

* refactor: remove clone

* PR: update builder, docstrings, and std::marker tags

* style: replace add with push

* fix: fix mongodb example

* fix: update lancedb and mongodb doc example

* fix: typo

* docs: add and fix docstrings and examples

* docs: add more doc tests

* feat: rename Embeddable trait to ExtractEmbeddingFields

* feat: rename macro files, cargo fmt

* PR: update docstrings, update `add_documents_with_id` function

* doc: fix doc linting

* misc: fmt

* test: fix test

* refactor(embeddings): embed trait definition (#89)

* refactor: Big refactor

* refactor: refactor Embed trait, fix all imports, rename files, fix macro

* fix(embed trait): fix errors while testing

* fix(lancedb): examples

* docs: fix hyperlink

* fmt: cargo fmt

* PR: make requested changes

* fix: change visibility of struct field

* fix: failing tests

---------

Co-authored-by: Christophe

* fix/docs: fix errors from merge, cleanup embeddings docstrings

* fix: cargo clippy in examples

* Feat: small improvements + fixes + tests (#128)

* docs: Make examples+docstrings a bit more realistic

* feat: Add Embed implementation for &impl Embed

* test: Reorganize tests

* misc: Add `derive` feature to `all` feature flag

* test: Fix dead code warning

* test: Improve embed macro tests

* test: Add additional embed macro test

* docs: Add logging output to rag example

* docs: Fix logging output in tools example

* feat: Improve token usage log messages

* test: Small changes to embedding builder tests

* style: cargo fmt

* fix: Clippy + docstrings

* docs: Fix docstring

* test: Fix test

* style: Small renaming for consistency

* docs: Improve docstrings

* style: fmt

* fix: `TextEmbedder::embed` visibility

* docs: Simplified the `EmbeddingsBuilder` docstring example to focus on the builder

* style: cargo fmt

* docs: Small edit to lancedb examples

---------

Co-authored-by: cvauclair

* misc: Add `rig-derive` missing manifest fields (#129)

* feat: Improve `InMemoryVectorStore` API (#130)

* feat: Improve `InMemoryVectorStore` API

* style: clippy+fmt

* test: fix test

* fix: remove unused module (#132)

* fix: exclude embedding properties from top_n node query

* refactor: more ergonomic index creation

* docs(neo4j): update examples

* fix: unused import in example

* fix(example): remove embedding field from Deserialization type

---------

Co-authored-by: Mochan
Co-authored-by: Garance Buricatu
Co-authored-by: cvauclair
@github-actions (bot) mentioned this pull request Dec 3, 2024

2 participants