Feat(vector store): lancedb #33
@0xMochan @garance-buricatu I would like your opinion on something here. This PR adds an associated type, SearchParams, to the VectorStoreIndex trait:

// in rig-core/src/vector_store/mod.rs
pub trait VectorStoreIndex: Send + Sync {
    type SearchParams: for<'a> Deserialize<'a> + Send + Sync; // <-------- This new type

    fn top_n_from_query(
        &self,
        query: &str,
        n: usize,
        search_params: Self::SearchParams, // <--------- used here
    ) -> impl std::future::Future<Output = Result<Vec<(f64, DocumentEmbeddings)>, VectorStoreError>> + Send;

    // ...
}

However, a downstream effect is that you now need to pass an additional argument to dynamic_context:

// in rig-core/examples/rag.rs
let rag_agent = openai_client.agent("gpt-4")
    .preamble("
        You are a dictionary assistant here to assist the user in understanding the meaning of words.
        You will find additional non-standard word definitions that could be useful below.
    ")
    .dynamic_context(1, index, "".to_string()) // <---- new empty string argument
    .build();

This feels a little awkward, even though it "makes sense". Another approach would be to push whatever search/index parameters are needed into the struct that implements the trait:

pub struct LanceDbVectorStore<M: EmbeddingModel> {
    model: M,
    document_table: lancedb::Table,
    embedding_table: lancedb::Table,
    params: .... // <------- params would go inside the index object
}

This also has the advantage of making the
Interesting. For context, can you provide an example of how the search params can be used?
Pro for new approach:
Con for new approach:
In LanceDB, search params can be used to define the distance type (cosine, L2, dot, ...), the search type (approximate nearest neighbor, exact nearest neighbor), filters, and other fine-tuning options on the search index.
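As a rough illustration of the options listed above, the parameters could be grouped into one struct of optional fields. This is only a sketch: the struct, enum, and method names below are hypothetical stand-ins chosen to mirror the names used in this thread, not the types that LanceDB or this PR actually define.

```rust
/// Hypothetical distance metrics (illustrative, not LanceDB's own enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DistanceType {
    Cosine,
    L2,
    Dot,
}

/// Hypothetical search modes: exact (flat) or approximate nearest neighbor.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SearchType {
    Flat,
    Approximate,
}

/// Hypothetical bundle of search options. Every field is optional, so a
/// field left as `None` means "do not override whatever the backend does".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SearchParams {
    pub distance_type: Option<DistanceType>,
    pub search_type: Option<SearchType>,
    pub nprobes: Option<usize>,
    pub refine_factor: Option<u32>,
    pub filter: Option<String>,
}

impl SearchParams {
    /// Sets the distance metric.
    pub fn distance_type(mut self, distance_type: DistanceType) -> Self {
        self.distance_type = Some(distance_type);
        self
    }

    /// Sets the search mode.
    pub fn search_type(mut self, search_type: SearchType) -> Self {
        self.search_type = Some(search_type);
        self
    }

    /// Sets the number of partitions to probe (approximate search only).
    pub fn nprobes(mut self, nprobes: usize) -> Self {
        self.nprobes = Some(nprobes);
        self
    }
}
```

A caller would start from `SearchParams::default()` and chain only the setters it needs, which keeps the "static once tuned" configuration in one place.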
I think in the context of vector search, this is fine since these settings will most likely be static once the user has found the configuration that works for their RAG system. Plus, although these methods are public, users would usually interact with
I do think it might be useful to change it on the fly, or if the user of an application wants to adjust this based on a UI / TUI. But you can always reconstruct an agent so it may not be that big of a deal. There can always be an extra search command to use a specific method of querying that overrides the default (so you choose a default when you create the
@0xMochan @cvauclair thanks guys! I will implement both: moving the search params into the vector store struct, and adding an extra override method on the vector search trait.
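The combined plan (defaults stored in the store, plus a per-call override) can be sketched as follows. This is a simplified, synchronous toy under stated assumptions: the real trait in this PR is async and returns scored DocumentEmbeddings, and the names `ToyIndex` and `top_n_from_query_with` are hypothetical, not taken from the PR.

```rust
/// Hypothetical search options; only a substring filter, to keep the toy small.
#[derive(Debug, Clone, Default)]
pub struct SearchParams {
    pub filter: Option<String>,
}

/// Simplified stand-in for the trait discussed above (sync, returns strings).
pub trait VectorStoreIndex {
    type SearchParams;

    /// Searches with the defaults stored inside the index.
    fn top_n_from_query(&self, query: &str, n: usize) -> Vec<String>;

    /// Searches with explicitly supplied params, overriding the defaults.
    fn top_n_from_query_with(
        &self,
        query: &str,
        n: usize,
        params: &Self::SearchParams,
    ) -> Vec<String>;
}

/// Toy in-memory index: "similarity" is plain substring matching.
pub struct ToyIndex {
    pub docs: Vec<String>,
    pub params: SearchParams, // defaults chosen when the index is built
}

impl VectorStoreIndex for ToyIndex {
    type SearchParams = SearchParams;

    fn top_n_from_query(&self, query: &str, n: usize) -> Vec<String> {
        // Delegate to the override method using the stored defaults.
        self.top_n_from_query_with(query, n, &self.params)
    }

    fn top_n_from_query_with(
        &self,
        query: &str,
        n: usize,
        params: &SearchParams,
    ) -> Vec<String> {
        self.docs
            .iter()
            .filter(|doc| doc.contains(query))
            // Apply the optional filter; `None` lets every document through.
            .filter(|doc| params.filter.as_deref().map_or(true, |f| doc.contains(f)))
            .take(n)
            .cloned()
            .collect()
    }
}
```

With this shape, agent-building code only ever calls the default method, so no extra argument leaks into call sites like dynamic_context, while code that needs on-the-fly tuning can still reach for the override.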
Haven't reviewed everything yet, but there are a lot of things to change so will make it in two parts
…tion of VectorStore trait
if let Some(SearchType::Flat) = search_type {
    query = query.bypass_vector_index();
}

if let Some(SearchType::Approximate) = search_type {
    if let Some(nprobes) = nprobes {
        query = query.nprobes(nprobes);
    }
    if let Some(refine_factor) = refine_factor {
        query = query.refine_factor(refine_factor);
    }
}
Change this to a match statement
I didn't do match here because I don't need to handle None in any of these cases
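For comparison, here is roughly what the match form of the snippet under review could look like. It is a sketch only: `Query` is a hypothetical stand-in that records which builder calls were made, not the LanceDB query type, and `apply_search_type` is an illustrative helper name.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SearchType {
    Flat,
    Approximate,
}

/// Hypothetical stand-in for a query builder; it only records what was set.
#[derive(Debug, Default, PartialEq)]
pub struct Query {
    pub bypass_index: bool,
    pub nprobes: Option<usize>,
    pub refine_factor: Option<u32>,
}

impl Query {
    pub fn bypass_vector_index(mut self) -> Self {
        self.bypass_index = true;
        self
    }

    pub fn nprobes(mut self, nprobes: usize) -> Self {
        self.nprobes = Some(nprobes);
        self
    }

    pub fn refine_factor(mut self, refine_factor: u32) -> Self {
        self.refine_factor = Some(refine_factor);
        self
    }
}

/// Applies the optional search settings using a single `match`.
pub fn apply_search_type(
    mut query: Query,
    search_type: Option<SearchType>,
    nprobes: Option<usize>,
    refine_factor: Option<u32>,
) -> Query {
    match search_type {
        // Exact search: skip the vector index entirely.
        Some(SearchType::Flat) => query = query.bypass_vector_index(),
        // Approximate search: forward only the tuning knobs that were set.
        Some(SearchType::Approximate) => {
            if let Some(nprobes) = nprobes {
                query = query.nprobes(nprobes);
            }
            if let Some(refine_factor) = refine_factor {
                query = query.refine_factor(refine_factor);
            }
        }
        // Nothing requested: leave the query untouched.
        None => {}
    }
    query
}
```

The trade-off is the one raised in the reply: the match needs an explicit empty `None` arm, while the pair of if-let blocks simply omits it. In exchange, the match makes it visible that the two search types are mutually exclusive and would force a compile error if a new variant were added.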
if let Some(distance_type) = distance_type {
    query = query.distance_type(distance_type);
}
Should a default value be set if distance_type is None?
No, I updated the doc strings to answer questions about defaults.
Remove AI generated data in favor of repeated hardcoded data
rig-lancedb crate
VectorStoreIndex trait to include a generic type, type SearchParams, which can be passed to the vector store query to customize the query.