feat: Add utility methods to simplify InMemoryVectorStore creation #32

Merged
cvauclair merged 2 commits into main from feat/embeddings-utility on Sep 23, 2024

cvauclair (Contributor):

  • Add InMemoryVectorStore::from_documents utility method
  • Add InMemoryVectorStore::from_embeddings utility method
  • Add docstrings
  • Update example
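To make the before/after comparison concrete, here is a minimal, self-contained sketch of the pattern these utility methods follow: a constructor that creates and populates the store in one call, instead of `default()` followed by `add_documents(...)`. The types (`DocumentEmbedding`, a toy `InMemoryVectorStore`) and the cosine-similarity `top_match` lookup are hypothetical simplifications, not rig's actual API. The sketch is also synchronous, whereas the calls in this PR are `async` and awaited.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a document paired with its embedding vector
/// (NOT rig's actual type; simplified for illustration).
#[derive(Clone)]
struct DocumentEmbedding {
    id: String,
    vector: Vec<f64>,
}

/// Toy in-memory vector store keyed by document id.
#[derive(Default)]
struct InMemoryVectorStore {
    documents: HashMap<String, DocumentEmbedding>,
}

impl InMemoryVectorStore {
    /// Multi-step path: insert documents into an existing store.
    fn add_documents(&mut self, docs: Vec<DocumentEmbedding>) {
        for doc in docs {
            self.documents.insert(doc.id.clone(), doc);
        }
    }

    /// Utility constructor: create and populate the store in a single call.
    fn from_embeddings(docs: Vec<DocumentEmbedding>) -> Self {
        let mut store = Self::default();
        store.add_documents(docs);
        store
    }

    /// Id of the document whose vector is most cosine-similar to `query`,
    /// or None if the store is empty.
    fn top_match(&self, query: &[f64]) -> Option<&str> {
        self.documents
            .values()
            .map(|doc| (doc.id.as_str(), cosine(&doc.vector, query)))
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(id, _)| id)
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    let denom = norm_a * norm_b;
    if denom == 0.0 { 0.0 } else { dot / denom }
}

fn main() {
    let docs = vec![
        DocumentEmbedding { id: "doc0".to_string(), vector: vec![1.0, 0.0] },
        DocumentEmbedding { id: "doc1".to_string(), vector: vec![0.0, 1.0] },
    ];

    // Before: default() followed by add_documents().
    let mut before = InMemoryVectorStore::default();
    before.add_documents(docs.clone());

    // After: a single utility constructor.
    let after = InMemoryVectorStore::from_embeddings(docs);

    assert_eq!(before.documents.len(), after.documents.len());
    println!("{:?}", after.top_match(&[0.9, 0.1])); // prints Some("doc0")
}
```

The constructor simply wraps the two-step path, so both stores end up with the same contents; the gain is purely ergonomic.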

Previously:

let model = openai_client.embedding_model("text-embedding-ada-002");

let mut vector_store = InMemoryVectorStore::default();

let embeddings = EmbeddingsBuilder::new(model.clone())
    .simple_document("doc0", "Definition of a *flurbo*: A flurbo is a green alien that lives on cold planets")
    .simple_document("doc1", "Definition of a *glarb-glarb*: A glarb-glarb is an ancient tool used by the ancestors of the inhabitants of planet Jiro to farm the land.")
    .simple_document("doc2", "Definition of a *linglingdong*: A term used by inhabitants of the far side of the moon to describe humans.")
    .build()
    .await?;

vector_store.add_documents(embeddings).await?;

let index = vector_store.index(model);

Can now be written as:

let model = openai_client.embedding_model("text-embedding-ada-002");

let embeddings = EmbeddingsBuilder::new(model.clone())
    .simple_document("doc0", "Definition of a *flurbo*: A flurbo is a green alien that lives on cold planets")
    .simple_document("doc1", "Definition of a *glarb-glarb*: A glarb-glarb is an ancient tool used by the ancestors of the inhabitants of planet Jiro to farm the land.")
    .simple_document("doc2", "Definition of a *linglingdong*: A term used by inhabitants of the far side of the moon to describe humans.")
    .build()
    .await?;

let index = InMemoryVectorIndex::from_embeddings(model, embeddings).await?;

Or even shorter if using InMemoryVectorIndex::from_documents!
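The `from_documents` variant goes one step further by also running the embedding step, so raw `(id, text)` pairs go straight to an index. The sketch below assumes a hypothetical `Embedder` trait and a toy `LetterCounts` model (ASCII letter-frequency vectors) in place of a real provider model such as OpenAI's; it is synchronous, and the `&[(&str, &str)]` parameter shape is an assumption, since the PR text does not show the actual signature.

```rust
use std::collections::HashMap;

/// Hypothetical embedding interface (a stand-in, NOT rig's `EmbeddingModel` trait).
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f64>;
}

/// Toy embedder: counts occurrences of each ASCII letter a-z, case-insensitively.
struct LetterCounts;

impl Embedder for LetterCounts {
    fn embed(&self, text: &str) -> Vec<f64> {
        let mut v = vec![0.0; 26];
        for c in text.chars().filter(|c| c.is_ascii_alphabetic()) {
            v[(c.to_ascii_lowercase() as u8 - b'a') as usize] += 1.0;
        }
        v
    }
}

/// Toy index: the embedding model plus one vector per document id.
struct InMemoryVectorIndex<M: Embedder> {
    model: M,
    entries: HashMap<String, Vec<f64>>,
}

impl<M: Embedder> InMemoryVectorIndex<M> {
    /// Utility constructor: embed raw (id, text) pairs and build the index in one call.
    fn from_documents(model: M, documents: &[(&str, &str)]) -> Self {
        let entries: HashMap<String, Vec<f64>> = documents
            .iter()
            .map(|(id, text)| (id.to_string(), model.embed(text)))
            .collect();
        Self { model, entries }
    }

    /// Embed `query` with the same model and return the most similar document id.
    fn top_match(&self, query: &str) -> Option<&str> {
        let q = self.model.embed(query);
        self.entries
            .iter()
            .map(|(id, v)| (id.as_str(), cosine(v, &q)))
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(id, _)| id)
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    let denom = norm_a * norm_b;
    if denom == 0.0 { 0.0 } else { dot / denom }
}

fn main() {
    let docs = [
        ("doc0", "A flurbo is a green alien that lives on cold planets"),
        ("doc1", "A glarb-glarb is an ancient tool used to farm the land"),
    ];
    // One call replaces "embed documents, create store, add documents, build index".
    let index = InMemoryVectorIndex::from_documents(LetterCounts, &docs);
    println!("{:?}", index.top_match("green alien"));
}
```

Keeping the model inside the index mirrors the PR's examples, where the same embedding model is used both to build the embeddings and to create the index, so queries are embedded consistently with the stored documents.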

cvauclair added the feat label Sep 20, 2024
cvauclair requested a review from 0xMochan September 20, 2024 16:28
cvauclair changed the title from "feat: Add utility methods to simplify embeddings related operations" to "feat: Add utility methods to simplify InMemoryVectorStore create" Sep 20, 2024
cvauclair changed the title from "feat: Add utility methods to simplify InMemoryVectorStore create" to "feat: Add utility methods to simplify InMemoryVectorStore creation" Sep 20, 2024
0xMochan (Contributor) left a comment:

Looks good. I also see an opportunity for a fluent builder design on openai_client.embedding_model(...) that could make it a bit cleaner, though understandably that would make the requirements on other provider clients a bit hazy.

cvauclair (Contributor, Author), quoting 0xMochan:

> Looks good. I also see an opportunity for a fluent builder design on openai_client.embedding_model(...) that could make it a bit cleaner, though understandably that would make the requirements on other provider clients a bit hazy.

I think we might end up with traits for Clients to simplify this integration, but that will be tackled in another issue!
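One possible shape for such client traits is a trait with an associated model type that every provider client implements, so shared helpers can be generic over the provider instead of being written per client. Everything below (`EmbeddingsClient`, `ToyModel`, `ToyOpenAiClient`, `ToyOtherClient`, `model_for`, and the "example-model" name) is hypothetical and only illustrates the idea; it is not an existing rig trait or the design the authors settled on.

```rust
/// Hypothetical trait that each provider client could implement
/// (illustrative only; not an existing rig trait).
trait EmbeddingsClient {
    /// The embedding-model handle this provider returns.
    type Model;
    fn embedding_model(&self, name: &str) -> Self::Model;
}

/// Toy model handle recording which provider and model name it came from.
#[derive(Debug, PartialEq)]
struct ToyModel {
    provider: &'static str,
    name: String,
}

/// Two toy clients standing in for real provider clients.
struct ToyOpenAiClient;
struct ToyOtherClient;

impl EmbeddingsClient for ToyOpenAiClient {
    type Model = ToyModel;
    fn embedding_model(&self, name: &str) -> ToyModel {
        ToyModel { provider: "openai", name: name.to_string() }
    }
}

impl EmbeddingsClient for ToyOtherClient {
    type Model = ToyModel;
    fn embedding_model(&self, name: &str) -> ToyModel {
        ToyModel { provider: "other", name: name.to_string() }
    }
}

/// Provider-agnostic helper: works with any client implementing the trait.
fn model_for<C: EmbeddingsClient>(client: &C, name: &str) -> C::Model {
    client.embedding_model(name)
}

fn main() {
    let a = model_for(&ToyOpenAiClient, "text-embedding-ada-002");
    let b = model_for(&ToyOtherClient, "example-model");
    println!("{}: {}", a.provider, a.name); // prints "openai: text-embedding-ada-002"
    println!("{}: {}", b.provider, b.name); // prints "other: example-model"
}
```

The associated `Model` type lets each provider return its own model type while generic code such as `model_for` stays unchanged, which is one way to keep the requirements on other provider clients explicit.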

cvauclair merged commit e7233e6 into main Sep 23, 2024
2 checks passed
cvauclair deleted the feat/embeddings-utility branch November 7, 2024 20:19