Alternative to using in-memory collection #45
This is poorly phrased; thank you for flagging it. I'll update the doc and the message to make this clearer.
Hi, love byaldi :) Could you add a usage example for the in-memory collection? Reading the answer above, I think I've been over-complicating my demo. I do this:

```python
from pathlib import Path

def get_document_pages(index_fetch_res: list):
    # Map byaldi doc_ids to file names (without extension)
    index_mapping_to_files = RAG.get_doc_ids_to_file_names()
    for doc_id, file_name in index_mapping_to_files.items():
        index_mapping_to_files[doc_id] = Path(file_name).stem

    # Build the path of the pre-rendered page image for each search result
    pages = []
    for res in index_fetch_res:
        doc_id = res["doc_id"]
        page_num = res["page_num"]
        page_path = Path("images-financial") / index_mapping_to_files[doc_id] / f"page_{page_num}.png"
        pages.append(page_path)
    return pages

img_pages = get_document_pages(results)
```

where `results` is what the RAG search returns.

Then, in order to pass the images to the VLM, I have:

```python
from typing import List
from PIL import Image

def get_answer(prompt: str, images: List[str], top_1=True):
    # Open either only the top result or all retrieved pages
    if top_1:
        imgs_data = [Image.open(images[0])]
    else:
        imgs_data = [Image.open(image) for image in images]
    # model is instantiated elsewhere
    response = model.generate_content([*imgs_data, prompt])
    return response.text

def answer_query(prompt, images, top_1=True):
    return f"Gemini Response\n: {get_answer(prompt, images, top_1=top_1)}"
```

As you can see, the page images of my documents are saved on the file system, and I have to go fetch those files and open them with PIL's `Image.open`.

My next step is to save the embeddings in a vector DB (I actually did not understand how to do this), but it would also be great not to fetch image files at all when they are available in memory as base64. Thanks!
@fvisconti Hi, I'm also working on storing the embeddings in a vector database. Right now I don't think byaldi provides any way to store them directly in a vector DB (like ChromaDB); alternatively, we can load the embeddings from the `.pt` file that byaldi writes locally into a vector DB collection. But yes, direct loading into a vector DB would be better.
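A rough sketch of that workaround, with the caveat that the embeddings file name, its layout, and the mean-pooling step are all assumptions; pooling each page's multi-vector embedding into one vector makes it fit ChromaDB's one-vector-per-item model, but it throws away ColPali's late-interaction scoring:

```python
# Rough sketch under assumptions: load the locally stored page embeddings and
# push a pooled version of each one into a ChromaDB collection.
import torch
import chromadb

# Hypothetical file name; check the embeddings folder of your .byaldi index
page_embeddings = torch.load("embeddings_0.pt", map_location="cpu")

client = chromadb.PersistentClient(path="chroma_store")
collection = client.get_or_create_collection(name="colpali_pages")

for page_idx, emb in enumerate(page_embeddings):
    # Crude single-vector summary of the (num_patches, dim) multi-vector embedding
    pooled = emb.float().mean(dim=0)
    collection.add(
        ids=[f"page_{page_idx}"],
        embeddings=[pooled.tolist()],
        metadatas=[{"page_num": page_idx + 1}],
    )
```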
Hi, I found this from Vespa, which seems to be exactly what we need. There's also something similar from Milvus. It's probably not straightforward to integrate with byaldi; I'll have a look during the Christmas break. Cheers!
Hey @fvisconti. Were you able to integrate Vespa with byaldi? There is also this blog on integrating Qdrant with ColPali, but I want to keep using byaldi.
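As far as I can tell, the gist of that approach is a Qdrant multivector collection scored with MaxSim, which matches ColPali's late interaction. A rough sketch with assumed names, an assumed embedding size, and a placeholder multivector (it needs a recent qdrant-client with multivector support):

```python
# Rough sketch, assumptions throughout: a collection whose vectors are lists of
# per-patch vectors, compared with MaxSim (ColPali-style late interaction).
from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")  # or a persistent / remote instance
client.create_collection(
    collection_name="colpali_pages",
    vectors_config=models.VectorParams(
        size=128,                          # assumed ColPali token-embedding dimension
        distance=models.Distance.COSINE,
        multivector_config=models.MultiVectorConfig(
            comparator=models.MultiVectorComparator.MAX_SIM
        ),
    ),
)

# Placeholder multivector standing in for one page's ColPali embedding
page_embedding = [[0.0] * 128 for _ in range(16)]
client.upsert(
    collection_name="colpali_pages",
    points=[models.PointStruct(id=0, vector=page_embedding, payload={"page_num": 1})],
)
```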
Hey, I haven't even tried yet; as soon as I do, I'll let you know here for sure :)
About this instead, it was quite easy (I forgot to mention it before); I ended up doing this:

```python
def __get_base64(self) -> List[dict]:
    # Wrap each search result's base64 image in the mime_type/data dict Gemini expects
    # (requires `import base64` and `import magic`, i.e. python-magic)
    self.doc_pages = [
        {
            'mime_type': magic.from_buffer(base64.b64decode(r['base64']), mime=True),
            'data': r['base64']
        }
        for r in self.search_results
    ]

def rag_search(self, prompt):
    # Retrieve the top-k pages, then resolve them either from disk or from the in-memory base64
    self.search_results = self.rag.search(prompt, k=self.config.get("search_results"))
    if self.config.get("img_path"):
        self.__get_document_pages(self.config.get("img_path"))
    else:
        self.__get_base64()
```

This is where I essentially wire things up; the config and `self.rag` are set elsewhere in the class.
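The base64 dicts then go to Gemini the same way as in `get_answer` above; a sketch of the follow-up method, assuming `self.model` holds the same GenerativeModel instance (that attribute name is illustrative):

```python
def answer(self, prompt: str) -> str:
    # Sketch, mirroring get_answer() above: the mime_type/data dicts built by
    # __get_base64() are passed to Gemini directly, alongside the prompt.
    response = self.model.generate_content([*self.doc_pages, prompt])
    return response.text
```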
When I load from an index with

```python
from byaldi import RAGMultiModalModel

model = RAGMultiModalModel.from_index(index_path=index_name)
```

I get a message about using an in-memory collection. I am not sure what the alternative to the in-memory collection is; I browsed the source files without finding anything.

Is there a way to use a database or any other persistent storage?

Thanks,
Carsten