[Feature]: Multimodal embeddings #338

dberardo-com · 2025-01-03T13:52:01Z

I would like to use images and PDFs, as well as text, both as input for user queries and as the knowledge base for the embeddings store.

Images/PDFs and/or textual documents should be treated as optional columns in source tables.

Users can query with images and/or text, and those queries can return text and/or images as results.

Example:

User creates a table with stock exchange PDF market reports --> images are extracted into an "image" column
User creates a different table with the same stock exchange PDF market reports --> texts are extracted into a "text" column
User creates 2 different vectorizers, each using a different embedding model that can process text/image chunks
Now the user inputs a semantic search query "I would like to read about Tesla stock data for the year 2015" --> the result is drawn from the images and/or texts that best match the prompt
Now the user inputs a text generation query "Which stock performed best in 2024?" --> a result is given
Now the user inputs a text generation query "Which stock performed best in 2000?" --> no result is given, because the reports only contain information about years after 2010
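
The search steps in the example above could be sketched roughly as follows. This is only an illustration, not pgai's API: the `Chunk` type, the `search` function, the `min_score` cutoff, the file path, and the toy 3-dimensional vectors are all hypothetical. It also assumes that text and image chunks are embedded into one shared vector space so they can be ranked together, which is an assumption on top of the request itself.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass
class Chunk:
    source: str             # hypothetical: "text" or "image" column the chunk came from
    content: str            # extracted text, or an identifier/path for an image
    embedding: List[float]  # toy vector; a real embedding model would produce this


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_embedding: List[float], chunks: List[Chunk],
           top_k: int = 3, min_score: float = 0.5) -> List[Tuple[float, Chunk]]:
    """Rank text and image chunks together; drop anything below min_score."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy knowledge base mixing both modalities (all values are made up).
chunks = [
    Chunk("text", "Tesla 2015 annual summary", [0.9, 0.1, 0.0]),
    Chunk("image", "reports/tesla_2015_chart.png", [0.8, 0.2, 0.1]),
    Chunk("text", "Bond market overview 2012", [0.0, 0.1, 0.9]),
]

tesla_query = [1.0, 0.0, 0.0]      # stands in for the embedded Tesla 2015 prompt
unrelated_query = [0.0, 1.0, 0.0]  # stands in for a prompt the reports do not cover

results = search(tesla_query, chunks)
for score, chunk in results:
    print(f"{score:.3f}  {chunk.source:5}  {chunk.content}")

# Nothing clears the cutoff, mirroring the "no result is given" case.
print(search(unrelated_query, chunks))
```

The similarity cutoff is one simple way to model the "no result is given" case; a text generation query would additionally pass the retrieved chunks to a language model, which is not shown here.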


dberardo-com · 2025-01-03T13:53:23Z

From a quick look at this proof of concept, it seems this should be possible with the current technology stack: https://blog.geomusings.com/2024/07/19/image-similarity-with-pgvector/. Is that correct?

dberardo-com added community pgai labels Jan 3, 2025

alejandrodnm assigned adolsalamanca Jan 8, 2025
