Build a multimodal RAG system that supports extracting text, tables and images in a pdf document. See how we can use text LLMs and multi-modal LLMs together in the same pipeline.

cevoaustralia/multimodal-rag-pdf

Multimodal RAG with PDFs

PDFs often contain text, tables and images. This project aims to build a RAG system that handles all three with the help of multimodal language models.

  • text will be handled by a text LLM
  • tables will be handled by a text LLM
  • images will be handled by a multimodal LLM
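
The routing described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `PdfElement`, `process_elements`, and the two model callables are hypothetical names, and the models are represented as plain functions so the example runs without any LLM provider.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical container for one piece extracted from a PDF.
@dataclass
class PdfElement:
    kind: str      # "text", "table" or "image"
    content: str   # raw text, a table serialised as text, or an image path


def process_elements(
    elements: List[PdfElement],
    text_llm: Callable[[str], str],
    multimodal_llm: Callable[[str], str],
) -> List[str]:
    """Send text and tables to the text LLM and images to the multimodal LLM."""
    results = []
    for element in elements:
        if element.kind in ("text", "table"):
            results.append(text_llm(element.content))
        elif element.kind == "image":
            results.append(multimodal_llm(element.content))
        else:
            raise ValueError(f"unsupported element kind: {element.kind!r}")
    return results


if __name__ == "__main__":
    # Stand-in models (assumptions); swap in real client calls as needed.
    fake_text_llm = lambda s: f"text-llm:{s}"
    fake_multimodal_llm = lambda s: f"mm-llm:{s}"
    sample = [
        PdfElement("text", "intro paragraph"),
        PdfElement("table", "a|b\n1|2"),
        PdfElement("image", "figure1.png"),
    ]
    for line in process_elements(sample, fake_text_llm, fake_multimodal_llm):
        print(line)
```

Keeping the models as injected callables means the routing logic can be exercised with stubs before any real model is wired in.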

Environment setup and installation

  • use pyenv to manage Python versions
  • use venv to manage your virtual environments

pyenv versions
pyenv install 3.12.2
pyenv local 3.12.2
python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
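
After running the commands above, a quick sanity check can confirm that the expected interpreter is active inside the virtual environment. The script below is an optional sketch, not part of the repository; the function names `python_matches` and `in_virtualenv` are hypothetical, and the required version is taken from the `pyenv install 3.12.2` step.

```python
import sys

# Major/minor version installed via pyenv in the setup steps.
REQUIRED = (3, 12)


def python_matches(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True if the interpreter's major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix) -> bool:
    """Return True if running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python 3.12 active:", python_matches())
    print("Inside virtual environment:", in_virtualenv())
```

Both checks rely only on the standard library, so the script can be run before `pip install -r requirements.txt` as well as after.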
