This repository demonstrates how to fine-tune LLAVA for image data parsing, i.e., extracting structured JSON information from images. It walks through exploring the datasets, fine-tuning the model, and testing the result.
A detailed explanation of the project is available in the following YouTube video:
Fine-Tuning Multimodal LLMs (LLAVA) for Image Data Parsing: Link
- `data_exploration/`: contains notebooks for exploring the Cord-V2 and DocVQA datasets (a minimal CORD-V2 loading sketch follows this list).
- `fine-tuning/`: includes:
  - A notebook for fine-tuning LLAVA 1.6 7B (an illustrative LoRA setup is sketched after this list)
  - A notebook for testing the fine-tuned model
- `test_model/`: contains multiple notebooks for testing:
  - LLAVA 1.5 7B and 13B
  - LLAVA 1.6 7B, 13B, and 34B
- `src/`: contains a Streamlit app to showcase the performance of the fine-tuned model. To run the dashboard:
  - In Terminal 1: `python src/serve_model.py`
  - In Terminal 2: `streamlit run src/app.py`
  - Open the dashboard at http://localhost:8501/ and upload sample images from the `data` folder to view the results. You can find 20 sample images in the `data` folder.
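
For a first look at what the `data_exploration/` notebooks work with, the sketch below loads a single CORD-V2 sample with the `datasets` library. The dataset id `naver-clova-ix/cord-v2` and the `image`/`ground_truth` column names are assumptions based on the public Hugging Face Hub release, not necessarily the notebooks' exact setup.

```python
# Minimal sketch: load one CORD-V2 sample for exploration.
# Dataset id and column names are assumptions; check the notebooks for the exact setup.
import json
from datasets import load_dataset

dataset = load_dataset("naver-clova-ix/cord-v2", split="train")

sample = dataset[0]
image = sample["image"]                      # PIL image of the receipt
labels = json.loads(sample["ground_truth"])  # annotation stored as a JSON string

print(image.size)
print(json.dumps(labels, indent=2)[:500])    # peek at the target parse structure
```

The `ground_truth` field is a JSON string, which is why it is decoded before inspection; DocVQA can be browsed the same way once its Hub id and columns are substituted.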
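The `fine-tuning/` notebook targets LLAVA 1.6 7B. A common recipe for a model of this size is 4-bit loading plus LoRA adapters via `peft`; the sketch below illustrates only that setup, and the checkpoint id, quantization flags, and `target_modules` are illustrative assumptions rather than the notebook's actual configuration.

```python
# Illustrative LoRA setup for LLAVA 1.6 7B (not the notebook's exact config).
# Checkpoint id and target_modules are assumptions; requires bitsandbytes and peft.
import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",   # assumed 7B checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only the adapters should train
```

Building the image-plus-JSON prompts from CORD-V2 and running the actual training loop are left to the notebook.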
- Clone this repository using: `git clone https://github.com/Farzad-R/Finetune-LLAVA-NEXT.git`
- Install dependencies from `requirements.txt`: `pip install -r requirements.txt`
- Install additional requirements: `pip install git+https://github.com/huggingface/transformers.git`
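
The install from the transformers GitHub repository provides the LLaVA-NEXT classes used throughout the notebooks. As a quick smoke test of the environment, something along these lines should load a LLAVA 1.6 checkpoint and run one query on an image; the checkpoint id, image path, and prompt template are assumptions (the Mistral-style `[INST] ... [/INST]` format shown here differs between LLAVA variants).

```python
# Quick smoke test: load an LLAVA 1.6 checkpoint and query one image.
# Checkpoint id, image path, and prompt format are assumptions for illustration.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("data/sample_0.png")  # hypothetical path; use any image from data/
prompt = "[INST] <image>\nExtract the receipt contents as JSON. [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```

Pointing `model_id` at the fine-tuned checkpoint's local path or Hub id runs the same test against the model trained in this repository.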
- Link to Hyperstack Cloud
- HuggingFace Hub to access the model
- A link to a YouTube video will be added here soon to provide further insights and demonstrations.
- LLAVA-NEXT models.
- LLAVA-NEXT info.
- LLAVA-NEXT demo.
- LLAVA-NEXT GitHub repository.
- LLAVA 1.5 demo.
- LLAVA 1.5 GitHub repository.