Multi modal rag demo #698

Open · init27 wants to merge 56 commits into main

Commits (56)
All commits are by init27:

aaadefb  Notebook for Part 3 (Oct 1, 2024)
02d53f1  Create Part_1_Data_Preperation.ipynb (Oct 1, 2024)
730bc6f  Update Part_1_Data_Preperation.ipynb (Oct 1, 2024)
1773feb  Create label_script.py (Oct 1, 2024)
6dd384c  Update label_script.py (Oct 1, 2024)
4a297e5  Update label_script.py (Oct 1, 2024)
849684b  Update label_script.py (Oct 1, 2024)
7a9e595  Update label_script.py (Oct 1, 2024)
66cb619  Create Part_2_Cleaning_Data_and_DB (Oct 2, 2024)
f7e687b  Update Part_2_Cleaning_Data_and_DB (Oct 2, 2024)
abe2c5a  Add Gradio App and Re-fact Part 3 nb (Oct 6, 2024)
fde6ebe  add drop down menu (Oct 6, 2024)
f1c1f6d  Add methods (Oct 6, 2024)
089e0b1  Fix Methods and fix prompts (Oct 6, 2024)
9bbf3ef  Add high level ReadMe (Oct 6, 2024)
c8b1a2d  Update final_demo.py (Oct 17, 2024)
ca7cdce  Fix Folders, Add readme (Oct 17, 2024)
0fb47b6  Remove nb (Oct 18, 2024)
480fb94  fix nbs (Oct 18, 2024)
181e841  Fix Spelling Mistakes (Oct 18, 2024)
30018f8  Fix link (Oct 18, 2024)
2fde71d  Part 1 Readme (Oct 18, 2024)
bdb05c1  Fix Readme (Oct 18, 2024)
e76e148  Spell Check Round 2 (Oct 18, 2024)
fe115fb  Round 3 (Oct 18, 2024)
532618e  Rename (Oct 18, 2024)
d7cf320  Credit (Oct 18, 2024)
d984730  Nb 1 (Oct 18, 2024)
fa2b2b6  Nb 1 (Oct 18, 2024)
d77aa30  Part 1-fin (Oct 18, 2024)
3adf211  nb-2 (Oct 18, 2024)
a013bc8  nb2-final (Oct 18, 2024)
9926fb2  Nb-3 complete (Oct 18, 2024)
1687587  Fix spelling (Oct 20, 2024)
cdaf80b  Rebase (#789) (Nov 15, 2024)
1a7e4a0  Update README.md (Nov 15, 2024)
26f0096  Update README.md (Nov 17, 2024)
03ffc5b  Update README.md (Nov 20, 2024)
3996d06  Update README.md (Nov 20, 2024)
1a2f8f8  Update README.md (Nov 20, 2024)
8764d5f  Update README.md (Nov 20, 2024)
1c02602  Update README.md (Nov 20, 2024)
565ed73  Update README.md (Nov 20, 2024)
cecfe46  Update Part_1_Data_Preperation.ipynb (Nov 20, 2024)
7b67aa5  Update Part_2_Cleaning_Data_and_DB.ipynb (Nov 20, 2024)
4640c83  Update Part_3_RAG_Setup_and_Validation.ipynb (Nov 20, 2024)
ec23508  Update Part_3_RAG_Setup_and_Validation.ipynb (Nov 20, 2024)
57ee65a  Update label_script.py (Nov 20, 2024)
7676268  rebase (#794) (Nov 20, 2024)
4d35c8c  Update recipes/quickstart/Multi-Modal-RAG/README.md (Nov 20, 2024)
c9df1be  Update recipes/quickstart/Multi-Modal-RAG/README.md (Nov 20, 2024)
9950b75  address comments (Nov 20, 2024)
51a5ae8  Update recipes/quickstart/Multi-Modal-RAG/notebooks/Part_1_Data_Prepe… (Nov 20, 2024)
78bdd63  Merge branch 'Multi-Modal-RAG-Demo' of https://github.com/meta-llama/… (Nov 20, 2024)
0284c66  Update Part_1_Data_Preperation.ipynb (Nov 20, 2024)
62c1fa4  move to use-cases (Nov 20, 2024)
116 changes: 116 additions & 0 deletions recipes/quickstart/Multi-Modal-RAG/README.md
@@ -0,0 +1,116 @@
# End-to-End Tutorial on using Llama models for Multi-Modal RAG

## Recipe Overview: Multi-Modal RAG using the `Llama-3.2-11B` model

This is a complete workshop on labeling images using the new Llama 3.2-Vision models and performing RAG using the image captioning capabilities of the model.

- **Data Labeling and Preparation:** We start by downloading 5000 images of clothing items and labeling them using the `Llama-3.2-11B-Vision-Instruct` model
- **Cleaning Labels:** With the labels from the notebook above, we then clean the dataset and prepare it for RAG
- **Building Vector DB and RAG Pipeline:** With the final clean dataset, we use the descriptions and the 11B model to generate recommendations

## Requirements:

Before we start (a minimal setup sketch in Python follows this list):

1. Please grab your HF CLI Token from [here](https://huggingface.co/settings/tokens)
2. git clone [this dataset](https://huggingface.co/datasets/Sanyam/MM-Demo) inside the Multi-Modal-RAG folder: `git clone https://huggingface.co/datasets/Sanyam/MM-Demo`
3. Launch jupyter notebook inside this folder
4. We will also run two scripts after the notebooks
Reviewer comment (Contributor): this seems unnecessary to be part of the requirements?

Author reply (init27): @HamidShojanazeri actually we need to clone the dataset from Huggingface, so I'm pointing that out. Do you think I should move it inside a notebook?

5. Make sure you grab a together.ai token [here](https://www.together.ai)
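A minimal setup sketch for the tokens above (not part of the recipe's scripts), assuming `huggingface_hub` is installed and that the Together key is kept in an environment variable; both token strings are placeholders:

```python
# Minimal setup sketch: authenticate to Hugging Face and stash the Together.ai key.
# Both token strings are placeholders for your own keys.
import os
from huggingface_hub import login

login(token="your_huggingface_token_here")                 # same token passed to label_script.py later
os.environ["TOGETHER_API_KEY"] = "your_together_api_key"   # env var name is an assumption; final_demo.py takes --api_key
```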

## Detailed Outline for running:

Order of running the files: the notebooks establish the method of approaching the problem; once established, we use the scripts to run the method end to end.

- Notebook 1: `Part_1_Data_Preperation.ipynb`
- Script: `label_script.py`
- Notebook 2: `Part_2_Cleaning_Data_and_DB.ipynb`
- Notebook 3: `Part_3_RAG_Setup_and_Validation.ipynb`
- Script: `final_demo.py`

Here's the detailed outline:

### Step 1: Data Prep and Synthetic Labeling:

[Notebook for Step 1](./notebooks/Part_1_Data_Preperation.ipynb) and [Script for Step 1](./scripts/label_script.py)

To run the script (remember to set N to your number of GPUs):
```
python scripts/label_script.py --hf_token "your_huggingface_token_here" \
--input_path "../MM-Demo/images_compressed" \
--output_path "../MM-Demo/output/" \
--num_gpus N
```

The dataset consists of 5000 images with some meta-data.

The first half prepares the dataset for labeling (a minimal sketch follows this list):
- Clean/Remove corrupt images
- EDA to understand existing distribution
- Merging up categories of clothes to reduce complexity
- Balancing dataset by randomly sampling images
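A minimal sketch of the cleaning and balancing bullets above, assuming the images live under `../MM-Demo/images_compressed`; the metadata file name and the `category` column are illustrative, not the notebook's exact names:

```python
# Sketch: drop unreadable images, then balance the dataset by sampling per category.
# File and column names are illustrative placeholders.
from pathlib import Path
from PIL import Image
import pandas as pd

image_dir = Path("../MM-Demo/images_compressed")
good, corrupt = [], []
for path in sorted(image_dir.glob("*.jpg")):
    try:
        with Image.open(path) as img:
            img.verify()                      # cheap integrity check; raises on corrupt files
        good.append(path.name)
    except Exception:
        corrupt.append(path.name)

meta = pd.read_csv("images.csv")              # hypothetical metadata file shipped with the dataset
meta = meta[meta["image"].isin(good)]         # keep only readable images
# Sample a fixed number per merged category (assumes each category has at least that many rows).
balanced = meta.groupby("category", group_keys=False).sample(n=200, random_state=42)
```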

The second half consists of labeling the dataset. We are bound by an interesting constraint here: the 11B model can only caption one image at a time (a minimal captioning sketch follows this list):
- We load a few images and test captioning
- We run this pipeline on random images and iterate on the prompt till we feel the model is giving good outputs
- Finally, we can create a script to label all 5000 images on multi-GPU
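A minimal single-image captioning sketch, assuming access to `meta-llama/Llama-3.2-11B-Vision-Instruct` on Hugging Face and the standard `transformers` Mllama classes; the prompt and file name are illustrative, not the notebook's exact ones:

```python
# Sketch: caption one clothing image with Llama-3.2-11B-Vision-Instruct via transformers.
# The prompt below is illustrative; the notebook iterates on its own labeling prompt.
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("../MM-Demo/images_compressed/example.jpg")  # placeholder file name
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this clothing item and suggest a category."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```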

After running the script on the entire dataset, we have more data cleaning to perform.

### Step 2: Cleaning up Synthetic Labels and preparing the dataset:

[Notebook for Step 2](./notebooks/Part_2_Cleaning_Data_and_DB.ipynb)

Even after our lengthy prompt (among other things), the model still hallucinates categories and labels, so we need to address this:

- Re-balance the dataset by mapping correct categories
- Fix descriptions so that we can create a CSV (a short pandas sketch follows this list)
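A short pandas sketch of the two bullets above, assuming a hypothetical mapping from hallucinated category names to canonical ones; the raw-labels file and column names are illustrative (only the final CSV name matches the demo command further down):

```python
# Sketch: map hallucinated categories to canonical ones, tidy descriptions, write the final CSV.
# The mapping, column names, and raw-labels file name are illustrative.
import pandas as pd

df = pd.read_csv("../MM-Demo/output/labels_raw.csv")        # hypothetical output of label_script.py

category_map = {"Tee Shirt": "T-Shirt", "Denims": "Jeans"}  # example hallucinated -> canonical
df["category"] = df["category"].replace(category_map)

known_categories = {"T-Shirt", "Jeans", "Dress", "Jacket"}  # illustrative canonical set
df = df[df["category"].isin(known_categories)]

# Flatten stray newlines so the CSV stays one row per image.
df["description"] = df["description"].str.replace(r"\s+", " ", regex=True).str.strip()
df.to_csv("../MM-Demo/final_balanced_sample_dataset.csv", index=False)
```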

Now, we are ready to try our vector db pipeline:

### Step 3 (Notebook 3): MM-RAG using LanceDB to validate the idea

[Notebook for Step 3](./notebooks/Part_3_RAG_Setup_and_Validation.ipynb) and [Final Demo Script](./scripts/final_demo.py)


With the cleaned descriptions and dataset, we can now store these in a vector DB.

You will note that we are not using the categorization from our model; this is by design, to show how RAG can simplify a lot of things.

- We create embeddings using the text description of our clothes
- Use the 11B model to describe the uploaded image
- Try to find similar or complementary images based on the upload (a minimal LanceDB sketch follows this list)
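A minimal LanceDB sketch of the retrieval side, assuming the `sentence-transformers` embedding integration with the same `BAAI/bge-large-en-v1.5` model used by the demo command below; the table name and `image_path` column are illustrative:

```python
# Sketch: embed clothing descriptions into LanceDB and retrieve matches for a new description.
# Table name and the image_path column are illustrative; final_demo.py builds its own table.
import lancedb
import pandas as pd
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

embedder = get_registry().get("sentence-transformers").create(name="BAAI/bge-large-en-v1.5")

class ClothingItem(LanceModel):
    image_path: str
    description: str = embedder.SourceField()               # the text that gets embedded
    vector: Vector(embedder.ndims()) = embedder.VectorField()

db = lancedb.connect("~/.lancedb")
table = db.create_table("clothes", schema=ClothingItem, mode="overwrite")

df = pd.read_csv("../MM-Demo/final_balanced_sample_dataset.csv")
table.add(df[["image_path", "description"]].to_dict("records"))

# At query time, the 11B model's description of the uploaded image becomes the search text.
query = "A light blue denim jacket with silver buttons"
results = table.search(query).limit(3).to_pandas()
print(results[["image_path", "description"]])
```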

We try the approach with different retrieval methods.

Finally, we can bring this all together in a Gradio App.
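A minimal Gradio sketch of the app shape (the real implementation lives in `scripts/final_demo.py`); `describe_image` and `retrieve_outfits` are hypothetical stand-ins for the captioning call and the LanceDB lookup:

```python
# Sketch: image upload -> 11B description -> retrieval, wrapped in a small Gradio UI.
# describe_image() and retrieve_outfits() are hypothetical placeholders, not the demo's real helpers.
import gradio as gr

def describe_image(image):
    return "A light blue denim jacket with silver buttons"  # placeholder for the 11B vision call

def retrieve_outfits(description):
    return []                                               # placeholder for the LanceDB search

def recommend(image):
    description = describe_image(image)
    matches = retrieve_outfits(description)                 # list of image paths to display
    return description, matches

demo = gr.Interface(
    fn=recommend,
    inputs=gr.Image(type="pil", label="Upload a clothing item"),
    outputs=[gr.Textbox(label="Model description"), gr.Gallery(label="Complementary items")],
)

if __name__ == "__main__":
    demo.launch()
```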

For running the script:
```
python scripts/final_demo.py \
--images_folder "../MM-Demo/compressed_images" \
--csv_path "../MM-Demo/final_balanced_sample_dataset.csv" \
--table_path "~/.lancedb" \
--api_key "your_together_api_key" \
--default_model "BAAI/bge-large-en-v1.5" \
--use_existing_table
```

Task: We can further improve the description prompt. You will notice that sometimes the description starts with the title of the clothing item, which results in retrieval of "similar" clothes instead of "complementary" items

- Upload an image
- 11B model describes the image
- We retrieve complementary clothes to wear based on the description
- You can keep the loop going by chatting with the model

## Resources used:

Credit and thanks to the models and resources used in this showcase:

Firstly, thanks to the author for providing the dataset on which we base this exercise: []()

- [Llama-3.2-11B-Vision-Instruct Model](https://www.llama.com/docs/how-to-guides/vision-capabilities/)
- [Lance-db for vector database](https://lancedb.com)
- [This Kaggle dataset]()
- [HF Dataset](https://huggingface.co/datasets/Sanyam/MM-Demo): since the model's output can be non-deterministic from run to run, we use the uploaded dataset to give a consistent experience
- [Together API for demo](https://www.together.ai)