This repository contains the code for training llm-jp/llm-jp-3-vila-14b, modified from the VILA repository.
Python version: 3.10.12
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install -e .
pip install -e ".[train]"
pip install git+https://github.com/huggingface/[email protected]
cp -rv ./llava/train/transformers_replace/* ./venv/lib/python3.10/site-packages/transformers/
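As a quick sanity check (an optional step, not part of the official instructions), you can confirm that the pinned packages ended up in the virtual environment with the expected versions:

python -c "import torch, flash_attn, transformers; print(torch.__version__, flash_attn.__version__, transformers.__version__)"
# expected (roughly): torch 2.0.x, flash_attn 2.4.2, transformers 4.36.2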
See data_prepare/README.md to prepare the training datasets for each step.
There are three stages in the model training; a sketch of example launch commands follows the three steps below.
Tune the parameters of the projector using English and Japanese image-text pair datasets. This takes about 14-15 hours on 8xA100 (40GB).
script: scripts/mdx/release/1_train_step0.sh
Perform multimodal continual pre-training using a relatively large-scale dataset. This takes about 130 hours on 8 nodes of 8xA100 (40GB).
script: scripts/mdx/release/2_train_step1.sh
Fine-tune the model with multimodal instruction data in both English and Japanese. This takes about 11 hours on 4 nodes of 8xA100 (40GB).
script: scripts/mdx/release/3_train_step2.sh
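As a minimal sketch of how the stages chain together, the three scripts can be invoked in order once the datasets from data_prepare are ready. The direct bash invocation below is an assumption: stages two and three are multi-node runs, so on a real cluster they would normally be submitted through your job scheduler with the appropriate node allocation.

# stage 1: projector tuning (single node, 8xA100)
bash scripts/mdx/release/1_train_step0.sh
# stage 2: multimodal continual pre-training (multi-node)
bash scripts/mdx/release/2_train_step1.sh
# stage 3: multimodal instruction tuning (multi-node)
bash scripts/mdx/release/3_train_step2.sh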
We used llm-jp-eval-mm for evaluation. Please note that llm-jp-eval-mm is currently in beta.
python -W ignore scripts/mdx/eval/run_inference_ja.py \
--model-path llm-jp/llm-jp-3-vila-14b \
--query "<image>\nこの画像について説明してください。" \
--image-file path/to/image
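As an example of batching this up, the inference script above can be wrapped in a simple shell loop to caption every image in a directory with the same Japanese prompt ("Please describe this image."); the ./images directory and the .jpg extension are placeholders for your own data:

# run the documented inference command once per image (placeholder paths)
for image in ./images/*.jpg; do
  python -W ignore scripts/mdx/eval/run_inference_ja.py \
    --model-path llm-jp/llm-jp-3-vila-14b \
    --query "<image>\nこの画像について説明してください。" \
    --image-file "$image"
done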
The code is released under the Apache License, Version 2.0.
This codebase is built upon the following projects: