This repository builds upon the foundational work of the LLaVA project. Special thanks to @haotian-liu for making research on multimodal models more accessible by open-sourcing the LLaVA repository.
- Soft Mixture-of-Experts Projector Layer: a new architectural component for improved model performance (a sketch of the idea follows this list).
- Integration of additional vision encoders:
  - SigLIP
  - AIMv2
  - InternViT
  - DINOv2
- Multinode Training with SLURM: Includes SLURM scripts to facilitate multinode training and scaling.
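
In a LLaVA-style model, the projector maps vision-encoder patch features into the language model's embedding space. A soft mixture-of-experts projector replaces the single MLP with several expert MLPs whose outputs are blended per token by learned gating weights. The snippet below is a minimal sketch of that general idea, not this repository's actual implementation; the class name `SoftMoEProjector`, the `num_experts` parameter, and the dimensions are all illustrative assumptions.

```python
# Minimal sketch of a soft mixture-of-experts projector (assumed design,
# not necessarily the code in llava/model).
import torch
import torch.nn as nn


class SoftMoEProjector(nn.Module):
    """Projects vision features to the LLM hidden size via soft expert mixing."""

    def __init__(self, vision_dim: int, llm_dim: int, num_experts: int = 4):
        super().__init__()
        # Each expert is a small two-layer MLP, mirroring the standard LLaVA projector.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )
            for _ in range(num_experts)
        )
        # Gating network producing per-token mixing weights over the experts.
        self.gate = nn.Linear(vision_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, vision_dim)
        weights = self.gate(x).softmax(dim=-1)                          # (B, N, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)  # (B, N, E, llm_dim)
        # Soft mixing: every token uses every expert, weighted by the gate.
        return (weights.unsqueeze(-1) * expert_out).sum(dim=-2)


# Example: project patch tokens from a 1152-dim encoder into a 4096-dim LLM space.
proj = SoftMoEProjector(vision_dim=1152, llm_dim=4096, num_experts=4)
out = proj(torch.randn(2, 576, 1152))
print(out.shape)  # torch.Size([2, 576, 4096])
```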
Key changes were made to the following modules:
- `llava/model`: updates to support the new features and integrations.
- `llava/train`: modifications for advanced training workflows.
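
As an illustration of what an encoder integration involves, the snippet below shows how a SigLIP vision tower can be loaded and run with Hugging Face `transformers` to produce the patch features that a projector consumes. This is a generic usage sketch, not the wiring inside `llava/model`; the checkpoint name and feature-extraction details are assumptions.

```python
# Sketch: obtaining patch features from a SigLIP vision tower via Hugging Face
# transformers. How this repository wires the encoder in llava/model may differ.
import torch
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

checkpoint = "google/siglip-so400m-patch14-384"  # assumed checkpoint name
processor = SiglipImageProcessor.from_pretrained(checkpoint)
vision_tower = SiglipVisionModel.from_pretrained(checkpoint).eval()

image = Image.new("RGB", (384, 384))  # stand-in for a real input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = vision_tower(**inputs)

# Patch-level hidden states feed the projector; shape (1, num_patches, hidden_dim).
patch_features = outputs.last_hidden_state
print(patch_features.shape)
```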
To use the features of this repository, clone it and explore the updated `llava/model` and `llava/train` directories. Detailed instructions for multinode training with SLURM are available in the provided scripts.
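
For context on how SLURM-launched multinode training typically bootstraps, the sketch below shows a common pattern for initializing `torch.distributed` from the environment variables that `srun` sets for each task. It is a generic pattern under stated assumptions, not necessarily what the provided scripts do.

```python
# Sketch: initializing torch.distributed from SLURM-provided environment
# variables inside a training entry point. Generic pattern; the repository's
# scripts may configure this differently.
import os

import torch
import torch.distributed as dist


def init_distributed_from_slurm() -> int:
    # SLURM_PROCID / SLURM_NTASKS / SLURM_LOCALID are set by srun per task.
    rank = int(os.environ["SLURM_PROCID"])
    world_size = int(os.environ["SLURM_NTASKS"])
    local_rank = int(os.environ["SLURM_LOCALID"])

    # The launch script must export MASTER_ADDR and MASTER_PORT (typically the
    # first node in the job's node list) for the default env:// rendezvous.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return local_rank


if __name__ == "__main__":
    local_rank = init_distributed_from_slurm()
    print(f"rank {dist.get_rank()} / {dist.get_world_size()} on GPU {local_rank}")
```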
This repository would not have been possible without the contributions of the LLaVA project and its authors. Their work lays the groundwork for further exploration and innovation in vision-language models.
For more details, refer to the LLaVA repository.