Add MLCD Model #36181

tanhuajie · 2025-02-13T17:41:01Z

Model description

The MLCD models were released by the DeepGlint-AI team in the unicom repository. The project focuses on building foundational visual models for large multimodal language models, trains on large-scale datasets such as LAION400M and COYO700M, and uses sample-to-cluster contrastive learning to optimize performance. MLCD models are primarily used as vision encoders for multimodal large language models such as LLaVA.

The 🔥MLCD-ViT-bigG🔥 series is a vision transformer enhanced with 2D Rotary Position Embedding (RoPE2D). In the evaluation below, it achieves the highest scores among the compared vision towers on the document understanding and visual question answering benchmarks ChartQA, DocVQA, InfoVQA, and OCRBench. The model was developed by DeepGlint AI to handle complex visual-language interactions.
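For readers unfamiliar with RoPE2D, here is a minimal sketch of one common way to apply rotary embeddings on a 2D patch grid. It is a generic illustration, not the MLCD implementation: the split of the feature vector into a row half and a column half, the `base` of 10000, and the function names `rope_1d` and `rope_2d` are all assumptions made for this example.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`.

    Assumption: pair k uses frequency base ** (-2k / dim), as in common
    rotary embedding formulations.
    """
    dim = len(vec)
    assert dim % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_2d(vec, row, col, base=10000.0):
    """Hypothetical 2D variant: the first half of the features encodes the
    patch row, the second half encodes the patch column."""
    dim = len(vec)
    assert dim % 4 == 0, "dimension must be divisible by 4"
    half = dim // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)


def dot(a, b):
    """Plain dot product, used here as an attention score."""
    return sum(x * y for x, y in zip(a, b))
```

With this construction, the score `dot(rope_2d(q, r1, c1), rope_2d(k, r2, c2))` depends only on the offsets `r1 - r2` and `c1 - c2`, which is the property that makes rotary embeddings attractive for images of varying resolution.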

Tips:

We used the official LLaVA-NeXT codebase and the official LLaVA-NeXT-Data training dataset to evaluate the foundational visual models.
The language model is Qwen2.5-7B.

Result:

| Vision Tower | RoPE2D | ChartQA | DocVQA | InfoVQA | OCRBench | MMMU |
|---|---|---|---|---|---|---|
| CLIP (ViT-L-14-336px) | × | 66.52 | 75.21 | 38.88 | 525.00 | 44.20 |
| SigLIP (ViT-SO400M-384px) | × | 69.28 | 76.71 | 41.38 | 554.00 | 46.78 |
| DFN5B (ViT-H-14-378px) | × | 64.36 | 70.87 | 38.59 | 473.00 | 48.00 |
| MLCD (ViT-L-14-336px) | × | 67.84 | 76.46 | 43.48 | 531.00 | 44.30 |
| MLCD (ViT-bigG-14-336px) | √ | 71.07 | 79.63 | 44.38 | 572.00 | 46.78 |
| MLCD (ViT-bigG-14-448px) | √ | 73.80 | 83.34 | 46.59 | 582.00 | 46.00 |
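To make the comparison easier to read, the short script below finds the top vision tower per benchmark and computes score differences between two towers. The numbers are copied from the result table; the helper names `best_per_benchmark` and `delta` are hypothetical and exist only for this illustration.

```python
# Scores copied from the result table in this issue.
BENCHMARKS = ["ChartQA", "DocVQA", "InfoVQA", "OCRBench", "MMMU"]
SCORES = {
    "CLIP (ViT-L-14-336px)": [66.52, 75.21, 38.88, 525.00, 44.20],
    "SigLIP (ViT-SO400M-384px)": [69.28, 76.71, 41.38, 554.00, 46.78],
    "DFN5B (ViT-H-14-378px)": [64.36, 70.87, 38.59, 473.00, 48.00],
    "MLCD (ViT-L-14-336px)": [67.84, 76.46, 43.48, 531.00, 44.30],
    "MLCD (ViT-bigG-14-336px)": [71.07, 79.63, 44.38, 572.00, 46.78],
    "MLCD (ViT-bigG-14-448px)": [73.80, 83.34, 46.59, 582.00, 46.00],
}


def best_per_benchmark(scores, benchmarks):
    """Return the highest-scoring vision tower for each benchmark."""
    best = {}
    for idx, name in enumerate(benchmarks):
        best[name] = max(scores, key=lambda tower: scores[tower][idx])
    return best


def delta(scores, benchmarks, tower, baseline):
    """Return per-benchmark score differences (tower minus baseline)."""
    return {
        name: round(scores[tower][idx] - scores[baseline][idx], 2)
        for idx, name in enumerate(benchmarks)
    }


if __name__ == "__main__":
    for bench, tower in best_per_benchmark(SCORES, BENCHMARKS).items():
        print(f"{bench}: {tower}")
    print(delta(SCORES, BENCHMARKS,
                "MLCD (ViT-bigG-14-448px)", "MLCD (ViT-L-14-336px)"))
```

On these numbers, the 448px bigG model leads on every benchmark except MMMU, where DFN5B scores highest.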

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

No response

tanhuajie added the New model label Feb 13, 2025

tanhuajie linked a pull request Feb 13, 2025 that will close this issue: Add MLCD model #36182 (Open, 5 tasks)

qubvel added the Vision label Feb 13, 2025
