- OCR_VG
- Cross-Modal Search Based on MiniCPMV2.0
- Complex Agent Project
- MBTI Role Playing
- Hybrid Modality Fine-Tuning
- Running RAG with 4GB Memory
- AWQ Quantization for MiniCPMV2.6
- Cold Start to Acquire Agent Data
All of the above projects are original works. Feel free to use them, but please respect my intellectual property rights and give a star if you find them useful.
This project combines OCR and localization (visual grounding) while taking page layout into account. It lives in the OCR_VG folder; see the Text Recognition and Localization Tutorial.
Using multi-vector representations and contrastive learning, this project trains an end-to-end cross-modal retrieval model that can understand dense text and complex tables (a minimal scoring sketch follows the results list below). Model Link
- Input 20 candidate images:
- Input query text for search:
- Obtain the image most similar to the query.
- Experimental results: on 300 validation image-text pairs, Top-1 match accuracy is 96%.
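The two core ideas, multi-vector (late-interaction) scoring and a contrastive training objective, can be sketched as below. This is a minimal illustration with made-up function names and tensor shapes, not the project's training code; the real model builds its embeddings on top of MiniCPMV2.0.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(text_emb, image_emb, temperature=0.05):
    """Symmetric InfoNCE contrastive loss over a batch of matched text/image embeddings."""
    text_emb = F.normalize(text_emb, dim=-1)     # (B, D)
    image_emb = F.normalize(image_emb, dim=-1)   # (B, D)
    logits = text_emb @ image_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

def maxsim_score(query_vecs, doc_vecs):
    """Multi-vector (late-interaction) score: each query vector is matched against
    its best document vector, and the maxima are summed."""
    query_vecs = F.normalize(query_vecs, dim=-1)  # (Nq, D)
    doc_vecs = F.normalize(doc_vecs, dim=-1)      # (Nd, D)
    return (query_vecs @ doc_vecs.t()).max(dim=-1).values.sum()

def retrieve_top1(query_vecs, candidate_doc_vecs):
    """Return the index of the candidate image whose multi-vector score is highest."""
    scores = torch.stack([maxsim_score(query_vecs, d) for d in candidate_doc_vecs])
    return int(scores.argmax())
```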
See the Feishu Document
To quickly build an Agent, I have developed a tool for generating agent training data using large models, saving you 95% of the time. This includes data generation in qwen (react) and minicpm formats.
Zero-modification data generation example; for the generation code, click here.
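For orientation, a single ReAct-style sample looks roughly like this (the field names, tool name, and file name are illustrative; the tool generates data in the qwen (react) and minicpm formats described in the repo):

```python
import json

# Illustrative only: one ReAct-style sample of the kind the generator produces.
sample = {
    "query": "What will the weather be in Beijing tomorrow?",
    "react": (
        "Thought: I need a weather tool to get tomorrow's forecast for Beijing.\n"
        'Action: weather_search\n'
        'Action Input: {"city": "Beijing", "date": "tomorrow"}\n'
        "Observation: Sunny, 18-27°C.\n"
        "Thought: I now know the final answer.\n"
        "Final Answer: Tomorrow Beijing will be sunny, around 18-27°C."
    ),
}

# Training data is usually stored one JSON object per line (jsonl).
with open("react_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```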
Using MiniCPMV2.6, I implemented the project from the AutoPlan paper, which can plan and execute complex tasks (a simplified control-flow sketch follows the list below).
- Input query:
- Obtain task decomposition
- Obtain task execution
- Obtain final answer
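The overall control flow mirrors this plan-then-execute pattern. The sketch below is a simplified illustration only; the chat function, prompts, and tool registry are placeholders, not the project's actual code.

```python
def auto_plan(query, chat, tools):
    """Simplified plan-then-execute loop in the spirit of AutoPlan.

    `chat(prompt) -> str` stands in for a MiniCPMV2.6 call and `tools` maps
    tool names to callables; both are placeholders, not the project's API.
    """
    # 1. Task decomposition: ask the model for an ordered list of sub-tasks.
    plan = chat(f"Decompose the following task into numbered sub-tasks:\n{query}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Task execution: solve each sub-task, with the tool list exposed in the prompt.
    notes = []
    for task in subtasks:
        result = chat(f"Sub-task: {task}\nAvailable tools: {list(tools)}\nSolve it.")
        notes.append(f"{task} -> {result}")

    # 3. Final answer: summarize the intermediate results.
    return chat("Summarize these results into a final answer:\n" + "\n".join(notes))
```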
See the Feishu Document
Unlike Peking University's Chatlaw team, which trains a separate model for each personality, I achieved seamless switching between 16 personalities with a single 2B model (multi-personality role playing).
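At inference time, switching personas is just a matter of the conversation template. The sketch below is hypothetical (the prompt wording is illustrative, not the repo's training prompt); the fine-tuned 2B model is what makes the switch actually hold across all 16 MBTI types.

```python
MBTI_TYPES = [
    "INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
    "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP",
]

def build_messages(personality, user_input):
    """Build a chat where the system prompt selects one of the 16 personalities."""
    assert personality in MBTI_TYPES
    return [
        {"role": "system",
         "content": f"You are a role-playing assistant with an {personality} personality."},
        {"role": "user", "content": user_input},
    ]

# Switching personality is just switching the system prompt on the same model:
messages = build_messages("INFP", "How was your weekend?")
```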
The official MiniCPMV fine-tuning only supports training on text-image pairs. This project modifies the training code so that both pure-text and text-image samples can be used; it is located in the MiniCPM_Series_Tutorial/ft_language_replace_file folder.
You can access the Hybrid Modality Fine-Tuning Tutorial
Alignment training degrades the language modality of a multimodal model (MLLM): its ability to answer pure-text inputs drops, which is often called the "alignment tax" (essentially another form of catastrophic forgetting). A simple way to mitigate catastrophic forgetting is to mix the original data back in; for the loss of language ability in an MLLM, that means mixing in pure-language data. Which language data to mix, and in what proportion, is not the focus of this article, and I cannot fully solve that problem either. In practice, an MLLM does not need to be a jack-of-all-trades in language; it needs to keep basic Q&A and strong domain-specific responses while retaining excellent multimodal capabilities.
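The data-mixing idea itself is simple. A minimal sketch, assuming illustrative field names and an arbitrary mixing ratio (the actual code change lives in ft_language_replace_file):

```python
import random

def build_mixed_dataset(image_text_samples, text_only_samples, text_ratio=0.3):
    """Mix pure-text samples into the image-text fine-tuning set.

    A text-only sample simply carries no image, so the collator and the model
    forward pass must tolerate `image=None` -- that is the behavior this
    project's modified training code enables.
    """
    n_text = min(len(text_only_samples), int(len(image_text_samples) * text_ratio))
    mixed = list(image_text_samples) + random.sample(text_only_samples, n_text)
    random.shuffle(mixed)
    return mixed

# Illustrative sample formats:
# {"image": "page_001.jpg", "conversations": [...]}  # text-image pair
# {"image": None,           "conversations": [...]}  # pure text
```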
There isn't much to explain here: this project runs RAG with very little memory (around 4GB).
Tutorial available at RAG
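The core retrieval-then-generate loop of a memory-frugal RAG pipeline can be sketched as below. This is illustrative only; `embed` and `generate` stand in for whatever small embedding model and quantized LLM the tutorial actually uses.

```python
import numpy as np

def top_k(query_vec, chunk_vecs, k=3):
    """Cosine-similarity retrieval over a small in-memory index (no vector DB needed)."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]

def rag_answer(query, chunks, embed, generate, k=3):
    """`embed: str -> np.ndarray` and `generate: str -> str` are placeholders."""
    chunk_vecs = np.stack([embed(c) for c in chunks])
    context = "\n".join(chunks[i] for i in top_k(embed(query), chunk_vecs, k))
    return generate(f"Answer using only the context below.\n{context}\nQuestion: {query}")
```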
Since the bnb-quantized MiniCPMV2.6 cannot be loaded by vLLM, I adapted AutoAWQ. I have already submitted a PR to AutoAWQ; once it is merged, this will be directly usable.
Feishu Tutorial Link. Usage steps are as follows:
- Get my AutoAWQ branch and install it:

  ```bash
  git clone https://github.com/LDLINGLINGLING/AutoAWQ
  cd AutoAWQ
  git checkout minicpmv2.6
  pip install -e .
  ```
- Use the `modeling_minicpmv.py` file in the `MiniCPM_Series_Tutorial/MiniCPMV2_6_awq` directory to replace the same-named file in your MiniCPMV2.6 model save path.
- Modify `model_path` in `MiniCPM_Series_Tutorial/MiniCPMV2_6_awq/quantize.py` to your MiniCPMV2.6 save path.
- Run `quantize.py`.
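For orientation, the quantization step does roughly what standard AutoAWQ usage looks like below. This is a sketch only: the paths and quant-config values are illustrative, and the multimodal-specific handling lives in the minicpmv2.6 branch's `quantize.py`, which may differ.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/path/to/MiniCPM-V-2_6"        # your MiniCPMV2.6 save path
quant_path = "/path/to/MiniCPM-V-2_6-awq"    # output directory for the AWQ model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)  # uses default calibration data
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```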
After obtaining the AWQ-quantized MiniCPMV2.6 model, you can deploy it with vLLM exactly as before; VRAM usage drops from 16GB to 7GB.
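For reference, a minimal vLLM offline-inference call against the quantized checkpoint might look like this (the path and sampling parameters are illustrative; image inputs follow vLLM's usual multimodal prompt format and are omitted here):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/MiniCPM-V-2_6-awq",  # the quantized output directory
    quantization="awq",
    trust_remote_code=True,
    max_model_len=2048,
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Introduce MiniCPMV2.6 in one paragraph."], params)
print(outputs[0].outputs[0].text)
```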