Interesting Projects

All of the projects below are original works. Feel free to use them, but please respect my intellectual property rights and give the repository a star if you find them useful.

OCR_VG

This project integrates OCR and text localization (grounding) in a single task, with page layout taken into account. The code is located in the OCR_VG folder. You can access the Text Recognition and Localization Tutorial.
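
For a sense of the output, here is a hypothetical sketch of what a combined OCR-and-grounding result can look like; the field names are illustrative only, not the project's actual format (see the tutorial for that):

```python
# Hypothetical result structure: each recognized text span is paired with
# its bounding box (x1, y1, x2, y2) in pixel coordinates, so the page
# layout can be reconstructed downstream.
result = [
    {"text": "Interesting Projects", "bbox": (34, 12, 310, 48)},
    {"text": "OCR_VG",               "bbox": (34, 60, 150, 92)},
]
```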

Project Effects

(Images: Project Effect 1, Project Effect 2)

Cross-Modal Search Based on MiniCPMV2.0

This project uses multi-vector representations and contrastive learning to train an end-to-end cross-modal retrieval model that can understand dense text and complex tables. Model Link

Demonstration of Results:

  1. Input 20 candidate images:
    Candidate Images
  2. Input query text for search:
    Query Text
  3. Obtain the image most similar to the query.
    Most Similar Image
  4. Experimental results: on a validation set of 300 image-text pairs, the Top-1 match accuracy is 96%.
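
For orientation, here is a minimal sketch of the Top-1 retrieval step, assuming a dual encoder that produces L2-normalized embeddings; `encode_text` and `encode_image` are hypothetical placeholders, not this model's actual API:

```python
import numpy as np

def top1_search(query_vec: np.ndarray, candidate_vecs: np.ndarray) -> int:
    """Return the index of the candidate image most similar to the query.

    query_vec:      (d,)   L2-normalized text embedding
    candidate_vecs: (n, d) L2-normalized image embeddings
    """
    # On normalized vectors, cosine similarity reduces to a dot product.
    scores = candidate_vecs @ query_vec
    return int(np.argmax(scores))

# Usage (embeddings would come from the trained MiniCPMV2.0-based encoder):
# candidates = np.stack([encode_image(p) for p in image_paths])  # hypothetical
# best = top1_search(encode_text("your query"), candidates)      # hypothetical
```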

Usage Tutorial

See the Feishu Document

Cold Start to Acquire Agent Data

To help you quickly build an Agent, I developed a tool that uses large models to generate agent training data, saving you 95% of the time. It supports data generation in both the qwen (ReAct) and minicpm formats.

Zero-modification data generation example and generation code: Click Here
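
To show the kind of data this produces, here is a hypothetical ReAct-style training sample; the tool's actual field names and formatting may differ, so treat this as an illustration only:

```python
# Hypothetical single training sample in a ReAct-style format; the tool's
# real qwen/minicpm output schemas may differ in field names and layout.
sample = {
    "conversations": [
        {"role": "user", "content": "What will the weather be in Beijing tomorrow?"},
        {
            "role": "assistant",
            "content": (
                "Thought: I need to call a weather tool.\n"
                'Action: get_weather\n'
                'Action Input: {"city": "Beijing", "date": "tomorrow"}\n'
                "Observation: sunny, 25°C\n"
                "Thought: I now know the final answer.\n"
                "Final Answer: Tomorrow in Beijing will be sunny, around 25°C."
            ),
        },
    ]
}
```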

Complex Agent Project

Using MiniCPMV2.6, I built the project from the AutoPlan paper, which can plan and execute complex tasks.

Demonstration of Results:

  1. Input query:
    Input Query
  2. Obtain the task decomposition:
    Task Decomposition
  3. Obtain the task execution:
    Task Execution 1 Task Execution 2
  4. Obtain the final answer:
    Final Result
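
Conceptually, the plan-and-execute flow looks roughly like the sketch below; `llm` stands in for a hypothetical chat-completion callable, not this repo's actual interface:

```python
# Minimal plan-then-execute sketch of the AutoPlan idea (illustrative only).
def autoplan(llm, query: str) -> str:
    # 1) Task decomposition: ask the model for an ordered list of subtasks.
    plan = llm(f"Decompose this task into numbered subtasks:\n{query}")
    steps = [line for line in plan.splitlines() if line.strip()]

    # 2) Task execution: solve each subtask, feeding earlier results forward.
    results = []
    for step in steps:
        context = "\n".join(results)
        results.append(llm(f"Context so far:\n{context}\n\nExecute: {step}"))

    # 3) Final answer: condense the intermediate results into one response.
    summary = "\n".join(results)
    return llm(f"Using these results:\n{summary}\n\nAnswer the original query: {query}")
```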

Usage Tutorial

See the Feishu Document

MBTI Role Playing

Unlike the Peking University Chatlaw team, which trains a separate model for each personality, this project achieves seamless switching among all 16 MBTI personalities with a single 2B model, so one model can role-play multiple personalities.
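
As an illustration of the idea only (not the project's actual code; the tutorial describes the real setup), switching personas on a single model can be as simple as swapping a per-personality system prompt:

```python
# Hypothetical persona switching on one model; the prompts and the `llm`
# callable are placeholders.
MBTI_PROMPTS = {
    "ESTP": "You are an ESTP: energetic, pragmatic, and action-oriented.",
    "INTJ": "You are an INTJ: strategic, independent, and analytical.",
    # ... one entry per personality, 16 in total
}

def chat_as(llm, mbti: str, user_msg: str) -> str:
    # The same 2B model serves every persona; only the system prompt changes.
    return llm(system=MBTI_PROMPTS[mbti], user=user_msg)
```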

Usage Tutorial

Role Playing

Demonstration of Results

(Images: ESTP Personality, INTJ Personality, ESTP Personality 1, INTJ Personality 1)

Hybrid Modality Fine-Tuning

Official MiniCPMV fine-tuning only supports training on text-image pairs. This project modifies the training code to accept both pure-text and text-image samples; it is located in the MiniCPM_Series_Tutorial/ft_language_replace_file folder.

Usage Tutorial

You can access the Hybrid Modality Fine-Tuning Tutorial

The degradation of language capability caused by alignment training means that after multimodal alignment, an MLLM's ability to respond to pure-text inputs decreases; this is often called the "alignment tax" (essentially another form of catastrophic forgetting). One simple way to mitigate catastrophic forgetting is to mix the original data back in; for the language capability lost by a multimodal model, that means mixing in text-only data. Which language data to mix in, and in what proportion, is not the focus of this project, and I cannot solve that problem either. In practice, an MLLM does not need to be a jack-of-all-trades in language: it needs to keep basic Q&A and domain-specific response capabilities while retaining excellent multimodal capabilities.
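
As an illustration of the mixing idea (not the tutorial's actual code), one data pipeline can serve both modalities by letting pure-text samples simply carry no image:

```python
from PIL import Image

# Hypothetical hybrid-modality sample builder; field names are illustrative.
def build_sample(record: dict) -> dict:
    sample = {"messages": record["messages"]}  # chat turns (always present)
    if record.get("image_path"):               # image-text pair
        sample["image"] = Image.open(record["image_path"]).convert("RGB")
    else:                                      # pure-text sample
        sample["image"] = None                 # no vision input for this one
    return sample
```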

Running RAG with 4GB Memory

(Images: 4GB Memory RAG 1, 4GB Memory RAG 2)

There isn't much to explain here: this project lets you run a full RAG pipeline with very little memory.
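
As a rough sketch of the retrieval step under low-memory assumptions (placeholder code, not this project's implementation):

```python
import numpy as np

# Top-k retrieval over a small corpus with plain numpy; assumes embeddings
# are L2-normalized and fit in memory (reasonable for a 4GB budget).
def retrieve(query_vec, doc_vecs, docs, k=3):
    scores = doc_vecs @ query_vec          # cosine similarity on unit vectors
    top = np.argpartition(-scores, k)[:k]  # avoids a full sort; needs k < len(docs)
    return [docs[i] for i in top[np.argsort(-scores[top])]]

# The retrieved chunks are then stuffed into the generator's prompt:
# answer = llm(f"Context:\n{retrieve(qv, dv, docs)}\n\nQuestion: {q}")
```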

Usage Tutorial

Tutorial available at RAG

AWQ Quantization for MiniCPMV2.6

Since the bnb-quantized MiniCPMV2.6 cannot be loaded by vllm, I adapted AutoAWQ. I have already submitted a PR to AutoAWQ; once it is merged, AutoAWQ can be used directly.

Usage Tutorial

Feishu Tutorial Link

Usage steps are as follows:

  1. Get the personal AutoAWQ branch:
    git clone https://github.com/LDLINGLINGLING/AutoAWQ
    cd AutoAWQ
    git checkout minicpmv2.6
    pip install -e .
  2. Use the modeling_minicpmv.py file from the MiniCPM_Series_Tutorial/MiniCPMV2_6_awq directory to replace the file of the same name in your MiniCPMV2.6 model save path.
  3. Modify the model_path in MiniCPM_Series_Tutorial/MiniCPMV2_6_awq/quantize.py to your MiniCPMV2.6 save path.
  4. Run quantize.py.

After obtaining the AWQ-quantized MiniCPMV2.6 model, you can deploy it with vllm exactly as you would the original model; VRAM usage drops from 16 GB to 7 GB.
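
As a hedged sketch, loading the quantized checkpoint with vLLM's Python API might look like this (the model path is a placeholder for wherever quantize.py wrote the output):

```python
from vllm import LLM, SamplingParams

# Placeholder path; point this at your AWQ-quantized MiniCPMV2.6 directory.
llm = LLM(
    model="/path/to/MiniCPMV2_6-awq",
    quantization="awq",
    trust_remote_code=True,   # MiniCPMV ships custom modeling code
    max_model_len=2048,
)
out = llm.generate(["Describe AWQ quantization in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```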
