
Improvements: Papers #146

Open
Blaizzy opened this issue Dec 7, 2024 · 0 comments
Blaizzy commented Dec 7, 2024

VisionZip
A simple yet effective method that selects a set of informative tokens for input to the language model, reducing visual token redundancy and improving efficiency while maintaining model performance.

https://arxiv.org/pdf/2412.04467
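The core idea of selecting informative visual tokens can be sketched roughly as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes we already have per-token attention weights (e.g., from the vision encoder's [CLS] token) and simply keeps the top-k most-attended tokens; the function name and `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def select_informative_tokens(tokens, cls_attention, keep_ratio=0.125):
    """Keep the visual tokens that receive the most attention.

    tokens:        (N, D) array of visual token embeddings
    cls_attention: (N,) attention weights from the [CLS] token
    keep_ratio:    fraction of tokens to keep
    """
    k = max(1, int(len(tokens) * keep_ratio))
    top = np.argsort(cls_attention)[-k:]   # indices of the k most-attended tokens
    return tokens[np.sort(top)]            # preserve original spatial order

rng = np.random.default_rng(0)
tokens = rng.standard_normal((576, 64))    # e.g., a 24x24 patch grid
attn = rng.random(576)
kept = select_informative_tokens(tokens, attn)
print(kept.shape)  # (72, 64)
```

With a 0.125 keep ratio, only 72 of 576 visual tokens are passed to the language model, which is where the efficiency gain comes from.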

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

In this work, we propose a training-free adaptive inference method for multimodal LLMs that can accommodate a broad range of efficiency requirements with a minimum performance drop.
Our method consists of (a) iterative token merging based on embedding similarity before the LLM, and (b) progressive token pruning within LLM layers based on multi-modal importance. With a minimalist design, our method can be applied to both video and image LLMs. Extensive experiments on diverse video and image benchmarks demonstrate that our method substantially reduces computation load (e.g., a 7-fold reduction in FLOPs) while preserving the performance of video and image LLMs.

https://arxiv.org/pdf/2412.03248
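Step (a), merging tokens by embedding similarity before the LLM, can be sketched with a greedy loop: repeatedly find the most cosine-similar pair of tokens and average them until a target count is reached. This is a toy illustration under assumed simplifications (one pair merged per iteration, plain averaging), not the paper's implementation.

```python
import numpy as np

def merge_tokens(tokens, target_n):
    """Iteratively average the most cosine-similar pair of tokens
    until only target_n tokens remain."""
    toks = [t.astype(float) for t in tokens]
    while len(toks) > target_n:
        mat = np.stack(toks)
        norm = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        sim = norm @ norm.T                  # pairwise cosine similarity
        np.fill_diagonal(sim, -np.inf)       # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        toks[i] = (toks[i] + toks[j]) / 2    # merge the most similar pair
        toks.pop(j)
    return np.stack(toks)

rng = np.random.default_rng(1)
merged = merge_tokens(rng.standard_normal((16, 8)), target_n=4)
print(merged.shape)  # (4, 8)
```

Varying `target_n` is what lets a single model accommodate different efficiency budgets at inference time without retraining.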

@Blaizzy Blaizzy changed the title Improvements: VisionZip Improvements: Papers Dec 7, 2024