
[Feat Request] Integrate Medusa heads for faster inference on models with available Medusa heads #3906

Closed
yhyu13 opened this issue Sep 13, 2023 · 1 comment
Labels: enhancement (New feature or request), stale

Comments


yhyu13 commented Sep 13, 2023

Description

https://github.com/FasterDecoding/Medusa is a method that speeds up fine-tuned models by adding extra decoding heads, about 1/10 of the model size, trained on roughly 0.1% of the fine-tuning dataset. Medusa only needs to train these new heads on top of the existing model; having multiple heads is the reason it is called "Medusa".
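A minimal sketch of what one of these extra heads computes, assuming the structure described in the Medusa paper: a single residual feed-forward layer with SiLU on top of the base model's last hidden state, followed by a projection to vocabulary logits. Head k then guesses the token k+1 positions beyond the base LM head's next token. The function names, toy weight sizes, and pure-Python matrix math are hypothetical illustrations, not the code in the Medusa repository.

```python
import math


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def silu(x):
    """SiLU activation x * sigmoid(x), written with tanh to avoid overflow."""
    return x * 0.5 * (1.0 + math.tanh(x / 2.0))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MedusaHead:
    """One extra decoding head (hypothetical toy version).

    w1: hidden x hidden matrix for the residual block.
    w2: vocab x hidden matrix projecting to vocabulary logits.
    """

    def __init__(self, w1, w2):
        self.w1 = w1
        self.w2 = w2

    def __call__(self, hidden):
        # Residual block: SiLU(W1 h) + h keeps the base model's representation.
        block = [silu(a) + h for a, h in zip(matvec(self.w1, hidden), hidden)]
        # Project to vocabulary logits and normalize to probabilities.
        return softmax(matvec(self.w2, block))


def propose_tokens(hidden, heads):
    """Return each head's most likely token for its own future position."""
    guesses = []
    for head in heads:
        probs = head(hidden)
        guesses.append(max(range(len(probs)), key=probs.__getitem__))
    return guesses
```

With `w1` set to zeros, the residual block passes the hidden state through unchanged, so each head reduces to its own vocabulary projection; that makes a toy example such as `propose_tokens([2.0, -1.0], heads)` easy to check by hand.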

The result is almost 40% faster inference with only slightly more memory consumption.
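The speedup comes from committing several tokens per forward pass of the base model. A minimal sketch of the simplified greedy case, assuming a single chain of guesses: the base model scores all guesses in one pass, the longest prefix that matches its own greedy choices is kept, and its token at the first mismatch is kept as well. The real Medusa implementation evaluates a tree of candidates with tree attention and also offers a sampling-based acceptance scheme; `accept_candidates` is a hypothetical name for illustration only.

```python
def accept_candidates(candidates, base_predictions):
    """Greedy verification of one chain of guessed tokens.

    candidates: tokens guessed by the extra heads for the next positions.
    base_predictions: the base model's greedy token at each of those positions
        (conditioned on the candidates before it), plus one extra token for the
        position after the last candidate, all from one forward pass.
    Returns the tokens committed in this decoding step.
    """
    if len(base_predictions) != len(candidates) + 1:
        raise ValueError("need one base prediction per candidate plus one")
    committed = []
    for guess, truth in zip(candidates, base_predictions):
        if guess != truth:
            # First mismatch: keep the base model's token, drop later guesses.
            committed.append(truth)
            return committed
        committed.append(guess)
    # Every guess matched, so the prediction after the last one is valid too.
    committed.append(base_predictions[-1])
    return committed
```

For example, `accept_candidates([5, 9, 4], [5, 9, 7, 2])` commits `[5, 9, 7]` from a single base-model pass. Each call commits between 1 and `len(candidates) + 1` tokens, and under greedy decoding the committed tokens are exactly what plain token-by-token greedy decoding would have produced.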


Additional Context

[Medusa Roadmap] FasterDecoding/Medusa#3

The Medusa developers are looking forward to wider and faster adoption of their technique. I believe textgen webui is a great platform for landing the newest LLM technologies, so we should support:

1. Fine-tuning Medusa heads for local models and uploading the trained head weights to HF.
2. Downloading Medusa heads for specific models from HF when available and applying them in a Medusa pipeline for faster local inference.
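For item 1, training could look roughly like the Medusa-1 setup: the base model stays frozen and head k is trained with cross-entropy to predict the token k+1 positions beyond the base LM head's own target, with later heads down-weighted. A minimal sketch of that loss on one sequence, assuming the heads' probabilities are already computed; the geometric `decay=0.8` weighting and the function name `medusa_loss` are assumptions for illustration.

```python
import math


def medusa_loss(head_probs, tokens, decay=0.8):
    """Weighted cross-entropy for K extra heads on one training sequence.

    head_probs[k][t]: probability distribution (list over the vocabulary)
        from head k (k = 0..K-1) at position t.
    tokens: the training sequence of token ids.
    Head k at position t is trained to predict tokens[t + k + 2], i.e. k + 1
    steps beyond the base LM head's target tokens[t + 1].
    decay: head k is weighted by decay ** (k + 1) (assumed value).
    """
    total = 0.0
    for k, probs_per_pos in enumerate(head_probs):
        offset = k + 2
        # Negative log-likelihood of the shifted target at each valid position.
        terms = [
            -math.log(probs_per_pos[t][tokens[t + offset]])
            for t in range(len(tokens) - offset)
        ]
        if terms:
            total += decay ** (k + 1) * sum(terms) / len(terms)
    return total
```

Only the heads' parameters would receive gradients from this loss; in a real pipeline the probabilities would come from the heads applied to the frozen model's hidden states.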

Let me know if this feature request is a good fit for textgen webui.

yhyu13 added the enhancement label Sep 13, 2023
github-actions bot added the stale label Oct 25, 2023
@github-actions

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
