This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Description
Medusa (https://github.com/FasterDecoding/Medusa) is a method that speeds up inference for fine-tuned models by training extra decoding heads on top of the base model. The heads are only about 1/10 of the model size and can be trained on roughly 0.1% of the fine-tuning dataset; only the new heads need training, the base model stays frozen. That is why it is called "Medusa" — one model, multiple heads.
The result is almost 40% faster inference with only a small increase in memory consumption.
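The draft-then-verify idea behind Medusa can be sketched as follows. This is a toy illustration with a deterministic stand-in "model", not the real Medusa code: the extra heads cheaply propose several future tokens at once, and a single verification pass accepts the longest prefix the base model agrees with.

```python
def base_next_token(seq):
    # Stand-in for the base LM's next-token prediction (deterministic toy).
    return (sum(seq) * 31 + 7) % 1000

def draft(seq, num_heads=3):
    # The LM head plus Medusa heads propose num_heads tokens ahead in one step.
    # Toy heads: the third guess is deliberately wrong to exercise rejection.
    cand, ctx = [], list(seq)
    for i in range(num_heads):
        guess = base_next_token(ctx)
        if i == 2:
            guess = (guess + 1) % 1000  # simulate a mispredicting head
        cand.append(guess)
        ctx.append(guess)
    return cand

def verify(seq, cand):
    # Single verification pass: accept the longest candidate prefix that
    # matches what the base model would have generated token by token.
    accepted, ctx = [], list(seq)
    for tok in cand:
        if tok != base_next_token(ctx):
            break
        accepted.append(tok)
        ctx.append(tok)
    # The verification pass always yields one extra base-model token for free.
    accepted.append(base_next_token(ctx))
    return accepted

prompt = [1, 2, 3]
out = verify(prompt, draft(prompt))
print(out)  # → [193, 176, 632]
```

Because accepted tokens are always re-checked against the base model, the output is identical to plain greedy decoding — the speedup comes purely from producing several tokens per model step.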
Additional Context
[Medusa Roadmap] FasterDecoding/Medusa#3
The Medusa developers are looking forward to wider and faster adoption of their technique. I believe textgen webui is a great platform for landing the newest LLM technologies, so we should support:
1. Fine-tuning a Medusa head for local models, and uploading the trained Medusa head weights to HF.
2. Downloading Medusa heads for specific models from HF when available, and applying them in the Medusa pipeline for faster local inference.
Let me know if this feature request is a good fit for textgen webui.
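Point 2 might look something like the sketch below. Note that the "FasterDecoding/medusa-*" repo naming scheme and the weight filename are assumptions made for illustration, not a documented Medusa or Hub convention:

```python
# Hypothetical sketch: locate and download a pretrained Medusa head from the
# Hugging Face Hub for a given base model.

def medusa_head_repo(base_model_id: str) -> str:
    # Guess the Hub repo that would hold a Medusa head for this base model
    # (assumed naming convention, for illustration only).
    model_name = base_model_id.split("/")[-1]
    return f"FasterDecoding/medusa-{model_name}"

def load_medusa_head(base_model_id: str, download=None):
    # `download` is injectable so the lookup logic can be exercised without
    # network access; by default it falls back to huggingface_hub.
    repo = medusa_head_repo(base_model_id)
    if download is None:
        from huggingface_hub import hf_hub_download
        return repo, hf_hub_download(repo, "medusa_lm_head.pt")  # assumed filename
    return repo, download(repo)

repo, weights = load_medusa_head(
    "lmsys/vicuna-7b-v1.3",
    download=lambda repo: b"fake-weights",  # stub for offline use
)
print(repo)  # → FasterDecoding/medusa-vicuna-7b-v1.3
```

In a real integration the downloaded head weights would then be attached to the already-loaded base model before starting the Medusa decoding loop.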