This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Description
Medusa (https://github.com/FasterDecoding/Medusa) is a method that speeds up inference for fine-tuned models by training extra decoding heads on top of the base model. The heads are only about 1/10 of the model size and can be trained on roughly 0.1% of the fine-tuning dataset; only the new heads need training, the base model stays frozen. That is why it is called "Medusa" — one model, multiple heads.
The result is almost 40% faster inference with only a small increase in memory consumption.
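The draft-then-verify idea behind Medusa can be sketched as follows. This is a toy illustration with a deterministic stand-in "model", not the real Medusa code: the extra heads cheaply propose several future tokens at once, and a single verification pass accepts the longest prefix the base model agrees with.

```python
def base_next_token(seq):
    # Stand-in for the base LM's next-token prediction (deterministic toy).
    return (sum(seq) * 31 + 7) % 1000

def draft(seq, num_heads=3):
    # The LM head plus Medusa heads propose num_heads tokens ahead in one step.
    # Toy heads: the third guess is deliberately wrong to exercise rejection.
    cand, ctx = [], list(seq)
    for i in range(num_heads):
        guess = base_next_token(ctx)
        if i == 2:
            guess = (guess + 1) % 1000  # simulate a mispredicting head
        cand.append(guess)
        ctx.append(guess)
    return cand

def verify(seq, cand):
    # Single verification pass: accept the longest candidate prefix that
    # matches what the base model would have generated token by token.
    accepted, ctx = [], list(seq)
    for tok in cand:
        if tok != base_next_token(ctx):
            break
        accepted.append(tok)
        ctx.append(tok)
    # The verification pass always yields one extra base-model token for free.
    accepted.append(base_next_token(ctx))
    return accepted

prompt = [1, 2, 3]
out = verify(prompt, draft(prompt))
print(out)  # → [193, 176, 632]
```

Because accepted tokens are always re-checked against the base model, the output is identical to plain greedy decoding — the speedup comes purely from producing several tokens per model step.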
Additional Context
[Medusa Roadmap] FasterDecoding/Medusa#3
The Medusa developers are looking forward to wider and faster adoption of their technique. I believe textgen webui is a great platform for landing the newest LLM technologies, so we should support:
1. Fine-tuning a Medusa head for local models, and uploading the trained Medusa head weights to HF.
2. Downloading Medusa heads for specific models from HF when available, and applying them in the Medusa pipeline for faster local inference.
Let me know if this feature request is a good fit for textgen webui.
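Point 2 might look something like the sketch below. Note that the "FasterDecoding/medusa-*" repo naming scheme and the weight filename are assumptions made for illustration, not a documented Medusa or Hub convention:

```python
# Hypothetical sketch: locate and download a pretrained Medusa head from the
# Hugging Face Hub for a given base model.

def medusa_head_repo(base_model_id: str) -> str:
    # Guess the Hub repo that would hold a Medusa head for this base model
    # (assumed naming convention, for illustration only).
    model_name = base_model_id.split("/")[-1]
    return f"FasterDecoding/medusa-{model_name}"

def load_medusa_head(base_model_id: str, download=None):
    # `download` is injectable so the lookup logic can be exercised without
    # network access; by default it falls back to huggingface_hub.
    repo = medusa_head_repo(base_model_id)
    if download is None:
        from huggingface_hub import hf_hub_download
        return repo, hf_hub_download(repo, "medusa_lm_head.pt")  # assumed filename
    return repo, download(repo)

repo, weights = load_medusa_head(
    "lmsys/vicuna-7b-v1.3",
    download=lambda repo: b"fake-weights",  # stub for offline use
)
print(repo)  # → FasterDecoding/medusa-vicuna-7b-v1.3
```

In a real integration the downloaded head weights would then be attached to the already-loaded base model before starting the Medusa decoding loop.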