#717: Add support for Huggingface Autotokenizer #790
Conversation
@bioshazard great work, very good idea! Give me a few days to settle on some decisions for this API and I'll merge this in!
Hey, would you mind looking at my draft #809? Maybe we can squeeze this in there. That way we can incorporate the best of both worlds and potentially omit the need for an environment variable.
Zephyr decided to have its own template. That's exactly the kind of use case I want to handle with Autotokenizer. @teleprint-me does your PR account for the stop token when using autotokenizer?
That's ChatML, so yes, it does account for it. It wouldn't matter anyway, because I'm attempting to design my PR in a way that lets you define your own if necessary. So you would just plug it in after creating a new one.

I'm in the middle of 4 or 5 different projects at the moment, not including work, so it's going to take some time for each of them. I'm only one person and can only do so much. I'll get around to it though. If you have any time to spare, feel free to review what's there and pitch some ideas. You can pull in my branch and even make improvements if you see room for any.
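For reference, ChatML makes the stop token explicit in the format itself. A rough sketch (purely illustrative, not the code from either PR) of rendering messages as ChatML with `<|im_end|>` surfaced as the stop sequence:

```python
from typing import Dict, List, Tuple

def render_chatml(messages: List[Dict[str, str]]) -> Tuple[str, str]:
    # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Open the assistant turn so the model continues from here; <|im_end|>
    # doubles as the stop token that ends the completion.
    prompt += "<|im_start|>assistant\n"
    return prompt, "<|im_end|>"

prompt, stop = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```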
@bioshazard this is great. I just read the transformers chat_templating doc and came to search for it in this repo, as I've run into issues with formats in the past. For me, using the environment variable …
@gardner thanks! And yeah, I always use environment variables, following the 12 Factor App. What alternative approach might make sense to you? An additional …?

In this case I knew it was a simple path to a configurable model after passing the existing chat template argument. I am not married to the use of an environment variable; I rather wanted to quickly offer a simple implementation example and was wary of overcommitting to a design. I figured @abetlen would have some other preference I should align with first.
@abetlen have you given any further thought to the use of Autotokenizer? |
Hey @bioshazard, finally had a chance to merge in my functionary changes, which required introducing a little more complexity to chat handling, but it doesn't look to have impacted this PR. There are a few changes I want to make to this PR, but I'll likely merge it first and then make those adjustments.
That's great news! Thanks for the update and yea sounds good! 😁 |
I am sold on the idea of using an OpenAI-style endpoint and love how this project delivers it. All that remains is a transparent chat templating process. This PR delivers universal chat templating via the Huggingface Autotokenizer: just set `HF_MODEL` before running the server and use `--chat_model autotokenizer`.

The templates Hugging Face provides don't quite seem to deliver a perfect outcome yet, but this has huge potential.
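A minimal sketch of the mechanism (illustrative names only, not necessarily the exact code in this diff), assuming `transformers` is installed: load the tokenizer named by `HF_MODEL` and let its bundled chat template render OpenAI-style messages.

```python
import os

from transformers import AutoTokenizer

# HF_MODEL names any Hugging Face model repo; set it before starting the server.
tokenizer = AutoTokenizer.from_pretrained(os.environ["HF_MODEL"])

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# apply_chat_template renders the messages with the template shipped in the
# model repo, so swapping HF_MODEL swaps the prompt format automatically.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```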
I plan to use this method to add my own customized templates to Hugging Face. This solution also adds a default stop token provided by the model. I see this as a clear path to universal chat templating that includes its own stop token, so you can swap models while always keeping the same chat payload.
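Both points follow from how `transformers` exposes templating: `chat_template` is just a Jinja2 string on the tokenizer, so a customized template can be published with a model repo or assigned at runtime, and `eos_token` gives a model-provided default stop. A hedged sketch (the model id and template below are examples, not part of this PR):

```python
from transformers import AutoTokenizer

# Example model repo; Zephyr (mentioned above) ships its own chat template.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# The model's own end-of-sequence marker can serve as the default stop.
stop = tokenizer.eos_token

# Assigning chat_template overrides the template bundled with the model
# (this Jinja2 template is purely illustrative).
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "assistant:"
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
```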
I also really like the use of `HF_MODEL` as an env var, inspired by the 12 Factor App, but any feedback or changes are welcome to get this into the official repo. Thanks!
Relates to #717