
#717: Add support for Huggingface Autotokenizer #790

Merged
2 commits merged into abetlen:main on Nov 5, 2023

Conversation

bioshazard (Contributor) commented Oct 3, 2023

I am sold on the idea of using an OpenAI-style endpoint and love how this project delivers it. All that remains is a transparent chat templating process. This PR delivers universal chat templating via the Hugging Face AutoTokenizer. Just set HF_MODEL before running the server and use --chat_model autotokenizer.
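
A minimal launch sketch, using the environment variable and flag exactly as described above (the Hugging Face repo id and GGUF path here are placeholders):

```sh
# HF_MODEL names the Hugging Face repo whose chat template and stop
# token AutoTokenizer should load (placeholder id).
export HF_MODEL=mistralai/Mistral-7B-Instruct-v0.1

# Start the OpenAI-compatible server with the new templating option.
python3 -m llama_cpp.server --model ./models/model.gguf --chat_model autotokenizer
```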

The templates Hugging Face provides don't quite deliver a perfect outcome yet, but the approach has huge potential:

  • Nous Hermes 13B doesn't include "### Response" in the prompt, so the reply includes it and has to be stripped manually (see the sketch after this list)
  • Mistral 7B Instruct errors out if you include a system message instead of just dropping it
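
For the first bullet, a quick sketch of that manual stripping, assuming an OpenAI-style response dict named completion (the variable and the exact marker handling are illustrative):

```python
# Hypothetical post-processing for Nous Hermes 13B: the template omits
# "### Response" from the prompt, so the model echoes it in the reply.
reply = completion["choices"][0]["message"]["content"]
reply = reply.removeprefix("### Response").lstrip(": \n")
```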

I plan to use this method to add my own customized templates to Hugging Face. This solution also picks up the default stop token provided by the model. I see this as a clear path to universal chat templating that carries its own stop token, so you can swap models while always keeping the same chat payload.
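
For context, this is where that model-provided stop token comes from in the standard transformers API (the repo id is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder repo id; any Hugging Face model with a tokenizer works.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# The tokenizer carries the model's own end-of-sequence token, which can
# serve as a default stop string when swapping models.
print(tok.eos_token)  # e.g. "</s>"
```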

I also really like using an HF_MODEL environment variable, as inspired by the Twelve-Factor App methodology, but any feedback or changes are welcome to get this into the official repo. Thanks!

Relates to #717

abetlen (Owner) commented Oct 5, 2023

@bioshazard great work, very good idea! Give me a few days to settle on some decisions for this API and I'll merge this in!

teleprint-me (Contributor) commented Oct 10, 2023

Hey, would you mind looking at my draft #809? Maybe we can squeeze this in there.

That way we can incorporate the best of both worlds and potentially omit the need for an environment variable.

bioshazard (Contributor, Author) commented Oct 16, 2023

Zephyr decided to ship its own template, which is exactly the kind of use case I want AutoTokenizer to handle. @teleprint-me, does your PR account for the stop token when using AutoTokenizer?
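
For illustration, this is roughly how AutoTokenizer picks up a model's bundled template via the transformers chat-templating API (the Zephyr repo id is my assumption):

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]

# Zephyr ships its own chat template alongside the tokenizer, so no
# per-model formatting code is needed on the caller's side.
tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```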

teleprint-me (Contributor) commented

@bioshazard

That's ChatML, so yes, it does account for it.

It wouldn't matter anyway, because I'm attempting to design my PR in a way that lets you define your own template if necessary. You would just plug it in after creating a new Llama instance, set the new custom template, and then start using create_chat_completion, as in the sketch below.
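
A purely hypothetical sketch of that flow; set_chat_template and CUSTOM_TEMPLATE are illustrative names, not the actual #809 interface:

```python
from llama_cpp import Llama

CUSTOM_TEMPLATE = "..."  # hypothetical user-defined chat template string

llm = Llama(model_path="./models/model.gguf")
llm.set_chat_template(CUSTOM_TEMPLATE)  # hypothetical setter from the design above
llm.create_chat_completion(messages=[{"role": "user", "content": "Hi"}])
```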

I'm in the middle of 4 or 5 different projects at the moment, not including work, so it's going to take some time for each of them. I'm only one person and can only do so much.

I'll get around to it though. If you have any time to spare, feel free to review what's there and pitch some ideas. You can pull in my branch and even make improvements if you see room for any.

gardner (Contributor) commented Oct 19, 2023

@bioshazard this is great. I just read the transformers chat_templating doc and came to search for it in this repo, as I've run into issues with prompt formats in the past.

For me, using the environment variable HF_MODEL is a bit awkward.

bioshazard (Contributor, Author) commented Oct 19, 2023

@gardner thanks! And yeah, I always use environment variables, following the Twelve-Factor App. What alternative approach would make sense to you? An additional flag like --hfmodel that only works with this new template option? Have you looked at @teleprint-me's alternative in #809?

In this case I knew it was a simple path to a configurable model alongside the existing chat template argument. I'm not married to the environment variable; I mainly wanted to offer a simple implementation example quickly and was wary of overcommitting to a design. I figured @abetlen would have a preference I should align with first.

bioshazard (Contributor, Author) commented

@abetlen have you given any further thought to the use of AutoTokenizer?

abetlen (Owner) commented Nov 5, 2023

Hey @bioshazard, I finally had a chance to merge in my functionary changes, which required introducing a little more complexity to chat handling, but they don't look to have impacted this PR.

There are a few changes I want to make to this PR, but I'll likely merge it first and then make those adjustments (a sketch follows the list below):

  • This should probably be a helper function that registers a chat template given a path to a Hugging Face model
  • When users import the llama_cpp library in Python, they can use this function to register the template directly and use it in subsequent chat_completion calls
  • When using the server, we can add an HF_MODEL or similar environment variable that registers the handler and uses it directly
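
A hypothetical sketch of that plan; register_chat_template_from_hf and both paths are assumptions, not the merged API:

```python
import llama_cpp

# Hypothetical helper: load a chat template from a Hugging Face repo and
# register it for subsequent chat completion calls (name is an assumption).
llama_cpp.register_chat_template_from_hf("mistralai/Mistral-7B-Instruct-v0.1")

llm = llama_cpp.Llama(model_path="./models/model.gguf")
llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}])
```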

bioshazard (Contributor, Author) commented

That's great news! Thanks for the update, and yeah, sounds good! 😁

abetlen merged commit 4ff8def into abetlen:main on Nov 5, 2023. 14 checks passed.