
#717: Add support for Huggingface Autotokenizer #790

Merged
2 commits merged into abetlen:main on Nov 5, 2023

Conversation

bioshazard (Contributor) commented Oct 3, 2023

I am sold on the idea of using an OpenAI-style endpoint and love how this project delivers it. All that remains is a transparent chat templating process. This PR delivers universal chat templating via the Hugging Face AutoTokenizer. Just set HF_MODEL before running the server and use --chat_model autotokenizer.
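
A minimal launch sketch, using the environment variable and flag exactly as described above (the Hugging Face repo id and GGUF path here are placeholders):

```sh
# HF_MODEL names the Hugging Face repo whose chat template and stop
# token AutoTokenizer should load (placeholder id).
export HF_MODEL=mistralai/Mistral-7B-Instruct-v0.1

# Start the OpenAI-compatible server with the new templating option.
python3 -m llama_cpp.server --model ./models/model.gguf --chat_model autotokenizer
```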

The templates Hugging Face provides don't quite deliver a perfect outcome yet, but the approach has huge potential:

  • Nous Hermes 13B doesn't include "### Response" in the prompt, so the reply includes it and has to be stripped manually (see the sketch after this list)
  • Mistral 7B Instruct errors out if you include a system message instead of just dropping it
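
For the first bullet, a quick sketch of that manual stripping, assuming an OpenAI-style response dict named completion (the variable and the exact marker handling are illustrative):

```python
# Hypothetical post-processing for Nous Hermes 13B: the template omits
# "### Response" from the prompt, so the model echoes it in the reply.
reply = completion["choices"][0]["message"]["content"]
reply = reply.removeprefix("### Response").lstrip(": \n")
```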

I plan to use this method to add my own customized templates to Hugging Face. This solution also picks up the default stop token provided by the model. I see this as a clear path to universal chat templating that carries its own stop token, so you can swap models while always keeping the same chat payload.
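
For context, this is where that model-provided stop token comes from in the standard transformers API (the repo id is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder repo id; any Hugging Face model with a tokenizer works.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# The tokenizer carries the model's own end-of-sequence token, which can
# serve as a default stop string when swapping models.
print(tok.eos_token)  # e.g. "</s>"
```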

I also really like using an HF_MODEL environment variable, as inspired by the Twelve-Factor App methodology, but any feedback or changes are welcome to get this into the official repo. Thanks!

Relates to #717

abetlen (Owner) commented Oct 5, 2023

@bioshazard great work, very good idea! Give me a few days to settle on some decisions for this API and I'll merge this in!

teleprint-me (Contributor) commented Oct 10, 2023

Hey, would you mind looking at my draft #809? Maybe we can squeeze this in there.

That way we can incorporate the best of both worlds and potentially omit the need for an environment variable.

bioshazard (Contributor, Author) commented Oct 16, 2023

Zephyr decided to ship its own template, which is exactly the kind of use case I want AutoTokenizer to handle. @teleprint-me, does your PR account for the stop token when using AutoTokenizer?
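
For illustration, this is roughly how AutoTokenizer picks up a model's bundled template via the transformers chat-templating API (the Zephyr repo id is my assumption):

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]

# Zephyr ships its own chat template alongside the tokenizer, so no
# per-model formatting code is needed on the caller's side.
tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```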

teleprint-me (Contributor) commented

@bioshazard

That's ChatML, so yes, it does account for it.

It wouldn't matter anyway, because I'm attempting to design my PR in a way that lets you define your own template if necessary. You would just plug it in after creating a new Llama instance, set the new custom template, and then start using create_chat_completion, as in the sketch below.
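
A purely hypothetical sketch of that flow; set_chat_template and CUSTOM_TEMPLATE are illustrative names, not the actual #809 interface:

```python
from llama_cpp import Llama

CUSTOM_TEMPLATE = "..."  # hypothetical user-defined chat template string

llm = Llama(model_path="./models/model.gguf")
llm.set_chat_template(CUSTOM_TEMPLATE)  # hypothetical setter from the design above
llm.create_chat_completion(messages=[{"role": "user", "content": "Hi"}])
```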

I'm in the middle of 4 or 5 different projects at the moment, not including work, so it's going to take some time for each of them. I'm only one person and can only do so much.

I'll get around to it though. If you have any time to spare, feel free to review what's there and pitch some ideas. You can pull in my branch and even make improvements if you see room for any.

gardner (Contributor) commented Oct 19, 2023

@bioshazard this is great. I just read the transformers chat_templating doc and came to search for it in this repo, as I've run into issues with prompt formats in the past.

For me, using the environment variable HF_MODEL is a bit awkward.

bioshazard (Contributor, Author) commented Oct 19, 2023

@gardner thanks! And yeah, I always use environment variables, following the Twelve-Factor App. What alternative approach would make sense to you? An additional flag like --hfmodel that only works with this new template option? Have you looked at @teleprint-me's alternative in #809?

In this case I knew it was a simple path to a configurable model alongside the existing chat template argument. I'm not married to the environment variable; I mainly wanted to offer a simple implementation example quickly and was wary of overcommitting to a design. I figured @abetlen would have a preference I should align with first.

bioshazard (Contributor, Author) commented

@abetlen have you given any further thought to the use of AutoTokenizer?

abetlen (Owner) commented Nov 5, 2023

Hey @bioshazard, I finally had a chance to merge in my functionary changes, which required introducing a little more complexity to chat handling, but they don't look to have impacted this PR.

There are a few changes I want to make to this PR, but I'll likely merge it first and then make those adjustments (a sketch follows the list below):

  • This should probably be a helper function that registers a chat template given a path to a Hugging Face model
  • When users import the llama_cpp library in Python, they can use this function to register the template directly and use it in subsequent chat_completion calls
  • When using the server, we can add an HF_MODEL or similar environment variable that registers the handler and uses it directly
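
A hypothetical sketch of that plan; register_chat_template_from_hf and both paths are assumptions, not the merged API:

```python
import llama_cpp

# Hypothetical helper: load a chat template from a Hugging Face repo and
# register it for subsequent chat completion calls (name is an assumption).
llama_cpp.register_chat_template_from_hf("mistralai/Mistral-7B-Instruct-v0.1")

llm = llama_cpp.Llama(model_path="./models/model.gguf")
llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}])
```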

bioshazard (Contributor, Author) commented

That's great news! Thanks for the update, and yeah, sounds good! 😁

abetlen merged commit 4ff8def into abetlen:main on Nov 5, 2023. 14 checks passed.