feat: add support for vLLM Mistral Tokenizer #142

Merged

gcalmettes
Contributor

@gcalmettes gcalmettes commented Sep 25, 2024

fix #141

The vllm MistralTokenizer class is a wrapper over the mistral tokenizers from mistral-common.

When the build_vllm_token_enforcer_tokenizer_data function is called with a vLLM MistralTokenizer, the underlying mistral-common tokenizer is passed on instead of the MistralTokenizer wrapper itself. Only the wrapper has the methods that lm-format-enforcer uses to access the tokenizer's tokens and special tokens, so building the tokenizer data fails.

This PR adds a check that passes the vLLM MistralTokenizer directly, instead of the underlying mistral-common tokenizer, when building the tokenizer data.
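
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the kind of check described above, not the merged code. All class and function names in it are hypothetical stand-ins, except MistralTokenizer, which here is a dummy class and not the real vLLM one. It also assumes that the unwrapping happens through a `.tokenizer` attribute.

```python
# Illustrative stand-ins only: none of these classes come from vLLM,
# mistral-common, or lm-format-enforcer.

class InnerMistralCommonTokenizer:
    """Stand-in for the underlying mistral-common tokenizer.

    It deliberately lacks get_vocab(), mimicking the missing methods.
    """


class MistralTokenizer:
    """Stand-in for the vLLM wrapper, which exposes the needed methods."""

    def __init__(self):
        # Assumption: the wrapper keeps the inner tokenizer in `.tokenizer`.
        self.tokenizer = InnerMistralCommonTokenizer()

    def get_vocab(self):
        return {"<s>": 0, "hello": 1}


class PlainTokenizer:
    """Stand-in for a tokenizer that already has the needed methods."""

    def get_vocab(self):
        return {"a": 0}


class GenericWrapper:
    """Stand-in for other wrappers, where `.tokenizer` is the object to use."""

    def __init__(self, inner):
        self.tokenizer = inner


def select_tokenizer(tokenizer):
    """Pick the object that should be used to build the tokenizer data."""
    # The added check: keep the Mistral wrapper as-is instead of unwrapping it.
    if isinstance(tokenizer, MistralTokenizer):
        return tokenizer
    # Assumed pre-existing behaviour: unwrap anything exposing `.tokenizer`.
    if hasattr(tokenizer, "tokenizer"):
        return tokenizer.tokenizer
    return tokenizer


if __name__ == "__main__":
    mistral = MistralTokenizer()
    chosen = select_tokenizer(mistral)
    print(type(chosen).__name__, hasattr(chosen, "get_vocab"))
```

The type check has to come before the generic unwrapping step: in this sketch the Mistral wrapper also carries a `.tokenizer` attribute, so it would otherwise be replaced by the inner object that lacks the required methods.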

@gcalmettes gcalmettes force-pushed the feat/add-support-for-mistral-tokenizer branch from f767edc to 4b13606 Compare September 25, 2024 11:51
@gcalmettes gcalmettes force-pushed the feat/add-support-for-mistral-tokenizer branch from 4b13606 to 22de027 Compare September 25, 2024 11:52
@noamgat
Owner

noamgat commented Sep 26, 2024

Thanks for the contribution! I will review it this weekend and get back to you.

@noamgat noamgat merged commit f649926 into noamgat:main Sep 27, 2024
1 check passed
@noamgat
Owner

noamgat commented Sep 27, 2024

Merged, thanks!

Development

Successfully merging this pull request may close these issues.

Incompatibility with the vLLM Mistral Tokenizer
2 participants