Yes, this is a known limitation of the approach taken by LM Format Enforcer. I will look into how the outlines PR works and see if we can adapt its approach. If anyone wants to take a crack at it, they are more than welcome :)
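The rough idea, as I understand it from the outlines discussion, would be to match candidate tokens against the remaining UTF-8 bytes of the allowed string rather than against decoded text, so a token that ends mid-character is still accepted. A minimal sketch of that direction (names are illustrative, not our actual internals):

```python
# Hypothetical sketch, not LM Format Enforcer's actual code or API: filter
# the vocabulary at the byte level, so tokens that cover only part of a
# multi-byte character (such as an emoji) remain legal during decoding.
def allowed_next_tokens(target: str,
                        generated: list[int],
                        token_bytes: dict[int, bytes]) -> list[int]:
    """token_bytes maps each token id to the raw UTF-8 bytes it emits."""
    emitted = b"".join(token_bytes[t] for t in generated)
    remaining = target.encode("utf-8")[len(emitted):]
    # A token stays allowed if its bytes are a non-empty prefix of what is
    # left of the target, even when they end mid-character.
    return [tid for tid, bs in token_bytes.items()
            if bs and remaining.startswith(bs)]
```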
On Thu, Jun 27, 2024 at 3:56 AM milesial wrote:
Hi, using version 0.10.3 and the llama3 tokenizer with vLLM, I can't seem to constrain generation to emojis:
```
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "content": "",
        "role": "user"
      }
    ],
    "guided_decoding_backend": "lm-format-enforcer",
    "guided_choice": ["🐈"],
    "temperature": 0.0,
    "top_p": 0.7,
    "max_tokens": 100,
    "stream": false
  }'
```

This fails with:

```
[ERROR] Unknown LMFormatEnforcer Problem. Prefix: ''
```
Even though the tokenizer supports it:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tok.encode("🐈")
# [128000, 9468, 238, 230]
```
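I think what's happening is that each of those ids (after the 128000 BOS token) carries only a fragment of the emoji's four UTF-8 bytes, so no single token decodes to a complete character. A quick check (decoding each id alone should print replacement characters):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
for tid in tok.encode("🐈", add_special_tokens=False):  # [9468, 238, 230]
    # Each id maps to a slice of the emoji's UTF-8 bytes, so decoding it
    # in isolation yields an invalid partial sequence ("�").
    print(tid, repr(tok.decode([tid])))
```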
It might be related to multi-token characters; outlines had to deal with similar issues: dottxt-ai/outlines#738