LM Enforcer causes hung generation, and what is the sampler setting? #486 #110
Comments
Interesting. I wonder if it could be related to whether the token filtering in exllamav2 happens before or after the softmax is applied to the logits. If it is applied afterwards, the min_p requirement may prove impossible to satisfy in some cases: if all of the tokens with p > 0.06 after the softmax are marked illegal by LMFE. By hang, do you mean freeze, or crash?
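To make the concern concrete, here is a toy illustration (not exllamav2's actual sampling code; the probabilities, token ids and the 0.06 cutoff are invented): if min_p is evaluated over the full softmax and the LMFE mask is applied afterwards, the two sets can fail to intersect.

```python
import torch

# Illustrative only: shows how a min_p cutoff taken over the full softmax
# can conflict with an LMFE allowed-token mask.
probs = torch.tensor([0.50, 0.30, 0.10, 0.05, 0.03, 0.02])  # post-softmax distribution

min_p = 0.06
survives_min_p = probs > min_p            # tokens 0, 1, 2 clear the cutoff

# Hypothetical LMFE mask: suppose the JSON schema only allows tokens 3 and 4 here
allowed_by_lmfe = torch.tensor([False, False, False, True, True, False])

both = survives_min_p & allowed_by_lmfe
print(both.any().item())                  # False: nothing is both legal and above min_p
```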
By hang I mean the terminal just stops responding. I can't kill it with Ctrl+C; I have to force close the terminal. If you have any idea what I can try, just let me know. The strange thing is that without lm enforcer it generates the function call correctly, meaning the needed tokens are also the most likely tokens. However, it hangs when I add lm enforcer. In addition, the same code works when I use another model (same prompt, same lm enforcer), so I don't know if it is an issue with the way I defined the lm enforcer or not. Currently the way I define the lm enforcer is a bit long-winded.
Do you see any problem with how I create the token enforcer?
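For reference, a typical token-enforcer setup with lm-format-enforcer looks roughly like the sketch below; the schema class and the tokenizer choice are stand-ins for illustration, and an exllamav2 deployment would go through lm-format-enforcer's exllamav2 integration rather than the generic helpers shown here.

```python
from pydantic import BaseModel
from transformers import AutoTokenizer
from lmformatenforcer import JsonSchemaParser, TokenEnforcer
from lmformatenforcer.integrations.transformers import build_token_enforcer_tokenizer_data

class FunctionCall(BaseModel):
    # Hypothetical schema, for illustration only
    name: str
    arguments: dict

# Any HF tokenizer works for the sketch; in practice this is the model's own tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer_data = build_token_enforcer_tokenizer_data(tokenizer)

enforcer = TokenEnforcer(tokenizer_data, JsonSchemaParser(FunctionCall.model_json_schema()))

# At each step the enforcer reports which token ids are currently legal; when that
# list approaches the full vocabulary (e.g. inside a free-form string value),
# the per-step overhead is at its worst.
allowed = enforcer.get_allowed_tokens(tokenizer.encode("```json\n"))
print(len(allowed))
```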
I streamed the tokens to see what happens under the hood, and it turns out the "hang" just means generation becomes very, very slow under certain prompts. I tracked the length of the pass_tokens list and it hits almost 32k possible tokens. Do you have a recommended generation setting for this case? Also, I thought top_k would kick in and limit it to the top 50 tokens?
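Note that top_k constrains what the sampler will pick, but (as far as I understand) the enforcer computes its allowed-token list before sampling, so a huge pass_tokens list is not necessarily reduced by sampler settings alone. Still, a minimal sketch of tightening the sampler, assuming the usual ExLlamaV2Sampler.Settings attributes:

```python
from exllamav2.generator import ExLlamaV2Sampler

# Attribute names as exposed by ExLlamaV2Sampler.Settings; wire this into
# whichever generator you use. Values are just a starting point for debugging.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_k = 50     # cap the sampler's candidate set per step
settings.top_p = 0.9
settings.min_p = 0.0    # disable min_p while isolating the LMFE interaction
```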
I think I am experiencing the same bug: TabbyAPI, Exllama 0.1.4, A6000 48 GB, Command-R, Windows 10. My case is similar in that the model produces valid JSON if json_schema is not enforced.
You can also refer to the issue I created on the exllamav2 GitHub. I closed the issue, but there seems to be no solution at the moment.
I have been using LM enforcer for a while for function calling with exllamav2, and once in a while it causes the exllama generation to hang.
Previously I just attributed it to the model not being smart enough for function calling. However, I can now reliably reproduce the issue with a specific prompt and model. The strange thing is that the generation without lm enforcer is correct.
The prompt:
````
conversation....
coordinator_agent:
```json
````
Correct result without using lm enforcer, just normal generation:
Could it be due to my sampler setting?
The model above is WizardLM 8x22B, which is quite good at function calling; as can be seen, the raw response without lm enforcer is correct.
Any advice is appreciated. Currently I suspect it has to do with the sampling settings, since in most generations I get a correct function_calling response with the same lm enforcer setting.