Add model support for gemma 9b #79
Comments
@guoqingbao thanks for checking this, but I am still getting the following error.
I have checked the structure of gemma-2 and found that it differs from the gemma architecture we currently support. We will add support for gemma-2 in a later update, e.g. by porting candle's gemma-2 implementation to candle-vllm: huggingface/candle@c1b9e07
@guoqingbao @sigridjineth we can perhaps base this work on EricLBuehler/mistral.rs#490 and EricLBuehler/mistral.rs#554.
Sure!
@guoqingbao it looks like we will need to update the PA (paged attention) kernels for soft-capping though.
Have you encountered problems with gemma-2 in Mistral.rs using the current PA kernels?
I actually disabled PA for that case, so I haven't looked into it, but we can add it here!
I found the introduction of soft-capping in Google's Gemma-2 release. It seems that soft-capping is designed for training, and I'm not sure whether it is necessary for inference: https://huggingface.co/blog/gemma2 "Soft capping is a technique that prevents logits from growing excessively large without truncating them. It works by dividing the logits by a maximum value threshold (soft_cap), then passing them through a tanh layer (ensuring they are in the (-1, 1) range), and finally multiplying by the threshold again. This guarantees that the final values will be in the (-soft_cap, +soft_cap) interval without losing much information but stabilizing the training."
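For reference, the transformation itself is tiny. A minimal sketch in candle-style Rust (the function name and signature are illustrative, not taken from candle-vllm):

```rust
use candle_core::{Result, Tensor};

/// Soft-capping as described in the Gemma-2 blog post: scale the logits
/// down by `cap`, squash them into (-1, 1) with tanh, then scale back up
/// so the result stays within (-cap, +cap).
fn soft_cap(logits: &Tensor, cap: f64) -> Result<Tensor> {
    (logits / cap)?.tanh()? * cap
}
```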
Yeah, I agree. This would probably be fine to exclude; it looks like vLLM does the same and doesn't even use the sliding-window interleaving (they use global attention everywhere). I'll add the PR shortly!
Yes, I also checked the newest version of vLLM and didn't find an implementation of soft-capping for attention. It seems the feature (soft-capping) is only available in JAX for training.
@sigridjineth As discussed in #84, Gemma-2 models are now supported; please refer to #86. Please feel free to report any other issues.
@guoqingbao Thank you very much!
@guoqingbao I found that gemma-2b is not supported at the moment, so I will work on that in my spare time. Thanks for your attention!
@sigridjineth this is because the EOS token ids are an array for 2b; we have a struct for this which implements Deserialize, if you want to add a PR!
This PR should address the problem: 124fadc
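For context, a minimal sketch of how such a config field can be deserialized with serde (the type and field names here are illustrative, not necessarily what the commit above uses):

```rust
use serde::Deserialize;

/// `eos_token_id` may be a single integer or an array of integers
/// (as in gemma-2b's config.json), so accept either shape.
#[derive(Debug, Clone, Deserialize)]
#[serde(untagged)]
enum EosTokenId {
    Single(u32),
    Multiple(Vec<u32>),
}

#[derive(Debug, Deserialize)]
struct ModelConfig {
    eos_token_id: EosTokenId,
    // ... other config fields elided
}
```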
Symptoms
I found that using google/gemma-9b-it raises the following error.