[Bugfix] Command-R Max Model Length #3727
Conversation
cool thanks!
Just so that I understand correctly, this allows people to increase the context length up to 128k, and throws a warning if it's larger?
A quick question: why not just change the model's `max_position_embeddings` in its config?
@saurabhdash It's not a warning, it's a fatal raise.
@esmeetu Because that's not normally done for any other models, and it is not maintainable when pulling weights into a cached location that may be updated. It's also not correct, since RoPE scaling is not the same as just changing the embedding size, AFAIK.
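To illustrate the distinction, assuming standard HF `config.json` conventions (all values here are made up for illustration):

```python
# Two different edits to a HF model config, shown as Python dicts.
# Raising max_position_embeddings only changes the *declared* window;
# it does not change how rotary position frequencies are computed:
config_plain_bump = {"max_position_embeddings": 131072}

# RoPE scaling is a separate mechanism with its own config entry,
# which rescales the rotary frequencies to cover a longer window:
config_rope_scaled = {
    "max_position_embeddings": 8192,
    "rope_scaling": {"type": "linear", "factor": 16.0},
}
```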
Just to clarify - we still use 8192 as the default `max_model_len`?
can you put up a quick manual test for this?
@simon-mo Sure, here's a quick test script
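A minimal sketch of such a test, assuming vLLM's offline `LLM` API (131072 stands in for the `model_max_length` value discussed above; the prompt and output length are arbitrary):

```python
from vllm import LLM, SamplingParams

# Request a context length above max_position_embeddings (8192) but
# within model_max_length. Before this fix, the engine would refuse it.
llm = LLM(
    model="CohereForAI/c4ai-command-r-v01",
    max_model_len=131072,
)

# A short generation to confirm the engine comes up and serves requests.
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```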
On A100-80G
There's one small "visual" bug I fixed: the variable …
Currently, the max context window for CohereForAI/c4ai-command-r-v01 is not defined by `max_position_embeddings` but by a special `model_max_length` key instead. This has been discussed in these two threads: 1, 2.

We still use `max_position_embeddings` as the default `max_model_len` due to memory concerns, but when the user specifies a value higher than `max_position_embeddings` and lower than or equal to `model_max_length`, we allow it to go through.

This PR fixes #3676
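A minimal sketch of the check described above (function and attribute names here are illustrative assumptions, not the actual diff):

```python
def get_and_verify_max_len(hf_config, user_max_model_len=None):
    """Illustrative sketch of the policy described above."""
    derived_max = hf_config.max_position_embeddings  # 8192 for Command-R
    # Command-R declares its real usable window under a separate key.
    model_max_length = getattr(hf_config, "model_max_length", None)

    if user_max_model_len is None:
        # Default to the smaller value to bound the KV-cache footprint.
        return derived_max
    if user_max_model_len <= derived_max:
        return user_max_model_len
    # Allow values above max_position_embeddings only when the model
    # explicitly declares a larger model_max_length.
    if model_max_length is not None and user_max_model_len <= model_max_length:
        return user_max_model_len
    raise ValueError(
        f"max_model_len ({user_max_model_len}) exceeds the model's "
        f"supported context window.")
```

Keeping the default at `max_position_embeddings` means users opt in to the larger window, and its memory cost, explicitly.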
cc @saurabhdash I'm not sure if there's a cleaner/better way to do this, but please take a look.