max_length parameter is not honored w/ Vicuna-13b #4
Comments
Hi, @rlancemartin. This sounds more like an issue with the Vicuna-13B model than with the Python client library itself, so I'm going to transfer it to that repo. I just tried this myself using the web UI, and the output was within the expected range (308 tokens, counted with OpenAI's tokenizer; I didn't try feeding it through the model's own tokenizer, though). Can you share any more information to help us diagnose the problem? /cc @replicate/models
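For anyone repeating that check, here is a minimal sketch of counting tokens with OpenAI's tokenizer via tiktoken, assuming the `cl100k_base` encoding (Vicuna uses the LLaMA tokenizer, so its own counts will differ somewhat):

```python
import tiktoken

# Rough token count of a model output using OpenAI's tokenizer.
# Note: this is not Vicuna's own tokenizer, so treat the count as approximate.
enc = tiktoken.get_encoding("cl100k_base")
output_text = "..."  # paste the model's output here
print(len(enc.encode(output_text)), "tokens")
```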
Hey @rlancemartin and @dankolesnikov. This is odd behavior; I can't reproduce it. When I run the model on Replicate, the output is always truncated at `max_length`.
@daanelson thanks! Now I understand. I agree: this error is weird. To reproduce, try this:
First run, I get … second run, I get … third run, I get … Strange!
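As a rough illustration of that kind of repeated-run check, here is a minimal sketch using the Replicate Python client. The model path, version hash, and prompt are placeholders for illustration, not the original inputs:

```python
import replicate  # requires REPLICATE_API_TOKEN in the environment

# Placeholder version hash; copy the real one from the model page on replicate.com.
MODEL = "replicate/vicuna-13b:<version-hash>"

for run in range(1, 4):
    # The model streams output chunks, so join the iterator into one string.
    output = "".join(
        replicate.run(MODEL, input={"prompt": "Explain what a token is.", "max_length": 500})
    )
    # If max_length is honored, each run should stay well under ~500 tokens' worth of words.
    print(f"run {run}: {len(output.split())} words")
```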
@rlancemartin and @dankolesnikov, thanks for raising this! I investigated this morning and traced the issue to our API. We're tracking it internally now and I'll let you know as soon as it's resolved.
@rlancemartin and @dankolesnikov, it turns out the issue was specifically with the Replicate Python client. We have a fix in this branch and we'll have a release out soon! If you don't want to wait for the release, you can install the client directly from that branch.
Amazing! Will test it out :)
The fix @joehoover mentioned was merged in replicate/replicate-python#106 and released in version 0.8.3. Please take a look and let us know if you're still seeing this behavior.
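If you're verifying locally, a quick way to confirm which client version is installed (standard library only):

```python
from importlib.metadata import version

# The fix shipped in replicate-python 0.8.3, so anything at or above that should include it.
print(version("replicate"))
```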
🙃 Really glad that we identified and (hopefully) addressed the problem. Thank you for opening this issue, @rlancemartin, and thanks to @joehoover, @daanelson, and @evilstreak for their quick response.
Late to the party. Nice work, y'all! Can we call this done?
Yes, let's close it out. I'll share a full analysis soon, but from my initial inspection the issue looks fixed :)
BTW, great results in terms of answer quality with this model! I'm using the LangChain auto-evaluator to benchmark it. But something is odd with latency: the first call to the model is quite slow (> 100s), while follow-up calls are fast (< 10s). Ticket here. I can reproduce this behavior, so it's not a one-off. Strange. @joehoover or others, any ideas?
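To put numbers on that latency pattern, here is a small sketch that times consecutive calls (same placeholder model version as in the earlier sketch; the prompt is arbitrary):

```python
import time
import replicate

MODEL = "replicate/vicuna-13b:<version-hash>"  # placeholder version hash

for i in range(1, 4):
    start = time.perf_counter()
    "".join(replicate.run(MODEL, input={"prompt": "What is 2 + 2?", "max_length": 50}))
    # Reported pattern: the first call takes > 100s, later calls < 10s.
    print(f"call {i}: {time.perf_counter() - start:.1f} s")
```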
Running the model, we specify `max_length: 500`, so 500 tokens. But the output is 940 words, or roughly 940 * 2.5 ≈ 2,350 tokens, well over the requested limit.