max_length parameter is not honored w/ Vicuna13-b #4

Closed
rlancemartin opened this issue May 17, 2023 · 11 comments

@rlancemartin

Running the model:

import replicate

output = replicate.run(
    "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    input={"prompt": "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step.",
           "temperature": 0.75,
           "max_length": 500})

We specify max_length: 500, i.e. a 500-token limit.

But the output is 940 words, which is well over 500 tokens however you tokenize it.
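
For reference, here's a minimal way to quantify the overrun (a sketch; tiktoken's cl100k_base encoding is only an approximation of the LLaMA tokenizer that Vicuna-13B actually uses, so the count is a rough estimate):

import replicate
import tiktoken  # approximation only; the model's own tokenizer will count slightly differently

output = replicate.run(
    "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    input={"prompt": "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step.",
           "temperature": 0.75,
           "max_length": 500})

text = "".join(output)  # the client streams chunks; join them into one string
enc = tiktoken.get_encoding("cl100k_base")
print(len(text.split()), "words /", len(enc.encode(text)), "tokens (approx.)")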

@dankolesnikov
Copy link

@bfirsh @mattt @zeke please help!

@mattt commented May 19, 2023

Hi, @rlancemartin. This sounds more like an issue for the Vicuna-13B model than the Python client library itself, so I'm going to transfer to that repo.

I just tried this myself using the web UI [1], and the output was within the expected range (308 tokens according to OpenAI's tokenizer; I didn't try feeding it through the model's tokenizer, though).

Can you share any more information to help us diagnose the problem?

/cc @replicate/models

Footnotes

[1] https://replicate.com/p/nm6teisk6rbifd3sokvlbefm5q

mattt transferred this issue from replicate/replicate-python on May 19, 2023
@daanelson

Hey @rlancemartin and @dankolesnikov. This is odd behavior; I can't reproduce it. When I run the model on Replicate, the output is always truncated when prompt_tokens + generated_tokens = max_tokens. Do you have a prediction UUID for a prediction on Replicate where this occurred that I can investigate?
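
(To make that budget concrete with an assumed prompt length, since the exact count depends on the model's tokenizer:)

max_length = 500    # total budget shared by the prompt and the generation
prompt_tokens = 25  # assumed, rough size of the Super Bowl prompt above
print(max_length - prompt_tokens)  # -> 475 tokens left for generation before truncation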

@rlancemartin (Author) commented May 20, 2023

@daanelson thanks!

Now I understand that max_tokens = prompt_tokens + generated_tokens, from here.

I agree: this behavior is weird.

To reproduce, try this:

import replicate
output = replicate.run(
    "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    input={"prompt": "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step.", 
           "temperature":0.75,
           "max_length":500})
for i in output:
    print(i)

First run: I get an 86-word answer, which is < max_length, as expected.

Second run: I get a 1450-word answer, which exceeds max_length.

Third run: I get a 969-word answer, which exceeds max_length.

Strange!
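
For reference, a small sketch of the repeat-run check described above (same model version and input; word count is only a crude proxy for token count):

import replicate

MODEL = "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e"
PROMPT = "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step."

for run in range(3):
    text = "".join(replicate.run(
        MODEL, input={"prompt": PROMPT, "temperature": 0.75, "max_length": 500}))
    print(f"run {run + 1}: {len(text.split())} words")  # should never land far above max_length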

@joehoover

@rlancemartin and @dankolesnikov, thanks for raising this!

I investigated this morning and traced the issue to our API. We're tracking it internally now and I'll let you know as soon as it's resolved.

@joehoover commented May 22, 2023

@rlancemartin and @dankolesnikov, it turns out the issue was actually in the replicate Python client.

We have a fix in this branch and we'll have a release out soon!

If you don't want to wait for the release, you can just:

pip install git+https://github.com/replicate/replicate-python.git@fix-iterator-output

@rlancemartin (Author)

Amazing! Will test it out :)

@mattt commented May 23, 2023

The fix @joehoover mentioned was merged in replicate/replicate-python#106 and released in version 0.8.3.

Please take a look and let us know if you're still seeing this behavior.

“This sounds more like an issue for the Vicuna-13B model than the Python client library itself”

🙃

Really glad that we identified and (hopefully) addressed the problem. Thank you for opening this issue, @rlancemartin, and thanks to @joehoover, @daanelson, and @evilstreak for their quick response.
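
(To confirm which client version you ended up with after upgrading, a standard-library check works; 0.8.3 or later should include the iterator fix:)

from importlib.metadata import version

print(version("replicate"))  # expect "0.8.3" or newer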

@zeke (Member) commented May 26, 2023

Late to the party. Nice work, y'all! Can we call this done?

@rlancemartin (Author)

Yes, let's close it out. I'll share a full analysis soon, but from my initial inspection the issue looks fixed :)

@rlancemartin (Author) commented May 31, 2023

BTW, great results in terms of answer quality w/ this model!

I'm using the LangChain auto-evaluator to benchmark it.

But something is odd w/ latency: the first call to the model is quite slow (>100s), while follow-up calls are fast (<10s).

Ticket here:
#7

I can reproduce this behavior, so it's not a one-off. Strange.
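
For reference, roughly how the gap can be measured (a sketch, not the auto-evaluator code itself; the prompt and max_length here are arbitrary):

import time
import replicate

MODEL = "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e"

for call in range(3):
    start = time.time()
    "".join(replicate.run(MODEL, input={"prompt": "Hello, who are you?", "max_length": 100}))
    print(f"call {call + 1}: {time.time() - start:.1f}s")  # reportedly >100s on the first call, <10s afterwards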

@joehoover or others any ideas?

Result here: [screenshot omitted]
