max_length parameter is not honored w/ Vicuna13-b #4

Closed
rlancemartin opened this issue May 17, 2023 · 11 comments

@rlancemartin

Running the model:

import replicate

output = replicate.run(
    "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    input={"prompt": "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step.",
           "temperature": 0.75,
           "max_length": 500})

We specify max_length: 500, i.e. a 500-token limit.

But the output is 940 words, which is well over 500 tokens however you tokenize it.
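
For reference, here's a minimal way to quantify the overrun (a sketch; tiktoken's cl100k_base encoding is only an approximation of the LLaMA tokenizer that Vicuna-13B actually uses, so the count is a rough estimate):

import replicate
import tiktoken  # approximation only; the model's own tokenizer will count slightly differently

output = replicate.run(
    "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    input={"prompt": "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step.",
           "temperature": 0.75,
           "max_length": 500})

text = "".join(output)  # the client streams chunks; join them into one string
enc = tiktoken.get_encoding("cl100k_base")
print(len(text.split()), "words /", len(enc.encode(text)), "tokens (approx.)")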

@dankolesnikov
Copy link

@bfirsh @mattt @zeke please help!

@mattt commented May 19, 2023

Hi, @rlancemartin. This sounds more like an issue for the Vicuna-13B model than the Python client library itself, so I'm going to transfer to that repo.

I just tried this myself using the web UI [1], and the output was within the expected range (308 tokens according to OpenAI's tokenizer; I didn't try feeding it through the model's tokenizer, though).

Can you share any more information to help us diagnose the problem?

/cc @replicate/models

Footnotes

[1] https://replicate.com/p/nm6teisk6rbifd3sokvlbefm5q

mattt transferred this issue from replicate/replicate-python on May 19, 2023
@daanelson

Hey @rlancemartin and @dankolesnikov. This is odd behavior; I can't reproduce it. When I run the model on Replicate, the output is always truncated when prompt_tokens + generated_tokens = max_tokens. Do you have a prediction UUID for a prediction on Replicate where this occurred that I can investigate?
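
(To make that budget concrete with an assumed prompt length, since the exact count depends on the model's tokenizer:)

max_length = 500    # total budget shared by the prompt and the generation
prompt_tokens = 25  # assumed, rough size of the Super Bowl prompt above
print(max_length - prompt_tokens)  # -> 475 tokens left for generation before truncation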

@rlancemartin (Author) commented May 20, 2023

@daanelson thanks!

Now I understand that max_tokens = prompt_tokens + generated_tokens, from here.

I agree: this behavior is weird.

To reproduce, try this:

import replicate
output = replicate.run(
    "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e",
    input={"prompt": "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step.", 
           "temperature":0.75,
           "max_length":500})
for i in output:
    print(i)

First run: I get an 86-word answer, which is < max_length, as expected.

Second run: I get a 1450-word answer, which exceeds max_length.

Third run: I get a 969-word answer, which exceeds max_length.

Strange!
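
For reference, a small sketch of the repeat-run check described above (same model version and input; word count is only a crude proxy for token count):

import replicate

MODEL = "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e"
PROMPT = "Which NFL team won the Super Bowl when Justin Bieber was born? Think step by step."

for run in range(3):
    text = "".join(replicate.run(
        MODEL, input={"prompt": PROMPT, "temperature": 0.75, "max_length": 500}))
    print(f"run {run + 1}: {len(text.split())} words")  # should never land far above max_length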

@joehoover

@rlancemartin and @dankolesnikov, thanks for raising this!

I investigated this morning and traced the issue to our API. We're tracking it internally now and I'll let you know as soon as it's resolved.

@joehoover commented May 22, 2023

@rlancemartin and @dankolesnikov, it turns out the issue was actually in the replicate Python client.

We have a fix in this branch and we'll have a release out soon!

If you don't want to wait for the release, you can just:

pip install git+https://github.com/replicate/replicate-python.git@fix-iterator-output

@rlancemartin (Author)

Amazing! Will test it out :)

@mattt commented May 23, 2023

The fix @joehoover mentioned was merged in replicate/replicate-python#106 and released in version 0.8.3.

Please take a look and let us know if you're still seeing this behavior.

“This sounds more like an issue for the Vicuna-13B model than the Python client library itself”

🙃

Really glad that we identified and (hopefully) addressed the problem. Thank you for opening this issue, @rlancemartin, and thanks to @joehoover, @daanelson, and @evilstreak for their quick response.
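
(To confirm which client version you ended up with after upgrading, a standard-library check works; 0.8.3 or later should include the iterator fix:)

from importlib.metadata import version

print(version("replicate"))  # expect "0.8.3" or newer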

@zeke (Member) commented May 26, 2023

Late to the party. Nice work, y'all! Can we call this done?

@rlancemartin (Author)

Yes, let's close it out. I'll share a full analysis soon, but from my initial inspection the issue looks fixed :)

@rlancemartin (Author) commented May 31, 2023

BTW, great results in terms of answer quality w/ this model!

I'm using the LangChain auto-evaluator to benchmark it.

But something is odd w/ latency: the first call to the model is quite slow (>100s), while follow-up calls are fast (<10s).

Ticket here:
#7

I can reproduce this behavior, so it's not a one-off. Strange.
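
For reference, roughly how the gap can be measured (a sketch, not the auto-evaluator code itself; the prompt and max_length here are arbitrary):

import time
import replicate

MODEL = "replicate/vicuna-13b:e6d469c2b11008bb0e446c3e9629232f9674581224536851272c54871f84076e"

for call in range(3):
    start = time.time()
    "".join(replicate.run(MODEL, input={"prompt": "Hello, who are you?", "max_length": 100}))
    print(f"call {call + 1}: {time.time() - start:.1f}s")  # reportedly >100s on the first call, <10s afterwards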

@joehoover or others any ideas?

Result here: [screenshot omitted]
