Return log probabilities of the prompt #616

Closed
3 of 8 tasks
ChiaSap opened this issue Sep 19, 2024 · 2 comments
Labels
enhancement New feature or request

Comments


ChiaSap commented Sep 19, 2024

Feature Request

What is the problem you're currently running into?
I am interested in obtaining log probabilities for the input tokens of my prompt, not only for the tokens produced by the model. With Together, this can be achieved like this:

import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
kwargs = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "echo": True,  # <---- this
    "logprobs": 1,  # <---- and this
}
response = client.completions.create(prompt="I love apples and oranges", **kwargs)

I believe OpenAI's legacy openai.Completion.create() supports similar behavior via its echo argument. I have not been able to achieve the same with the current llmengine implementation.
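
For reference, a minimal sketch of the OpenAI variant, assuming the legacy (pre-1.0) Python SDK: with echo=True and max_tokens=0 the endpoint scores the prompt without generating anything (the model name is only an example):

import openai

# Legacy Completions endpoint: echo=True returns the prompt tokens in the
# response, and max_tokens=0 means no new tokens are generated, so the
# returned logprobs cover the prompt only.
response = openai.Completion.create(
    model="davinci-002",  # example model, for illustration
    prompt="I love apples and oranges",
    max_tokens=0,
    echo=True,
    logprobs=1,
)
# The first entry is None, since the first token has no preceding context.
print(response["choices"][0]["logprobs"]["token_logprobs"])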

Why do you want this feature?
This feature would be extremely useful for me and many other users. Right now I can think of two main applications, but I'm sure there are many more.

  • Multiple-choice selection: With token probabilities for the input, we can evaluate models on multiple-choice problems by scoring each candidate answer, effectively forcing the model to choose among specified alternatives (see the sketch after this list). This is useful in many evaluations. In theory this is already supported by the guided_choice option in llmengine.Completion.create(), but it currently does not work (I see there is already an open issue).
  • Evaluation with soft metrics: Sometimes it is useful to evaluate models not only with "hard" metrics like accuracy or F1 but also with "soft" metrics based on log probabilities (e.g., when studying emergent abilities, see this post by Jason Wei).
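
To illustrate the first use case, here is a rough sketch of multiple-choice scoring on top of prompt log probabilities. Everything here is hypothetical: get_prompt_logprobs stands in for any call that echoes the prompt with per-token log probabilities, such as the requested echo=True behavior:

# Hypothetical sketch: rank candidate answers by the total log probability
# the model assigns to each one when appended to the question.

def score_choice(get_prompt_logprobs, question: str, choice: str) -> float:
    # get_prompt_logprobs(text) is assumed to return (tokens, logprobs),
    # one log probability per prompt token.
    tokens, logprobs = get_prompt_logprobs(question + " " + choice)
    question_tokens, _ = get_prompt_logprobs(question)
    # Sum the log probabilities of the tokens that belong to the choice.
    # (Assumes the question tokenizes identically with and without the choice.)
    return sum(logprobs[len(question_tokens):])

def pick_answer(get_prompt_logprobs, question: str, choices: list[str]) -> str:
    return max(choices, key=lambda c: score_choice(get_prompt_logprobs, question, c))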

Describe the solution you'd like
It would be really useful if llmengine.Completion.create() accepted an argument like echo: when echo=True and return_token_log_probs=True, the response would include the input tokens along with the log probabilities of every token in the prompt.
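
A sketch of what the proposed call could look like. The echo argument is the hypothetical addition; the other arguments follow the existing Completion.create() interface as I understand it:

from llmengine import Completion

response = Completion.create(
    model="mixtral-8x7b-instruct",  # example model name
    prompt="I love apples and oranges",
    max_new_tokens=1,
    return_token_log_probs=True,
    echo=True,  # hypothetical new argument
)
# With echo=True, the returned token list would also include the prompt
# tokens, each paired with its log probability.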

Describe alternatives you've considered
An alternative would be fixing the guided_choice option in llmengine.Completion.create(), but I still think that returning log probabilities of the prompt offers more flexibility.


Prioritization

  • Does this feature block you from using the project?

    • Yes
    • No
  • How many users will benefit from this feature?

    • Just me
    • Few people might benefit
    • Many users will love it!
  • Complexity

    • I believe it's a simple feature to implement
    • It might require some effort to implement
    • It's probably complex, and might take significant effort

Unfortunately I wouldn't be able to build the feature.

ChiaSap added the enhancement (New feature or request) label Sep 19, 2024

yixu34 commented Sep 19, 2024

Hi @ChiaSap, thanks for filing this issue! We're triaging.


yixu34 commented Sep 24, 2024

Hi @ChiaSap, please see #589: we do have this feature in the code, but we're sunsetting the free demo, which is out of date with what's in GitHub and what we run internally.

@yixu34 yixu34 closed this as completed Sep 24, 2024