
Add logprobs functionality #1111

Closed · wants to merge 11 commits into from

Conversation

arielge (Member) commented Aug 5, 2024

No description provided.

[Outdated review threads on src/unitxt/processors.py, src/unitxt/metrics.py, and src/unitxt/inference.py — resolved]
Comment on lines 378 to 400
            InferDictsToBinaryLogprobs(
                binary_class_names=(
                    "No",
                    "Yes",
                ),
                field="prediction",
                process_every_value=False,
            ),
        ]
    ),
    "processors.infer_logprobs_to_yes_no_probs",
)

add_to_catalog(
    SequentialOperator(
        steps=[
            InferDictsToBinaryLogProbsLastToken(
                binary_class_names=(
                    "No",
                    "Yes",
                ),
                field="prediction",
                process_every_value=False,
elronbandel (Member)
Notice that we have a field called "options", assigned by many templates (such as the classification templates and multiple choice templates), which holds the options for completion based on the verbalization of the template. Using it is a safer way to do this, because sometimes when you switch the template the options become "True"/"False" rather than "Yes"/"No", and it's hard to pass that on. That's why, so far, all the verbalization responsibility has been in the template; externalizing verbalization to anywhere else in the pipeline can lead to issues. If you need help integrating with the current mechanism of the "options" field, you can contact me or @perlitz, who made extensive use of it in the past, specifically in the context of logprobs.

arielge (Member, Author)
@elronbandel Thanks, but I am not sure I follow how this post-processor is related to the template fields. We want it to be used on the inference engine predictions, across the many templates for which this pattern would be applicable.
BTW, there are some other processors like this that expect specific strings, for example YesToOneElseZero and StanceToProCon.

yoavkatz (Member) commented Aug 6, 2024

I agree, it need not use a task "options" field. A processor is tied to a template, not necessarily to a task (as in the multiple choice work Yotam did).

elronbandel (Member)

The "options" field (which is not a task field but an internal unitxt field) was our way to use it internally for getting log probs. It takes into account template nuances such as whether the model is expected to output the choice numeral (a. 1. etc) in multiple choice, and if we have target prefix. All I suggest is that you fully understand the current mechanism as it was developed very carefully.

yoavkatz (Member) commented Aug 6, 2024

Yes. I was not too familiar with the options mechanism (we should document it).

Looking at the code, it's not used in unitxt logprob inference, but in the internal package inference. "options" are added to the task data in the MultipleChoiceTemplate (if they are not already in the task).

In any case, LLM-as-judge is a different scenario, where the user defines a template such as:

"Is the following sentence correct? Answer with one word ('Yes' or 'No').

'The sky is blue'"

The model returns:

"Yes it is."

And the post-processor wants to return the probability of "Yes" vs. "No" at the relevant token (the first), so it converts the text into a float score.

Perhaps if we forced the user of the LLM-as-judge to use a MultipleChoiceTemplate, it would have the "options" field - but do we want this for LLM-as-judge? It seems cumbersome.
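To make the conversion concrete, here is a minimal sketch of such a post-processor, assuming the inference engine returns the top logprobs of the first generated token as a dict mapping token text to log probability. This is an illustration only, not the PR's actual InferDictsToBinaryLogprobs implementation:

import math

def first_token_yes_probability(token_logprobs, classes=("No", "Yes")):
    """Convert the first generated token's top logprobs into P("Yes"),
    renormalized over the two class tokens only.

    token_logprobs: e.g. {"Yes": -0.05, "No": -3.2, "Maybe": -5.1}
    """
    # A class token missing from the top logprobs is treated as
    # effectively impossible (very large negative logprob).
    logps = [token_logprobs.get(c, -1e9) for c in classes]
    probs = [math.exp(lp) for lp in logps]
    total = sum(probs)
    return probs[1] / total if total > 0 else 0.0

# Model answered "Yes it is." -- only the first token's logprobs matter:
print(first_token_yes_probability({"Yes": -0.05, "No": -3.2}))  # ~0.96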

elronbandel (Member) left a comment

There is a better existing solution than this PR (detailed in the comments), and IMHO it should be handled differently, based on the similar solutions for logprobs we did in the past.

yoavkatz (Member) commented Aug 7, 2024

This PR has 4 changes:

  1. Added logprobs inference to the WML inference engine
  2. Added the ability to use space_id instead of project_id credentials with WML
  3. Added the ability to calculate confidence intervals for multiple scores (see the sketch at the end of this comment)
  4. A post-processor to convert a logprobs dictionary into probabilities of a specific class

@elronbandel - I think your concerns are only around (4).

@pawelknes - can you please review (1) + (2)? @elronbandel - can you review (3)?

Regarding (4) - I think @elronbandel believes it is better addressed in an inference engine, but right now the changes are very localized.

@arielge - can you say how you are going to use the new post-processor?
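For context on (3), here is a minimal sketch of what a confidence interval over multiple scores amounts to, assuming a percentile bootstrap over per-instance scores; the function and variable names are illustrative, not the PR's actual API:

import random
import statistics

def bootstrap_ci(scores, n_resamples=1000, confidence=0.95, seed=42):
    # Percentile-bootstrap CI for the mean of one score column.
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo = means[int((1 - confidence) / 2 * n_resamples)]
    hi = means[int((1 - (1 - confidence) / 2) * n_resamples) - 1]
    return lo, hi

# "Multiple scores" then just means repeating this per named score:
instance_scores = {"f1": [0.7, 0.9, 0.8, 0.6], "accuracy": [1, 1, 0, 1]}
cis = {name: bootstrap_ci(vals) for name, vals in instance_scores.items()}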

elronbandel (Member)

Regarding (4): the desired design here is described in issue #1128.

The benefit of implementing it this way is that the algorithm @arielge is suggesting will be compatible with many tasks, such as multiple choice tasks and classification tasks, and will work out of the box with all the related templates and tasks without requiring any modifications.

arielge (Member, Author) commented Aug 11, 2024

Thanks @yoavkatz @elronbandel. The intention is to use the post-processor as part of a flow that includes an inference engine. For example, you can have a judge metric that is initialized with a specific template and task, processes the user inputs with the given template, and sends them to the inference engine. The inference engine outputs are then fed to the post-processor to get the desired score.
@elronbandel I understand your desire to integrate this into the inference engine. This is also a possibility, although I assume different users may want slightly different things (inferring for every class is one option; inferring once and looking at the top probabilities is another; and binary and multi-class may not be treated the same). Currently we are aiming for the flexibility to mix and match inference engines, templates and post-processors, without imposing our approach on the inference engine logic or requiring extra fields (as we are dealing specifically with the yes/no answer use case and not a general multi-class classification solution).
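As an illustration of the flow described here - a hypothetical wiring, not unitxt's actual API (the class and method names below are invented for this sketch):

class LogprobJudgeMetric:
    """Judge metric: template -> inference engine -> logprobs post-processor."""

    def __init__(self, template, task, engine, post_processor):
        self.template = template              # verbalizes the judge prompt
        self.task = task
        self.engine = engine                  # must return per-token logprobs
        self.post_processor = post_processor  # e.g. first_token_yes_probability

    def score(self, instances):
        prompts = [self.template.render(self.task, inst) for inst in instances]
        outputs = self.engine.infer_logprobs(prompts)  # hypothetical method
        return [self.post_processor(out) for out in outputs]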

arielge (Member, Author) commented Oct 22, 2024

Superseded by #1243 and #1205.

arielge closed this on Oct 22, 2024