
Add logprobs functionality #1111

Closed · wants to merge 11 commits into from

Conversation

arielge (Member) commented Aug 5, 2024

No description provided.

[Outdated review threads on src/unitxt/processors.py, src/unitxt/metrics.py, and src/unitxt/inference.py — resolved]
Comment on lines 378 to 400
            InferDictsToBinaryLogprobs(
                binary_class_names=(
                    "No",
                    "Yes",
                ),
                field="prediction",
                process_every_value=False,
            ),
        ]
    ),
    "processors.infer_logprobs_to_yes_no_probs",
)

add_to_catalog(
    SequentialOperator(
        steps=[
            InferDictsToBinaryLogProbsLastToken(
                binary_class_names=(
                    "No",
                    "Yes",
                ),
                field="prediction",
                process_every_value=False,
elronbandel (Member)
Notice that we have a field called "options", assigned by many templates (such as the classification templates and multiple choice templates), which holds the options for completion based on the verbalization of the template. Using it is a safer way to do this, because sometimes when you switch the template the options become "True"/"False" rather than "Yes"/"No", and it's hard to pass that on. That's why, so far, all the verbalization responsibility has been in the template; externalizing verbalization to anywhere else in the pipeline can lead to issues. If you need help integrating with the current mechanism of the "options" field, you can contact me or @perlitz, who made extensive use of it in the past, specifically in the context of logprobs.

arielge (Member, Author)
@elronbandel Thanks, but I am not sure I follow how this post-processor is related to the template fields. We want it to be used on the inference engine predictions, across the many templates for which this pattern would be applicable.
BTW, there are some other processors like this that expect specific strings, for example YesToOneElseZero and StanceToProCon.

yoavkatz (Member) commented Aug 6, 2024

I agree, it need not use a task "options" field. A processor is tied to a template, not necessarily to a task (as in the multiple choice work Yotam did).

elronbandel (Member)

The "options" field (which is not a task field but an internal unitxt field) was our way to use it internally for getting log probs. It takes into account template nuances such as whether the model is expected to output the choice numeral (a. 1. etc) in multiple choice, and if we have target prefix. All I suggest is that you fully understand the current mechanism as it was developed very carefully.

yoavkatz (Member) commented Aug 6, 2024

Yes. I was not too familiar with the options mechanism (we should document it).

Looking at the code, it's not used in unitxt logprob inference, but in the internal package inference. "options" are added to the task data in the MultipleChoiceTemplate (if they are not already in the task).

In any case, LLM-as-judge is a different scenario, where the user defines a template such as:

"Is the following sentence correct? Answer with one word ('Yes' or 'No').

'The sky is blue'"

The model returns:

"Yes it is."

And the post-processor wants to return the probability of "Yes" vs. "No" at the relevant token (the first), so it converts the text into a float score.

Perhaps if we forced the user of the LLM-as-judge to use a MultipleChoiceTemplate, it would have the "options" field - but do we want this for LLM-as-judge? It seems cumbersome.
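To make the conversion concrete, here is a minimal sketch of such a post-processor, assuming the inference engine returns the top logprobs of the first generated token as a dict mapping token text to log probability. This is an illustration only, not the PR's actual InferDictsToBinaryLogprobs implementation:

import math

def first_token_yes_probability(token_logprobs, classes=("No", "Yes")):
    """Convert the first generated token's top logprobs into P("Yes"),
    renormalized over the two class tokens only.

    token_logprobs: e.g. {"Yes": -0.05, "No": -3.2, "Maybe": -5.1}
    """
    # A class token missing from the top logprobs is treated as
    # effectively impossible (very large negative logprob).
    logps = [token_logprobs.get(c, -1e9) for c in classes]
    probs = [math.exp(lp) for lp in logps]
    total = sum(probs)
    return probs[1] / total if total > 0 else 0.0

# Model answered "Yes it is." -- only the first token's logprobs matter:
print(first_token_yes_probability({"Yes": -0.05, "No": -3.2}))  # ~0.96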

elronbandel (Member) left a comment

There is a better existing solution than this PR (detailed in the comments), and IMHO it should be handled differently, based on the similar solutions for logprobs we did in the past.

yoavkatz (Member) commented Aug 7, 2024

This PR has 4 changes:

  1. Added logprobs inference to the WML inference engine
  2. Added the ability to use space_id instead of project_id credentials with WML
  3. Added the ability to calculate confidence intervals for multiple scores (see the sketch at the end of this comment)
  4. A post-processor to convert a logprobs dictionary into probabilities of a specific class

@elronbandel - I think your concerns are only around (4).

@pawelknes - can you please review (1) + (2)? @elronbandel - can you review (3)?

Regarding (4) - I think @elronbandel believes it is better addressed in an inference engine, but right now the changes are very localized.

@arielge - can you say how you are going to use the new post-processor?
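For context on (3), here is a minimal sketch of what a confidence interval over multiple scores amounts to, assuming a percentile bootstrap over per-instance scores; the function and variable names are illustrative, not the PR's actual API:

import random
import statistics

def bootstrap_ci(scores, n_resamples=1000, confidence=0.95, seed=42):
    # Percentile-bootstrap CI for the mean of one score column.
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo = means[int((1 - confidence) / 2 * n_resamples)]
    hi = means[int((1 - (1 - confidence) / 2) * n_resamples) - 1]
    return lo, hi

# "Multiple scores" then just means repeating this per named score:
instance_scores = {"f1": [0.7, 0.9, 0.8, 0.6], "accuracy": [1, 1, 0, 1]}
cis = {name: bootstrap_ci(vals) for name, vals in instance_scores.items()}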

elronbandel (Member)

Regarding (4): the desired design here is described in issue #1128.

The benefit of implementing it this way is that the algorithm @arielge is suggesting will be compatible with many tasks, such as multiple choice tasks and classification tasks, and will work out of the box with all the related templates and tasks without requiring any modifications.

arielge (Member, Author) commented Aug 11, 2024

Thanks @yoavkatz @elronbandel. The intention is to use the post-processor as part of a flow that includes an inference engine. For example, you can have a judge metric that is initialized with a specific template and task, processes the user inputs with the given template, and sends them to the inference engine. The inference engine outputs are then fed to the post-processor to get the desired score.
@elronbandel I understand your desire to integrate this into the inference engine. This is also a possibility, although I assume different users may want slightly different things (inferring for every class is one option; inferring once and looking at the top probabilities is another; and binary and multi-class may not be treated the same). Currently we are aiming for the flexibility to mix and match inference engines, templates and post-processors, without imposing our approach on the inference engine logic or requiring extra fields (as we are dealing specifically with the yes/no answer use case and not a general multi-class classification solution).
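As an illustration of the flow described here - a hypothetical wiring, not unitxt's actual API (the class and method names below are invented for this sketch):

class LogprobJudgeMetric:
    """Judge metric: template -> inference engine -> logprobs post-processor."""

    def __init__(self, template, task, engine, post_processor):
        self.template = template              # verbalizes the judge prompt
        self.task = task
        self.engine = engine                  # must return per-token logprobs
        self.post_processor = post_processor  # e.g. first_token_yes_probability

    def score(self, instances):
        prompts = [self.template.render(self.task, inst) for inst in instances]
        outputs = self.engine.infer_logprobs(prompts)  # hypothetical method
        return [self.post_processor(out) for out in outputs]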

arielge (Member, Author) commented Oct 22, 2024

Superseded by #1243 and #1205.

arielge closed this on Oct 22, 2024