Fixes in LLMJudge #1498
Conversation
lilacheden
commented
Jan 12, 2025
- Allow changing `main_score` and `score_prefix` (and change the default from `"score"`, which doesn't support `score_prefix`)
- Support embedded task-data fields in the LLMJudge template
Signed-off-by: lilacheden <[email protected]>
src/unitxt/llm_as_judge.py
Outdated
@@ -149,7 +150,7 @@ def get_contexts(self, task_data: List[Dict[str, Any]]) -> List[Dict[str, str]]:
         return [
             get_parsed_context(
                 {
-                    context_field: td[context_field]
+                    context_field.split("/")[-1]: dict_get(td, context_field)
Why the change?
To be able to access nested fields. For example, a user asked to use only the instruction from the original template, without the full source; this way we can send metadata/template/instruction.
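For context, here is a minimal sketch of a slash-separated nested lookup in the spirit of the `dict_get` call in the diff. This is not unitxt's actual implementation (the real helper may support defaults, list indexing, and other path features); it only illustrates the idea behind sending a path like `metadata/template/instruction` as a context field.

```python
from typing import Any, Dict


def dict_get(data: Dict[str, Any], path: str, sep: str = "/") -> Any:
    """Walk a nested dict following a sep-delimited path (illustrative sketch)."""
    current: Any = data
    for key in path.split(sep):
        current = current[key]
    return current


# Hypothetical task_data shape, assumed for illustration only.
task_data = {"metadata": {"template": {"instruction": "Answer briefly."}}}
print(dict_get(task_data, "metadata/template/instruction"))  # Answer briefly.
```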
@elronbandel is this an acceptable way to do it, one that is fully supported and won't be prone to changes in the future?
In any case, you need to add documentation for this @lilacheden
@OfirArviv @elronbandel -
If you want it even cleaner, we can support a dict of name: possibly-nested-field as the context_fields.
Yes. I think @martinscooper highlighted the need to be able to rename context fields.
ok, added it
Does this allow renaming context fields, then? If so, do the keys correspond to the final context names used in the prompts, and the values to the task_data key names?
I think this is useful for adapting the task data to the criteria. For example, the squad dataset uses the term "context", but the coherence criteria description uses "original text".
@martinscooper - yes, this way you can send a dictionary where the key is the name of the field in the prompt and the value is the field name (or path) in the task data.
e.g. {"instructions": "metadata/template/instruction"} - the prompt will then mention a context named "instructions".
If a list is sent, the behavior is as before: each item serves as both the key and the value:
["question"] -> {"question": "question"}
FYI: @elronbandel, @yoavkatz and I have been discussing and working on improving the reported scores for both direct and pairwise evaluators. The changes are included in this PR. You can look at this and this commit for the direct score changes, and this other one for the pairwise score changes. In summary:
Yes, I think we should not change the score names in this PR, and wait for #1467 for the changes.
Signed-off-by: lilacheden <[email protected]>
This reverts commit c94bfc5.
@yoavkatz - reverted this main score change, can you approve the remaining change?
src/unitxt/llm_as_judge.py
Outdated
@@ -725,6 +730,9 @@ def get_instance_results(

         winrates = [r["winrate"] for r in per_response_results.values()]
-        all_results["score"] = max(range(len(winrates)), key=winrates.__getitem__)
+        all_results[self.main_score] = max(
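For reference, the winner-selection pattern in this hunk is an argmax over the winrates, implemented with `max()` over indices. A standalone sketch with made-up winrate values:

```python
# Made-up winrates for three candidate responses; the expression picks the
# index of the highest one (an argmax via max() with winrates.__getitem__
# as the key function).
winrates = [0.25, 0.75, 0.5]
best_index = max(range(len(winrates)), key=winrates.__getitem__)
print(best_index)  # 1 (the response with winrate 0.75)
```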
Could you remove this change?
done
This reverts commit 9754a63.