Add completeness judge #1410
Open
pvn25 wants to merge 5 commits into main from add_completeness_judge
Commits
67b4fad add completeness judge (pvn25)
4bd3c03 Run prepare files and dded msssing json files to catalog (yoavkatz)
37b4825 Merge branch 'main' into add_completeness_judge (yoavkatz)
4a32ad3 fix formatting issue (pvn25)
76e7502 update inference enginer to cross provider (pvn25)
@@ -0,0 +1,67 @@
from unitxt import get_logger
from unitxt.api import evaluate, load_dataset
from unitxt.blocks import Task, TaskCard
from unitxt.loaders import LoadFromDictionary
from unitxt.templates import InputOutputTemplate, TemplatesDict
from unitxt.text_utils import print_dict

logger = get_logger()

data = {
    "test": [
        {
            "conversation": "user: is eugene from walking dead really a scientist\nagent: No, Eugene from The Walking Dead is not a real scientist. He lied about being a scientist to manipulate the other survivors into taking him to Washington D.C., believing it to be the best chance for survival. In reality, he is a high school science teacher who does not know how to cure the virus. However, he is shown to be highly intelligent and resourceful, able to do things such as repairing the solar power system in Alexandria and making bullets for the Saviors.\nuser: Why did he lied about himself being a scientist?\nagent: Eugene lied about being a scientist to manipulate the other survivors into taking him to Washington D.C., believing it to be the best chance for survival. He initially used this lie to gain the protection of Abraham Ford. He assists the group with technical tasks even after the truth is revealed, which also improves his own confidence. His lie caused tension within the group, particularly between him and Abraham, but eventually Abraham forgave him and they resumed their friendship.\nuser: Is The Walking Dead currently shown on TV? Which station?",
            "document": "The Walking Dead (season 6)\nOn October 7 , 2014 , AMC renewed The Walking Dead for a sixth season . Scott M. Gimple said that the sixth season would continue to remix material from the comic and explained that there would be a flashback backstory to some of the characters :\n\nThe Walking Dead (season 6)\nThe sixth season of The Walking Dead , an American post-apocalyptic horror television series on AMC , premiered on October 11 , 2015 , and concluded on April 3 , 2016 , consisting of 16 episodes . Developed for television by Frank Darabont , the series is based on the eponymous series of comic books by Robert Kirkman , Tony Moore , and Charlie Adlard . The executive producers are Kirkman , David Alpert , Scott M. Gimple , Greg Nicotero , Tom Luse , and Gale Anne Hurd , with Gimple as showrunner for the third consecutive season . The sixth season received positive reviews by critics . It was nominated for multiple awards and won three , including Best Horror Television Series , at the 42nd Saturn Awards .",
        },
        {
            "conversation": "user: is eugene from walking dead really a scientist\nagent: No, Eugene from The Walking Dead is not a real scientist. He lied about being a scientist to manipulate the other survivors into taking him to Washington D.C., believing it to be the best chance for survival. In reality, he is a high school science teacher who does not know how to cure the virus. However, he is shown to be highly intelligent and resourceful, able to do things such as repairing the solar power system in Alexandria and making bullets for the Saviors.\nuser: Why did he lied about himself being a scientist?\nagent: Eugene lied about being a scientist to manipulate the other survivors into taking him to Washington D.C., believing it to be the best chance for survival. He initially used this lie to gain the protection of Abraham Ford. He assists the group with technical tasks even after the truth is revealed, which also improves his own confidence. His lie caused tension within the group, particularly between him and Abraham, but eventually Abraham forgave him and they resumed their friendship.\nuser: Is The Walking Dead currently shown on TV? Which station?",
            "document": "The Walking Dead (season 6)\nOn October 7 , 2014 , AMC renewed The Walking Dead for a sixth season . Scott M. Gimple said that the sixth season would continue to remix material from the comic and explained that there would be a flashback backstory to some of the characters :\n\nThe Walking Dead (season 6)\nThe sixth season of The Walking Dead , an American post-apocalyptic horror television series on AMC , premiered on October 11 , 2015 , and concluded on April 3 , 2016 , consisting of 16 episodes . Developed for television by Frank Darabont , the series is based on the eponymous series of comic books by Robert Kirkman , Tony Moore , and Charlie Adlard . The executive producers are Kirkman , David Alpert , Scott M. Gimple , Greg Nicotero , Tom Luse , and Gale Anne Hurd , with Gimple as showrunner for the third consecutive season . The sixth season received positive reviews by critics . It was nominated for multiple awards and won three , including Best Horror Television Series , at the 42nd Saturn Awards .",
        },
        {
            "conversation": "user: is eugene from walking dead really a scientist\nagent: No, Eugene from The Walking Dead is not a real scientist. He lied about being a scientist to manipulate the other survivors into taking him to Washington D.C., believing it to be the best chance for survival. In reality, he is a high school science teacher who does not know how to cure the virus. However, he is shown to be highly intelligent and resourceful, able to do things such as repairing the solar power system in Alexandria and making bullets for the Saviors.\nuser: Why did he lied about himself being a scientist?",
            "document": "Eugene Porter\nDr. Eugene Porter is a fictional character from the comic book series The Walking Dead and is portrayed by Josh McDermitt beginning in the fourth season of the American television series of the same name . In both mediums he claims to be a scientist who knows the cure to the zombie plague and is being escorted to Washington D.C. by Sgt . Abraham Ford and Rosita Espinosa , and encounter Rick Grimes and his group and recruit them to assist their mission . Eugene is overweight , possesses virtually no survival skills of his own and is extremely dependent on the group , but is highly intelligent and resourceful in using technology to ensure the group 's survival . Eventually it is revealed that Eugene is not a scientist , but a high school science teacher , and that he does n't know how to cure the virus but lied to manipulate the other survivors into taking him to Washington D.C. believing it to be the best chance for survival . This proves true as the group eventually finds the Alexandria Safe - Zone where Eugene becomes its primary engineer . Though his lie puts a strain on their friendship , Abraham eventually forgives him and they resume being friends .\n\nList of The Walking Dead (TV series) characters\nDr. Eugene Porter , portrayed by Josh McDermitt , is a survivor of the apocalypse and a member of Abraham Ford 's survivor group and later a minor antagonist after he unwillingly begins working for the Saviors . He comes across as timid and socially awkward . Initially when encountered by Rick 's group , he appears to be a scientist that knows of a cure to the outbreak located in Washington , D.C. , with Abraham and Rosita assisting him . Later , Eugene is forced to reveal that he lied to Abraham to gain his protection back when they first met , leading Abraham to punch him unconscious and creating a rift in their friendship . When the group makes it to Alexandria , Eugene assists the community with his technical knowledge to repair their solar power system . This helps to boost his confidence to learn how to defend himself , reconcile his friendship with Abraham , and help the community find a way to make more bullets . When Negan discovers that Eugene makes bullets , Eugene is taken to work for the Saviors . Although Eugene accepts the offer to become one of the Saviors , he still helps Sasha kill herself but lies about it to Negan . He is an important player for the Saviors in the All Out War arc , but is tormented with conflicting loyalties , especially after he discovers that Dwight is playing both sides .",
        },
    ]
}

predictions = [
    "No, The Walking Dead is not currently shown on TV. The final episode of the series aired on April 17, 2022, on AMC. The show has completed its 11-season run. However, if you're interested in watching previous seasons, they are available to stream on various platforms such as AMC+, Hulu, and Peacock. Additionally, you can find The Walking Dead merchandise and spin-off series on these platforms and other marketplaces.",
    "Yes, The Walking Dead is currently being shown on TV. It airs on AMC, an American basic cable network. AMV airs new episodes of The Walking Dead as well as reruns of previous seasons.",
    "Eugene lied about being a scientist to manipulate the other survivors into taking him to Washington D.C., believing it to be the best chance for survival. He initially used this lie to gain the protection of Abraham Ford. He assists the group with technical tasks even after the truth is revealed, which also improves his own confidence. His lie caused tension within the group, particularly between him and Abraham, but eventually Abraham forgave him and they resumed their friendship.",
]


card = TaskCard(
    loader=LoadFromDictionary(data=data),
    task=Task(
        input_fields={"conversation": "str", "document": "str"},
        reference_fields={},
        prediction_type="str",
        metrics=[
            "metrics.llm_as_judge.conversation_answer_completeness.llama3_1_v1_ibmgenai_judges"
        ],
    ),
    templates=TemplatesDict(
        {
            "simple": InputOutputTemplate(
                input_format="{conversation}\n\n\nDocument:\n{document}",
                output_format="",
            )
        }
    ),
)

test_dataset = load_dataset(card=card, template_card_index="simple")["test"]
evaluated_dataset = evaluate(predictions=predictions, data=test_dataset)

for instance in evaluated_dataset:
    print_dict(
        instance,
        keys_to_print=[
            "source",
            "prediction",
            "processed_prediction",
            "references",
            "score",
        ],
    )
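For orientation, the `input_format` of the `simple` template is an ordinary format string with two placeholders, so the text built from each instance can be previewed with plain Python. The following is a minimal sketch, assuming the placeholders are filled the way `str.format` would fill them; the helper name `render_source` and the shortened instance are hypothetical and not part of the PR, and the real `source` field may carry additional formatting added by unitxt.

```python
# Same format string as the "simple" template in the example file.
INPUT_FORMAT = "{conversation}\n\n\nDocument:\n{document}"


def render_source(instance: dict) -> str:
    """Hypothetical helper: preview the text built from one instance."""
    return INPUT_FORMAT.format(**instance)


# Shortened, made-up instance with the same two input fields as the task.
instance = {
    "conversation": "user: Which station airs The Walking Dead?",
    "document": "The Walking Dead is a television series on AMC.",
}

rendered = render_source(instance)
print(rendered)
```

The conversation comes first, followed by three newlines, a `Document:` line, and the grounding document, which is the layout the completeness judge is asked to assess against.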
@@ -0,0 +1,44 @@
from unitxt import add_to_catalog
from unitxt.inference import (
    GenericInferenceEngine,
    IbmGenAiInferenceEngine,
    IbmGenAiInferenceEngineParams,
)
from unitxt.llm_as_judge import LLMAsJudge

template_name = "templates.response_assessment.judges.completeness.v5"

inference_models = {
    "llama3_1_v1_ibmgenai": {
        "model_name": "llama3-1-70binstruct",
        "inference_model": IbmGenAiInferenceEngine(
            model_name="meta-llama/llama-3-1-70b-instruct",
            parameters=IbmGenAiInferenceEngineParams(max_new_tokens=256),
        ),
    },
    "generic_inference_engine": {
        "model_name": "generic",
        "inference_model": (GenericInferenceEngine()),
    },
}

for label, inference_model in inference_models.items():
    model_label = inference_model["model_name"]
    template_label = template_name.split(".")[-1]
    metric_label = (
        "metrics.llm_as_judge.rating." + model_label + "_template_" + template_label
    )

    cur_metric = LLMAsJudge(
        inference_model=inference_model["inference_model"],
        template=template_name,
        task="rating.single_turn",
        main_score=metric_label,
        prediction_type="str",
    )

    add_to_catalog(
        cur_metric,
        f"metrics.llm_as_judge.conversation_answer_completeness.{label}_judges",
        overwrite=True,
    )
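The loop above builds two different names per engine: the `main_score` label and the catalog entry name. The sketch below reproduces only that string logic in plain Python, without importing unitxt, to show which names result. The helper `derive_names` is hypothetical; the labels, model names, and template name are copied from the diff.

```python
template_name = "templates.response_assessment.judges.completeness.v5"

# label -> model_name pairs copied from the inference_models dict in the diff.
model_names = {
    "llama3_1_v1_ibmgenai": "llama3-1-70binstruct",
    "generic_inference_engine": "generic",
}


def derive_names(label: str, model_label: str, template_name: str) -> tuple:
    """Hypothetical helper mirroring the string handling in the loop."""
    # Last dotted component of the template name, here "v5".
    template_label = template_name.split(".")[-1]
    main_score = (
        "metrics.llm_as_judge.rating." + model_label + "_template_" + template_label
    )
    catalog_name = (
        f"metrics.llm_as_judge.conversation_answer_completeness.{label}_judges"
    )
    return main_score, catalog_name


names = {
    label: derive_names(label, model_label, template_name)
    for label, model_label in model_names.items()
}

for label, (main_score, catalog_name) in names.items():
    print(catalog_name, "->", main_score)
```

The catalog name derived for `llama3_1_v1_ibmgenai` is the same string that the example file lists under `metrics`, while the score itself is reported under the longer `metrics.llm_as_judge.rating...` label.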
There is now a simpler way to support multiple inference engines in the judge using one consistent API, and that's with CrossProviderInferenceEngine.
https://www.unitxt.ai/en/latest/docs/inference.html#creating-a-cross-api-engine
It uses the standard OpenAI parameter names.
ok. I updated the engine to CrossProvider. Pls check now.
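
The suggestion in this thread might look roughly like the sketch below. It is not the code from the final commit, which is not shown on this page. The `model` and `provider` values are assumptions chosen for illustration; `max_tokens` is the OpenAI-style counterpart of `max_new_tokens=256` in the diff. The engine is built inside a function so that unitxt is only imported when the function is actually called.

```python
# OpenAI-style generation parameters: max_tokens instead of max_new_tokens.
JUDGE_ENGINE_KWARGS = {
    "model": "llama-3-1-70b-instruct",  # assumption: provider-neutral model name
    "provider": "watsonx",  # assumption: any supported provider could be used
    "max_tokens": 256,  # same limit as max_new_tokens=256 in the diff
}


def build_judge_engine():
    """Sketch: create one engine object that can target different providers."""
    # Imported lazily so this module loads even where unitxt is not installed.
    from unitxt.inference import CrossProviderInferenceEngine

    return CrossProviderInferenceEngine(**JUDGE_ENGINE_KWARGS)


print(sorted(JUDGE_ENGINE_KWARGS))
```

With a single engine of this kind, the `inference_models` dict in the prepare file would no longer need a separate entry per backend class.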