Feature specs discussion board for umBRELA #1
Comments
@UShivani3 can you give some demo usage here, with a few snippets, so @lintool is aware of the exact state of the framework so far?
Yes, my bad! Here is the snippet.

Setting up the model judge:

    from umbrela.vicuna_judge import VicunaJudge

    judge_vicuna = VicunaJudge("dl19-passage")

Passing qrel-passages for evaluations:

    input_dict = {
        "query": {"text": "how long is life cycle of flea", "qid": "264014"},
        "candidates": [
            {
                "doc": {
                    "segment": "The life cycle of a flea can last anywhere from 20 days to an entire year. It depends on how long the flea remains in the dormant stage (eggs, larvae, pupa). Outside influences, such as weather, affect the flea cycle. A female flea can lay around 20 to 25 eggs in one day."
                },
                "docid": "4834547",
                "score": 14.971799850463867,
            },
        ]
    }
    judgments = judge_vicuna.judge(input_dict)

Output format for each judgment:

    judgment = {
        "model": model_name,
        "query": query,
        "passage": passage,
        "prompt": prompt,
        "prediction": model_response,
        "judgment": relevance_label_after_parsing_model_response,
    }

I have also added a sample code using
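For anyone consuming the output, here is a minimal sketch of summarizing the returned judgments by label. It assumes `judge()` returns a list of dicts with the keys listed above and that `"judgment"` holds an integer label; `summarize_judgments` and the sample data are hypothetical stand-ins, not part of umBRELA or real model output.

```python
from collections import Counter


def summarize_judgments(judgments):
    """Count how many judgments fall under each parsed relevance label."""
    return Counter(j["judgment"] for j in judgments)


# Placeholder dicts with the keys from the output format above (made-up values).
sample_judgments = [
    {"model": "stub", "query": "q", "passage": "p1", "prompt": "<prompt>",
     "prediction": "<raw model text>", "judgment": 3},
    {"model": "stub", "query": "q", "passage": "p2", "prompt": "<prompt>",
     "prediction": "<raw model text>", "judgment": 0},
    {"model": "stub", "query": "q", "passage": "p3", "prompt": "<prompt>",
     "prediction": "<raw model text>", "judgment": 3},
]

label_counts = summarize_judgments(sample_judgments)
for label, count in sorted(label_counts.items()):
    print(f"label {label}: {count} passage(s)")
```

A histogram like this is a quick sanity check that label parsing behaves as expected before comparing against human qrels.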
@thakur-nandan can you give your thoughts on the design so far too?
Sure, thanks @UShivani3. Overall I like the minimalistic code and the easy-to-use repository design. Both prompts look good, and the installation instructions in the README are helpful. One suggestion I have is to decouple the prompt from the LLM judge code. As it stands, this will get complicated in the future, because one would need to keep updating the base LLMJudge with each newer prompt, as in: umbrela/src/umbrela/llm_judge.py, line 21 in 05ae426
How I think we can restructure the design:
@ronakice @UShivani3 I would be happy to hear your suggestions.
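To make the decoupling idea concrete, here is one possible shape, as a rough sketch only. `PromptTemplate`, `SketchJudge`, the `generate` callable, the prompt wording, and the digit-based parsing are all hypothetical and are not the current umBRELA classes; only the input and output dict shapes follow the snippet earlier in this thread.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PromptTemplate:
    """A named prompt that lives outside the judge class (hypothetical)."""
    name: str
    template: str  # expects {query} and {passage} placeholders

    def render(self, query: str, passage: str) -> str:
        return self.template.format(query=query, passage=passage)


class SketchJudge:
    """Hypothetical judge that receives its prompt instead of hard-coding it."""

    def __init__(self, model_name: str, prompt: PromptTemplate,
                 generate: Callable[[str], str]):
        self.model_name = model_name
        self.prompt = prompt
        self.generate = generate  # any function mapping prompt text to model text

    def judge(self, request: Dict) -> List[Dict]:
        query = request["query"]["text"]
        results = []
        for cand in request["candidates"]:
            passage = cand["doc"]["segment"]
            prompt_text = self.prompt.render(query, passage)
            response = self.generate(prompt_text)
            match = re.search(r"\d", response)  # naive parsing, illustration only
            results.append({
                "model": self.model_name,
                "query": query,
                "passage": passage,
                "prompt": prompt_text,
                "prediction": response,
                "judgment": int(match.group()) if match else None,
            })
        return results


# Swapping prompts now means passing a different PromptTemplate,
# with no change to the judge class itself.
basic_prompt = PromptTemplate(
    name="basic",
    template="Query: {query}\nPassage: {passage}\nRelevance (0-3):",
)
judge = SketchJudge("stub-model", basic_prompt, generate=lambda text: "2")
out = judge.judge({
    "query": {"text": "how long is life cycle of flea", "qid": "264014"},
    "candidates": [{"doc": {"segment": "A short stand-in passage."},
                    "docid": "4834547"}],
})
print(out[0]["judgment"])
```

The stub `generate` stands in for a real model call, so the sketch runs without any model. New prompts become data rather than edits to the base class.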
One more question: @UShivani3 what does the Does it affect the LLMJudge response?
I am starting this thread for feature spec discussion for umBRELA @lintool @ronakice.
Suggestions from my side: