Implemented LLM Fair Eval example using llments #62

Closed
rohanmodi2810 wants to merge 7 commits from the rohan/fair-eval branch

Conversation

Opened by rohanmodi2810 (Collaborator)

rohanmodi2810 requested a review from neubig on September 5, 2024
rohanmodi2810 self-assigned this on September 5, 2024
@neubig (Contributor) left a comment:

Hey @rohanmodi2810 , I tried to run this but it wasn't working for me. I got to the third cell where it compares vicuna and chatgpt, and got the following error.

Do you have any idea what's going wrong?

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
...
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Error: 'NotFoundError' object is not subscriptable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 40
     39 try:
---> 40     responses = APIBasedLM(eval_model).chat_generate(
     41         messages=[[{"role": "system", "content": system_prompt}, {"role": "user", "content": user_prompt}] for user_prompt in user_prompts],
     42         temperature=1,
     43         max_new_tokens=512,
     44         num_return_sequences=num_sequences
     45     )
     46     return responses

File ~/miniconda3/envs/llments/lib/python3.11/site-packages/llments/lm/base/api.py:154, in APIBasedLM.chat_generate(self, messages, condition, do_sample, max_length, max_new_tokens, temperature, num_return_sequences)
    146 responses = batch_completion(
    147     model=self.model_name,
    148     temperature=temperature,
   (...)
    151     messages=messages,
    152 )
--> 154 return [
    155     [choice["message"]["content"] for choice in response["choices"]]
    156     for response in responses
    157 ]

File ~/miniconda3/envs/llments/lib/python3.11/site-packages/llments/lm/base/api.py:155, in <listcomp>(.0)
    146 responses = batch_completion(
    147     model=self.model_name,
    148     temperature=temperature,
   (...)
    151     messages=messages,
    152 )
    154 return [
--> 155     [choice["message"]["content"] for choice in response["choices"]]
    156     for response in responses
    157 ]

TypeError: 'NotFoundError' object is not subscriptable

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 5
      2 m2="vicuna-13b"
      3 eval_model="gpt-3.5-turbo-0301"
----> 5 get_results(m1, m2, eval_model)

Cell In[3], line 190
    186 output = f"review/review_{m1}_vs_{m2}_eval={eval_model}_mec={k}_bpc={bpc}.json"
    188 assert len(question_jsons) == len(answer1_jsons) == len(answer2_jsons)
--> 190 reviews = get_eval(question_jsons, answer1_jsons, answer2_jsons, eval_model, bpc, k)
    192 model1_vs_model2 = {
    193     'win': 0,
    194     'tie': 0,
    195     'loss': 0
    196 }
    198 with open(f"{output}", "w") as output_review_file:

Cell In[3], line 81
     78         user_prompt_bpc = gen_prompt(ques, ans2, ans1)
     79         user_prompts_bpc.append(user_prompt_bpc)
---> 81 responses = query_gpt(system_prompt, user_prompts, eval_model, k)
     83 if bpc == 1:
     84     responses_bpc = query_gpt(system_prompt, user_prompts_bpc, eval_model, k)

Cell In[3], line 49
     47 except Exception as e:
     48     print(f'Error: {e}')
---> 49     raise RuntimeError(f"Failed during query processing.")

RuntimeError: Failed during query processing.
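
The `TypeError` at the top of the chain is a symptom rather than the root cause: `litellm.batch_completion` places the raised exception object itself (here an OpenAI `NotFoundError`, possibly because `gpt-3.5-turbo-0301` has since been retired by OpenAI) into the returned list in place of a response, and `chat_generate` then tries to subscript it. Below is a minimal sketch of defensive unpacking, assuming that mixed return behavior; `safe_chat_generate` is a hypothetical helper, not part of llments:

```python
from litellm import batch_completion

def safe_chat_generate(model_name, messages, temperature=1.0, max_tokens=512):
    """Hypothetical helper: type-check each batch_completion result.

    litellm returns the exception object in place of a response for any
    request in the batch that fails, so each element must be checked
    before it is subscripted.
    """
    responses = batch_completion(
        model=model_name,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    results = []
    for response in responses:
        if isinstance(response, Exception):
            # Surface the underlying API error (e.g. NotFoundError) with
            # context, instead of letting a later subscript fail with an
            # opaque "'NotFoundError' object is not subscriptable".
            raise RuntimeError(f"Request to {model_name} failed") from response
        results.append(
            [choice["message"]["content"] for choice in response["choices"]]
        )
    return results
```

With a guard like this, the notebook's `except` block would report the actual `NotFoundError` (and its message about the missing model) rather than the secondary `TypeError`.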

zaidsheikh and others added 3 commits September 26, 2024 12:36
* added base_url

* Updated function descriptions

* Added api_base to the constructor

* matched structure with lm class

Pull latest changes
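
For reference, the commits above add an `api_base` parameter to the `APIBasedLM` constructor, which is how a non-default endpoint (for example, a locally served vicuna-13b) would be reached. A hypothetical usage sketch, assuming the constructor accepts the model name positionally as in the traceback and `api_base` as a keyword; the endpoint URL is a placeholder:

```python
from llments.lm.base.api import APIBasedLM

# The evaluator uses the provider's default endpoint; gpt-3.5-turbo-0301
# may need to be swapped for a model the API still serves.
evaluator = APIBasedLM("gpt-3.5-turbo-0301")

# A model behind a custom OpenAI-compatible endpoint (placeholder URL).
local_model = APIBasedLM("vicuna-13b", api_base="http://localhost:8000/v1")

responses = evaluator.chat_generate(
    messages=[[
        {"role": "system", "content": "You are a fair evaluator."},
        {"role": "user", "content": "Compare the two answers."},
    ]],
    temperature=1,
    max_new_tokens=512,
    num_return_sequences=1,
)
```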
@rohanmodi2810 rohanmodi2810 deleted the rohan/fair-eval branch November 4, 2024 06:36