Implemented LLM Fair Eval example using llments #62

Closed
rohanmodi2810 wants to merge 7 commits from the rohan/fair-eval branch

Conversation

Opened by rohanmodi2810 (Collaborator)

rohanmodi2810 requested a review from neubig on September 5, 2024
rohanmodi2810 self-assigned this on September 5, 2024
@neubig (Contributor) left a comment:

Hey @rohanmodi2810 , I tried to run this but it wasn't working for me. I got to the third cell where it compares vicuna and chatgpt, and got the following error.

Do you have any idea what's going wrong?

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
...
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Error: 'NotFoundError' object is not subscriptable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 40
     39 try:
---> 40     responses = APIBasedLM(eval_model).chat_generate(
     41         messages=[[{"role": "system", "content": system_prompt}, {"role": "user", "content": user_prompt}] for user_prompt in user_prompts],
     42         temperature=1,
     43         max_new_tokens=512,
     44         num_return_sequences=num_sequences
     45     )
     46     return responses

File ~/miniconda3/envs/llments/lib/python3.11/site-packages/llments/lm/base/api.py:154, in APIBasedLM.chat_generate(self, messages, condition, do_sample, max_length, max_new_tokens, temperature, num_return_sequences)
    146 responses = batch_completion(
    147     model=self.model_name,
    148     temperature=temperature,
   (...)
    151     messages=messages,
    152 )
--> 154 return [
    155     [choice["message"]["content"] for choice in response["choices"]]
    156     for response in responses
    157 ]

File ~/miniconda3/envs/llments/lib/python3.11/site-packages/llments/lm/base/api.py:155, in <listcomp>(.0)
    146 responses = batch_completion(
    147     model=self.model_name,
    148     temperature=temperature,
   (...)
    151     messages=messages,
    152 )
    154 return [
--> 155     [choice["message"]["content"] for choice in response["choices"]]
    156     for response in responses
    157 ]

TypeError: 'NotFoundError' object is not subscriptable

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 5
      2 m2="vicuna-13b"
      3 eval_model="gpt-3.5-turbo-0301"
----> 5 get_results(m1, m2, eval_model)

Cell In[3], line 190
    186 output = f"review/review_{m1}_vs_{m2}_eval={eval_model}_mec={k}_bpc={bpc}.json"
    188 assert len(question_jsons) == len(answer1_jsons) == len(answer2_jsons)
--> 190 reviews = get_eval(question_jsons, answer1_jsons, answer2_jsons, eval_model, bpc, k)
    192 model1_vs_model2 = {
    193     'win': 0,
    194     'tie': 0,
    195     'loss': 0
    196 }
    198 with open(f"{output}", "w") as output_review_file:

Cell In[3], line 81
     78         user_prompt_bpc = gen_prompt(ques, ans2, ans1)
     79         user_prompts_bpc.append(user_prompt_bpc)
---> 81 responses = query_gpt(system_prompt, user_prompts, eval_model, k)
     83 if bpc == 1:
     84     responses_bpc = query_gpt(system_prompt, user_prompts_bpc, eval_model, k)

Cell In[3], line 49
     47 except Exception as e:
     48     print(f'Error: {e}')
---> 49     raise RuntimeError(f"Failed during query processing.")

RuntimeError: Failed during query processing.
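
The `TypeError` at the top of the chain is a symptom rather than the root cause: `litellm.batch_completion` places the raised exception object itself (here an OpenAI `NotFoundError`, possibly because `gpt-3.5-turbo-0301` has since been retired by OpenAI) into the returned list in place of a response, and `chat_generate` then tries to subscript it. Below is a minimal sketch of defensive unpacking, assuming that mixed return behavior; `safe_chat_generate` is a hypothetical helper, not part of llments:

```python
from litellm import batch_completion

def safe_chat_generate(model_name, messages, temperature=1.0, max_tokens=512):
    """Hypothetical helper: type-check each batch_completion result.

    litellm returns the exception object in place of a response for any
    request in the batch that fails, so each element must be checked
    before it is subscripted.
    """
    responses = batch_completion(
        model=model_name,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    results = []
    for response in responses:
        if isinstance(response, Exception):
            # Surface the underlying API error (e.g. NotFoundError) with
            # context, instead of letting a later subscript fail with an
            # opaque "'NotFoundError' object is not subscriptable".
            raise RuntimeError(f"Request to {model_name} failed") from response
        results.append(
            [choice["message"]["content"] for choice in response["choices"]]
        )
    return results
```

With a guard like this, the notebook's `except` block would report the actual `NotFoundError` (and its message about the missing model) rather than the secondary `TypeError`.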

zaidsheikh and others added 3 commits September 26, 2024 12:36
* added base_url

* Updated function descriptions

* Added api_base to the constructor

* matched structure with lm class

Pull latest changes
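
For reference, the commits above add an `api_base` parameter to the `APIBasedLM` constructor, which is how a non-default endpoint (for example, a locally served vicuna-13b) would be reached. A hypothetical usage sketch, assuming the constructor accepts the model name positionally as in the traceback and `api_base` as a keyword; the endpoint URL is a placeholder:

```python
from llments.lm.base.api import APIBasedLM

# The evaluator uses the provider's default endpoint; gpt-3.5-turbo-0301
# may need to be swapped for a model the API still serves.
evaluator = APIBasedLM("gpt-3.5-turbo-0301")

# A model behind a custom OpenAI-compatible endpoint (placeholder URL).
local_model = APIBasedLM("vicuna-13b", api_base="http://localhost:8000/v1")

responses = evaluator.chat_generate(
    messages=[[
        {"role": "system", "content": "You are a fair evaluator."},
        {"role": "user", "content": "Compare the two answers."},
    ]],
    temperature=1,
    max_new_tokens=512,
    num_return_sequences=1,
)
```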
@rohanmodi2810 rohanmodi2810 deleted the rohan/fair-eval branch November 4, 2024 06:36