GSM8k: tighten the submit_answer tool signature, as zero-shot performance of submit_answer is a 0% pass rate #163

Open
jamesbraza opened this issue Dec 20, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@jamesbraza
Collaborator

As of aviary.gsm8k==0.14.0, GSM8k's submit_answer tool accepts a str answer. When calling this tool directly, an LLM tends to pass an entire sentence as that str instead of just the final answer. This leads to a 0% zero-shot pass rate for the submit_answer tool.

For comparison, HotPotQA handles this by first cleaning the string with normalize_answer, then grading with eval_answer using the "contains" criterion (an `in` check).
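A minimal sketch of the "clean, then contains check" idea, assuming SQuAD-style normalization (lowercase, strip punctuation, drop articles, collapse whitespace). The names `normalize_answer_sketch` and `contains_match` are hypothetical; aviary's actual `normalize_answer` and `eval_answer` may differ in details and signature.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer_sketch(text: str) -> str:
    """Hypothetical SQuAD-style cleaning, not aviary's exact implementation."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)  # "$1,000." -> "1000"
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())  # collapse runs of whitespace


def contains_match(proposed: str, correct: str) -> bool:
    """Correct if the cleaned ground truth appears inside the cleaned proposal."""
    return normalize_answer_sketch(correct) in normalize_answer_sketch(proposed)


print(contains_match("The answer is 72 clips.", "72"))  # True
print(contains_match("She earned 18 dollars.", "72"))  # False
```

One caveat for GSM8k: a plain substring check is lenient with numbers, so `contains_match("720", "72")` is also `True`.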

We need to either:

  • Change GSM8k's submit_answer tool to accept a float answer
  • Change GSM8k's submit_answer tool to match HotPotQA: string cleaning followed by a contains check
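A minimal sketch of the first option, assuming the tool's parameter schema is derived from the function's type hints and docstring, so a `float` annotation steers the LLM toward a bare number. The function `grade` and its tolerances are hypothetical, not part of aviary.

```python
import inspect
import math


def submit_answer(answer: float) -> float:
    """Submit the final answer to the math problem.

    Args:
        answer: The final numeric answer only (e.g. 72 or 3.5), not a sentence.
    """
    # float() also accepts a numeric string like "72", but raises ValueError
    # on a full sentence, so a malformed submission fails loudly.
    return float(answer)


def grade(submitted: float, correct: float) -> bool:
    """Hypothetical numeric grading with a small tolerance instead of string equality."""
    # abs_tol handles a ground truth of 0, where a relative tolerance alone never matches.
    return math.isclose(submitted, correct, rel_tol=1e-6, abs_tol=1e-9)


# The annotation a schema generator would see for the `answer` parameter.
print(inspect.signature(submit_answer).parameters["answer"].annotation)  # <class 'float'>
print(grade(submit_answer(72), 72))  # True
print(grade(71.5, 72))  # False
```

Comparing numerically also sidesteps formatting differences such as "72" vs "72.0", which a string comparison would treat as distinct.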
@jamesbraza jamesbraza added the bug Something isn't working label Dec 20, 2024