GSM8k tightening `submit_answer` tool signature as zero-shot perf of `submit_answer` is 0% pass #163

jamesbraza · 2024-12-20T20:15:27Z

As of `aviary.gsm8k==0.14.0`, GSM8k's `submit_answer` tool accepts a `str` answer. When calling this tool directly, an LLM tends to pass an entire sentence as the `str`. This leads to a 0% zero-shot pass rate for the `submit_answer` tool.

HotPotQA handles this by applying `normalize_answer` (string cleaning), then `eval_answer` with the "contains" grading criterion (an `in` check).
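To illustrate the idea, here is a minimal sketch of string cleaning followed by a contains check. The functions below are hypothetical stand-ins written for this issue, not the actual `normalize_answer` or `eval_answer` implementations from the HotPotQA environment, and the cleaning rules (lowercasing, dropping punctuation and articles) are an assumption.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def contains_match(proposed: str, correct: str) -> bool:
    """Hypothetical grader: True if the cleaned correct answer is a substring of the cleaned proposal."""
    return normalize_answer(correct) in normalize_answer(proposed)


sentence = "The answer is 18 dollars."
print(sentence == "18")  # False: exact string match rejects a full sentence
print(contains_match(sentence, "18"))  # True: the contains check accepts it
```

One caveat with this approach for numeric answers: a plain substring check also accepts "118 dollars" for a correct answer of "18", and stripping punctuation removes decimal points, so a contains check is more permissive for GSM8k than for free-text answers.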

We need to move to one of the following:

- GSM8k's `submit_answer` tool accepting a `float` answer
- GSM8k's `submit_answer` tool matching HotPotQA, with string cleaning then a contains check
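As a rough sketch of the first option, a `float`-typed signature could look like the following. This is not the `aviary.gsm8k` implementation: `make_submit_answer`, the closure over `target`, and the absolute tolerance are all assumptions made for illustration.

```python
import math
from collections.abc import Callable


def make_submit_answer(target: float, abs_tol: float = 1e-6) -> Callable[[float], bool]:
    """Hypothetical factory: build a submit_answer tool bound to one problem's target."""

    def submit_answer(answer: float) -> bool:
        """Submit the final numeric answer.

        Args:
            answer: Final answer as a number, with no units or surrounding text.
        """
        # float() raises ValueError if a sentence is passed instead of a number,
        # so a malformed call fails loudly instead of silently grading as wrong.
        return math.isclose(float(answer), target, rel_tol=0.0, abs_tol=abs_tol)

    return submit_answer


submit_answer = make_submit_answer(target=18.0)
print(submit_answer(18))  # True
print(submit_answer(17.5))  # False
```

With a numeric parameter type, the tool schema itself tells the LLM to pass a number, so no string cleaning is needed on the grading side.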

jamesbraza added the `bug` (Something isn't working) label on Dec 20, 2024