GSM8k tightening `submit_answer` tool signature as zero-shot perf of `submit_answer` is 0% pass
#163
Labels
bug
As of `aviary.gsm8k==0.14.0`, GSM8k's `submit_answer` tool accepts a `str` answer. When directly calling this tool, an LLM will tend to answer with an entire sentence for the `str`. This leads to 0% zero-shot performance of the `submit_answer` tool.

The way HotPotQA handles this is by using `normalize_answer` (string cleaning) then `eval_answer` with "contains" (`in` check) grading criteria.

We need to either move:
- `submit_answer` tool to accept a `float` answer
- `submit_answer` tool to match HotPotQA with string cleaning then a contains check
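
To make the two options concrete, here is a minimal sketch of what each grading path could look like. It is not the aviary implementation. The helper names `grade_float` and `grade_contains` are hypothetical, and the body of `normalize_answer` below is an assumed SQuAD-style cleaner (lowercase, strip punctuation and articles, collapse whitespace), not necessarily what HotPotQA's environment does.

```python
import math
import re
import string


def normalize_answer(text: str) -> str:
    """Assumed string cleaning: lowercase, drop punctuation and articles, squeeze whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def grade_float(proposed: float, correct: str) -> bool:
    """Option 1 (hypothetical): the tool takes a float, so grading is numeric."""
    # Drop thousands separators such as "1,000" before parsing the reference.
    return math.isclose(proposed, float(correct.replace(",", "")))


def grade_contains(proposed: str, correct: str) -> bool:
    """Option 2 (hypothetical): clean both strings, then apply an `in` check."""
    return normalize_answer(correct) in normalize_answer(proposed)


if __name__ == "__main__":
    sentence = "The answer is 18 dollars."
    print(grade_contains(sentence, "18"))  # full-sentence answer still passes
    print(grade_float(18.0, "18"))         # typed answer, no string parsing
    print(grade_contains("180", "18"))     # substring false positive
```

One trade-off worth noting: in this sketch the contains check also passes `"180"` against a reference of `"18"`, and stripping punctuation would turn `"3.5"` into `"35"`. A `float` signature avoids both issues, but relies on the LLM producing a bare number in the tool call.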