different score ranges are confusing #119

Muennighoff · 2022-08-16T06:48:34Z

It's confusing that BLEU scores are 0-100 and ROUGE scores are 0-1 in this repo. I think all scores should be either 0-100 or 0-1, probably the former.

jon-tow · 2022-08-18T20:00:29Z

I agree that standardizing on the $[0, 100]$ range is ideal for the readability of these scores. The difference arises because the underlying sacrebleu package scales BLEU/TER/chrF scores by $100$. These are the only metrics in the harness that are scaled (accuracy, ROUGE, SARI, etc. are not). So, to make everything consistent for now, we can re-scale BLEU back to its "natural" units in $[0, 1]$ and follow up with an optional per-metric "results-formatter". What do you think?
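
A minimal sketch of that re-scaling step, for illustration only. The helper name `to_unit_range`, the `SCALED_BY_100` set, and the metric keys are hypothetical and not taken from the harness; the scores in `results` are arbitrary placeholder values.

```python
# Hypothetical helper, not part of the harness. The set below assumes the
# metric names "bleu", "ter", and "chrf" for the sacrebleu-backed metrics.
SCALED_BY_100 = {"bleu", "ter", "chrf"}


def to_unit_range(metric: str, value: float) -> float:
    """Undo the x100 scaling for sacrebleu-backed metrics; pass others through."""
    if metric.lower() in SCALED_BY_100:
        return value / 100.0
    return value


# Placeholder scores, chosen only to show the mixed ranges.
results = {"bleu": 17.7, "rouge1": 0.42, "acc": 0.81}
normalized = {name: to_unit_range(name, score) for name, score in results.items()}
```

Keeping the list of scaled metrics in one place means an unscaled metric such as ROUGE or accuracy passes through unchanged.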

StellaAthena · 2022-09-22T20:43:10Z

I think this suggestion makes a lot of sense. Additionally, it would be nice to have the option to get rounded answers, e.g., 17.7%.
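
One possible shape for such a rounded-output option, assuming scores are already stored in $[0, 1]$. The function name `format_percent` and its `decimals` parameter are hypothetical, not an existing harness API.

```python
def format_percent(value: float, decimals: int = 1) -> str:
    """Render a score in [0, 1] as a rounded percentage string."""
    return f"{value * 100:.{decimals}f}%"


print(format_percent(0.177))  # 17.7%
```

Formatting only at display time would leave the stored values in their natural units.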

StellaAthena added the "enhancement" label on Sep 22, 2022.
