
different score ranges are confusing #119

Open
Muennighoff opened this issue Aug 16, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@Muennighoff

It's confusing that BLEU scores are 0-100 and ROUGE scores are 0-1 in this repo. I think all scores should be either 0-100 or 0-1, probably the former.

@jon-tow
Collaborator

jon-tow commented Aug 18, 2022

I agree that standardizing on the $[0, 100]$ range is ideal for the readability of these scores. The difference arises because the underlying sacrebleu package scales BLEU/TER/chrF scores by $100$. These are the only metrics in the harness that are scaled (accuracy, ROUGE, SARI, etc. are not). So, to make everything consistent for now, we can rescale BLEU back to its "natural" units in $[0, 1]$ and follow up with an optional per-metric "results-formatter". What do you think?
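For illustration, here is a minimal sketch of the rescaling idea. The names `SCALE_FACTORS` and `to_unit_range` are hypothetical and not part of the harness or of sacrebleu; the table simply encodes the assumption stated above that BLEU, TER, and chrF arrive scaled by 100 while everything else is already in natural units.

```python
# Hypothetical helper, not part of the harness or sacrebleu.
# Maps a metric name to the factor its reported score is multiplied by
# relative to natural units. Metrics not listed are assumed unscaled.
SCALE_FACTORS = {"bleu": 100.0, "ter": 100.0, "chrf": 100.0}


def to_unit_range(metric: str, score: float) -> float:
    """Divide out the known scale factor so the score is in natural units."""
    return score / SCALE_FACTORS.get(metric.lower(), 1.0)


# Example values are made up for illustration only.
results = {"bleu": 17.7, "rouge1": 0.42, "acc": 0.81}
normalized = {name: to_unit_range(name, value) for name, value in results.items()}
```

Keeping the factors in one table means a later per-metric formatter could reuse the same lookup instead of hard-coding which metrics are scaled.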

@StellaAthena
Collaborator

I think this suggestion makes a lot of sense. Additionally, it would be nice to have the option to get rounded answers, e.g., 17.7%.
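A rounded-percentage option could look something like the sketch below. The function name `format_percent` and its `digits` parameter are hypothetical, and the sketch assumes the score has already been brought into the $[0, 1]$ range.

```python
# Hypothetical formatter, not an existing harness function.
# Assumes `score` is already in the [0, 1] range.
def format_percent(score: float, digits: int = 1) -> str:
    """Render a [0, 1] score as a percentage string rounded to `digits` places."""
    return f"{score * 100:.{digits}f}%"


print(format_percent(0.177))  # prints "17.7%"
```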

@StellaAthena StellaAthena added the enhancement New feature or request label Sep 22, 2022
3 participants