For TER, ChrF, and BLEU I use sacrebleu, which works well. For the other metrics, I use evaluate. That also works well, except that support for the library has (understandably) been lacking in recent months, as priorities at Hugging Face have shifted with the massive increase in interest in generative models.
For MATEO this is becoming problematic, though. As an example: evaluate still uses COMET v1.1.3. A PR was submitted to fix this (huggingface/evaluate#443) but it has not been merged. The last release of evaluate dates from December 13, 2022, which indicates that, at the moment, it is not reliable for our use case when we need quick updates or new metrics.
Moving away from evaluate gives more freedom but also means more manual implementation work. On the other hand, staying with evaluate means that any metric already implemented there remains relatively easy to use.
This is a low-priority issue that serves as a reminder to myself and/or as a starting point for a discussion on the pros and cons.