🐛 Bug
Using COMET 2.2.1 (and Python 3.9) with the wmt22-comet-da model I get a score of 0.7982, but using COMET 1.1.2 (and Python 3.7) I get a score of 0.8618. This is on exactly the same source, hypothesis and reference files.
I appreciate that 1.1.2 is an old version and should no longer be used, but many people will have old versions installed and be unaware that they are incompatible with new models. A consequence of this bug is that research papers need to report both the COMET model used and the version of the software.
To Reproduce
1. Install COMET 2.2.1 on Python 3.9 and score the test files. I used an en->mt translation of NTREX produced with this model: https://huggingface.co/HPLT/mt-mt-en-v1.0-hplt_opus. I have attached the src, hypo and ref files.
2. Install COMET 1.1.2 on Python 3.7 and score the same files.
3. Compare the scores.
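For concreteness, the steps above can be sketched as the commands below. This assumes the attached file names (src.txt, hypo.txt, ref.txt), that both Python interpreters are available, and that the comet-score CLI is used; the exact model-name argument may differ between versions (newer releases take the Hugging Face path Unbabel/wmt22-comet-da, older ones a short name).

```shell
# Environment A: COMET 2.2.1 on Python 3.9
python3.9 -m venv comet-new && . comet-new/bin/activate
pip install unbabel-comet==2.2.1
comet-score -s src.txt -t hypo.txt -r ref.txt --model Unbabel/wmt22-comet-da
deactivate

# Environment B: COMET 1.1.2 on Python 3.7
python3.7 -m venv comet-old && . comet-old/bin/activate
pip install unbabel-comet==1.1.2
comet-score -s src.txt -t hypo.txt -r ref.txt --model wmt22-comet-da
deactivate
```

Running the two environments side by side on the same three files is what produces the 0.7982 vs. 0.8618 discrepancy reported above.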
Expected behaviour
With the same model and data, COMET should give the same scores.
Environment
OS: Ubuntu 20.04.6 LTS
COMET versions: 2.2.1 and 1.1.2
Model: wmt22-comet-da
hypo.txt
ref.txt
src.txt