WMT24 Metrics Task Speech Subset #21

jack89roberts · 2025-01-31T08:55:45Z

The WMT 2024 metrics task had a new speech subset. All the WMT metrics evaluation data is available through this repo: https://github.com/google-research/mt-metrics-eval (it has source text, human translation, machine translations for many models, human ratings, and metric ratings). We should be able to pretty easily compute results for the speech subset and see how the metrics compare between subsets, which I don't think WMT included in their results paper. They haven't released audio as far as I can tell, so I dropped the organisers an email to see if it's available. Either way, we could at least add BLASER with text only.

There are some other subsets that might be interesting too, e.g. a social text one, and even a separate text chat translation challenge that I hadn't noticed.

jack89roberts · 2025-01-31T09:04:35Z

Audio is available here, but it may take some time to match it up with the relevant lines in the metrics data: https://data.statmt.org/wmt24/general-mt/wmt24_GeneralMT-audio.zip

jack89roberts added datasets research labels Jan 31, 2025

klh5 self-assigned this Feb 4, 2025
