mozilla/translations

Latest commit: 8199f8b · Jan 20, 2025

Firefox Translations

Training pipelines and the inference engine for Firefox Translations machine translation models.

The trained models are hosted in the firefox-translations-models repository. They are compatible with bergamot-translator and have powered web page translation in Firefox since version 118.

The pipeline was originally developed as part of the Bergamot project, which focuses on improving client-side machine translation in web browsers.

Documentation

Pipeline

The pipeline can train a translation model for a language pair end to end. Translation quality depends on the chosen datasets, data cleaning procedures and hyperparameters. Some settings, especially for low-resource languages, might require extra tuning.

We use Marian, a fast neural machine translation engine.

You can find more details about the pipeline steps in the documentation.

Orchestrators

An orchestrator is responsible for workflow management and parallelization.

  • Taskcluster - Mozilla's task execution framework, which is also used for Firefox CI. It provides access to hybrid cloud workers (GCP + on-prem) with increased scalability and observability. Usage instructions.
  • Snakemake - a file-based orchestrator that can run the pipeline locally or on a Slurm cluster. Usage instructions. (The integration is no longer maintained since Mozilla switched to Taskcluster. Contributions are welcome.)
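
To illustrate what "workflow management and parallelization" means in practice, here is a minimal sketch that uses only the Python standard library. It is not how Taskcluster or Snakemake work internally, and the step names and dependencies are hypothetical, chosen only for illustration. It groups steps into batches whose dependencies are already satisfied, so each batch could in principle run in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical step names and dependencies, for illustration only.
# Each key is a step; its value is the set of steps it depends on.
STEPS = {
    "download-corpus": set(),
    "download-devset": set(),
    "clean-corpus": {"download-corpus"},
    "train-model": {"clean-corpus"},
    "evaluate": {"train-model", "download-devset"},
}


def plan_batches(graph):
    """Group steps into batches that respect dependencies.

    Steps within one batch have no unmet dependencies, so an
    orchestrator could schedule them at the same time.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking successors
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(plan_batches(STEPS), start=1):
        print(number, batch)
```

A real orchestrator adds much more on top of this ordering, such as caching of finished steps, retries, and worker allocation.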

Experiment tracking

Public training dashboard in Weights & Biases

Marian training metrics are parsed from logs and published using a custom module within the tracking directory. More information is available here.
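
As a rough illustration of parsing training metrics from logs, the sketch below extracts a few fields from a single log line with a regular expression. It is not the project's tracking module. The sample line and its field layout are assumptions about what a Marian training log line looks like, and the actual format may differ between Marian versions and configurations. The function name `parse_training_line` is hypothetical.

```python
import re

# Assumed shape of a Marian training log line (illustrative only).
SAMPLE = (
    "[2025-01-20 10:15:42] Ep. 1 : Up. 1000 : Sen. 256,000 : "
    "Cost 3.41250582 : Time 120.51s : 21500.32 words/s : gNorm 1.2034"
)

# Capture the epoch, update count, sentence count and training cost.
TRAIN_RE = re.compile(
    r"Ep\.\s*(?P<epoch>\d+)\s*:\s*"
    r"Up\.\s*(?P<update>\d+)\s*:\s*"
    r"Sen\.\s*(?P<sentences>[\d,]+)\s*:\s*"
    r"Cost\s*(?P<cost>[\d.]+)"
)


def parse_training_line(line):
    """Return a dict of metrics, or None if the line is not a training line."""
    match = TRAIN_RE.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match["epoch"]),
        "update": int(match["update"]),
        # Sentence counts are assumed to use thousands separators.
        "sentences": int(match["sentences"].replace(",", "")),
        "cost": float(match["cost"]),
    }


if __name__ == "__main__":
    print(parse_training_line(SAMPLE))
```

The resulting dictionaries could then be passed to an experiment tracker. How the project publishes them to Weights & Biases is described in the linked documentation, not here.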

Learning resources

Contributing

Contributions are welcome! See the documentation on Contributing for more details.

Feel free to ask questions in our Matrix channel #firefoxtranslations:mozilla.org.

Acknowledgements

This project uses materials developed by:

  • Bergamot project (github, website), which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303
  • HPLT project (github, website), which has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant number 10052546]
  • OPUS-MT project (github, website)
  • Many other open source projects and research papers (see References)