MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

📃 Paper | 🤗 Huggingface | 📭 Contact

Overview

Welcome to the repository of MAPO, a framework for improving the multilingual reasoning capabilities of large language models (LLMs).

  • 🚀 We propose a framework that enhances multilingual reasoning capabilities by aligning the reasoning processes in other languages with those in English. We use off-the-shelf translation models to estimate how well a reasoning process in another language aligns with its English counterpart, and then optimize this alignment as a preference using preference optimization methods such as DPO or PPO.

  • 📈 Our framework improves the consistency of multilingual reasoning, which in turn enhances the multilingual reasoning capabilities of large models in a more generalizable manner. Our best models surpass all baselines, including ChatGPT, and reach state-of-the-art (SOTA) results.

  • 🌐 Overall, our method demonstrates a novel way of improving the multilingual reasoning abilities of models without the need for extensive annotation of reasoning processes in other languages, enabling a more generalizable enhancement of multilingual reasoning capabilities.
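The alignment-as-preference idea described above can be illustrated with a small sketch. It assumes every sampled reasoning path already carries an alignment score (higher meaning closer to the English reasoning) and turns the best and worst samples into one preference pair. The `Sample` class, the `build_preference_pair` helper, the `min_gap` threshold, and the toy scores are all hypothetical and are not taken from this repository's scripts.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled reasoning path for a question (hypothetical structure)."""
    text: str
    alignment_score: float  # assumption: higher = better aligned with English reasoning


def build_preference_pair(prompt, samples, min_gap=0.0):
    """Return a chosen/rejected pair from scored samples, or None if no clear preference."""
    if len(samples) < 2:
        return None
    ranked = sorted(samples, key=lambda s: s.alignment_score, reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    # Skip questions whose samples are scored too similarly to express a preference.
    if chosen.alignment_score - rejected.alignment_score <= min_gap:
        return None
    return {"prompt": prompt, "chosen": chosen.text, "rejected": rejected.text}


# Toy data: three sampled reasoning paths with made-up alignment scores.
samples = [
    Sample("reasoning A", alignment_score=-1.2),
    Sample("reasoning B", alignment_score=-0.4),
    Sample("reasoning C", alignment_score=-2.5),
]
pair = build_preference_pair("question in a non-English language", samples)
print(pair)
```

Pairs in this prompt/chosen/rejected shape are a common input format for DPO-style training. The format actually produced by `extract_dpo_data.py` may differ.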

🏆 Benchmarks

Below is the average accuracy across ten languages on three multilingual mathematical reasoning datasets. Our method improves the multilingual reasoning capabilities of LLMs by a large margin and achieves SOTA performance. We also hope that future multilingual reasoning LLMs can be combined with our method to further enhance their multilingual reasoning capabilities.

| System | MSVAMP | MGSM | MNumGLUESub |
|---|---|---|---|
| GPT-3.5-Turbo | 46.6 | 42.2 | 49.4 |
| MAmmoTH 7B | 26.3 | 21.3 | 24.2 |
| WizardMath 7B | 32.5 | 23.0 | 28.7 |
| MetaMath 7B | 46.2 | 37.0 | 43.2 |
| QAlign 7B | 57.2 | 49.6 | - |
| MathOctopus 7B | 41.2 | 39.5 | 37.1 |
| + MAPO-DPO (ours) 🔥 | 57.4 | 41.6 | 50.4 |
| MetaMathOctopus 7B | 53.0 | 45.5 | 39.2 |
| + MAPO-DPO (ours) 👑 | 64.7 | 51.6 | 52.9 |
| MistralMathOctopus 7B | 59.0 | 58.0 | 56.8 |
| + MAPO-DPO (ours) 👑 | 74.6 | 67.3 | 70.0 |

| System | MSVAMP | MGSM | MNumGLUESub |
|---|---|---|---|
| GPT-3.5-Turbo | 46.6 | 42.2 | 49.4 |
| MAmmoTH 13B | 38.6 | 28.9 | 29.5 |
| WizardMath 13B | 35.7 | 28.3 | 29.0 |
| MetaMath 13B | 46.2 | 43.9 | 43.3 |
| QAlign 13B | 62.6 | 57.1 | - |
| MathOctopus 13B | 51.8 | 46.0 | 40.3 |
| + MAPO-DPO (ours) 🔥 | 60.1 | 48.5 | 53.8 |
| MetaMathOctopus 13B | 56.3 | 51.4 | 49.5 |
| + MAPO-DPO (ours) 👑 | 67.0 | 58.0 | 59.8 |
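To make the size of the improvements concrete, the following sketch computes the per-dataset gain of each MAPO-DPO model over its base model. The scores are copied from the tables above. The `gains` helper and the `results` layout are illustrative only.

```python
# (MSVAMP, MGSM, MNumGLUESub) accuracy: base model first, then base + MAPO-DPO.
results = {
    "MathOctopus 7B": ((41.2, 39.5, 37.1), (57.4, 41.6, 50.4)),
    "MetaMathOctopus 7B": ((53.0, 45.5, 39.2), (64.7, 51.6, 52.9)),
    "MistralMathOctopus 7B": ((59.0, 58.0, 56.8), (74.6, 67.3, 70.0)),
    "MathOctopus 13B": ((51.8, 46.0, 40.3), (60.1, 48.5, 53.8)),
    "MetaMathOctopus 13B": ((56.3, 51.4, 49.5), (67.0, 58.0, 59.8)),
}


def gains(base, tuned):
    """Per-dataset accuracy gain in points, rounded to one decimal place."""
    return tuple(round(t - b, 1) for b, t in zip(base, tuned))


for name, (base, tuned) in results.items():
    per_dataset = gains(base, tuned)
    mean_gain = round(sum(per_dataset) / len(per_dataset), 1)
    print(f"{name}: gains={per_dataset}, mean={mean_gain}")
```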

🏆 Alignment Performance

[Figures: PPL-based alignment score (left) and ACR (right)]

We report the PPL-based alignment score (left) and ACR (right), which assess the consistency of the reasoning process and of the final answer, respectively. MAPO significantly improves the consistency of both the reasoning processes and the answers of LLMs across languages.
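As background for the PPL-based score, perplexity is the exponential of the negative mean token log-probability. The sketch below shows only this standard formula. It assumes the token log-probabilities come from some scoring model (for example, a translation model scoring one reasoning process given another), and the toy numbers are invented. How this repository turns perplexity into its alignment score is not shown here.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Toy log-probabilities for two candidate reasoning paths (made-up values).
candidate_a = [math.log(0.5)] * 4          # every token has probability 0.5
candidate_b = [math.log(0.25)] * 4         # every token has probability 0.25

ppl_a = perplexity(candidate_a)
ppl_b = perplexity(candidate_b)

# Assumption for this sketch: lower perplexity is read as better alignment.
better = "A" if ppl_a < ppl_b else "B"
print(ppl_a, ppl_b, better)
```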

🛠️ Training & Evaluation

  • Preference optimization data preparation

    • Generation: bash sampling.sh
    • Preference estimation: bash PreferenceEstimate.sh
    • Format paired data: python3 extract_dpo_data.py
  • Training:

    • DPO: bash dpo.sh yourconfig.json (or bash dpo13b.sh yourconfig.json)
    • PPO: bash ppo_lora.sh yourconfig.json
  • Evaluation: bash run.sh
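The steps above can be chained in a small driver script. This is a sketch only: it assumes each script is invoked from the directory that contains it (the sketch does not change directories), that all preparation steps are run before either training method, and that `yourconfig.json` is your own config file. `pipeline_commands` and `run_pipeline` are hypothetical helpers, not part of the repository. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def pipeline_commands(config="yourconfig.json", method="dpo"):
    """Return the commands listed above, in order, for the chosen training method."""
    train = {
        "dpo": ["bash", "dpo.sh", config],
        "ppo": ["bash", "ppo_lora.sh", config],
    }
    if method not in train:
        raise ValueError(f"unknown method: {method!r}")
    return [
        ["bash", "sampling.sh"],              # generation
        ["bash", "PreferenceEstimate.sh"],    # preference estimation
        ["python3", "extract_dpo_data.py"],   # format paired data
        train[method],                        # training
        ["bash", "run.sh"],                   # evaluation
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_pipeline(pipeline_commands())
```

Pass `dry_run=False` to actually execute the steps once the paths and config have been checked.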

For more details on training and evaluation, please see the Alignment/Evaluation directory.

Citation

If you find this repository helpful, feel free to cite our paper:

@misc{she2024mapo,
      title={MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization}, 
      author={Shuaijie She and Wei Zou and Shujian Huang and Wenhao Zhu and Xiang Liu and Xiang Geng and Jiajun Chen},
      year={2024},
      eprint={2401.06838},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}