Tasks

Main goal is to improve existing translation pairs using Llama 3.1 and some prompts

Find a way to call LM Studio using Python
Prepare a prompt to detect badly translated pairs
Prepare a prompt to improve mongolian to english translation pair
Publish improved translations
Train on improved translation dataset on the Colab and Tensorflow 2`

Notes

700 million word Mongolian news data set https://github.com/tugstugi/mongolian-bert

Python library for google translation, textblob https://textblob.readthedocs.io/en/dev/

Steps

sudo apt install parallel tor build-essential rar virtualenv bsdmainutils

virtualenv -p python3 env && source env/bin/activate && pip install -r requirements.txt

./install_translate.sh

python3 datasets/dl_and_preprop_mn_news.py

./generate_sentences.sh && ./criterion_sentences.sh && ./prepare_translation.sh && ./split_sentences.sh

./runtask.sh

References and links

https://lmstudio.ai/

https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF

https://medium.com/@LakshmiNarayana_U/lm-studios-python-play-crafting-code-creatively-86c21d18d34a

https://opensource.com/article/18/5/gnu-parallel

https://askubuntu.com/questions/147241/execute-sudo-without-password

Released datasets

[2019/09/10] 5K unvalidated sentences of pairs for english to mongolian.

https://gist.github.com/sharavsambuu/be9001ddcb954565606466a3556bbf27

[2019/09/13] 94K unvalidated sentences of pairs for english to mongolian.

https://drive.google.com/file/d/1GNo1XJxRFxjey5VDsHjLvj9upXJOqd3e/view?usp=sharing

[2019/10/10] 1 million mongolian to english sentence pairs.

https://drive.google.com/file/d/14AtTVgibirSdHYTBFM9G1XPS7DvM5SdE/view?usp=sharing

Neural machine translation colab experiment, based on official tensorflow transformer tutorial

Pretrained version for fast inference

Train on Colab

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
datasets		datasets
lmstudio_exps		lmstudio_exps
.gitignore		.gitignore
1m_split_sentences.sh		1m_split_sentences.sh
README.md		README.md
criterion_sentences.sh		criterion_sentences.sh
detect_bad_translations.py		detect_bad_translations.py
generate_sentences.sh		generate_sentences.sh
install_translate.sh		install_translate.sh
prepare_translation.sh		prepare_translation.sh
requirements.txt		requirements.txt
runtask.sh		runtask.sh
sample_million_sentences.sh		sample_million_sentences.sh
sentences.py		sentences.py
split_sentences.sh		split_sentences.sh
translate_sentences.sh		translate_sentences.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tasks

Notes

Steps

References and links

Released datasets

Neural machine translation colab experiment, based on official tensorflow transformer tutorial

About

Releases

Packages

Languages

sharavsambuu/english-mongolian-nmt-dataset-augmentation

Folders and files

Latest commit

History

Repository files navigation

Tasks

Notes

Steps

References and links

Released datasets

Neural machine translation colab experiment, based on official tensorflow transformer tutorial

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages