mongolian-nlp/image2bichig at master · tugstugi/mongolian-nlp

Mongolian Script OCR

Synthetic Dataset

To generate a synthetic dataset from Mongolian song lyrics and a dictionary, first install all fonts from the fonts directory. Then run the following commands:

mkdir images
./generate_from_dictionary.py > synthetic.csv
./generate_from_lyrics.py >> synthetic.csv
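
The same steps can be driven from Python, which may be convenient on systems without a POSIX shell. The following is only a minimal sketch: the helper name generate_dataset is hypothetical and not part of this repository, and it assumes that each generator script writes its CSV rows to standard output, as the shell redirections above suggest.

```python
import subprocess
from pathlib import Path


def generate_dataset(commands, csv_path="synthetic.csv", image_dir="images"):
    """Run each generator command in order and collect its stdout in one CSV.

    Hypothetical helper (not part of the repository). It mirrors the shell
    steps above: create the image directory, overwrite the CSV once, then
    append the output of every command to it.
    """
    # Equivalent of `mkdir images`, but tolerant of an existing directory.
    Path(image_dir).mkdir(parents=True, exist_ok=True)

    # Equivalent of `>`: start from an empty file.
    Path(csv_path).write_bytes(b"")

    for cmd in commands:
        # Equivalent of `>>`: reopen in append mode for every command so the
        # rows of each generator land after those of the previous one.
        with open(csv_path, "ab") as out:
            subprocess.run(cmd, stdout=out, check=True)

    return csv_path


# Example call, assuming the scripts sit in the current directory and that
# sys has been imported:
# generate_dataset([
#     [sys.executable, "generate_from_dictionary.py"],
#     [sys.executable, "generate_from_lyrics.py"],
# ])
```

Passing check=True makes the run stop at the first generator that exits with an error, so a partially written synthetic.csv is not silently treated as complete.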

You can also download a pre-generated synthetic dataset from here.

Train

To be released.

Eval

Download a pre-trained model from here. To run OCR on an image, execute:

python ocr.py --checkpoint image2bichig-epoch-0157.pth test.jpg
ᠮᠢᠨᠦ ᠨᠤᠲᠠᠭ
ᠬᠡᠨᠲᠡᠢ ᠂ ᠬᠠᠩᠭᠠᠢ᠂ ᠱᠣᠶᠣᠨ ᠤ ᠥᠨᠳᠥᠷ ᠰᠠᠶ᠋ᠢᠬᠠᠨ ᠨᠢᠷᠤᠭᠤᠨ ᠤᠳᠨ
ᠬᠣᠶᠢᠲᠤ ᠵᠦᠭ ᠦᠨ ᠴᠢᠮᠡᠭ ᠪᠣᠯᠤᠭᠰᠠᠨ ᠣᠢ ᠬᠥᠪᠴᠢ ᠶᠢᠨ ᠠᠭᠤᠯᠠᠨ ᠤᠳ
ᠮᠠᠨᠠᠨ  ᠮᠠᠷᠭ᠎ᠠ᠂ ᠨᠣᠮᠢᠨ ᠤ ᠥᠷᠭᠡᠨ ᠶᠡᠬᠡ ᠭᠣᠪᠢ ᠤᠳᠨ
ᠡᠮᠦᠨ᠎ᠡ ᠵᠦᠭ ᠦᠨ ᠮᠠᠩᠯᠠᠢ ᠪᠣᠯᠤᠭᠰᠠᠨ ᠡᠯᠡᠯᠡᠳ ᠮᠠᠩᠬᠠᠨ ᠳᠠᠯᠠᠢ ᠤᠳ
 ᠡᠨᠡ ᠪᠣᠯ ᠮᠢᠨᠦ ᠲᠦᠷᠦᠭᠰᠡᠨ ᠨᠤᠲᠤᠭ ᠮᠣᠩᠭᠣᠯ ᠤᠨ ᠰᠠᠶ᠋ᠢᠬᠠᠨ ᠣᠷᠣᠨ
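
To run OCR on several images, the command above can be wrapped in a small Python helper. This is a sketch under stated assumptions, not part of the repository: the function names build_ocr_command and ocr_images are hypothetical, and it assumes that ocr.py prints the recognized text to standard output, as in the example output above. Only the --checkpoint option and the positional image argument shown in the command above are used.

```python
import subprocess
import sys


def build_ocr_command(checkpoint, image, script="ocr.py"):
    """Build the argument list for one ocr.py invocation (hypothetical helper)."""
    return [sys.executable, script, "--checkpoint", checkpoint, image]


def ocr_images(checkpoint, images, script="ocr.py"):
    """Run ocr.py once per image and return a {image path: recognized text} dict.

    Assumes ocr.py writes the recognized text to stdout as UTF-8.
    """
    results = {}
    for image in images:
        completed = subprocess.run(
            build_ocr_command(checkpoint, image, script),
            capture_output=True,
            text=True,
            encoding="utf-8",  # the recognized text is Mongolian script
            check=True,        # raise if ocr.py exits with an error
        )
        results[image] = completed.stdout.strip()
    return results


# Example call with the checkpoint and image named above:
# texts = ocr_images("image2bichig-epoch-0157.pth", ["test.jpg"])
```

Starting a new process per image reloads the model each time, which is simple but slow for large batches.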

You can also try it online on Colab here.
