This repository uses PyTorch to remove ruled lines from an image while reconstructing the characters that overlap those lines. The goal of the model is to make word recognition by OCR easier.
🐍 Python > 3.9 (not verified, needs testing)
python -m pip install -r requirements.txt
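Before installing, the interpreter version can be checked with a small helper. This is a minimal sketch: the function name python_is_supported is hypothetical, and it reads the requirement loosely as "3.9 or newer", which is an assumption since the requirement itself is marked as untested.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True when (major, minor) is at least `minimum`.

    Assumption: "python >3.9" is read as "3.9 or newer".
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_supported():
    print("Warning: this Python version may be too old for the requirements.")
```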
First, go to /data/ and run python downloadData.py
Stay in the /data/ directory and run python MakeDataset.py --output [output directory, default: ./] --pages [number of pages to generate, default: 1000] --split [whether to split the pages directly, default: False]
Run python processBlock.py --dir [directory that contains the pages, which is also where the output is generated, default: ./]
Run python train.py --epoch [number of epochs to train, default: 50] --dataset [dataset path] --load [optional: whether to load the best saved model]
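The four steps above can be chained from a single script. The following is a rough sketch, not part of the repository: build_pipeline and run_pipeline are hypothetical helpers, and several points are assumptions, namely that processBlock.py and train.py are launched from the same directory as the data scripts, and that the generated pages directory is what train.py expects as --dataset. The --split and --load flags are left out because their exact form is not documented here.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline(data_dir, pages=1000, epochs=50, work_dir="./"):
    """Return (working directory, argv) pairs for the four steps."""
    data_dir = Path(data_dir)
    py = sys.executable  # reuse the interpreter that runs this script
    return [
        (data_dir, [py, "downloadData.py"]),
        (data_dir, [py, "MakeDataset.py", "--output", work_dir,
                    "--pages", str(pages)]),
        # Assumption: the next two scripts are also run from data_dir,
        # and train.py reads its dataset from the same work_dir.
        (data_dir, [py, "processBlock.py", "--dir", work_dir]),
        (data_dir, [py, "train.py", "--epoch", str(epochs),
                    "--dataset", work_dir]),
    ]


def run_pipeline(steps):
    """Run each step in order and stop at the first failing command."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)


# Example (not executed here):
# run_pipeline(build_pipeline("/data/", pages=1000, epochs=50))
```

Separating command construction from execution makes it possible to print and inspect the commands before anything is run.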
You can use the functions in infer.py, such as processImg, processImgs, and splitAndProcessImg.
See the descriptions written directly on these functions for documentation.
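Since the documentation lives on the functions themselves, it can be listed without opening the source file. This is a minimal sketch that assumes the descriptions are ordinary Python docstrings; describe_functions is a hypothetical helper, and the demo module below is a stand-in so the example runs without the repository.

```python
import inspect
import types


def describe_functions(module, names):
    """Map each name to its cleaned docstring, or None if it is missing."""
    docs = {}
    for name in names:
        func = getattr(module, name, None)
        docs[name] = None if func is None else (inspect.getdoc(func) or "")
    return docs


def _example():
    """Hypothetical docstring used only for this demo."""


# Stand-in module; it is not the real infer.py.
demo = types.ModuleType("demo")
demo.example = _example
print(describe_functions(demo, ["example", "missing"]))

# With the repository on the import path (not executed here):
# import infer
# names = ["processImg", "processImgs", "splitAndProcessImg"]
# for name, doc in describe_functions(infer, names).items():
#     print(name, doc, sep="\n", end="\n\n")
```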
Word recognition model used for evaluation: MLTU Tutorials