moazshorbagy/OCR-Arabic-Scripts

OCR-Arabic-Scripts

The aim of this project is to design and train a model that is able to read images of scanned Arabic documents and generate the text written in those images.

Objective

This project implements a complete machine learning pipeline. It includes, but is not limited to, the following modules:

  • preprocessing module
  • feature extraction/selection module
  • model selection and training module
  • performance analysis module
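
As a rough illustration of how the four modules listed above could fit together, here is a minimal sketch. It is not the repository's code: the function names (`binarize`, `extract_features`, `accuracy`), the grid-density features, and the nearest-centroid classifier are all assumptions chosen to keep the example small, and it uses only numpy.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Preprocessing (assumed approach): map grayscale to 0/1, with 1 = ink."""
    return (np.asarray(gray) < threshold).astype(np.uint8)


def extract_features(binary, grid=2):
    """Feature extraction (assumed approach): ink density per grid cell."""
    rows = np.array_split(binary, grid, axis=0)
    cells = [cell for row in rows for cell in np.array_split(row, grid, axis=1)]
    return np.array([cell.mean() for cell in cells])


class NearestCentroid:
    """Model (stand-in): predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.labels_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == label].mean(axis=0) for label in self.labels_]
        )
        return self

    def predict(self, X):
        X = np.asarray(X)
        # Distance from every sample to every class centroid.
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[dists.argmin(axis=1)]


def accuracy(y_true, y_pred):
    """Performance analysis: fraction of labels predicted correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

In use, each character image would pass through `binarize` and `extract_features` before being given to `fit` or `predict`. A real Arabic OCR system would also need line, word, and character segmentation, which this sketch leaves out.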

Dataset

A dataset of images and their ground-truth text was obtained from the Watan-2004 Arabic text corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora).

N.B.: This corpus is for scientific use only. Any use of it to create and release other resources or software requires the authorization of Mourad Abbas.

Run

Dependencies

  • python3
  • numpy
  • opencv
  • skimage
  • scipy
  • matplotlib
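
Before running the project, it may help to confirm that the dependencies above are importable. The following is a small sketch, not part of the repository. It assumes the usual import names, where opencv is imported as `cv2` and skimage is typically installed from the `scikit-image` package (opencv commonly from `opencv-python`). The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED_MODULES = ["numpy", "cv2", "skimage", "scipy", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```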

Run the Project

$ python3 ./ocr.py
