This application takes an image received in a POST request, normalizes it (tilt correction, contrast adjustment, grayscale conversion), and then applies Optical Character Recognition (OCR) to extract all of its text. Its only dependencies are Tesseract, the Tess4J library, and Log4J. Planned next step: sentiment analysis of the extracted text.
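As a rough illustration of two of the normalization steps, the sketch below converts a tiny RGB image to grayscale and then stretches its contrast to the full 0-255 range. It is plain Python written for readability and is not the service's actual implementation, which runs on Scala/Play with Tess4J. The BT.601 luma weights and the min-max stretch are assumed choices, the function names are hypothetical, and tilt correction is omitted.

```python
def to_grayscale(rgb_rows):
    """Map rows of (r, g, b) tuples to rows of 0-255 luma values (BT.601 weights, assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def stretch_contrast(gray_rows):
    """Linearly rescale so the darkest pixel becomes 0 and the brightest becomes 255."""
    flat = [p for row in gray_rows for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [list(row) for row in gray_rows]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray_rows]


# A 2x2 image whose gray values only span 60-200.
image = [
    [(200, 200, 200), (60, 60, 60)],
    [(180, 180, 180), (100, 100, 100)],
]
gray = to_grayscale(image)
print(gray)                    # [[200, 60], [180, 100]]
print(stretch_contrast(gray))  # [[255, 0], [219, 73]]
```

Stretching before OCR spreads faint text and background further apart, which is the usual motivation for a contrast step; the real service may use a different method.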
The endpoint for uploading an image is "/upload".
You can also specify optional drawing parameters in the body of the request; they are described with the second example below.
This application supports the Getting Started with Scala/Play on Heroku article; check it out.
The example below shows how to use Python to post an image and receive the extracted content.
The resulting JSON contains "text", the extracted text, and "image", the preprocessed image with bounding boxes drawn on it.
>>> import requests
>>> files = {'picture': open('bcrfile.jpg', 'rb')}
>>> url = 'https://licenta-ocr-parser.herokuapp.com/upload'
>>> r = requests.post(url, files=files)
>>> r
<Response [200]>
>>> r.json()['text']
'SOCIETATE ADMINISTRATA SISTEM DUALIST \n Regina Elisabeta Sector Bucuresti, cod 030016 \nInmatriculata Registrul Comertului: J40/90/1991 \nInmatriculata Registrul Bancar RB-PJR-40-008/18.02.1999 \nCod Unic lnregistrare: 361757 \nlnregistrata Registrul evidenta prelucrarilor de date ca racter personal sub 3776 377 \nCapital Social: 62534161450 lei \nSWIFT: RNCB Site: www.bcr.ro; Email: [email protected] \nWOBCRT 0800.801 .BCR (0800801227), apelabil gratuit din orice retea nationala; \n+4021 407 apelabil din strainatate tarif normal. \n efectuarea operatiuni carduri bancare 1667019 \n'
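The same upload can also be done without the third-party requests package. The sketch below uses only the standard library. It assumes, as the requests example does, that the server reads the image from a multipart/form-data field named 'picture'; the helper names build_multipart and upload_image are hypothetical.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_image(url, path, fields=None):
    """POST the image at `path` to `url` (field 'picture', assumed) and return the decoded JSON."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            fields or {}, "picture", os.path.basename(path), fh.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for error status codes.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, upload_image('https://licenta-ocr-parser.herokuapp.com/upload', 'bcrfile.jpg')['text'] should return the same kind of string as r.json()['text'] above, and unlike the REPL session it closes the image file once the body is built.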
The example below shows how to set the drawing parameters for the resulting image.
The three available parameters are 'drawrow', 'drawchar', and 'drawword'. Any parameter that is omitted defaults to 'false'.
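Because omitted parameters default to 'false', a client only needs to send the ones it enables. A small helper for building the request body could look like this; the name draw_params is hypothetical, and only the parameter names and the 'true' value come from this README.

```python
def draw_params(drawrow=False, drawchar=False, drawword=False):
    """Build the form fields for an /upload request.

    Only enabled flags are sent, as the string 'true'; the server
    treats any missing flag as 'false'.
    """
    flags = {"drawrow": drawrow, "drawchar": drawchar, "drawword": drawword}
    return {name: "true" for name, enabled in flags.items() if enabled}
```

For instance, data=draw_params(drawrow=True) produces the same form body as the params dictionary in the example below.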
>>> import requests
>>> files = {'picture': open('bcrfile.jpg', 'rb')}
>>> params = {'drawrow': 'true'}
>>> response = requests.post('http://localhost:9000/upload', files=files, data=params)
>>> response
<Response [200]>
>>> response.json()['text']
'SOCIETATE ADMINISTRATA IN SISTEM DUALIST \r\nBd. Regina Elisabeta nr. 51 Sector 31 Bucuresti, cod 030016 \r\nInmatriculata Ia Registrul Comertului: J40/90/1991 \r\nInmatriculata la Registrul Bancar Nr. RB-PJR-40-008/18.02.1999 \r\nCod Unic de lnregistrare: RO 361757 \r\nlnregistrata la Registrul de evidenta a prelucrarilor de date CU ca racter personal sub nr. 3776 si 377 \r\nCapital Social: 4 62534161450 lei \r\nSWIFT: RNCB RO BU; Site: www.bcr.ro; Email: [email protected] \r\nWOBCRT 0800.801 .BCR (0800801227), apelabil gratuit din orice retea nationala; \r\n+4021 407 42 00 apelabil din strainatate Ia tarif normal. \r\ni efectuarea de operatiuni CU carduri bancare nr. l 1667019 \r\n'
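The "text" field is a single string, and the two runs above use different line endings ('\n' in the first, '\r\n' in the second). A hypothetical helper that splits either form into clean lines, relying on str.splitlines(), which recognizes both:

```python
def extracted_lines(text):
    """Split the OCR "text" field into lines.

    Handles LF and CRLF endings, strips surrounding whitespace,
    and drops lines that are empty after stripping.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "SOCIETATE ADMINISTRATA IN SISTEM DUALIST \r\nSWIFT: RNCB RO BU \r\n"
print(extracted_lines(sample))
# ['SOCIETATE ADMINISTRATA IN SISTEM DUALIST', 'SWIFT: RNCB RO BU']
```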
To deploy the application to Heroku, run:
$ heroku create
$ git push heroku master
$ heroku open
The application is configured through the values set in the models.config.AppConfig class; the settings are largely self-explanatory.
For more information about using Play and Scala on Heroku, see these Dev Center articles: