bash bash.letpdf.com

included

templates
invoice
invoice_date
jpg2text
ocr
ocr_file
ocr_folder
pdf2jpg
pdf2txt

start

Clone the project letpdf/bash (bash.letpdf.com):

git clone https://github.com/letpdf/bash.git

Prepare the Python environment (the path below is the author's local checkout; use the directory of your own clone):

cd /media/tom/projects/letpdf/bash
python3 -m pip install -r requirements.txt

Import the templates for invoices:

git clone https://github.com/letpdf/templates.git

Invoice2textdata

PDFs are difficult to scrape. Converting them to text files first makes their data much easier to extract. This project focuses on the widely used pdfminer package for Python.

It converts PDF files into records like these:

    { File Name:  2112.pdf Invoice Number:  INV002112 Invoice Date:  13-Jun-2016 Due Amount: Rs  1,661.09 }
    { File Name:  2137.pdf Invoice Number:  INV002137 Invoice Date:  22-Jun-2016 Due Amount: Rs  45.76 }
    { File Name:  2138.pdf Invoice Number:  INV002138 Invoice Date:  22-Jun-2016 Due Amount: Rs  45.76 }
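
The record layout above is regular enough to be read back by a program. The following is a minimal sketch, not part of the repository, that parses one such line into typed values. The names `RECORD_PATTERN` and `parse_record` are hypothetical, and the pattern assumes every record follows exactly the layout of the three samples (an `Rs` currency prefix and `DD-Mon-YYYY` dates).

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical pattern derived only from the sample records shown above.
RECORD_PATTERN = re.compile(
    r"\{\s*File Name:\s*(?P<file_name>\S+)"
    r"\s+Invoice Number:\s*(?P<invoice_number>\S+)"
    r"\s+Invoice Date:\s*(?P<invoice_date>\S+)"
    r"\s+Due Amount:\s*Rs\s*(?P<due_amount>[\d,]+(?:\.\d+)?)\s*\}"
)


def parse_record(line):
    """Return a dict for one record line, or None if the line does not match."""
    match = RECORD_PATTERN.search(line)
    if match is None:
        return None
    return {
        "file_name": match["file_name"],
        "invoice_number": match["invoice_number"],
        # Dates in the samples look like 13-Jun-2016.
        "invoice_date": datetime.strptime(match["invoice_date"], "%d-%b-%Y").date(),
        # Strip thousands separators before converting to a decimal amount.
        "due_amount": Decimal(match["due_amount"].replace(",", "")),
    }
```

Using `Decimal` rather than `float` keeps currency amounts exact, which matters if the parsed values are summed later.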

Prerequisite

Update the config.json file and set the paths as shown below:

    "src" : "Path where the zip file will be extracted.",
    "des" : "Path to the text file into which the complete data of the PDFs is extracted.",
    "zip" : "Complete path to where the zip file is."

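A script that reads this file can check the three keys before doing any work. The sketch below is an illustration under assumptions, not the repository's code: it assumes the three entries sit inside a single JSON object, and the names `REQUIRED_KEYS` and `load_config` are hypothetical.

```python
import json
from pathlib import Path

# The three keys described in the config.json fragment above.
REQUIRED_KEYS = ("src", "des", "zip")


def load_config(config_path):
    """Load config.json and fail early if a key or the zip archive is missing."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing keys: " + ", ".join(missing))
    if not Path(config["zip"]).is_file():
        raise FileNotFoundError("zip archive not found: " + config["zip"])
    return config
```

Failing early on a bad path gives a clearer error than letting the extraction step fail partway through a batch.
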
Execution

To extract the data from a set of PDF files, run invoice2textdata.py with the config file as its argument:

    python invoice2textdata.py config.json
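
How such a run might fit together can be sketched as follows. This is an assumption-laden outline, not the actual script: `pdf_to_text`, `run`, and `main` are hypothetical names, the output layout is simplified to a file name line followed by the extracted text, and the PDF conversion assumes the pdfminer.six distribution, whose `pdfminer.high_level.extract_text` function returns the text of a PDF.

```python
import json
import sys
import zipfile
from pathlib import Path


def pdf_to_text(pdf_path):
    """Extract text from one PDF. Assumes pdfminer.six is installed."""
    from pdfminer.high_level import extract_text  # imported lazily
    return extract_text(str(pdf_path))


def run(config_path, to_text=pdf_to_text):
    """Unpack the zip into "src", convert each PDF, write the text to "des"."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    source_dir = Path(config["src"])
    source_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(config["zip"]) as archive:
        archive.extractall(source_dir)
    pdf_files = sorted(source_dir.rglob("*.pdf"))
    with open(config["des"], "w", encoding="utf-8") as output:
        for pdf_file in pdf_files:
            output.write(f"File Name: {pdf_file.name}\n")
            output.write(to_text(pdf_file))
            output.write("\n")
    return len(pdf_files)


def main(argv):
    """Command-line entry point; returns a process exit code."""
    if len(argv) != 2:
        print("usage: python invoice2textdata.py config.json", file=sys.stderr)
        return 2
    print(f"converted {run(argv[1])} PDF file(s)")
    return 0
```

A script would call `sys.exit(main(sys.argv))` at the bottom. Passing the converter as the `to_text` parameter lets the pipeline be exercised without real PDFs.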

Usage

    invoice2textdata.py : Converts each file to text and finds the File Name, Invoice Number,
                          Invoice Date, and Due Amount.

    functions.py        : Contains helper functions, including:
                            1. convert()
                            2. find_invoice_number()
                            3. find_date()
                            4. find_amount()

    config.json         : Holds the source, destination, and zip file paths.
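
The bodies of the helper functions are not shown here, so the following is only a guess at how the three `find_*` helpers could work on already extracted text. The regular expressions are assumptions based on the sample records (invoice numbers like `INV002112`, dates like `13-Jun-2016`, amounts prefixed with `Rs`), and the helper `_first` is hypothetical.

```python
import re

# Patterns inferred from the sample output; real invoices may need others.
INVOICE_NUMBER_RE = re.compile(r"\bINV\d+\b")
DATE_RE = re.compile(r"\b\d{1,2}-[A-Za-z]{3}-\d{4}\b")
AMOUNT_RE = re.compile(r"\bRs\.?\s*(\d[\d,]*(?:\.\d+)?)")


def _first(pattern, text, group=0):
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(group) if match else None


def find_invoice_number(text):
    """Find the first invoice number such as INV002112."""
    return _first(INVOICE_NUMBER_RE, text)


def find_date(text):
    """Find the first date such as 13-Jun-2016."""
    return _first(DATE_RE, text)


def find_amount(text):
    """Find the first amount that follows an Rs prefix, without the prefix."""
    return _first(AMOUNT_RE, text, group=1)
```

Each helper returns `None` when nothing matches, so a caller can tell a missing field apart from an empty one.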

