Skip to content

Latest commit

 

History

History
53 lines (35 loc) · 3.98 KB

README.MD

File metadata and controls

53 lines (35 loc) · 3.98 KB

JPG to PDF converter

A framework made for converting an image file to PDF file.
Usage of just 4 packages of Python.
For latest code on how to convert multiple image files to pdf, check multiple-files branch.

Working

  1. First, this will seacrh for the text inside the image file(most probably .jpg/.PNG file).
  2. Then, it'll fetch the text & store it in a variable.
  3. And then, writes it into a docx file(So that if it extracts wrong text, you can manually type the correct text).
  4. And finally, convert the word file hence created into a pdf file.

Requirements

  1. You need to download tesseract.
  2. You need following packages of python:
    1. PIL
    2. pytesseract
    3. docx
    4. docx2pdf
  3. IDLE (PyCharm Community Edition preferrable)

How to run

  1. Go to main file.
  2. Just check the following lines in the code:
text = extractTextFromImg(imgPath) # - extract text from image file
appendDataToWord(wordPath, text) # - append data into word file
convertDocxToPdf(wordPath, pdfPath) # - convert it into pdf
  1. You don't need to change anything in this file. The changes which you need to make is in variables file.
  2. Just change the value of following 4 variables:
    1. tesseractPath # the tesseract path you just downloaded. The one written in the file is default path
    2. wordPath # the word file path. Put the word file path under wordFiles directory for better readability
    3. pdfPath # the pdf file path. Put the pdf file path under pdfFiles directory for better readability

Author

Nitin Kumar