Extract data from invoices.

Target Data Extraction:

Date

Prerequisites

Before you begin, ensure you have installed Anaconda. If not, download it from Anaconda.

Installation

Cloning the Repository

To get started with this project, first clone the repository on your local machine: You have two options for cloning the repository: using the command line or GitHub Desktop.

Option 1: Using the Command Line

git clone https://github.com/Emidouni/Invoice-Information-Extraction

Option 2: Using GitHub Desktop

For a more graphical interface, you can use GitHub Desktop:

1.Download and install GitHub Desktop from desktop.github.com.
2.Open GitHub Desktop and sign in to your GitHub account.
3.Click on File > Clone Repository.
In the "URL" tab, enter the URL of the repository https://github.com/Emidouni/Invoice-Information-Extraction and choose the local path where you want to clone the repository.
4.Click Clone to start the cloning process.

Setting Up a Python Environment with Conda

After installing Anaconda , you can create a new Conda environment specifically for this project. This helps to manage dependencies and avoid conflicts with other projects.

Open the Anaconda Prompt or your terminal (make sure Conda is added to your PATH) and run

conda create --name myenv python=3.9.20

Replace myenv with your preferred name for the environment. This command creates a new Conda environment named myenv with Python version 3.8.18 Activate the environment with:

conda activate myenv

After activating the environment, you can proceed with installing other required packages as mentioned in the project's

pip install -r requirements.txt

In addition to the libraries listed in requirements.txt, to test different OCR tools and to run the OCR.ipynb Jupyter notebook, you need to install OCR.

Installation Instructions for Tesseract OCR

Tesseract OCR is an open-source Optical Character Recognition (OCR) engine used for text recognition in images.

Tesseract OCR Installation and Configuration

Download Tesseract OCR:
- Windows Users: Download the installer from UB Mannheim. Follow the installation instructions provided on the website.
Locate the Tesseract Installation Path:
- After installation, locate where Tesseract OCR has been installed on your machine. The default installation path on Windows is usually C:\Program Files\Tesseract-OCR\tesseract.exe.
Update the Script with Your Tesseract Path:
- In your project's script utils.py, where Tesseract OCR is utilized, locate the line that sets the tesseract_cmd property of pytesseract. Replace the existing path with the actual path to your Tesseract installation. For example, change the line from:
```
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```
  To:
```
pytesseract.pytesseract.tesseract_cmd = r"C:\Path\To\Your\tesseract.exe"
```

Ensure to replace C:\Path\To\Your\tesseract.exe with the correct path to where Tesseract OCR is installed on your system.

Usage

streamlit run app2.py

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
MODEL_stream		MODEL_stream
.gitattributes		.gitattributes
CNN_model_training.ipynb		CNN_model_training.ipynb
OCR.ipynb		OCR.ipynb
Project_report.pdf		Project_report.pdf
README.md		README.md
Tiny_model_training.ipynb		Tiny_model_training.ipynb
VIT_model_training.ipynb		VIT_model_training.ipynb
app2.py		app2.py
requirements.txt		requirements.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Extract data from invoices.

Target Data Extraction:

Prerequisites

Installation

Cloning the Repository

Option 1: Using the Command Line

Option 2: Using GitHub Desktop

Setting Up a Python Environment with Conda

Installation Instructions for Tesseract OCR

Tesseract OCR Installation and Configuration

Usage

About

Releases

Packages

Contributors 3

Languages

Emidouni/Invoice-Information-Extraction

Folders and files

Latest commit

History

Repository files navigation

Extract data from invoices.

Target Data Extraction:

Prerequisites

Installation

Cloning the Repository

Option 1: Using the Command Line

Option 2: Using GitHub Desktop

Setting Up a Python Environment with Conda

Installation Instructions for Tesseract OCR

Tesseract OCR Installation and Configuration

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages