This project is an OCR (Optical Character Recognition) to LLM (Large Language Model) processing system. It extracts text from images or documents, post-processes that text with a pre-trained language model (Llama 3.1), and returns the result. The system is designed to be modular, scalable, and easy to deploy on an on-site server.

The repository is laid out as follows:
project_root/
│
├── src/
│   ├── ocr/
│   │   ├── __init__.py
│   │   └── ocr_service.py
│   │
│   ├── llm/
│   │   ├── __init__.py
│   │   └── llama_service.py
│   │
│   ├── pipeline/
│   │   ├── __init__.py
│   │   └── pipeline_service.py
│   │
│   ├── webapp/
│   │   ├── __init__.py
│   │   └── main.py
│   │
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── logger.py
│   │   └── config.py
│   │
│   └── __init__.py
│
├── data/
│   ├── input_images/
│   └── processed_text/
│
├── tests/
│   ├── __init__.py
│   ├── test_ocr_service.py
│   ├── test_llama_service.py
│   ├── test_pipeline_service.py
│   └── test_routes.py
│
├── docs/
│   ├── requirements.txt
│   ├── README.md
│   └── architecture_diagram.png
│
├── scripts/
│   ├── deploy.sh
│   └── start_dev.sh
│
├── .env
├── .gitignore
├── Dockerfile
└── docker-compose.yml
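To see how these modules fit together, here is a minimal sketch of the flow src/pipeline/pipeline_service.py is responsible for. The helper names extract_text and process_text are illustrative assumptions, not the project's actual API:

```python
from pathlib import Path


def run_pipeline(image_path: Path) -> Path:
    """OCR an image, clean the text up with the LLM, and persist the result (sketch)."""
    # Hypothetical helpers; the real ones live in src/ocr and src/llm.
    from src.ocr.ocr_service import extract_text
    from src.llm.llama_service import process_text

    raw_text = extract_text(str(image_path))  # data/input_images -> raw text
    cleaned = process_text(raw_text)          # LLM post-processing

    out_path = Path("data/processed_text") / f"{image_path.stem}.txt"
    out_path.write_text(cleaned, encoding="utf-8")
    return out_path
```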
The stack is built on the following technologies:

- Python 3.8+
- FastAPI
- Uvicorn
- Gunicorn
- EasyOCR (see the usage sketch after this list)
- PaddleOCR
- Transformers
- Celery
- Redis
- SQLAlchemy
- Psycopg2-binary
- Docker
- Docker Compose
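As referenced above, here is roughly how src/ocr/ocr_service.py might wrap EasyOCR. This is a sketch: the reader configuration and the extract_text name are assumptions, not the project's actual implementation.

```python
import easyocr

# Reusing one Reader instance avoids reloading the detection/recognition
# models on every call; gpu=False keeps the sketch runnable on CPU-only hosts.
_reader = easyocr.Reader(["en"], gpu=False)


def extract_text(image_path: str) -> str:
    """Return all text found in the image as one newline-joined string."""
    # detail=0 makes readtext() return just the recognized strings.
    results = _reader.readtext(image_path, detail=0)
    return "\n".join(results)
```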
Before you install, make sure you have:

- Docker installed and running
- PostgreSQL database
- Redis for task queuing
- On-site server with sufficient resources for LLM processing (see the model-loading sketch below)
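To give a sense of what "sufficient resources" means, src/llm/llama_service.py could load Llama 3.1 through Hugging Face Transformers along these lines. The model ID, dtype, and prompt are assumptions; the Llama 3.1 weights are gated on the Hugging Face Hub and require access approval, and an 8B model in 16-bit precision needs roughly 16 GB of GPU memory.

```python
import torch
from transformers import pipeline

# Assumed model ID; access to the Llama 3.1 repos must be granted on the Hub.
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

_generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32
    device_map="auto",           # spread across available devices (needs accelerate)
)


def process_text(raw_text: str) -> str:
    """Ask the model to clean up raw OCR output (prompt is illustrative)."""
    prompt = f"Correct the OCR errors in the following text:\n\n{raw_text}\n\nCorrected text:"
    outputs = _generator(prompt, max_new_tokens=512, return_full_text=False)
    return outputs[0]["generated_text"]
```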
To install, clone the repository, create a virtual environment, and install the dependencies:

git clone <repository-url>
cd ocr-to-llm
python3 -m venv venv
source venv/bin/activate
pip install -r docs/requirements.txt
Ensure PostgreSQL is installed and running. Create a database and user:
CREATE DATABASE ocr_llm_db;
CREATE USER ocr_user WITH ENCRYPTED PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE ocr_llm_db TO ocr_user;
Create a .env file in the project root with the following contents:
DATABASE_URL=postgresql://ocr_user:your_password@localhost/ocr_llm_db
REDIS_URL=redis://localhost:6379/0
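src/utils/config.py can then read these values from the environment. A minimal sketch, assuming the variables are exported before startup (docker-compose can inject them; for bare-metal runs a loader such as python-dotenv, which is not in the dependency list above, would parse the .env file):

```python
import os

from sqlalchemy import create_engine

# Defaults mirror the .env example above.
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "postgresql://ocr_user:your_password@localhost/ocr_llm_db"
)
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")

# SQLAlchemy resolves the postgresql:// scheme via psycopg2-binary.
engine = create_engine(DATABASE_URL)
```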
Start the development server with Uvicorn:

uvicorn src.webapp.main:app --reload
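For orientation, src/webapp/main.py exposes the FastAPI app that the command above points at. A sketch of what an upload route could look like; the route path, field name, and task wiring are assumptions, not a confirmed API:

```python
from fastapi import FastAPI, UploadFile

# Hypothetical Celery task; see the pipeline_service sketch after the worker command below.
from src.pipeline.pipeline_service import run_pipeline

app = FastAPI(title="OCR-to-LLM")


@app.post("/upload")
async def upload(file: UploadFile):
    """Save the upload and queue it for background OCR + LLM processing."""
    dest = f"data/input_images/{file.filename}"
    with open(dest, "wb") as out:
        out.write(await file.read())

    task = run_pipeline.delay(dest)  # enqueue on Redis via Celery
    return {"task_id": task.id, "filename": file.filename}
```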
Start the Celery worker to handle background tasks:
celery -A src.pipeline.pipeline_service worker --loglevel=info
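The -A src.pipeline.pipeline_service argument expects a Celery application object in that module. A sketch of what the module might declare, mirroring the run_pipeline flow shown earlier (names and wiring are assumptions):

```python
import os

from celery import Celery

# Redis (REDIS_URL from the .env file) serves as both broker and result backend.
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
celery_app = Celery("pipeline", broker=REDIS_URL, backend=REDIS_URL)


@celery_app.task
def run_pipeline(image_path: str) -> str:
    """Background version of the OCR -> LLM flow sketched earlier."""
    # Hypothetical helpers from the earlier sketches.
    from src.ocr.ocr_service import extract_text
    from src.llm.llama_service import process_text

    text = process_text(extract_text(image_path))
    out_path = os.path.join(
        "data/processed_text", os.path.basename(image_path) + ".txt"
    )
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return out_path
```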
To build and run the application in Docker containers, use:
docker-compose up --build
- Access the application at http://localhost:8000.
- Upload images to extract text, which will then be processed by the LLM.
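Programmatically, an upload could look like this with the requests library (not in the dependency list above); the /upload route and response fields follow the hypothetical sketch earlier, not a confirmed API:

```python
import requests

with open("data/input_images/sample_invoice.png", "rb") as fh:
    resp = requests.post(
        "http://localhost:8000/upload",
        files={"file": ("sample_invoice.png", fh, "image/png")},
    )

resp.raise_for_status()
print(resp.json())  # e.g. {"task_id": "...", "filename": "sample_invoice.png"}
```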
Run unit tests with:
pytest
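tests/test_routes.py can exercise the app in-process with FastAPI's TestClient. A sketch against the hypothetical /upload route from earlier:

```python
import io

from fastapi.testclient import TestClient

from src.webapp.main import app

client = TestClient(app)


def test_upload_accepts_png():
    # Minimal PNG signature, not a real image; enough to exercise the route.
    fake_image = io.BytesIO(b"\x89PNG\r\n\x1a\n")
    response = client.post(
        "/upload", files={"file": ("tiny.png", fake_image, "image/png")}
    )
    assert response.status_code == 200
    assert "task_id" in response.json()
```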
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
This project is licensed under the MIT License.