Document++ is a Django-based web application that lets users upload documents in .docx or .pdf format and run one of the following functions on them:
- Summarization: Summarize long documents into concise versions.
- Spelling Correction: Correct spelling errors in the uploaded documents.
This application uses various libraries to handle document processing, summarization, and spelling correction.
- Upload Support: Supports .docx and .pdf document uploads.
- Summarization: Uses the Latent Semantic Analysis (LSA) algorithm to provide concise summaries.
- Spelling Correction: Automatically detects and corrects spelling mistakes in documents.
- Download: Provides an option to download the document with the spelling corrections applied.
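The two processing features above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names `summarize_lsa` and `correct_spelling` are hypothetical, the summarizer uses Sumy's `LsaSummarizer`, and the spelling step assumes TextBlob is the spell-checking library in use.

```python
from typing import List


def summarize_lsa(text: str, sentence_count: int = 3, language: str = "english") -> List[str]:
    """Return up to `sentence_count` sentences chosen by Sumy's LSA summarizer."""
    # Imported lazily so this module still loads where Sumy is not installed.
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lsa import LsaSummarizer

    # The tokenizer relies on the NLTK Punkt models being downloaded.
    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = LsaSummarizer()
    return [str(sentence) for sentence in summarizer(parser.document, sentence_count)]


def correct_spelling(text: str) -> str:
    """Return `text` with TextBlob's suggested spelling corrections applied."""
    # Assumption: TextBlob is the spell checker; swap in another library if not.
    from textblob import TextBlob

    return str(TextBlob(text).correct())


if __name__ == "__main__":
    sample = (
        "Django is a web framework written in Python. "
        "It encourages rapid development and clean design. "
        "Many applications use it to handle file uploads. "
        "Uploaded documents can then be processed on the server."
    )
    for line in summarize_lsa(sample, sentence_count=2):
        print(line)
```

Both helpers work on plain text, so the text first has to be extracted from the uploaded .docx or .pdf file.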
- Django: The web framework that powers the application.
- To install Django, run:
pip install django
- Fitz (PyMuPDF): Used for reading and processing PDF files.
- Install it using:
pip install pymupdf
- python-docx: For reading and writing .docx files.
- Install it using:
pip install python-docx
- Sumy: Provides several summarization algorithms, including Latent Semantic Analysis (LSA).
- Install Sumy:
pip install sumy
- nltk.punkt: Used for sentence tokenization during summarization.
- Install NLTK:
pip install nltk
- Then download the Punkt tokenizer models from a Python shell:
import nltk
nltk.download('punkt')
- TextBlob (or another spell-checking library): Used for spelling correction. If TextBlob is used, install it with:
pip install textblob
- tempfile: Used for creating temporary files during processing (Python standard library).
- io: Provides I/O operations for handling file streams (Python standard library).
- time: Used for timing operations (Python standard library).
- NLTK: Provides the punkt tokenizer required for summarization.
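As a rough illustration of how these libraries could fit together, the sketch below reads text from an uploaded file held in memory, writes a result to a temporary file, and times a call. The helper names (`file_kind`, `extract_text`, `save_to_temp`, `timed`) are hypothetical and not taken from the application; the PyMuPDF and python-docx calls are the standard ones for opening a document from bytes.

```python
import io
import tempfile
import time
from pathlib import Path


def file_kind(filename: str) -> str:
    """Return 'pdf' or 'docx' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in {".pdf", ".docx"}:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return suffix.lstrip(".")


def extract_text(filename: str, data: bytes) -> str:
    """Extract plain text from the raw bytes of a .pdf or .docx upload."""
    if file_kind(filename) == "pdf":
        import fitz  # PyMuPDF

        with fitz.open(stream=data, filetype="pdf") as pdf:
            return "\n".join(page.get_text() for page in pdf)

    from docx import Document  # python-docx

    document = Document(io.BytesIO(data))
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


def save_to_temp(text: str, suffix: str = ".txt") -> Path:
    """Write text to a temporary file and return its path (caller deletes it)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
    return Path(handle.name)


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Third-party imports are kept inside `extract_text` so that the helpers relying only on the standard library can be used and tested without PyMuPDF or python-docx installed.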
- Clone the Repository:
git clone <your-repository-url>
cd document-improver
- Install Required Dependencies: Install all required dependencies using pip:
pip install django pymupdf python-docx sumy nltk
- Run the Development Server: Start the Django development server:
python manage.py runserver
- Access the Application: Open your browser and navigate to:
http://127.0.0.1:8000
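If the server fails to start, a quick check that the installed packages are importable can help narrow things down. The script below is a small sketch, not part of the repository; the `REQUIRED` mapping and the `missing_packages` name are illustrative. Note that the import name differs from the pip name for PyMuPDF (`fitz`) and python-docx (`docx`).

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> module name used in `import` statements.
REQUIRED: Dict[str, str] = {
    "django": "django",
    "pymupdf": "fitz",
    "python-docx": "docx",
    "sumy": "sumy",
    "nltk": "nltk",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```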
- Upload a .docx or .pdf document.
- Select whether you want to summarize or correct spelling in the document.
- Click the respective button, and the app will process the document.
- Download the summarized or corrected document.
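The workflow above (pick an action, process the text, offer a download) could be wired up along these lines. This is only a sketch under assumptions: `run_action`, `output_name`, the action keys, and the `.txt` download format are all hypothetical, and in the real application a Django view would supply the uploaded file and the selected action.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# An action takes the document text and returns the processed text.
Action = Callable[[str], str]


def output_name(filename: str, action: str) -> str:
    """Build a download name such as 'report_correct.txt' (format is assumed)."""
    return f"{Path(filename).stem}_{action}.txt"


def run_action(
    filename: str, text: str, action: str, actions: Dict[str, Action]
) -> Tuple[str, str]:
    """Apply the selected action and return (download_name, processed_text)."""
    if action not in actions:
        raise ValueError(f"Unknown action: {action!r}")
    return output_name(filename, action), actions[action](text)


if __name__ == "__main__":
    # Stand-in actions for demonstration; real ones would call the
    # summarization and spelling-correction libraries.
    demo_actions: Dict[str, Action] = {
        "summarize": lambda t: t.split(". ")[0],
        "correct": lambda t: t.replace("teh", "the"),
    }
    print(run_action("report.docx", "I read teh report", "correct", demo_actions))
```

Keeping the dispatch separate from the web layer means each action can be tested without starting the development server.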
I plan to add the following features:
- Markup Support: Mark up documents with text, shapes, and drawings.
- Document Bot: An AI bot that answers questions about the uploaded document.
- PDF Merge: Merge two or more PDFs into one.