syed-ateeb-naveed/Document-Plus

Document++

Overview

Document++ is a Django-based web application that lets users upload .docx or .pdf documents and process them. It currently supports two functions:

  1. Summarization: Summarize long documents into concise versions.
  2. Spelling Correction: Correct spelling errors in the uploaded documents.

The application relies on PyMuPDF and python-docx for document handling, Sumy and NLTK for summarization, and a spell-checking library such as TextBlob for spelling correction.

Features

  • Upload Support: Supports .docx and .pdf document uploads.
  • Summarization: Uses the Latent Semantic Analysis (LSA) algorithm to provide concise summaries.
  • Spelling Correction: Automatically detects and corrects spelling mistakes in documents.
  • Download: Lets you download the document with the spelling corrections applied.

Dependencies

Core Dependencies

  • Django: The web framework that powers the application.
    • To install Django, run:
      pip install django

Document Handling

  • PyMuPDF (imported as fitz): Used for reading and processing PDF files.
    • Install it using:
      pip install pymupdf
  • python-docx: For reading and writing .docx files.
    • Install it using:
      pip install python-docx
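
As a rough illustration of how the two document libraries could be combined, the sketch below picks a reader based on the file extension. The helper names detect_format and extract_text are hypothetical and are not taken from the project's code; the sketch assumes PyMuPDF and python-docx are installed.

```python
from pathlib import Path


def detect_format(filename: str) -> str:
    """Return 'pdf' or 'docx' based on the file extension, else raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".docx":
        return "docx"
    raise ValueError(f"Unsupported file type: {filename!r}")


def extract_text(path: str) -> str:
    """Extract plain text from a .pdf or .docx file (hypothetical helper)."""
    kind = detect_format(path)
    if kind == "pdf":
        # PyMuPDF is imported lazily so detect_format works without it.
        import fitz

        with fitz.open(path) as pdf:
            # Iterating a PyMuPDF document yields its pages in order.
            return "\n".join(page.get_text() for page in pdf)

    import docx  # provided by the python-docx package

    document = docx.Document(path)
    return "\n".join(paragraph.text for paragraph in document.paragraphs)
```

Keeping the extension check separate from the extraction makes it possible to reject unsupported uploads before any library is loaded.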

Summarization

  • Sumy: Provides several summarization algorithms, including Latent Semantic Analysis (LSA).
    • Install Sumy:
      pip install sumy
  • NLTK (Punkt tokenizer): Used for sentence tokenization during summarization.
    • Install NLTK:
      pip install nltk
      Then download the Punkt tokenizer models (newer NLTK releases may require the 'punkt_tab' resource instead):
      import nltk
      nltk.download('punkt')
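
A minimal sketch of LSA summarization with Sumy follows, assuming Sumy and the NLTK tokenizer data are installed. The function name summarize_lsa and its default of three sentences are illustrative assumptions, not the project's actual code.

```python
def summarize_lsa(text: str, sentence_count: int = 3, language: str = "english") -> str:
    """Return an LSA summary of `text` with at most `sentence_count` sentences."""
    # Nothing to summarize: return early, before any library is loaded.
    if not text.strip():
        return ""

    # Sumy is imported lazily so the empty-input path has no dependencies.
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lsa import LsaSummarizer

    # The tokenizer relies on the NLTK Punkt data for sentence splitting.
    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = LsaSummarizer()
    sentences = summarizer(parser.document, sentence_count)
    return " ".join(str(sentence) for sentence in sentences)
```

Calling summarize_lsa(document_text, sentence_count=5) would return the five sentences that the LSA summarizer ranks highest.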

Spelling Correction

  • TextBlob: One option for spelling correction; the project may use a different spell-checking library. If TextBlob is used, install it with:
    pip install textblob
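
If TextBlob is the spell-checker in use, correction could look roughly like the sketch below. The helper names textblob_corrector and correct_paragraphs are hypothetical; the corrector is passed in as a parameter so another library can be substituted.

```python
from typing import Callable, Iterable, List


def textblob_corrector(text: str) -> str:
    """Correct spelling with TextBlob (assumes the textblob package is installed)."""
    from textblob import TextBlob  # imported lazily; only needed for this corrector

    return str(TextBlob(text).correct())


def correct_paragraphs(
    paragraphs: Iterable[str],
    corrector: Callable[[str], str] = textblob_corrector,
) -> List[str]:
    """Apply `corrector` to each non-blank paragraph, leaving blank ones untouched."""
    return [corrector(p) if p.strip() else p for p in paragraphs]
```

Working paragraph by paragraph keeps the document's layout intact, which matters when the corrected text is written back to a .docx file.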

Other Dependencies

The following modules are part of the Python standard library and need no installation:

  • tempfile: Used for creating temporary files during processing.
  • io: Provides in-memory streams for handling file data.
  • time: Used for timing operations.
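
The sketch below shows one plausible way the tempfile, io, and time modules could fit together when handling an upload. All three helper names are hypothetical and chosen only for illustration.

```python
import io
import tempfile
import time
from pathlib import Path


def save_to_temp(data: bytes, suffix: str) -> Path:
    """Write uploaded bytes to a named temporary file and return its path."""
    # delete=False keeps the file on disk after the handle closes,
    # so a PDF or DOCX library can reopen it by path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return Path(tmp.name)


def text_to_download(text: str) -> io.BytesIO:
    """Wrap processed text in an in-memory byte stream suitable for a download."""
    return io.BytesIO(text.encode("utf-8"))


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Because save_to_temp uses delete=False, the caller is responsible for removing the file (for example with Path.unlink) once processing is done.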

Setting Up the Project

  1. Clone the Repository:

    git clone <your-repository-url>
    cd Document-Plus
  2. Install Required Dependencies: Install the required packages using pip (add textblob to the command if TextBlob is used for spelling correction):

    pip install django pymupdf python-docx sumy nltk
  3. Run the Development Server: Start the Django development server:

    python manage.py runserver
  4. Access the Application: Open your browser and navigate to:

    http://127.0.0.1:8000
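
Before starting the server, it can help to confirm that the packages from step 2 are importable. The following is a small sketch using only the standard library; the function name missing_packages and the module-to-package mapping are assumptions based on the dependency list above.

```python
from importlib.util import find_spec

# Maps each import name to the pip package that provides it.
REQUIRED_MODULES = {
    "django": "django",
    "fitz": "pymupdf",
    "docx": "python-docx",
    "sumy": "sumy",
    "nltk": "nltk",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [pip_name for module, pip_name in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```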
    

Usage

  1. Upload a .docx or .pdf document.
  2. Select whether you want to summarize or correct spelling in the document.
  3. Click the respective button, and the app will process the document.
  4. Download the summarized or corrected document.

Future Expansions

I plan to add the following features:

  1. Markup Support: Mark up documents with text, shapes, and drawings.
  2. Document Bot: An AI bot that answers questions about the uploaded document.
  3. PDF Merge: Merge two or more PDFs into one.
