Skip to content

TejasGupta-27/RAG

Repository files navigation

RAG

RAG is a Streamlit-based application that allows users to interact with document databases using a conversational interface. The application supports uploading documents, extracting embeddings for efficient querying, and generating responses in real-time.

Features

  • Document Upload: Supports PDF, DOCX, and TXT file uploads.
  • Embedding Storage: Automatically processes uploaded documents and stores embeddings for efficient querying.
  • Real-Time Conversational Interface: Allows users to query the document database and receive responses in a streaming, conversational manner.
  • Session Management: Maintains chat history and handles file uploads within the session.
  • SQL Integration: Integrates SQLite to allow dynamic SQL querying and natural language querying of the database.

Installation

Prerequisites

  • Python 3.8 or higher
  • Virtual environment tool (optional but recommended)

Setup

  1. Clone the Repository:

    git clone https://github.com/TejasGupta-27/RAG.git
    cd RAG
  2. Create and Activate a Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Run the Application:

    streamlit run ui.py

    Access the Interface: Open your web browser and go to http://localhost:8501.

Usage

Uploading Documents

  • Navigate to the Sidebar:

    • Click the "Browse files" button under the "Upload a File" section.
    • Select your PDF, DOCX, or TXT file.
  • Automatic Processing:

    • The document will be automatically processed, and embeddings will be stored.
  • Querying the Database:

    • Enter your query in the text input field in the sidebar.
    • Click "Send" to submit your query.
    • The bot will respond in the main chat area with relevant information extracted from the uploaded documents.

Viewing Chat History

  • The chat history will be displayed in the main conversation area, showing both user queries and bot responses.

Project Structure

├── data/                          # Directory for temporary files
│   └── temp/                      # Temporary storage for uploaded files
├── preprocessing.py               # Script for document text extraction and chunking
├── rag_pipeline.py                # Script for generating responses and storing embeddings
├── ui.py                          # Main Streamlit application
├── database_handler.py            # Script for handling database queries and interactions
├── logo.png                       # Logo image displayed in the sidebar
├── requirements.txt               # Python dependencies
└── README.md                      # Project documentation

Future Enhancements

  • Enhanced Query Expansion: Implement more advanced query expansion techniques to improve document retrieval accuracy.
  • Multi-Language Support: Add support for processing and querying documents in multiple languages.
  • User Authentication: Introduce user authentication to manage document access and interaction history.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages