This project implements a lightweight FastAPI server designed to support a Retrieval Augmented Generation (RAG) system. The server processes various document formats, generates embeddings using Hugging Face's sentence-transformers, and provides efficient querying via ChromaDB for vector-based retrieval.
- FastAPI Server: Lightweight, asynchronous server with non-blocking request handling.
- ChromaDB Integration: Persistent vector database for storing and querying document embeddings.
- Multi-format Support: Ingestion support for PDF, DOC, DOCX, and TXT files.
- Embeddings: Utilizes sentence-transformers/all-MiniLM-L6-v2 for generating document embeddings.
- Concurrency: Efficient handling of multiple requests using FastAPI's async capabilities.
- RAG System: Retrieves relevant document chunks and generates context-aware responses.
- FastAPI: For building the server backend.
- ChromaDB: Vector database for storing document embeddings.
- Sentence-Transformers: For embedding generation.
- Hugging Face Inference API: For generating RAG-based responses.
- PyPDF2 & python-docx: For parsing and ingesting PDF and DOCX documents.
- Asyncio: For handling asynchronous operations and tasks.
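As a rough sketch of how the embedding and storage pieces fit together (the collection name, storage path, and sample texts below are illustrative, not taken from the project's modules):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Load the embedding model named in the tech stack.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Persistent on-disk vector store; path and collection name are illustrative.
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="documents")

chunks = [
    "FastAPI is a modern, async-friendly Python web framework.",
    "ChromaDB stores document embeddings for vector-based retrieval.",
]
collection.add(
    ids=["chunk-0", "chunk-1"],
    documents=chunks,
    embeddings=encoder.encode(chunks).tolist(),
)

# Embed the question the same way, then search by vector similarity.
query = encoder.encode(["Where are embeddings stored?"]).tolist()
results = collection.query(query_embeddings=query, n_results=1)
print(results["documents"][0][0])  # -> the ChromaDB chunk
```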
1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/fastapi-rag-server.git
   cd fastapi-rag-server
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Start the FastAPI server:

   ```bash
   uvicorn app:app --reload
   ```

5. Open your browser and go to http://127.0.0.1:8000/docs to see the interactive API documentation.
```
rag_server/
├── app.py            # Main FastAPI application
├── vector_db.py      # ChromaDB integration logic
├── load_data.py      # Document loading and splitting logic
├── prompts.py        # RAG prompt generation
├── utils.py          # Utility functions (e.g., async helpers)
├── ingest.py         # Document ingestion logic
├── COLLECTIONS.txt   # List of document collections
└── data/             # Directory for uploaded documents
```
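As a rough illustration of the splitting step handled by load_data.py (the chunk size and overlap values here are arbitrary placeholders, not the project's actual settings):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with a small overlap between them."""
    chunks = []
    start = 0
    step = chunk_size - overlap  # advance less than a full chunk so chunks overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Overlap keeps sentences that straddle a boundary visible in both neighboring chunks, which tends to improve retrieval quality.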
`POST /upload`

Upload documents (PDF, DOC, DOCX, or TXT) to the server for ingestion.
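A minimal sketch of what the upload route might look like; the real logic lives in app.py and ingest.py, and DOC/DOCX handling via python-docx is omitted here for brevity:

```python
import io

from fastapi import FastAPI, File, HTTPException, UploadFile
from PyPDF2 import PdfReader

app = FastAPI()

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    data = await file.read()  # non-blocking read of the uploaded bytes
    name = (file.filename or "").lower()
    if name.endswith(".pdf"):
        reader = PdfReader(io.BytesIO(data))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
    elif name.endswith(".txt"):
        text = data.decode("utf-8", errors="ignore")
    else:
        raise HTTPException(status_code=415, detail="Unsupported file type")
    # ...split `text` into chunks and add their embeddings to ChromaDB...
    return {"filename": file.filename, "characters": len(text)}
```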
`POST /query`

Submit a query to retrieve relevant document chunks using the embeddings generated by sentence-transformers.
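A hedged sketch of the retrieve-then-generate flow behind this endpoint; the request schema, prompt template, and generation model are assumptions for illustration, not the project's actual choices:

```python
import chromadb
from fastapi import FastAPI
from huggingface_hub import InferenceClient
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("documents")
llm = InferenceClient(model="HuggingFaceH4/zephyr-7b-beta")  # model choice is an assumption

class QueryRequest(BaseModel):
    question: str
    n_results: int = 3

@app.post("/query")
async def query(req: QueryRequest):
    # Embed the question, retrieve the nearest chunks, then generate with context.
    embedding = encoder.encode([req.question]).tolist()
    hits = collection.query(query_embeddings=embedding, n_results=req.n_results)
    context = "\n\n".join(hits["documents"][0])
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {req.question}"
    answer = llm.text_generation(prompt, max_new_tokens=256)
    return {"answer": answer, "chunks": hits["documents"][0]}
```

Note that `encoder.encode` is CPU-bound; in a fully async server it would typically be offloaded to a thread (e.g., via `run_in_executor`) to keep the event loop responsive.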
`GET /collections`

View all document collections stored in the database.
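Once the server is running, the endpoints can be exercised from Python; the URLs match the default uvicorn address, while the file path and field names follow the sketches above and may differ from the actual schema:

```python
import requests

BASE = "http://127.0.0.1:8000"

# Upload a document for ingestion (path is a placeholder).
with open("data/example.pdf", "rb") as f:
    print(requests.post(f"{BASE}/upload", files={"file": f}).json())

# Ask a question against the ingested documents.
print(requests.post(f"{BASE}/query", json={"question": "What is ChromaDB?"}).json())

# List the stored collections.
print(requests.get(f"{BASE}/collections").json())
```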
The RAG server is capable of:
- Uploading and processing multiple document formats.
- Efficiently querying stored document collections using vector-based retrieval.
- Providing contextual responses based on the documents ingested.
- Scaling to handle varied document types and multiple collections.
Contributions are welcome! To contribute:
1. Fork the repository.
2. Create a new branch:

   ```bash
   git checkout -b feature-branch-name
   ```

3. Make your changes and commit them:

   ```bash
   git commit -m 'Add some feature'
   ```

4. Push to the branch:

   ```bash
   git push origin feature-branch-name
   ```

5. Submit a pull request!
This project is licensed under the MIT License - see the LICENSE file for details.
This project shows how modern asynchronous Python, sentence-transformer embeddings, and vector-based retrieval with ChromaDB can be combined into an efficient and scalable Retrieval Augmented Generation system.