Figure: Schematic of a Retrieval-Augmented Generation (RAG) system. Adapted from Elevating Your Retrieval Game: Insights from Real-world Deployments.
RAGSkeleton: A foundational, modular framework for building customizable Retrieval-Augmented Generation (RAG) systems across any domain.
This project was originally developed as a Retrieval-Augmented Generation (RAG) system for materials science literature, but it can easily be adapted to any other domain with minimal modifications. Users can adjust the system prompt to tailor responses to any field, making this setup a versatile skeleton for building RAG systems.
With this skeleton, users can seamlessly swap out different components—such as the embedding model, vector database, or LLM for response generation—with their preferred options, and quickly build a RAG system suited to their data and needs. This architecture significantly reduces the time and effort required to create a custom RAG system.
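To make the swap points concrete, here is a hypothetical, dependency-free sketch; `Embedder`, `KeywordEmbedder`, and `VectorStore` are illustrative names, not RAGSkeleton's actual classes, and a toy keyword counter stands in for a real embedding model:

```python
# Hypothetical sketch (not RAGSkeleton's real interfaces): the swappable parts
# of a RAG system expressed as minimal components. Swapping the embedder,
# store, or LLM means passing a different implementation of the same interface.
from typing import List, Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> List[float]: ...


class KeywordEmbedder:
    """Toy embedder: a vector of keyword counts (placeholder for a real model)."""
    VOCAB = ("graphene", "perovskite")

    def embed(self, text: str) -> List[float]:
        low = text.lower()
        return [float(low.count(word)) for word in self.VOCAB]


class VectorStore:
    """Minimal in-memory store with brute-force nearest-neighbour lookup."""

    def __init__(self, embedder: Embedder):
        self.embedder = embedder
        self.docs: List[tuple] = []  # (vector, text) pairs

    def add(self, text: str) -> None:
        self.docs.append((self.embedder.embed(text), text))

    def query(self, question: str) -> str:
        q = self.embedder.embed(question)
        distance = lambda doc: sum((a - b) ** 2 for a, b in zip(doc[0], q))
        return min(self.docs, key=distance)[1]


# Swapping a component is just constructing the store with a different object.
store = VectorStore(embedder=KeywordEmbedder())
store.add("Perovskites are promising solar-cell materials.")
store.add("Graphene is a two-dimensional carbon allotrope.")
print(store.query("Tell me about graphene"))
```

Replacing `KeywordEmbedder` with a sentence-transformer wrapper, or `VectorStore` with a real vector database client, changes nothing else in the pipeline.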
To help you get started with RAGSkeleton, we’ve prepared a quick interactive Google Colab tutorial:
In this tutorial, you’ll learn how to:
- Parse and index PDF files into a vector database.
- Use a Retrieval-Augmented Generation (RAG) pipeline to query the documents.
- Ask questions interactively and get responses grounded in your documents, with links to the source documents.
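In heavily simplified form, the three steps above reduce to chunking, retrieval, and a source-cited answer. This dependency-free sketch uses plain strings instead of PDFs and word overlap instead of embeddings and an LLM; `chunk` and `retrieve` are illustrative helpers, not the tutorial's API:

```python
# Step 1 (index): split each "document" into fixed-size word chunks.
def chunk(text: str, size: int = 8) -> list:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Step 2 (query): return the (chunk, source) sharing the most words with the question.
def retrieve(question: str, index: list) -> tuple:
    q = set(question.lower().split())
    score = lambda entry: len(q & set(entry[0].lower().split()))
    return max(index, key=score)

# Build a tiny index of (chunk, source_file) pairs.
docs = {
    "alloys.pdf": "Adding chromium to steel improves corrosion resistance markedly.",
    "cells.pdf": "Perovskite solar cells reach high efficiency at low cost.",
}
index = [(c, name) for name, text in docs.items() for c in chunk(text)]

# Step 3 (grounded answer): quote the retrieved chunk and cite its source.
best, source = retrieve("what improves corrosion resistance", index)
print(f"{best} [source: {source}]")
```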
The tutorial runs on Google Colab, requiring no setup beyond logging into Hugging Face and uploading your data to Google Drive.
This section is recommended for users who want a quick setup to run RAGSkeleton with minimal configuration on their own machine.
- **Create a Conda Environment**

  To ensure compatibility, first set up a new conda environment:

  ```bash
  conda create -n rag_skeleton python=3.10
  conda activate rag_skeleton
  ```
- **Install via pip**

  Install RAGSkeleton and its dependencies using pip:

  ```bash
  pip install rag_skeleton
  ```
RAGSkeleton relies on Hugging Face for both the embedding model and the generative language model. Log in to Hugging Face from the terminal to access the necessary models:
```bash
huggingface-cli login
```
Enter your Hugging Face access token when prompted. You can obtain an API token by signing up at Hugging Face and navigating to your account settings.
Note: Some models, like Meta LLaMA, may require additional permission from the owner on Hugging Face. To use these models, request access through the model's Hugging Face page, and you’ll be notified when access is granted.
Run RAGSkeleton with:
```bash
rag_skeleton --data_path /path/to/your/pdf/folder --load_mode local --model_name "meta-llama/Llama-3.2-3B-Instruct"
```
- `--data_path`: Path to a directory of PDF files for creating a vector database. If omitted, the system will use the existing knowledge base (if available) or prompt you to provide a path. To ground your RAG system on a different set of documents, simply provide the new directory path here, and the system will create a fresh knowledge base.
- `--load_mode`: Specify `local` to use a model hosted on your system, or `api` to use Hugging Face's API. `local` mode is suitable if you have the necessary computational resources, while `api` mode is useful if you prefer not to host the model locally or lack the resources to do so.
- `--model_name`: Name of the language model to use. Default is `meta-llama/Llama-3.2-3B-Instruct`. Any model available on Hugging Face can be specified here, allowing you to choose the model best suited to your requirements.
- `--api_token`: Required if using the Hugging Face API (`--load_mode api`).
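For orientation, flags like these are typically wired up with `argparse`; the sketch below mirrors the options listed above but is not RAGSkeleton's actual entry point, and its defaults are assumptions:

```python
# Illustrative argparse setup for the CLI flags described above.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="rag_skeleton")
    p.add_argument("--data_path", default=None,
                   help="Directory of PDFs to index; omit to reuse the knowledge base.")
    p.add_argument("--load_mode", choices=["local", "api"], default="local",
                   help="Run the model locally or via Hugging Face's API.")
    p.add_argument("--model_name", default="meta-llama/Llama-3.2-3B-Instruct",
                   help="Any model name available on Hugging Face.")
    p.add_argument("--api_token", default=None,
                   help="Hugging Face token; required when --load_mode is api.")
    return p

# Unspecified flags fall back to their defaults.
args = build_parser().parse_args(
    ["--data_path", "papers/", "--load_mode", "api", "--api_token", "hf_xxx"]
)
print(args.model_name)
```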
Note: With the API, you can opt for larger models that might otherwise be challenging to run locally. However, keep in mind that the Hugging Face Free API has a model size limit of 10GB. If you need to use larger models, consider a paid API plan or explore model optimization techniques.
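A quick way to judge whether a model fits under that limit is to estimate its weight size as parameter count times bytes per parameter. This is a rough sketch only; actual memory use is higher once activations and overhead are included:

```python
# Back-of-the-envelope model size check against the ~10 GB free-API limit.
def weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of model weights in GB (fp16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9

print(weight_gb(3e9))  # 3B parameters in fp16 -> 6.0 GB, under the limit
print(weight_gb(8e9))  # 8B parameters in fp16 -> 16.0 GB, over the limit
```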
When you start RAGSkeleton, you’ll be welcomed by a chatbot interface where you can ask questions. The system will retrieve relevant information from the knowledge base and generate responses grounded in the PDF documents you provided.
To exit, type `exit`.
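The interaction boils down to a read-eval loop that stops at the exit command; this toy `chat` function (with a stand-in `answer` callback instead of the real retrieval and generation pipeline) sketches the idea:

```python
# Minimal sketch of an exit-terminated chat loop; `answer` is a placeholder
# for the retrieval + generation step.
def chat(prompts, answer=lambda q: f"(grounded answer to: {q})"):
    """Consume prompts until 'exit' appears; return the responses produced."""
    responses = []
    for q in prompts:
        if q.strip().lower() == "exit":
            break
        responses.append(answer(q))
    return responses

print(chat(["What is a perovskite?", "exit", "never reached"]))
```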
For developers and contributors who want to work with the source code or customize the setup.
First, clone the repository and navigate to the project root directory:
```bash
git clone https://github.com/hasan-sayeed/RAGSkeleton.git
cd RAGSkeleton
```
- **Create a Conda Environment**

  ```bash
  conda env create -f environment.yml
  conda activate rag_skeleton
  ```
- **Login to Hugging Face**

  As with the pip installation, log in to Hugging Face:

  ```bash
  huggingface-cli login
  ```
- **Optional Setup for Git Hooks**

  Install pre-commit git hooks with:

  ```bash
  pre-commit install
  ```

  This configuration can be modified in `.pre-commit-config.yaml`.

  Install nbstripout to automatically remove notebook output cells from committed files:

  ```bash
  nbstripout --install --attributes notebooks/.gitattributes
  ```
To run the RAG system directly from the source, use the `-m` flag with Python to specify the module path. This invokes the `main.py` entry point, which manages command-line arguments and initiates the chatbot:

```bash
python -m src.rag_skeleton --data_path /path/to/your/pdf/folder --load_mode local --model_name "meta-llama/Llama-3.2-3B-Instruct"
```
The command-line options (`--data_path`, `--load_mode`, `--model_name`, `--api_token`) and the chatbot interface behave exactly as described for the pip installation above; type `exit` to quit.
Full documentation is available on Read the Docs.
- Always keep your abstract (unpinned) dependencies updated in `environment.yml` and eventually in `setup.cfg` if you want to ship and install your package via `pip` later on.
- Create concrete dependencies as `environment.lock.yml` for the exact reproduction of your environment with:

  ```bash
  conda env export -n rag_skeleton -f environment.lock.yml
  ```

  For multi-OS development, consider using `--no-builds` during the export.
- Update your current environment with respect to a new `environment.lock.yml` using:

  ```bash
  conda env update -f environment.lock.yml --prune
  ```
The repository is organized as follows:

```
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md         <- Guidelines for contributing to this project.
├── Dockerfile              <- Build a docker container with `docker build .`.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build configuration. Don't change! Use `pip install -e .`
│                              to install for development or to build `tox -e build`.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual PYTHON_PKG, e.g. train_model.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- [DEPRECATED] Use `python setup.py develop` to install for
│                              development or `python setup.py bdist_wheel` to build.
├── src
│   └── rag_skeleton        <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `pytest`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```
The following enhancements are coming soon to RAGSkeleton:
- Evaluate RAG Performance with Ragas Metrics – Enable users to assess the quality of generated responses using standard RAG evaluation metrics from Ragas.
- Conceptual Understanding Score (Materials Science Specific) – A domain-specific metric to assess how well the RAG system understands key concepts in materials science.
- Cross-Disciplinary Score (Materials Science Specific) – Evaluates how well the RAG system integrates knowledge from multiple disciplines (such as physics, chemistry, and mathematics) to answer complex materials science questions that require interdisciplinary understanding.
Stay tuned! If you have suggestions or feature requests, feel free to open a discussion or issue on GitHub.
Any questions, comments, or suggestions are welcome! This project is a flexible foundation for RAG-based applications, and we’re open to improvements that can make it even more useful across various domains.
This project has been set up using PyScaffold 4.6 and the dsproject extension 0.7.2.