# Oncology RAG Q&A App

An oncology question-answering application built with the Meditron 7B LLM, the Qdrant vector database, and the PubMedBERT embedding model.
This FastAPI application combines a transformer-based large language model (LLM) with Retrieval-Augmented Generation (RAG) to answer medical questions. It uses SentenceTransformer embeddings and the Qdrant vector database for retrieval.
Before running the ingestion script, ensure Docker is installed and running on your machine.

- Download the Qdrant image: `docker pull qdrant/qdrant`
- Run the Qdrant container: `docker run -p 6333:6333 qdrant/qdrant`
- Ingest data: once Qdrant is running, run `python ingest.py` to create a vector database in Qdrant. Vectors are ingested into Qdrant via its API.
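Ingestion typically splits each document into overlapping chunks before embedding, so that passages fit the embedding model and retrieval stays precise. A minimal sketch of that chunking step, assuming character-based chunks (the function name, chunk size, and overlap below are illustrative, not taken from `ingest.py`; in the real script each chunk would then be embedded with SentenceTransformers and upserted into Qdrant):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# 1200 characters with size 500 and overlap 50 yields starts at 0, 450, 900
chunks = chunk_text("A" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # prints 3
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.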
- Run the retriever: use `python retriever.py` to perform semantic similarity search on the Qdrant vector database using sentence embeddings.
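Conceptually, semantic similarity search ranks stored document vectors by cosine similarity to the query embedding. A toy sketch of that ranking (the vectors and names below are made up for illustration; `retriever.py` delegates this work to Qdrant via LangChain rather than computing it in Python):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k document names most similar to the query vector."""
    return sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)[:k]

# Toy 3-dimensional "embeddings"; real PubMedBERT vectors have hundreds of dimensions.
docs = {
    "chemo dosing": [0.9, 0.1, 0.0],
    "radiation therapy": [0.1, 0.9, 0.0],
    "immunotherapy": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # prints ['chemo dosing', 'immunotherapy']
```

Qdrant performs the same ranking server-side over the ingested vectors, using an approximate index so it scales to large collections.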
Ensure you are logged in to Hugging Face to access the necessary models.

- Log in to Hugging Face: run `huggingface-cli login` and authenticate with an access token generated in your Hugging Face account. Alternatively, download the model directly from Hugging Face: meditron-7b-Q4_K_M-GGUF.
-
- Run the RAG application: from the `src` folder, start the FastAPI application with Uvicorn: `uvicorn rag:app`
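At query time, the RAG application stitches the retrieved chunks and the user's question into a single prompt for Meditron 7B. A hypothetical sketch of that prompt-assembly step (the template and the `build_prompt` name are assumptions for illustration, not taken from `rag.py`):

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a RAG prompt: numbered retrieved context first, then the question."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Use only the context below to answer the medical question.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the first-line treatment?",
    ["Guideline excerpt A.", "Guideline excerpt B."],
)
print(prompt)
```

Grounding the instruction in the retrieved context this way is what lets the model answer from the ingested oncology documents rather than from its general pretraining alone.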
- `ingest.py`: Ingests PDF documents and creates vector embeddings using SentenceTransformers. The vectors are stored in the Qdrant vector database.
- `retriever.py`: Uses LangChain to perform semantic similarity searches on the Qdrant vector database using sentence embeddings.
- `rag.py`: Implements the RAG (Retrieval-Augmented Generation) application using FastAPI and connects to the Qdrant server at the configured URL.
- Ensure Docker is installed and running before starting the Qdrant container.
- Ensure you have the necessary permissions and tokens for Hugging Face.
By combining the Meditron 7B LLM, RAG, SentenceTransformer (PubMedBERT) embeddings, and the Qdrant vector database, the application aims to deliver fast, accurate medical question-answering.