An initial exploration of retrieval-augmented generation for a research software repository.
-
Clone the Repository:
git clone <repository-url>
-
Add the API URL to Your .env File:
- Add the following lines to your .env file in the root directory:
API_URL="Your API URL here"
QUERY="Your query here"
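As an illustration of how the two .env values might be read from Python, here is a minimal standard-library sketch. The helper names `parse_env_text` and `load_settings` are hypothetical and not taken from the repository; the notebooks may well use a dedicated loader instead.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(env_path: str = ".env") -> dict[str, str]:
    """Read API_URL and QUERY from the .env file (hypothetical helper).

    Variables already set in the process environment take precedence.
    """
    path = Path(env_path)
    file_values = parse_env_text(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {
        name: os.environ.get(name, file_values.get(name, ""))
        for name in ("API_URL", "QUERY")
    }
```

Calling `load_settings()` from the project root returns a dictionary with the keys `API_URL` and `QUERY`; a missing value comes back as an empty string.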
-
Navigate to the Project Directory:
cd <project-directory>
-
Install the Necessary Dependencies:
pip install -r requirements.txt
Python 3.12.4 is recommended.
-
Create the directories:
- Create the /models, /vectorisations and /data directories in the project's root directory.
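The directories can also be created from Python. This is a small sketch, assuming the three folders sit directly under the project root; the function name `create_project_dirs` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the step above.
REQUIRED_DIRS = ("models", "vectorisations", "data")


def create_project_dirs(root: str = ".") -> list[Path]:
    """Create the required folders under root; existing folders are left untouched."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Run `create_project_dirs()` from the project root. Because of `exist_ok=True`, running it a second time is harmless.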
-
Create the Dataset & Vectorizations:
-
Before running the notebook for the RAG experiment, you need to create the dataset and generate text vectorizations for the retrieval part of the RAG.
-
To do this, run the 1_vectorisation.ipynb notebook. The data will be saved to your machine and will be available the next time you open the project.
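To give a feel for what vectorization contributes to retrieval, here is a toy sketch. It is not the method used in 1_vectorisation.ipynb: it swaps in plain term counts and cosine similarity as a stand-in for whatever embedding the notebook produces, and the example documents are made up.

```python
import math
import re
from collections import Counter


def vectorise(text: str) -> Counter:
    """Turn text into a sparse term-count vector (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = vectorise(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, vectorise(doc)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical example documents, not taken from the project's dataset.
docs = [
    "Install the package with pip and run the test suite.",
    "The solver integrates the equations with a Runge-Kutta scheme.",
]
print(retrieve("how do I install with pip", docs))
```

In a RAG setup, the retrieved text is then added to the prompt sent to the language model. Precomputing and saving the document vectors, as the notebook does, avoids recomputing them for every query.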
-
Install Ollama
- Install Ollama and download the model you want to use.
- To install llama3, the model we're using in this notebook, run the following command:
ollama run llama3
- Ollama must be running in the background for the chatbots to work.
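To check that Ollama is reachable before opening the notebooks, a sketch like the following may help. It assumes Ollama listens on its default local address, http://localhost:11434, and talks to its HTTP API using only the standard library. The function names are hypothetical, and the notebooks may use a client library rather than raw HTTP calls.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed unchanged)


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib.error.URLError.
        return False


def build_generate_request(
    prompt: str, model: str = "llama3", base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

If `ollama_is_running()` returns False, start Ollama first. Once it returns True, `generate("Hello")` should return the model's reply, provided llama3 has been downloaded.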