Commit

doc: add mkdocs

AlexisVLRT committed Jan 4, 2024
1 parent 5a5c71b commit 8bf54c6
Showing 12 changed files with 374 additions and 169 deletions.
174 changes: 5 additions & 169 deletions README.md
@@ -1,8 +1,5 @@
<div align="center">

# skaff-rag-accelerator

</div>

This is a starter kit to deploy a modular RAG locally, on the cloud, or across multiple clouds.

@@ -45,124 +42,6 @@ You should then be able to log in and chat with the bot:
![](docs/login_and_chat.gif)


## Loading documents

The easiest but least flexible way to load documents into your RAG is to use the `RAG.load_file` method. It will semi-intelligently try to pick the best Langchain loader and parameters for your file.

Create `backend/load_my_docs.py`:
```python
from pathlib import Path

from backend.rag_components.rag import RAG


data_directory = Path("data")

config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)

for file in data_directory.iterdir():
    if file.is_file():
        rag.load_file(file)
```

If you want more flexibility, you can use the `rag.load_documents` method, which expects a list of `langchain.docstore.document.Document` objects.

**TODO: example**

#### Document indexing

The document loader maintains an index of the loaded documents. You can control how it behaves through the `vector_store.insertion_mode` key of your RAG configuration, which accepts `None`, `incremental`, or `full`.

[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing)

## Configuring the RAG

### The `RAG` object

It provides a single, unified interface to the RAG's functionalities.

Out of the box, a `RAG` object is created from your configuration and used by the `/chat/{chat_id}/user_message` endpoint in [`backend/main.py`](backend/main.py).

The `RAG` class initializes the key components (language model, embeddings, vector store) and generates responses to user messages through an answer chain.

It also manages document loading and indexing based on your configuration settings.


Using the `RAG` class directly:
```python
from pathlib import Path
from backend.rag_components.rag import RAG
from backend.model import Message

config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)

message = Message(
    id="123",
    timestamp="2021-06-01T12:00:00",
    chat_id="123",
    sender="user",
    content="Hello, how are you?",
)
response = rag.generate_response(message)
print(response)
```

[Go to the code.](backend/rag_components/rag.py)

### Managing the configuration (`RAGConfig`)

The overall config management works like this:
![](docs/config_architecture.png)

This means the best way to configure your RAG deployment is to modify the `config.yaml` file.

This file is then loaded to instantiate a `RAGConfig` object, which is used by the `RAG` class.

In the default configuration template ([`backend/config.yaml`](backend/config.yaml)) you will find this:
```yaml
# This is the LLM configuration (&LLMConfig is a yaml anchor to reference this block further down in the conf file)
LLMConfig: &LLMConfig

  # By default we're using a GPT model deployed on Azure. You should be able to change this to any langchain BaseChatModel here: https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/__init__.py
  source: "AzureChatOpenAI"

  # This is a key-value map of the parameters that will be passed to the langchain chat model object when it's created. Looking at the AzureChatOpenAI source code (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azure_openai.py), we input the following params:
  source_config:
    openai_api_type: "azure"
    openai_api_key: {{ OPENAI_API_KEY }}
    openai_api_base: "https://poc-genai-gpt4.openai.azure.com/"
    openai_api_version: "2023-07-01-preview"
    deployment_name: "gpt4v"

  # While the params in source_config are specific to each model, temperature is implemented by all BaseChatModel classes in langchain.
  temperature: 0.1

# ... Rest of the config ...
```

Let's say we want to use a Vertex LLM instead. [Looking at the source code of this model in langchain](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/vertexai.py#L206C7-L206C19), we find this:

```python
class ChatVertexAI(_VertexAICommon, BaseChatModel):
"""`Vertex AI` Chat large language models API."""

model_name: str = "chat-bison"
"Underlying model name."
examples: Optional[List[BaseMessage]] = None
```

The updated `config.yaml` could look like this:
```yaml
LLMConfig: &LLMConfig
source: "ChatVertexAI"
source_config:
model_name: gemini-pro
temperature: 0.1
```
## Architecture

### The `frontend`, the `backend`, and the `database`
@@ -176,53 +55,10 @@ The whole goal of this repo is to decouple the "computing and LLM querying" part
- The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8)


## Going further

### Extending the configuration

As you tune this starter kit to your needs, you may need to add specific configuration that your RAG will use.

For example, let's say you want to add the `foo` configuration parameter to your vector store configuration.

First, add it to `config.py` in the part relevant to the vector store:

```python
# ...
@dataclass
class VectorStoreConfig:
    # ... rest of the VectorStoreConfig ...
    foo: str = "bar"  # We add foo param, of type str, with the default value "bar"
# ...
```

This parameter will now be available in your `RAG` object configuration.

```python
from pathlib import Path
from backend.rag_components.rag import RAG
config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)
print(rag.config.vector_store.foo)
# > bar
```

If you want to override its default value, you can do that in your `config.yaml`:
```yaml
VectorStoreConfig: &VectorStoreConfig
  # ... rest of the VectorStoreConfig ...
  foo: baz
```

```python
print(rag.config.vector_store.foo)
# > baz
```

### Using `RagConfig` directly

TODO: Add usage example here
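In the meantime, here is a minimal sketch of the idea. It assumes the `RAGConfig` built from `config.yaml` is exposed as `rag.config` (as in the snippet above); the exact attribute names may differ in your version.

```python
from pathlib import Path

from backend.rag_components.rag import RAG

# Sketch only: the RAGConfig instance is built from config.yaml when the RAG
# object is created, and exposed as `rag.config`.
rag = RAG(Path("backend/config.yaml"))

# Inspect the parsed configuration programmatically. The attribute path below
# is assumed to mirror the YAML blocks (here, the vector store settings).
print(rag.config.vector_store.insertion_mode)
```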
## Documentation

To take a deep dive under the hood, have a look at the documentation:
```shell
mkdocs serve
```

Then go to http://localhost:8000/
1 change: 1 addition & 0 deletions docs/cookbook.md
@@ -0,0 +1 @@
Here you will find a repository of configurations that proved to work.
62 changes: 62 additions & 0 deletions docs/index.md
@@ -0,0 +1,62 @@
# skaff-rag-accelerator

This is a starter kit to deploy a modular RAG locally, on the cloud, or across multiple clouds.

## Features

- A configurable RAG setup based around Langchain
- `RAG` and `RagConfig` Python classes to help you set things up
- A REST API based on FastAPI to provide easy access to the RAG as a web backend
- A Streamlit demo to serve as a basic working frontend (not production grade)
- A document loader for the RAG
- User authentication (insecure for now, but usable for conversation history)
- User feedback collection
- Streamed responses

## Quickstart

In a fresh env:
```shell
pip install -r requirements.txt
```

You will need to set some env vars, either in a .env file at the project root, or just by exporting them like so:
```shell
export OPENAI_API_KEY="xxx" # API key used to query the LLM
export EMBEDDING_API_KEY="xxx" # API key used to query the embedding model
export DATABASE_URL="sqlite:///$(pwd)/database/db.sqlite3" # For local development only. You will need a real, cloud-based SQL database URL for prod.
```

Start the backend server locally
```shell
uvicorn backend.main:app
```

Start the frontend demo
```shell
streamlit run frontend/app.py
```

You should then be able to log in and chat with the bot:
![](login_and_chat.gif)


## Architecture

### The `frontend`, the `backend`, and the `database`

The whole goal of this repo is to decouple the "computing and LLM querying" part from the "rendering a user interface" part. We do this with a typical 3-tier architecture.

![](3t_architecture.png)

- The [frontend](frontend) is the end-user-facing part. It reaches out to the backend **ONLY** through the REST API. We provide a frontend demo here for convenience, but ultimately it could live in a completely different repo and be written in a completely different language.
- The [backend](backend) provides a REST API that abstracts the RAG functionalities. It handles calls to LLMs, tracks conversations and users, manages state using a database, and much more. To get the gist of the backend, look at the documentation of the API at http://0.0.0.0:8000/docs (a sketched example call follows this list).
- The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8)
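For instance, once the backend is running locally, a conversation turn boils down to a plain HTTP call. The snippet below is an illustration only: the route name comes from `backend/main.py` (`/chat/{chat_id}/user_message`), but the HTTP method, payload shape, and any required auth headers are assumptions, so check the OpenAPI docs at http://localhost:8000/docs for the real schema.

```shell
# Illustration only -- payload and method are assumed, not verified against the API.
curl -X POST "http://localhost:8000/chat/123/user_message" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, how are you?"}'
```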

The structure of the repo mirrors this architecture.

### The RAG

![](rag_architecture.png)

In the `backend` folder of the repository, you will find a `rag_components` directory that implements this architecture.
29 changes: 29 additions & 0 deletions docs/loading_documents.md
@@ -0,0 +1,29 @@
## Loading documents

The easiest but least flexible way to load documents into your RAG is to use the `RAG.load_file` method. It will semi-intelligently try to pick the best Langchain loader and parameters for your file.

```python
from pathlib import Path

from backend.rag_components.rag import RAG


data_directory = Path("data")

config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)

for file in data_directory.iterdir():
    if file.is_file():
        rag.load_file(file)
```

If you want more flexibility, you can use the `rag.load_documents` method, which expects a list of `langchain.docstore.document.Document` objects.

**TODO: example**
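In the meantime, here is a rough sketch of what that could look like. The `Document` fields shown are the standard Langchain ones (`page_content` and `metadata`); the exact behaviour of `rag.load_documents` may differ from this sketch.

```python
from pathlib import Path

from langchain.docstore.document import Document

from backend.rag_components.rag import RAG

rag = RAG(Path("backend/config.yaml"))

# Build documents by hand instead of relying on a file loader.
documents = [
    Document(
        page_content="Our return policy lasts 30 days.",
        metadata={"source": "faq.md"},
    ),
    Document(
        page_content="Shipping is free for orders over 50 dollars.",
        metadata={"source": "faq.md"},
    ),
]

rag.load_documents(documents)
```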

## Document indexing

The document loader maintains an index of the loaded documents. You can control how it behaves through the `vector_store.insertion_mode` key of your RAG configuration, which accepts `None`, `incremental`, or `full`.

[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing)
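For example, assuming the vector store block of `config.yaml` follows the same anchor convention as the rest of the template, switching to incremental indexing could look like this:

```yaml
VectorStoreConfig: &VectorStoreConfig
  # ... rest of the VectorStoreConfig ...
  insertion_mode: incremental  # or `None` / `full`, as described above
```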
Binary file added docs/rag_architecture.png
