Merge pull request #6 from artefactory/av/streaming
Av/streaming
AlexisVLRT authored Jan 2, 2024
2 parents fa9c948 + de0c08a commit c8e3bc4
Showing 42 changed files with 1,052 additions and 1,040 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -137,6 +137,8 @@ secrets/*
# Mac OS
.DS_Store
data/
vector_database/

*.sqlite
*.sqlite3
*.sqlite3
vector_database/
234 changes: 221 additions & 13 deletions README.md
@@ -1,20 +1,228 @@
<div align="center">

# skaff-rag-accelerator

</div>

This is a starter kit to deploy a modularizable RAG locally or on the cloud (or across multiple clouds).

## Features

- A configurable RAG setup based around Langchain
- `RAG` and `RagConfig` python classes to help you set things up
- A REST API based on FastAPI to provide easy access to the RAG as a web backend
- A Streamlit demo to serve as a basic working frontend (not production grade)
- A document loader for the RAG
- User authentication (insecure for now, but usable for conversation history)
- User feedback collection
- Streamed responses

## Quickstart

In a fresh env:
```shell
pip install -r requirements.txt
```

You will need to set some env vars, either in a .env file at the project root, or just by exporting them like so:
```shell
export OPENAI_API_KEY="xxx" # API key used to query the LLM
export EMBEDDING_API_KEY="xxx" # API key used to query the embedding model
export DATABASE_URL="sqlite:///$(pwd)/database/db.sqlite3" # For local development only. You will need a real, cloud-based SQL database URL for prod.
```
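
The same values can also go in a `.env` file at the project root. Note that most `.env` loaders read plain key/value pairs and do not perform shell substitutions such as `$(pwd)`, so spell the database path out explicitly (the path below is just a placeholder):
```shell
OPENAI_API_KEY=xxx
EMBEDDING_API_KEY=xxx
DATABASE_URL=sqlite:////absolute/path/to/skaff-rag-accelerator/database/db.sqlite3
```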

Start the backend server locally
```shell
python backend/main.py
```

Start the frontend demo
```shell
streamlit run frontend/app.py
```

You should then be able to log in and chat with the bot:
![](docs/login_and_chat.gif)


## Loading documents

The easiest but least flexible way to load documents to your RAG is to use the `RAG.load_file` method. It will semi-intelligently try to pick the best Langchain loader and parameters for your file.

Create `backend/load_my_docs.py`:
```python
from pathlib import Path

from backend.rag_components.rag import RAG


data_directory = Path("data")

config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)

for file in data_directory.iterdir():
    if file.is_file():
        rag.load_file(file)
```

If you want more flexibility, you can use the `rag.load_documents` method which expects a list of `langchain.docstore.document` objects.

For example, a minimal sketch where a couple of `Document` objects are built by hand (the contents and metadata below are purely illustrative):
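```python
from pathlib import Path

from langchain.docstore.document import Document

from backend.rag_components.rag import RAG

rag = RAG(Path("backend/config.yaml"))

# Build Document objects from any source: CSV rows, an API, a scraper, etc.
documents = [
    Document(
        page_content="Our return policy allows refunds within 30 days of purchase.",
        metadata={"source": "faq.md", "topic": "returns"},
    ),
    Document(
        page_content="Support is available Monday to Friday, 9am to 5pm CET.",
        metadata={"source": "faq.md", "topic": "support"},
    ),
]

rag.load_documents(documents)
```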

### Document indexing

The document loader maintains an index of the loaded documents. You can control how it behaves by setting `vector_store.insertion_mode` in your RAG configuration to `None`, `incremental`, or `full`.

[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing)
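
For example, in `backend/config.yaml` (a sketch assuming the vector store block follows the same anchor layout as the other blocks shown below):
```yaml
VectorStoreConfig: &VectorStoreConfig
  # ... rest of the VectorStoreConfig ...
  insertion_mode: incremental
```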

## Configuring the RAG

### The `RAG` object

It provides a single, unified interface to the RAG's functionalities.

Out of the box, a `RAG` object is created from your configuration and used by the `/chat/{chat_id}/user_message` endpoint in [`backend/main.py`](backend/main.py).

The RAG class initializes key components (language model, embeddings, vector store), and generates responses to user messages using an answer chain.

It also manages document loading and indexing based on configuration settings.


Using the `RAG` class directly:
```python
from pathlib import Path
from backend.rag_components.rag import RAG
from backend.model import Message

config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)

message = Message(
    id="123",
    timestamp="2021-06-01T12:00:00",
    chat_id="123",
    sender="user",
    content="Hello, how are you?",
)
response = rag.generate_response(message)
print(response)
```

[Go to the code.](backend/rag_components/rag.py)

### Managing the configuration (`RAGConfig`)

The overall config management works like this:
![](docs/config_architecture.png)

This means the best way to configure your RAG deployment is to modify the config.yaml file.

This file is then loaded to instantiate a `RAGConfig` object which is used by the `RAG` class.

In the default configuration template ([`backend/config.yaml`](backend/config.yaml)) you will find this:
```yaml
# This is the LLM configuration (&LLMConfig is a yaml anchor to reference this block further down in the conf file)
LLMConfig: &LLMConfig

  # By default we're using a GPT model deployed on Azure. You should be able to change this to any langchain BaseChatModel here: https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/__init__.py
  source: "AzureChatOpenAI"

  # This is a key-value map of the parameters that will be passed to the langchain chat model object when it's created. Looking at the AzureChatOpenAI source code (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azure_openai.py), we input the following params:
  source_config:
    openai_api_type: "azure"
    openai_api_key: {{ OPENAI_API_KEY }}
    openai_api_base: "https://poc-genai-gpt4.openai.azure.com/"
    openai_api_version: "2023-07-01-preview"
    deployment_name: "gpt4v"

  # While the params in source_config are specific to each model, temperature is implemented by all BaseChatModel classes in langchain.
  temperature: 0.1

# ... Rest of the config ...
```

Let's say we want to use a Vertex LLM instead. [Looking at the source code of this model in langchain](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/vertexai.py#L206C7-L206C19), we find this:

```python
class ChatVertexAI(_VertexAICommon, BaseChatModel):
    """`Vertex AI` Chat large language models API."""

    model_name: str = "chat-bison"
    "Underlying model name."
    examples: Optional[List[BaseMessage]] = None
```

The updated `config.yaml` could look like this:
```yaml
LLMConfig: &LLMConfig
  source: "ChatVertexAI"
  source_config:
    model_name: gemini-pro
  temperature: 0.1
```
## Architecture
### The `frontend`, the `backend`, and the `database`

The whole goal of this repo is to decouple the "computing and LLM querying" part from the "rendering a user interface" part. We do this with a typical 3-tier architecture.

![](docs/3t_architecture.png)

- The [frontend](frontend) is the end-user-facing part. It reaches out to the backend **ONLY** through the REST API. We provide a frontend demo here for convenience, but ultimately it could live in a completely different repo, and be written in a completely different language (see the client sketch after this list).
- The [backend](backend) provides a REST API to abstract RAG functionalities. It handles calls to LLMs, tracks conversations and users, handles the state management using a db, and much more. To get the gist of the backend, look at the API documentation: http://0.0.0.0:8000/docs
- The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8)
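
To make the decoupling concrete, here is a hypothetical minimal client that talks to the backend purely over HTTP. The only route taken from this repo is `/chat/{chat_id}/user_message` (wired up in `backend/main.py`); the login route, payload shape, and auth scheme are assumptions for illustration, and the authoritative schema is the one served at http://0.0.0.0:8000/docs.
```python
import requests

BASE_URL = "http://0.0.0.0:8000"

# Assumed: an auth endpoint that returns a bearer token for an existing user.
token = requests.post(
    f"{BASE_URL}/login",
    data={"username": "demo@example.com", "password": "demo"},
).json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# /chat/{chat_id}/user_message is referenced in backend/main.py;
# the JSON payload below is an assumed shape, not the documented one.
response = requests.post(
    f"{BASE_URL}/chat/123/user_message",
    json={"content": "What is our return policy?"},
    headers=headers,
)
print(response.json())
```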


## Going further

### Extending the configuration

As you tune this starter kit to your needs, you may need to add specific configuration that your RAG will use.

For example, let's say you want to add the `foo` configuration parameter to your vector store configuration.

First, add it to `config.py` in the part relevant to the vector store:

```python
# ...
@dataclass
class VectorStoreConfig:
    # ... rest of the VectorStoreConfig ...
    foo: str = "bar"  # We add the foo param, of type str, with the default value "bar"
# ...
```

This parameter will now be available in your `RAG` object configuration.

```python
from pathlib import Path
from backend.rag_components.rag import RAG
config_directory = Path("backend/config.yaml")
rag = RAG(config_directory)
print(rag.config.vector_store.foo)
# > bar
```

If you want to override its default value, you can do that in your `config.yaml`:
```yaml
VectorStoreConfig: &VectorStoreConfig
  # ... rest of the VectorStoreConfig ...
  foo: baz
```

```python
print(rag.config.vector_store.foo)
# > baz
```

### Using `RagConfig` directly

The parsed configuration is reachable from a built `RAG` object as `rag.config`, so you can inspect or tweak it programmatically.
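Below is a short, hypothetical sketch: `rag.config`, `vector_store`, and `insertion_mode` appear elsewhere in this README, but treat any other attribute name as an assumption to check against `config.py`.
```python
from pathlib import Path

from backend.rag_components.rag import RAG

rag = RAG(Path("backend/config.yaml"))

# The RAG object keeps its parsed configuration on `rag.config`.
config = rag.config

# Attribute names mirror the YAML blocks, e.g. the vector store section.
print(config.vector_store.insertion_mode)

# Values can also be tweaked in code before using the RAG
# (assumption: the config fields are plain dataclass attributes, as in config.py).
config.vector_store.insertion_mode = "full"
```
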
12 changes: 3 additions & 9 deletions backend/authentication.py
```diff
@@ -5,46 +5,40 @@
 from jose import jwt
 from pydantic import BaseModel

-from database.database import Database
+from backend.database import Database

 SECRET_KEY = os.environ.get("SECRET_KEY", "default_unsecure_key")
 ALGORITHM = "HS256"


 class User(BaseModel):
-    """Represents a user with an email and password."""
-
     email: str = None
     password: str = None


 def create_user(user: User) -> None:
-    """Create a new user in the database."""
     with Database() as connection:
-        connection.query(
+        connection.execute(
             "INSERT INTO user (email, password) VALUES (?, ?)", (user.email, user.password)
         )


 def get_user(email: str) -> User:
-    """Retrieve a user from the database by email."""
     with Database() as connection:
-        user_row = connection.query("SELECT * FROM user WHERE email = ?", (email,))[0]
+        user_row = connection.execute("SELECT * FROM user WHERE email = ?", (email,))
         for row in user_row:
             return User(**row)
         raise Exception("User not found")


 def authenticate_user(username: str, password: str) -> Optional[User]:
-    """Authenticate a user by their username and password."""
     user = get_user(username)
     if not user or not password == user.password:
         return False
     return user


 def create_access_token(*, data: dict, expires_delta: Optional[timedelta] = None) -> str:
-    """Create a JWT access token with optional expiry."""
     to_encode = data.copy()
     if expires_delta:
         expire = datetime.utcnow() + expires_delta
```
87 changes: 0 additions & 87 deletions backend/chatbot.py

This file was deleted.

