diff --git a/README.md b/README.md
index 2279537..beefb9c 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,5 @@
-
-
# skaff-rag-accelerator
-
This is a starter kit to deploy a modularizable RAG locally or on the cloud (or across multiple clouds)
@@ -45,124 +42,6 @@ You should than be able to login and chat to the bot:
![](docs/login_and_chat.gif)
-## Loading documents
-
-The easiest but least flexible way to load documents to your RAG is to use the `RAG.load_file` method. It will semi-intellignetly try to pick the best Langchain loader and parameters for your file.
-
-Create `backend/load_my_docs.py`:
-```python
-from pathlib import Path
-
-from backend.rag_components.rag import RAG
-
-
-data_directory = Path("data")
-
-config_directory = Path("backend/config.yaml")
-rag = RAG(config_directory)
-
-for file in data_directory.iterdir():
- if file.is_file():
- rag.load_file(file)
-```
-
-If you want more flexibility, you can use the `rag.load_documents` method which expects a list of `langchain.docstore.document` objects.
-
-**TODO: example**
-
-#### Document indexing
-
-The document loader maintains an index of the loaded documents. You can change it in the configuration of your RAG at `vector_store.insertion_mode` to `None`, `incremental`, or `full`.
-
-[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing)
-
-## Configuring the RAG
-
-### The `RAG` object
-
-It provides a unique interface to the RAG's functionalities.
-
-Out of the box, A RAG object is created from your configuration and used by the `/chat/{chat_id}/user_message` endpoint in [`backend/main.py`](backend/main.py)
-
-The RAG class initializes key components (language model, embeddings, vector store), and generates responses to user messages using an answer chain.
-
-It also manages document loading and indexing based on configuration settings.
-
-
-Using the `RAG` class directly:
-```python
-from pathlib import Path
-from backend.rag_components.rag import RAG
-from backend.model import Message
-
-config_directory = Path("backend/config.yaml")
-rag = RAG(config_directory)
-
-message = Message(
- id="123",
- timestamp="2021-06-01T12:00:00",
- chat_id="123",
- sender="user",
- content="Hello, how are you?",
-)
-response = rag.generate_response(message)
-print(response)
-```
-
-[Go to the code.](backend/rag_components/rag.py)
-
-### Managing the configuration (`RAGConfig`)
-
-The overall config management works like this:
-![](docs/config_architecture.png)
-
-This means the best way to configure your RAG deployment is to modify the config.yaml file.
-
-This file is then loaded to instanciate a `RAGConfig` object which is used by the `RAG` class.
-
-In the default configuration template ([`backend/config.yaml`](backend/config.yaml)) you will find this:
-```yaml
-# This is the LLM configuration (&LLMConfig is a yaml anchor to reference this block further down in the conf file)
-LLMConfig: &LLMConfig
-
- # By default we're using a GPT model deployed on Azure. You should be able to change this to any langchain BaseChatModel here: https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/__init__.py
- source: "AzureChatOpenAI"
-
- # This is a key-value map of the parameters that will be passed to the langchain chat model object when it's created. Looking at the AzureChatOpenAI source code (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azure_openai.py), we input the following params:
- source_config:
- openai_api_type: "azure"
- openai_api_key: {{ OPENAI_API_KEY }}
- openai_api_base: "https://poc-genai-gpt4.openai.azure.com/"
- openai_api_version: "2023-07-01-preview"
- deployment_name: "gpt4v"
-
- # While the params in source_config are specific to each model, temperature is implemented by all BaseChatModel classes in langchain.
- temperature: 0.1
-
-# ... Rest of the config ...
-```
-
-Let's say we want to use a Vertex LLM instead. [Looking at the source code of this model in langchain](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/vertexai.py#L206C7-L206C19), we find this:
-
-```python
-class ChatVertexAI(_VertexAICommon, BaseChatModel):
- """`Vertex AI` Chat large language models API."""
-
- model_name: str = "chat-bison"
- "Underlying model name."
- examples: Optional[List[BaseMessage]] = None
-```
-
-Updated `config.yaml` could look like this:
-```yaml
-LLMConfig: &LLMConfig
- source: "ChatVertexAI"
- source_config:
- model_name: gemini-pro
- temperature: 0.1
-```
-
-
## Architecture
### The `frontend`, the `backend`, and the `database`
@@ -176,53 +55,10 @@ The whole goal of this repo is to decouple the "computing and LLM querying" part
- The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8)
-## Going further
-
-### Extending the configuration
-
-As you tune this starter kit to your needs, you may need to add specific configuration that your RAG will use.
-
-For example, let's say you want to add the `foo` configuration parameter to your vector store configuration.
-
-First, add it to `config.py` in the part relavant to the vector store:
+## Documentation
-```python
-# ...
-
-@dataclass
-class VectorStoreConfig:
- # ... rest of the VectorStoreConfig ...
-
- foo: str = "bar" # We add foo param, of type str, with the default value "bar"
-
-# ...
-```
-
-This parameter will now be available in your `RAG` object configuration.
-
-```python
-from pathlib import Path
-from backend.rag_components.rag import RAG
-
-config_directory = Path("backend/config.yaml")
-rag = RAG(config_directory)
-
-print(rag.config.vector_store.foo)
-# > bar
-```
-
-if you want to override its default value. You can do that in your `config.yaml`:
-```yaml
-VectorStoreConfig: &VectorStoreConfig
- # ... rest of the VectorStoreConfig ...
- foo: baz
-```
-
-```python
-print(rag.config.vector_store.foo)
-# > baz
+To take a deep dive under the hood, have a look at the documentation:
+```shell
+mkdocs serve
```
-
-### Using `RagConfig` directly
-
-TODO: Add usage example here
\ No newline at end of file
+Then go to http://localhost:8000/
diff --git a/docs/cookbook.md b/docs/cookbook.md
new file mode 100644
index 0000000..23f493e
--- /dev/null
+++ b/docs/cookbook.md
@@ -0,0 +1 @@
+Here you will find a repository of configurations that proved to work.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..f64670e
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,62 @@
+# skaff-rag-accelerator
+
+This is a starter kit to deploy a modularizable RAG locally or on the cloud (or across multiple clouds)
+
+## Features
+
+- A configurable RAG setup based around Langchain
+- `RAG` and `RagConfig` python classes to help you set things up
+- A REST API based on FastAPI to provide easy access to the RAG as a web backend
+- A demo Streamlit app to serve as a basic working frontend (not production grade)
+- A document loader for the RAG
+- User authentication (insecure for now, but usable for conversation history)
+- User feedback collection
+- Streamed responses
+
+## Quickstart
+
+In a fresh env:
+```shell
+pip install -r requirements.txt
+```
+
+You will need to set some env vars, either in a .env file at the project root, or just by exporting them like so:
+```shell
+export OPENAI_API_KEY="xxx" # API key used to query the LLM
+export EMBEDDING_API_KEY="xxx" # API key used to query the embedding model
+export DATABASE_URL="sqlite:///$(pwd)/database/db.sqlite3" # For local development only. You will need a real, cloud-based SQL database URL for prod.
+```
+
+Start the backend server locally
+```shell
+uvicorn backend.main:app
+```
+
+Start the frontend demo
+```shell
+streamlit run frontend/app.py
+```
+
+You should then be able to log in and chat with the bot:
+![](login_and_chat.gif)
+
+
+## Architecture
+
+### The `frontend`, the `backend`, and the `database`
+
+The whole goal of this repo is to decouple the "computing and LLM querying" part from the "rendering a user interface" part. We do this with a typical 3-tier architecture.
+
+![](3t_architecture.png)
+
+- The [frontend](frontend) is the end-user-facing part. It reaches out to the backend **ONLY** through the REST API. We provide a frontend demo here for convenience, but ultimately it could live in a completely different repo, and be written in a completely different language.
+- The [backend](backend) provides a REST API to abstract RAG functionalities. It handles calls to LLMs, tracks conversations and users, handles the state management using a db, and much more. To get the gist of the backend, look at the documentation of the API at http://0.0.0.0:8000/docs (see the snippet below).
+- The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8)
+
+The structure of the repo mirrors this architecture.
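+
+As a concrete illustration of the backend point above, the sketch below lists the REST endpoints by reading the OpenAPI schema that FastAPI exposes by default. It assumes the backend is running locally on port 8000 and uses the `requests` library, which may not be in the starter kit's requirements:
+
+```python
+import requests
+
+# FastAPI serves its OpenAPI schema at /openapi.json by default.
+schema = requests.get("http://0.0.0.0:8000/openapi.json").json()
+
+# Print every route the backend exposes, e.g. /chat/{chat_id}/user_message.
+for path, methods in schema["paths"].items():
+    print(sorted(methods.keys()), path)
+```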
+
+### The RAG
+
+![](rag_architecture.png)
+
+In the `backend` folder of the repository, you will find a `rag_components` directory that implements this architecture.
diff --git a/docs/loading_documents.md b/docs/loading_documents.md
new file mode 100644
index 0000000..16ad095
--- /dev/null
+++ b/docs/loading_documents.md
@@ -0,0 +1,29 @@
+## Loading documents
+
+The easiest but least flexible way to load documents into your RAG is to use the `RAG.load_file` method. It will semi-intelligently try to pick the best Langchain loader and parameters for your file.
+
+```python
+from pathlib import Path
+
+from backend.rag_components.rag import RAG
+
+
+data_directory = Path("data")
+
+config_directory = Path("backend/config.yaml")
+rag = RAG(config_directory)
+
+for file in data_directory.iterdir():
+ if file.is_file():
+ rag.load_file(file)
+```
+
+If you want more flexibility, you can use the `rag.load_documents` method, which expects a list of Langchain `Document` objects (`langchain.docstore.document.Document`).
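+
+Below is a minimal sketch of this approach. Only the `RAG` class and its `load_documents` method come from the starter kit; the documents and their metadata are made up for illustration:
+
+```python
+from pathlib import Path
+
+from langchain.docstore.document import Document
+
+from backend.rag_components.rag import RAG
+
+config_directory = Path("backend/config.yaml")
+rag = RAG(config_directory)
+
+# Build Langchain Document objects by hand instead of relying on a file loader.
+documents = [
+    Document(
+        page_content="Artefact is a consulting company specialized in data.",
+        metadata={"source": "about.txt"},
+    ),
+    Document(
+        page_content="The skaff-rag-accelerator is a starter kit to deploy a RAG.",
+        metadata={"source": "readme.txt"},
+    ),
+]
+
+rag.load_documents(documents)
+```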
+
+## Document indexing
+
+The document loader maintains an index of the loaded documents. You can control it through the `vector_store.insertion_mode` setting in your RAG configuration, which accepts `None`, `incremental`, or `full`.
+
+[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing)
\ No newline at end of file
diff --git a/docs/rag_architecture.png b/docs/rag_architecture.png
new file mode 100644
index 0000000..d5a9596
Binary files /dev/null and b/docs/rag_architecture.png differ
diff --git a/docs/rag_config.md b/docs/rag_config.md
new file mode 100644
index 0000000..12480c5
--- /dev/null
+++ b/docs/rag_config.md
@@ -0,0 +1,109 @@
+## Architecture
+
+The overall config management works like this:
+![](config_architecture.png)
+
+This means the best way to configure your RAG deployment is to modify the `config.yaml` file. You can find configuration examples in the [Config Cookbook](cookbook.md).
+
+This file is then loaded to instantiate a `RAGConfig` object which is used by the `RAG` class.
+
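+Conceptually, loading the configuration boils down to substituting the `{{ VAR }}` placeholders with environment variables and parsing the resulting YAML into the config object. The sketch below only illustrates that mechanism; the actual loading logic lives in the backend and may use a different templating approach:
+
+```python
+import os
+from pathlib import Path
+
+import yaml
+
+raw_config = Path("backend/config.yaml").read_text()
+
+# Replace {{ VAR }} placeholders with the corresponding environment variables.
+# The referenced variables (e.g. OPENAI_API_KEY, DATABASE_URL) must be exported.
+for key, value in os.environ.items():
+    raw_config = raw_config.replace("{{ " + key + " }}", value)
+
+config = yaml.safe_load(raw_config)
+print(config["LLMConfig"]["source"])  # e.g. AzureChatOpenAI
+```
+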
+## Changing the configuration
+
+In the default configuration template ([`backend/config.yaml`](backend/config.yaml)) you will find this:
+
+```yaml
+# This is the LLM configuration (&LLMConfig is a yaml anchor to reference this block further down in the conf file)
+LLMConfig: &LLMConfig
+
+  # By default we're using a GPT model deployed on Azure.
+  # You should be able to use any langchain BaseChatModel listed here:
+  # https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/__init__.py
+  source: AzureChatOpenAI
+
+  # This is a key-value map of the parameters that will be passed to the
+  # langchain chat model object when it's created.
+  # Looking at the AzureChatOpenAI source code (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azure_openai.py),
+  # we input the following params:
+  source_config:
+    openai_api_type: azure
+    openai_api_key: {{ OPENAI_API_KEY }}
+    openai_api_base: https://poc-genai-gpt4.openai.azure.com/
+    openai_api_version: 2023-07-01-preview
+    deployment_name: gpt4v
+
+  # While the params in source_config are specific to each model,
+  # temperature is implemented by all BaseChatModel classes in langchain.
+  # We input it outside the source_config block.
+  temperature: 0.1
+
+# ... Rest of the config ...
+```
+
+Let's say you want to use gemini-pro on Vertex AI instead. [This is the source code of this model in langchain](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/vertexai.py#L206C7-L206C19):
+
+```python
+class ChatVertexAI(_VertexAICommon, BaseChatModel):
+ """`Vertex AI` Chat large language models API."""
+
+ model_name: str = "chat-bison"
+ "Underlying model name."
+ examples: Optional[List[BaseMessage]] = None
+
+ ...
+```
+
+The `config.yaml` could look like this:
+```yaml
+LLMConfig: &LLMConfig
+ source: ChatVertexAI
+ source_config:
+ model_name: gemini-pro
+ temperature: 0.1
+```
+
+Now, the next time you start the API, it will call gemini-pro instead of OpenAI.
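+
+To make the mapping concrete: `source` selects the Langchain chat model class, and the keys under `source_config` (plus `temperature`) are passed to its constructor. The snippet below only illustrates that mapping, it is not the starter kit's actual factory code, and it needs valid GCP credentials to run:
+
+```python
+from langchain_community.chat_models import ChatVertexAI
+
+# Equivalent of the config above: source -> class, source_config + temperature -> kwargs.
+llm = ChatVertexAI(model_name="gemini-pro", temperature=0.1)
+print(llm.invoke("Hello!"))
+```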
+
+## Extending the configuration
+
+As you tune this starter kit to your needs, you may need to add specific configuration that your RAG will use.
+
+For example, let's say you want to add the `foo` configuration parameter to your vector store configuration.
+
+First, add it to `config.py` in the part relevant to the vector store:
+
+```python
+# ...
+
+@dataclass
+class VectorStoreConfig:
+ # ... rest of the VectorStoreConfig ...
+
+ foo: str = "bar" # We add foo param, of type str, with the default value "bar"
+
+# ...
+```
+
+This parameter will now be available in your `RAG` object configuration.
+
+```python
+from pathlib import Path
+from backend.rag_components.rag import RAG
+
+config_directory = Path("backend/config.yaml")
+rag = RAG(config_directory)
+
+print(rag.config.vector_store.foo)
+# > bar
+```
+
+If you want to override its default value, you can do that in your `config.yaml`:
+```yaml
+VectorStoreConfig: &VectorStoreConfig
+ # ... rest of the VectorStoreConfig ...
+ foo: baz
+```
+
+```python
+print(rag.config.vector_store.foo)
+# > baz
+```
diff --git a/docs/rag_object.md b/docs/rag_object.md
new file mode 100644
index 0000000..e6955d5
--- /dev/null
+++ b/docs/rag_object.md
@@ -0,0 +1,31 @@
+The `RAG` class provides a single, unified interface to the RAG's functionalities.
+
+## Using the `RAG` class directly
+```python
+from pathlib import Path
+from backend.rag_components.rag import RAG
+from backend.model import Message
+
+config_directory = Path("backend/config.yaml")
+rag = RAG(config_directory)
+
+message = Message(
+ id="123",
+ timestamp="2021-06-01T12:00:00",
+ chat_id="123",
+ sender="user",
+ content="Hello, how are you?",
+)
+response = rag.generate_response(message)
+print(response)
+```
+
+## Usage in the API
+
+Out of the box, a `RAG` object is created from your configuration and used by the `/chat/{chat_id}/user_message` endpoint in [`backend/main.py`](backend/main.py).
+
+The `RAG` class initializes the key components (language model, embeddings, vector store) and generates responses to user messages using an answer chain.
+
+It also manages document loading and indexing based on configuration settings.
+
+
diff --git a/docs/recipe_databases_configs.md b/docs/recipe_databases_configs.md
new file mode 100644
index 0000000..8ba1f15
--- /dev/null
+++ b/docs/recipe_databases_configs.md
@@ -0,0 +1,42 @@
+The database config is the "easiest" as it only requires a database URL.
+
+So far, `sqlite`, `mysql`, and `postgresql` are supported.
+
+CloudSQL on GCP, RDS on AWS, or Azure Database will allow you to deploy `mysql` and `postgresql` database instances.
+
+!!! warning
+    If using `mysql` or `postgresql`, you will also need to create a database, typically named `rag`, to be able to use it.
+
+    You will also need to create a user and get its password. Make sure there are no special characters in the password.
+
+
+As the database URL contains a username and password, we don't want it to appear directly in the `config.yaml`.
+
+Instead, we have:
+```yaml
+DatabaseConfig: &DatabaseConfig
+ database_url: {{ DATABASE_URL }}
+```
+
+And `DATABASE_URL` is coming from an environment variable.
+
+The connection strings are formatted as follows:
+
+- **SQLite:** `sqlite:///database/rag.sqlite3`
+```shell
+export DATABASE_URL=sqlite:///database/rag.sqlite3
+```
+
+- **mySQL:** `mysql://<username>:<password>@<host>:<port>/rag`
+```shell
+# The typical port is 3306 for mySQL
+export DATABASE_URL=mysql://username:abcdef12345@123.45.67.89:3306/rag
+```
+
+- **postgreSQL:** `postgresql://<username>:<password>@<host>:<port>/rag`
+```shell
+# The typical port is 5432 for postgreSQL
+export DATABASE_URL=postgresql://username:abcdef12345@123.45.67.89:5432/rag
+```
+
+When first testing the RAG locally, `sqlite` is the best option since it requires no setup: the database is just a file on your machine. However, if you're working as part of a team or looking to industrialize, you will need to deploy a `mysql` or `postgresql` instance.
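+
+The URL formats above are SQLAlchemy-style connection strings, so a quick way to check that your `DATABASE_URL` is valid before starting the backend is a small connectivity test. The snippet below is only a sanity check under that assumption; it is not part of the starter kit:
+
+```python
+import os
+
+from sqlalchemy import create_engine, text
+
+# Assumes DATABASE_URL has been exported as shown above.
+engine = create_engine(os.environ["DATABASE_URL"])
+
+with engine.connect() as connection:
+    # A trivial query: if this prints 1, the database is reachable.
+    print(connection.execute(text("SELECT 1")).scalar())
+```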
diff --git a/docs/recipe_embedding_models_configs.md b/docs/recipe_embedding_models_configs.md
new file mode 100644
index 0000000..0e607dc
--- /dev/null
+++ b/docs/recipe_embedding_models_configs.md
@@ -0,0 +1,12 @@
+## Artefact Azure-hosted embedding model
+
+```yaml
+EmbeddingModelConfig: &EmbeddingModelConfig
+ source: OpenAIEmbeddings
+ source_config:
+ openai_api_type: azure
+ openai_api_key: {{ EMBEDDING_API_KEY }}
+ openai_api_base: https://poc-openai-artefact.openai.azure.com/
+ deployment: embeddings
+ chunk_size: 500
+```
\ No newline at end of file
diff --git a/docs/recipe_llms_configs.md b/docs/recipe_llms_configs.md
new file mode 100644
index 0000000..06e8140
--- /dev/null
+++ b/docs/recipe_llms_configs.md
@@ -0,0 +1,32 @@
+## Artefact Azure-hosted GPT4-turbo
+
+```yaml
+LLMConfig: &LLMConfig
+ source: AzureChatOpenAI
+ source_config:
+ openai_api_type: azure
+ openai_api_key: {{ OPENAI_API_KEY }}
+ openai_api_base: https://genai-ds.openai.azure.com/
+ openai_api_version: 2023-07-01-preview
+ deployment_name: gpt4
+ temperature: 0.1
+```
+
+## Vertex AI gemini-pro
+
+!!! info "login to GCP"
+
+ ```shell
+ export PROJECT_ID=
+ gcloud config set project $PROJECT_ID
+ gcloud auth login
+ gcloud auth application-default login
+ ```
+
+```yaml
+LLMConfig: &LLMConfig
+ source: ChatVertexAI
+ source_config:
+ model_name: gemini-pro
+ temperature: 0.1
+```
\ No newline at end of file
diff --git a/docs/recipe_vector_stores_configs.md b/docs/recipe_vector_stores_configs.md
new file mode 100644
index 0000000..0945174
--- /dev/null
+++ b/docs/recipe_vector_stores_configs.md
@@ -0,0 +1,27 @@
+## Chroma
+
+```yaml
+VectorStoreConfig: &VectorStoreConfig
+ source: Chroma
+ source_config:
+ persist_directory: vector_database/
+ collection_metadata:
+ hnsw:space: cosine
+
+ retreiver_search_type: similarity
+ retreiver_config:
+ top_k: 20
+ score_threshold: 0.5
+
+ insertion_mode: full
+```
+
+`persist_directory`: where the Chroma database will be persisted locally.
+
+`hnsw:space: cosine`: [distance function used. Default is `l2`.](https://docs.trychroma.com/usage-guide#changing-the-distance-function) Cosine is bounded [0; 1], making it easier to set a score threshold for retrieval.
+
+`top_k`: maximum number of documents to fetch.
+
+`score_threshold`: score below which a document is deemed irrelevant and not fetched.
+
+`insertion_mode`: `null` | `full` | `incremental`. [How document insertion in the vector store is handled.](https://python.langchain.com/docs/modules/data_connection/indexing#deletion-modes)
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..49a75e0
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,24 @@
+# This is only used for documentation generation in Roadie. It is not part of the starter kit itself.
+
+site_name: GenAI RAG Accelerator
+repo_name: artefactory/skaff-rag-accelerator
+repo_url: https://github.com/artefactory/skaff-rag-accelerator
+
+plugins:
+ - techdocs-core
+ - search
+ - autorefs
+
+markdown_extensions:
+ - pymdownx.snippets:
+
+nav:
+ - Home: index.md
+ - Loading Docs in the RAG: loading_documents.md
+ - The RAG object: rag_object.md
+ - The RAGConfig object: rag_config.md
+ - Config cookbook:
+ - LLMs: recipe_llms_configs.md
+ - Vector Stores: recipe_vector_stores_configs.md
+ - Embedding Models: recipe_embedding_models_configs.md
+ - Databases: recipe_databases_configs.md
\ No newline at end of file