diff --git a/README.md b/README.md index 2279537..beefb9c 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,5 @@ -
- # skaff-rag-accelerator -
This is a starter kit to deploy a modularizable RAG locally or on the cloud (or across multiple clouds) @@ -45,124 +42,6 @@ You should than be able to login and chat to the bot: ![](docs/login_and_chat.gif) -## Loading documents - -The easiest but least flexible way to load documents to your RAG is to use the `RAG.load_file` method. It will semi-intellignetly try to pick the best Langchain loader and parameters for your file. - -Create `backend/load_my_docs.py`: -```python -from pathlib import Path - -from backend.rag_components.rag import RAG - - -data_directory = Path("data") - -config_directory = Path("backend/config.yaml") -rag = RAG(config_directory) - -for file in data_directory.iterdir(): - if file.is_file(): - rag.load_file(file) -``` - -If you want more flexibility, you can use the `rag.load_documents` method which expects a list of `langchain.docstore.document` objects. - -**TODO: example** - -#### Document indexing - -The document loader maintains an index of the loaded documents. You can change it in the configuration of your RAG at `vector_store.insertion_mode` to `None`, `incremental`, or `full`. - -[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing) - -## Configuring the RAG - -### The `RAG` object - -It provides a unique interface to the RAG's functionalities. - -Out of the box, A RAG object is created from your configuration and used by the `/chat/{chat_id}/user_message` endpoint in [`backend/main.py`](backend/main.py) - -The RAG class initializes key components (language model, embeddings, vector store), and generates responses to user messages using an answer chain. - -It also manages document loading and indexing based on configuration settings. - - -Using the `RAG` class directly: -```python -from pathlib import Path -from backend.rag_components.rag import RAG -from backend.model import Message - -config_directory = Path("backend/config.yaml") -rag = RAG(config_directory) - -message = Message( - id="123", - timestamp="2021-06-01T12:00:00", - chat_id="123", - sender="user", - content="Hello, how are you?", -) -response = rag.generate_response(message) -print(response) -``` - -[Go to the code.](backend/rag_components/rag.py) - -### Managing the configuration (`RAGConfig`) - -The overall config management works like this: -![](docs/config_architecture.png) - -This means the best way to configure your RAG deployment is to modify the config.yaml file. - -This file is then loaded to instanciate a `RAGConfig` object which is used by the `RAG` class. - -In the default configuration template ([`backend/config.yaml`](backend/config.yaml)) you will find this: -```yaml -# This is the LLM configuration (&LLMConfig is a yaml anchor to reference this block further down in the conf file) -LLMConfig: &LLMConfig - - # By default we're using a GPT model deployed on Azure. You should be able to change this to any langchain BaseChatModel here: https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/__init__.py - source: "AzureChatOpenAI" - - # This is a key-value map of the parameters that will be passed to the langchain chat model object when it's created. 
Looking at the AzureChatOpenAI source code (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azure_openai.py), we input the following params: - source_config: - openai_api_type: "azure" - openai_api_key: {{ OPENAI_API_KEY }} - openai_api_base: "https://poc-genai-gpt4.openai.azure.com/" - openai_api_version: "2023-07-01-preview" - deployment_name: "gpt4v" - - # While the params in source_config are specific to each model, temperature is implemented by all BaseChatModel classes in langchain. - temperature: 0.1 - -# ... Rest of the config ... -``` - -Let's say we want to use a Vertex LLM instead. [Looking at the source code of this model in langchain](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/vertexai.py#L206C7-L206C19), we find this: - -```python -class ChatVertexAI(_VertexAICommon, BaseChatModel): - """`Vertex AI` Chat large language models API.""" - - model_name: str = "chat-bison" - "Underlying model name." - examples: Optional[List[BaseMessage]] = None -``` - -Updated `config.yaml` could look like this: -```yaml -LLMConfig: &LLMConfig - source: "ChatVertexAI" - source_config: - model_name: gemini-pro - temperature: 0.1 -``` - - ## Architecture ### The `frontend`, the `backend`, and the `database` @@ -176,53 +55,10 @@ The whole goal of this repo is to decouple the "computing and LLM querying" part - The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8) -## Going further - -### Extending the configuration - -As you tune this starter kit to your needs, you may need to add specific configuration that your RAG will use. - -For example, let's say you want to add the `foo` configuration parameter to your vector store configuration. - -First, add it to `config.py` in the part relavant to the vector store: +## Documentation -```python -# ... - -@dataclass -class VectorStoreConfig: - # ... rest of the VectorStoreConfig ... - - foo: str = "bar" # We add foo param, of type str, with the default value "bar" - -# ... -``` - -This parameter will now be available in your `RAG` object configuration. - -```python -from pathlib import Path -from backend.rag_components.rag import RAG - -config_directory = Path("backend/config.yaml") -rag = RAG(config_directory) - -print(rag.config.vector_store.foo) -# > bar -``` - -if you want to override its default value. You can do that in your `config.yaml`: -```yaml -VectorStoreConfig: &VectorStoreConfig - # ... rest of the VectorStoreConfig ... - foo: baz -``` - -```python -print(rag.config.vector_store.foo) -# > baz +To dive deeper under the hood, take a look at the documentation: +```shell +mkdocs serve ``` - -### Using `RagConfig` directly - -TODO: Add usage example here \ No newline at end of file +Then go to http://localhost:8000/ diff --git a/docs/cookbook.md b/docs/cookbook.md new file mode 100644 index 0000000..23f493e --- /dev/null +++ b/docs/cookbook.md @@ -0,0 +1 @@ +Here you will find a repository of configurations that proved to work.
\ No newline at end of file diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..f64670e --- /dev/null +++ b/docs/index.md @@ -0,0 +1,62 @@ +# skaff-rag-accelerator + +This is a starter kit to deploy a modularizable RAG locally or on the cloud (or across multiple clouds) + +## Features + +- A configurable RAG setup based around Langchain +- `RAG` and `RagConfig` Python classes to help you set things up +- A REST API based on FastAPI to provide easy access to the RAG as a web backend +- A demo Streamlit app to serve as a basic working frontend (not production grade) +- A document loader for the RAG +- User authentication (insecure for now, but usable for conversation history) +- User feedback collection +- Streamed responses + +## Quickstart + +In a fresh env: +```shell +pip install -r requirements.txt +``` + +You will need to set some env vars, either in a .env file at the project root, or just by exporting them like so: +```shell +export OPENAI_API_KEY="xxx" # API key used to query the LLM +export EMBEDDING_API_KEY="xxx" # API key used to query the embedding model +export DATABASE_URL="sqlite:///$(pwd)/database/db.sqlite3" # For local development only. You will need a real, cloud-based SQL database URL for prod. +``` + +Start the backend server locally +```shell +uvicorn backend.main:app +``` + +Start the frontend demo +```shell +streamlit run frontend/app.py +``` + +You should then be able to log in and chat with the bot: +![](login_and_chat.gif) + + +## Architecture + +### The `frontend`, the `backend`, and the `database` + +The whole goal of this repo is to decouple the "computing and LLM querying" part from the "rendering a user interface" part. We do this with a typical 3-tier architecture. + +![](3t_architecture.png) + +- The [frontend](frontend) is the end user facing part. It reaches out to the backend **ONLY** through the REST API. We provide a frontend demo here for convenience, but ultimately it could live in a completely different repo, and be written in a completely different language. +- The [backend](backend) provides a REST API to abstract RAG functionalities. It handles calls to LLMs, tracks conversations and users, handles the state management using a db, and much more. To get the gist of the backend, look at the documentation of the API: http://0.0.0.0:8000/docs +- The [database](database) is only accessed by the backend and persists the state of the RAG application. [Explore the data model here.](https://dbdiagram.io/d/RAGAAS-63dbdcc6296d97641d7e07c8) + +The structure of the repo mirrors this architecture. + +### The RAG + +![](rag_architecture.png) + +In the `backend` folder of the repository, you will find a `rag_components` directory that implements this architecture. diff --git a/docs/loading_documents.md b/docs/loading_documents.md new file mode 100644 index 0000000..16ad095 --- /dev/null +++ b/docs/loading_documents.md @@ -0,0 +1,29 @@ +## Loading documents + +The easiest but least flexible way to load documents to your RAG is to use the `RAG.load_file` method. It will semi-intelligently try to pick the best Langchain loader and parameters for your file.
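+If you need more flexibility than `load_file` gives you, the `rag.load_documents` method described further down this page lets you build the Langchain `Document` objects yourself and hand them over. Here is a minimal sketch of that flow; the file path, chunking strategy, and metadata fields are illustrative assumptions, only the `load_documents` call itself comes from this page:
+
+```python
+from pathlib import Path
+
+from langchain.docstore.document import Document
+
+from backend.rag_components.rag import RAG
+
+rag = RAG(Path("backend/config.yaml"))
+
+# Build the documents yourself, e.g. one per paragraph of a text file,
+# attaching whatever metadata your retriever should be able to filter on.
+raw_text = Path("data/my_knowledge_base.txt").read_text()  # hypothetical file
+documents = [
+    Document(page_content=chunk, metadata={"source": "my_knowledge_base.txt", "chunk": i})
+    for i, chunk in enumerate(raw_text.split("\n\n"))
+    if chunk.strip()
+]
+
+rag.load_documents(documents)
+```
+
+For comparison, the simpler `load_file` loop looks like this: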
+ +```python +from pathlib import Path + +from backend.rag_components.rag import RAG + + +data_directory = Path("data") + +config_directory = Path("backend/config.yaml") +rag = RAG(config_directory) + +for file in data_directory.iterdir(): + if file.is_file(): + rag.load_file(file) +``` + +If you want more flexibility, you can use the `rag.load_documents` method which expects a list of `langchain.docstore.document.Document` objects. + +(See the `rag.load_documents` sketch near the top of this page, just before the `load_file` loop.) + +## Document indexing + +The document loader maintains an index of the loaded documents. You can change it by setting `vector_store.insertion_mode` in your RAG configuration to `null`, `incremental`, or `full`. + +[Details of what that means here.](https://python.langchain.com/docs/modules/data_connection/indexing) \ No newline at end of file diff --git a/docs/rag_architecture.png b/docs/rag_architecture.png new file mode 100644 index 0000000..d5a9596 Binary files /dev/null and b/docs/rag_architecture.png differ diff --git a/docs/rag_config.md b/docs/rag_config.md new file mode 100644 index 0000000..12480c5 --- /dev/null +++ b/docs/rag_config.md @@ -0,0 +1,109 @@ +## Architecture + +The overall config management works like this: +![](config_architecture.png) + +This means the best way to configure your RAG deployment is to modify the `config.yaml` file. You can find configuration examples in the [Config Cookbook](cookbook.md). + +This file is then loaded to instantiate a `RAGConfig` object which is used by the `RAG` class. + +## Changing the configuration + +In the default configuration template ([`backend/config.yaml`](backend/config.yaml)) you will find this: + +```yaml +# This is the LLM configuration (&LLMConfig is a yaml anchor to reference this block further down in the conf file) +LLMConfig: &LLMConfig + +# By default we're using a GPT model deployed on Azure. +# You should be able to use any langchain BaseChatModel listed here: +# https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/__init__.py +source: AzureChatOpenAI + +# This is a key-value map of the parameters that will be passed to the +# langchain chat model object when it's created. +# Looking at the AzureChatOpenAI source code (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azure_openai.py), +# we input the following params: +source_config: + openai_api_type: azure + openai_api_key: {{ OPENAI_API_KEY }} + openai_api_base: https://poc-genai-gpt4.openai.azure.com/ + openai_api_version: 2023-07-01-preview + deployment_name: gpt4v + +# While the params in source_config are specific to each model, +# temperature is implemented by all BaseChatModel classes in langchain. +# We input it outside the source_config block +temperature: 0.1 + +# ... Rest of the config ... +``` + +Let's say you want to use gemini-pro on Vertex AI instead. [This is the source code of this model in langchain](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/vertexai.py#L206C7-L206C19): + +```python +class ChatVertexAI(_VertexAICommon, BaseChatModel): + """`Vertex AI` Chat large language models API.""" + + model_name: str = "chat-bison" + "Underlying model name." + examples: Optional[List[BaseMessage]] = None + + ...
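+    # (doc note, not part of the langchain source: constructor fields like
+    # `model_name` above are exactly what the keys in `source_config` map to;
+    # they are passed to the chat model object when it is instantiated.)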
+``` + +The `config.yaml` could look like this: +```yaml +LLMConfig: &LLMConfig + source: ChatVertexAI + source_config: + model_name: gemini-pro + temperature: 0.1 +``` + +Now, the next time you start the API, it will call gemini-pro instead of the Azure OpenAI deployment. + +## Extending the configuration + +As you tune this starter kit to your needs, you may need to add specific configuration that your RAG will use. + +For example, let's say you want to add the `foo` configuration parameter to your vector store configuration. + +First, add it to `config.py` in the part relevant to the vector store: + +```python +# ... + +@dataclass +class VectorStoreConfig: + # ... rest of the VectorStoreConfig ... + + foo: str = "bar" # We add foo param, of type str, with the default value "bar" + +# ... +``` + +This parameter will now be available in your `RAG` object configuration. + +```python +from pathlib import Path +from backend.rag_components.rag import RAG + +config_directory = Path("backend/config.yaml") +rag = RAG(config_directory) + +print(rag.config.vector_store.foo) +# > bar +``` + +If you want to override its default value, you can do that in your `config.yaml`: +```yaml +VectorStoreConfig: &VectorStoreConfig + # ... rest of the VectorStoreConfig ... + foo: baz +``` + +```python +print(rag.config.vector_store.foo) +# > baz +``` diff --git a/docs/rag_object.md b/docs/rag_object.md new file mode 100644 index 0000000..e6955d5 --- /dev/null +++ b/docs/rag_object.md @@ -0,0 +1,31 @@ +The `RAG` class provides a unified interface to the RAG's functionalities. + +## Using the `RAG` class directly +```python +from pathlib import Path +from backend.rag_components.rag import RAG +from backend.model import Message + +config_directory = Path("backend/config.yaml") +rag = RAG(config_directory) + +message = Message( + id="123", + timestamp="2021-06-01T12:00:00", + chat_id="123", + sender="user", + content="Hello, how are you?", +) +response = rag.generate_response(message) +print(response) +``` + +## Usage in the API + +Out of the box, a `RAG` object is created from your configuration and used by the `/chat/{chat_id}/user_message` endpoint in [`backend/main.py`](backend/main.py). + +The RAG class initializes key components (language model, embeddings, vector store) and generates responses to user messages using an answer chain. + +It also manages document loading and indexing based on configuration settings. + + diff --git a/docs/recipe_databases_configs.md b/docs/recipe_databases_configs.md new file mode 100644 index 0000000..8ba1f15 --- /dev/null +++ b/docs/recipe_databases_configs.md @@ -0,0 +1,42 @@ +The database config is the "easiest" as it only requires a database URL. + +So far, `sqlite`, `mysql`, and `postgresql` are supported. + +CloudSQL on GCP, RDS on AWS, or Azure Database will allow you to deploy `mysql` and `postgresql` database instances. + +!!! warning + If using `mysql` or `postgresql`, you will also need to create a database, typically named `rag`, to be able to use it. + + You will also need to create a user and get its password. Make sure there are no special characters in the password. + + +As the database URL contains a username and password, we don't want to have it directly in the `config.yaml`. + +Instead, we have: +```yaml +DatabaseConfig: &DatabaseConfig + database_url: {{ DATABASE_URL }} +``` + +`DATABASE_URL` itself comes from an environment variable.
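+How the `{{ DATABASE_URL }}` placeholder gets filled in is up to the backend's config loading code. Here is a minimal sketch of the idea, assuming a simple substitution of `{{ VAR }}` placeholders with environment variables before the YAML is parsed (the actual loader in the backend may differ):
+
+```python
+import os
+import re
+from pathlib import Path
+
+import yaml  # PyYAML
+
+
+def render_config(path: Path) -> dict:
+    """Replace {{ ENV_VAR }} placeholders with environment values, then parse the YAML."""
+    raw = path.read_text()
+    rendered = re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: os.environ.get(m.group(1), ""), raw)
+    return yaml.safe_load(rendered)
+
+
+config = render_config(Path("backend/config.yaml"))
+print(config["DatabaseConfig"]["database_url"])  # prints whatever DATABASE_URL is set to
+```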
+ +The connection strings are formatted as follows: + +- **SQLite:** `sqlite:///database/rag.sqlite3` +```shell +export DATABASE_URL=sqlite:///database/rag.sqlite3 +``` + +- **MySQL:** `mysql://<username>:<password>@<host>:<port>/rag` +```shell +# The typical port is 3306 for MySQL +export DATABASE_URL=mysql://username:abcdef12345@123.45.67.89:3306/rag +``` + +- **PostgreSQL:** `postgresql://<username>:<password>@<host>:<port>/rag` +```shell +# The typical port is 5432 for PostgreSQL +export DATABASE_URL=postgresql://username:abcdef12345@123.45.67.89:5432/rag +``` + +When first testing the RAG locally, `sqlite` is the best option since it requires no setup: the database is just a file on your machine. However, if you're working as part of a team, or looking to industrialize, you will need to deploy a `mysql` or `postgresql` instance. diff --git a/docs/recipe_embedding_models_configs.md b/docs/recipe_embedding_models_configs.md new file mode 100644 index 0000000..0e607dc --- /dev/null +++ b/docs/recipe_embedding_models_configs.md @@ -0,0 +1,12 @@ +## Artefact Azure-hosted embedding model + +```yaml +EmbeddingModelConfig: &EmbeddingModelConfig + source: OpenAIEmbeddings + source_config: + openai_api_type: azure + openai_api_key: {{ EMBEDDING_API_KEY }} + openai_api_base: https://poc-openai-artefact.openai.azure.com/ + deployment: embeddings + chunk_size: 500 +``` \ No newline at end of file diff --git a/docs/recipe_llms_configs.md b/docs/recipe_llms_configs.md new file mode 100644 index 0000000..06e8140 --- /dev/null +++ b/docs/recipe_llms_configs.md @@ -0,0 +1,32 @@ +## Artefact Azure-hosted GPT4-turbo + +```yaml +LLMConfig: &LLMConfig + source: AzureChatOpenAI + source_config: + openai_api_type: azure + openai_api_key: {{ OPENAI_API_KEY }} + openai_api_base: https://genai-ds.openai.azure.com/ + openai_api_version: 2023-07-01-preview + deployment_name: gpt4 + temperature: 0.1 +``` + +## Vertex AI gemini-pro + +!!! info "log in to GCP" + + ```shell + export PROJECT_ID=<your-gcp-project-id> + gcloud config set project $PROJECT_ID + gcloud auth login + gcloud auth application-default login + ``` + +```yaml +LLMConfig: &LLMConfig + source: ChatVertexAI + source_config: + model_name: gemini-pro + temperature: 0.1 +``` \ No newline at end of file diff --git a/docs/recipe_vector_stores_configs.md b/docs/recipe_vector_stores_configs.md new file mode 100644 index 0000000..0945174 --- /dev/null +++ b/docs/recipe_vector_stores_configs.md @@ -0,0 +1,27 @@ +## Chroma + +```yaml +VectorStoreConfig: &VectorStoreConfig + source: Chroma + source_config: + persist_directory: vector_database/ + collection_metadata: + hnsw:space: cosine + + retreiver_search_type: similarity + retreiver_config: + top_k: 20 + score_threshold: 0.5 + + insertion_mode: full +``` + +`persist_directory`: where the Chroma database will be persisted locally. + +`hnsw:space: cosine`: [distance function used. Default is `l2`.](https://docs.trychroma.com/usage-guide#changing-the-distance-function) Cosine is bounded [0; 1], making it easier to set a score threshold for retrieval. + +`top_k`: maximum number of documents to fetch. + +`score_threshold`: score below which a document is deemed irrelevant and not fetched. + +`insertion_mode`: `null` | `full` | `incremental`. [How document insertion in the vector store is handled.](https://python.langchain.com/docs/modules/data_connection/indexing#deletion-modes) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..49a75e0 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,24 @@ +# This is only used for documentation generation in Roadie.
It is not part of the vertex starter kit. + +site_name: GenAI RAG Accelerator +repo_name: artefactory/skaff-rag-accelerator +repo_url: https://github.com/artefactory/skaff-rag-accelerator + +plugins: + - techdocs-core + - search + - autorefs + +markdown_extensions: + - pymdownx.snippets: + +nav: + - Home: index.md + - Loading Docs in the RAG: loading_documents.md + - The RAG object: rag_object.md + - The RAGConfig object: rag_config.md + - Config cookbook: + - LLMs: recipe_llms_configs.md + - Vector Stores: recipe_vector_stores_configs.md + - Embedding Models: recipe_embedding_models_configs.md + - Databases: recipe_databases_configs.md \ No newline at end of file
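+
+# To preview these docs locally you need mkdocs and the plugins above installed.
+# The package names below are assumptions inferred from the plugin ids; check the
+# project's requirements if they pin different ones:
+#   pip install mkdocs mkdocs-techdocs-core mkdocs-autorefs pymdown-extensions
+#   mkdocs serve  # then browse to http://localhost:8000/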