diff --git a/Dockerfile b/Dockerfile
index f7203a3..0f0c93e 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -3,7 +3,7 @@ FROM registry.access.redhat.com/ubi9/ubi-minimal:latest
 
 ENV LC_ALL=C.utf8
 ENV LANG=C.utf8
-ENV PYTHONUNBUFFERED=1
+ENV PYTHONUNBUFFERED=1
 ENV PYTHONIOENCODING=UTF-8
 ENV PIP_NO_CACHE_DIR=1
 
@@ -47,8 +47,8 @@ ADD connectors ./connectors
 ADD resources ./resources
 ADD app.py .
 
-EXPOSE 5000
+EXPOSE 8000
 
 ENV PATH="$APP_ROOT/.venv/bin:$PATH"
 
-CMD ["flask", "run", "--host=0.0.0.0"]
+CMD ["flask", "run", "--host=0.0.0.0", "--port=8000"]
diff --git a/README.md b/README.md
index e8af2f2..9d75f93 100644
--- a/README.md
+++ b/README.md
@@ -1,112 +1,156 @@
-# 🍊 tangerine (backend)
+
+# 🍊 tangerine (backend)
 
 tangerine is a slim and light-weight RAG (Retrieval Augmented Generation) system used to create and manage chat bot agents. Each agent is intended to answer questions related to a set of documents known as a knowledge base (KB).
 
-It relies on 4 key components:
+- [Overview](#overview)
+- [Development Environment Setup](#development-environment-setup)
+  - [With Docker Compose (not supported on Mac)](#with-docker-compose-not-supported-on-mac)
+  - [Using huggingface text-embeddings-inference server to host embedding model (deprecated)](#using-huggingface-text-embeddings-inference-server-to-host-embedding-model-deprecated)
+  - [Without Docker Compose (supported on Mac)](#without-docker-compose-supported-on-mac)
+- [Synchronizing Documents from S3](#synchronizing-documents-from-s3)
+- [Run Tangerine Frontend Locally](#run-tangerine-frontend-locally)
+- [Available API Paths](#available-api-paths)
 
-* A vector database (PostgresQL with the pgvector extension)
-* A large language model (LLM) hosted on any OpenAI-compatible API service.
-* An embedding model hosted on any OpenAI-compatible API service.
-* (optional) An S3 bucket that you wish to sync documentation from.
+## Overview
 
-The backend service manages:
-* Creating/updating/deleting chat bot agents
-* Uploading documents to be used as context to assist the agents in answering questions
-* Document ingestion including cleanup/conversion, chunking, and embedding into the vector database.
-* Document chunk retrieval from the vector database.
-* Interfacing with the LLM to prompt it and stream responses
-* (optional) Interfacing with S3 to provide continuous document sync.
+tangerine relies on 4 key components:
 
-tangerine will work with any deployed instance of PostgresQL+pgvector and can be configured to use any OpenAI-compliant API service that is hosting a large language model or embedding model.
+- A vector database
+  - (PostgreSQL with the pgvector extension)
+- A large language model (LLM)
+  - This can be hosted on any OpenAI-compatible API service. Locally, you can use ollama.
+- An embedding model
+  - This can be hosted on any OpenAI-compatible API service. Locally, you can use ollama.
+- (optional) An S3 bucket that you wish to sync documentation from.
 
-This repository provides Open Shift templates for all infrastructure (except for the model hosting service) as well as a docker compose file that allows you to spin it up locally and use [ollama](https://ollama.com/).
+The backend service manages:
+
+- Management of chat bot "agents"
+- Document ingestion
+  - Upload via the API, or sync via an s3 bucket
+  - Text cleanup/conversion
+  - Chunking and embedding into the vector database.
+- Querying the vector database.
+- Interfacing with the LLM to prompt it and stream responses
 
+tangerine will work with any deployed instance of PostgreSQL+pgvector and can be configured to use any OpenAI-compliant API service that is hosting a large language model or embedding model. In addition, the models you wish to use and the prompts used to instruct them are fully customizable.
 
-## Getting started
+This repository provides OpenShift templates for all infrastructure (except for the model hosting service) as well as a docker compose file that allows you to spin it up locally.
 
-The project can be deployed to a local development environment using ollama to host the LLM and huggingface's [text-embeddings-inference](https://github.com/huggingface/text-embeddings-inference) server to host the embedding model.
+The accompanying frontend service is [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend) and a related plugin for [Red Hat Developer Hub](https://developers.redhat.com/rhdh/overview) can be found [here](https://github.com/RedHatInsights/backstage-plugin-ai-search-frontend)
 
+This project is currently used by Red Hat's Hybrid Cloud Management Engineering Productivity Team. It was born out of a hack-a-thon and is still a work in progress. You will find some areas of code well developed while others are in need of attention and some tweaks to make it production-ready are needed (with that said, the project *is* currently in good enough shape to provide a working chat bot system).
 
-### Local Environment Setup for Linux / Intel Macs
+## Development Environment Setup
 
-You may require further tweaks to properly make use of your GPU. Refer to the [ollama docker image documentation](https://hub.docker.com/r/ollama/ollama).
+A development environment can be set up with or without docker compose. In both cases, Ollama may be able to make use of your NVIDIA or AMD GPU (see more information about GPU support [here](https://github.com/ollama/ollama/blob/main/docs/gpu.md)). On a Mac, Ollama must be run as a standalone application outside of Docker containers since Docker Desktop does not support GPUs.
 
-1. Make sure [git-lfs](https://git-lfs.com/) is installed:
+### With Docker Compose (not supported on Mac)
 
-   * Fedora: `sudo dnf install git-lfs`
-   * MacOS: `brew install git-lfs`
+The docker compose file offers an easy way to spin up all components. [ollama](https://ollama.com) is used to host the LLM and embedding model. To make use of your GPU, refer to the comments in the compose file to see which configurations to uncomment on the `ollama` container.
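+
+For example, with an AMD GPU the `ollama` service ends up looking roughly like the sketch below once the ROCm image and device mappings are uncommented (illustrative only; the comments in [docker-compose.yml](docker-compose.yml) are the authoritative reference, including the NVIDIA variant):
+
+```yaml
+services:
+  ollama:
+    image: ollama/ollama:0.5.5-rocm
+    devices:
+      - /dev/kfd:/dev/kfd
+      - /dev/dri:/dev/dri
+```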
-   Then, activate it globally with:
+1. Create the directory which will house the local environment data:
 
    ```text
-   git lfs install
+   mkdir data
    ```
 
-2. Create the directory which will house the local environment data:
+2. Invoke docker compose (postgres data will persist in `data/postgres`):
 
    ```text
-   mkdir data
+   docker compose up --build
    ```
 
-3. Create a directory to house the embedding model and download the `snowflake-arctic-embed-m-long` model:
+3. Pull the mistral LLM and nomic embedding model (data will persist in `data/ollama`):
 
    ```text
-   mkdir data/embeddings
-   git clone https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long \
-     data/embeddings/snowflake-arctic-embed-m-long
+   docker exec tangerine-ollama ollama pull mistral
+   docker exec tangerine-ollama ollama pull nomic-embed-text
    ```
 
-4. Invoke docker compose (postgres data will persist in `data/postgres`):
+4. Access the API on port `8000`
+
+   ```sh
+   curl -XGET 127.0.0.1:8000/api/agents
+   {
+     "data": []
+   }
+   ```
+
+5. (optional) Run the [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend):
 
    ```text
+   git clone git@github.com:RedHatInsights/tangerine-frontend.git
+   cd tangerine-frontend
    docker compose up --build
    ```
 
-5. Pull the mistral LLM (data will persist in `data/ollama`):
+   You can now access the frontend at `http://localhost:3000`
+
+#### Using huggingface text-embeddings-inference server to host embedding model (deprecated)
+
+ollama previously did not provide an OpenAI-compatible API path for embedding models (i.e. `/v1/embeddings`), so huggingface's [text-embeddings-inference](https://github.com/huggingface/text-embeddings-inference) server was used to host the embedding model. If you wish to use this to test embedding models that are not supported by ollama, follow these steps:
+
+1. Make sure [git-lfs](https://git-lfs.com/) is installed:
+
+   - Fedora: `sudo dnf install git-lfs`
+   - MacOS: `brew install git-lfs`
+
+   Then, activate it globally with:
 
    ```text
-   docker exec tangerine-ollama ollama pull mistral
+   git lfs install
    ```
 
-6. The API can now be accessed on `http://localhost:5000`
+2. Create a directory in the `data` folder to house the embedding model and download it; for example, to use `nomic-embed-text-v1.5`:
+
+   ```text
+   mkdir data/embeddings
+   git clone https://huggingface.co/nomic-ai/nomic-embed-text-v1.5 \
+     data/embeddings/nomic-embed-text
+   ```
 
+3. Search for `uncomment to use huggingface text-embeddings-inference` in [./docker-compose.yml](docker-compose.yml) and uncomment all relevant lines.
 
-### Local Environment Setup for Apple Silicon Macs
+### Without Docker Compose (supported on Mac)
 
-Some of the images used in the `docker-compose.yml` are unsupported on Apple silicon. In order to develop on those systems you will need to start some of the processes manually.
 
 1. You'll need to have the following installed and working before proceeding:
 
-   * brew
-   * pipenv
-   * pyenv
-   * docker or podman
+   - `pipenv`
+   - `pyenv`
+   - `docker` or `podman`
+   - (on Mac) `brew`
 
 2. Install ollama
 
-   ```text
-   brew install ollama
-   ```
+   - visit the [ollama download page](https://ollama.com/download)
 
-2. Start ollama
+   - (on Mac) you can use brew:
 
-   ```sh
+     ```text
+     brew install ollama
+     ```
+
+3. Start ollama
+
+   ```text
    ollama serve
    ```
 
-3. Pull the language and embedding models
+4. Pull the language and embedding models
 
    ```text
    ollama pull mistral
    ollama pull nomic-embed-text
    ```
 
-4. Install the C API for Postgres (libpq)
+5. (on Mac) Install the C API for Postgres (libpq)
 
    ```sh
    brew install libpq
    ```
 
@@ -120,10 +164,10 @@ Some of the images used in the `docker-compose.yml` are unsupported on Apple sil
    export CPPFLAGS="-I/opt/homebrew/opt/libpq/include"
    ```
 
-5. Start the vector database
+6. Start the vector database
 
    ```text
-   docker run \
+   docker run -d \
    -e POSTGRES_PASSWORD="citrus" \
    -e POSTGRES_USER="citrus" \
    -e POSTGRES_DB="citrus" \
@@ -132,23 +176,20 @@ Some of the images used in the `docker-compose.yml` are unsupported on Apple sil
    pgvector/pgvector:pg16
    ```
 
-6. Prepare your python virtual environment:
+7. Prepare your python virtual environment:
 
    ```sh
    pipenv install
    pipenv shell
    ```
 
-7. Start Tangerine Backend
-
-   > [!NOTE]
-   > The default tangerine port, 5000, is already claimed by Bonjour on Macs, so we need to use a different port instead.
+8. Start Tangerine Backend
 
    ```sh
-   flask run --host=127.0.0.1 --port=8000
+   flask run
    ```
 
-   You can now communicate with the API on port `8000`
+9. Access the API on port `8000`
 
    ```sh
    curl -XGET 127.0.0.1:8000/api/agents
@@ -157,20 +198,7 @@ Some of the images used in the `docker-compose.yml` are unsupported on Apple sil
    }
    ```
 
-### Available API Paths
-
-| Path                         | Method   | Description                |
-| ---------------------------- | -------- | -------------------------- |
-| `/api/agents`                | `GET`    | Get a list of all agents   |
-| `/api/agents`                | `POST`   | Create a new agent         |
-| `/api/agents/<id>`           | `GET`    | Get an agent               |
-| `/api/agents/<id>`           | `PUT`    | Update an agent            |
-| `/api/agents/<id>`           | `DELETE` | Delete an agent            |
-| `/api/agents/<id>/chat`      | `POST`   | Chat with an agent         |
-| `/api/agents/<id>/documents` | `POST`   | Agent document uploads     |
-| `/api/agents/<id>/documents` | `DELETE` | Delete agent documents     |
-| `/api/agentDefaults`         | `GET`    | Get agent default settings |
-| `/ping`
+10. (optional) Follow steps to start the [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend)
 
 ## Synchronizing Documents from S3
 
@@ -188,10 +216,34 @@ To do so you'll need to do the following:
    export BUCKET="mybucket"
    ```
 
+   If using docker compose, store these environment variables in `.env`:
+
+   ```sh
+   echo 'AWS_ACCESS_KEY_ID=MYKEYID' >> .env
+   echo 'AWS_DEFAULT_REGION=us-east-1' >> .env
+   echo 'AWS_ENDPOINT_URL_S3=https://s3.us-east-1.amazonaws.com' >> .env
+   echo 'AWS_SECRET_ACCESS_KEY=MYACCESSKEY' >> .env
+   echo 'BUCKET=mybucket' >> .env
+   ```
+
 2. Create an `s3.yaml` file that describes your agents and the documents they should ingest. See [s3-example.yaml](s3-example.yaml) for an example.
 
+   If using docker compose, copy this config into your container:
+
+   ```text
+   docker cp s3.yaml tangerine-backend:/opt/app-root/src/s3.yaml
+   ```
+
 3. Run the S3 sync job:
 
+   - With docker compose:
+
+     ```text
+     docker exec -ti tangerine-backend flask s3sync
+     ```
+
+   - Without:
+
    ```sh
    flask s3sync
    ```
@@ -203,3 +255,18 @@ The OpenShift templates contain a CronJob configuration that is used to run this
 
 ## Run Tangerine Frontend Locally
 
 The API can be used to create/manage/update agents, upload documents, and to chat with each agent. However, the frontend provides a simpler interface to manage the service with.
 To run the UI in a development environment, see [tangerine-frontend](https://github.com/RedHatInsights/tangerine-frontend)
+
+## Available API Paths
+
+| Path                         | Method   | Description                |
+| ---------------------------- | -------- | -------------------------- |
+| `/api/agents`                | `GET`    | Get a list of all agents   |
+| `/api/agents`                | `POST`   | Create a new agent         |
+| `/api/agents/<id>`           | `GET`    | Get an agent               |
+| `/api/agents/<id>`           | `PUT`    | Update an agent            |
+| `/api/agents/<id>`           | `DELETE` | Delete an agent            |
+| `/api/agents/<id>/chat`      | `POST`   | Chat with an agent         |
+| `/api/agents/<id>/documents` | `POST`   | Agent document uploads     |
+| `/api/agents/<id>/documents` | `DELETE` | Delete agent documents     |
+| `/api/agentDefaults`         | `GET`    | Get agent default settings |
+| `/ping`                      | `GET`    | Health check endpoint      |
diff --git a/docker-compose.yml b/docker-compose.yml
index f12b130..bbc3de4 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -18,27 +18,37 @@ services:
       timeout: 5s
       retries: 5
 
-  embeddings-service:
-    container_name: tangerine-text-embeddings-inference
-    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
-    ports:
-      - "8001:8001"
-    volumes:
-      - ./data/embeddings:/data
-    environment:
-      - MODEL_ID=/data/snowflake-arctic-embed-m-long
-      - PORT=8001
-    healthcheck:
-      test: "bash -c 'cat < /dev/null > /dev/tcp/localhost/8001'"
-      start_period: 10s
-      start_interval: 1s
-      interval: 10s
-      timeout: 5s
-      retries: 5
+  ollama:
+    # --- NVIDIA GPU ---
+    # read https://hub.docker.com/r/ollama/ollama, install the NVIDIA Container Toolkit, and uncomment below
+    # ------------------
+    #deploy:
+    #  resources:
+    #    reservations:
+    #      devices:
+    #        - driver: nvidia
+    #          count: all
+    #          capabilities: [gpu]
+
+    # --- AMD GPU ---
+    # uncomment below
+    # ---------------
+    #devices:
+    #  - /dev/kfd:/dev/kfd
+    #  - /dev/dri:/dev/dri
+
+    # --- AMD GPU ---
+    # (optional) uncomment below if needed for Radeon 780M or others
+    # more info at: https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides-on-linux
+    # ---------------
+    #environment:
+    #  - HSA_OVERRIDE_GFX_VERSION=11.0.2
+
+    # --- AMD GPU ---
+    # change below to 'ollama/ollama:0.5.5-rocm'
+    # ---------------
+    image: ollama/ollama:0.5.5
 
-  llm-service:
     container_name: tangerine-ollama
-    image: ollama/ollama:0.3.0
     ports:
       - "11434:11434"
     volumes:
@@ -51,26 +61,49 @@ services:
       timeout: 5s
       retries: 5
 
+  # uncomment to use huggingface text-embeddings-inference
+  #embeddings-service:
+  #  container_name: tangerine-text-embeddings-inference
+  #  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
+  #  ports:
+  #    - "8001:8001"
+  #  volumes:
+  #    - ./data/embeddings:/data
+  #  environment:
+  #    - MODEL_ID=/data/nomic-embed-text
+  #    - PORT=8001
+  #  healthcheck:
+  #    test: "bash -c 'cat < /dev/null > /dev/tcp/localhost/8001'"
+  #    start_period: 10s
+  #    start_interval: 1s
+  #    interval: 10s
+  #    timeout: 5s
+  #    retries: 5
+
   tangerine-backend:
     container_name: tangerine-backend
     build:
       context: .
      dockerfile: Dockerfile
     ports:
-      - "5000:5000"
+      - "8000:8000"
     environment:
       - DB_HOST=postgres
       - DB_USERNAME=citrus
       - DB_PASSWORD=citrus
       - DB_NAME=citrus
       - LLM_BASE_URL=http://tangerine-ollama:11434/v1
-      - EMBED_BASE_URL=http://tangerine-text-embeddings-inference:8001/v1
+      - EMBED_BASE_URL=http://tangerine-ollama:11434/v1
+      # uncomment below to use huggingface text-embeddings-inference
+      #- EMBED_BASE_URL=http://tangerine-text-embeddings-inference:8001/v1
       - LLM_MODEL_NAME=mistral
-      - EMBED_MODEL_NAME=snowflake-arctic-embed-m-long
-      - EMBED_QUERY_PREFIX="Represent this sentence for searching relevant passages"
-    env_file: ".env"  # store aws s3 env vars here
+      - EMBED_MODEL_NAME=nomic-embed-text
+      - EMBED_QUERY_PREFIX="search_query"
+      - EMBED_DOCUMENT_PREFUX="search_document"
+    # for s3sync, store AWS env vars needed to access your s3 bucket here
+    env_file: ".env"
     healthcheck:
-      test: curl --fail "http://localhost:5000/ping" || exit 1
+      test: curl --fail "http://localhost:8000/ping" || exit 1
       start_period: 10s
       start_interval: 1s
       interval: 10s
@@ -79,10 +112,11 @@ services:
     depends_on:
       postgres:
         condition: service_healthy
-      llm-service:
-        condition: service_healthy
-      embeddings-service:
+      ollama:
         condition: service_healthy
+      # uncomment below to use huggingface text-embeddings-inference
+      #embeddings-service:
+      #  condition: service_healthy
 
 networks:
   default:
@@ -90,4 +124,3 @@
     driver: bridge
     driver_opts:
       com.docker.network.bridge.host_binding_ipv4: "127.0.0.1"
-
diff --git a/file_upload_cli.py b/file_upload_cli.py
index 843fa73..af96ff7 100644
--- a/file_upload_cli.py
+++ b/file_upload_cli.py
@@ -54,7 +54,7 @@ def upload_files(source, directory_path, url, agent_id, html, bearer_token):
     )
     parser.add_argument("--html", help="Include html docs", default=False, action="store_true")
     parser.add_argument("--bearer-token", type=str, help="Authorization bearer token")
-    parser.add_argument("--agent-id", type=int, help="Agent ID of the tangerine agent.")
+    parser.add_argument("--agent-id", type=int, help="agent ID of the tangerine agent.")
     parser.add_argument(
         "--source",
         type=str,
diff --git a/openshift/backend.template.yaml b/openshift/backend.template.yaml
index b7da92b..e71eac4 100644
--- a/openshift/backend.template.yaml
+++ b/openshift/backend.template.yaml
@@ -79,7 +79,7 @@ objects:
           readinessProbe:
             httpGet:
               path: /ping
-              port: 5000
+              port: 8000
             initialDelaySeconds: 5
             periodSeconds: 5
             timeoutSeconds: 1
@@ -88,7 +88,7 @@ objects:
           livenessProbe:
             httpGet:
               path: /ping
-              port: 5000
+              port: 8000
             initialDelaySeconds: 5
             periodSeconds: 5
             timeoutSeconds: 1
@@ -146,7 +146,7 @@ objects:
             capabilities: {}
             privileged: false
           ports:
-            - containerPort: 5000
+            - containerPort: 8000
              protocol: TCP
           imagePullPolicy: IfNotPresent
           terminationMessagePolicy: File
@@ -167,5 +167,5 @@ objects:
     ports:
       - name: tangerine-backend
         protocol: TCP
-        port: 5000
-        targetPort: 5000
+        port: 8000
+        targetPort: 8000
diff --git a/resources/agent.py b/resources/agent.py
index 0d5c80e..b572713 100644
--- a/resources/agent.py
+++ b/resources/agent.py
@@ -29,10 +29,15 @@ def get(self):
         return {"data": [agent.to_dict() for agent in all_agents]}, 200
 
     def post(self):
+        name = request.json.get("name")
+        description = request.json.get("description")
+        if not name:
+            return {"message": "agent 'name' required"}, 400
+        if not description:
+            return {"message": "agent 'description' required"}, 400
+
         try:
-            agent = Agent.create(
-                request.form["name"], request.form["description"], request.form.get("system_prompt")
-            )
+            agent = Agent.create(name, description, request.json.get("system_prompt"))
         except Exception:
             log.exception("error creating agent")
             return {"message": "error creating agent"}, 500
@@ -44,14 +49,14 @@ class AgentApi(Resource):
     def get(self, id):
         agent = Agent.get(id)
         if not agent:
-            return {"message": "Agent not found"}, 404
+            return {"message": "agent not found"}, 404
 
         return agent.to_dict(), 200
 
     def put(self, id):
         agent = Agent.get(id)
         if not agent:
-            return {"message": "Agent not found"}, 404
+            return {"message": "agent not found"}, 404
 
         data = request.get_json()
         # ignore 'id' or 'filenames' if provided in JSON payload
@@ -59,23 +64,23 @@ def put(self, id):
         data.pop("id", None)
 
         agent.update(**data)
-        return {"message": "Agent updated successfully"}, 200
+        return {"message": "agent updated successfully"}, 200
 
     def delete(self, id):
         agent = Agent.get(id)
         if not agent:
-            return {"message": "Agent not found"}, 404
+            return {"message": "agent not found"}, 404
 
         agent.delete()
         vector_db.delete_document_chunks({"agent_id": agent.id})
-        return {"message": "Agent deleted successfully"}, 200
+        return {"message": "agent deleted successfully"}, 200
 
 
 class AgentDocuments(Resource):
     def post(self, id):
         agent = Agent.get(id)
         if not agent:
-            return {"message": "Agent not found"}, 404
+            return {"message": "agent not found"}, 404
 
         # Check if the post request has the file part
         if "file" not in request.files:
@@ -107,7 +112,7 @@ def generate_progress():
     def delete(self, id):
         agent = Agent.get(id)
         if not agent:
-            return {"message": "Agent not found"}, 404
+            return {"message": "agent not found"}, 404
 
         source = request.json.get("source")
         full_path = request.json.get("full_path")
@@ -139,7 +144,7 @@ class AgentChatApi(Resource):
     def post(self, id):
         agent = Agent.get(id)
         if not agent:
-            return {"message": "Agent not found"}, 404
+            return {"message": "agent not found"}, 404
 
         query = request.json.get("query")
         stream = request.json.get("stream") == "true"
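
With this change, `POST /api/agents` reads a JSON body instead of form data and returns a 400 when `name` or `description` is missing. A request along these lines (the field values here are purely illustrative) should satisfy the new validation:

```sh
curl -X POST http://localhost:8000/api/agents \
  -H "Content-Type: application/json" \
  -d '{"name": "docs-bot", "description": "Answers questions about product docs", "system_prompt": "You are a helpful assistant."}'
```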