wip(deploy): Sample docker-compose files for different scenarios #552

Closed
wants to merge 4 commits into from
Changes from 3 commits
69 changes: 69 additions & 0 deletions deploy/README.md
@@ -0,0 +1,69 @@
# Julep Deployment Configurations

This directory contains various Docker Compose configurations for deploying Julep in different scenarios. Each configuration is tailored to specific use cases and deployment requirements.

## Available Configurations

### 1. Single-Tenant Mode with CPU Embeddings & Managed DB
- **File:** `docker-compose.single-tenant-cpu-managed.yml`
- **Description:** Deploys Julep in single-tenant mode using CPU-based embedding services with managed Temporal and LiteLLM databases.
- **Suitable for:** Development, testing, or small-scale deployments prioritizing simplicity and cost-effectiveness.

### 2. Multi-Tenant Mode with CPU Embeddings & Managed DB
- **File:** `docker-compose.multi-tenant-cpu-managed.yml`
- **Description:** Deploys Julep in multi-tenant mode using CPU-based embedding services with managed Temporal and LiteLLM databases.
- **Suitable for:** Multi-tenant environments that need to keep operational complexity and resource usage low (no GPU required).

### 3. Single-Tenant Mode with GPU Embeddings & Managed DB
- **File:** `docker-compose.single-tenant-gpu-managed.yml`
- **Description:** Deploys Julep in single-tenant mode using GPU-based embedding services with managed Temporal and LiteLLM databases.
- **Suitable for:** Single-tenant deployments needing enhanced performance through GPU-powered embeddings.

### 4. Multi-Tenant Mode with GPU Embeddings & Managed DB
- **File:** `docker-compose.multi-tenant-gpu-managed.yml`
- **Description:** Deploys Julep in multi-tenant mode using GPU-based embedding services with managed Temporal and LiteLLM databases.
- **Suitable for:** Large-scale multi-tenant deployments demanding high-performance embeddings.

### 5. Single-Tenant Mode with CPU Embeddings & Self-Hosted DB
- **File:** `docker-compose.single-tenant-cpu-selfhosted.yml`
- **Description:** Deploys Julep in single-tenant mode using CPU-based embedding services with self-hosted Temporal and LiteLLM databases.
- **Suitable for:** Deployments where controlling the database infrastructure is preferred over managed services.

### 6. Multi-Tenant Mode with CPU Embeddings & Self-Hosted DB
- **File:** `docker-compose.multi-tenant-cpu-selfhosted.yml`
- **Description:** Deploys Julep in multi-tenant mode using CPU-based embedding services with self-hosted Temporal and LiteLLM databases.
- **Suitable for:** Multi-tenant deployments with greater control over database services, ideal for organizations with specific compliance or customization needs.

### 7. Single-Tenant Mode with GPU Embeddings & Self-Hosted DB
- **File:** `docker-compose.single-tenant-gpu-selfhosted.yml`
- **Description:** Deploys Julep in single-tenant mode using GPU-based embedding services with self-hosted Temporal and LiteLLM databases.
- **Suitable for:** High-performance single-tenant deployments that require self-managed databases for specialized configurations.

### 8. Multi-Tenant Mode with GPU Embeddings & Self-Hosted DB
- **File:** `docker-compose.multi-tenant-gpu-selfhosted.yml`
- **Description:** Deploys Julep in multi-tenant mode using GPU-based embedding services with self-hosted Temporal and LiteLLM databases.
- **Suitable for:** High-performance, multi-tenant deployments that require full control over both embedding services and database infrastructure.

## Configuration Components

Each configuration file combines the following components:

- **Tenancy Mode:** Single-tenant or Multi-tenant
- **Embedding Service:** CPU-based or GPU-based
- **Database Management:** Managed or Self-hosted
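
As a rough illustration of how the embedding component might differ between variants, a GPU file could swap the image tag and reserve a device. This is a sketch only: the `1.5` image tag and the single-GPU reservation are assumptions rather than values taken from the GPU files in this directory, and the host is assumed to have the NVIDIA container runtime installed.

```yaml
services:
  embedding-service:
    # Assumed GPU image tag; the CPU configurations use the cpu-1.5 tag
    image: ghcr.io/huggingface/text-embeddings-inference:1.5
    environment:
      MODEL_ID: Alibaba-NLP/gte-large-en-v1.5
    volumes:
      - ~/.cache/huggingface/hub:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1               # assumption: one GPU is enough for the embedding model
              capabilities: [gpu]
```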

## Additional Services

- **Temporal UI:** Available as an optional add-on for all configurations to provide a web-based interface for monitoring Temporal workflows.
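
One possible way to add it is a small override file merged on top of any configuration. The file name `docker-compose.temporal-ui.yml`, the `latest` tag, and host port `9000` below are illustrative assumptions, not files or values shipped in this directory.

```yaml
# docker-compose.temporal-ui.yml (hypothetical override file)
services:
  temporal-ui:
    image: temporalio/ui:latest          # assumption: pin a specific version in practice
    environment:
      TEMPORAL_ADDRESS: temporal:7233    # address of the temporal service inside the Compose network
    ports:
      - "9000:8080"                      # the UI listens on 8080 in the container; host port 8080 is taken by the agents API
    depends_on:
      temporal:
        condition: service_started
```

Passing both files to Compose, for example `docker compose -f docker-compose.multi-tenant-cpu-managed.yml -f docker-compose.temporal-ui.yml up`, merges the extra service into the chosen configuration.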

## Choosing a Configuration

Select the configuration that best matches your deployment requirements, considering factors such as:

- Number of tenants
- Performance needs
- Infrastructure control preferences
- Scalability requirements
- Development vs. Production environment

Refer to the individual Docker Compose files for detailed service configurations and environment variable requirements.
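
The Compose files read their settings through variable interpolation: `${VAR}` for values you must supply and `${VAR:-default}` for values with a fallback. As a hedged sketch (the files in this directory do not currently do this), the `${VAR:?message}` form makes Compose stop with an error when a required variable is missing:

```yaml
services:
  litellm-db:
    image: postgres:16
    environment:
      POSTGRES_DB: ${LITELLM_POSTGRES_DB:-litellm}        # optional, falls back to "litellm"
      POSTGRES_USER: ${LITELLM_POSTGRES_USER:-llmproxy}   # optional, falls back to "llmproxy"
      # Suggested hardening, not in the original files: fail fast if unset or empty
      POSTGRES_PASSWORD: ${LITELLM_POSTGRES_PASSWORD:?LITELLM_POSTGRES_PASSWORD must be set}
```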
175 changes: 175 additions & 0 deletions deploy/docker-compose.multi-tenant-cpu-managed.yml
@@ -0,0 +1,175 @@
version: '3.8'

services:
  memory-store:
    image: julepai/memory-store:${TAG}
    environment:
      COZO_AUTH_TOKEN: ${COZO_AUTH_TOKEN}
      COZO_PORT: 9070
      COZO_MNT_DIR: /data
      COZO_BACKUP_DIR: /backup
    volumes:
      - cozo_data:/data
      - cozo_backup:/backup
    ports:
      - "9070:9070"

  gateway:
    image: julepai/gateway:${TAG}
    environment:
      GATEWAY_PORT: 80
      JWT_SHARED_KEY: ${JWT_SHARED_KEY}
      AGENTS_API_URL: http://agents-api-multi-tenant:8080
      AGENTS_API_KEY: ${AGENTS_API_KEY}
      AGENTS_API_KEY_HEADER_NAME: Authorization
      TRAEFIK_LOG_LEVEL: INFO
    ports:
      - "80:80"

  embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
    environment:
      MODEL_ID: Alibaba-NLP/gte-large-en-v1.5
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GROQ_API_KEY: ${GROQ_API_KEY}
      CLOUDFLARE_API_KEY: ${CLOUDFLARE_API_KEY}
      CLOUDFLARE_ACCOUNT_ID: ${CLOUDFLARE_ACCOUNT_ID}
      NVIDIA_NIM_API_KEY: ${NVIDIA_NIM_API_KEY}
      GITHUB_API_KEY: ${GITHUB_API_KEY}
      VOYAGE_API_KEY: ${VOYAGE_API_KEY}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS}
      # Boolean-like values are quoted so YAML passes them through as strings
      TRUNCATE_EMBED_TEXT: "True"
    volumes:
      - ~/.cache/huggingface/hub:/data

  agents-api-multi-tenant:
    image: julepai/agents-api:${TAG}
    environment:
      AGENTS_API_KEY: ${AGENTS_API_KEY}
      AGENTS_API_KEY_HEADER_NAME: Authorization
      AGENTS_API_HOSTNAME: gateway
      AGENTS_API_PUBLIC_PORT: 80
      AGENTS_API_PROTOCOL: http
      COZO_AUTH_TOKEN: ${COZO_AUTH_TOKEN}
      COZO_HOST: http://memory-store:9070
      DEBUG: "False"
      EMBEDDING_MODEL_ID: Alibaba-NLP/gte-large-en-v1.5
      INTEGRATION_SERVICE_URL: http://integrations:8000
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      LITELLM_URL: http://litellm:4000
      SUMMARIZATION_MODEL_NAME: gpt-4-turbo
      TEMPORAL_ENDPOINT: temporal:7233
      TEMPORAL_NAMESPACE: default
      TEMPORAL_TASK_QUEUE: julep-task-queue
      TEMPORAL_WORKER_URL: temporal:7233
      WORKER_URL: temporal:7233
      AGENTS_API_MULTI_TENANT_MODE: "true"
      AGENTS_API_PREFIX: "/api"
    ports:
      - "8080:8080"
    depends_on:
      memory-store:
        condition: service_started
Contributor

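The `depends_on` condition `service_started` only waits for the dependency's container to start, not for the service inside it to be ready. Consider using `service_healthy` (which requires a `healthcheck` on the dependency) for better reliability. This applies to the other files as well.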
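
A minimal sketch of the change for the `litellm` service, which can use `service_healthy` because `litellm-db` already defines a `pg_isready` healthcheck in this file. `litellm-redis` has no healthcheck here, so it stays on `service_started` in this sketch:

```yaml
services:
  litellm:
    depends_on:
      litellm-db:
        condition: service_healthy   # waits until the pg_isready healthcheck passes
      litellm-redis:
        condition: service_started   # no healthcheck is defined for this service
```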

      gateway:
        condition: service_started

  litellm:
    image: ghcr.io/berriai/litellm-database:main-v1.46.6
    environment:
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      # sslmode uses "prefer", matching the self-hosted file; "prefer_ssl" is not a standard PostgreSQL sslmode value
      DATABASE_URL: ${LITELLM_DATABASE_URL:-postgresql://${LITELLM_POSTGRES_USER:-llmproxy}:${LITELLM_POSTGRES_PASSWORD}@litellm-db:5432/${LITELLM_POSTGRES_DB:-litellm}?sslmode=prefer}
      REDIS_URL: ${LITELLM_REDIS_URL:-redis://default:${LITELLM_REDIS_PASSWORD}@litellm-redis:6379}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GROQ_API_KEY: ${GROQ_API_KEY}
      CLOUDFLARE_API_KEY: ${CLOUDFLARE_API_KEY}
      CLOUDFLARE_ACCOUNT_ID: ${CLOUDFLARE_ACCOUNT_ID}
      NVIDIA_NIM_API_KEY: ${NVIDIA_NIM_API_KEY}
      GITHUB_API_KEY: ${GITHUB_API_KEY}
      VOYAGE_API_KEY: ${VOYAGE_API_KEY}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS}
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
      - .keys:/app/.keys:ro
    ports:
      - "4000:4000"
    depends_on:
      litellm-db:
        condition: service_started
      litellm-redis:
        condition: service_started

  litellm-db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - litellm-db-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: ${LITELLM_POSTGRES_DB:-litellm}
      POSTGRES_USER: ${LITELLM_POSTGRES_USER:-llmproxy}
      POSTGRES_PASSWORD: ${LITELLM_POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d ${LITELLM_POSTGRES_DB:-litellm} -U ${LITELLM_POSTGRES_USER:-llmproxy}"]
      interval: 1s
      timeout: 5s
      retries: 10

  litellm-redis:
    image: redis/redis-stack-server
    restart: unless-stopped
    environment:
      REDIS_ARGS: --requirepass ${LITELLM_REDIS_PASSWORD}
    volumes:
      - litellm-redis-data:/data

  temporal:
    image: temporalio/auto-setup:1.25
    hostname: temporal
    environment:
      POSTGRES_PWD: ${TEMPORAL_POSTGRES_PASSWORD}
      POSTGRES_DB: ${TEMPORAL_POSTGRES_DB:-temporal}
      POSTGRES_SEEDS: temporal-db
      DB_HOST: temporal-db
      DB_PORT: 5432
      POSTGRES_USER: temporal
      TEMPORAL_ADDRESS: temporal:7233
      # Boolean-like values are quoted so YAML passes them through as strings
      POSTGRES_TLS_ENABLED: "false"
      POSTGRES_TLS_CA_FILE: /cert/ca.crt
      SQL_TLS_ENABLED: "false"
      SQL_CA: /cert/ca.crt
      POSTGRES_TLS_DISABLE_HOST_VERIFICATION: "false"
      VISIBILITY_DBNAME: temporal_visibility
      SKIP_SCHEMA_SETUP: "false"
      SKIP_DB_CREATE: "false"
      DYNAMIC_CONFIG_FILE_PATH: config/dynamicconfig/temporal-postgres.yaml
      DB: postgres12
      LOG_LEVEL: info
    volumes:
      - ./scheduler/dynamicconfig:/etc/temporal/config/dynamicconfig
      - ./scheduler/cert:/cert
    depends_on:
      temporal-db:
        condition: service_started

  temporal-db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - temporal-db-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: ${TEMPORAL_POSTGRES_DB:-temporal}
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: ${TEMPORAL_POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d ${TEMPORAL_POSTGRES_DB:-temporal} -U temporal"]
      interval: 1s
      timeout: 5s
      retries: 10

volumes:
  cozo_data:
  cozo_backup:
  litellm-db-data:
  litellm-redis-data:
  temporal-db-data:
144 changes: 144 additions & 0 deletions deploy/docker-compose.multi-tenant-cpu-selfhosted.yml
@@ -0,0 +1,144 @@
version: '3.8'

services:
  memory-store:
    image: julepai/memory-store:${TAG}
    environment:
      COZO_AUTH_TOKEN: ${COZO_AUTH_TOKEN}
      COZO_PORT: 9070
      COZO_MNT_DIR: /data
      COZO_BACKUP_DIR: /backup
    volumes:
      - cozo_data:/data
      - cozo_backup:/backup
    ports:
      - "9070:9070"

  gateway:
    image: julepai/gateway:${TAG}
    environment:
      GATEWAY_PORT: 80
      JWT_SHARED_KEY: ${JWT_SHARED_KEY}
      AGENTS_API_URL: http://agents-api-multi-tenant:8080
      AGENTS_API_KEY: ${AGENTS_API_KEY}
      AGENTS_API_KEY_HEADER_NAME: Authorization
      TRAEFIK_LOG_LEVEL: INFO
    ports:
      - "80:80"

  embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
    environment:
      MODEL_ID: Alibaba-NLP/gte-large-en-v1.5
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GROQ_API_KEY: ${GROQ_API_KEY}
      CLOUDFLARE_API_KEY: ${CLOUDFLARE_API_KEY}
      CLOUDFLARE_ACCOUNT_ID: ${CLOUDFLARE_ACCOUNT_ID}
      NVIDIA_NIM_API_KEY: ${NVIDIA_NIM_API_KEY}
      GITHUB_API_KEY: ${GITHUB_API_KEY}
      VOYAGE_API_KEY: ${VOYAGE_API_KEY}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS}
      # Boolean-like values are quoted so YAML passes them through as strings
      TRUNCATE_EMBED_TEXT: "True"
    volumes:
      - ~/.cache/huggingface/hub:/data

  agents-api-multi-tenant:
    image: julepai/agents-api:${TAG}
    environment:
      AGENTS_API_KEY: ${AGENTS_API_KEY}
      AGENTS_API_KEY_HEADER_NAME: Authorization
      AGENTS_API_HOSTNAME: gateway
      AGENTS_API_PUBLIC_PORT: 80
      AGENTS_API_PROTOCOL: http
      COZO_AUTH_TOKEN: ${COZO_AUTH_TOKEN}
      COZO_HOST: http://memory-store:9070
      DEBUG: "False"
      EMBEDDING_MODEL_ID: Alibaba-NLP/gte-large-en-v1.5
      INTEGRATION_SERVICE_URL: http://integrations:8000
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      LITELLM_URL: http://litellm:4000
      SUMMARIZATION_MODEL_NAME: gpt-4-turbo
      TEMPORAL_ENDPOINT: temporal:7233
      TEMPORAL_NAMESPACE: default
      TEMPORAL_TASK_QUEUE: julep-task-queue
      TEMPORAL_WORKER_URL: temporal:7233
      WORKER_URL: temporal:7233
      AGENTS_API_MULTI_TENANT_MODE: "true"
      AGENTS_API_PREFIX: "/api"
    ports:
      - "8080:8080"
    depends_on:
      memory-store:
        condition: service_started
      gateway:
        condition: service_started

  litellm:
    image: ghcr.io/berriai/litellm-database:main-v1.46.6
    environment:
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      DATABASE_URL: postgresql://${LITELLM_POSTGRES_USER:-llmproxy}:${LITELLM_POSTGRES_PASSWORD}@litellm-db:5432/${LITELLM_POSTGRES_DB:-litellm}?sslmode=prefer
      REDIS_URL: redis://default:${LITELLM_REDIS_PASSWORD}@litellm-redis:6379
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
      - .keys:/app/.keys:ro
    ports:
      - "4000:4000"
    depends_on:
      litellm-db:
        condition: service_healthy
      litellm-redis:
        condition: service_started

  litellm-db:
    image: postgres:16
    environment:
      POSTGRES_DB: ${LITELLM_POSTGRES_DB:-litellm}
      POSTGRES_USER: ${LITELLM_POSTGRES_USER:-llmproxy}
      POSTGRES_PASSWORD: ${LITELLM_POSTGRES_PASSWORD}
    volumes:
      - litellm-db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${LITELLM_POSTGRES_USER:-llmproxy}"]
      interval: 5s
      timeout: 5s
      retries: 5

  litellm-redis:
    image: redis:6
    command: redis-server --requirepass ${LITELLM_REDIS_PASSWORD}
    volumes:
      - litellm-redis-data:/data

  temporal:
    image: temporalio/auto-setup:1.25
    environment:
      # postgres12 matches the driver name used in the managed file; the older "postgresql" name may be rejected by recent auto-setup images
      - DB=postgres12
      - DB_PORT=5432
      - POSTGRES_USER=${TEMPORAL_POSTGRES_USER:-temporal}
      - POSTGRES_PWD=${TEMPORAL_POSTGRES_PASSWORD}
      - POSTGRES_SEEDS=temporal-db
    depends_on:
      temporal-db:
        condition: service_healthy

  temporal-db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${TEMPORAL_POSTGRES_PASSWORD}
      POSTGRES_USER: ${TEMPORAL_POSTGRES_USER:-temporal}
    volumes:
      - temporal-db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${TEMPORAL_POSTGRES_USER:-temporal}"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  cozo_data:
  cozo_backup:
  litellm-db-data:
  litellm-redis-data:
  temporal-db-data: