Add AI extproc server #10745

npolshakova · 2025-03-04T18:03:46Z

Description

Adds the extproc extension server to kgateway. Requires #10627 to merge in first to introduce plugin changes.

API changes

Depends on: #10493

Code changes

Plugin changes are in: #10627

CI changes

Introduces the AIExtensions e2e suite as a separate cluster for PRs and adds the tests to the nightlies.
TODO: need to set up the GitHub env:

OPENAI_API_KEY
AZURE_OPENAI_API_KEY
GEMINI_API_KEY
vertex ai gcloud account
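
Until those secrets exist, the suite cannot reach the providers. A minimal sketch of a pre-flight check a test runner could perform, assuming the three API keys are exposed as environment variables with exactly the names listed above. The helper name `missing_ai_credentials` is hypothetical and not part of this PR, and the Vertex AI gcloud account is not covered:

```python
import os

# Variable names taken from the TODO list above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "AZURE_OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_ai_credentials(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

A CI step could call `missing_ai_credentials()` and skip or fail the AIExtensions suite when the returned list is non-empty.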

Docs changes

Tracked in kgateway-dev/kgateway.dev#59

Context

Adds AI extension extproc to kgateway

Interesting decisions

N/A

Testing steps

Set up a kind cluster:

CONFORMANCE=true ./hack/kind/setup-kind.sh

Install kgateway with the AI extension enabled:

helm upgrade -i -n kgateway-system kgateway ./_test/kgateway-1.0.0-ci1.tgz --create-namespace --set gateway.aiExtension.enabled=true

Apply an example configuration to set up basic routing and debugging:

kubectl apply -f - <<EOF 
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http-gw-for-test
  namespace: gwtest
  annotations:
    gateway.kgateway.dev/gateway-parameters-name: kgateway-override
spec:
  gatewayClassName: kgateway
  listeners:
    - protocol: HTTP
      port: 8080
      name: http
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: kgateway-override
  namespace: gwtest
spec:
  kube:
    aiExtension:
      enabled: true
      ports:
      - name: ai-monitoring
        containerPort: 9092
      env:
      - name: LOG_LEVEL
        value: DEBUG
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: HTTPListenerPolicy
metadata:
  name: accesslog
  namespace: gwtest
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: http-gw-for-test
  accessLog:
    - fileSink:
        path: /dev/stdout
        jsonFormat:
          start_time: "%START_TIME%"
          method: "%REQ(X-ENVOY-ORIGINAL-METHOD?:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          protocol: "%PROTOCOL%"
          response_code: "%RESPONSE_CODE%"
          response_flags: "%RESPONSE_FLAGS%"
          bytes_received: "%BYTES_RECEIVED%"
          bytes_sent: "%BYTES_SENT%"
          total_duration: "%DURATION%"
          resp_upstream_service_time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
          req_x_forwarded_for: "%REQ(X-FORWARDED-FOR)%"
          user_agent: "%REQ(USER-AGENT)%"
          request_id: "%REQ(X-REQUEST-ID)%"
          authority: "%REQ(:AUTHORITY)%"
          upstreamHost: "%UPSTREAM_HOST%"
          upstreamCluster: "%UPSTREAM_CLUSTER%"
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: openai
  namespace: gwtest
spec:
  parentRefs:
    - name: http-gw-for-test
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /openai
    backendRefs:
    - name: openai
      group: gateway.kgateway.dev
      kind: Backend
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  labels:
    app: kgateway
  name: openai
  namespace: gwtest
spec:
  type: ai
  ai:
    llm:
      provider:
        openai:
          authToken:
            kind: "SecretRef"
            secretRef:
              name: openai-secret
EOF

Create a secret:

kubectl create secret generic openai-secret -n gwtest \
--from-literal="Authorization=Bearer $OPENAI_API_KEY" \
--dry-run=client -oyaml | kubectl apply -f -

Port forward:

kubectl port-forward services/http-gw-for-test -n gwtest 8080:8080

Send some traffic:

❯ curl localhost:8080//openai -H content-type:application/json -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming."
    }
  ]}'
{"id": "chatcmpl-B4xiX8utYqcV3AfdNpjpbk1EbdQYt", "object": "chat.completion", "created": 1740522565, "model": "gpt-3.5-turbo-0125", "choices": [{"index": 0, "message": {"role": "assistant", "content": "In the realm of code, there lies a curious art,\nA loop that's elegant, with a recursive heart.\nA function that calls upon itself in creation,\nTo solve problems with magical iteration.\n\nLike a Russian nesting doll, a tale it does weave,\nUnraveling layers with every recursive heave.\nThrough cycles of repetition, it dances and turns,\nUntil the final solution brightly burns.\n\nEach call spawns a new instance, a branching tree,\nExploring paths unseen, forging destiny.\nWith elegance and grace, it unwinds and entwines,\nA dance of logic, in intricate designs.\n\nSo when in your code, a problem does confound,\nJust remember recursion, a concept profound.\nEmbrace its beauty, its power and delight,\nAnd watch as it conquers the darkest night.", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 39, "completion_tokens": 161, "total_tokens": 200, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": null}
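
When checking responses like the one above by hand, only a few fields usually matter. A minimal client-side sketch that pulls them out of the response body, assuming the body has been captured as a string; the helper name `summarize_completion` is hypothetical and is not part of this PR:

```python
import json


def summarize_completion(body: str) -> dict:
    """Extract the assistant reply and token usage from a chat completion body."""
    doc = json.loads(body)
    choice = doc["choices"][0]
    return {
        "model": doc["model"],
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": doc["usage"]["total_tokens"],
    }
```

The field names used here (`choices`, `message`, `usage`, `total_tokens`) are the ones visible in the response above.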

Stream:

 curl localhost:8080//openai -H content-type:application/json -d '{
  "model": "gpt-3.5-turbo",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming."
    }
  ]}'

Prompt guard:
Apply an example policy (regex masking of built-ins such as EMAIL in the response, plus an appended system prompt):

kubectl apply -f - <<EOF 
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai
  namespace: gwtest
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http-gw-for-test
  rules:
  - backendRefs:
    - filters:
      - extensionRef:
          group: gateway.kgateway.dev
          kind: RoutePolicy
          name: route-test
        type: ExtensionRef
      group: gateway.kgateway.dev
      kind: Backend
      name: openai
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /openai
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: RoutePolicy
metadata:
  name: route-test
  namespace: gwtest
spec:
  ai:
    promptEnrichment:
      append:
      - content: Make sure the tone is friendly and professional.
        role: SYSTEM
    promptGuard:
      response:
        regex:
          action: MASK
          builtins:
          - PHONE_NUMBER
          - EMAIL
          - SSN
          - CREDIT_CARD
EOF
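
The `promptEnrichment.append` rule above can be pictured as adding a message to the end of the request's `messages` list before the request is forwarded. A minimal sketch of that idea, not the extension's actual code: the function name `append_prompts` is hypothetical, and lower-casing the policy's `SYSTEM` role to match the OpenAI request format is an assumption.

```python
import copy


def append_prompts(request_body: dict, appended: list) -> dict:
    """Return a copy of the request with the configured messages appended."""
    enriched = copy.deepcopy(request_body)  # leave the caller's dict untouched
    messages = enriched.setdefault("messages", [])
    for prompt in appended:
        messages.append({
            # Assumption: the policy's SYSTEM role maps to lowercase "system".
            "role": prompt["role"].lower(),
            "content": prompt["content"],
        })
    return enriched
```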

Send request:

❯ curl localhost:8080//openai -H content-type:application/json -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "List five examples email addresses for an engineer called Nina I can use for testing."
    }
  ]}'
{"id": "chatcmpl-B5bs9KUuxlFs4hMRYXO3FTx4y3Cbc", "object": "chat.completion", "created": 1740676921, "model": "gpt-3.5-turbo-0125", "choices": [{"index": 0, "message": {"role": "assistant", "content": "1. <EMAIL_ADDRESS>\n2. <EMAIL_ADDRESS>\n3. <EMAIL_ADDRESS>\n4. <EMAIL_ADDRESS>\n5. <EMAIL_ADDRESS>", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 36, "completion_tokens": 51, "total_tokens": 87, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": null}
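
The masked reply above is what a regex-based response guard with `action: MASK` would be expected to produce. A rough sketch of the idea, assuming a deliberately simplified email pattern; the extension's built-in `EMAIL` matcher is not shown in this description and is likely stricter, and `mask_emails` is a hypothetical name:

```python
import re

# Simplified pattern for illustration only; real email matching is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, placeholder: str = "<EMAIL_ADDRESS>") -> str:
    """Replace every substring that looks like an email address."""
    return EMAIL_RE.sub(placeholder, text)
```

The default placeholder mirrors the `<EMAIL_ADDRESS>` token visible in the response above.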

Vertex-AI example:

Apply config:

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: vertex-ai
  namespace: gwtest
spec:
  parentRefs:
    - name: http-gw-for-test
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /vertex-ai
    backendRefs:
    - name: vertex-ai
      group: gateway.kgateway.dev
      kind: Backend
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  labels:
    app: kgateway
  name: vertex-ai
  namespace: gwtest
spec:
  type: ai
  ai:
    llm:
      provider:
        vertexai:
            model: gemini-1.5-flash-001
            apiVersion: v1
            location: us-central1
            projectId: gloo-ee
            publisher: GOOGLE
            authToken:
              kind: "SecretRef"
              secretRef:
                name: vertex-ai-secret
EOF

Create a secret:

kubectl create secret generic vertex-ai-secret -n gwtest \
--from-literal="Authorization=$(gcloud auth print-access-token <account> --project <project>)" \
--dry-run=client -oyaml | kubectl apply -f -

Send request:

curl localhost:8080/vertex-ai -H content-type:application/json -d '{"contents":[{"role": "user", "parts":[{"text":"say hello."}]}]}'

For streaming requests, update the config so the route uses the CHAT_STREAMING route type:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: vertex-ai
  namespace: gwtest
spec:
  parentRefs:
    - name: http-gw-for-test
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /vertex-ai
    backendRefs:
    - name: vertex-ai
      group: gateway.kgateway.dev
      kind: Backend
      filters:
      - type: ExtensionRef
        extensionRef:
          group: gateway.kgateway.dev
          kind: RoutePolicy
          name: route-test
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: RoutePolicy
metadata:
  name: route-test
  namespace: gwtest
spec:
  ai:
    routeType: CHAT_STREAMING

Send a streaming request:

> curl localhost:8080/vertex-ai -H content-type:application/json -d '{"contents":[{"role": "user", "parts":[{"text":"say hello in multiple languages."}]}]}' -v

data: {"candidates": [{"content": {"role": "model","parts": [{"text": "Here"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.056640625,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.055908203},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.0390625,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.064453125},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.103515625,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.05834961},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.041503906,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.05419922}]}],"usageMetadata": {},"modelVersion": "gemini-1.5-flash-001","createTime": "2025-02-27T23:13:57.716637Z","responseId": "tfHAZ93eK_y42PgPqsnFuQM"}

data: {"candidates": [{"content": {"role": "model","parts": [{"text": " are some ways"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.064453125,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.053466797},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.022583008,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.048095703},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.08642578,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.04272461},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.034179688,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.07373047}]}],"modelVersion": "gemini-1.5-flash-001","createTime": "2025-02-27T23:13:57.716637Z","responseId": "tfHAZ93eK_y42PgPqsnFuQM"}

...
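
Each `data:` line in the stream above carries one JSON chunk whose text sits under `candidates[*].content.parts[*].text`. A small sketch for stitching the chunks back together on the client side, assuming the whole stream has been captured as a string; `join_stream_text` is a hypothetical helper, not part of this PR:

```python
import json


def join_stream_text(stream: str) -> str:
    """Concatenate the text parts of every `data:` chunk in a captured stream."""
    pieces = []
    for line in stream.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        chunk = json.loads(line[len("data:"):])
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
    return "".join(pieces)
```

Applied to the two chunks shown above, this would yield the beginning of the model's reply as a single string.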

Notes for reviewers

Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works

timflannagan · 2025-03-05T03:46:01Z

Just a heads up that goreleaser doesn't support python (yet ™️ ). We'll need to release the ai-extension component with the rest of our container images in the release.yaml workflow. Can be done in a follow-up, but needs to be done before cutting beta1 imho.

timflannagan · 2025-03-05T03:47:02Z

(I don't think multi-arch support is a hard requirement though.)

npolshakova · 2025-03-05T21:04:40Z

hack/kind/cluster.yaml

@@ -1,5 +1,7 @@
 kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
+networking:
+  ipFamily: dual # Enable IPv4 and IPv6 support


Let's not change the kind cluster setup; maybe change the default to IPV4_PREFERRED as part of #10755. @lgadban

npolshakova · 2025-03-05T21:06:42Z

install/helm/kgateway/values.yaml

@@ -126,11 +126,10 @@ gateway:
  aiExtension:
    enabled: false
    image:
-      repository: gloo-ai-extension
-      registry: quay.io/solo-io
+      registry: ghcr.io/kgateway-dev


Delete ghcr.io/kgateway-dev, use default from chart.

npolshakova · 2025-03-05T21:07:43Z

install/helm/kgateway/values.yaml

-      repository: gloo-ai-extension
-      registry: quay.io/solo-io
+      registry: ghcr.io/kgateway-dev
+      repository: kgateway-ai-extension
      pullPolicy: IfNotPresent


Remove pull policy (inherit from Chart)

npolshakova · 2025-03-05T21:12:49Z

internal/ai-extension/.dockerignore

@@ -0,0 +1 @@
+__pycache__


@timflannagan @lgadban Any thoughts on moving this outside of internal? We were thinking:

Move dockerfile to cmd

Move ai-extensions to a top level python directory

timflannagan · 2025-03-05

Good question: I'm not sure what's best here. My preference is that non-Go code doesn't live in the internal/ directory, but I just looked at the transformation PR, and we're adding Rust code to the internal/envoyinit directory, so moving the AI extensions to a top-level directory could introduce some inconsistency in the codebase.

npolshakova mentioned this pull request Mar 4, 2025

AI extproc extension server #10701

Closed

4 tasks

npolshakova force-pushed the add-ai-extproc-server branch from 8c70222 to 960c855 Compare March 4, 2025 18:30

npolshakova marked this pull request as ready for review March 4, 2025 18:31

npolshakova requested review from lgadban and EItanya March 4, 2025 18:31

npolshakova force-pushed the add-ai-extproc-server branch from 960c855 to 6e1542e Compare March 4, 2025 18:35

extrpoc server

9c18d21

npolshakova force-pushed the add-ai-extproc-server branch from cf7c878 to 9c18d21 Compare March 5, 2025 22:42
