Add AI extproc server #10745

Open: npolshakova wants to merge 1 commit into main from add-ai-extproc-server

npolshakova
Contributor

@npolshakova npolshakova commented Mar 4, 2025

Description

Adds the extproc extension server to kgateway. Requires #10627 to merge in first to introduce plugin changes.

API changes

Depends on: #10493

Code changes

Plugin changes are in: #10627

CI changes

Introduces the AIExtensions e2e suite, which runs on its own cluster for PRs, and adds the tests to the nightlies.
TODO: need to set up the GitHub env:

  • OPENAI_API_KEY
  • AZURE_OPENAI_API_KEY
  • GEMINI_API_KEY
  • Vertex AI gcloud account
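
A minimal sketch of a pre-flight check for the credentials listed above, assuming the CI job exposes them as environment variables under exactly these names. The helper name `missing_keys` is hypothetical, and the Vertex AI credential is left out because no variable name is given for it.

```python
import os
from typing import Mapping

# Names copied from the TODO list; the Vertex AI account is omitted
# because the description does not name a variable for it.
REQUIRED_KEYS = ("OPENAI_API_KEY", "AZURE_OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_keys()` at the start of the e2e job and failing fast on a non-empty result would surface a misconfigured environment before any cluster is created.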

Docs changes

Tracked in kgateway-dev/kgateway.dev#59

Context

Adds AI extension extproc to kgateway

Interesting decisions

N/A

Testing steps

  1. Set up a kind cluster:
CONFORMANCE=true ./hack/kind/setup-kind.sh
  1. Install kgateway with AI extensions enabled:
helm upgrade -i -n kgateway-system kgateway ./_test/kgateway-1.0.0-ci1.tgz --create-namespace --set gateway.aiExtension.enabled=true

  1. Apply an example configuration to set up basic routing and debugging:
kubectl apply -f - <<EOF 
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http-gw-for-test
  namespace: gwtest
  annotations:
    gateway.kgateway.dev/gateway-parameters-name: kgateway-override
spec:
  gatewayClassName: kgateway
  listeners:
    - protocol: HTTP
      port: 8080
      name: http
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: kgateway-override
  namespace: gwtest
spec:
  kube:
    aiExtension:
      enabled: true
      ports:
      - name: ai-monitoring
        containerPort: 9092
      env:
      - name: LOG_LEVEL
        value: DEBUG
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: HTTPListenerPolicy
metadata:
  name: accesslog
  namespace: gwtest
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: http-gw-for-test
  accessLog:
    - fileSink:
        path: /dev/stdout
        jsonFormat:
          start_time: "%START_TIME%"
          method: "%REQ(X-ENVOY-ORIGINAL-METHOD?:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          protocol: "%PROTOCOL%"
          response_code: "%RESPONSE_CODE%"
          response_flags: "%RESPONSE_FLAGS%"
          bytes_received: "%BYTES_RECEIVED%"
          bytes_sent: "%BYTES_SENT%"
          total_duration: "%DURATION%"
          resp_upstream_service_time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
          req_x_forwarded_for: "%REQ(X-FORWARDED-FOR)%"
          user_agent: "%REQ(USER-AGENT)%"
          request_id: "%REQ(X-REQUEST-ID)%"
          authority: "%REQ(:AUTHORITY)%"
          upstreamHost: "%UPSTREAM_HOST%"
          upstreamCluster: "%UPSTREAM_CLUSTER%"
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: openai
  namespace: gwtest
spec:
  parentRefs:
    - name: http-gw-for-test
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /openai
    backendRefs:
    - name: openai
      group: gateway.kgateway.dev
      kind: Backend
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  labels:
    app: kgateway
  name: openai
  namespace: gwtest
spec:
  type: ai
  ai:
    llm:
      provider:
        openai:
          authToken:
            kind: "SecretRef"
            secretRef:
              name: openai-secret
EOF
  1. Create a secret:
kubectl create secret generic openai-secret -n gwtest \
--from-literal="Authorization=Bearer $OPENAI_API_KEY" \
--dry-run=client -oyaml | kubectl apply -f -
  1. Port forward:
kubectl port-forward services/http-gw-for-test -n gwtest 8080:8080
  1. Send some traffic:
❯ curl localhost:8080/openai -H content-type:application/json -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming."
    }
  ]}'
{"id": "chatcmpl-B4xiX8utYqcV3AfdNpjpbk1EbdQYt", "object": "chat.completion", "created": 1740522565, "model": "gpt-3.5-turbo-0125", "choices": [{"index": 0, "message": {"role": "assistant", "content": "In the realm of code, there lies a curious art,\nA loop that's elegant, with a recursive heart.\nA function that calls upon itself in creation,\nTo solve problems with magical iteration.\n\nLike a Russian nesting doll, a tale it does weave,\nUnraveling layers with every recursive heave.\nThrough cycles of repetition, it dances and turns,\nUntil the final solution brightly burns.\n\nEach call spawns a new instance, a branching tree,\nExploring paths unseen, forging destiny.\nWith elegance and grace, it unwinds and entwines,\nA dance of logic, in intricate designs.\n\nSo when in your code, a problem does confound,\nJust remember recursion, a concept profound.\nEmbrace its beauty, its power and delight,\nAnd watch as it conquers the darkest night.", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 39, "completion_tokens": 161, "total_tokens": 200, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": null}
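
For checking a response like the one above in a script rather than by eye, here is a minimal sketch that pulls out the fields most useful in a smoke test. It assumes the gateway returns the OpenAI chat completion body unchanged, as in the sample output; the helper name `summarize_completion` is hypothetical.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the model, finish reason, text, and token count from a chat completion body."""
    body = json.loads(raw)
    choice = body["choices"][0]  # the sample response carries a single choice
    return {
        "model": body["model"],
        "finish_reason": choice["finish_reason"],
        "text": choice["message"]["content"],
        "total_tokens": body["usage"]["total_tokens"],
    }
```

A test could then assert that `finish_reason` is `"stop"` and that `text` is non-empty, without depending on the exact wording the model produces.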

Stream:

curl localhost:8080/openai -H content-type:application/json -d '{
  "model": "gpt-3.5-turbo",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming."
    }
  ]}'

Prompt guard:
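Apply an example policy that appends a system prompt and masks built-in regex matches (phone numbers, emails, SSNs, credit cards) in the response: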

kubectl apply -f - <<EOF 
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai
  namespace: gwtest
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http-gw-for-test
  rules:
  - backendRefs:
    - filters:
      - extensionRef:
          group: gateway.kgateway.dev
          kind: RoutePolicy
          name: route-test
        type: ExtensionRef
      group: gateway.kgateway.dev
      kind: Backend
      name: openai
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /openai
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: RoutePolicy
metadata:
  name: route-test
  namespace: gwtest
spec:
  ai:
    promptEnrichment:
      append:
      - content: Make sure the tone is friendly and professional.
        role: SYSTEM
    promptGuard:
      response:
        regex:
          action: MASK
          builtins:
          - PHONE_NUMBER
          - EMAIL
          - SSN
          - CREDIT_CARD
EOF

Send request:

❯ curl localhost:8080/openai -H content-type:application/json -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "List five examples email addresses for an engineer called Nina I can use for testing."
    }
  ]}'
{"id": "chatcmpl-B5bs9KUuxlFs4hMRYXO3FTx4y3Cbc", "object": "chat.completion", "created": 1740676921, "model": "gpt-3.5-turbo-0125", "choices": [{"index": 0, "message": {"role": "assistant", "content": "1. <EMAIL_ADDRESS>\n2. <EMAIL_ADDRESS>\n3. <EMAIL_ADDRESS>\n4. <EMAIL_ADDRESS>\n5. <EMAIL_ADDRESS>", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 36, "completion_tokens": 51, "total_tokens": 87, "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}, "service_tier": "default", "system_fingerprint": null}
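
To illustrate what a MASK action on the EMAIL builtin does to a response body, here is a rough sketch. It is not the extproc server's implementation: the regex is a simplified stand-in chosen for this example, and the function names are hypothetical. Only the `<EMAIL_ADDRESS>` placeholder is taken from the output above.

```python
import re

# Simplified email pattern for illustration only; the real builtin may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, placeholder: str = "<EMAIL_ADDRESS>") -> str:
    """Replace every email-like substring with the placeholder."""
    return EMAIL_RE.sub(placeholder, text)


def mask_completion(body: dict) -> dict:
    """Mask the message content of each choice, modifying the dict in place."""
    for choice in body.get("choices", []):
        message = choice.get("message", {})
        if isinstance(message.get("content"), str):
            message["content"] = mask_emails(message["content"])
    return body
```

Masking on the response side means the upstream model still generates the addresses; only the client-visible body is rewritten.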

Vertex-AI example:

  1. Apply config:
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: vertex-ai
  namespace: gwtest
spec:
  parentRefs:
    - name: http-gw-for-test
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /vertex-ai
    backendRefs:
    - name: vertex-ai
      group: gateway.kgateway.dev
      kind: Backend
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  labels:
    app: kgateway
  name: vertex-ai
  namespace: gwtest
spec:
  type: ai
  ai:
    llm:
      provider:
        vertexai:
          model: gemini-1.5-flash-001
          apiVersion: v1
          location: us-central1
          projectId: gloo-ee
          publisher: GOOGLE
          authToken:
            kind: "SecretRef"
            secretRef:
              name: vertex-ai-secret
EOF
  1. Create a secret:
kubectl create secret generic vertex-ai-secret -n gwtest \
--from-literal="Authorization=$(gcloud auth print-access-token <account> --project <project>)" \
--dry-run=client -oyaml | kubectl apply -f -
  1. Send request:
curl localhost:8080/vertex-ai -H content-type:application/json -d '{"contents":[{"role": "user", "parts":[{"text":"say hello."}]}]}'
  1. For streaming requests, update the config:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: vertex-ai
  namespace: gwtest
spec:
  parentRefs:
    - name: http-gw-for-test
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /vertex-ai
    backendRefs:
    - name: vertex-ai
      group: gateway.kgateway.dev
      kind: Backend
      filters:
      - type: ExtensionRef
        extensionRef:
          group: gateway.kgateway.dev
          kind: RoutePolicy
          name: route-test
---
apiVersion: gateway.kgateway.dev/v1alpha1
kind: RoutePolicy
metadata:
  name: route-test
  namespace: gwtest
spec:
  ai:
    routeType: CHAT_STREAMING
  1. Send a streaming request:
> curl localhost:8080/vertex-ai -H content-type:application/json -d '{"contents":[{"role": "user", "parts":[{"text":"say hello in multiple languages."}]}]}' -v

data: {"candidates": [{"content": {"role": "model","parts": [{"text": "Here"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.056640625,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.055908203},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.0390625,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.064453125},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.103515625,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.05834961},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.041503906,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.05419922}]}],"usageMetadata": {},"modelVersion": "gemini-1.5-flash-001","createTime": "2025-02-27T23:13:57.716637Z","responseId": "tfHAZ93eK_y42PgPqsnFuQM"}

data: {"candidates": [{"content": {"role": "model","parts": [{"text": " are some ways"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.064453125,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.053466797},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.022583008,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.048095703},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.08642578,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.04272461},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.034179688,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.07373047}]}],"modelVersion": "gemini-1.5-flash-001","createTime": "2025-02-27T23:13:57.716637Z","responseId": "tfHAZ93eK_y42PgPqsnFuQM"}

...
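
The streamed chunks above arrive as server-sent events, one JSON payload per `data:` line. A minimal sketch for reassembling the generated text from such a stream follows. It assumes each event's payload fits on a single `data:` line, as in the sample output, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of every non-empty `data:` line."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text parts of all candidates, in arrival order."""
    parts = []
    for event in iter_sse_json(lines):
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                parts.append(part.get("text", ""))
    return "".join(parts)
```

Fed the two chunks shown above, `collect_text` would return `"Here are some ways"`. A full SSE client would also need to handle multi-line `data:` fields and comment lines, which this sketch ignores.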

Notes for reviewers

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@npolshakova npolshakova mentioned this pull request Mar 4, 2025
4 tasks
@npolshakova npolshakova force-pushed the add-ai-extproc-server branch from 8c70222 to 960c855 Compare March 4, 2025 18:30
@npolshakova npolshakova marked this pull request as ready for review March 4, 2025 18:31
@npolshakova npolshakova requested review from lgadban and EItanya March 4, 2025 18:31
@npolshakova npolshakova force-pushed the add-ai-extproc-server branch from 960c855 to 6e1542e Compare March 4, 2025 18:35
@timflannagan
Member

Just a heads up that goreleaser doesn't support python (yet ™️ ). We'll need to release the ai-extension component with the rest of our container images in the release.yaml workflow. Can be done in a follow-up, but needs to be done before cutting beta1 imho.

@timflannagan
Member

(I don't think multi-arch support is a hard requirement though.)

@@ -1,5 +1,7 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual # Enable IPv4 and IPv6 support
Contributor Author

Let's not change kind cluster setup, maybe change to IPV4_PREFFERED default as part of: #10755 @lgadban

@@ -126,11 +126,10 @@ gateway:
  aiExtension:
    enabled: false
    image:
      repository: gloo-ai-extension
      registry: quay.io/solo-io
      registry: ghcr.io/kgateway-dev
Contributor Author

Delete ghcr.io/kgateway-dev, use default from chart.

      repository: gloo-ai-extension
      registry: quay.io/solo-io
      registry: ghcr.io/kgateway-dev
      repository: kgateway-ai-extension
      pullPolicy: IfNotPresent
Contributor Author

Remove pull policy (inherit from Chart)

@@ -0,0 +1 @@
__pycache__
Contributor Author

@timflannagan @lgadban Any thoughts on moving this outside of internal? We were thinking:

  1. Move dockerfile to cmd
  2. Move ai-extensions to a top level python directory

Member

Good question: I'm not sure what's best here. I think my preference is non-Go code doesn't live in the internal/ directory, but I just looked at the transformation PR, and we're adding rust code to the internal/envoyinit directory, so moving the AI extensions to a top-level directory could introduce some inconsistency in the codebase.

@npolshakova npolshakova force-pushed the add-ai-extproc-server branch from cf7c878 to 9c18d21 Compare March 5, 2025 22:42
2 participants