[docs] How to host a azure open ai using docker #1159

Open
3 tasks done
EZHU331 opened this issue Nov 12, 2024 · 19 comments
Labels
documentation Improvements or additions to documentation

Comments

@EZHU331

EZHU331 commented Nov 12, 2024

Description
This document aims to guide users on deploying Azure OpenAI Service within a Docker container. The current documentation lacks a comprehensive, step-by-step guide for setting up Azure OpenAI in a Dockerized environment, including authentication, configuration, and deployment processes.

Current documentation
Currently, there is limited or no specific guidance on hosting Azure OpenAI using Docker in the Azure OpenAI Service documentation.

Suggested changes
Add a dedicated section or guide that covers:

  1. Prerequisites

    • Docker installation and setup.
    • Azure account with the necessary permissions for OpenAI service.
    • API keys for authentication.
  2. Dockerfile Configuration

    • A sample Dockerfile to set up an environment with the required dependencies.
    • Instructions on installing necessary libraries such as the Azure SDK for Python, if applicable.
  3. Environment Variables

    • Configuration of environment variables to securely pass Azure OpenAI API keys and other configuration settings.
  4. Sample Docker Compose File

    • A docker-compose.yml file example to demonstrate how to set up multi-container applications if additional services are needed.
  5. Running the Container

    • Step-by-step commands for building and running the Docker container.
    • Guidance on exposing necessary ports and accessing the service.
  6. Testing and Verification

    • Instructions for verifying successful deployment, including sample API calls to Azure OpenAI.
  7. Security Best Practices

    • Suggestions for securely handling API keys, using .env files, and avoiding hardcoding credentials.

Additional context
Adding this section will help users deploy Azure OpenAI Service using Docker more efficiently and securely, improving accessibility for development and production deployments. Screenshots or command line examples would be beneficial for each step.

Checklist

  • I have checked that this issue hasn't already been reported.
  • I have checked the latest version of the documentation to ensure this issue still exists.
  • For simple typos or fixes, I have considered submitting a pull request instead.
EZHU331 added the documentation label Nov 12, 2024
@JosephCatrambone
Contributor

Thank you for the report. I think this makes sense. There is a little bit on using Azure with OpenAI here: https://www.guardrailsai.com/docs/how_to_guides/using_llms#azure-openai

That should cover the extra environment variables that one needs to set to use things with Azure. Additionally, for Docker deployments, there's this set of docs: https://www.guardrailsai.com/docs/how_to_guides/hosting_with_docker

I'm not sure if that will fully answer the questions you have, but it's perhaps a starting point while we figure out how to make the documentation clearer.

@EZHU331
Author

EZHU331 commented Nov 14, 2024

Given the existing documentation:
https://www.guardrailsai.com/docs/how_to_guides/hosting_with_docker
https://www.guardrailsai.com/docs/getting_started/guardrails_server

The API call below only works for standard OpenAI. How should the code be revised so that it works with Azure OpenAI?

from openai import OpenAI

client = OpenAI(
    base_url='http://127.0.0.1:8000/guards/gibberish_guard/openai/v1',
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Make up some gibberish for me please!"
    }]
)

print(response.choices[0].message.content)
print(response.guardrails['validation_passed'])

@JosephCatrambone
Contributor

Assuming your Azure environment variables are set according to the linked documentation, it should only be a matter of:

# Assumes "AZURE_API_KEY", "AZURE_API_BASE", and "AZURE_API_VERSION" are set.
from guardrails import Guard
from guardrails.hub import GibberishText

guard = Guard().use(
    GibberishText, threshold=0.5, validation_method="sentence", on_fail="exception"
)

result = guard(
    model="azure/<your_deployment_name>",
    messages=[
        {"role":"user", "content":"Make up some gibberish for me, please?"}
    ],
)

@EZHU331
Author

EZHU331 commented Nov 14, 2024

Yes, this works with guard(), but it does not work when Guardrails runs as a server:
http://127.0.0.1:8000/guards/gibberish_guard/openai/v1

Any idea how to modify this URL, assuming that the container is created with "AZURE_API_KEY", "AZURE_API_BASE", and "AZURE_API_VERSION" set in its environment?

@JosephCatrambone
Contributor

JosephCatrambone commented Nov 15, 2024

Aha! I think I see the difficulty. It's worth trying from openai import AzureOpenAI. I'm struggling a bit to test this because of resource availability, but in theory that should be API-compatible with OpenAI and would be the only real change.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint='http://127.0.0.1:8000/guards/gibberish_guard/openai/v1',
    # Placeholder: the real argument must be a callable that returns a token.
    azure_ad_token_provider="your token provider",
    api_version="2024-09-01-preview",
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Make up some gibberish for me please!"
    }]
)

I'll update this comment again if I can verify that this works.

EDIT: For reference, I'm running over Microsoft's Azure OpenAI Service here: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure%2Cglobal-standard%2Cstandard-chat-completions

@EZHU331
Author

EZHU331 commented Nov 16, 2024

Using 'http://127.0.0.1:8000/guards/gibberish_guard/openai/v1' simply returns a "URL not found" error in this setting.

I'm also trying to follow your guide for cloud environment setting, but the documentation is outdated:
https://www.guardrailsai.com/docs/how_to_guides/continuous_integration_continuous_deployment

This command is no longer available: guardrails create --template hub:template://guardrails/chatbot

The JSON template file cannot be installed either:


{
  "name": "chatbot",
  "description": "guard to validate chatbot output",
  "template_version": "0.0.1",
  "namespace": "guardrails",
  "guards": [
    {
      "id": "chatbot",
      "name": "chatbot",
      "validators": [
        {
          "id": "guardrails/detect_pii",
          "on": "$",
          "onFail": "exception",
          "kwargs": {
            "pii_entities": ["PERSON"]
          }
        }
      ]
    }
  ]
}


 => [7/8] COPY chatbot.json /app/chatbot.json                                              0.0s 
 => ERROR [8/8] RUN guardrails create --template /app/chatbot.json                       195.1s 
------                                                                                          
 > [8/8] RUN guardrails create --template /app/chatbot.json:                                    
1.134 Installing...                                                                             
1.136 Installing hub://guardrails/detect_pii...                                                 
193.2 ERROR:guardrails-cli:Failed to install guardrails-grhub-detect-pii                        
193.2 Exit code: 1                                                                              
193.2 stderr: error: subprocess-exited-with-error
193.2   
193.2   × pip subprocess to install build dependencies did not run successfully.
193.2   │ exit code: 1
193.2   ╰─> [391 lines of output]
193.2       Looking in indexes: https://__token__:****@pypi.guardrailsai.com/simple, https://pypi.org/simple
193.2       Ignoring numpy: markers 'python_version < "3.9"' don't match your environment
193.2       Collecting setuptools
193.2         Using cached setuptools-75.5.0-py3-none-any.whl.metadata (6.8 kB)
193.2       Collecting cython<3.0,>=0.25
193.2         Using cached Cython-0.29.37-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl.metadata (3.1 kB)
193.2       Collecting cymem<2.1.0,>=2.0.2
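
One way to narrow down which validator breaks the build might be to read the template and install each validator on its own. The sketch below only parses the JSON template shown above and prints one `guardrails hub install` command per validator. The `hub://` prefix is taken from the log line "Installing hub://guardrails/detect_pii..."; the helper name `hub_install_commands` is hypothetical, and this does not explain or fix the pip build failure itself.

```python
import json

# Abbreviated copy of the chatbot.json template from the comment above.
TEMPLATE = """
{
  "name": "chatbot",
  "guards": [
    {
      "id": "chatbot",
      "name": "chatbot",
      "validators": [
        {
          "id": "guardrails/detect_pii",
          "on": "$",
          "onFail": "exception",
          "kwargs": {"pii_entities": ["PERSON"]}
        }
      ]
    }
  ]
}
"""


def hub_install_commands(template_text: str) -> list:
    """Return one install command per distinct validator id in the template."""
    template = json.loads(template_text)
    seen = []
    for guard in template.get("guards", []):
        for validator in guard.get("validators", []):
            validator_id = validator["id"]
            if validator_id not in seen:
                seen.append(validator_id)
    return [f"guardrails hub install hub://{vid}" for vid in seen]


if __name__ == "__main__":
    for command in hub_install_commands(TEMPLATE):
        print(command)
```

Running each printed command as its own Dockerfile RUN step would show which validator install fails without waiting for the whole template.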

@JosephCatrambone
Contributor

Would it be possible to see a copy/paste of your config.py file (with the auth token redacted)? It should really be only a matter of those env variables.

I can look into the issue with --template, but I think that might be worth separating out as a different issue.

@OrenRachmil

I'm encountering the same issue when attempting to use Azure OpenAI models. However, the standard OpenAI models are functioning properly.

@JosephCatrambone
Contributor

My access to Azure has strangely disappeared, but I haven't lost sight of this issue.

I was seeing the "can't connect" matter with the OpenAI client, even outside of Guardrails. I posted a comment on their help thread: https://learn.microsoft.com/en-us/answers/questions/1315650/unable-to-log-into-azure-ai-studio-after-approved

When I get my access back I'll set up a local repro and be able to check everything in more depth.

@OrenRachmil

Thank you for the effort, keep me posted @JosephCatrambone

@OrenRachmil

Hey @JosephCatrambone,
Yesterday I spent some time debugging this issue and identified the root cause.

The problem with using Azure OpenAI models lies in how LiteLLM processes the base URL, which differs from its handling of pure OpenAI models. This discrepancy leads to different HTTP requests being generated, causing requests for Azure OpenAI models to target undefined routes in the FastAPI app.

For example:

OpenAI models generate the following request:
POST http://localhost:8000/guards/gibberish_guard/openai/v1/chat/completions

Azure OpenAI models, however, produce this request:
POST /guards/gibberish_guard/openai/v1/openai/deployments/GPT4/chat/completions?api-version=2024-05-01-preview
(Where "GPT4" is the deployment name of my model.)

The second request attempts to reach a route that isn't defined on the server, resulting from the way LiteLLM handles the request for Azure-specific endpoints.
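
The difference can be reproduced with plain string handling. The sketch below rebuilds both URLs from the same base URL. The two path shapes are copied from the requests above; the function names are hypothetical, and this is not LiteLLM's or the OpenAI client's actual code.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/guards/gibberish_guard/openai/v1"


def openai_style_url(base_url: str) -> str:
    """OpenAI-style clients append /chat/completions to the base URL."""
    return f"{base_url.rstrip('/')}/chat/completions"


def azure_style_url(base_url: str, deployment: str, api_version: str) -> str:
    """Azure-style clients add a deployment path and an api-version query."""
    query = urlencode({"api-version": api_version})
    return (
        f"{base_url.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?{query}"
    )


if __name__ == "__main__":
    print(openai_style_url(BASE_URL))
    print(azure_style_url(BASE_URL, "GPT4", "2024-05-01-preview"))
```

With the same base URL, only the first result lands on the existing /chat/completions route; the second carries the extra /openai/deployments/GPT4 segment.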

@CalebCourier
Collaborator

@OrenRachmil Good find, and thank you for sharing this information with us!

@zsimjee looks like we need either a new route for AzureOpenAI support via the proxy endpoint or to add some additional parameters/wildcards after /openai/v1/.
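
To make the wildcard option concrete, the sketch below uses a plain regular expression (not FastAPI's router) to show what an optional segment after /openai/v1 would need to capture. The pattern and the `match_route` helper are hypothetical illustrations, not the guardrails-api implementation.

```python
import re
from typing import Optional
from urllib.parse import unquote_plus, urlsplit

# Matches both the existing OpenAI-style path and the Azure-style path that
# adds /openai/deployments/<deployment_name> before /chat/completions.
ROUTE = re.compile(
    r"^/guards/(?P<guard_name>[^/]+)/openai/v1"
    r"(?:/openai/deployments/(?P<deployment_name>[^/]+))?"
    r"/chat/completions$"
)


def match_route(url: str) -> Optional[dict]:
    """Return the captured path parameters, or None if the path is unknown."""
    path = urlsplit(url).path  # drops the ?api-version=... query string
    found = ROUTE.match(path)
    if found is None:
        return None
    return {
        key: unquote_plus(value) if value is not None else None
        for key, value in found.groupdict().items()
    }


if __name__ == "__main__":
    print(match_route("/guards/gibberish_guard/openai/v1/chat/completions"))
    print(match_route(
        "/guards/gibberish_guard/openai/v1/openai/deployments/GPT4"
        "/chat/completions?api-version=2024-05-01-preview"
    ))
```

A dedicated second route would capture the same two parameters; the optional group only shows that one handler could serve both path shapes.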

@OrenRachmil

Hi @CalebCourier,

Are there any updates on resolving this issue?

If you could use some assistance, I'd be happy to contribute. Could you please provide any guidelines or details on how you envision the solution?

Here's what I’ve tried so far: I created a new function in the FastAPI app with appropriate routing for Azure OpenAI calls. I based it on the internal logic of the existing function handling OpenAI calls, making modifications to the routing for the new function. However, this didn’t resolve the issue for some reason.

If there are any specific areas where I could focus my efforts or adjustments I might have overlooked, please let me know.

Thanks!

@CalebCourier
Collaborator

Hi @OrenRachmil, we have an issue logged but we do not have a full solution specified, nor have we assigned out this work yet. We are always open to Pull Requests on our open source projects including the API which you can find here: https://github.com/guardrails-ai/guardrails-api

I think what you've tried so far is a step in the right direction. Below is a route I threw together to see if I could get the request to go through to the FastAPI app. It worked in the sense that I see the log outputs and receive the 418 error back on the client. What's left would be to patch in the logic from the /guards/{guard_name}/openai/v1/chat/completions route and update any calls to the guard with the kwargs necessary to pass the correct information along to litellm and, in turn, the Azure deployment.

guardrails-api new route

@router.post("/guards/{guard_name}/openai/v1/openai/deployments/{deployment_name}/chat/completions")
@handle_error
async def azure_openai_v1_chat_completions(guard_name: str, deployment_name: str, request: Request):
    payload = await request.json()
    print("payload: ", payload)
    decoded_guard_name = unquote_plus(guard_name)
    print("guard_name: ", decoded_guard_name)
    decoded_deployment_name = unquote_plus(deployment_name)
    print("deployment_name: ", decoded_deployment_name)
    query_params = request.query_params
    print("query_params: ", query_params)
    headers = request.headers
    print("headers: ", headers)

    raise HTTPException(418, detail="I'm a teapot")

config.py

from guardrails import Guard

my_guard = Guard(name="my-guard")

Client script

import os
from litellm import completion

## set ENV variables
os.environ["AZURE_API_KEY"] = "azure-api-key"
os.environ["AZURE_API_BASE"] = "http://localhost:8000/guards/my-guard/openai/v1"
os.environ["AZURE_REGION"] = "eastus"
os.environ["AZURE_API_VERSION"] = "2024-02-01"

# azure call
response = completion(
    model = "azure/test", 
    messages = [{ "content": "Hello, how are you?","role": "user"}]
)

Start command

guardrails-api start --config ./config.py

@OrenRachmil

Thank you very much for the effort @CalebCourier.
I will try to investigate.

@OrenRachmil

OrenRachmil commented Nov 27, 2024

Hi @CalebCourier,
I was able to successfully host the Azure model on the server using your proposed solution.
I created the following function with an appropriate route and added it to the \guardrails_api\api\guards.py script:

@router.post("/guards/{guard_name}/openai/v1/openai/deployments/{deployment_name}/chat/completions")
@handle_error
async def azure_openai_v1_chat_completions(guard_name: str, deployment_name: str, request: Request):
    payload = await request.json()
    decoded_guard_name = unquote_plus(guard_name)
    guard_struct = guard_client.get_guard(decoded_guard_name)
    if guard_struct is None:
        raise HTTPException(
            status_code=404,
            detail=f"A Guard with the name {decoded_guard_name} does not exist!",
        )
    guard = (
        AsyncGuard.from_dict(guard_struct.to_dict())
        if not isinstance(guard_struct, Guard)
        else guard_struct
    )
    stream = payload.get("stream", False)
    has_tool_gd_tool_call = any(
        tool.get("function", {}).get("name") == "gd_response_tool"
        for tool in payload.get("tools", [])
    )
    # Prefix the model so litellm routes the call to Azure. This runs before
    # the stream check so the streaming branch gets the same model name.
    if 'model' in payload and isinstance(payload['model'], str):
        payload['model'] = f"azure/{payload['model']}"
    else:
        raise ValueError("Invalid or missing 'model' in the payload")
    if not stream:
        execution = guard(num_reasks=0, **payload)
        if inspect.iscoroutine(execution):
            validation_outcome: ValidationOutcome = await execution
        else:
            validation_outcome: ValidationOutcome = execution

        llm_response = guard.history.last.iterations.last.outputs.llm_response_info
        result = outcome_to_chat_completion(
            validation_outcome=validation_outcome,
            llm_response=llm_response,
            has_tool_gd_tool_call=has_tool_gd_tool_call,
        )
        return JSONResponse(content=result)
    else:

        async def openai_streamer():
            try:
                guard_stream = await guard(num_reasks=0, **payload)
                async for result in guard_stream:
                    chunk = json.dumps(
                        outcome_to_stream_response(validation_outcome=result)
                    )
                    yield f"data: {chunk}\n\n"
                yield "\n"
            except Exception as e:
                yield f"data: {json.dumps({'error': {'message':str(e)}})}\n\n"
                yield "\n"

        return StreamingResponse(openai_streamer(), media_type="text/event-stream")

I also had to configure the api_key and api_base of the deployment in the server's configuration file.
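
As a rough sketch of that configuration step, the server's config.py could read the deployment credentials from the environment and fail early when one is missing, rather than hardcoding them. The variable names AZURE_API_KEY, AZURE_API_BASE, and AZURE_API_VERSION are the ones mentioned earlier in this thread; the `load_azure_settings` helper is hypothetical, and the guard definition itself is omitted.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def load_azure_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the Azure settings, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing Azure OpenAI settings: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    # On the server these must point at the real Azure deployment, for example
    # passed with `docker run --env-file .env ...`; only the client should use
    # the Guardrails server URL as its API base.
    try:
        settings = load_azure_settings()
    except RuntimeError as error:
        print(error)
    else:
        print("Azure settings found for:", settings["AZURE_API_BASE"])
```

Keeping the values in the container's environment means the key never has to be written into config.py or baked into the image.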

@EZHU331
Author

EZHU331 commented Dec 3, 2024

Thanks @OrenRachmil, will this be incorporated in the latest release with official support using a Docker image?


github-actions bot commented Jan 3, 2025

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@CalebCourier
Collaborator

@zsimjee we should try to prio pulling in these changes next sprint along with a couple other open PR's we have on the API. After this we should be able to retest and cut 0.1.x of the API which has the switchover to FastAPI and some other 0.6.x (OSS) related updates.
