"tool_calls" not returning on native http request on a llama cpp server #1856

Open
celsowm opened this issue Dec 7, 2024 · 0 comments
celsowm commented Dec 7, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Behavior similar to https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Functions.ipynb
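
For reference, an OpenAI-compatible response that does carry tool calls would look roughly like this (a sketch based on the OpenAI chat-completions schema; the id and argument values are illustrative, not actual server output):

{'id': 'chatcmpl-...', 'object': 'chat.completion', 'created': 1733614667, 'model': 'gpt-3.5-turbo-1106', 'choices': [{'index': 0, 'message': {'content': None, 'role': 'assistant', 'tool_calls': [{'id': 'call_0', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"location": "San Francisco, CA", "unit": "fahrenheit"}'}}]}, 'logprobs': None, 'finish_reason': 'tool_calls'}], 'usage': {'prompt_tokens': 239, 'completion_tokens': 83, 'total_tokens': 322}}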

Current Behavior

The server is returning the function-call JSON as plain text inside message['content']; the "tool_calls" field is absent:

{'id': 'chatcmpl-a8d9c08c-6260-4537-aa3a-b80ca7716de9', 'object': 'chat.completion', 'created': 1733614667, 'model': 'gpt-3.5-turbo-1106', 'choices': [{'index': 0, 'message': {'content': '{"name": "get_current_weather", "parameters": {"location": "San Francisco, CA", "unit": "fahrenheit"}}; {"name": "get_current_weather", "parameters": {"location": "Tokyo, JP", "unit": "celsius"}}; {"name": "get_current_weather", "parameters": {"location": "Paris, FR", "unit": "fahrenheit"}}', 'role': 'assistant'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 239, 'completion_tokens': 83, 'total_tokens': 322}}
None

(The trailing None is the return value of run_conversation() below: since tool_calls is empty, the tool-execution branch and the second request never run.)

Environment and Context

llama cpp server: python -m llama_cpp.server --n_gpu_layers -1 --n_ctx 8000 --model .\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Windows 11
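
Note that the server above is started without an explicit chat format. As a point of comparison (an assumption, not something I have verified for this model), llama-cpp-python ships a chatml-function-calling chat format that is intended to populate tool_calls, selectable at launch:

python -m llama_cpp.server --n_gpu_layers -1 --n_ctx 8000 --chat_format chatml-function-calling --model .\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf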

# Client code

import requests
import json

# Example tool function
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps(
            {"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"}
        )
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})


def run_conversation():
    # Initial user message
    messages = [
        {
            "role": "user",
            "content": "What's the weather like in San Francisco, Tokyo, and Paris?",
        }
    ]

    # Definition of the available tools (functions)
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]

    # First request, to get the model's initial response
    payload = {
        "model": "gpt-3.5-turbo-1106",
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto"
    }

    response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
    response_data = response.json()
    
    print(response_data)

    # Extract the model's first reply
    response_message = response_data["choices"][0]["message"]
    tool_calls = response_message.get("tool_calls", [])

    if tool_calls:
        # Available functions
        available_functions = {
            "get_current_weather": get_current_weather,
        }

        # Append the model's response message to the history
        messages.append(response_message)

        # Execute each requested tool call
        for tool_call in tool_calls:
            function_name = tool_call["function"]["name"]
            function_args = json.loads(tool_call["function"]["arguments"])
            function_to_call = available_functions[function_name]
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            
            # Append the function's result to the history
            messages.append(
                {
                    "tool_call_id": tool_call["id"],
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )

        # Make a second request, now including the tool responses
        second_payload = {
            "model": "gpt-3.5-turbo-1106",
            "messages": messages,
        }

        second_response = requests.post("http://localhost:8000/v1/chat/completions", json=second_payload)
        return second_response.json()

print(run_conversation())
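
As a client-side stopgap (a sketch assuming the model keeps emitting the calls as semicolon-separated JSON objects in content, as in the output above; parse_tool_calls_from_content is a hypothetical helper, not part of llama-cpp-python), the degraded response can be normalized into the tool_calls shape the loop above expects:

import json

def parse_tool_calls_from_content(content):
    """Fallback: recover tool calls that arrived as semicolon-separated
    JSON objects in message['content'] instead of in 'tool_calls'."""
    calls = []
    for chunk in content.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            obj = json.loads(chunk)
        except json.JSONDecodeError:
            continue
        if "name" in obj:
            # Normalize to the structure run_conversation() iterates over.
            calls.append({
                "id": f"call_{len(calls)}",
                "type": "function",
                "function": {
                    "name": obj["name"],
                    "arguments": json.dumps(obj.get("parameters", {})),
                },
            })
    return calls

# Usage inside run_conversation(), right after reading the first response:
# tool_calls = response_message.get("tool_calls") or parse_tool_calls_from_content(response_message.get("content") or "")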