"tool_calls" not returning on native http request on a llama cpp server #1856

Open
celsowm opened this issue Dec 7, 2024 · 0 comments
celsowm commented Dec 7, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Behavior similar to https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Functions.ipynb
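
For reference, an OpenAI-compatible response that does carry tool calls would look roughly like this (a sketch based on the OpenAI chat-completions schema; the id and argument values are illustrative, not actual server output):

{'id': 'chatcmpl-...', 'object': 'chat.completion', 'created': 1733614667, 'model': 'gpt-3.5-turbo-1106', 'choices': [{'index': 0, 'message': {'content': None, 'role': 'assistant', 'tool_calls': [{'id': 'call_0', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"location": "San Francisco, CA", "unit": "fahrenheit"}'}}]}, 'logprobs': None, 'finish_reason': 'tool_calls'}], 'usage': {'prompt_tokens': 239, 'completion_tokens': 83, 'total_tokens': 322}}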

Current Behavior

The server is returning the function-call JSON as plain text inside message['content']; the "tool_calls" field is absent:

{'id': 'chatcmpl-a8d9c08c-6260-4537-aa3a-b80ca7716de9', 'object': 'chat.completion', 'created': 1733614667, 'model': 'gpt-3.5-turbo-1106', 'choices': [{'index': 0, 'message': {'content': '{"name": "get_current_weather", "parameters": {"location": "San Francisco, CA", "unit": "fahrenheit"}}; {"name": "get_current_weather", "parameters": {"location": "Tokyo, JP", "unit": "celsius"}}; {"name": "get_current_weather", "parameters": {"location": "Paris, FR", "unit": "fahrenheit"}}', 'role': 'assistant'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 239, 'completion_tokens': 83, 'total_tokens': 322}}
None

(The trailing None is the return value of run_conversation() below: since tool_calls is empty, the tool-execution branch and the second request never run.)

Environment and Context

llama cpp server: python -m llama_cpp.server --n_gpu_layers -1 --n_ctx 8000 --model .\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Windows 11
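
Note that the server above is started without an explicit chat format. As a point of comparison (an assumption, not something I have verified for this model), llama-cpp-python ships a chatml-function-calling chat format that is intended to populate tool_calls, selectable at launch:

python -m llama_cpp.server --n_gpu_layers -1 --n_ctx 8000 --chat_format chatml-function-calling --model .\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf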

# Client code

import requests
import json

# Example tool function
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps(
            {"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"}
        )
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})


def run_conversation():
    # Initial user message
    messages = [
        {
            "role": "user",
            "content": "What's the weather like in San Francisco, Tokyo, and Paris?",
        }
    ]

    # Definition of the available tools (functions)
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]

    # First request, to get the model's initial response
    payload = {
        "model": "gpt-3.5-turbo-1106",
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto"
    }

    response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
    response_data = response.json()
    
    print(response_data)

    # Extract the model's first reply
    response_message = response_data["choices"][0]["message"]
    tool_calls = response_message.get("tool_calls", [])

    if tool_calls:
        # Available functions
        available_functions = {
            "get_current_weather": get_current_weather,
        }

        # Append the model's response message to the history
        messages.append(response_message)

        # Execute each requested tool call
        for tool_call in tool_calls:
            function_name = tool_call["function"]["name"]
            function_args = json.loads(tool_call["function"]["arguments"])
            function_to_call = available_functions[function_name]
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            
            # Append the function's result to the history
            messages.append(
                {
                    "tool_call_id": tool_call["id"],
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )

        # Make a second request, now including the tool responses
        second_payload = {
            "model": "gpt-3.5-turbo-1106",
            "messages": messages,
        }

        second_response = requests.post("http://localhost:8000/v1/chat/completions", json=second_payload)
        return second_response.json()

print(run_conversation())
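
As a client-side stopgap (a sketch assuming the model keeps emitting the calls as semicolon-separated JSON objects in content, as in the output above; parse_tool_calls_from_content is a hypothetical helper, not part of llama-cpp-python), the degraded response can be normalized into the tool_calls shape the loop above expects:

import json

def parse_tool_calls_from_content(content):
    """Fallback: recover tool calls that arrived as semicolon-separated
    JSON objects in message['content'] instead of in 'tool_calls'."""
    calls = []
    for chunk in content.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            obj = json.loads(chunk)
        except json.JSONDecodeError:
            continue
        if "name" in obj:
            # Normalize to the structure run_conversation() iterates over.
            calls.append({
                "id": f"call_{len(calls)}",
                "type": "function",
                "function": {
                    "name": obj["name"],
                    "arguments": json.dumps(obj.get("parameters", {})),
                },
            })
    return calls

# Usage inside run_conversation(), right after reading the first response:
# tool_calls = response_message.get("tool_calls") or parse_tool_calls_from_content(response_message.get("content") or "")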