Add support for Hermes Pro function calling to llama-cpp-python #5
Comments
I don't quite understand how it works. The only difference in format is that after you get the result JSON from your function call you have to return it through a role Unless I'm misunderstanding?
Ok, so actually using the Here is the formatted
Now, to get the best performance, one would want to use the correct Hermes Pro format, which does not work with this handler, as it forces pure JSON output (without XML tags) for a fixed tool_choice and stops at a But looking at it more closely, as I did just now, I might be able to adapt this myself (or have Claude Opus do it lol).
Hi @Benjoyo, I just had a look at the chatml-function-calling handler, and while I understand how tools are passed into the prompt as function signatures, like @teknium1 I don't understand exactly how function call parsing is done.

For chatml-function-calling you could change the system prompt in the Jinja template, but the method for adding function signatures to the system prompt would remain the same.

    # System message
    "{% if message.role == 'system' %}"
    "{{ message.content }}"
    "{% if tool_calls %}"
    "\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions."
    "\nHere are the available tools:"
    "\n<tools> tools </tools>"
    "\n\nYou can respond to users messages with either a single message or one or more function calls."
    "\n\nTo respond with a message begin the message with 'message:', use the following format:"
    "\n\nmessage:"
    "\n<message>"
    "\nFor each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:"
    "\n<tool_call>"
    '\n{"arguments": <args-dict>, "name": <function-name>}'
    "\n</tool_call>"
    "{% endif %}"
    "<|im_end|>\n"
    "{% endif %}"

For parsing function calls, you'd have to add a parsing function such as the one below:

    import ast
    import json
    import xml.etree.ElementTree as ET

    from pydantic import BaseModel


    class FunctionCall(BaseModel):
        arguments: dict
        name: str


    def parse_function_calls(completion):
        function_calls = []
        # Wrap the completion in a root element so that several <tool_call> blocks parse as one document
        root = ET.fromstring(f"<root>{completion}</root>")
        for tool_call in root.findall(".//tool_call"):
            try:
                function_call_json = json.loads(tool_call.text.strip())
                function_call = FunctionCall(**function_call_json)
                function_calls.append(function_call)
            except json.JSONDecodeError:
                try:
                    # Try parsing with ast.literal_eval if json.loads fails
                    function_call_json = ast.literal_eval(tool_call.text.strip())
                    function_call = FunctionCall(**function_call_json)
                    function_calls.append(function_call)
                except (ValueError, SyntaxError):
                    pass
        return function_calls

It looks like this is how they are parsing function calls in the functionary-chat-handler, which you can modify by using the function above:

    if function_call is None or (
        isinstance(function_call, str) and function_call == "auto"
    ):
        stop = "\n"
        completion: llama_types.Completion = llama.create_completion(
            prompt=prompt, stop=stop, stream=False
        )  # type: ignore
        completion_text = completion["choices"][0]["text"]
        # strip " to=functions." and ending ":"
        function_call = completion_text.split(".")[-1][:-1]
        new_prompt = prompt + completion_text + stop

But for the chatml-function-calling handler I couldn't understand exactly how they are parsing function calls. Hope this helps 🙂
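One caveat with parsing the whole completion as XML is that any stray `<` or `&` in the model's surrounding text makes the XML parser raise. A minimal sketch of a more tolerant alternative, assuming the model wraps each call in `<tool_call></tool_call>` tags as the system prompt in this thread requests, is to pull the payloads out with a regular expression instead. The helper name `extract_tool_calls` and the `get_stock_price` tool are made up for illustration and are not part of llama-cpp-python.

```python
import json
import re

# Captures the payload between <tool_call> and </tool_call>; DOTALL lets it span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(completion: str) -> list:
    """Return every well-formed {"name": ..., "arguments": {...}} object in the text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(completion):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip a malformed payload instead of failing the whole response
        if (
            isinstance(call, dict)
            and "name" in call
            and isinstance(call.get("arguments"), dict)
        ):
            calls.append(call)
    return calls


# Hypothetical completion: the free text contains characters that are not valid XML.
text = (
    "Looking for items where cost < 5 & stock > 0.\n"
    '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_price"}\n</tool_call>'
)
print(extract_tool_calls(text))
```

This only matters when generation is not already constrained by a grammar; with grammar sampling the payload is well-formed by construction.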
Thanks @interstellarninja for the hints. I will have a closer look and try to make a PR soon.
On another look, my guess is that they are relying completely on grammars for parsing the function calls in both handlers.

Here's how they are defining grammars using function signatures in the chatml-function-calling handler:

    # One or more function calls
    tool_name = text[len("functions.") :]
    tool = next((tool for tool in tools if tool["function"]["name"] == tool_name), None)
    if not stream:
        completions = []
        completions_tool_name = []
        while tool is not None:
            prompt += f"functions.{tool_name}:\n"
            try:
                grammar = llama_grammar.LlamaGrammar.from_json_schema(
                    json.dumps(tool["function"]["parameters"]), verbose=llama.verbose
                )
            except Exception as e:
                grammar = llama_grammar.LlamaGrammar.from_string(
                    llama_grammar.JSON_GBNF, verbose=llama.verbose
                )
                if llama.verbose:
                    print(
                        "Failed to parse function body as JSON schema, falling back to default grammar"
                    )
                    print(e)

So perhaps changing the system prompt and a few other things might just work, even if function calls are wrapped in XML tags.
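In the excerpt, the grammar is built from the tool's `parameters` schema alone, because the function name is emitted separately as `functions.<name>:`. For the Hermes-style payload, where name and arguments sit in a single JSON object, the schema handed to `from_json_schema` would presumably have to describe that whole object. A rough sketch of building such a schema from an OpenAI-style tool definition follows; the helper name `hermes_call_schema` and the `get_weather` tool are hypothetical, and whether the library's schema-to-grammar converter accepts every keyword used here has not been verified.

```python
import json


def hermes_call_schema(tool: dict) -> dict:
    """Build a JSON schema for one {"arguments": ..., "name": ...} call object."""
    function = tool["function"]
    return {
        "type": "object",
        "properties": {
            # Reuse the tool's own parameter schema for the arguments.
            "arguments": function.get("parameters", {"type": "object"}),
            # Pin the name to this one tool via a single-value enum.
            "name": {"type": "string", "enum": [function["name"]]},
        },
        "required": ["arguments", "name"],
    }


# Hypothetical tool definition in the OpenAI "tools" format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

schema_json = json.dumps(hermes_call_schema(weather_tool))
print(schema_json)
```

The resulting `schema_json` string is the kind of input that `json.dumps(tool["function"]["parameters"])` provides in the excerpt above.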
Yeah, I think one needs to change the stop sequence to the starting tag to detect the model wanting to generate a call, then use the grammar, and then stop at the closing tag. These stop criteria are different in the other format. The rest could be very similar or the same.
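The two-phase flow described here can be sketched roughly as follows. To keep the example runnable without a model, generation is injected as a callable; its signature `complete(prompt, stop, constrained) -> (text, stop_hit)` is an assumption made for this sketch and is not the llama-cpp-python API. In particular, a real handler would need its own way to tell whether generation ended on the opening tag or on end-of-sequence.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical generation hook: returns the generated text (without the stop
# string) and the stop string that ended generation, or None for end-of-sequence.
Complete = Callable[[str, List[str], bool], Tuple[str, Optional[str]]]

OPEN_TAG, CLOSE_TAG = "<tool_call>", "</tool_call>"


def generate_turn(prompt: str, complete: Complete) -> dict:
    # Phase 1: unconstrained text until the model opens a tool call or finishes.
    text, stop_hit = complete(prompt, [OPEN_TAG], False)
    if stop_hit != OPEN_TAG:
        return {"content": text, "tool_call": None}
    # Phase 2: grammar-constrained JSON, stopping at the closing tag.
    prompt += text + OPEN_TAG + "\n"
    payload, _ = complete(prompt, [CLOSE_TAG], True)
    return {"content": text, "tool_call": payload.strip()}


def fake_complete(prompt: str, stop: List[str], constrained: bool):
    """Stand-in for a model: immediately opens a call, then emits a JSON payload."""
    if constrained:
        return '{"arguments": {"city": "Paris"}, "name": "get_weather"}\n', CLOSE_TAG
    return "", OPEN_TAG


result = generate_turn("<|im_start|>user\nWeather in Paris?<|im_end|>\n", fake_complete)
print(result)
```

Supporting several calls in one turn would mean looping over phase 1 and phase 2 until the model stops opening new tags.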
Hey, @adrienbrault has implemented the OpenAI function calling format for Ollama. Maybe this can serve as a guide, even though it is written in Go.
Hey, thank you so much for the great model and this repo!
Would you be willing to add support for this chat format to llama-cpp-python, so that we can use function calling (and JSON mode) with their OpenAI compatible server?
Right now, llama-cpp-python offers the only OpenAI compatible server with constrained/grammar based sampling for CPU that I am aware of. It has been very convenient to use with the functionary models, as it is plug&play with the openai client and very reliable thanks to the grammar sampling.
Besides functionary, there is already support for a format called chatml-function-calling which might be similar enough to the Hermes format to be able to just adapt it instead of writing something from scratch:
https://github.com/abetlen/llama-cpp-python/blob/6eb25231e4dafeb792ffc9597c27330344c970b1/llama_cpp/llama_chat_format.py#L2045
All that would need to be added to the library is a handler like that.
Thanks!