Skip to content

Latest commit

 

History

History
49 lines (38 loc) · 2.42 KB

README.md

File metadata and controls

49 lines (38 loc) · 2.42 KB

Function-calling & JSON-mode Evaluation

A framework for evaluating function calls and json output by LLMs using Hermes Tool Call and JSON-mode format.

This script evaluates the performance of a language model on a function calling and JSON output tasks. It preprocesses prompts, runs model completions, parses the function calls/json objects in the completions, validates the function calls/json objects, and calculates the pass rate.

Usage

  1. Clone the repository or copy the script to your local machine.
git clone https://github.com/your-repo/function-calling-eval.git
cd function-calling-eval/tool_eval
  1. Install the required dependencies:
pip -r requirements.txt
MAX_JOBS=4 pip install flash-attn --no-build-isolation

Arguments

  • --model_path: Path to the model folder (required).
  • --chat_template: Chat template for prompt formatting (default: "chatml").
  • --num_fewshot: Option to subset the evaluation dataset (default: None).
  • --dataset_path: Path to the Hugging Face dataset (default: function-calling: "NousResearch/func-calling-eval" & json-mode: "NousResearch/json-mode-eval").
  • --load_in_4bit: Option to load the model in 4-bit mode with bitsandbytes (default: "False").
  • --dpo: Option to save the dataset for DPO (default: "False").

Example

Function-calling

python evaluator.py --model_path /path/to/model --chat_template chatml --dataset_path dataset/path --load_in_4bit True --dpo False

Output

The script generates the following outputs:

  • function_calling_eval_results.json: A JSON file containing the function-calling evaluation results, including prompts, completions, model outputs, and pass/fail status.
  • function_calling_dpo_pairs.json (if --dpo is set to "True"): A JSON file containing the DPO dataset for function-calling consisting of system messages, questions, chosen completions, and rejected completions.

JSON-mode

python evaluator_json_mode.py --model_path /path/to/model --load_in_4bit True --dpo False

Output

The script generates the following outputs:

  • json_mode_eval_results.json: A JSON file containing the json-mode evaluation results, including prompts, completions, model outputs, and pass/fail status.
  • json_mode_dpo_pairs.json (if --dpo is set to "True"): A JSON file containing the DPO dataset for json-mode consisting of system messages, questions, chosen completions, and rejected completions.