Invoker

The one who calls upon... Functions!

Invoker is a suite of large language models based on Llama-2, fine-tuned to decide between calling functions and responding directly. We have currently released the 13B version, with 7B and 34B versions planned for training and release in the future.

News

  • [2023/09] We released Invoker-13B-GPTQ, a 4-bit quantized GPTQ implementation of Invoker-13B. Download weights. We also added ExLlamaV2 integration!
  • [2023/09] We released Invoker-13B, a model trained on function-calling and multi-turn conversation datasets. Download weights.

Installation & Usage

Invoker is used exactly like OpenAI's function calling. Simply install the required dependencies:

pip install -r requirements.txt

Launching the Server

Start the FastAPI server, specifying the model details via environment variables. The Invoker server currently supports two different ways of loading the model. To load the full fp16 model using HuggingFace transformers, run the following commands:

export INVOKER_MODEL_TYPE=hf
export INVOKER_MODEL_NAME_OR_PATH=jeffrey-fong/invoker-13b
uvicorn server_fastapi:app

To load the 4-bit quantized Invoker GPTQ model using ExLlamaV2, clone the model repository to your local machine, then run the following commands:

export INVOKER_MODEL_TYPE=exllamav2
export INVOKER_MODEL_NAME_OR_PATH=path_to_downloaded_invoker-13b-GPTQ-model_dir
uvicorn server_fastapi:app
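
By default, uvicorn serves on http://127.0.0.1:8000, which matches the api_base used in the client examples below. To bind to a different address or port, pass the standard uvicorn flags:

uvicorn server_fastapi:app --host 0.0.0.0 --port 8000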

The full list of models is indicated here.

Inference

Inference can then be performed exactly as with OpenAI function calling. Provide the chat history and the functions via the messages and functions arguments respectively. Invoker also supports the following generation hyperparameters:

  • temperature: float = 0.5 Accepts values between 0.0 and 1.0. Defaults to 0.5 if not passed in.
  • top_p: float = 1.0 Accepts values between 0.0 and 1.0. Defaults to 1.0 if not passed in.

For example:
import openai

openai.api_base = "http://localhost:8000"
openai.api_key = "test"

messages = [{"role": "user", "content": "Can you check what is the time in Singapore?"}]
response = openai.ChatCompletion.create(
    model="jeffrey-fong/invoker-13b",
    messages=messages,
    functions=[
        {
            "name": "get_time",
            "description": "Get the current time",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. New York City, NY"
                    },
                    "format": {
                    "type": "string",
                    "enum": ["12-hour", "24-hour"]
                    }
                },
                "required": ["location"]
            }
        }
    ]
)
response_message = response["choices"][0]["message"]
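
The generation hyperparameters listed above go in the same request body. As an illustration, a lower-temperature call might look like the following, where functions stands for the same schema list shown inline above:

response = openai.ChatCompletion.create(
    model="jeffrey-fong/invoker-13b",
    messages=messages,
    functions=functions,  # the same function schema list as above
    temperature=0.2,      # lower temperature for more deterministic function selection
    top_p=0.9,
)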

If the model chooses to call a function, the response message will contain a function_call object holding the function name and the arguments as a stringified JSON generated by the model (note: the model may generate invalid JSON or hallucinate parameters). To let the model summarize the result of the function call, parse the arguments string into JSON in your code and, if the arguments are valid, call your function with them. Then append the function response as a new message and perform another inference.

Using the above example again,

import json

# get_time is your own implementation of the function declared above.
if response_message.get("function_call"):
    available_functions = {"get_time": get_time}
    function_name = response_message["function_call"]["name"]
    function_to_call = available_functions[function_name]
    # The arguments come back as a JSON string and may be invalid; parse accordingly.
    function_args = json.loads(response_message["function_call"]["arguments"])
    function_response = function_to_call(
        location=function_args.get("location"),
        format=function_args.get("format"),
    )
    messages.append(response_message)
    messages.append(
        {
            "role": "function",
            "name": function_name,
            "content": function_response,
        }
    )
    second_response = openai.ChatCompletion.create(
        model="jeffrey-fong/invoker-13b",
        messages=messages,
    )
    print(second_response["choices"][0]["message"])
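
The snippet above assumes a local get_time implementation matching the declared schema. A minimal sketch, where the city-to-timezone table is a hypothetical stand-in for a real resolver:

import json
from datetime import datetime
from zoneinfo import ZoneInfo

def get_time(location: str, format: str = "24-hour") -> str:
    # Hypothetical lookup table; a real implementation would resolve arbitrary city names.
    timezones = {"Singapore": "Asia/Singapore", "New York City, NY": "America/New_York"}
    tz = ZoneInfo(timezones.get(location, "UTC"))
    time_format = "%I:%M %p" if format == "12-hour" else "%H:%M"
    # Return a JSON string so it can be used directly as message content.
    return json.dumps({"location": location, "time": datetime.now(tz).strftime(time_format)})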

Refer to the client code here for a more detailed example.

Using the model directly

Please refer to the model card on HuggingFace for how to use the model directly, including the prompt format.
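
As a rough starting point, the model can be loaded like any HuggingFace causal LM. The sketch below is our assumption, not documented usage; in particular, the prompt must follow the format described in the model card:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jeffrey-fong/invoker-13b")
model = AutoModelForCausalLM.from_pretrained(
    "jeffrey-fong/invoker-13b", device_map="auto", torch_dtype="auto"
)

prompt = "..."  # build this according to the prompt format in the model card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.5, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))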

Model Download

Model             Link              Version
Invoker-13B       Huggingface Repo  v1.0
Invoker-13B-GPTQ  Huggingface Repo  v1.0
Invoker-7B        Coming Soon       v1.0
Invoker-34B       Coming Soon       v1.0

Training

Training was performed using QLoRA, which significantly reduces the computational resources required to train the models. Similar to FastChat, we compute the loss only on the assistant responses and ignore all other outputs and responses during backpropagation, as sketched below.
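
In HuggingFace terms, this masking is typically implemented by setting the labels of all non-assistant tokens to -100, the index the cross-entropy loss ignores. A minimal sketch, not the released training code:

import torch

IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss by default

def mask_non_assistant(input_ids: torch.Tensor, assistant_mask: torch.Tensor) -> torch.Tensor:
    # assistant_mask is 1 for tokens inside assistant responses, 0 everywhere else.
    labels = input_ids.clone()
    labels[assistant_mask == 0] = IGNORE_INDEX
    return labels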

We accelerated training with DeepSpeed ZeRO Stage 2 for fast data parallelism. QLoRA is currently not compatible with DeepSpeed ZeRO Stage 3, which shards the model across multiple GPUs.

Training code will be released in the future.

Training hyperparameters

Hyperparameter    Value
Total batch size  192
Epochs            2
Learning rate     2e-05
LoRA r            64
LoRA alpha        16
LoRA dropout      0.05
Weight decay      0.0
Warmup ratio      0.03
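
These LoRA values map directly onto a peft LoraConfig. A hedged sketch, where the target modules are our assumption rather than a detail from the released setup:

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Target modules are an assumption; the actual training setup may differ.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)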

Training Data

We use a variety of sources when building our training dataset. All the datasets are carefully chosen to improve both the conversational and function-calling capability of the model.

  • ToolBench (0830 updated) ToolBench is an open-source, large-scale, high-quality instruction-tuning SFT dataset built to give LLMs general tool-use capability. It consists of multi-turn conversations in which the assistant, presented with several candidate functions, calls one or more of them before returning its response to the user. We rigorously cleaned the data:

    1. Removed all datapoints that do not end with the assistant returning a summarized response
    2. Cleaned datapoints with unnecessary repeated calls to the same function
    3. Changed all function names and descriptions to exclude the domain name, so the functions feel more generic
  • ShareGPT-34K ShareGPT-34K is a filtered dataset containing high quality multi-turn conversations between a user and an assistant. Some of the assistant responses are generated from OpenAI's GPT-3.5-Turbo while some are from GPT-4.

  • OASST1 OASST1 is a human-generated and human-annotated assistant-style conversation corpus. We kept only the conversations in English.

All the datasets used are under the Apache-2.0 license, so this dataset is released under the same license as well.

To-Dos

  • Quantize 13B model
  • Work on GPTQ-based servers (ExLlama and/or ExLlamaV2)
  • Work on validating function names, descriptions, etc., just like OpenAI's function calling
  • Convert Invoker to other formats like:
    • GGUF
    • AWQ
  • Train 7B Llama-2 model and 34B CodeLlama model
  • Investigate ways to evaluate function calling

Citation

If you find this work helpful, please cite it as:

@Misc{invoker-function-calling,
  title = {Invoker: The one who calls upon functions - Function-Calling Language Model},
  author = {jeffrey-fong},
  howpublished = {\url{https://github.com/jeffrey-fong/Invoker}},
  year = {2023}
}