diff --git a/README.md b/README.md
index dcfbcb6..bc455d2 100644
--- a/README.md
+++ b/README.md
@@ -19,10 +19,11 @@ This project solves the issues by filtering the tokens that the language model i
 ## Installation
 ```pip install lm-format-enforcer```
 
-## Simple example
+## Basic Tutorial
 ```python
 from pydantic import BaseModel
-from lmformatenforcer import JsonSchemaParser, generate_enforced
+from lmformatenforcer import JsonSchemaParser, build_transformers_prefix_allowed_tokens_fn
+from transformers import pipeline
 
 class AnswerFormat(BaseModel):
     first_name: str
@@ -30,23 +31,32 @@ class AnswerFormat(BaseModel):
     year_of_birth: int
     num_seasons_in_nba: int
 
-question = f'Please give me information about Michael Jordan. You MUST answer using the following json schema: {AnswerFormat.schema_json()}'
+# Create a transformers pipeline
+hf_pipeline = pipeline('text-generation', model='meta-llama/Llama-2-7b-hf')
+prompt = f'Here is information about Michael Jordan in the following json schema: {AnswerFormat.schema_json()} :\n'
+
+# Create a character level parser and build a transformers prefix function from it
 parser = JsonSchemaParser(AnswerFormat.schema())
+prefix_function = build_transformers_prefix_allowed_tokens_fn(hf_pipeline.tokenizer, parser)
+
+# Call the pipeline with the prefix function
+output_dict = hf_pipeline(prompt, prefix_allowed_tokens_fn=prefix_function)
 
-# Call generate_enforced(model, tokenizer, parser, ...) instead of model.generate(...):
-inputs = tokenizer([question], return_tensors='pt', add_special_tokens=False, return_token_type_ids=False).to(device)
-result = generate_enforced(model, tokenizer, parser, inputs=inputs)
+# Extract the results
+result = output_dict[0]['generated_text'][len(prompt):]
 
 print(result)
 # {'first_name': 'Michael', 'last_name': 'Jordan', 'year_of_birth': 1963, 'num_seasons_in_nba': 15}
 ```
+
 ## Capabilities / Advantages
 - Works with any Python language model and tokenizer. Already supports transformers and LangChain. Can be adapted to others.
 - Supports batched generation and beam searches - each input / beam can have different tokens filtered at every timestep
-- Supports both JSON Schema (strong) and Regular Expression (partial) formats
+- Supports both JSON Schema and Regular Expression formats
 - Supports both required and optional fields in JSON schemas
 - Supports nested fields, arrays and dictionaries in JSON schemas
-- Gives the language model freedom to control whitespacing and field ordering in JSON schemas, reducing hallucinations
+- Gives the language model freedom to control whitespacing and field ordering in JSON schemas, reducing hallucinations.
+- Does not modify the high-level loop of the transformers API, so it can be used in any scenario.
 
 ## Detailed example
 
@@ -57,6 +67,9 @@ We created a Google Colab Notebook which contains a full example of how to use t
 
 You can also [view the notebook in GitHub](https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_llama2_enforcer.ipynb).
 
+
+For the different ways to integrate with Hugging Face transformers, see the [unit tests](https://github.com/noamgat/lm-format-enforcer/blob/main/tests/test_transformerenforcer.py).
+
 ## How does it work?
 
 The library works by combining a character level parser and a tokenizer prefix tree into a smart token filtering mechanism.
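The "How does it work?" section that this patch leaves in place is only one sentence, so a sketch of the idea may help readers of the diff. The following is a minimal illustration, not the library's implementation: `make_prefix_fn` and `is_valid_prefix` are hypothetical names introduced here, and only the `(batch_id, input_ids) -> List[int]` shape of `prefix_allowed_tokens_fn` comes from the transformers API.

```python
# Illustrative sketch only: `make_prefix_fn` and `is_valid_prefix` are
# hypothetical names, not part of lm-format-enforcer's API.
from typing import Callable, List


def make_prefix_fn(tokenizer, is_valid_prefix: Callable[[str], bool]):
    """Build a transformers-style prefix_allowed_tokens_fn from a prefix validator."""
    all_token_ids = list(range(len(tokenizer)))

    def prefix_allowed_tokens_fn(batch_id: int, input_ids) -> List[int]:
        # Text generated so far for this batch entry / beam
        text_so_far = tokenizer.decode(input_ids, skip_special_tokens=True)
        # Keep only the tokens whose decoded text extends the output to
        # another valid prefix of the target format. Decoding every candidate
        # at every step is O(vocab size); the library's tokenizer prefix tree
        # exists precisely to avoid this brute-force scan.
        return [
            token_id
            for token_id in all_token_ids
            if is_valid_prefix(text_so_far + tokenizer.decode([token_id]))
        ]

    return prefix_allowed_tokens_fn
```

In the library itself, a character-level parser plays the role of `is_valid_prefix` (consuming the output one character at a time and reporting which characters may legally come next), which is what lets the same mechanism enforce both JSON Schema and regular expression formats.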
diff --git a/pyproject.toml b/pyproject.toml
index ca59b12..a844601 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "lm-format-enforcer"
-version = "0.3.1"
+version = "0.3.2"
 description = "Enforce the output format (JSON Schema, Regex etc) of a language model"
 authors = ["Noam Gat "]
 license = "MIT"