πŸ€— Transformers CFG

Python 3.9+ License

πŸ’­ Release news

Latest release

v0.2.7 (2025-03-02)

Features

  • (CLI) Types and MLX support (#93)
  • (Regex) Negation, wildcard, and repetition bracket operators (#94, #95, #96, #104)
  • (Models) Qwen2 and Qwen2.5 (#97)
  • (Logits) Reusable GrammarConstrainedLogitsProcessor across generations for efficiency (#100)
  • (Backend) Pytest for testing (#109)
  • (CI/CD) GitHub Actions workflow for automation (#110)

Bug fixes

  • Avoided computing full masks and optimized type additions (#101)
  • Refactored grammar encoding to improve structure (#99)
  • The EOS token is now correctly masked (#108)
  • Fixed multiple bugs and improved aesthetics (#107)

Recent releases

  • Gemma-2 β€” @fillassuncao (2024-08-16)
  • DeepSeek (2024-07-24)
  • LLaMA-3 (2024-07-08)
  • JSON Schema (2024-05-13)
  • Token masking optimization (2024-04-25)
  • Phi (2024-04-16)
  • Online demo with JSON grammar at HF Space (2024-04-10)
  • Unicode and multilingual grammar (2024-02-29)
  • Text-Generation-WebUI (2023-12-17)
    • We are pleased to announce that transformers-cfg has been integrated into the Text-Generation-WebUI project, allowing users to leverage CFG capabilities within this widely used text-generation interface (Pull).

πŸš€ Introduction

Initially developed as a pull request to the Hugging Face Transformers library (Pull), transformers-cfg extends Transformers to support constrained decoding through context-free grammars (CFG), offering a Transformers parallel to LlamaCPP's GBNF support, but with stricter generation rules.

πŸ’» Installation

Stable

Install the stable version via pip:

pip install transformers-cfg

Development

For the latest updates, install directly from GitHub:

pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main

πŸ”§ Grammar quickstart

Let's set up a predictable generation task. Given the prompts below, the model would normally reply with "The animal is a dog." We will instead force it to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the initial text.

Command-line interface (CLI)

The transformers-cfg-cli tool enables text generation using a model and a specified grammar. Unicode is supported.

transformers-cfg-cli generate \
    -m "microsoft/Phi-3-mini-4k-instruct" \
    -g "examples/grammars/json.ebnf" \
    -p "This is a valid JSON string for an HTTP request:" \
    --use_4bit \
    --max_new_tokens 60 \
    --repetition_penalty 1.1
# {"name":"John","age":30,"car":null}

Run transformers-cfg-cli generate --help for available options.
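
For the quickstart scenario above, a hypothetical invocation could look like the following, assuming the animal grammar has been saved locally as animal.ebnf (the file name and the exact output are illustrative):

transformers-cfg-cli generate \
    -m "facebook/opt-125m" \
    -g "animal.ebnf" \
    -p 'The text says, "The animal is a dog." The answer is obvious. ' \
    --max_new_tokens 20
# Illustrative output: The animal is a cat.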

Transformers Torch

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

if __name__ == "__main__":
    # Detect if GPU is available, otherwise use CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    model_id = "facebook/opt-125m"

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id

    # Define grammar string
    grammar_str = """

    root   ::= "The animal is a " animal "."

    animal ::= "cat" | "fish"

    """
    
    grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
    grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

    # Generate
    prompts = [
        'The text says, "The animal is a dog." The answer is obvious. ',
        'I\'m going to say "The animal is a dog." Here I go! ',
    ]
    input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)

    output = model.generate(
        input_ids,
        max_length=50,
        logits_processor=[grammar_processor],
        repetition_penalty=1.1,
        num_return_sequences=1,
    )
    
    # Decode output
    generations = tokenizer.batch_decode(output, skip_special_tokens=True)

    # Print all generations in for loop
    for generation in generations:
        print(generation)
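
    # With the grammar constraint, each continuation is forced to be either
    # "The animal is a cat." or "The animal is a fish." rather than "dog."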

Stream

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

if __name__ == "__main__":
    # Detect if GPU is available, otherwise use CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    model_id = "facebook/opt-125m"

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id

    # Define grammar as a string
    grammar_str = """

    root   ::= "The animal is a " animal "."

    animal ::= "cat" | "fish"

    """
    
    grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
    grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

    # Generate
    # TextStreamer supports a batch size of 1, so we stream a single prompt
    prompts = [
        'The text says, "The animal is a dog." The answer is obvious. ',
    ]
    input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)

    # Set up streaming
    streamer = TextStreamer(tokenizer)

    output = model.generate(
        input_ids,
        max_length=50,
        logits_processor=[grammar_processor],
        repetition_penalty=1.1,
        num_return_sequences=1,
        streamer=streamer
    )
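
    # The TextStreamer prints tokens to stdout as they are generated,
    # so no separate decoding step is needed here.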

Transformers Pipeline

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

# Load model and tokenizer
model_id = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Detect if GPU is available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# Define grammar string
grammar_str = """

root   ::= "The animal is a " animal "."

animal ::= "cat" | "fish"

"""

grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

# Initialize pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    max_new_tokens=100,
    batch_size=2,
)

# Generate text
generations = pipe(
    [
        'The text says, "The animal is a dog." The answer is obvious. ',
        'I\'m going to say "The animal is a dog." Here I go! '
    ],
    do_sample=False,
    logits_processor=[grammar_processor],
)

# Print results
for generation_group in generations:
    for generation in generation_group:
        print(generation['generated_text'])

LlamaCPP Python

Coming soon!

πŸ’‘ Why use transformers-cfg?

  • EBNF Grammar Support: Uses Extended Backus-Naur Form (EBNF) for grammar description.
  • Seamless Integration: Grammars are compatible with the llama-cpp project, making it easy to move between the two.
  • Broad Model Compatibility: Works with all models in the πŸ€— Transformers library.
  • Multilingual Grammar Support: Enables grammars in various languages, including Chinese (δΈ­ζ–‡), Japanese (ζ—₯本θͺž), Korean (ν•œκ΅­μ–΄), Hindi (ΰ€Ήΰ€Ώΰ€¨ΰ₯ΰ€¦ΰ₯€), Hebrew (Χ’Χ‘Χ¨Χ™Χͺ), Arabic (Ψ§Ω„ΨΉΨ±Ψ¨ΩŠΨ©), and emoji (πŸ€—).
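
As a minimal illustration of the multilingual support, grammar rules can contain non-ASCII strings and emoji directly. The following sketch (not a grammar shipped in examples/grammars) constrains generation to one of a few greetings:

root     ::= greeting
greeting ::= "δ½ ε₯½" | "こんにけは" | "μ•ˆλ…•ν•˜μ„Έμš”" | "πŸ€—"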

πŸ€” What is a grammar?

Think of it as an enhanced version of regular expressions.

Valid JSON object

root ::= object
object ::= "{" pair ("," pair)* "}"
pair ::= string ":" value
string ::= '"' [a-zA-Z0-9]* '"'
value ::= string | object | "true" | "false" | "null"
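
For example, this grammar accepts {"name":"John","active":true} and nested objects such as {"a":{"b":"c"}}, but it rejects arrays, numbers, and (as written) any whitespace between tokens, since no whitespace rule is defined.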

For advanced grammar debugging, see our debugging guide.

πŸ›  JSON schema

Learn to create grammars for complex JSON objects in our documentation.
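
As a rough idea of what such a grammar can look like, here is a hand-written sketch (not generated by the library's JSON-schema tooling) that pins an object to two fixed keys:

root   ::= "{" "\"name\"" ":" string "," "\"age\"" ":" number "}"
string ::= "\"" [a-zA-Z0-9 ]* "\""
number ::= [0-9]+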

πŸ“œ Grammar collection

We maintain a collection of grammars in examples/grammars, aligned with llama-cpp; for example, json.ebnf (used in the CLI example above) constrains output to valid JSON.

βœ… Supported models

See supported_models.yaml for the full list, which is constantly being updated.

If you encounter an unsupported model, please open an issue or submit a pull request.

πŸ“– Citation

If you find this work useful, please cite it using the recommended citation:

@inproceedings{geng-etal-2023-grammar,
  title        = {Grammar-Constrained Decoding for Structured {NLP} Tasks without Finetuning},
  author       = {Geng, Saibo and Josifoski, Martin and Peyrard, Maxime and West, Robert},
  year         = 2023,
  month        = dec,
  booktitle    = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  publisher    = {Association for Computational Linguistics},
  address      = {Singapore},
  url          = {https://aclanthology.org/2023.emnlp-main.674},
  editor       = {Bouamor, Houda and Pino, Juan and Bali, Kalika}
}

πŸ“œ License

This project is licensed under the MIT License.

πŸ™Œ Acknowledgements

Derived from torch-grammars, which was based on llama-cpp.
