Stop the generation when <|eom_id|> token is encountered (needed for llama 3.1 tool call support) #8858
This PR adds support for the <|eom_id|> token introduced by the llama 3.1 models. It adds the EOM token to the list of tokens that stop generation. This is necessary for proper tool call support; see https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/ for more details.
Note that it doesn't add any tool call support to llama.cpp itself; it only stops generation after <|eom_id|>, allowing tool calls to be implemented in other software that uses llama.cpp for inference.
I don't feel confident enough to tinker with `LlamaModel::set_vocab()` in the `convert_hf_to_gguf.py` script to explicitly set the EOM token value during conversion, so it is currently found during vocabulary loading, like the EOT tokens.

I created a simple script that allows testing llama 3.1 tool calling with llama-server: https://github.com/fairydreaming/tlcl
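For illustration, here is a minimal sketch of the kind of round trip such a client script performs, assuming llama-server is running locally on port 8080, that the `/completion` endpoint is called with the `prompt`, `n_predict` and `content` fields, and that the prompt follows the llama 3.1 format from the model card linked above. None of this code is part of this PR; it only relies on generation now stopping at <|eom_id|>.

```python
#!/usr/bin/env python3
"""Illustrative llama 3.1 tool-call round trip against llama-server.

Assumptions: llama-server listens on localhost:8080, the /completion endpoint
accepts {"prompt", "n_predict"} and returns {"content"}, and the prompt layout
follows the llama 3.1 model card. Adjust to your setup as needed.
"""
import json
import requests

SERVER = "http://localhost:8080/completion"  # assumed llama-server address


def generate(prompt: str, n_predict: int = 512) -> str:
    # With this PR, the server also stops on <|eom_id|>, so the returned
    # content ends right after the model's tool-call request.
    resp = requests.post(SERVER, json={"prompt": prompt, "n_predict": n_predict})
    resp.raise_for_status()
    return resp.json()["content"]


# System prompt enabling the built-in tool-calling environment.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "Environment: ipython<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is 2 ** 32?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

# First pass: the model answers with a <|python_tag|>... tool call and
# generation stops at <|eom_id|> instead of running past it.
tool_call = generate(prompt)
print("model requested:", tool_call)

# The caller executes the requested code/tool itself, feeds the result back
# in an "ipython" message, and continues generation to get the final answer,
# which ends with <|eot_id|>.
tool_output = "4294967296"  # placeholder for the real tool result
prompt += (
    tool_call + "<|eom_id|>"
    "<|start_header_id|>ipython<|end_header_id|>\n\n"
    + json.dumps({"output": tool_output}) + "<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
print("final answer:", generate(prompt))
```

Without the stop on <|eom_id|>, the first `generate()` call would keep producing tokens past the tool-call request, so the client could not cleanly intercept it and inject the tool result.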