
Add support for encoder-only T5 models #8960

Merged on Aug 10, 2024 (5 commits)

Conversation

fairydreaming (Collaborator) commented on Aug 9, 2024

This PR adds support for encoder-only T5 models. It also adds support for LLAMA_POOLING_TYPE_NONE in the llama-embedding tool for the purpose of testing this PR. Fixes #8900.

A model for testing this PR can be downloaded from https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
Copy the files from text_encoder_3 and tokenizer_3 into a single directory and convert the resulting model with convert_hf_to_gguf.py.
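A minimal sketch of those preparation steps, assuming the huggingface_hub package is available; the sd3-t5-encoder directory name and the --outfile path are arbitrary choices for illustration, not something required by the PR:

import shutil
from pathlib import Path

from huggingface_hub import snapshot_download

# Download only the T5 encoder weights and its tokenizer from the SD3 repo.
repo = snapshot_download(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    allow_patterns=["text_encoder_3/*", "tokenizer_3/*"],
)

# convert_hf_to_gguf.py expects a single directory containing both the model
# config/weights and the tokenizer files, so merge the two subfolders.
model_dir = Path("sd3-t5-encoder")  # arbitrary local name
model_dir.mkdir(exist_ok=True)
for sub in ("text_encoder_3", "tokenizer_3"):
    for f in (Path(repo) / sub).iterdir():
        shutil.copy(f, model_dir / f.name)

# Then convert the merged directory to GGUF, e.g.:
#   python convert_hf_to_gguf.py sd3-t5-encoder --outfile models/sd3-text-encoder.gguf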

Here's an example transformers script that can be used to print the embeddings generated by the original model:

from transformers import T5EncoderModel, T5TokenizerFast

id = "stabilityai/stable-diffusion-3-medium-diffusers"

# Load the fast T5 tokenizer from the tokenizer_3 subfolder of the SD3 repo.
tokenizer_3 = T5TokenizerFast.from_pretrained(
    id,
    subfolder="tokenizer_3"
)

# Tokenize the test prompt and print the resulting token ids.
text_inputs = tokenizer_3("translate English to German: The house is wonderful.", return_tensors="pt")
text_input_ids = text_inputs.input_ids
print(text_input_ids)

# Load the encoder-only T5 model from the text_encoder_3 subfolder.
text_encoder_3 = T5EncoderModel.from_pretrained(
    id,
    subfolder="text_encoder_3"
)

# Run the encoder and print the per-token embeddings (last hidden state).
prompt_embeds = text_encoder_3(text_input_ids)[0]
print(prompt_embeds)

Compare the resulting embeddings with the output of the following command (--pooling none prints one embedding vector per token, and --embd-normalize -1 disables normalization so the raw values can be compared):

./llama-embedding -m models/sd3-text-encoder.gguf -p "translate English to German: The house is wonderful." --pooling none --embd-normalize -1

Stable Diffusion 3 uses T5TokenizerFast, so there may be some tokenization differences for more complex prompts (the llama.cpp Unigram tokenizer is compatible with the "slow" T5Tokenizer class).
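To check whether a given prompt is affected, a small script along these lines can compare the two tokenizer classes (a sketch only; it assumes the sentencepiece package, which the slow tokenizer requires, is installed):

from transformers import T5Tokenizer, T5TokenizerFast

id = "stabilityai/stable-diffusion-3-medium-diffusers"
prompt = "translate English to German: The house is wonderful."

# "Slow" sentencepiece-based tokenizer -- the class the llama.cpp
# Unigram tokenizer is compatible with.
slow = T5Tokenizer.from_pretrained(id, subfolder="tokenizer_3")
# "Fast" tokenizer actually used by the Stable Diffusion 3 pipeline.
fast = T5TokenizerFast.from_pretrained(id, subfolder="tokenizer_3")

# If the two id sequences differ for a prompt, llama.cpp will compute
# embeddings over different tokens than the diffusers pipeline uses.
print(slow(prompt).input_ids)
print(fast(prompt).input_ids)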

* gguf-py : add T5ENCODER model architecture

* convert-hf : add T5EncoderModel

* llama : add llama_model_has_decoder() API function

* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()

* llama : add support for LLM_ARCH_T5ENCODER

* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE

* llama-embedding : add support for encoder-only models