
Add support for encoder-only T5 models #8960

Merged on Aug 10, 2024 (5 commits)

Conversation

fairydreaming (Collaborator) commented on Aug 9, 2024

This PR adds support for encoder-only T5 models. It also adds support for LLAMA_POOLING_TYPE_NONE in the llama-embedding tool for the purpose of testing this PR. Fixes #8900.

A model for testing this PR can be downloaded from https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
Copy the files from text_encoder_3 and tokenizer_3 into a single directory and convert the resulting model with convert_hf_to_gguf.py.
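A minimal sketch of those preparation steps, assuming the huggingface_hub package is available; the sd3-t5-encoder directory name and the --outfile path are arbitrary choices for illustration, not something required by the PR:

import shutil
from pathlib import Path

from huggingface_hub import snapshot_download

# Download only the T5 encoder weights and its tokenizer from the SD3 repo.
repo = snapshot_download(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    allow_patterns=["text_encoder_3/*", "tokenizer_3/*"],
)

# convert_hf_to_gguf.py expects a single directory containing both the model
# config/weights and the tokenizer files, so merge the two subfolders.
model_dir = Path("sd3-t5-encoder")  # arbitrary local name
model_dir.mkdir(exist_ok=True)
for sub in ("text_encoder_3", "tokenizer_3"):
    for f in (Path(repo) / sub).iterdir():
        shutil.copy(f, model_dir / f.name)

# Then convert the merged directory to GGUF, e.g.:
#   python convert_hf_to_gguf.py sd3-t5-encoder --outfile models/sd3-text-encoder.gguf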

Here's an example transformers script that can be used to print the embeddings generated by the original model:

from transformers import T5EncoderModel, T5TokenizerFast

id = "stabilityai/stable-diffusion-3-medium-diffusers"

# Load the fast T5 tokenizer from the tokenizer_3 subfolder of the SD3 repo.
tokenizer_3 = T5TokenizerFast.from_pretrained(
    id,
    subfolder="tokenizer_3"
)

# Tokenize the test prompt and print the resulting token ids.
text_inputs = tokenizer_3("translate English to German: The house is wonderful.", return_tensors="pt")
text_input_ids = text_inputs.input_ids
print(text_input_ids)

# Load the encoder-only T5 model from the text_encoder_3 subfolder.
text_encoder_3 = T5EncoderModel.from_pretrained(
    id,
    subfolder="text_encoder_3"
)

# Run the encoder and print the per-token embeddings (last hidden state).
prompt_embeds = text_encoder_3(text_input_ids)[0]
print(prompt_embeds)

Compare the resulting embeddings with the output of the following command (--pooling none prints one embedding vector per token, and --embd-normalize -1 disables normalization so the raw values can be compared):

./llama-embedding -m models/sd3-text-encoder.gguf -p "translate English to German: The house is wonderful." --pooling none --embd-normalize -1

Stable Diffusion 3 uses T5TokenizerFast, so there may be some tokenization differences for more complex prompts (the llama.cpp Unigram tokenizer is compatible with the "slow" T5Tokenizer class).
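To check whether a given prompt is affected, a small script along these lines can compare the two tokenizer classes (a sketch only; it assumes the sentencepiece package, which the slow tokenizer requires, is installed):

from transformers import T5Tokenizer, T5TokenizerFast

id = "stabilityai/stable-diffusion-3-medium-diffusers"
prompt = "translate English to German: The house is wonderful."

# "Slow" sentencepiece-based tokenizer -- the class the llama.cpp
# Unigram tokenizer is compatible with.
slow = T5Tokenizer.from_pretrained(id, subfolder="tokenizer_3")
# "Fast" tokenizer actually used by the Stable Diffusion 3 pipeline.
fast = T5TokenizerFast.from_pretrained(id, subfolder="tokenizer_3")

# If the two id sequences differ for a prompt, llama.cpp will compute
# embeddings over different tokens than the diffusers pipeline uses.
print(slow(prompt).input_ids)
print(fast(prompt).input_ids)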

* gguf-py : add T5ENCODER model architecture

* convert-hf : add T5EncoderModel

* llama : add llama_model_has_decoder() API function

* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()

* llama : add support for LLM_ARCH_T5ENCODER

* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE

* llama-embedding : add support for encoder-only models