Add support for encoder-only T5 models #8960
Merged
This PR adds support for encoder-only T5 models. It also adds support for LLAMA_POOLING_TYPE_NONE in the llama-embedding tool for the purpose of testing this PR. Fixes #8900.
A model for testing this PR can be downloaded from https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
You have to copy the files from text_encoder_3 and tokenizer_3 into a single directory and convert the resulting model with convert_hf_to_gguf.py.
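A possible invocation for the conversion step (the directory and output file names here are placeholders, not taken from the PR):

```shell
# Combine the SD3 T5 encoder and tokenizer files into one directory,
# then convert that directory to GGUF with llama.cpp's converter.
mkdir -p t5-xxl-combined
cp text_encoder_3/* tokenizer_3/* t5-xxl-combined/
python convert_hf_to_gguf.py t5-xxl-combined --outfile t5-xxl-encoder.gguf
```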
Here's an example transformers script that can be used to print embeddings generated with the original model:
Compare the resulting embeddings with the output of the following command:
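The exact command isn't shown in this excerpt. A plausible invocation, assuming the converted GGUF file is named t5-xxl-encoder.gguf (a placeholder), would use the LLAMA_POOLING_TYPE_NONE support added in this PR:

```shell
# With --pooling none, llama-embedding emits one embedding per token
# instead of a single pooled vector, matching the transformers output above.
./llama-embedding -m t5-xxl-encoder.gguf --pooling none -p "Hello world"
```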
Stable Diffusion 3 uses T5TokenizerFast, so there may be some tokenization differences with more complex prompts (the llama.cpp Unigram tokenizer is compatible with the "slow" T5Tokenizer class).