
Fix flan-t5 models load fail with llama-server #8993

Closed
wants to merge 1 commit

Conversation

@kylo5aby (Contributor) commented Aug 12, 2024

Fixes: #8990

@kylo5aby changed the title from "Fix T5 model load fail with llama-server" to "Fix flan-t5 models load fail with llama-server" on Aug 12, 2024
@fairydreaming (Collaborator)

I'm not sure there is a point in doing this, since T5 models require a llama_encode() call and preparation of a custom input for llama_decode() (with decoder start tokens). This is not implemented in the current llama-server, so even if the model loads it won't work correctly. But if you are willing to extend your PR by implementing comprehensive support for T5 models in llama-server, then you have my blessing.
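
For reference, a rough sketch of the encoder-decoder flow described above, written against the llama.cpp C API of that period (llama_encode, llama_model_decoder_start_token, then the usual llama_decode loop). The t5_prefill helper, its batch handling, and its error paths are illustrative only, not the server's actual code, and exact signatures may differ between llama.cpp versions:

```cpp
#include "llama.h"
#include <vector>

// Hypothetical helper: run the encoder pass plus the first decoder step for a
// T5-style model. `prompt_tokens` is assumed to already hold the tokenized
// encoder input for sequence 0.
static bool t5_prefill(llama_context * ctx, const llama_model * model,
                       std::vector<llama_token> & prompt_tokens) {
    // 1) Feed the full prompt to the encoder with llama_encode().
    llama_batch enc_batch = llama_batch_init((int32_t) prompt_tokens.size(), 0, 1);
    for (size_t i = 0; i < prompt_tokens.size(); ++i) {
        enc_batch.token    [i]    = prompt_tokens[i];
        enc_batch.pos      [i]    = (llama_pos) i;
        enc_batch.n_seq_id [i]    = 1;
        enc_batch.seq_id   [i][0] = 0;
        enc_batch.logits   [i]    = false; // encoder pass, no logits needed
    }
    enc_batch.n_tokens = (int32_t) prompt_tokens.size();

    if (llama_encode(ctx, enc_batch) != 0) {
        llama_batch_free(enc_batch);
        return false;
    }
    llama_batch_free(enc_batch);

    // 2) Start the decoder from the model's decoder-start token,
    //    falling back to BOS if the model does not define one.
    llama_token dec_start = llama_model_decoder_start_token(model);
    if (dec_start == -1) {
        dec_start = llama_token_bos(model);
    }

    llama_batch dec_batch = llama_batch_init(1, 0, 1);
    dec_batch.token    [0]    = dec_start;
    dec_batch.pos      [0]    = 0;
    dec_batch.n_seq_id [0]    = 1;
    dec_batch.seq_id   [0][0] = 0;
    dec_batch.logits   [0]    = true; // logits needed to sample the first output token

    const bool ok = llama_decode(ctx, dec_batch) == 0;
    llama_batch_free(dec_batch);

    // From here generation would continue with the regular llama_decode()
    // sampling loop, one decoder token at a time.
    return ok;
}
```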

@fairydreaming (Collaborator)

It looks like T5 model loading in llama-server is already fixed by #8997

@ggerganov (Owner)

Yup, this should fix the loading, but llama_encode() support needs more work

@ggerganov closed this Aug 12, 2024
Successfully merging this pull request may close these issues.

Bug: GGML_ASSERT(llama_add_eos_token(model) != 1) failed: llama-server critical error with flan-t5 models
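
For context on the linked issue, a minimal standalone sketch of the assertion it names. The model path is hypothetical, the program is illustrative (the real check sits inside llama-server's startup path, not a standalone binary), and the API names follow the llama.cpp C API of that period:

```cpp
#include "ggml.h"
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // Hypothetical path to a flan-t5 GGUF conversion.
    llama_model * model = llama_load_model_from_file("flan-t5-base.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    // The condition llama-server asserted on before the fix: per the linked
    // issue, llama_add_eos_token() returns 1 for flan-t5 models, so this
    // assertion aborts the process at load time.
    GGML_ASSERT(llama_add_eos_token(model) != 1);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```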