What is the future plan of model expansion? #1380
Comments
My personal take on how torchchat (tc) might support a broader set of models: because the model description is part of the torchchat tree, the models that can be supported are naturally limited to those that fit the general infrastructure torchchat provides. Of course, model.py could be made arbitrarily complex, but that doesn't seem desirable.

I can see three possible directions: (2) requires GGUF import to track new features, and limits models to those supported by GGUF.

Here's an attempt at implementing a solution that allows users to bring their own models, for phi-3-mini. It does not support export, and it sidesteps query formatting by adding support for, and using, pre-tokenized text inputs. This introduces an option --cuxtom-builder, which can be used with the following invocation:
Example run:

To make it exportable, we'd want to avoid using components that can't be exported (likely the HF Cache, possibly others), either by changing the source code directly or by using a model rewrite for those components, similar to what we use today for quantization in torchchat for AOTI and ET, or to introduce the ET optimization sdpa_with_kv_cache for mobile backends.
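To make the bring-your-own-model idea concrete, here is a minimal sketch of how a custom builder option could resolve a user-supplied function and accept pre-tokenized input. This is not the implementation from the comment above: the `module:function` spec format and the names `load_custom_builder` and `parse_pretokenized` are assumptions made up for illustration, and nothing here is torchchat API.

```python
import importlib
from typing import Callable, List


def load_custom_builder(spec: str) -> Callable:
    """Resolve a 'package.module:function' string to a callable.

    The spec format is a hypothetical convention, not a torchchat one.
    """
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    builder = getattr(module, attr)
    if not callable(builder):
        raise TypeError(f"{spec!r} does not name a callable")
    return builder


def parse_pretokenized(text: str) -> List[int]:
    """Parse whitespace- or comma-separated token ids, e.g. '1 15043, 3186'."""
    return [int(tok) for tok in text.replace(",", " ").split()]
```

With this shape, the user's module owns the model construction, and passing token ids directly means the pipeline never needs to know the model's chat template.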
Great question, @jenniew. As you mentioned, model support is currently biased towards Llama/Transformer architectures, but we intend for the inference pipeline to be model agnostic. The upcoming models are Llava and the Granite Code models (though both are Transformer based), with Mamba (SSM) on my radar. The ultimate plan is to create a simple interface between Model Definitions (architecture, compile, export) and the Inference Pipeline (generate, chat, browser, OpenAI API) so that onboarding becomes easier (e.g. leaning on torchtune for models instead of hosting them ourselves). @mikekgfb shows a promising approach above and also mentions GGUF as an option. @jenniew, if you have a particular model, architecture, or artifact in mind, you can share it here or send me a message, and we can give more detailed suggestions.
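One way to picture the interface between model definitions and the inference pipeline is a small contract that the pipeline depends on and nothing else. The sketch below is only an illustration under that assumption: `ModelDefinition`, `next_token_logits`, `generate`, and `CountingModel` are hypothetical names, not torchchat or torchtune APIs, and plain Python lists stand in for tensors.

```python
from typing import List, Protocol, Sequence


class ModelDefinition(Protocol):
    """Hypothetical contract a model must satisfy to plug into the pipeline."""

    def next_token_logits(self, tokens: Sequence[int]) -> List[float]:
        ...


def generate(
    model: ModelDefinition,
    prompt: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Greedy decoding that relies only on the ModelDefinition contract."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model.next_token_logits(tokens)
        # Pick the index of the largest logit (greedy choice).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


class CountingModel:
    """Toy stand-in: always predicts (last token + 1) modulo the vocab size."""

    vocab_size = 5

    def next_token_logits(self, tokens: Sequence[int]) -> List[float]:
        target = (tokens[-1] + 1) % self.vocab_size
        return [1.0 if i == target else 0.0 for i in range(self.vocab_size)]
```

Because `generate` never inspects the architecture, a Transformer, an SSM, or an externally hosted model could be swapped in as long as it satisfies the same contract.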
As @Jack-Khuu mentioned, we need to make some architecture changes and create a model-adding flow so it's easy for anyone to add models. In the meantime, feel free to ask for a specific model.
🚀 The feature, motivation and pitch
I see that torchchat currently supports only a few kinds of models, such as Llama-based (or Llama-like) architectures and pre-defined Transformer architectures. Is there any plan to support other model architectures in the future? Which kinds of models are you considering adding? If a new model's architecture is not on the supported list, is there a way to run it?
Alternatives
No response
Additional context
No response
RFC (Optional)
No response