What is the future plan of model expansion? #1380

Open
jenniew opened this issue Nov 15, 2024 · 3 comments
Labels: enhancement (New feature or request), Question (Question about the repo as a whole)

Comments

@jenniew

jenniew commented Nov 15, 2024

🚀 The feature, motivation and pitch

I see that torchchat currently supports only a few kinds of models, such as Llama-based (or Llama-like) architectures and models with a pre-defined Transformer architecture. Is there any plan to support other model architectures in the future? Which kinds of models are you considering adding? If a new model's architecture is not in the supported list, is there a way to run it?

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

@Jack-Khuu Jack-Khuu self-assigned this Nov 16, 2024
@Jack-Khuu Jack-Khuu added enhancement New feature or request Question Question about the repo as a whole labels Nov 16, 2024
@mikekgfb
Contributor

mikekgfb commented Nov 18, 2024

My personal take on how torchchat (tc) might support a broader set of models:

Because the model description is part of the torchchat tree, the models that can be supported are naturally limited to those that fit the general infra torchchat provides.

Of course, the model.py could be made arbitrarily complex, but that doesn't seem desirable. I can see three possible directions:
1 - add additional model-variant.py files for other types. This ultimately triggers the same limitation, because the number of models that may be supported is limited by the number of models distributed. It may also involve rights issues, because some of these models may contain copyrighted or patented portions.
2 - build models from GGUF, following the --gguf-path approach as per docs/GGUF.md
3 - allow users to bring their own model descriptions.

(2) requires the GGUF import to track new features, and limits models to those supported by GGUF.
(3) allows users to build new models, but requires integration for tokenization and for export (e.g., the HF cache is at present not exportable via AOTI and/or ET afaik).

Here's an attempt at a solution that lets users bring their own models, demonstrated with phi-3-mini. It does not support export, and it sidesteps query formatting by adding support for pre-tokenized text inputs:
https://github.com/mikekg/torchchat/tree/phichat

This introduces an option --custom-builder, which can be used with the following invocation:

python torchchat.py generate --custom-builder torchchat/model_python/phi-3-mini.py:model_builder --tokenizer-path /content/torchchat/tokenizer.model --prompt "[32010, 739, 471, 263, 6501, 322, 14280, 29891, 4646, 29892, 322, 32007, 2]"
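The --custom-builder argument above has the form `path/to/file.py:function_name`. As a rough illustration of how such an argument could be resolved into a callable, here is a minimal sketch using only the standard library. The helper name `load_builder` is hypothetical, and this is an assumption about the mechanism, not the code from the linked branch.

```python
import importlib.util
from pathlib import Path
from typing import Callable


def load_builder(spec: str) -> Callable:
    """Resolve 'path/to/file.py:function_name' into a callable (hypothetical helper)."""
    # Split on the LAST colon so paths that contain colons still work.
    path_str, sep, func_name = spec.rpartition(":")
    if not sep or not path_str or not func_name:
        raise ValueError(f"expected 'file.py:function', got {spec!r}")

    path = Path(path_str)
    # File names such as 'phi-3-mini.py' are not valid module names,
    # so derive a safe name for the loaded module.
    module_name = path.stem.replace("-", "_").replace(".", "_")

    module_spec = importlib.util.spec_from_file_location(module_name, path)
    if module_spec is None or module_spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")

    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)  # runs the user's file

    builder = getattr(module, func_name, None)
    if not callable(builder):
        raise AttributeError(f"{path} does not define a callable {func_name!r}")
    return builder
```

Because this executes an arbitrary user-supplied file, it should only be pointed at model descriptions you trust.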

Example run:
https://colab.research.google.com/drive/1HHONUbKqqXU9yU3BIrjH0dRWKdwgY34H?usp=sharing
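The --prompt value in the invocation is a pre-tokenized input: a bracketed list of token IDs rather than text to be tokenized. A minimal sketch of how a prompt string could be recognized as pre-tokenized follows. The function name `parse_pretokenized` and the detection rule are assumptions for illustration, not the branch's actual logic.

```python
import json
from typing import List, Optional


def parse_pretokenized(prompt: str) -> Optional[List[int]]:
    """Return token IDs if the prompt looks like '[1, 2, 3]', else None.

    Hypothetical helper: returning None means the caller should treat the
    prompt as ordinary text and run it through the tokenizer.
    """
    text = prompt.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return None
    try:
        values = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Accept only plain integers (bool is a subclass of int, so exclude it).
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return None
    return values
```

One trade-off of this kind of detection is that a text prompt which happens to be a bracketed list of integers would be read as token IDs, so a dedicated flag may be a safer design.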

To make it exportable, we'd want to avoid components that can't be exported (likely the HF cache, possibly others), either by changing the source code directly or by using a model rewrite for those components, similar to the rewrites torchchat uses today for quantization (for AOTI and ET) or for introducing the ET optimization sdpa_with_kv_cache for mobile backends.
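The model-rewrite idea can be sketched as a recursive walk that swaps every submodule of one type for a replacement. The sketch below is an assumption about the general pattern, not torchchat's implementation. It relies only on a `named_children()` method (which `torch.nn.Module` also provides), and all class names (`Node`, `DynamicCache`, `StaticCache`, `Attention`, `Model`) are hypothetical stand-ins so that it runs without PyTorch.

```python
from typing import Callable


class Node:
    """Tiny stand-in for torch.nn.Module so the sketch runs without PyTorch."""

    def named_children(self):
        return [(k, v) for k, v in vars(self).items() if isinstance(v, Node)]


def swap_modules(root, target_type: type, make_replacement: Callable) -> int:
    """Replace every descendant of `target_type` in place; return how many were swapped."""
    replaced = 0
    # Copy to a list first because we mutate attributes while iterating.
    for name, child in list(root.named_children()):
        if isinstance(child, target_type):
            setattr(root, name, make_replacement(child))
            replaced += 1
        else:
            replaced += swap_modules(child, target_type, make_replacement)
    return replaced


# Hypothetical model: each attention layer owns a cache that cannot be exported.
class DynamicCache(Node):
    pass


class StaticCache(Node):
    def __init__(self, source):
        self.note = f"replaces {type(source).__name__}"


class Attention(Node):
    def __init__(self):
        self.cache = DynamicCache()


class Model(Node):
    def __init__(self):
        self.layer0 = Attention()
        self.layer1 = Attention()


model = Model()
swapped = swap_modules(model, DynamicCache, StaticCache)
```

After the call, both layers hold a `StaticCache`, and the rest of the model is untouched. The replacement still has to be behaviorally compatible with the component it replaces, which this sketch does not check.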

@Jack-Khuu
Contributor

Great Question @jenniew.

Like you mentioned, model support is currently biased towards Llama/Transformer architectures, but we intend for the inference pipeline to be model-agnostic. The upcoming models are Llava and Granite Code Models (though both are Transformer-based), with Mamba (SSM) on my radar.

The ultimate plan is to create a simple interface between Model Definitions (architecture, compile, export) and Inference Pipeline (generate, chat, browser, openai api) such that onboarding becomes easier (e.g. leaning on torchtune for models instead of hosting it ourselves).
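As a rough illustration of what a narrow interface between a model definition and the inference pipeline could look like, here is a sketch using `typing.Protocol`. Every name in it (`ModelDefinition`, `next_token_logits`, `generate`, `ToyModel`) is hypothetical, and it does not reflect torchchat's actual or planned API.

```python
from typing import List, Protocol


class ModelDefinition(Protocol):
    """Hypothetical contract: the only thing the pipeline needs from a model."""

    def next_token_logits(self, tokens: List[int]) -> List[float]:
        ...


def generate(model: ModelDefinition, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Greedy decoding that depends only on the protocol, not on the architecture."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model.next_token_logits(tokens)
        # Greedy choice: pick the index with the highest logit.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
    return tokens


class ToyModel:
    """Toy model that always predicts (last token + 1) modulo the vocabulary size."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size

    def next_token_logits(self, tokens: List[int]) -> List[float]:
        logits = [0.0] * self.vocab_size
        logits[(tokens[-1] + 1) % self.vocab_size] = 1.0
        return logits
```

With an interface like this, a Transformer, an SSM such as Mamba, or an externally hosted model definition could be plugged into the same `generate` loop as long as it satisfies the protocol.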

@mikekgfb shows a promising approach above and also mentions GGUF as another option.

@jenniew If you have a particular model/architecture/artifact in mind, you can share it here or send me a message, and we can give more detailed suggestions.

@byjlw
Contributor

byjlw commented Nov 19, 2024

Like @Jack-Khuu mentioned, we need to make some architecture changes and create a model-adding flow so it's easy for anyone to add models.

In the meantime, feel free to ask for a specific model.
