Add support for Chameleon #8543

Merged (24 commits) on Sep 28, 2024
Conversation

@nopperl (Contributor) commented Jul 17, 2024

This PR adds support for the Chameleon model. For now, this implementation only supports text->text inference and serves as a base for implementing the (more interesting) image->text, text->image and interleaved pipelines. However, such an implementation will probably require changes to the CLI and internal architecture, so I suggest doing this in a separate PR.

Chameleon is based on the Llama-2 architecture with the following changes:

  • different (pre-)tokenizer
  • qk-norm
  • swin-norm
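For context, qk-norm normalizes the query and key vectors per attention head before they enter the attention computation (Chameleon uses an RMSNorm-style variant). A minimal pure-Python sketch, with function and variable names that are hypothetical rather than taken from this PR:

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: divide x by its root-mean-square, then scale by a learned weight
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

# In a qk-norm attention block, q and k are normalized per head
# before RoPE/attention; weights here are identity for illustration.
head_dim = 4
q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 1.0, -0.5, 0.0]
q_w = [1.0] * head_dim
k_w = [1.0] * head_dim

q_normed = rms_norm(q, q_w)
k_normed = rms_norm(k, k_w)
```

After normalization each head's q and k have (approximately) unit root-mean-square, which stabilizes attention logits at scale.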

Note 1: in order to enable text->text inference, the image token logits are suppressed, similar to the HF implementation. This needs to be removed once image support is added.
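The suppression in Note 1 can be pictured as masking the image-token range of the output logits to -inf before sampling, so the sampler can never pick an image token. A hedged sketch with a toy vocabulary (the token-id range here is made up for illustration and is not Chameleon's actual vocabulary layout):

```python
import math

def suppress_image_tokens(logits, img_start, img_end):
    # Set logits of image tokens to -inf so sampling never selects them.
    out = list(logits)
    for i in range(img_start, img_end):
        out[i] = -math.inf
    return out

# Toy vocabulary of 8 tokens where ids 4..7 are (hypothetically) image tokens.
logits = [0.1, 2.0, -0.5, 1.5, 3.0, 0.0, 0.2, 4.0]
masked = suppress_image_tokens(logits, 4, 8)
best = max(range(len(masked)), key=lambda i: masked[i])  # greedy pick after masking
```

Without the mask, greedy decoding would pick token 7; with it, the best remaining text token (id 1) wins.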

Note 2: I implemented swin-norm, but I haven't tested it yet, as it is only used by Chameleon-30B.
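For reference, swin-norm changes where the normalization sits relative to each sub-layer: Llama applies it to the sub-layer input (pre-norm), while swin-norm applies it to the sub-layer output before the residual add. A minimal sketch of the two orderings (pure Python, names hypothetical; `sublayer` stands in for attention or FFN):

```python
def layer_norm(x, eps=1e-5):
    # Plain LayerNorm over a vector: zero mean, unit variance
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def sublayer(x):
    # Stand-in for an attention or FFN block: any vector-to-vector map
    return [2 * v for v in x]

def pre_norm_block(x):
    # Llama-style: normalize the input, run the sub-layer, add the residual
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def swin_norm_block(x):
    # Swin-style: run the sub-layer, normalize its output, add the residual
    return [a + b for a, b in zip(x, layer_norm(sublayer(x)))]

x = [1.0, 2.0, 3.0, 4.0]
y_pre = pre_norm_block(x)
y_swin = swin_norm_block(x)
```

The two blocks produce different outputs for the same weights, which is why the conversion has to record which ordering a checkpoint expects (the swin-norm param mentioned in the commit log).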

To test it:

git clone https://huggingface.co/facebook/chameleon-7b
./convert-hf-to-gguf.py chameleon-7b
build/bin/llama-cli -m chameleon-7b/ggml-model-f16.gguf --temp 0.8 -s 1000 -n 50 -p "Language modeling is " -ngl 33

Output:

Language modeling is “the task of predicting the next word in a sequence of text, given the previous words.”

To implement a language model, we can use a neural network with a bidirectional LSTM layer and a softmax output layer.

Reference (requires transformers>=4.43.0.dev0):

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
set_seed(1000)
model = AutoModelForCausalLM.from_pretrained("facebook/chameleon-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/chameleon-7b")
prompt = "Language modeling is "
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(out[0]))

Reference output:

Language modeling is “the task of predicting the next word in a sequence of text given the previous words.”

In other words, it's a machine learning model that takes a sequence of text as input

Partially addresses #7995.

@github-actions bot added the python (python script changes) label on Jul 17, 2024
@nopperl (Contributor Author) commented Jul 17, 2024

I have uploaded GGUFs to test this PR with here.

@mofosyne added the Review Complexity: Medium (generally require more time to grok but manageable by beginner to medium expertise level) label on Jul 19, 2024
Resolved review threads (outdated):

  • gguf-py/gguf/gguf_writer.py
  • src/llama.cpp (×3)
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 15, 2024
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 16, 2024
@nate-lrt

will this ever get added :(

@nopperl (Contributor Author) commented Sep 26, 2024

I think it would still be a good addition. I've resolved all conflicts with master now, so it should be ready to merge.

@ggerganov ggerganov merged commit 9a91311 into ggerganov:master Sep 28, 2024
55 checks passed
@arch-btw (Contributor)

Thank you @nopperl, looks like it got merged!

matiaslin pushed a commit to matiaslin/llama.cpp that referenced this pull request Sep 28, 2024
* convert chameleon hf to gguf

* add chameleon tokenizer tests

* fix lint

* implement chameleon graph

* add swin norm param

* return qk norm weights and biases to original format

* implement swin norm

* suppress image token output

* rem tabs

* add comment to conversion

* fix ci

* check for k norm separately

* adapt to new lora implementation

* fix layer input for swin norm

* move swin_norm in gguf writer

* add comment regarding special token regex in chameleon pre-tokenizer

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* fix punctuation regex in chameleon pre-tokenizer (@compilade)

Co-authored-by: compilade <[email protected]>

* fix lint

* trigger ci

---------

Co-authored-by: compilade <[email protected]>
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
@MasterScrat

@nopperl any plans to tackle image->text and text->image?

@nopperl (Contributor Author) commented Dec 19, 2024

@MasterScrat currently no plans, sorry for the late reply. AFAIK multimodal support would require a refactor of llama.cpp (#8010 (comment)). I'd love to work on it, but don't have the time right now.

Labels

  • python (python script changes)
  • Review Complexity: Medium (generally require more time to grok but manageable by beginner to medium expertise level)

7 participants