llama : support for `falcon-mamba` architecture (#9074)
* feat: initial `falcon-mamba` support in llama.cpp
* fix: lint
* refactor: clean up the initial implementation
* Update src/llama.cpp
Co-authored-by: compilade <[email protected]>
* Update src/llama.cpp
Co-authored-by: compilade <[email protected]>
* fix: address comments
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <[email protected]>
* fix: further cleanup and harmonization
* fix: lint
* Update gguf-py/gguf/gguf_writer.py
Co-authored-by: compilade <[email protected]>
* fix: change name
* Apply suggestions from code review
Co-authored-by: compilade <[email protected]>
* add in operator
* fix: add `dt_b_c_rms` in `llm_load_print_meta`
* fix: correct printf format for bool (see the first sketch at the end of this message)
* fix: correct print format
* Update src/llama.cpp
Co-authored-by: compilade <[email protected]>
* llama : quantize more Mamba tensors
* llama : use f16 as the fallback of fallback quant types (see the second sketch at the end of this message)
---------
Co-authored-by: compilade <[email protected]>
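
For reference, a minimal C++ sketch of the printf-vs-bool issue behind the two print-format fixes above. The flag name mirrors the `dt_b_c_rms` metadata printed in `llm_load_print_meta`, but everything here is illustrative, not the actual llama.cpp code.

```cpp
#include <cstdio>

int main() {
    bool ssm_dt_b_c_rms = true; // stand-in for the hparams flag printed in llm_load_print_meta

    // printf has no conversion specifier for bool. A bool passed through the
    // variadic "..." is promoted to int, so "%d" is fine; "%s" with a bool is
    // undefined behavior.
    std::printf("ssm_dt_b_c_rms = %d\n", (int) ssm_dt_b_c_rms);

    // Or render it as text explicitly:
    std::printf("ssm_dt_b_c_rms = %s\n", ssm_dt_b_c_rms ? "true" : "false");
    return 0;
}
```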
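A hedged sketch of the quantization fallback idea in the last commit: if a tensor's row size does not divide the block size of the requested quant type, drop to a simpler type, and if that block size does not fit either, fall back to f16 instead of failing. The enum, block sizes, and `pick_quant` helper are assumptions for illustration; the real type selection lives elsewhere in src/llama.cpp.

```cpp
#include <cstdio>

// Stand-ins for the ggml tensor types involved; the real enum lives in ggml.h.
enum class quant_type { Q4_K, Q5_0, F16 };

// Hypothetical helper: pick a quant type for a tensor row of n_per_row elements.
static quant_type pick_quant(quant_type requested, int n_per_row) {
    const int qk_k = 256; // assumed block size of the k-quants
    const int qk   = 32;  // assumed block size of the simpler quants

    if (requested == quant_type::Q4_K && n_per_row % qk_k != 0) {
        requested = quant_type::Q5_0;   // first fallback
    }
    if (requested == quant_type::Q5_0 && n_per_row % qk != 0) {
        requested = quant_type::F16;    // fallback of the fallback
    }
    return requested;
}

int main() {
    // e.g. a tensor whose row length fits neither block size falls through to F16
    std::printf("%d\n", (int) pick_quant(quant_type::Q4_K, 48));  // prints 2 (F16)
    std::printf("%d\n", (int) pick_quant(quant_type::Q4_K, 512)); // prints 0 (Q4_K)
    return 0;
}
```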