Commit
llama : support RWKV v6 models (ggerganov#8980)
* convert_hf_to_gguf: Add support for RWKV v6
  Signed-off-by: Molly Sophia <[email protected]>
* Add RWKV tokenization
* Fix build
  Signed-off-by: Molly Sophia <[email protected]>
* Do not use special tokens when matching in RWKV tokenizer
* Fix model loading
* Add (broken) placeholder graph builder for RWKV
* Add workaround for kv cache
* Add logits conversion to rwkv5
* Add rwkv5 layer norms
* Add time mix KVRG & correct merge mistake
* Add remaining time mix parameters
* Add time mix output loading
* Add placeholder llm_build_time_mix
* Fix build
  Signed-off-by: Molly Sophia <[email protected]>
* Load more tensors for rwkv v6
  Signed-off-by: Molly Sophia <[email protected]>
* Fix rwkv tokenizer
  Signed-off-by: Molly Sophia <[email protected]>
* ggml: Add unary operator Exp
  Signed-off-by: Molly Sophia <[email protected]>
* RWKV v6 graph building
  Signed-off-by: Molly Sophia <[email protected]>
* Add ``rescale_every_n_layers`` parameter
  Signed-off-by: Molly Sophia <[email protected]>
* Add ``wkv.head_size`` key for RWKV so it doesn't reuse Mamba ssm parameters
  Signed-off-by: Molly Sophia <[email protected]>
* Fix offloading layers to CUDA
  Signed-off-by: Molly Sophia <[email protected]>
* Fix parallel inferencing for RWKV
  Signed-off-by: Molly Sophia <[email protected]>
* Remove trailing whitespaces
  Signed-off-by: Molly Sophia <[email protected]>
* build_rwkv: Avoid using inplace operations
  Signed-off-by: Molly Sophia <[email protected]>
* convert_hf_to_gguf: rwkv: Avoid using ``eval``
  Signed-off-by: Molly Sophia <[email protected]>
* convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually
  Signed-off-by: Molly Sophia <[email protected]>
* Update convert_hf_to_gguf.py
  Co-authored-by: compilade <[email protected]>
* ggml: Add backward computation for unary op ``exp``
  Signed-off-by: Molly Sophia <[email protected]>
* Update convert_hf_to_gguf.py
  Co-authored-by: compilade <[email protected]>
* Update convert_hf_to_gguf.py
  Co-authored-by: compilade <[email protected]>
* Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV
  Signed-off-by: Molly Sophia <[email protected]>
* build_rwkv6: Simplify graph
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Detect model.type
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Fix tensor loading for 7B/14B models
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Fix group_norm assertion failure with Metal
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Clean up
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Add quantization tensor exclusion
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Use the new advanced batch splits
  Signed-off-by: Molly Sophia <[email protected]>
* Update src/llama.cpp
  Co-authored-by: compilade <[email protected]>
* llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm``
  Co-authored-by: compilade <[email protected]>
* llama: rwkv6: Apply code style and misc changes
  Signed-off-by: Molly Sophia <[email protected]>
* converter: Use class name ``Rwkv6Model``
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Make use of key ``feed_forward_length``
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim``
  Signed-off-by: Molly Sophia <[email protected]>
* converter: Match ``new_name`` instead of ``name`` for float32 explicit tensors
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Keep ``time_mix_w1/w2`` as F32
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Remove unused nodes
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Apply code format changes
  Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Add lora for some supported tensors
  Currently att.key/receptance/value/gate/output, ffn.receptance/key/value, as well as head.weight
  Signed-off-by: Molly Sophia <[email protected]>
* rwkv : speed-up tokenization using trie
* minor : style + indentation
* llama: rwkv6: Avoid division by zero
  Co-authored-by: compilade <[email protected]>
* ggml: rwkv_wkv: Avoid copying the state
  Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Layl Bongers <[email protected]>
Co-authored-by: compilade <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
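Two of the items above lend themselves to short illustrations. The "ggml: Add unary operator Exp" commit introduces an element-wise exponential to ggml's graph operators, which the RWKV v6 graph needs. The snippet below is a hypothetical standalone harness, not code from this commit; it assumes the operator is exposed as `ggml_exp(ctx, tensor)` in ggml.h, following the naming convention of the other ggml unary ops.

```cpp
// Hypothetical harness (not part of this commit) exercising the new element-wise
// exponential, assuming it is exposed as ggml_exp() in ggml.h.
#include "ggml.h"
#include <cstdio>

int main() {
    ggml_init_params params = { /*mem_size*/ 16u * 1024 * 1024, /*mem_buffer*/ nullptr, /*no_alloc*/ false };
    ggml_context * ctx = ggml_init(params);

    // 4-element F32 tensor, every element set to 1.0
    ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(x, 1.0f);

    // element-wise exp; the op was added because RWKV v6 graph building requires it
    ggml_tensor * y = ggml_exp(ctx, x);

    ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads*/ 1);

    std::printf("exp(1.0) = %f\n", ggml_get_f32_1d(y, 0)); // ~2.718282

    ggml_free(ctx);
    return 0;
}
```

Similarly, the "rwkv : speed-up tokenization using trie" commit switches RWKV tokenization to a greedy longest match driven by a trie. The sketch below shows only the general technique; names such as `trie_node` and `tokenize_greedy` are invented for illustration and do not correspond to the code in src/llama.cpp, which additionally handles vocabulary loading and byte fallback.

```cpp
// Minimal sketch of greedy longest-match tokenization over a byte trie (illustrative only).
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct trie_node {
    std::map<uint8_t, std::unique_ptr<trie_node>> children;
    int32_t token_id = -1; // -1 means no vocabulary entry ends at this node
};

static void trie_insert(trie_node & root, const std::string & piece, int32_t id) {
    trie_node * node = &root;
    for (unsigned char c : piece) {
        auto & child = node->children[c];
        if (!child) { child = std::make_unique<trie_node>(); }
        node = child.get();
    }
    node->token_id = id;
}

// Walk the trie from `pos`, remembering the deepest node that ends a token.
static std::pair<int32_t, size_t> trie_longest_match(const trie_node & root, const std::string & text, size_t pos) {
    const trie_node * node = &root;
    int32_t best_id  = -1;
    size_t  best_len = 0;
    for (size_t i = pos; i < text.size(); ++i) {
        auto it = node->children.find((unsigned char) text[i]);
        if (it == node->children.end()) { break; }
        node = it->second.get();
        if (node->token_id != -1) {
            best_id  = node->token_id;
            best_len = i - pos + 1;
        }
    }
    return {best_id, best_len};
}

static std::vector<int32_t> tokenize_greedy(const trie_node & root, const std::string & text) {
    std::vector<int32_t> out;
    size_t pos = 0;
    while (pos < text.size()) {
        auto [id, len] = trie_longest_match(root, text, pos);
        if (len == 0) {
            ++pos; // unknown byte: a real tokenizer would emit a byte-level fallback token
            continue;
        }
        out.push_back(id);
        pos += len;
    }
    return out;
}
```

Compared with re-scanning the whole vocabulary at every position, a single trie walk bounds the work per position by the length of the longest matching token, which is where the speed-up comes from.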