llama : support IBM Granite architecture (#9412)
* feat(gguf-py): Add Granite model and params to gguf-py
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
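A minimal sketch of what the gguf-py side adds: a GRANITE architecture enum value, its name string, and key templates for the new scale metadata. Names follow the merged PR's final `_scale` convention, but treat the snippet as illustrative rather than a verbatim diff.
```python
from enum import IntEnum, auto

# Sketch: new architecture enum value and name (illustrative subset)
class MODEL_ARCH(IntEnum):
    LLAMA   = auto()
    GRANITE = auto()  # added for IBM Granite

MODEL_ARCH_NAMES = {
    MODEL_ARCH.LLAMA:   "llama",
    MODEL_ARCH.GRANITE: "granite",
}

# Key templates for the Granite scale params
EMBEDDING_SCALE = "{arch}.embedding_scale"
RESIDUAL_SCALE  = "{arch}.residual_scale"
LOGIT_SCALE     = "{arch}.logit_scale"
ATTENTION_SCALE = "{arch}.attention.scale"

print(EMBEDDING_SCALE.format(arch=MODEL_ARCH_NAMES[MODEL_ARCH.GRANITE]))
# granite.embedding_scale
```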
* feat(convert_hf_to_gguf): Add registration and param setup for Granite
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
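Roughly what the registration looks like inside convert_hf_to_gguf.py (which already defines `Model` and `LlamaModel`); a sketch based on the merged PR, not a verbatim diff:
```python
@Model.register("GraniteForCausalLM")
class GraniteModel(LlamaModel):
    """Conversion for IBM's GraniteForCausalLM"""

    model_arch = gguf.MODEL_ARCH.GRANITE
```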
* feat(llama.cpp): Add config parsing for Granite multiplier params
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
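The C++ side reads the new per-architecture keys at load time. A quick way to inspect what a converted file actually carries, using gguf-py's GGUFReader (the file path here is hypothetical):
```python
from gguf import GGUFReader

reader = GGUFReader("granite-3b-instruct.gguf")  # hypothetical local file
for name, field in reader.fields.items():
    if name.startswith("granite."):
        # Scalar metadata: the single element indexed by field.data[0]
        print(name, field.parts[field.data[0]][0])
# Expect keys like granite.embedding_scale, granite.residual_scale,
# granite.attention.scale, granite.logit_scale
```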
* feat(llama.cpp): First pass at a full port of Granite's deviations from llama
Something is still not working right: the results are mostly terrible, but
it occasionally produces relevant output at this point, so _something_ is
working.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
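For reference, the deviations being ported boil down to four scalar scales applied around an otherwise llama-shaped forward pass. A toy numpy rendering (shapes and scale values are made up; the real ones come from the HF config):
```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_vocab = 8, 16
emb_scale, res_scale, attn_scale, logit_scale = 12.0, 0.22, 0.0078125, 8.0
tok_emb = rng.standard_normal((n_vocab, d_model))

def attn_scores(q, k):
    # Deviation: scores use attention_scale instead of 1/sqrt(head_dim)
    return (q @ k.T) * attn_scale

def forward(token_ids, block):
    h = tok_emb[token_ids] * emb_scale    # deviation: scaled embeddings
    h = h + res_scale * block(h)          # deviation: damped residual add
    return (h @ tok_emb.T) / logit_scale  # deviation: scaled-down logits

logits = forward(np.array([1, 2, 3]), lambda x: x)  # identity block demo
```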
* fix(llama.cpp): Determine the Granite 3b-instruct language model by vocab size
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
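Sketch of the heuristic: the config carries no explicit size tag, so the 3b-instruct variant is keyed off its vocabulary size. The concrete value below is a placeholder assumption, not taken from the PR:
```python
GRANITE_3B_VOCAB = 49152  # assumed value for illustration only

def granite_model_type(n_vocab: int) -> str:
    # With no size field in the config, vocab size disambiguates variants
    return "3B" if n_vocab == GRANITE_3B_VOCAB else "unknown"
```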
* fix(convert_hf_to_gguf): Use LlamaModel as base for GraniteModel
The defaults in LlamaModel are needed for Granite as well
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
* fix(llama.cpp): Switch Granite param names to use _scale for consistency
Other scalar multipliers are called *_scale, so this provides a more
consistent naming convention.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
* fix(convert_hf_to_gguf/gguf-py): _multiplier -> _scale
The transformers config names ending in _multiplier are now mapped to their
_scale equivalents during conversion.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
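Concretely, in GraniteModel.set_gguf_parameters() the HF `*_multiplier` values (and `logits_scaling`) land under the `*_scale` writer helpers; again a sketch based on the merged PR:
```python
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    # transformers _multiplier names -> GGUF *_scale keys
    if attention_scale := self.hparams.get("attention_multiplier"):
        self.gguf_writer.add_attention_scale(attention_scale)
    if embedding_scale := self.hparams.get("embedding_multiplier"):
        self.gguf_writer.add_embedding_scale(embedding_scale)
    if residual_scale := self.hparams.get("residual_multiplier"):
        self.gguf_writer.add_residual_scale(residual_scale)
    if logits_scale := self.hparams.get("logits_scaling"):
        self.gguf_writer.add_logit_scale(logits_scale)
```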
* fix(llama.cpp): Use a separate switch case for Granite in llm_load_hparams
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <[email protected]>
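A Pythonized rendering of why the separate case matters: Granite shares llama's base hparams but must read the extra scales unconditionally, which is awkward to bolt onto the llama case. Key names as above; illustrative only:
```python
def load_hparams(arch: str, kv: dict) -> dict:
    hp = {"norm_rms_eps": kv[f"{arch}.attention.layer_norm_rms_epsilon"]}
    if arch == "granite":  # own case instead of piggybacking on "llama"
        hp["embedding_scale"] = kv["granite.embedding_scale"]
        hp["residual_scale"]  = kv["granite.residual_scale"]
        hp["attention_scale"] = kv["granite.attention.scale"]
        hp["logit_scale"]     = kv["granite.logit_scale"]
    return hp
```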
---------
Signed-off-by: Gabe Goodhart <[email protected]>