Releases: tangledgroup/llama-cpp-cffi

v0.1.9

30 Jul 06:47

Added:
- Support for default CPU tinyBLAS (llamafile, sgemm) builds.
- Support for CPU OpenBLAS (GGML_OPENBLAS) builds.

Changed:
- Build scripts now have a separate step/function cuda_12_5_1_setup which sets up the CUDA 12.5.1 environment at build time.

Fixed:
- llama_generate now stops its worker thread on GeneratorExit.

Removed:
- The callback parameter in llama_generate and dependent functions; iterate the returned generator instead (see the sketch below).
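
With the callback parameter gone, output is consumed by iterating the generator, and closing it early exercises the GeneratorExit fix above. A minimal sketch; the import path, Model fields, repo id, and the prompt parameter name are illustrative assumptions, not confirmed by these notes:

```python
from llama.llama_cli import Model, llama_generate  # import path assumed

# Model field/value are illustrative.
model = Model(creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0')

gen = llama_generate(model, prompt='Hello')  # parameter name assumed

for i, chunk in enumerate(gen):
    print(chunk, end='', flush=True)

    if i >= 32:  # consume only part of the output
        break

gen.close()  # raises GeneratorExit inside llama_generate; as of v0.1.9
             # this also stops the generator's worker thread
```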

v0.1.8

27 Jul 15:46

Added:
- Model.tokenizer_hf_repo as an optional field for cases where Model.creator_hf_repo cannot be used to tokenize or format prompts/messages (see the sketch below).
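
A hedged sketch of the new field; Model.creator_hf_repo and Model.tokenizer_hf_repo come from these notes, while the import path, constructor style, and repo ids are assumptions:

```python
from llama.llama_cli import Model  # import path assumed

# If the creator repo lacks a usable tokenizer/chat template, point
# tokenizer_hf_repo at a repo that has one; omit it otherwise.
model = Model(
    creator_hf_repo='example-org/base-model',        # hypothetical repo id
    tokenizer_hf_repo='example-org/tokenizer-repo',  # hypothetical repo id
)
```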

v0.1.7

26 Jul 14:28

Added:
- Support for stop tokens/words (see the sketch below).
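
A hedged sketch of stop-word usage; the stop parameter name and its list form are assumptions, only the feature itself is stated above:

```python
from llama.llama_cli import Model, llama_generate  # unified module per the change below

model = Model(creator_hf_repo='example-org/base-model')  # hypothetical repo id

# Generation should halt as soon as a stop token/word is produced.
for chunk in llama_generate(model, prompt='Q: What is CFFI?\nA:', stop=['\nQ:']):
    print(chunk, end='', flush=True)
```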

Changed:
- llama/llama_cli.py: unified the CPU and CUDA 12.5 modules into a single module.

Removed:
- Separate examples for the CPU and CUDA 12.5 modules.

v0.1.6

25 Jul 07:26

Changed:
- Updated huggingface-hub.

Fixed:
- llama.__init__ now correctly imports submodules and handles CPU and CUDA backends.
- OpenAI: ctx_size now falls back to the model config when no limit is given: ctx_size: int = config.max_position_embeddings if max_tokens is None else max_tokens.

v0.1.5

24 Jul 06:38

Fixed:
- Linux build: upx now uses its best compression option, and 7z uses more aggressive compression.
- UPX is no longer used to compress shared/dynamic libraries.