Releases: tangledgroup/llama-cpp-cffi

v0.2.0

11 Dec 10:58

Added:

- New high-level Python API (see the sketch after this list)
- Low-level C API calls from llama.h, llava.h, clip.h, ggml.h
- completions: high-level function for LLMs / VLMs
- text_completions: low-level function for LLMs
- clip_completions: low-level function for CLIP-based VLMs
- WIP: mllama_completions: low-level function for Mllama-based VLMs
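
A minimal sketch of what the new high-level API could look like. The import path and the Model/Options classes and fields are assumptions carried over from the pre-0.2.0 examples; only the function name completions comes from this release note.

```python
# Hypothetical sketch of the new high-level `completions` API. Assumptions:
# the `llama` module path and the `Model`/`Options` classes mirror the
# pre-0.2.0 examples; exact signatures may differ.
from llama import completions, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

options = Options(
    model=model,
    predict=-2,
    prompt=[{'role': 'user', 'content': 'Explain CFFI in one sentence.'}],
)

# `completions` replaces the removed `llama_generate`; output streams
# chunk by chunk.
for chunk in completions(options):
    print(chunk, flush=True, end='')
```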

Changed:

- All examples

Removed:

- llama_generate function
- llama_cpp_cli
- llava_cpp_cli
- minicpmv_cpp_cli

v0.1.22

27 Nov 08:27

Added:

- llava high-level API calls
- minicpmv high-level API support

v0.1.16

02 Sep 06:38

Changed:
- Updated llama.cpp.

v0.1.15

20 Aug 06:56

Added:
- SmolLM-1.7B-Instruct-v0.2 examples.

Changed:
- Updated llama.cpp.

v0.1.14

17 Aug 06:50

Fixed:
- Vulkan detection.

v0.1.13

16 Aug 20:05

Fixed:
- CUDA and Vulkan detection.

v0.1.12

16 Aug 12:31

Added:
- Build vulkan_1_x for general GPU.
- Build cuda 12.4.1 as default.

Changed:
- Renamed examples for TinyLlama (chat, tool calling) and OpenAI.
- Updated demo models definitions.
- Updated examples (chat, tool calling).
- get_special_tokens now supports the parameter force_standard_special_tokens: bool=False, which bypasses the tokenizer's special tokens in favor of standard/common ones (see the sketch after this list).
- Build cuda 12.5.1 as an additional build target, but not packaged on PyPI.
- Build cuda 12.6 as an additional build target, but not packaged on PyPI.
- Build openblas as an additional build target, but not packaged on PyPI.
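
A hedged usage sketch of the new parameter. The release note only names the function and the flag, so the import path and the tokenizer argument below are assumptions.

```python
# Hypothetical usage of force_standard_special_tokens (import path and
# tokenizer argument are assumptions).
from llama.formatter import get_special_tokens
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')

# Default behavior: the tokenizer's own declared special tokens.
special_tokens = get_special_tokens(tokenizer)

# Bypass the tokenizer's special tokens with standard/common ones.
standard_tokens = get_special_tokens(tokenizer, force_standard_special_tokens=True)
```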

Fixed:
- Handle Options.no_display_prompt on Python side.

v0.1.11

12 Aug 07:31

Changed:
- openai: allow import of routes and v1_chat_completions handler.
- examples/demo_0.py: tool-calling example.

v0.1.10

30 Jul 12:35

Added:
- In openai, support for prompt and extra_body. Reference: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/src/openai/resources/completions.py#L41
- Pass llama-cli options to openai.
- util module with is_cuda_available function.
- openai supports both prompt and messages (see the sketch after this list). Reference: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/src/openai/resources/completions.py#L45
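
A sketch of exercising prompt and extra_body through the standard openai-python client. The server URL, api_key, model id, and extra_body contents are assumptions; the client call itself follows the openai-python interface referenced above.

```python
# Sketch: `prompt` plus `extra_body` against the package's OpenAI-compatible
# server. The base_url, api_key, model id, and extra_body keys are
# assumptions; client.completions.create(..., extra_body=...) is the
# documented openai-python interface.
from openai import OpenAI

client = OpenAI(base_url='http://localhost:8000/v1', api_key='not-needed')

completion = client.completions.create(
    model='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    prompt='Write one sentence about CFFI.',
    extra_body={'predict': 128},  # assumed pass-through of a llama-cli option
)

print(completion.choices[0].text)
```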

v0.1.9

30 Jul 06:47

Added:
- Support for default CPU tinyBLAS (llamafile, sgemm) builds.
- Support for CPU OpenBLAS (GGML_OPENBLAS) builds.

Changed:
- Build scripts now have a separate step/function, cuda_12_5_1_setup, which sets up the CUDA 12.5.1 environment at build time.

Fixed:
- Stop thread in llama_generate on GeneratorExit.

Removed:
- callback parameter in llama_generate and dependent functions (see the sketch below).
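
With the callback gone, consumption is purely generator-based, and abandoning the generator is what triggers the GeneratorExit handling fixed above. A sketch, with the Model/Options fields assumed from the examples of this era:

```python
# Sketch of generator-style use after the `callback` removal (Model/Options
# fields are assumptions based on the examples of this era).
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=[{'role': 'user', 'content': 'Hello!'}],
)

gen = llama_generate(options)

for i, chunk in enumerate(gen):
    print(chunk, flush=True, end='')
    if i >= 32:
        # Closing the generator raises GeneratorExit inside llama_generate,
        # which now also stops the background inference thread (see Fixed).
        gen.close()
        break
```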