Releases: tangledgroup/llama-cpp-cffi
v0.2.0
Added:
- New high-level Python API
- Low-level C API calls from llama.h, llava.h, clip.h, ggml.h
- `completions`: high-level function for LLMs / VLMs (see the sketch after this list)
- `text_completions`: low-level function for LLMs
- `clip_completions`: low-level function for CLIP-based VLMs
- WIP: `mllama_completions`: low-level function for Mllama-based VLMs
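A minimal sketch of driving the new high-level API, assuming `completions` takes an `Options`-style configuration similar to the removed `llama_generate` and yields text chunks as they are generated; the import path, the `Options` fields, and the prompt shown here are assumptions, not part of these notes.

```python
# Hypothetical usage of the new high-level API; only the name `completions`
# comes from the notes above, every other detail is an assumption.
from llama import completions, Options  # import path is an assumption

options = Options(
    model='TinyLlama/TinyLlama-1.1B-Chat-v1.0',  # placeholder model id
    prompt='Explain what CFFI is in one sentence.',
)

# Assumed to be a generator that yields text chunks as they are produced.
for chunk in completions(options):
    print(chunk, end='', flush=True)
```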
Changed:
- All examples updated for the new API.
Removed:
- `llama_generate` function
- `llama_cpp_cli`
- `llava_cpp_cli`
- `minicpmv_cpp_cli`
v0.1.22
v0.1.16
v0.1.15
v0.1.14
v0.1.13
v0.1.12
Added:
- Build `vulkan_1_x` for general GPU.
- Build `cuda 12.4.1` as default.
Changed:
- Renamed examples for TinyLlama (chat, tool calling) and OpenAI.
- Updated demo models definitions.
- Updated examples (chat, tool calling).
- `get_special_tokens` now supports the parameter `force_standard_special_tokens: bool = False`, which bypasses the tokenizer's special tokens with standard/common ones (see the sketch after this list).
- Build `cuda 12.5.1` as additional build target but packaged on PyPI.
- Build `cuda 12.6` as additional build target but packaged on PyPI.
- Build `openblas` as additional build target but packaged on PyPI.
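A hedged sketch of the new flag, assuming `get_special_tokens` accepts a Hugging Face tokenizer and returns its special tokens; the import path, the tokenizer argument, and the return shape are assumptions, only the `force_standard_special_tokens` parameter is documented above.

```python
# Hypothetical usage; only force_standard_special_tokens is documented above.
from transformers import AutoTokenizer
from llama.formatter import get_special_tokens  # import path is an assumption

tokenizer = AutoTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v1.0')

# Default: keep the tokenizer's own special tokens.
special = get_special_tokens(tokenizer)

# Bypass the tokenizer's special tokens with standard/common ones.
standard = get_special_tokens(tokenizer, force_standard_special_tokens=True)

print(special, standard, sep='\n')
```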
Fixed:
- Handle `Options.no_display_prompt` on Python side.
v0.1.11
v0.1.10
Added:
- In `openai`, support for `prompt` and `extra_body` (see the sketch after this list). Reference: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/src/openai/resources/completions.py#L41
- Pass `llama-cli` options to `openai`.
- `util` module with `is_cuda_available` function.
- `openai` supports both `prompt` and `messages`. Reference: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/src/openai/resources/completions.py#L45
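A hedged sketch tying these additions together: check CUDA availability with the new `util` module, then call a llama-cpp-cffi OpenAI-compatible server with `prompt`, `extra_body`, and `messages` via the official `openai` client. The `llama.util` import path, the server URL, the model name, and the `extra_body` keys are assumptions; only the `prompt`/`extra_body`/`messages` parameters follow the referenced openai-python API.

```python
from openai import OpenAI
from llama.util import is_cuda_available  # import path is an assumption

print('CUDA available:', is_cuda_available())

# Placeholder URL/key for a locally running OpenAI-compatible server.
client = OpenAI(base_url='http://127.0.0.1:11434/v1', api_key='not-needed')

# Text completion with llama-cli style options passed through extra_body
# (the option names used here are illustrative, not documented).
completion = client.completions.create(
    model='tinyllama-1.1b-chat',        # placeholder model name
    prompt='Q: What is CFFI? A:',
    extra_body={'n_predict': 128},      # assumed llama-cli option
)
print(completion.choices[0].text)

# Chat-style request using messages.
chat = client.chat.completions.create(
    model='tinyllama-1.1b-chat',
    messages=[{'role': 'user', 'content': 'What is CFFI?'}],
)
print(chat.choices[0].message.content)
```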
v0.1.9
Added:
- Support for default CPU tinyBLAS (llamafile, sgemm) builds.
- Support for CPU OpenBLAS (GGML_OPENBLAS) builds.
Changed:
- Build scripts now have a separate step/function `cuda_12_5_1_setup`, which sets up the CUDA 12.5.1 environment at build time.
Fixed:
- Stop thread in `llama_generate` on `GeneratorExit` (see the sketch below).
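A self-contained sketch of the pattern behind this fix, using a stand-in generator rather than the real `llama_generate`: a worker thread feeds tokens through a queue, and a `GeneratorExit` handler signals it to stop when the caller abandons the loop.

```python
import threading
import queue

def generate(text):
    """Stand-in for llama_generate: a worker thread produces tokens into a
    queue while the generator yields them to the caller."""
    q: queue.Queue = queue.Queue()
    stop = threading.Event()

    def worker():
        for tok in text.split():
            if stop.is_set():
                return
            q.put(tok)
        q.put(None)  # sentinel: end of stream

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    try:
        while (tok := q.get()) is not None:
            yield tok
    except GeneratorExit:
        # Raised when the caller closes the generator (e.g. the for-loop below
        # is abandoned); signal the worker to stop instead of leaking it.
        stop.set()
        raise
    finally:
        t.join(timeout=1.0)

# Abandoning the loop closes the generator, which stops the worker thread.
for token in generate('hello world from a background thread'):
    print(token)
    break
```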
Removed:
- `callback` parameter in `llama_generate` and dependent functions.