Releases: kherud/java-llama.cpp

Version 3.4.1

06 Sep 20:28

This version is a minor fix for problems with the pre-built shared libraries on Linux x86_64.

Version 3.4.0

06 Sep 18:25

Credit goes to @shuttie for adding CUDA support on Linux x86_64 with this version.

Version 3.3.0

07 Aug 19:07

Upgrade to latest llama.cpp version b3534

Version 3.2.1

27 May 18:18
  • Include GGML backend in text log
  • Update to llama.cpp b3008

Version 3.2.0

25 May 10:00

Logging Re-Implementation (see #66)

  • Re-adds logging callbacks via LlamaModel#setLogger(LogFormat, BiConsumer<LogLevel, String>)
  • Removes the non-functional ModelParameters#setLogDirectory(String), ModelParameters#setDisableLog(boolean), and ModelParameters#setLogFormat(LogFormat)
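
The restored hook registers a BiConsumer callback for native log messages. Below is a minimal, self-contained sketch of the callback pattern: the enums are local stand-ins for the library's LogFormat and LogLevel types, and the actual registration call is only shown in a comment, since running it requires the native library.

```java
import java.util.function.BiConsumer;

public class LoggingExample {
    // Local stand-ins mirroring the library's LogFormat/LogLevel enums
    // (illustrative only, not the binding's actual classes).
    enum LogFormat { TEXT, JSON }
    enum LogLevel { DEBUG, INFO, WARN, ERROR }

    public static void main(String[] args) {
        // The callback receives a severity level and the log message.
        BiConsumer<LogLevel, String> logger =
                (level, message) -> System.out.printf("[%s] %s%n", level, message);

        // With the real library, the callback would be registered as:
        // LlamaModel.setLogger(LogFormat.TEXT, logger);

        // Invoke the callback directly to demonstrate the output format.
        logger.accept(LogLevel.INFO, "llama.cpp initialized");
    }
}
```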

Version 3.1.1

22 May 20:36
  • Adds chat template support (credit to @lesters #64)
  • Updates to llama.cpp b2969
  • Adds explicit Phi-3 128k support

Version 3.1.0

15 May 19:29

Changes:

  • Updates to llama.cpp b2885
  • Fixes #62 (generation can now be canceled)
  • Fixes macOS x64 shared libraries

API changes:

  • LlamaModel.Output is now LlamaOutput
  • LlamaIterator is now public, was private LlamaModel.Iterator previously

Version 3.0.2

06 May 19:59

Upgrade to llama.cpp b2797

  • Adds explicit support for Phi-3
  • Adds flash attention
  • Fixes #54

Version 3.0.1

21 Apr 15:49
4c58561
  • Updates the binding to llama.cpp b702 to add Llama 3 support
  • Fixes #54 by using codellama for testing

Version 3.0.0

07 Apr 20:00

Version 3.0 updates to the newest available version of llama.cpp and supports all of its available features. The Java binding reworks almost all of the C++ code. It now heavily relies on the llama.cpp server code, which should lead to much better performance, concurrency, and long-term maintainability.

The biggest change is how model and inference parameters are handled (see examples for details). Previous versions relied on properly typed Java classes, whereas the C++ server code mostly uses JSON. The JNI code to transfer the parameters from Java to C++ was complex and error-prone. The new version comes with almost no API changes regarding how parameters are handled (apart from the available parameters per se), but should be much easier to maintain in the long term.
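
The design described above, typed setters on the Java side feeding JSON to the server-side C++ code, can be sketched as follows. The class and setter names here are illustrative, not the binding's actual API; the point is only the pattern of accumulating typed values and serializing them once at the boundary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the typed-parameters-to-JSON pattern: setters stay
// strongly typed for Java callers, while the collected values are handed
// to the C++ server code as a single JSON document. Names are hypothetical.
public class ParamsSketch {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    public ParamsSketch setNGpuLayers(int n) {
        fields.put("n_gpu_layers", n);
        return this;
    }

    public ParamsSketch setTemperature(float t) {
        fields.put("temperature", t);
        return this;
    }

    // Serialize the collected parameters to JSON at the JNI boundary.
    public String toJson() {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        String json = new ParamsSketch()
                .setNGpuLayers(43)
                .setTemperature(0.7f)
                .toJson();
        System.out.println(json);
    }
}
```

Keeping the typed builder as the public surface means Java callers get compile-time checking, while the error-prone per-field JNI transfer code is replaced by a single string crossing the boundary.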