
Releases: Vali-98/ChatterUI

v0.7.9c

07 Aug 04:13

Fixes:

  • Example messages being re-injected into context, causing unwanted prompt processing.

v0.7.9b

06 Aug 09:39

Fixes:

  • Timestamp and Name fields not being accounted for in the prompt builder, causing context overflow.

v0.7.9a

04 Aug 10:54

Features:

  • Updated cui-llama.rn / llama.cpp to enable Gemma 2 2B support

Fixes:

  • Longstanding issue with instruct suffixes not being added to context

v0.7.10-beta1

01 Aug 01:21
Pre-release

Added Gemma 2 2B support

v0.7.9

30 Jul 05:15

The Local Upgrade

Warning: This update attempts to load files from app assets, which may fail, as I have not yet tested this on multiple devices. Please report if you get stuck in a boot crash! This update is generally very experimental, with a few changes to the core C++ code of llama.rn, so it may be unstable.

Features:

  • Local generation has migrated to cui-llama.rn, a fork of the fantastic llama.rn project with custom features tailored for ChatterUI (a usage sketch follows this list):
    • Added the ability to stop prompt processing between batches - most effective when used with a low batch size.
    • vocab_only mode, which allows for tokenizer-only usage - this also removes the need for onnx-runtime and the old transformers.js adaptation for tokenizers, cutting down app size significantly!
    • Synchronous tokenization for ease of development
    • Context Shifting adapted from kobold.cpp (thanks @LostRuins) - this allows you to use high-context chats without needing to reprocess the entire context upon hitting the context limit
  • Added support for i8mm-compatible devices (Snapdragon 8 Gen 1 or newer / Exynos 2200 or newer)
    • This feature allows the use of the Q4_0_4_8 quantization format, which is optimized for ARM devices.
    • It is recommended to requantize your models to this format using the llama.cpp quantize tool:
      .\llama-quantize.exe --allow-requantize model.gguf Q4_0_4_8
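
The vocab_only and synchronous tokenization features above combine into a lightweight token counter for the prompt builder. A minimal sketch of app-side usage, assuming cui-llama.rn keeps llama.rn's initLlama entry point and exposes the new options as vocab_only and tokenizeSync (both names are assumptions here, not confirmed API):

    import { initLlama } from 'cui-llama.rn'

    async function countTokens(modelPath: string, text: string): Promise<number> {
      // vocab_only loads just the tokenizer vocabulary and skips the model
      // weights, so this context is cheap enough to keep around purely for
      // token counting.
      const ctx = await initLlama({ model: modelPath, vocab_only: true })
      // Synchronous tokenization: no await once the context exists.
      // The result shape is assumed to mirror llama.rn's async tokenize().
      const { tokens } = ctx.tokenizeSync(text)
      return tokens.length
    }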

Changes:

  • Local inferencing is now done as a background task! This means that tabbing out of the app should no longer stop inference (see the sketch after this list).
  • Buttons in the Local API menu now properly disable based on model state
  • The internal tokenizer now relies entirely on the much faster implementation in cui-llama.rn. As such, the previous JS tokenizer has been removed alongside onnx-runtime, leading to a much smaller APK.
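
For background on the first change above: a common way to keep work alive when an Android app is backgrounded is a foreground-service wrapper such as react-native-background-actions. The sketch below uses that library purely for illustration; ChatterUI's actual mechanism may differ, and runCompletion is a hypothetical stand-in for the app's generation call:

    import BackgroundService from 'react-native-background-actions'

    declare function runCompletion(): Promise<void> // hypothetical generation call

    // The wrapped task keeps running while the app is backgrounded;
    // the foreground service lives until BackgroundService.stop().
    const inferenceTask = async () => {
      await runCompletion()
      await BackgroundService.stop()
    }

    await BackgroundService.start(inferenceTask, {
      taskName: 'LocalInference',
      taskTitle: 'ChatterUI',
      taskDesc: 'Generating response locally...',
      taskIcon: { name: 'ic_launcher', type: 'mipmap' },
    })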

Fixes:

  • Continuing with the local API now properly respects the regenCache
  • Removed the BOS token from the default Llama 3 instruct preset
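
For context on the BOS fix: llama.cpp already prepends the BOS token (<|begin_of_text|>) during tokenization when the model requests it, so a preset that also spells it out yields a doubled BOS at the start of context. With BOS removed, each turn of the preset starts at the header tags, roughly (stock Llama 3 instruct format, shown for illustration):

    <|start_header_id|>user<|end_header_id|>

    {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>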

Dev:

  • Moved constants and components under app/, as this seems to significantly affect react-native's Fast Refresh functionality
  • Moved local API state to Zustand; this helps a lot with Fast Refresh bugginess in development and prevents the model state from being unloaded upon a refresh (see the sketch below)
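
As an illustration of the Zustand move, a minimal store for the local model state might look like the following (a sketch only; the store shape and the LlamaContext type are assumptions, not quoted from the codebase):

    import { create } from 'zustand'
    import type { LlamaContext } from 'cui-llama.rn' // assumed export, mirroring llama.rn

    interface ModelState {
      context?: LlamaContext
      setContext: (ctx?: LlamaContext) => void
    }

    // The store lives outside the React component tree, so a Fast Refresh
    // re-render no longer recreates it and the loaded model survives.
    export const useModelStore = create<ModelState>()((set) => ({
      context: undefined,
      setContext: (context) => set({ context }),
    }))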

v0.7.9-beta5

29 Jul 16:55
Pre-release

Updated cui-llama.rn with Context Shift
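
Context Shift avoids reprocessing the whole prompt when the context window fills: the oldest chat tokens after a preserved prefix (e.g. the system prompt) are evicted, and the KV-cache entries for the kept tail are shifted down rather than re-evaluated, so only genuinely new tokens get processed. A rough TypeScript illustration of the token-level bookkeeping, with hypothetical names (the real work happens in C++ inside cui-llama.rn):

    // Keep the first nKeep tokens, drop the nDiscard oldest chat tokens,
    // and retain everything after them. In the real implementation the
    // retained tail's KV-cache positions are shifted by -nDiscard instead
    // of being recomputed, which is what saves the prompt processing.
    function contextShift(tokens: number[], nKeep: number, nDiscard: number): number[] {
      return [...tokens.slice(0, nKeep), ...tokens.slice(nKeep + nDiscard)]
    }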

v0.7.9-beta4

24 Jul 08:10
Pre-release

Test build for i8mm instructions, providing 2-3x faster prompt processing on modern Android devices.

v0.7.9-beta3

23 Jul 03:55
Pre-release

Test build for syncing with llama.cpp

v0.7.9-beta2-unstable

18 Jul 09:11
Pre-release

WARNING: This build may break your install.

Testing new tokenizer system.

ChatterUI_0.7.9-beta1

16 Jul 23:54
Pre-release

Experimental build with cui-llama.rn