v0.5.1

RyanUnderhill released this 13 Nov 21:26

Release Notes

In addition to the features in the 0.5.0 release, this release adds:

Ability to choose the execution provider and modify its options at runtime
Fix for a data leakage bug with KV caches
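As a rough illustration of runtime provider selection, the sketch below builds a model from a config object instead of relying only on the providers listed in the model's genai_config.json. It assumes the Python package exposes `Config`, `clear_providers`, `append_provider`, and `set_provider_option` as shown; the helper name `load_model`, its parameters, and the option key in the usage note are hypothetical, so check the API documentation for your installed version before relying on it.

```python
def load_model(model_path, provider="cpu", provider_options=None):
    """Load a model, choosing the execution provider at runtime (sketch)."""
    # Imported lazily so this helper can be defined without the package installed.
    import onnxruntime_genai as og

    config = og.Config(model_path)

    # Drop the providers configured on disk, then add the one requested.
    # Leaving the list empty is assumed to fall back to CPU.
    config.clear_providers()
    if provider != "cpu":
        config.append_provider(provider)
        # Option values are passed as strings (assumption).
        for key, value in (provider_options or {}).items():
            config.set_provider_option(provider, key, str(value))

    return og.Model(config)
```

A call such as `load_model("path/to/model", "cuda", {"enable_cuda_graph": 0})` would then request the CUDA provider with one option overridden; both the path and the option key are placeholders.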

Features in 0.5.0:

Support for MultiLoRA
Multi-frame support for the Phi-3 vision and Phi-3.5 vision models
Support for the Phi-3 MoE model
Support for the NVIDIA Nemotron model
Support for the Qwen model
Addition of the Set Terminate feature, which allows users to cancel generation while it is in progress
Soft capping support for Group Query Attention
Extend quantization support to embedding and LM head layers
Mac support in published packages
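For readers unfamiliar with soft capping: in models that use it, attention scores are commonly squashed with a scaled tanh so they stay within a fixed range instead of being hard-clipped. The sketch below shows that general formula on a single score. It is an illustration of the technique only, not the kernel used in this release, and the function name `softcap` is hypothetical.

```python
import math


def softcap(score, cap):
    """Smoothly limit a score to the open interval (-cap, cap)."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    # Near zero tanh is almost linear, so small scores pass through nearly
    # unchanged; large scores saturate towards +cap or -cap.
    return cap * math.tanh(score / cap)
```

Unlike hard clipping, this mapping is smooth and strictly increasing, so the relative order of scores is preserved.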

Known issues

Models running with DirectML do not support batching
Python 3.13 is not supported in this release
