Releases · b4rtaz/distributed-llama
0.6.0
This version renames the main application to `dllama`. From now on, to run the root node or a worker, you need to compile and run the `dllama` application:
```
make dllama
./dllama inference --model ... --tokenizer ...
```
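A worker node uses the same binary with the worker command. As an illustrative sketch (the port and thread count below are example values, not defaults, and the flags may differ between versions):

```
# hypothetical worker invocation; adjust port and thread count to your setup
./dllama worker --port 9998 --nthreads 4
```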
This version also introduces an early-stage HTTP API compatible with the OpenAI API (only the `/v1/chat/completions` endpoint). Instructions for running the API can be found here. A big shout-out to @DifferentialityDevelopment for implementing this feature. #39
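As a rough sketch of calling the endpoint once the API server is running (the host, port, and model name here are placeholder assumptions, not project defaults; see the linked instructions for the actual configuration):

```
# hypothetical request; replace host, port, and model with your setup
curl http://localhost:9990/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama", "messages": [{"role": "user", "content": "Hello!"}]}'
```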
0.5.2
0.5.1
- feat: use AVX2 to speed up matmulQ40 (by @DifferentialityDevelopment)
0.5.0
0.4.0
0.3.1
0.3.0
- New tokenizer format (old tokenizer files are not supported, please regenerate tokenizer files).
- Added Llama 3 support.
- Simple-server mode; see this example: nodejs-example.cjs. You may now use Distributed Llama as a simple LLM server.