Releases · b4rtaz/distributed-llama
0.6.0
This version renames the main application to `dllama`. From now on, to run the root node or a worker, you need to compile and run the `dllama` application:
```
make dllama
./dllama inference --model ... --tokenizer ...
```
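A worker node uses the same binary with the worker command. As an illustrative sketch (the port and thread count below are example values, not defaults, and the flags may differ between versions):

```
# hypothetical worker invocation; adjust port and thread count to your setup
./dllama worker --port 9998 --nthreads 4
```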
This version also introduces an early-stage HTTP API compatible with the OpenAI API (only the `/v1/chat/completions` endpoint). Instructions for running the API can be found here. A big shout-out to @DifferentialityDevelopment for implementing this feature. #39
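As a rough sketch of calling the endpoint once the API server is running (the host, port, and model name here are placeholder assumptions, not project defaults; see the linked instructions for the actual configuration):

```
# hypothetical request; replace host, port, and model with your setup
curl http://localhost:9990/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama", "messages": [{"role": "user", "content": "Hello!"}]}'
```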
0.5.2
0.5.1
- feat: use AVX2 to speed up matmulQ40 (by @DifferentialityDevelopment)
0.5.0
0.4.0
0.3.1
0.3.0
- New tokenizer format (old tokenizer files are not supported, please regenerate tokenizer files).
- Added Llama 3 support.
- Simple-server mode; see this example: nodejs-example.cjs. You may now use Distributed Llama as a simple LLM server.