Llama #35
Conversation
…ts can be loaded w/o issue
Currently downloading the Llama 3 8B Instruct weights to have a chat mode available for Llama 3 as well. The README also needs to be updated with a bit more info. Otherwise everything is ready to go 💪
Tested with wgpu and tch (GPU). I think this is ready for review! TinyLlama results on my dev machine:
(benchmark output not captured in this export)
Pretty big difference 😅
LGTM, I have only one small comment, but otherwise very good job! 👏
Weights have been updated to use the named mpk format (much faster now that data is treated as bytes with serde). In follow-up PRs we will add quantization and support for Llama 3.1.
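For anyone who wants to try the checkpoint, here is a minimal sketch of loading a named-mpk record with Burn's `NamedMpkFileRecorder`. The file path is a placeholder, and the exact `Recorder::load` signature has changed across Burn releases (newer versions take a device argument, as shown here), so treat this as illustrative rather than the exact code in this PR:

```rust
use burn::module::Module;
use burn::record::{FullPrecisionSettings, NamedMpkFileRecorder, Recorder};
use burn::tensor::backend::Backend;

/// Load a named-mpk checkpoint into any Burn module.
/// `path` is a placeholder; the real weights are hosted on the HF hub.
fn load_checkpoint<B: Backend, M: Module<B>>(
    model: M,
    path: &str,
    device: &B::Device,
) -> M {
    let recorder = NamedMpkFileRecorder::<FullPrecisionSettings>::new();
    let record = recorder
        .load(path.into(), device)
        .expect("checkpoint file should load");
    model.load_record(record)
}
```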
Bringing the first official Llama implementation to Burn, with pre-trained weights in mpk format (hosted on the HF hub)!
Currently the top-p sampling is done on CPU before decoding since Burn is missing categorical distribution sampling. We could improve that once everything else is done.
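For context, here is a minimal sketch of what CPU-side top-p (nucleus) sampling can look like. The function name, the `rand`-based weighted draw, and the parameters are illustrative assumptions, not the code actually used in this PR:

```rust
use rand::distributions::{Distribution, WeightedIndex};

/// Sample a token index from `probs` (a softmax distribution over the
/// vocabulary), restricted to the smallest set of tokens whose cumulative
/// probability exceeds `top_p`.
fn sample_top_p(probs: &[f32], top_p: f32) -> usize {
    // Sort token indices by descending probability.
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Keep the smallest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0;
    let mut cutoff = indexed.len();
    for (i, &(_, p)) in indexed.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            cutoff = i + 1;
            break;
        }
    }
    let nucleus = &indexed[..cutoff];

    // Draw within the nucleus, proportionally to the remaining weights
    // (WeightedIndex renormalizes by the total weight internally).
    let dist = WeightedIndex::new(nucleus.iter().map(|&(_, p)| p)).unwrap();
    let choice = dist.sample(&mut rand::thread_rng());
    nucleus[choice].0
}
```

Once Burn gains categorical distribution sampling, this whole step could stay on the device instead of copying the probabilities back to the CPU.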
Closes #20