Llama #35
Conversation
…ts can be loaded w/o issue
Currently downloading the Llama 3 8B Instruct weights to have a chat mode available for Llama 3 as well. The README also needs to be updated with a bit more info. Otherwise everything is ready to go 💪
Tested with wgpu and tch (GPU). I think this is ready for review! TinyLlama results on my dev machine:
(benchmark output not captured in this export)
Pretty big difference 😅
LGTM, I have only one small comment, but otherwise very good job! 👏
Weights have been updated to use the named mpk format (much faster now that data is treated as bytes with serde). In follow-up PRs we will add quantization and support for Llama 3.1.
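For anyone who wants to try the checkpoint, here is a minimal sketch of loading a named-mpk record with Burn's `NamedMpkFileRecorder`. The file path is a placeholder, and the exact `Recorder::load` signature has changed across Burn releases (newer versions take a device argument, as shown here), so treat this as illustrative rather than the exact code in this PR:

```rust
use burn::module::Module;
use burn::record::{FullPrecisionSettings, NamedMpkFileRecorder, Recorder};
use burn::tensor::backend::Backend;

/// Load a named-mpk checkpoint into any Burn module.
/// `path` is a placeholder; the real weights are hosted on the HF hub.
fn load_checkpoint<B: Backend, M: Module<B>>(
    model: M,
    path: &str,
    device: &B::Device,
) -> M {
    let recorder = NamedMpkFileRecorder::<FullPrecisionSettings>::new();
    let record = recorder
        .load(path.into(), device)
        .expect("checkpoint file should load");
    model.load_record(record)
}
```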
Bringing the first official Llama implementation to Burn, with pre-trained weights in mpk format (hosted on the HF hub)!
Currently the top-p sampling is done on CPU before decoding since Burn is missing categorical distribution sampling. We could improve that once everything else is done.
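For context, here is a minimal sketch of what CPU-side top-p (nucleus) sampling can look like. The function name, the `rand`-based weighted draw, and the parameters are illustrative assumptions, not the code actually used in this PR:

```rust
use rand::distributions::{Distribution, WeightedIndex};

/// Sample a token index from `probs` (a softmax distribution over the
/// vocabulary), restricted to the smallest set of tokens whose cumulative
/// probability exceeds `top_p`.
fn sample_top_p(probs: &[f32], top_p: f32) -> usize {
    // Sort token indices by descending probability.
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Keep the smallest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0;
    let mut cutoff = indexed.len();
    for (i, &(_, p)) in indexed.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            cutoff = i + 1;
            break;
        }
    }
    let nucleus = &indexed[..cutoff];

    // Draw within the nucleus, proportionally to the remaining weights
    // (WeightedIndex renormalizes by the total weight internally).
    let dist = WeightedIndex::new(nucleus.iter().map(|&(_, p)| p)).unwrap();
    let choice = dist.sample(&mut rand::thread_rng());
    nucleus[choice].0
}
```

Once Burn gains categorical distribution sampling, this whole step could stay on the device instead of copying the probabilities back to the CPU.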
Closes #20