- Wonderful! Great job!
- int8 quantization work is in progress. The new layer code is being upstreamed in PR #5007.
- Pushed int8 support anyway. You need to pull the branch from my PR to use quantization.
- The same instructions also work for the 13B model. The 70B model is not tested due to insufficient memory on my machines.
- Would you please share the custom-made converter? I intend to run Llama 2 with ncnn.
- 1. Download the weights from Meta: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
  2. Convert the weights to llama2.c format (instructions written for Linux).
  3. Convert the model in llama2.c format into an ncnn model. Note: this step uses a converter custom-built for Llama 2 models instead of pnnx, because pnnx is memory-inefficient; a sketch of the llama2.c file layout such a converter reads follows this list.
  4. Use the provided inference code; a minimal loading sketch follows this list.
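For reference, the legacy llama2.c binary format (which the converter in step 3 would consume) is very simple: a header of seven little-endian int32 fields followed by the float32 weights. Below is a minimal Python sketch that reads the header. The file name is a placeholder and this is only an illustration of the format, not part of the author's converter.

```python
import struct

# Minimal sketch: the legacy llama2.c .bin layout is seven int32 header
# fields followed by float32 weight tensors. "llama2_7b.bin" is a
# placeholder file name.
with open("llama2_7b.bin", "rb") as f:
    dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len = \
        struct.unpack("<7i", f.read(7 * 4))

# In llama2.c, a negative vocab_size signals that the output classifier
# shares weights with the token embedding table.
shared_classifier = vocab_size > 0
vocab_size = abs(vocab_size)

print(f"dim={dim} hidden_dim={hidden_dim} layers={n_layers} "
      f"heads={n_heads} kv_heads={n_kv_heads} vocab={vocab_size} "
      f"seq_len={seq_len} shared_classifier={shared_classifier}")
```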
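The inference code itself is not shown in this thread, so as a generic illustration of how an exported ncnn model is loaded and run, here is a sketch using ncnn's official Python bindings. The model file names and the blob names "in0"/"out0" are placeholders; the real names depend on what the converter emits.

```python
import ncnn
import numpy as np

# Generic ncnn load-and-run sketch. "llama2.param"/"llama2.bin" and the
# blob names "in0"/"out0" are placeholders, not the converter's actual
# output names.
net = ncnn.Net()
net.load_param("llama2.param")
net.load_model("llama2.bin")

ex = net.create_extractor()
ex.input("in0", ncnn.Mat(np.array([1], dtype=np.float32)))  # e.g. a token id
ret, out = ex.extract("out0")
if ret != 0:
    raise RuntimeError("ncnn extract failed")

logits = np.array(out)
print("next token:", int(logits.argmax()))
```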