Tutorial: deploy Llama 2 7B with ncnn #4985

lrw04 · 2023-08-26T12:44:38Z

lrw04
Aug 26, 2023

Download weights from Meta

https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Convert the weights to llama2.c format

*instructions written for Linux.

cd <your-work-directory>
git clone https://github.com/lrw04/llama2.c-to-ncnn
cd llama2.c-to-ncnn
python -m venv venv
source venv/bin/activate
pip install numpy
python convert.py --outfile <output file> <model directory>

Convert the model in llama2.c format into an ncnn model

Note: this process is done using a converter custom-built for Llama 2 models to avoid using pnnx because of pnnx being memory-inefficient.

make
./convert <output file in previous step> <ncnn model name>

Use the provided inference code

./inference <ncnn model name> <prompt> <total-out-tokens>

LRY89757 · 2023-09-04T15:11:14Z

LRY89757
Sep 4, 2023

Wonderful! Great Job!

0 replies

lrw04 · 2023-09-07T04:05:50Z

lrw04
Sep 7, 2023
Author

int8 quantization work is in progress. The new layer code is being upstreamed at PR #5007.

0 replies

lrw04 · 2023-09-12T03:08:04Z

lrw04
Sep 12, 2023
Author

Pushed int8 support anyway. You need to pull the branch from my PR to use quantization.

0 replies

lrw04 · 2023-10-01T12:36:41Z

lrw04
Oct 1, 2023
Author

The same instructions also work for the 13B model. The 70B model is not tested due to insufficient memory on my machines.

0 replies

ahmedmustahid · 2024-08-08T08:20:47Z

ahmedmustahid
Aug 8, 2024

Download weights from Meta

https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Convert the weights to llama2.c format

*instructions written for Linux.
cd <your-work-directory>
git clone https://github.com/lrw04/llama2.c-to-ncnn
cd llama2.c-to-ncnn
python -m venv venv
source venv/bin/activate
pip install numpy
python convert.py --outfile <output file> <model directory>
Convert the model in llama2.c format into an ncnn model

Note: this process is done using a converter custom-built for Llama 2 models to avoid using pnnx because of pnnx being memory-inefficient.
make
./convert <output file in previous step> <ncnn model name>
Use the provided inference code
./inference <ncnn model name> <prompt> <total-out-tokens>

Would you please share the custom made converter? I intend to run llama2 with ncnn.

1 reply

lrw04 Aug 8, 2024
Author

You would be using the custom converter if you were following the instructions. (Look at what you are downloading) Also, this is very flaky and you should probably use something like llama.cpp instead. They support Llama 3 AND a lot of other models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tutorial: deploy Llama 2 7B with ncnn #4985

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 5 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Download weights from Meta

Convert the weights to llama2.c format

Convert the model in llama2.c format into an ncnn model

Use the provided inference code

{{title}}

Select a reply

Tutorial: deploy Llama 2 7B with ncnn #4985

lrw04 Aug 26, 2023

Download weights from Meta

Convert the weights to llama2.c format

Convert the model in llama2.c format into an ncnn model

Use the provided inference code

Replies: 5 comments · 1 reply

LRY89757 Sep 4, 2023

lrw04 Sep 7, 2023 Author

lrw04 Sep 12, 2023 Author

lrw04 Oct 1, 2023 Author

ahmedmustahid Aug 8, 2024

Download weights from Meta

Convert the weights to llama2.c format

Convert the model in llama2.c format into an ncnn model

Use the provided inference code

lrw04 Aug 8, 2024 Author

lrw04
Aug 26, 2023

Replies: 5 comments 1 reply

LRY89757
Sep 4, 2023

lrw04
Sep 7, 2023
Author

lrw04
Sep 12, 2023
Author

lrw04
Oct 1, 2023
Author

ahmedmustahid
Aug 8, 2024

lrw04 Aug 8, 2024
Author