This repo aims to make RKNN LLM usage easier for people who don't want to read through Rockchip's docs.
The main repo is https://github.com/Pelochus/ezrknpu, where you can find more instructions and documentation for general use. This repo focuses on RKLLM details and on how to convert models.
Keep in mind this repo is focused on:
- High-end Rockchip SoCs, mainly the RK3588
- Linux, not Android
- Linux kernels from Rockchip (as of writing, Rockchip's 5.10 and 6.1 kernels should work; if your board runs one of these versions, it is very likely Rockchip's kernel)
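As a quick way to check the kernel requirement above, the following is a minimal sketch that compares a kernel release string against the 5.10 and 6.1 series. The helper name `kernel_supported` is hypothetical and not part of this repo. A matching version number does not prove the kernel is Rockchip's, only that it is in one of the series mentioned above.

```shell
# Hypothetical helper (not part of this repo): succeeds when the given
# kernel release string belongs to the 5.10 or 6.1 series.
kernel_supported() {
    case "$1" in
        5.10|5.10.*|5.10-*|6.1|6.1.*|6.1-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel of the machine this runs on.
release="$(uname -r)"
if kernel_supported "$release"; then
    echo "Kernel $release is in a series expected to work (5.10 or 6.1)"
else
    echo "Kernel $release is not 5.10 or 6.1; it may not be Rockchip's kernel"
fi
```

Note that the patterns require a dot or dash after the series number, so a release such as 6.10.0 is not mistaken for 6.1.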
First clone the repo:
git clone https://github.com/Pelochus/ezrknn-llm
Then run:
cd ezrknn-llm && bash install.sh
To test, run (assuming you are in the folder where your .rkllm file is located):
rkllm qwen-chat-1_8B.rkllm # Or any other model you like
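Since the command above expects the model file to be in the current folder, a small pre-check can save a confusing error. This is only a sketch: the helper name `list_rkllm_models` is hypothetical and is not provided by this repo or by Rockchip.

```shell
# Hypothetical helper (not part of this repo): prints every .rkllm file
# found in a directory, defaulting to the current one.
list_rkllm_models() {
    dir="${1:-.}"
    for f in "$dir"/*.rkllm; do
        # When nothing matches, the pattern stays literal, so skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

models="$(list_rkllm_models .)"
if [ -n "$models" ]; then
    echo "Models found:"
    echo "$models"
else
    echo "No .rkllm file here; cd to the folder that holds your model first."
fi
```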
To convert a model, you need an x86 Linux PC (Intel or AMD). Currently, Rockchip does not provide ARM support for converting models, so it can't be done on an Orange Pi or similar. Run:
docker run -it pelochus/ezrkllm-toolkit:latest bash
Then, inside the Docker container:
cd ezrknn-llm/rkllm-toolkit/examples/huggingface/
Now edit test.py to use your preferred model. This container provides Qwen-1.8B, since it is the best-working one and very lightweight.
Before converting the model, remember to run git lfs pull to download the model.
To convert the model, run:
python3 test.py
Check this Reddit post if your LLM seems to be responding with garbage:
https://www.reddit.com/r/RockchipNPU/comments/1cpngku/rknnllm_v101_lets_talk_about_converting_and/
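Because conversion only works on x86 hosts, it may help to check the machine architecture before pulling the Docker image. The sketch below assumes that `uname -m` reports `x86_64` (or `amd64` on some systems) for Intel and AMD PCs; the helper name `host_can_convert` is hypothetical.

```shell
# Hypothetical helper (not part of this repo): succeeds only for
# 64-bit x86 machine names, as reported by `uname -m`.
host_can_convert() {
    case "$1" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;
    esac
}

arch="$(uname -m)"
if host_can_convert "$arch"; then
    echo "Host is $arch: OK to run the conversion container"
else
    echo "Host is $arch: conversion needs an x86 Linux PC"
fi
```

On an Orange Pi or similar ARM board, `uname -m` typically reports `aarch64`, so the check takes the second branch.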
There are dedicated branches containing the last commit made by this fork before updating to a newer release from Rockchip. They are also available in the releases of this repo. To use the latest version, always use the main branch.
- v1.0.0-beta: https://github.com/Pelochus/ezrknn-llm/tree/v1.0.0
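To work from one of these version branches instead of main, the branch can be selected at clone time with git's `--branch` option. The sketch below only prints the command so it can be reviewed first; the helper name `release_clone_cmd` is hypothetical, and the branch name `v1.0.0` is taken from the URL listed above.

```shell
# Hypothetical helper (not part of this repo): prints the git command
# that clones a single branch of this repo. Defaults to main.
REPO_URL="https://github.com/Pelochus/ezrknn-llm"

release_clone_cmd() {
    branch="${1:-main}"
    printf 'git clone --branch %s --single-branch %s\n' "$branch" "$REPO_URL"
}

# Print the command for the v1.0.0 branch; copy and run it to clone.
release_clone_cmd v1.0.0
```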
The RKLLM software stack helps users quickly deploy AI models to Rockchip chips. The overall framework is as follows:
To use the RKNPU, users first run the RKLLM-Toolkit on a computer to convert the trained model into an RKLLM-format model, and then run inference on the development board using the RKLLM C API.
- RKLLM-Toolkit is a software development kit for users to perform model conversion and quantization on a PC.
- RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform to help users deploy RKLLM models and accelerate the implementation of LLM applications.
- The RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.
Supported platforms:
- RK3588 Series
- RK3576 Series
Supported models:
- TinyLLAMA 1.1B
- Qwen 1.8B
- Qwen2 0.5B
- Phi-2 2.7B
- Phi-3 3.8B
- ChatGLM3 6B
- Gemma 2B
- InternLM2 1.8B
- MiniCPM 2B
- You can also download all packages, the Docker image, examples, docs, and platform-tools from RKLLM_SDK (fetch code: rkllm)
If you want to deploy additional AI models, we have introduced an SDK called RKNN-Toolkit2. For details, please refer to:
https://github.com/airockchip/rknn-toolkit2
Changelog for the latest release:
- Optimize model conversion memory occupation
- Optimize inference memory occupation
- Increase prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add Server invocation
- Add inference interruption interface
- Add logprob and token_id to the return value
For older versions, please refer to the CHANGELOG.