LLM-on-Kunpeng920 is a comprehensive guide and toolkit for deploying and optimizing Large Language Models (LLMs) on the Huawei Kunpeng 920 platform. This project aims to provide developers and researchers with the necessary tools and knowledge to effectively utilize the Kunpeng 920's ARM-based architecture for LLM inference.
The Huawei Kunpeng 920 is a high-performance ARM-based processor designed for server workloads. This toolkit provides optimized methods for deploying various LLMs, including but not limited to ChatGLM, Baichuan, and Qwen, on this platform. Our goal is to maximize LLM inference performance while maintaining model accuracy.
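Before working through the guides, it can help to confirm that the target machine is actually an ARM (aarch64) system. The commands below are a minimal sketch using standard Linux tools; the exact "Model name" string reported for a Kunpeng 920 varies by firmware and OS.

```shell
# Minimal environment check before deployment.
# Kunpeng 920 systems report "aarch64" for the architecture.
uname -m
lscpu | grep -E 'Architecture|Model name|^CPU\(s\)' || true
```

If `uname -m` does not print `aarch64`, the ARM-specific build and optimization steps in this toolkit will not apply.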
- Detailed guides for system environment preparation
- Instructions for model conversion and quantization
- Optimized compilation procedures for various inference engines
- Deployment strategies and best practices
- Inference optimization techniques specific to Kunpeng 920
- Performance monitoring and troubleshooting guides
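As a concrete taste of the inference-optimization topic above: many CPU inference engines are OpenMP-based, so using one thread per available core is a common starting point on a many-core chip like the Kunpeng 920. This is a hedged sketch, not a setting taken from this repository; the right values depend on your engine and workload.

```shell
# Sketch: one OpenMP thread per available core for CPU inference.
# Engines differ; consult the inference guide in the docs folder
# for engine-specific settings.
CORES=$(nproc)
export OMP_NUM_THREADS="$CORES"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

On NUMA systems, combining this with core pinning (e.g. via `taskset` or `numactl`) often matters as much as the thread count itself.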
To get started with LLM-on-Kunpeng920, follow these steps:

1. Clone the repository:

```shell
git clone https://github.com/pariskang/LLM-on-Kunpeng920.git
cd LLM-on-Kunpeng920
```

2. Follow the guides in the `docs` folder for detailed instructions on each step of the process.
Our documentation is divided into several key sections:
- System Environment Preparation
- Necessary Dependencies and Tools
- Setting Up Model Repositories
- Model Conversion and Quantization
- Inference
- Demo and Ngrok
Each document provides step-by-step instructions and best practices for its respective topic.
We welcome contributions to LLM-on-Kunpeng920! If you have suggestions for improvements or encounter any issues, please feel free to open an issue or submit a pull request.
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Huawei for the Kunpeng 920 platform
- The developers of ChatGLM, Baichuan, and Qwen for their excellent LLM implementations
- Peng Cheng Laboratory for providing resources and support
- All contributors and users of this toolkit
If you use LLM-on-Kunpeng920 in your research or project, please cite it as follows:
```bibtex
@misc{LLM-on-Kunpeng920,
  author       = {Yanlan Kang},
  title        = {LLM-on-Kunpeng920: Optimizing Large Language Models on Huawei Kunpeng 920},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub Repository},
  howpublished = {\url{https://github.com/pariskang/LLM-on-Kunpeng920}}
}
```