From 80d08e9e19f49916a71e3145e652abf802e9e9e5 Mon Sep 17 00:00:00 2001
From: Zhao Changmin
Date: Tue, 9 Jul 2024 17:19:42 +0800
Subject: [PATCH] update NPU examples (#11540)

* update NPU examples
---
 .../{Model/llama2 => LLM}/README.md   | 18 +++++++++++++++---
 .../{Model/llama2 => LLM}/generate.py |  0
 2 files changed, 15 insertions(+), 3 deletions(-)
 rename python/llm/example/NPU/HF-Transformers-AutoModels/{Model/llama2 => LLM}/README.md (66%)
 rename python/llm/example/NPU/HF-Transformers-AutoModels/{Model/llama2 => LLM}/generate.py (100%)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
similarity index 66%
rename from python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md
rename to python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index ff4a9c1c059..65a672637b3 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -1,5 +1,17 @@
-# Run LLama2 on Intel NPU
-In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Llama2 models on [Intel NPUs](../../../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) as reference Llama2 models.
+# Run Large Language Models on Intel NPU
+In this directory, you will find examples of how to apply IPEX-LLM INT4 or INT8 optimizations to large language models on [Intel NPUs](../../../README.md). For illustration purposes, we utilize [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) as a reference Llama2 model. See the table below for the verified models.
+
+## Verified Models
+
+| Model    | Model Link |
+|----------|------------|
+| Llama2   | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
+| Llama3   | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
+| ChatGLM3 | [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) |
+| Qwen2    | [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) |
+| MiniCPM  | [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) |
+| Phi-3    | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| StableLM | [stabilityai/stablelm-zephyr-3b](https://huggingface.co/stabilityai/stablelm-zephyr-3b) |
 
 ## 0. Requirements
 To run these examples with IPEX-LLM on Intel NPUs, make sure to install the newest driver version of Intel NPU.
@@ -42,7 +54,7 @@ python ./generate.py
 ```
 
 Arguments info:
-- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Llama2 model (e.g. `meta-llama/Llama-2-7b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'meta-llama/Llama-2-7b-chat-hf'`.
+- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Llama2 model (e.g. `meta-llama/Llama-2-7b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded, or the path to the huggingface checkpoint folder. It defaults to `'meta-llama/Llama-2-7b-chat-hf'`; for more verified models, see the list in [Verified Models](#verified-models).
 - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun'`.
 - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
 - `--load_in_low_bit`: argument defining the `load_in_low_bit` format used. It is default to be `sym_int8`, `sym_int4` can also be used.
diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/generate.py b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/generate.py
similarity index 100%
rename from python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/generate.py
rename to python/llm/example/NPU/HF-Transformers-AutoModels/LLM/generate.py
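For readers who want the shape of the example being renamed here, the sketch below shows the typical IPEX-LLM NPU flow that `generate.py` implements. It is a minimal illustration rather than a copy of the script: it assumes ipex-llm is installed with NPU support and exposes the `ipex_llm.transformers.npu_model.AutoModelForCausalLM` entry point, and it hard-codes the README's default `sym_int8` value of `--load_in_low_bit` and `32` for `--n-predict` in place of the script's CLI arguments.

```python
# Illustrative sketch only -- see generate.py in this directory for the
# actual example. Assumes ipex-llm is installed with NPU support and
# provides ipex_llm.transformers.npu_model.AutoModelForCausalLM.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # or a local checkpoint folder

# `load_in_low_bit` selects the optimization described in the README:
# "sym_int8" (the default) or "sym_int4".
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    load_in_low_bit="sym_int8",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Once upon a time, there existed a little girl who liked to have adventures."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.inference_mode():
    # max_new_tokens plays the role of the --n-predict argument.
    output = model.generate(input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```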