From 32f0a778461005022a85fb6e7fd466131152d8a3 Mon Sep 17 00:00:00 2001
From: "Chu,Youcheng"
Date: Tue, 20 Aug 2024 20:13:54 +0800
Subject: [PATCH] feat: update readme for ppl test (#11865)

* feat: update readme for ppl test

* fix: textual adjustments

* fix: textual adjustments

* Add ipex-llm npu option in setup.py (#11858)

* add ipex-llm npu release

* update example doc

* meet latest release changes

* optimize phi3 memory usage (#11867)

* Update `ipex-llm` default transformers version to 4.37.0 (#11859)

* Update default transformers version to 4.37.0

* Add dependency requirements for qwen and qwen-vl

* Temp fix transformers version for these not yet verified models

* Skip qwen test in UT for now as it requires transformers<4.37.0

* Update performance test regarding updated default `transformers==4.37.0` (#11869)

* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841)

* upgrade arc perf test to transformers 4.37 (#11842)

* fix load low bit com dtype (#11832)

* feat: add mixed_precision argument on ppl longbench evaluation

* fix: delete extra code

* feat: upgrade arc perf test to transformers 4.37

* fix: add missing codes

* fix: keep perf test for qwen-vl-chat in transformers 4.36

* fix: remove extra space

* fix: resolve pr comment

* fix: add empty line

* fix: add pip install for spr and core test

* fix: delete extra comments

* fix: remove python -m for pip

* Revert "fix load low bit com dtype (#11832)"

This reverts commit 6841a9ac8fc8b3f4eb06e41fa3944f7877fd8f94.

---------

Co-authored-by: Zhao Changmin
Co-authored-by: Jinhe Tang

* add transformers==4.36 for qwen vl in igpu-perf (#11846)

* add transformers==4.36.2 for qwen-vl

* Small update

---------

Co-authored-by: Yuwen Hu

* fix: remove qwen-7b on core test (#11851)

* fix: remove qwen-7b on core test

* fix: change delete to comment

---------

Co-authored-by: Jinhe Tang

* replce filename (#11854)

* fix: remove qwen-7b on core test

* fix: change delete to comment

* fix: replace filename

---------

Co-authored-by: Jinhe Tang

* fix: delete extra comments (#11863)

* Remove transformers installation for temp test purposes

* Small fix

* Small update

---------

Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin
Co-authored-by: Jinhe Tang
Co-authored-by: Zijie Li
Co-authored-by: Chu,Youcheng <1340390339@qq.com>

* Pytorch models transformers version update (#11860)

* yi sync

* delete 4.34 constraint

* delete 4.34 constraint

* delete 4.31 constraint

* delete 4.34 constraint

* delete 4.35 constraint

* added <=4.33.3 constraint

* added <=4.33.3 constraint

* switched to chinese prompt

* Update compresskv model forward type logic (#11868)

* update

* fix

* Update local import for ppl (#11866)

Co-authored-by: jenniew

* fix: textual adjustment

---------

Co-authored-by: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Co-authored-by: Yishuo Wang
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Co-authored-by: Zhao Changmin
Co-authored-by: Jinhe Tang
Co-authored-by: Zijie Li
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: RyuKosei <70006706+RyuKosei@users.noreply.github.com>
Co-authored-by: jenniew
---
 python/llm/dev/benchmark/perplexity/README.md | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/python/llm/dev/benchmark/perplexity/README.md b/python/llm/dev/benchmark/perplexity/README.md
index 8e6d5bacb89..410358eed34 100644
--- a/python/llm/dev/benchmark/perplexity/README.md
+++ b/python/llm/dev/benchmark/perplexity/README.md
@@ -1,29 +1,31 @@
 # Perplexity
 Perplexity (PPL) is one of the most common metrics for evaluating language models. This benchmark implementation is adapted from [transformers/perplexity](https://huggingface.co/docs/transformers/perplexity#perplexity-of-fixed-length-models) and [benchmark_patch_llm.py](https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py)
-## Run on Wikitext
-
+## Environment Preparation
 ```bash
+pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 pip install datasets
 ```
-An example to run perplexity on wikitext:
-```bash
-
-python run_wikitext.py --model_path meta-llama/Meta-Llama-3-8B --dataset path=wikitext,name=wikitext-2-raw-v1 --precision sym_int4 --device xpu --stride 512 --max_length 4096
+The following step is required on Linux when oneAPI is installed via APT or the offline installer. Skip it if oneAPI was installed via pip.
+```bash
+source /opt/intel/oneapi/setvars.sh
 ```
-## Run on [THUDM/LongBench](https://github.com/THUDM/LongBench) dataset
-
+## PPL Evaluation
+### 1. Run on Wikitext
+An example to run perplexity on [wikitext](https://paperswithcode.com/dataset/wikitext-2):
 ```bash
-pip install datasets
+python run_wikitext.py --model_path meta-llama/Meta-Llama-3-8B --dataset path=wikitext,name=wikitext-2-raw-v1 --precision sym_int4 --device xpu --stride 512 --max_length 4096
 ```
+### 2. Run on [THUDM/LongBench](https://github.com/THUDM/LongBench) dataset
 An example to run perplexity on chatglm3-6b using the default Chinese datasets ("multifieldqa_zh", "dureader", "vcsum", "lsht", "passage_retrieval_zh"):
 ```bash
 python run_longbench.py --model_path THUDM/chatglm3-6b --precisions float16 sym_int4 --device xpu --language zh
 ```
+
 Notes:
 - If you want to test model perplexity on a few selected datasets from the `LongBench` dataset, please use the format below.
 ```bash
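
For context on the README changes above: the `--stride` and `--max_length` arguments correspond to the sliding-window evaluation described in the linked [transformers/perplexity](https://huggingface.co/docs/transformers/perplexity#perplexity-of-fixed-length-models) guide. The sketch below illustrates that computation on wikitext with a plain Hugging Face causal LM. It is a minimal illustration of the technique adapted from that guide, not the exact implementation in `run_wikitext.py`; the model checkpoint is simply the one used in the README example.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B"  # same checkpoint as the README example
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

# Concatenate the whole test split into one long token sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 4096  # --max_length: size of the window fed to the model
stride = 512       # --stride: how far the window advances each step
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # score only tokens not scored in a previous window
    input_ids = encodings.input_ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask overlapping context tokens out of the loss

    with torch.no_grad():
        # loss is the mean negative log-likelihood over the unmasked targets
        neg_log_likelihood = model(input_ids, labels=target_ids).loss

    nlls.append(neg_log_likelihood * trg_len)
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"PPL: {ppl.item():.4f}")
```

A smaller stride makes successive windows overlap more, so each scored token sees more context and the reported perplexity approaches the fully conditional value, at the cost of more forward passes; a stride equal to `max_length` degenerates to disjoint windows.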