adapt vllm xpu #8

yangw1234 · 2024-03-12T01:25:03Z

Description

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

5. New dependencies

New Python dependencies
- Dependency1
- Dependency2
- ...
New Java/Scala dependencies and their license
- Dependency1 and license1
- Dependency2 and license2
- ...

…nalytics#10098)

Bumps [fastapi](https://github.com/tiangolo/fastapi) from 0.95.2 to 0.109.1.
- [Release notes](https://github.com/tiangolo/fastapi/releases)
- [Commits](fastapi/fastapi@0.95.2...0.109.1)

---
updated-dependencies:
- dependency-name: fastapi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add batch_size in stable version test
* add batch_size in excludes
* add excludes for batch_size
* fix ci
* trigger regression test
* fix xpu version
* disable ci
* address kai's comment

---------

Co-authored-by: Ariadne <[email protected]>

…ntel-analytics#10100)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.6...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add test_llamaindex of gpu
* add llamaindex gpu tests bash
* add llamaindex cpu tests bash
* update name of Run LLM langchain GPU test
* import llama_index in llamaindex gpu ut
* update the dependency of test_llamaindex
* add Run LLM llamaindex GPU test
* modify import dependency of llamaindex cpu test
* add Run LLM llamaindex test
* update llama_model_path
* delete unused model path
* add LLAMA2_7B_ORIGIN_PATH in llamaindex cpu test

* LLM: add whisper models into nightly test
* small fix
* small fix
* add more whisper models
* test all cases
* test specific cases
* collect the csv
* store the result
* to html
* small fix
* small test
* test all cases
* modify whisper_csv_to_html

Uxito-Ada and others added 30 commits February 4, 2024 15:42

remove benchmarkwrapper from deepspeed example (intel-analytics#10079)

b2e5af9

LLM: make finetuning examples more common for other models (intel-analytics#10078)

97b6a01

[LLM] Make sure python 310-311 tests only happen for nightly tests (intel-analytics#10081)

7c1dc5a

* Make sure python 310-311 tests only happen for nightly tests
* Use default runner for setup-python-version
* Small fixes

LLM: add batch_size to the csv and html (intel-analytics#10080)

008eb48

* LLM: add batch_size to the csv and html
* small fix

fix gradio check issue temporarily (intel-analytics#10082)

a71c11f

use default python (intel-analytics#10070)

fbdeba9

LLM: fix mpt load_low_bit issue (intel-analytics#10075)

a9da1a5

* fix
* retry
* retry

LLM: modify transformersembeddings.embed() in langchain (intel-analytics#10051)

4ea78b9

add phixtral and optimize phi-moe (intel-analytics#10052)

d9d496c

LLM: small fix for the html script (intel-analytics#10094)

ef20adb

[WebUI] Add prompt format and stopping words for Qwen (intel-analytics#10066)

68b5cf0

* add prompt format and stopping_words for qwen model
* performance optimization
* optimize
* update
* meet comments

fix dimension (intel-analytics#10097)

496d7a0

remove stableml;change schedule;change storage method

b373e48

Update Self-Speculative Decoding Readme (intel-analytics#10102)

81acd6f

remove nightly summary job

18165eb

remove mistral in pr job

90a1d70

add retry in run llm install part;test arc05 with llama2

a28b5f0

remove retry in llm install part

0252242

LLM: 2bit quantization initial support (intel-analytics#10042)

96c5d4d

* basis quantize support
* fix new module name
* small update
* and mixed int4 with iq2_xxs
* remove print
* code refactor
* fix style
* meet code review

Small fix for Nonetype error (intel-analytics#10104)

eaa9ca1

[LLM] Add RWKV4 HF GPU Example (intel-analytics#10105)

d09305f

* Add GPU HF example for RWKV 4
* Add link to rwkv4
* fix

LLM: Update ppl tests (intel-analytics#10092)

44660d6

* update ppl tests
* use load_dataset api
* add exception handling
* add language argument
* address comments

remove text-generation-webui from bigdl repo (intel-analytics#10107)

ab2c805

Update README (intel-analytics#10111)

82372bf

remove irrelevant code

28fd88d

change download path

c7d0b6c

change pr test machine

2b185e4

Ricky-Ting and others added 29 commits March 5, 2024 13:36

Add llamaindex gpu example (intel-analytics#10314)

af1d6d3

* add llamaindex example
* fix core dump
* refine readme
* add trouble shooting
* refine readme

---------

Co-authored-by: Ariadne <[email protected]>

[LLM Doc] Restructure (intel-analytics#10322)

549d997

* Add quick link guide to sidebar
* Add QuickStart to TOC
* Update quick links in main page
* Hide some section in More for top nav bar
* Restructure FAQ sections
* Small fix

LLM: support quantized kv cache for Mistral in transformers >=4.36.0 (intel-analytics#10326)

cc5dbfe

* support quantize kv for mistral in transformers 4.36
* update mistral support.
* fix style.

upload bigdl-llm wheel to sourceforge for backup (intel-analytics#10321)

6110cea

* test: upload to sourceforge
* update scripts
* revert

optimize bge large performance (intel-analytics#10324)

89b7ea3

Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support (intel-analytics#10328)

be95cea

* add the installation of postgresql and pgvector of windows
* fix some format

fix typos (intel-analytics#10274)

55f6497

Co-authored-by: Ariadne <[email protected]>

Optimize speculative decoding PVC memory usage (intel-analytics#10329)

786254a

* optimize memory
* update
* update
* update
* support other models
* update
* fix style

fix fschat DEP version error (intel-analytics#10325)

46f0f10

Small fixes to oneAPI link (intel-analytics#10339)

f8710bd

LLM: add quantize kv cache support for baichuan 7b and 13b. (intel-analytics#10330)

69c319d

* add quantize kv cache for baichuan 7b and 13b.
* fix typo.
* fix.
* fix style.
* fix style.

Add C-Eval HTML report (intel-analytics#10294)

fea31e8

* Add C-Eval HTML report
* Fix C-Eval workflow pr trigger path
* Fix C-Eval workflow typos
* Add permissions to C-Eval workflow
* Fix C-Eval workflow typo
* Add pandas dependency
* Fix C-Eval workflow typo

add rope theta argument (intel-analytics#10343)

98e6997

LLM: add user guide for benchmarking (intel-analytics#10284)

0bd96c6

* add user guide for benchmarking
* change the name and place of the benchmark user guide
* resolve some comments
* resolve new comments
* modify some typo
* resolve some new comments
* modify some descriptions

Fix device_map bug by raise an error when using device_map=xpu (intel-analytics#10340)

161d76a

* Fix device_map bug by raise an error when using device_map=xpu
* Fix sync error
* Fix python style
* Use invalidInputError instead of invalidOperationError

Langchain readme (intel-analytics#10348)

2fe524f

* update langchain readme
* update readme
* create new README
* Update README_nativeint4.md

Add RMSNorm unit test (intel-analytics#10190)

5ff3130

rename docqa.py->rag.py (intel-analytics#10353)

e20527d

Fix llamaindex AutoTokenizer bug (intel-analytics#10345)

d31c8b3

* fix tokenizer
* fix AutoTokenizer bug
* modify code style

Change quickstart documentation to use oneapi offline installer (intel-analytics#10350)

af7e6ac

* Change to oneapi offline installer
* Fixes
* Add "call"
* Fixes

LLM: some slight modification to benchmark user guide (intel-analytics#10347)

7b358a2

LLM: fix qwen2 (intel-analytics#10356)

088d191

serving xpu memory opt (intel-analytics#10358)

bfb01cb

fix from_pretrained when device_map=None (intel-analytics#10361)

42d1ca2

* pr trigger
* fix error when device_map=None
* fix device_map=None

LLM: update modelscope version (intel-analytics#10367)

5bf0208

LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub (intel-analytics#10364)

e2836e3

adapt vllm xpu

263d9b8

yangw1234 force-pushed the main branch from 89554f7 to ea4bc45 Compare March 27, 2024 03:25
